I'm a founder of an AI company and this is precisely how we operate. We have our own pre-trained LLMs, and we bring them to the customer's data, deep behind their corporate firewall, and fine-tune them on data that will never be available publicly. Although we are a young company (barely two years old), we've had some success, and I'm more convinced than ever that to the extent generative AI "revolutionizes" the enterprise, it will be through these use cases where smaller models are fine-tuned on proprietary data.

The foundational LLMs that are all the rage now are essentially trained on the same data - i.e. the internet - and while they have impressive capabilities for generating a wide range of responses, they are generally pretty terrible when you show them data that doesn't look like anything they've seen before. On top of that, most of the time you can't show them that data anyway, because enterprises do not want sensitive data leaving their security perimeter to go to a cloud-based LLM.
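To make the on-prem approach concrete, here is a minimal sketch of what "fine-tune a smaller model behind the firewall" can look like in practice: a locally mirrored open-weights checkpoint, LoRA adapters so only a small fraction of weights are trained, and a training corpus read straight from local disk. This is purely illustrative and not our actual stack; the model name, file path, library choices (Hugging Face Transformers, PEFT, Datasets), and hyperparameters are all placeholder assumptions.

```python
# Illustrative sketch only: LoRA fine-tuning of a locally hosted model on
# proprietary text, entirely inside the security perimeter. Model name,
# file path, and hyperparameters are placeholders, not a real deployment.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"  # any locally mirrored open-weights checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Wrap the base model with low-rank adapters; only the adapter weights train,
# and nothing is sent to an external API.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Proprietary documents, one example per line, read from local disk
# (hypothetical file name).
data = load_dataset("text", data_files={"train": "internal_corpus.txt"})
tokenized = data["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-adapter",
                           per_device_train_batch_size=2,
                           num_train_epochs=1,
                           logging_steps=50),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("finetuned-adapter")  # adapter weights stay on-prem
```

The point of the adapter-based setup is that the heavy base model can be reused across customers while the small fine-tuned piece, and the data it was trained on, never leave the customer's environment.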
