Alberto Romero (@thealgorithmicbridge)

Jun 24, 2024

I had an answer ready until you said "offline and self-hosted" and then "run on a consumer PC." That really makes it much harder. As you surely know, models are getting much cheaper, but with models that you self-host it's much harder to achieve the efficiency that Anthropic, Google, and OpenAI are obtaining. I don't think you can make Llama run as efficiently.

First, the API companies compete with one another, so they're incentivized to price as low as possible, even *too low*, either because they try to make up for the costs somewhere else or because they have someone else's money (OpenAI has Microsoft's; Anthropic has Google's and Amazon's). Second, Meta isn't worried about making Llama inference efficient because they're merely training it for you to download and do whatever you want with it. But those models aren't fine-tuned or adapted to your use case. You have to do that yourself.

Anyway, if you're not willing to relax some of your requirements, I'd say Meta's models are the way to go. Mistral's too. The thing is, your use case (knowledge management on large documents) isn't handled well by models small enough to run well on a consumer PC (even a high-end one). I'm sorry I'm not able to give you a satisfying answer!

Jun 24, 2024

6:43 PM
