As of April 2024, what is the most efficient way to fine-tune an LLM?
In particular we are trying to understand performance vs. cost trade-offs. We don't have a budget to train from scratch.
We are working with a proprietary dataset on the order of 100M tokens. We want to fine-tune a general-purpose language model and also create task-specific models from the same corpus.
Any help would be appreciated!
A single A100 or H100 with 80GB VRAM can fine-tune 70B open models, provided you use parameter-efficient methods like QLoRA (a 4-bit quantized base model plus LoRA adapters); full fine-tuning at that scale needs far more memory. Scaling out to more GPUs/nodes is faster, and much cheaper GPUs are fine for fine-tuning smaller models.
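To make that concrete, here's a rough back-of-the-envelope for why a 70B model fits in 80GB with QLoRA but not with full fine-tuning. The specific assumptions (4-bit base weights, LoRA adapters covering ~1% of parameters, Adam optimizer states) are mine for illustration, not exact numbers for any particular setup:

```python
# Rough VRAM estimate for QLoRA-style fine-tuning (activations/KV cache excluded).
# Assumptions: 4-bit frozen base weights; LoRA adapters ~1% of params trained
# in bf16 (2 bytes) with fp32 Adam moments (8 bytes) and bf16 gradients.

def qlora_vram_gb(n_params_billion: float,
                  quant_bits: int = 4,
                  lora_fraction: float = 0.01) -> float:
    """Approximate GB for weights plus trainable adapter state."""
    n = n_params_billion * 1e9
    base_weights = n * quant_bits / 8      # frozen, quantized base model
    lora_params = n * lora_fraction        # trainable adapter parameters
    adapter = lora_params * 2              # bf16 adapter weights
    grads = lora_params * 2                # bf16 gradients
    adam_states = lora_params * 8          # fp32 first/second moments
    return (base_weights + adapter + grads + adam_states) / 1e9

print(f"70B QLoRA estimate:   ~{qlora_vram_gb(70):.0f} GB")  # fits under 80 GB
# Full fp16 fine-tune: 2B weights + 2B grads + 8B Adam + 4B fp32 master = 16 bytes/param
print(f"70B full fine-tune:   ~{70e9 * 16 / 1e9:.0f} GB")    # needs many GPUs
```

Activations and KV cache add batch-size- and sequence-length-dependent overhead on top of this, so real usage runs somewhat higher, but the headroom under 80GB is what makes the single-GPU setup work.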
The LocalLLaMA subreddit at https://www.reddit.com/r/LocalLLaMA/ is also an awesome community for the GPU-poor :)