Session overview
A practical session on fine-tuning open-source language models on proprietary data, including the dataset preparation, training, and evaluation workflow we use in client engagements.
What we cover
- When fine-tuning is the right answer. Decision framework for choosing fine-tuning vs RAG vs prompting.
- Base model selection. Llama, Qwen, Mistral, DeepSeek — practical considerations for picking a starting point.
- Dataset preparation. Curation, formatting, edge cases, the 80% of the work that's least glamorous.
- LoRA and QLoRA training. Hyperparameter choices, training duration, hardware requirements.
- Evaluation harnesses. Building the test set, defining rubrics, automated and LLM-as-judge scoring.
- Common failure modes. Catastrophic forgetting, memorisation, reward hacking, distribution mismatch.
Live demonstration
The session includes a live walkthrough of fine-tuning a 7B model on a small proprietary dataset — from raw data through to evaluation. Total elapsed wall-clock time visible to the audience.
Reference materials
The workflow demonstrated is documented in our fine-tuning guide. Background context on choosing between fine-tuning and RAG is in our decision tree blog post.
Q&A topics
- Sizing the dataset for a target capability.
- Continuous fine-tuning vs episodic retraining.
- Comparing fine-tuned open-source models with frontier APIs.
- Cost optimisation strategies.
Recording
Contact us to request the recording.