
Small language models

Last updated: Apr 02, 2025
Apr 2025
Trial

The recent announcement of DeepSeek R1 is a great example of why small language models (SLMs) continue to be interesting. The full-size R1 has 671 billion parameters and requires about 1,342 GB of VRAM to run, achievable only with a "mini cluster" of eight state-of-the-art NVIDIA GPUs. But DeepSeek is also available "distilled" into smaller, open-weight Qwen and Llama models, effectively transferring its abilities and allowing it to run on much more modest hardware. Though the distilled models give up some capability at these smaller sizes, they still represent a huge leap over previous SLMs. The SLM space continues to innovate elsewhere, too. Since the last Radar, Meta introduced Llama 3.2 at 1B and 3B sizes; Microsoft released Phi-4, a 14B model offering high-quality results; and Google released PaliGemma 2, a vision-language model at 3B, 10B and 28B sizes. These are just a few of the models currently being released at smaller sizes, and this is definitely an important trend to continue to watch.
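The VRAM figures above follow directly from parameter count and numeric precision: weights at 16-bit precision take two bytes per parameter. A rough sanity check can be sketched as follows (the helper function is our own; it counts weights only, ignoring activations and KV cache, which add more on top):

```python
def vram_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Rough VRAM (decimal GB) needed to hold model weights alone.

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for 8-bit, 0.5 for 4-bit
    quantization.
    """
    # 1e9 params * bytes-per-param / 1e9 bytes-per-GB cancels out.
    return params_billions * bytes_per_param

# Full-size DeepSeek R1 at FP16: 671B parameters
print(vram_gb(671))      # 1342.0 GB -- matches the figure quoted above

# A hypothetical distilled 7B variant, 4-bit quantized
print(vram_gb(7, 0.5))   # 3.5 GB -- fits on a consumer GPU
```

This also shows why quantized distillations matter: dropping from FP16 to 4-bit cuts the weight footprint by a factor of four before any reduction in parameter count.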

Oct 2024
Trial

Large language models (LLMs) have proven useful in many application areas, but the fact that they are large can be a source of problems: responding to a prompt requires a lot of compute resources, making queries slow and expensive; the models are proprietary and so large that they must be hosted in a cloud by a third party, which can be problematic for sensitive data; and training a model is prohibitively expensive in most cases. The last issue can be addressed with the RAG pattern, which side-steps the need to train and fine-tune foundational models, but cost and privacy concerns often remain. In response, we're now seeing growing interest in small language models (SLMs). In comparison to their more popular siblings, they have fewer weights and less precision, usually between 3.5 billion and 10 billion parameters. Recent research suggests that, in the right context and when set up correctly, SLMs can match or even outperform LLMs. And their size makes it possible to run them on edge devices. We've previously mentioned Google's Gemini Nano, but the landscape is evolving quickly, with Microsoft introducing its Phi-3 series, for example.
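To make the RAG pattern mentioned above concrete, here is a minimal sketch. The bag-of-words "embedding" and all names are our own simplifications for illustration; real systems use a trained embedding model and a vector store:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real RAG uses a trained embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Return the document most similar to the query.
    return max(docs, key=lambda d: cosine(embed(query), embed(d)))

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
]
context = retrieve("refund policy for returns", docs)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: What is the refund policy?"
# `prompt` is then sent to the (small) language model; no fine-tuning
# on the private documents is needed.
```

Because the model only has to read the retrieved context rather than memorize the corpus, even a modest model running locally can answer questions over private data, which addresses the privacy concern noted above.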

Published: Oct 23, 2024
