Nvidia Corp. has released Mistral-NeMo-Minitron 8B, a lightweight language model that outperforms comparably sized neural networks across a range of tasks. The model is available on Hugging Face under an open-source license.
Mistral-NeMo-Minitron 8B is a scaled-down version of Nvidia’s Mistral NeMo 12B, which was developed in collaboration with Mistral AI SAS. Nvidia created the smaller model using two machine learning techniques: pruning and distillation. Pruning shrinks a model by removing the weights that contribute least to its output, while distillation retrains the pruned model to mimic the original’s outputs, recovering accuracy in a more hardware-efficient network.
As a result, Mistral-NeMo-Minitron 8B has 4 billion fewer parameters than the original.
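Nvidia has not published the exact recipe in this article, but the two techniques can be illustrated with a toy NumPy sketch: magnitude pruning zeroes out the smallest weights, and a distillation loss measures how closely a student’s softened output distribution matches a teacher’s. The function names and the 2.0 temperature below are illustrative choices, not Nvidia’s actual pipeline.

```python
import numpy as np

def magnitude_prune(weights, keep_fraction=0.5):
    """Zero out the smallest-magnitude weights (simple magnitude pruning).

    Production pruning often removes whole neurons, heads or layers;
    this illustrative version just sparsifies one weight matrix.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * keep_fraction)
    threshold = np.partition(flat, -k)[-k]  # k-th largest magnitude
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    the core objective of knowledge distillation."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return float(np.sum(t * (np.log(t) - np.log(s))))

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, keep_fraction=0.25)  # keeps the 4 largest of 16 weights
```

In a distillation loop, the pruned student would be trained to minimize this loss against the full 12B-parameter teacher’s logits, which is how the smaller model retains much of the larger one’s capability.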
Hardware-efficient AI models
These techniques not only make the model more efficient but also allow it to run on an Nvidia RTX-powered workstation while excelling at multiple benchmarks for AI-powered chatbots, virtual assistants, content generators and educational tools, according to Nvidia executive Kari Briski.
Additionally, Microsoft has released three new language models designed with hardware efficiency in mind. The most compact, Phi-3.5-mini-instruct, features 3.8 billion parameters and can process prompts containing up to 128,000 tokens’ worth of data. In benchmark tests, Phi-3.5-mini-instruct outperforms models such as Llama 3.1 8B and Mistral 7B on certain tasks.
Also released are Phi-3.5-vision-instruct, which can perform image analysis tasks such as explaining charts, and Phi-3.5-MoE-instruct, a larger mixture-of-experts model with 60.8 billion parameters. The latter’s design activates only a tenth of its parameters when processing a prompt, which significantly reduces the hardware required for inference. These releases reflect an ongoing trend toward more efficient, accessible AI models that can run on limited hardware while still delivering high-quality output.
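The mixture-of-experts idea behind that sparse activation can be sketched in a few lines: a gating network scores every expert, but only the top-scoring few actually run for a given token, so most parameters sit idle on any single input. The shapes and `top_k=2` below are illustrative assumptions, not Phi-3.5-MoE-instruct’s actual architecture.

```python
import numpy as np

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Sparse mixture-of-experts layer: score all experts, run only the
    top_k highest-scoring ones, and mix their outputs by gate weight."""
    scores = x @ gate_weights                 # one score per expert
    top = np.argsort(scores)[-top_k:]         # indices of active experts
    g = scores[top] - scores[top].max()       # renormalise over active set
    gate = np.exp(g) / np.exp(g).sum()
    out = sum(w * (x @ expert_weights[i]) for w, i in zip(gate, top))
    return out, top

rng = np.random.default_rng(1)
num_experts, d = 16, 8
experts = rng.normal(size=(num_experts, d, d))  # 16 toy "experts"
gates = rng.normal(size=(d, num_experts))       # gating network
x = rng.normal(size=d)                          # one token's activations

out, active = moe_forward(x, experts, gates, top_k=2)
# Only 2 of 16 experts run for this token, i.e. 1/8 of the expert parameters.
```

Scaled up, this is why a 60.8-billion-parameter MoE model can serve a prompt with roughly the compute cost of a much smaller dense model.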