Microsoft Phi-3-Mini
- 25 Apr 2024
Why is it in the News?
Days after Meta unveiled its Llama 3 Large Language Model (LLM), Microsoft unveiled the latest version of its ‘lightweight’ AI model – the Phi-3-Mini.
What is Phi-3-Mini?
- Phi-3 refers to a series of language models developed by Microsoft, with Phi-3-mini being a notable addition.
- Phi-3-mini is a 3.8 billion parameter language model trained on 3.3 trillion tokens, designed to be as powerful as larger models while being small enough to be deployed on a phone.
- Despite its compact size, Phi-3-mini boasts impressive performance, rivaling that of larger models such as GPT-3.5.
- Furthermore, Phi-3-mini can be quantized to 4 bits, occupying approximately 1.8GB of memory, making it suitable for deployment on mobile devices.
- The model’s training data, a scaled-up version of the one used for Phi-2, is composed of heavily filtered web data and synthetic data, contributing to its remarkable capabilities.
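The ~1.8GB figure cited above can be sanity-checked directly from the parameter count. The sketch below is a rough lower bound only: a real on-device deployment also spends memory on activations, the KV cache, and any layers kept at higher precision, all of which are ignored here.

```python
# Rough memory estimate for a 4-bit quantized 3.8B-parameter model.
# This counts weight storage only; runtime overhead is ignored.

def quantized_size_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 2**30 bytes)."""
    total_bytes = n_params * bits_per_param / 8
    return total_bytes / 2**30

phi3_mini_params = 3.8e9  # parameter count from the article above

size_4bit = quantized_size_gb(phi3_mini_params, 4)
size_fp16 = quantized_size_gb(phi3_mini_params, 16)

print(f"4-bit: ~{size_4bit:.2f} GB")  # close to the ~1.8GB cited above
print(f"fp16:  ~{size_fp16:.2f} GB")  # ~4x larger without quantization
```

The 4-bit estimate lands at roughly 1.77 GB, consistent with the article's ~1.8GB figure, while an unquantized 16-bit copy of the same weights would need about four times as much memory.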
Advantages and Challenges of Phi-3-Mini:
- Phi-3-mini exhibits strengths in its compact size, impressive performance, and the ability to be deployed on mobile devices.
- Its training on high-quality data and subsequent chat fine-tuning contribute to its success, allowing it to rival larger models in language understanding and reasoning.
- However, the model is fundamentally limited by its size for certain tasks.
- It cannot store extensive “factual knowledge,” leading to lower performance on tasks such as TriviaQA.
- Nevertheless, efforts to resolve this weakness are underway, including augmenting the model with a search engine and exploring multilingual capabilities for Small Language Models.
- Safety: Phi-3-mini was developed with a strong emphasis on safety and responsible AI principles, in alignment with Microsoft’s guidelines.
- The approach to ensuring safety involved various measures such as safety alignment in post-training, red-teaming, and automated testing.
- It also involved evaluations across multiple categories of responsible AI (RAI) harm.
How is Phi-3-Mini Different From LLMs?
- Phi-3-mini is a Small Language Model (SLM). Simply put, SLMs are more streamlined versions of large language models.
- Compared to Large Language Models (LLMs), smaller AI models are cheaper to develop and operate, and they perform better on smaller devices like laptops and smartphones.
- SLMs are great for “resource-constrained environments including on-device and offline inference scenarios”.
- Such models are good for scenarios where fast response times are critical, say for chatbots or virtual assistants.
- Moreover, they are ideal for cost-constrained use cases, particularly with simpler tasks.
- While LLMs are trained on massive general data, SLMs stand out with their specialisation.
- Through fine-tuning, SLMs can be customised for specific tasks and achieve accuracy and efficiency in doing them.
- Most SLMs undergo targeted training, demanding considerably less computing power and energy compared to LLMs.
- SLMs also differ when it comes to inference speed and latency.
- Their compact size allows for quicker processing and their cost makes them appealing to smaller organisations and research groups.
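The gap in computing power can be made concrete with the widely used training-compute approximation FLOPs ≈ 6 × N × D, where N is the parameter count and D the number of training tokens. The Phi-3-mini figures (3.8B parameters, 3.3T tokens) come from this article; the Llama 3 70B figures used for comparison are assumptions drawn from outside it and serve only to illustrate the scale difference.

```python
# Back-of-the-envelope training compute using the common approximation
# FLOPs ~= 6 * N * D (N = parameters, D = training tokens).

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6 * n_params * n_tokens

phi3_mini = training_flops(3.8e9, 3.3e12)   # figures from this article
llama3_70b = training_flops(70e9, 15e12)    # assumed figures, for scale only

print(f"Phi-3-mini : {phi3_mini:.2e} FLOPs")
print(f"Llama3-70B : {llama3_70b:.2e} FLOPs")
print(f"ratio      : ~{llama3_70b / phi3_mini:.0f}x")
```

Under these assumptions the larger model needs on the order of 80x more training compute, which is the sense in which targeted SLM training demands "considerably less computing power and energy".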