Microsoft Phi-3-Mini

  • 25 Apr 2024

Why is it in the News?

Days after Meta unveiled its Llama 3 Large Language Model (LLM), Microsoft released the latest version of its ‘lightweight’ AI model – the Phi-3-Mini.

What is Phi-3-Mini?

  • Phi-3 refers to a series of language models developed by Microsoft, with Phi-3-mini being a notable addition.
  • Phi-3-mini is a 3.8 billion parameter language model trained on 3.3 trillion tokens, designed to be as powerful as larger models while being small enough to be deployed on a phone.
  • Despite its compact size, Phi-3-mini boasts impressive performance, rivaling that of larger models such as GPT-3.5 (the model behind ChatGPT).
  • Furthermore, Phi-3-mini can be quantized to 4 bits, occupying approximately 1.8GB of memory, making it suitable for deployment on mobile devices.
  • The model’s training data, a scaled-up version of the one used for Phi-2, is composed of heavily filtered web data and synthetic data, contributing to its remarkable capabilities.
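The ~1.8GB figure follows directly from the parameter count and bit width quoted above. A rough sketch of the arithmetic (ignoring activation memory and quantization overhead, which are simplifications for illustration):

```python
# Back-of-envelope check of the memory footprint of a 4-bit quantized
# 3.8-billion-parameter model. Only weight storage is counted here;
# real deployments also need memory for activations and the KV cache.

def quantized_size_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight-storage size in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

size = quantized_size_gb(3.8e9, 4)
print(f"{size:.2f} GB")  # 1.90 GB
```

Expressed in binary gigabytes (GiB, 2^30 bytes), 1.9e9 bytes is about 1.77 GiB, which matches the "approximately 1.8GB" cited for on-device deployment.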

Advantages and Challenges of Phi-3-Mini:

  • Phi-3-mini exhibits strengths in its compact size, impressive performance, and the ability to be deployed on mobile devices.
    • Its training on high-quality data and subsequent chat fine-tuning contribute to its success, allowing it to rival larger models in language understanding and reasoning.
  • However, the model is fundamentally limited by its size for certain tasks.
    • It cannot store extensive “factual knowledge,” leading to lower performance on tasks such as TriviaQA.
    • Nevertheless, efforts to resolve this weakness are underway, including augmentation with a search engine and exploring multilingual capabilities for Small Language Models.
  • Safety: Phi-3-mini was developed with a strong emphasis on safety and responsible AI principles, in alignment with Microsoft’s guidelines.
    • The approach to ensuring safety involved various measures such as safety alignment in post-training, red-teaming, and automated testing.
    • It also involved evaluations across multiple categories of responsible AI (RAI) harms.
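The search-engine augmentation mentioned above can be illustrated with a toy sketch: facts the small model cannot store in its weights are retrieved at query time and prepended to the prompt. The in-memory "index", document list, and function names here are hypothetical illustrations, not Microsoft's actual pipeline.

```python
# Toy sketch of search augmentation for a small language model:
# retrieve relevant facts from an external store and place them in the
# prompt, so the model need not memorise them. The tiny DOCS "index"
# and naive keyword scoring are illustrative assumptions only.

DOCS = [
    "Phi-3-mini has 3.8 billion parameters.",
    "Llama 3 was released by Meta in April 2024.",
    "TriviaQA is a question-answering benchmark over trivia questions.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Prepend retrieved context so the SLM answers from the context."""
    context = "\n".join(retrieve(question, DOCS))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How many parameters does Phi-3-mini have?"))
```

A production system would replace the keyword overlap with a real search engine or vector index, but the prompt-assembly step works the same way.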

How is Phi-3-Mini Different From LLMs?

  • Phi-3-mini is a Small Language Model (SLM). Simply put, SLMs are more streamlined versions of large language models.
  • Compared to Large Language Models (LLMs), smaller AI models are also cost-effective to develop and operate, and they perform better on smaller devices like laptops and smartphones.
  • SLMs are great for “resource-constrained environments”, including on-device and offline inference scenarios.
  • Such models are well suited to scenarios where fast response times are critical, such as chatbots or virtual assistants.
  • Moreover, they are ideal for cost-constrained use cases, particularly with simpler tasks.
  • While LLMs are trained on massive general data, SLMs stand out with their specialisation.
  • Through fine-tuning, SLMs can be customised for specific tasks and achieve accuracy and efficiency in doing them.
  • Most SLMs undergo targeted training, demanding considerably less computing power and energy compared to LLMs.
  • SLMs also differ when it comes to inference speed and latency.
  • Their compact size allows for quicker processing and their cost makes them appealing to smaller organisations and research groups.