Mixtral of Experts

Mistral AI continues its mission to deliver the best open models to the developer community. Moving forward in AI requires taking new technological turns beyond reusing well-known architectures and training paradigms. Most importantly, it requires making the community benefit from original models to foster new inventions and usages.

Mistral AI has released Mixtral 8x7B, a high-quality sparse mixture of experts (SMoE) model with open weights, licensed under Apache 2.0. It outperforms Llama 2 70B on most benchmarks with 6x faster inference, and matches or exceeds GPT-3.5 on most standard benchmarks, giving it a strong cost-performance trade-off. Mixtral 8x7B handles a context of 32k tokens, supports multiple languages, and performs strongly on code generation. It can be fine-tuned into an instruction-following model that scores 8.3 on MT-Bench.

  • Mixtral 8x7B is a new open-weight sparse mixture of experts (SMoE) model released by Mistral AI under the Apache 2.0 license.
  • It outperforms Llama 2 70B on most benchmarks with 6x faster inference, and matches or exceeds GPT-3.5 on most standard benchmarks.
  • The model has 46.7B total parameters but uses only 12.9B parameters per token, so it processes input and generates output at roughly the speed and cost of a 12.9B model.
  • Capabilities include a 32k-token context, support for English, French, Italian, German, and Spanish, strong code generation, and an MT-Bench score of 8.3 when fine-tuned for instruction following.
  • Mixtral shows less bias on the BBQ benchmark and more positive sentiments on BOLD compared to Llama 2.
  • Mistral AI is also offering Mixtral 8x7B Instruct, optimized for instruction following. The model can be deployed on an open-source stack using vLLM and SkyPilot.

Continue reading: https://foxvector.com/articles/d686156a-07bf-4a83-a448-8ae45b3ad423
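
The two parameter figures in the list above (46.7B total, 12.9B used per token) can be reconciled with a little arithmetic. The sketch below is only a back-of-the-envelope illustration, not Mistral AI's own accounting. It assumes, beyond what this summary states, that each sparse layer holds 8 experts and that the router sends every token to 2 of them, with everything else (attention, embeddings, routers) shared by all tokens. The function name `split_params` is made up for this example.

```python
# Figures quoted in the summary, in billions of parameters.
TOTAL_PARAMS_B = 46.7
ACTIVE_PARAMS_B = 12.9

# Assumptions not stated in the summary: 8 experts per sparse layer,
# and 2 experts selected per token.
NUM_EXPERTS = 8
EXPERTS_PER_TOKEN = 2


def split_params(total, active, num_experts, experts_per_token):
    """Solve the two-equation system
         total  = shared + num_experts       * per_expert
         active = shared + experts_per_token * per_expert
    for (shared, per_expert). Here per_expert is the parameter count of
    one expert summed over all layers."""
    per_expert = (total - active) / (num_experts - experts_per_token)
    shared = total - num_experts * per_expert
    return shared, per_expert


shared_b, per_expert_b = split_params(
    TOTAL_PARAMS_B, ACTIVE_PARAMS_B, NUM_EXPERTS, EXPERTS_PER_TOKEN
)
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"shared parameters:     ~{shared_b:.2f}B")
print(f"parameters per expert: ~{per_expert_b:.2f}B")
print(f"fraction used per token: {active_fraction:.1%}")
```

Under these assumptions, roughly 1.6B parameters are shared and each expert accounts for about 5.6B, so a token touches a little over a quarter of the model's weights. This also shows why "8x7B" does not mean 56B parameters: only the expert blocks are replicated, not the whole network.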