Mistral 7B
The Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.
Mistral AI has released Mistral 7B, a 7.3B-parameter language model that outperforms the larger Llama 2 13B across benchmarks and approaches CodeLlama 7B performance on code. It uses grouped-query attention (GQA) for faster inference and sliding window attention (SWA) to handle longer sequences at lower cost. The model is released under the Apache 2.0 license, which permits use without restrictions, and it can be deployed locally or on major cloud platforms.
- Mistral 7B is a 7.3B parameter language model.
- It outperforms Llama 2 13B on all benchmarks and Llama 1 34B on many.
- It approaches CodeLlama 7B performance on code while maintaining English task proficiency.
- It uses grouped-query attention (GQA) for faster inference.
- It uses sliding window attention (SWA) to handle longer sequences efficiently.
- It is released under the Apache 2.0 license and can be used without restrictions.
- Can be deployed locally, on cloud platforms (AWS, GCP, Azure), and on HuggingFace.
- A fine-tuned chat version, Mistral 7B Instruct, outperforms Llama 2 13B chat.
- On reasoning, comprehension, and STEM benchmarks, Mistral 7B performs on par with a Llama 2 model more than 3x its size, which improves cost/performance.
- Sliding window attention gives a compute cost of O(sliding_window × seq_len), which is linear in sequence length, and it reduces cache memory usage.

Continue reading: https://foxvector.com/articles/cb0469f0-93eb-4be3-b9ca-a52996c60899
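
To make the sliding window cost claim concrete, here is a minimal sketch in plain Python. It is not Mistral's implementation. It assumes the window counts the current token, so position i may attend to positions i - window + 1 through i. The function names and the toy sizes are hypothetical, chosen only for illustration.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding window mask.

    mask[i][j] is True when query position i may attend to key position j.
    Assumption: the window includes the current token, so i sees
    positions i - window + 1 .. i and nothing in the future.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


# Toy sizes for illustration only; real models use far larger values.
seq_len, window = 8, 3

mask = sliding_window_mask(seq_len, window)
for row in mask:
    # 'x' marks an allowed (query, key) pair, '.' a masked one.
    print("".join("x" if allowed else "." for allowed in row))

swa_pairs = attended_pairs(mask)
# Full causal attention lets position i see all i + 1 earlier positions.
causal_pairs = seq_len * (seq_len + 1) // 2

print("sliding window pairs:", swa_pairs)
print("full causal pairs:", causal_pairs)
print("upper bound window * seq_len:", window * seq_len)
```

For these toy sizes the windowed mask allows 21 pairs, against 36 for full causal attention, and it stays under the bound of window × seq_len = 24. Doubling seq_len roughly doubles the windowed count, while the full causal count grows roughly fourfold, which is the linear versus quadratic difference the bullet describes.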