Mistral 7B

On 10/10/2023, researchers from the newly formed Mistral.ai introduced a paper titled “Mistral 7B”, which described a 7B language model.

The model leverages grouped-query attention and sliding window attention for improved inference speed and reduced memory consumption.
The authors evaluated on a wide variety of tasks that can be categorized as: commonsense reasoning, world knowledge, reading comprehension, math, code, popular aggregated results.
The following Figure from the paper shows that Mistral 7B outperforms Llama-2-7B and Llama-2-13B.

_config.yml

Unfortunately, the paper provides no details on the datasets used to pretrain the Mistral model.

Written on December 3, 2023