Cohere launches open weights, multilingual AI model Aya 23

Today, Cohere for AI (C4AI), the non-profit research arm of Canadian enterprise AI startup Cohere, announced the open weights release of Aya 23, a new family of state-of-the-art multilingual language models.

Available in 8B and 35B parameter variants (parameters refer to the strength of connections between artificial neurons in an AI model, with more generally denoting a more powerful and capable model), Aya 23 is the latest work under C4AI’s Aya initiative, which aims to deliver strong multilingual capabilities.

Notably, C4AI has open sourced Aya 23’s weights. These are a type of parameter within an LLM, and are ultimately numbers within an AI model’s underlying neural network that allow it to determine how to handle data inputs and what to output. By having access to them in an open release like this, third-party researchers can fine-tune the model to fit their individual needs. At the same time, it falls short of a full open source release, in which the training data and underlying architecture would also be released. But it is still extremely permissive and flexible, on the order of Meta’s Llama models.
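To make the relationship between weights and parameters concrete, here is a minimal Python sketch that counts both for a tiny fully connected network. The function name `count_parameters` and the toy layer sizes are illustrative assumptions; they say nothing about Aya 23's actual architecture, which is far larger and structured differently.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, input first.
    Each pair of adjacent layers contributes n_in * n_out weights
    (one per connection) and n_out biases (one per receiving unit).
    """
    pairs = list(zip(layer_sizes, layer_sizes[1:]))
    weights = sum(n_in * n_out for n_in, n_out in pairs)
    biases = sum(n_out for _, n_out in pairs)
    return weights, biases


# Hypothetical toy network: 4 inputs, 8 hidden units, 2 outputs.
weights, biases = count_parameters([4, 8, 2])
print(f"weights={weights}, biases={biases}, total={weights + biases}")
```

In this toy case most of the parameters are weights, and the same holds, at vastly greater scale, for models measured in billions of parameters.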

Aya 23 builds on the original model, Aya 101, and serves 23 languages: Arabic, Chinese (simplified and traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian and Vietnamese.

According to Cohere for AI, the models expand state-of-the-art language modeling capabilities to nearly half of the world’s population and outperform not just Aya 101, but also other open models like Google’s Gemma and Mistral’s various open source models, with higher-quality responses across the languages they cover.

Breaking language barriers with Aya

While large language models (LLMs) have thrived over the last few years, most of the work in the field has been English-centric.

As a result, despite being highly capable, most models tend to perform poorly outside of a handful of languages – particularly when dealing with low-resource ones. 

According to C4AI researchers, the problem was two-fold. First, there was a lack of robust multilingual pre-trained models. And secondly, there was not enough instruction-style training data covering a diverse set of languages.

To address this, the non-profit launched the Aya initiative with over 3,000 independent researchers from 119 countries. The group initially created the Aya Collection, a massive multilingual instruction-style dataset consisting of 513 million instances of prompts and completions, and then used it to develop an instruction fine-tuned LLM covering 101 languages.

The model, Aya 101, was released as an open source LLM back in February 2024, marking a significant step forward in massively multilingual language modeling with support for 101 different languages.

But it was built upon mT5, which has now become outdated in terms of knowledge and performance.

It was also designed with a focus on breadth, that is, covering as many languages as possible. This spread the model’s capacity so thinly that its performance on any given language lagged.

Now, with the release of Aya 23, Cohere for AI is moving to balance breadth and depth. Essentially, the models, which are based on Cohere’s Command series of models and the Aya Collection, focus on allocating more capacity to fewer languages (23 in total), thereby improving generation across them.

When evaluated, the models performed better than Aya 101 on the languages they cover, and better than widely used models like Gemma, Mistral and Mixtral, on an extensive range of discriminative and generative tasks.

“We note that relative to Aya 101, Aya 23 improves on discriminative tasks by up to 14%, generative tasks by up to 20%, and multilingual MMLU by up to 41.6%. Furthermore, Aya 23 achieves a 6.6x increase in multilingual mathematical reasoning compared to Aya 101. Across Aya 101, Mistral, and Gemma, we report a mix of human annotators and LLM-as-a-judge comparisons. Across all comparisons, the Aya-23-8B and Aya-23-35B are consistently preferred,” the researchers wrote in the technical paper detailing the new models.

Available for use right away

With this work, Cohere for AI has taken another step towards high-performing multilingual models.

To provide access to this research, the company has released the open weights for both the 8B and 35B models on Hugging Face under the Creative Commons Attribution-NonCommercial 4.0 International Public License.

“By releasing the weights of the Aya 23 model family, we hope to empower researchers and practitioners to advance multilingual models and applications,” the researchers added. Notably, users can even try out the new models on the Cohere Playground for free.
