Responsible & Sustainable AI

Popularity, Bias, and Sovereignty: What the Compar:IA Rankings Reveal

The French observatory Compar:IA recently unveiled a user-generated ranking of the artificial intelligence models most popular among French-speaking internet users [1]. Presented as a collective evaluation effort, this initiative has sparked widespread interest… but also significant criticism. While it reflects the public’s growing curiosity about generative AI, it also raises a key question: can we truly measure a model’s quality based on user votes? This ranking, which places Mistral Medium 3.1 at the top ahead of models from Google and Alibaba, stands out more for its participatory nature than for its methodological rigor.

A ranking based on preferences, not performance

Unlike technical benchmarks (such as MMLU or GSM8K), which measure the consistency, accuracy, or robustness of models [2], the Compar:IA ranking is based on subjective votes. Each user is presented with two anonymous responses from two different AIs and then chooses the one they find clearest or most convincing. While this method has the merit of involving the general public, it has several limitations:

  • It focuses more on the models' linguistic fluency and writing style than on their accuracy.
  • It favors "consumer-grade" AI, capable of generating more user-friendly responses, at the expense of more specialized models.
  • It does not allow the factual accuracy or scientific relevance of the generated information to be verified.

In other words, it is a barometer of perception, not a tool for assessing the cognitive or analytical capabilities of AI.
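To make the mechanics of this kind of ranking concrete, a blind pairwise vote can be aggregated into a leaderboard with an Elo-style update, as other chatbot arenas do. The sketch below is purely illustrative: Compar:IA has not published its exact aggregation formula, and the model names and votes here are hypothetical placeholders.

```python
from collections import defaultdict

def elo_update(ratings, winner, loser, k=32):
    """Apply one Elo update after a blind pairwise vote.

    The winner gains, and the loser loses, an amount proportional to how
    surprising the outcome was given their current ratings.
    """
    expected_win = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    delta = k * (1 - expected_win)
    ratings[winner] += delta
    ratings[loser] -= delta

# Hypothetical votes: (preferred model, other model) per blind comparison
votes = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]

ratings = defaultdict(lambda: 1000.0)  # everyone starts at the same baseline
for winner, loser in votes:
    elo_update(ratings, winner, loser)

leaderboard = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
print(leaderboard)
```

Note what the update rewards: being *preferred*, not being *correct*. A model that wins votes through tone or fluency climbs the leaderboard exactly as fast as one that wins through accuracy, which is precisely the limitation discussed above.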

The results: Mistral in the lead, but the podium should be taken with a grain of salt

Compar:IA's 2025 rankings crown the French Mistral Medium 3.1 model, followed by Google's Gemini 2.5 Flash and Gemini 2.0 Flash models [3]. Trailing behind are Alibaba’s Qwen 3 Max and DeepSeek v3 from the Chinese company DeepSeek.
While these results are symbolically encouraging for the French and European scenes, they do not necessarily reflect the reality of technical performance. Indeed:

  • Although Google and OpenAI's models top international rankings, they rank in the middle of the pack here.
  • GPT-OSS-120B (OpenAI) ranks only 7th, and Claude 4.5 Sonnet (Anthropic) is outside the top 10.
  • The performance metrics evaluated by Compar:IA are heavily influenced by the language of the prompts and the cultural context of the voters, who are predominantly French-speaking.

The results should therefore be interpreted with caution: they reflect linguistic and cultural identity considerations rather than a scientific ranking of the models.

A participatory methodology… but a biased one

The “blind” voting system implemented by Compar:IA was designed to reduce name-recognition bias. However, several limitations remain. The tests do not account for the factual accuracy of responses or measure generative biases (ideological, cultural, or linguistic). AI researchers point out that a model can be “likable” without being effective: a persuasive or empathetic AI is not necessarily accurate.
Linguistic bias is also significant: models trained on French-language corpora, such as those from Mistral, have a clear advantage. Furthermore, emotional perception often influences preference: a warmer tone, a familiar turn of phrase, or a narrative style can be perceived as indicators of quality.
In short, Compar:IA helps us understand the public’s expectations regarding generative AI, but it does not measure the models' actual performance [4].

Why AI-generated rankings provide a more reliable assessment

On the aivancity blog, the General AI rankings are based on a radically different approach: scientific, comparative, and measurable [5]. These analyses evaluate models based on objective criteria such as:

  • the accuracy and consistency of responses on benchmark tests;
  • the ability to reason logically and mathematically;
  • robustness in the face of prompt variations or complex contexts;
  • multimodal versatility (text, code, images, audio);
  • and energy efficiency in architectural design.
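The first criterion illustrates the contrast with preference voting well: on a benchmark, each item has a single correct answer, so a model's score is a verifiable accuracy figure rather than a vote count. The sketch below shows the idea with hypothetical MMLU-style multiple-choice items; the questions and the stand-in "model" are invented for illustration.

```python
def exact_match_accuracy(items, model_answer):
    """Score a model on multiple-choice items by exact match.

    `items` is a list of (question, correct_choice) pairs;
    `model_answer` is any callable mapping a question to a choice letter.
    """
    correct = sum(1 for question, gold in items if model_answer(question) == gold)
    return correct / len(items)

# Hypothetical benchmark items: (question, correct choice letter)
items = [
    ("2 + 2 = ?", "B"),
    ("Capital of France?", "C"),
    ("Boiling point of water at sea level (in °C)?", "A"),
]

# A stand-in "model" that always answers "B" — it gets exactly 1 of 3 right,
# and that score is reproducible by anyone, independent of voter perception.
always_b = lambda question: "B"
print(exact_match_accuracy(items, always_b))
```

Because the gold answers are fixed, two evaluators running the same items on the same model get the same number, which is what makes benchmark results comparable and reproducible in a way preference votes are not.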

These rigorous evaluations help identify models that are truly effective and suitable for professional or academic use. Unlike Compar:IA, they do not aim to measure popularity, but rather to rank models based on facts, not perceptions.

All General AI rankings can be viewed in the AI Tools section, which regularly compares the latest models based on technical, ethical, and energy-related criteria.

Toward more transparent and complementary indicators

The success of the Compar:IA rankings highlights a broader phenomenon: the general public’s growing desire to understand and compare AI systems. But it also underscores the need for digital literacy training regarding evaluation tools.
Rather than pitting a citizen-driven approach against scientific expertise, it would be more effective to integrate the two. A participatory evaluation can enrich the perception of use, while a technical analysis ensures the reliability and reproducibility of results.
Ultimately, European institutions, through the AI Act, are expected to encourage the creation of hybrid indicators: trust barometers that integrate technical performance, sustainability, and model transparency [6].

Conclusion: Popularity does not equal performance

The Compar:IA rankings reflect a genuine interest in artificial intelligence and a collective desire to understand its applications. However, in the absence of a scientific methodology, they cannot be considered a reliable measurement tool. They provide a snapshot of user preferences, not a ranking of the best AI systems.

For a rigorous analysis of models, aivancity’s Generative AI rankings remain the most comprehensive and unbiased benchmark available in the French-speaking market. By combining academic expertise, technical criteria, and ethical considerations, they go beyond mere popularity to provide a truly informed view of artificial intelligence performance.

For further reading on the topics of evaluation and sovereignty in the field of artificial intelligence:

1. Compar:IA. (2025). Citizen Observatory on Artificial Intelligence – 2025 Rankings.

2. Hendrycks, D. et al. (2021). Measuring Massive Multitask Language Understanding (MMLU). arXiv.

3. Le Journal du Net. (2025). Mistral beats Google and OpenAI in the Compar:IA rankings.

4. Ministry of Culture. (2025). Note on the methodology for participatory evaluations of AI models.
https://www.culture.gouv.fr

5. aivancity. (2025). General AI Rankings – AI Tools Category.
https://www.aivancity.ai/blog/category/outils-ia/

6. European Commission. (2024). AI Act – European Regulation on Artificial Intelligence.
https://digital-strategy.ec.europa.eu

