Qwen3: Alibaba’s model that challenges OpenAI and DeepSeek in mathematics and coding

While large language models are dominated by the United States, China is gradually strengthening its position in the field of advanced artificial intelligence. With Qwen3, Alibaba aims to offer a competitive model in the strategic areas of mathematical reasoning and code generation. The stakes are not only technological; they are also symbolic. In an increasingly polarized international context, the ability to produce high-performing and reliable models is becoming an indicator of digital sovereignty.

According to the latest report from Hugging Face, Chinese contributions now account for 29% of the new models published on the platform [1]. Qwen3 is part of this upward trend, explicitly aiming to match the performance of GPT-4, Claude 3, and DeepSeek on STEM (science, technology, engineering, mathematics) tasks.

The Qwen3 family comes in several versions:

  • Qwen3-7B, a compact model suitable for local inference
  • Qwen3-72B, a large dense model
  • Qwen3-MoE, a Mixture of Experts architecture that is more computationally efficient
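
To illustrate why a Mixture of Experts design can be cheaper to run, here is a minimal sketch of top-k routing. It is a toy illustration, not Qwen3's actual implementation: the function names, the scalar "experts", and the gate scores are all assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_scores, experts, k=2):
    """Route input x to the k experts with the highest gate scores (hypothetical helper)."""
    # Pick the indices of the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([gate_scores[i] for i in top])
    # Only the selected experts are evaluated; the others are never called.
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Four toy "experts": each is just a scalar function here (assumption for illustration).
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
output, used = moe_forward(3.0, gate_scores=[0.1, 2.0, 2.0, -1.0], experts=experts, k=2)
print(used, output)
```

In this sketch only two of the four experts run for a given input, which is the source of the efficiency gain: the model can hold many parameters while activating only a fraction of them per token.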

In April 2025, Alibaba announced that Qwen3-72B outperformed GPT-4 on certain advanced mathematics benchmarks, including MATH and GSM8K, while achieving an accuracy rate of over 81% on HumanEval, a standard benchmark for Python code generation [2].
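
Benchmarks in the HumanEval style score functional correctness: a generated solution counts only if it passes the problem's unit tests. The sketch below shows the idea on two made-up samples. The helper names and the samples are assumptions, and a real harness would run untrusted code in a sandbox with timeouts rather than calling `exec` directly.

```python
def passes(candidate_source, test_source):
    """Return True if the candidate code runs and satisfies every assertion."""
    namespace = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
        exec(test_source, namespace)       # run the assertions against it
        return True
    except Exception:
        # Any error (syntax error, wrong result, crash) counts as a failure.
        return False

def pass_rate(samples):
    """Fraction of (candidate, tests) pairs whose candidate passes its tests."""
    results = [passes(candidate, tests) for candidate, tests in samples]
    return sum(results) / len(results)

# Two illustrative samples: one correct, one buggy (made up for this example).
samples = [
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
]
rate = pass_rate(samples)
print(f"pass rate: {rate:.0%}")
```

A reported figure such as 81% therefore means that roughly four out of five generated solutions passed all of their tests, not that the code was judged plausible by a reader.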

The model is based on extensive multilingual pre-training, enriched with structured mathematical datasets (ProofWiki, arXiv, MathQA) and millions of code examples. Further fine-tuning was performed using reinforcement learning from human feedback (RLHF) techniques, with a focus on the logical rigor and readability of the generated code.

Here is a comparison table showing the scores achieved by Qwen3 and its competitors on industry-standard benchmarks:

| Model | GSM8K (mathematical reasoning) | MATH (formal problems) | HumanEval (Python code) | MBPP (simple programming) | License |
|---|---|---|---|---|---|
| Qwen3-72B | 89.6% | 54.1% | 81.2% | 71.5% | Apache 2.0 |
| GPT-4 | ~92% | ~50% | ~88% | ~77% | Proprietary |
| DeepSeek Coder | 88.8% | N/A | 84.1% | 75.3% | MIT |
| Claude 3 Opus | 89.3% | ~47% | ~83% | ~72% | Proprietary |

Combined sources: published technical reports, independent reproducible tests (April–July 2025)

This table shows that Qwen3 is a serious contender among the best models on the market, despite being released under an open-source license. It stands out in particular for its performance in formal mathematics, a field that has historically been challenging for large language models (LLMs).

A model’s logical and algorithmic capabilities are not a minor detail. They determine its ability to:

  • outline a step-by-step response
  • manage complex dependencies in chains of reasoning
  • generate executable, optimized, and readable code

These skills are now in demand across several sectors: science education, research support, software prototyping, and the automation of technical tasks. In 2025, nearly 44% of developers surveyed by Stack Overflow reported using AI to test or write code on a daily basis [3].

Despite its relative openness, Qwen3 has some gray areas:

  • lack of details regarding the exact corpora used, particularly proprietary code
  • limited documentation for the full reproducibility of the results
  • assessments that are sometimes conducted internally, without an independent third-party auditor

Furthermore, the Mixture of Experts architecture can introduce non-deterministic variations in the results, making it more difficult to evaluate the model consistently.
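
One common way to cope with run-to-run variation is to repeat the evaluation several times and report the mean together with the spread instead of a single number. The sketch below assumes a hypothetical `evaluate_once` function standing in for a full benchmark run. Its scores are simulated noise for illustration only, not measurements of any real model.

```python
import random
import statistics

def evaluate_once(seed):
    """Hypothetical stand-in for one full benchmark run; returns a score in [0, 1]."""
    rng = random.Random(seed)
    # Simulated score: a fixed base value plus small run-to-run noise.
    return 0.80 + rng.uniform(-0.02, 0.02)

def evaluate_repeated(run, seeds):
    """Run the evaluation once per seed and summarize the scores."""
    scores = [run(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)

mean_score, spread = evaluate_repeated(evaluate_once, seeds=range(5))
print(f"score: {mean_score:.3f} +/- {spread:.3f} over 5 runs")
```

Reporting the spread makes it easier to tell whether a gap of one or two points between two models is meaningful or falls within the noise.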

The more effective a model becomes, the more sensitive the issues surrounding its use become:

  • In education, can it encourage automated cheating, or, on the contrary, enhance learning?
  • In cybersecurity, can it generate code that is potentially dangerous or designed to bypass systems?
  • In terms of intellectual property, how can we verify that it isn't reproducing protected code encountered during its training?
  • In sensitive sectors such as finance, medicine, or the legal system, what safeguards are in place to regulate the automatic generation of algorithms?

The power of a model in mathematics or programming thus raises the question of specific technical and ethical governance, which has not yet been adequately addressed in current regulations.

By releasing Qwen3 under the Apache License while delivering performance on par with proprietary industry leaders, Alibaba is setting a strategic milestone. This model demonstrates that it is possible to combine openness, power, and specialization in a highly demanding field.

This reinforces the idea of a multipolar AI landscape, where Chinese, American, and European models will coexist, each with its own architectural, licensing, and usage choices. But for this diversity to be beneficial, it must be accompanied by a collective effort toward interoperability, documentation, and scientific transparency.

1. Hugging Face. (2025). The Open LLM Ecosystem Report Q2.
https://huggingface.co/

2. Alibaba DAMO Academy. (2025). Qwen3 Technical Report.
https://modelscope.cn/

3. Stack Overflow Developer Survey. (2025). How Developers Use AI Tools.
https://stackoverflow.blog/
