Qwen3: Alibaba’s model that challenges OpenAI and DeepSeek in mathematics and coding

While the development of large language models is dominated by the United States, China is steadily strengthening its position in advanced artificial intelligence. With Qwen3, Alibaba aims to offer a competitive model in two strategic areas: mathematical reasoning and code generation. The stakes are not only technological; they are also symbolic. In an increasingly polarized international context, the ability to produce high-performing and reliable models is becoming an indicator of digital sovereignty.

According to the latest report from Hugging Face, Chinese contributions now account for 29% of the new models published on the platform [1]. Qwen3 is part of this upward trend, explicitly aiming to match the performance of GPT-4, Claude 3, and DeepSeek on STEM (science, technology, engineering, mathematics) tasks.

The Qwen3 family comes in several versions:

In April 2025, Alibaba announced that Qwen3-72B outperformed GPT-4 on certain advanced mathematics benchmarks, including MATH and GSM8K, while achieving an accuracy rate of over 81% on HumanEval, a standard benchmark for Python code generation [2].
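HumanEval results are commonly reported as pass@k: the probability that at least one of k generated solutions passes a problem's unit tests. The sketch below shows the usual unbiased estimator for that metric. It assumes the 81% figure above is a pass@1 score, which the announcement as quoted here does not specify, and the function names `pass_at_k` and `benchmark_pass_at_k` are illustrative rather than taken from any Qwen3 tooling.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of solutions sampled for the problem
    c: number of those solutions that pass the unit tests
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random subset of
    # k samples contains no passing solution; math.comb returns 0 when
    # fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over all problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single sample per problem (n = 1, k = 1), the estimate reduces to the plain fraction of problems solved, which is how a headline percentage like the one above is typically read.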

The model is based on extensive multilingual pre-training, enriched with structured mathematical datasets (ProofWiki, arXiv, MathQA) and millions of code examples. Further fine-tuning was performed using reinforcement learning from human feedback (RLHF) techniques, with a focus on the logical rigor and readability of the generated code.

Here is a comparison table showing the scores achieved by Qwen3 and its competitors on industry-standard benchmarks:

| Model | GSM8K (mathematical reasoning) | MATH (formal problems) | HumanEval (Python code) | MBPP (simple programming) | License |
| --- | --- | --- | --- | --- | --- |
| Qwen3-72B | 89.6% | 54.1% | 81.2% | 71.5% | Apache 2.0 |
| GPT-4 | ~92% | ~50% | ~88% | ~77% | Proprietary |
| DeepSeek Coder | 88.8% | N/A | 84.1% | 75.3% | MIT |
| Claude 3 Opus | 89.3% | ~47% | ~83% | ~72% | Proprietary |

Combined sources: published technical reports, independent reproducible tests (April–July 2025)

This table shows that Qwen3 is a serious contender among the best models on the market, despite being released under an open-source license. It stands out in particular in formal mathematics, a field that has historically been challenging for large language models (LLMs): its 54.1% on MATH is the highest score in the table.

A model’s logical and algorithmic capabilities are not trivial. They determine its ability to:

These skills are now in demand across several sectors: science education, research support, software prototyping, and the automation of technical tasks. In 2025, nearly 44% of developers surveyed by Stack Overflow reported using AI to test or write code on a daily basis [3].

Despite its relative openness, Qwen3 has some gray areas:

Furthermore, the Mixture of Experts architecture can introduce non-deterministic variations in the results, making it more difficult to evaluate the model consistently.
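One common way to cope with run-to-run variation is to evaluate the same benchmark several times and report the mean together with a measure of dispersion, rather than a single score. The following is a minimal sketch of that idea. The helper `summarize_runs` is hypothetical, and the accuracies in `runs` are placeholder values chosen for illustration, not measurements of Qwen3.

```python
from statistics import mean, stdev


def summarize_runs(scores: list[float]) -> dict[str, float]:
    """Summarize accuracy across repeated evaluation runs of one benchmark."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variability")
    return {
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample standard deviation across runs
        "spread": max(scores) - min(scores),  # best run minus worst run
    }


# Hypothetical accuracies (%) from five runs: placeholders, not measured data.
runs = [80.5, 81.7, 81.1, 80.9, 81.3]
summary = summarize_runs(runs)
print(f"{summary['mean']:.1f} ± {summary['stdev']:.2f} "
      f"(spread {summary['spread']:.1f})")
```

Reporting the spread alongside the mean makes it easier to judge whether a gap of one or two points between two models is meaningful or within the noise of repeated runs.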

The more effective a model becomes, the more sensitive the issues surrounding its use become:

The power of a model in mathematics or programming thus raises the question of specific technical and ethical governance, which has not yet been adequately addressed in current regulations.

By releasing Qwen3 under the Apache 2.0 license while delivering performance on par with proprietary industry leaders, Alibaba is setting a strategic milestone. This model demonstrates that it is possible to combine openness, power, and specialization in a highly demanding field.

This reinforces the idea of a multipolar AI landscape, where Chinese, American, and European models will coexist, each with its own architectural, licensing, and usage choices. But for this diversity to be beneficial, it must be accompanied by a collective effort toward interoperability, documentation, and scientific transparency.

For more on this topic, see our blog article "DeepSeek R1-0528: The open-source model that rivals advanced AI systems," which examines how DeepSeek stacks up against the giants of open AI.

1. Hugging Face. (2025). The Open LLM Ecosystem Report Q2.
https://huggingface.co/

2. Alibaba DAMO Academy. (2025). Qwen3 Technical Report.
https://modelscope.cn/

3. Stack Overflow Developer Survey. (2025). How Developers Use AI Tools.
https://stackoverflow.blog/
