
Satyam Kumar

"Turning visions int... • 18h

Ant Group uses domestic chips to train AI models and cut costs

Ant Group is relying on Chinese-made semiconductors to train artificial intelligence models, in an effort to reduce costs and lessen dependence on restricted US technology, according to people familiar with the matter. The Alibaba-owned company has used chips from domestic suppliers, including those tied to its parent, Alibaba, and Huawei Technologies, to train large language models using the Mixture of Experts (MoE) method. The results were reportedly comparable to those produced with Nvidia’s H800 chips, sources claim. While Ant continues to use Nvidia chips for some of its AI development, one source said the company is turning increasingly to alternatives from AMD and Chinese chip-makers for its latest models.

The development signals Ant’s deeper involvement in the growing AI race between Chinese and US tech firms, particularly as companies look for cost-effective ways to train models. The experimentation with domestic hardware reflects a broader effort among Chinese firms to work around export restrictions that block access to high-end chips like Nvidia’s H800, which, although not the most advanced, is still one of the more powerful GPUs available to Chinese organisations.

Ant has published a research paper describing its work, stating that its models, in some tests, performed better than those developed by Meta. Bloomberg News, which initially reported the matter, has not independently verified the company’s results. If the models perform as claimed, Ant’s efforts may represent a step forward in China’s attempt to lower the cost of running AI applications and reduce reliance on foreign hardware.

MoE models divide tasks into smaller data sets handled by separate components, and have gained attention among AI researchers and data scientists. The technique has been used by Google and the Hangzhou-based startup DeepSeek.
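At its core, an MoE layer uses a small "gate" network to score the experts for each token and runs only the top-scoring few, which is where the compute savings come from. The sketch below is a minimal, illustrative routing loop in plain NumPy; the sizes, the linear experts, and the function names are assumptions for illustration, not Ant's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 4   # a small "team of specialists" (illustrative size)
TOP_K = 2       # each token is routed to its 2 best-scoring experts
D = 8           # hidden dimension

# Each expert here is just a linear map; the gate scores experts per token.
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]
gate_w = rng.normal(size=(D, N_EXPERTS))

def moe_layer(x):
    """Route each token (row of x) to its top-k experts and mix their outputs."""
    scores = x @ gate_w                            # (tokens, experts)
    topk = np.argsort(scores, axis=1)[:, -TOP_K:]  # indices of the best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = scores[t, topk[t]]
        weights = np.exp(chosen) / np.exp(chosen).sum()  # softmax over chosen experts
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ experts[e])      # only TOP_K experts run per token
    return out

tokens = rng.normal(size=(5, D))
y = moe_layer(tokens)
print(y.shape)  # → (5, 8)
```

Because only `TOP_K` of the `N_EXPERTS` experts execute per token, a model can hold many more total parameters than it activates on any one forward pass.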
The MoE concept is similar to having a team of specialists, each handling part of a task, to make the process of producing models more efficient. Ant has declined to comment on its hardware sources.

Training MoE models depends on high-performance GPUs, which can be too expensive for smaller companies to acquire or use. Ant’s research focused on reducing that cost barrier. The paper’s title is suffixed with a clear objective: scaling models “without premium GPUs” [our quotation marks].

The direction taken by Ant, and its use of MoE to reduce training costs, contrasts with Nvidia’s approach. CEO Jensen Huang has said that demand for computing power will continue to grow, even with the introduction of more efficient models like DeepSeek’s R1. His view is that companies will seek more powerful chips to drive revenue growth, rather than aiming to cut costs with cheaper alternatives. Nvidia’s strategy remains focused on building GPUs with more cores, transistors, and memory.

According to the Ant Group paper, training on one trillion tokens – the basic units of data AI models use to learn – cost about 6.35 million yuan (roughly $880,000) using conventional high-performance hardware. The company’s optimised training method reduced that cost to around 5.1 million yuan by using lower-specification chips.

Ant said it plans to apply the models produced in this way – Ling-Plus and Ling-Lite – to industrial AI use cases such as healthcare and finance. Earlier this year, the company acquired Haodf.com, a Chinese online medical platform, to further its ambition to deploy AI-based solutions in healthcare. It also operates other AI services, including a virtual assistant app called Zhixiaobao and a financial advisory platform known as Maxiaocai.
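The cost figures reported in the paper imply a saving of roughly a fifth per trillion training tokens; a quick check of the arithmetic:

```python
# Figures reported in the Ant Group paper (per one trillion training tokens).
baseline_yuan = 6_350_000   # conventional high-performance hardware
optimised_yuan = 5_100_000  # lower-specification chips

saving_yuan = baseline_yuan - optimised_yuan
saving_pct = 100 * saving_yuan / baseline_yuan
print(f"Saving: {saving_yuan:,} yuan (~{saving_pct:.1f}%)")
# → Saving: 1,250,000 yuan (~19.7%)
```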
“If you find one point of attack to beat the world’s best kung fu master, you can still say you beat them, which is why real-world application is important,” said Robin Yu, chief technology officer of Beijing-based AI firm Shengshang Tech.

Ant has made its models open source. Ling-Lite has 16.8 billion parameters – settings that help determine how a model functions – while Ling-Plus has 290 billion. For comparison, estimates suggest the closed-source GPT-4.5 has around 1.8 trillion parameters, according to MIT Technology Review.

Despite the progress, Ant’s paper noted that training the models remains challenging. Small adjustments to hardware or model structure during training sometimes resulted in unstable performance, including spikes in error rates.

