
IIT-Madras’ AI4Bharat Unveils IndicVoices, Offers Access To 7,300 Hours Of Speech Datasets

Inc42 · 1y ago

AI4Bharat, a research lab at IIT-Madras, has released an open-source speech dataset called IndicVoices, funded by the Bhashini initiative of the Ministry of Electronics and Information Technology (MeitY). The dataset includes 7,348 hours of audio from 16,237 speakers in 22 Indian languages. The project aims to collect 17,000 hours of voice data from over 400 districts across India to build the country's first automatic speech recognition (ASR) model that covers all 22 languages listed in the Eighth Schedule of the Indian Constitution. The initiative will enable the training of ASR models to transcribe Indian languages and facilitate governance delivery in multiple languages.
