Learning with India: AI language assistants for digital inclusion
When it comes to artificial intelligence (AI), German and Indian interests are aligned. A visit to the subcontinent highlights why it is essential for AI systems to be trained in multiple languages.
At first glance, these two locations appear worlds apart: the Indian Institute of Science (IISc) located in the global innovation hub of Bangalore and a basic health centre in the state of Uttar Pradesh. What connects the team led by IISc professor Prasanta Kumar Ghosh and the community nurses (auxiliary nurse midwives, ANMs) is the use of native-language AI chatbots that aim to improve care for pregnant women and newborn babies in remote areas.
Pilot project with 2,000 community nurses
In Uttar Pradesh, the non-profit organisation ARMMAN has launched a pilot project featuring a multilingual AI assistant for health care professionals. It provides real-time answers via text or voice message on topics such as high-risk pregnancies, explains Amrita Mahale from ARMMAN: ‘We have already involved 2,000 community nurses, who are taking care of more than 400,000 pregnant women.’
One of these women describes her symptoms to ANM Meena Yadav: dizziness and heart palpitations. Yadav takes her mobile phone and asks the ARMMAN chatbot a question in Hindi, in a regional dialect. Within seconds, she has a response: suspected anaemia, and recommendations on how to deal with it. Clear, comprehensible, and with a sound medical basis. For many community nurses, this is essential. They work far from clinics and doctors, and need to respond quickly.
AI needs the right language data in order to work for everyone
The team at the Indian Institute of Science fed the language AI tool with real voice data and created hundreds of hours of voice recordings in nine Indian languages: Bengali, Bhojpuri, Chhattisgarhi, Hindi, Kannada, Magahi, Maithili, Marathi, and Telugu. Over 900 million people speak these languages, yet diverse non-English datasets are often lacking.
To address this, the BMZ initiative ‘FAIR Forward – AI for All’ collaborated with India’s Ministry of Electronics and Information Technology (MeitY), scientific bodies and civil society.
Voice recordings were produced following consultations with native speakers. FAIR Forward also worked with partners to ensure that the data was made publicly accessible. ‘Open-source voice and language data help maximise last-mile digital empowerment and impact. By adapting and training them for specific use cases, we can build effective voice applications – making our partnerships more impactful,’ says Bhavika Nanawati, AI advisor at FAIR Forward.
‘FAIR Forward promotes freely available, open-source datasets and AI models to put into practice the commitments of the Hamburg Declaration on Responsible AI for the SDGs.’
The BMZ initiative ‘FAIR Forward – AI for All’ promotes, among other things, open language AI in international cooperation. Language data and AI models have been developed for many languages, enabling inclusive digital services and greater digital participation in local languages. They are used, for example, in agriculture, health care and education, and by public authorities. Alongside India, partner countries are Ghana, Indonesia, Kenya, Rwanda, South Africa, and Uganda.
GIZ implements FAIR Forward on behalf of the German Federal Ministry for Economic Cooperation and Development.
To ensure the datasets could be used in practice, FAIR Forward worked with governmental bodies such as BHASHINI, India’s national multilingual AI platform. The goal was to develop and provide digital public goods. ‘Without FAIR Forward, this project would not have been possible in many ways. They recognised the value of the language data and technology and supported us,’ says Professor Ghosh from IISc.
Indian-German AI networking
FAIR Forward also connected Indian partners with German research institutions, such as the German Research Center for Artificial Intelligence (DFKI), pooling expertise and computing power.
‘FAIR Forward’s work in India advances artificial intelligence for everyone and strengthens collaboration between Germany and India in the field of AI.’
Back at the health centre in Uttar Pradesh. The community nurse puts down her phone. With the information from the chatbot, she has a stronger basis for advising the patient. ‘This makes us more confident,’ she says.
AI developers can also learn from the ARMMAN pilot project. ‘We deliberately started with Hindi, a language we expected to perform the strongest. If something is amiss here, we know we need to fine-tune it in other languages,’ says Amrita Mahale. The Indian innovation hub ARTPARK is now organising a hackathon to develop an AI system for all new IISc languages.
What is being created here could have far-reaching effects. Open language data as a foundation for AI that reaches everyone – in their own language and daily life, and at the right time. This could also benefit the AI scene in Germany, as Simon Ostermann from DFKI points out in an interview: ‘Open language data from the Global South broadens the empirical base. It helps develop AI systems that can be used globally.’