Artificial intelligence (AI) is incredibly versatile. Language-based AI, for example, can be used to share information in a targeted, personalised way and reach people who cannot read. But there’s a problem. AI can only work when it is 'fed' and trained with data. Suitable language data from African and Asian nations has so far been a scarce resource. The 'FAIR Forward – Artificial Intelligence for All' initiative aims to close this gap and reduce social inequality by providing fair and open access to language data. The project has already seen some initial successes in Rwanda. Millions of people there will soon be able to use a chatbot to receive coronavirus advice in their local language.
'Good morning Alexa, turn the light on and play Ed Sheeran', 'Hey Google, set a timer for ten minutes' – language-based artificial intelligence (AI) is no longer science fiction. But we have by no means taken full advantage of its potential. Digital assistants that react to voice commands can also be used in sustainable development. We have the technology already, but for a machine to react and talk like a human, it has to be trained in the appropriate language. And it is exactly this kind of important language data that has so far been lacking for many languages. Currently, the data is predominantly gathered and used by big companies like Google and Amazon. Local languages in Africa and Asia are less relevant to these companies and so are often neglected.
As part of the German Government’s AI strategy, the FAIR Forward – Artificial Intelligence for All project is working to ensure that the necessary training data is compiled and opened up for others to use. Doing so will enable the information to be utilised for sustainable purposes and improve the lives of many people. The project is focusing on the languages of African and Asian countries in particular, which have so far been under-represented in AI development. Commissioned by the German Development Ministry (BMZ) and implemented by the Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH, the global project is currently active in Ghana, Rwanda, Uganda, South Africa and India. It is working with local companies to gather data that can then be used to promote the development of digital applications and products such as voice assistants.
Due to the lack of local data, AI-based voice assistants have so far mostly been limited to languages such as English, French and German. But how and where can we collect other languages? FAIR Forward is supporting the Mozilla Foundation and its Common Voice platform, which enables people to 'donate' voice recordings in their native language. On the platform, anyone can record and listen to sentences and check the pronunciation. So far, it has collected datasets in 60 languages and many thousands of hours of voice recordings in African languages. This success is largely due to the close cooperation between GIZ, Mozilla and local partners, such as in Rwanda. 'Kinyarwanda, our official language, has become the fastest-growing dataset and second-largest open voice dataset in the world. With the help of volunteers, more than 2,000 hours of content has been recorded,' says Audace Niyonkuru, founder of the start-up Digital Umuganda.
The development of language-based AI is particularly valuable in Rwanda as almost 30 per cent of its citizens are illiterate. Voice assistants are therefore incredibly useful for them. The data gathered in the Kinyarwanda language, which is spoken by more than 12 million people, will soon be used to provide all the country’s citizens with information on topics such as health, without them having to be able to read. This includes information about the coronavirus. The first language-based chatbot is ready to launch. 'With the help of language-based AI, people can use their smartphones or other mobile devices to communicate with the bot in Kinyarwanda, ask questions about COVID-19 and get the information they need. It’s very easy to use and free to call,' explains Niyonkuru. In future, the chatbot will be able to provide information on other diseases too, such as malaria and HIV.
In November 2020, the GIZ FAIR Forward initiative also began working with Mozilla, Makerere University in Kampala and other partners to collect language data in Uganda. Programmes for nine more Indian languages are planned to launch in 2021. But gathering the language data is just the start. Openly available AI training data strengthens the entire local digital environment and promotes innovation. Providing open access to the data enables developers from other regions to access it too and adapt the information for their own, local purposes. The data gathered is incredibly versatile and can be used in areas as diverse as interactive citizen participation, apps that identify plant varieties and diseases, and chatbots that can answer questions on sustainable agriculture. This open form of data gathering can benefit digital transformation, particularly in the Global South, and also plays a role in democratising artificial intelligence.
Last update: February 2021