Interview

‘This research is highly relevant for Europe’

Many AI models reach their limits with underrepresented languages. Dr Simon Ostermann from the German Research Center for Artificial Intelligence (DFKI) is working on this issue. In this interview, the researcher discusses why open language data from the Global South is not only an act of fairness but also a genuine gain in knowledge for AI research.

Interview: Brigitte Spitz
Dr Simon Ostermann from the German Research Center for Artificial Intelligence in Saarbrücken.

Why are open-source language projects an important research topic for the DFKI?

Open-source language projects are particularly interesting for low-resource languages. They challenge existing assumptions in AI research, because many common models and methods were developed for a few dominant languages and are not easily transferable. Our work with low-resource languages opens up new research questions, for example on the robustness of language models, on low-data scenarios, and on language diversity and multilingualism. Such projects thus contribute directly to the further development of fundamental AI methods. Open-source language data from the Global South broadens the empirical basis significantly. It helps to reduce biases in models and to develop AI systems that can be used globally.

So German and European institutions benefit from this as well?

Research based on such language data is highly relevant for Europe in particular, because many of the languages spoken in the EU are still underrepresented. German and European research institutions benefit because it allows them to develop models that are more realistic, fair and scientifically robust. Furthermore, new possibilities for comparison are emerging across language families and cultural contexts.

FAIR Forward wants to strengthen open-source and trustworthy AI systems. How is this achieved?

FAIR Forward adheres to principles such as openness, transparency and sustainability in handling training data. Documentation, data quality and ethical responsibility receive strong emphasis. In my view, this creates new common standards that facilitate international collaboration by clearly defining expectations and working methods across countries and continents. At the same time, it lowers barriers to long-term cooperation between research, civil society and public actors.

Where do you see potential for closer cooperation between German research institutions and partners in countries like India, where FAIR Forward cooperates with the Indian Institute of Science, for example?

I see good potential in jointly developing and maintaining open-source datasets and language models that have an international, multilingual and multicultural focus right from the outset. Establishing joint research infrastructures, for example for compute resources and data platforms, also offers opportunities for sustainable cooperation. In addition, bilateral research projects can help to bring different perspectives on AI systems together and to develop new methodological approaches jointly, although, naturally, this always depends on whether the right support mechanisms are available.
