Bridging the linguistic divide: Inclusivity through AI translation in Uganda

Language barriers can be a significant challenge for individuals and communities, particularly where clear communication is essential. In a linguistically diverse country like Uganda, where over 40 local languages are spoken, they pose a daily challenge. This is especially true when accessing services in government facilities, when people are displaced by emergencies, or when carrying out projects, surveys, or business.

Imagine being in a hospital or government office and not being able to understand the signs and forms, and not knowing how to ask anyone for help. Language technology promises new ways of addressing communication barriers and accelerating access to information and services in the digital era for citizens and decision makers. However, language technology for African languages remains largely underdeveloped, and so the potential of these new technologies is barely harnessed in our contexts. African languages are generally termed ‘low-resourced languages’ in the AI space because few language datasets are available from which to develop machine learning models for translation and entity recognition. In 2020, with support from the Hewlett Foundation, Sunbird AI commenced the African language technology project. We created the largest ever multilingual parallel text dataset of Ugandan languages, with translations in Acholi, Ateso, Luganda, Lugbara and Runyankole.

Much of the work in developing new language technology is in collecting training data. Researchers have studied the harms that can come from indiscriminately using text from the internet to train models: it can result in systems that are biased, offensive or harmful. We set out to collect data from scratch in five local languages as well as English, across a set of topics of local importance (e.g. health, agriculture, society). We worked with the Makerere University AI Lab to collect 25,000 sentences translated across all languages, which are available as the Sunbird AI Language Translation (SALT) dataset.
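To illustrate why a parallel dataset of this kind is useful for training, here is a minimal sketch of how one sentence translated into all six languages can be expanded into translation examples in every direction. The record layout, the placeholder strings and the `translation_pairs` helper are assumptions made for illustration; they are not the actual SALT schema or Sunbird AI's training code.

```python
from itertools import permutations

# The six languages named in this article. The record layout below is a
# hypothetical one (language name -> sentence), not the real SALT format.
LANGUAGES = ["English", "Acholi", "Ateso", "Luganda", "Lugbara", "Runyankole"]


def translation_pairs(record):
    """Yield (source_lang, target_lang, source_text, target_text) for every
    ordered pair of languages present in one parallel record."""
    present = [lang for lang in LANGUAGES if lang in record]
    for src, tgt in permutations(present, 2):
        yield src, tgt, record[src], record[tgt]


# Placeholder text stands in for real sentences.
record = {lang: f"<sentence in {lang}>" for lang in LANGUAGES}
pairs = list(translation_pairs(record))
print(len(pairs))  # 30 directed pairs from one six-language record
```

Under these assumptions, each fully translated sentence yields 6 × 5 = 30 directed source-target pairs, so local languages can be paired with each other directly and not only with English.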

Real-time translation with Sunbird system

Text translation can help in many practical situations. For individuals, translation can help while travelling, coordinating business, finding out the news, learning a language, or interacting with authorities. We are especially interested, though, in how it can help organisations. For example, government departments can use translation to make their information accessible to more people, and development initiatives can analyse feedback from their target beneficiaries in the beneficiaries' preferred languages.

It can also be useful for understanding citizens’ opinions on policy and service delivery. During the COVID-19 pandemic, the public health emergency required new measures and policies to be put in place all over the world that changed the way we worked, studied and lived. These “standard operating procedures” (SOPs) had a great effect on the lives of Ugandans. We worked with the National COVID-19 Task Force to understand these effects, and in particular how the measures and policies aimed at controlling the spread of COVID-19 affected people in Uganda. How did people react when the SOPs were put in place, and what were the emerging issues from the public that needed to be prioritised by the government? Our team looked at insights from social media about the practical measures that the Ugandan government was putting in place. It became clear that much of the most useful information is in local languages, which have little or no NLP or translation support.

Text to Speech Version.
Speech technology is important because local languages are more often spoken than written. It’s common for somebody to be able to speak a language but not to read or write it. Text-to-speech (TTS) models are normally trained by having a voice actor in a studio make recordings of sentences. Sunbird AI has been able to train Luganda TTS in a new way, using crowdsourced data from Common Voice. The model was trained using the voices of hundreds of individuals. This is, to our knowledge, the only existing TTS model for any Ugandan language, and it is freely and openly available.

Our resources are available for you to use:
Live translation system:
Technical report with the details of collecting text data and training translation models.
SALT dataset for training and evaluating translation models.
Open source models for translation and speech generation.