Classification of environmental noise at the DataFest Africa 2022 Hackathon

We’re still buzzing about our first-ever chance to participate in the DataFest Africa 2022 Hackathon.

Under this year’s theme, “Big Data, Little Data and everything in between”, we were glad to provide noise datasets for participants to train a model that classifies noise audio clips into various categories. We were even more excited to partner with Pollicy and Zindi Africa to bring this challenge to life. But before we go further, you’re probably wondering: why noise?

If you walk up to your local trading center, you’re likely to be met with a loud fusion of different sounds: vendors trying to sell groceries to passersby, a truck reversing, a preacher with a loudspeaker, a herbal specialist going on about the advantages of their treatments. The closer you are to the city, the louder the area. Unlike air or water pollution, there’s no immediate, tangible evidence of the harm that exposure to high noise levels does to our psyche, nervous system, productivity levels and general way of life. Pollicy’s Director of Programs, Gilbert Beyamba, shares, “Noise pollution, like all other forms of pollution, is an emerging environmental health hazard in African cities, with potentially complex spatial and temporal patterns. This is further challenging due to the limited local data which is a barrier to the formulation and evaluation of patterns to reduce noise pollution.” This leaves concerned stakeholders such as city planners, engineers and policy makers with no point of reference.

Here at Sunbird AI, we believe we can fill this gap in Uganda by developing AI-powered tools to remotely collect, test and monitor noise, empowering enforcement institutions such as the Kampala Capital City Authority (KCCA) and the National Environment Management Authority (NEMA) with real-time, accurate and contextualized data to regulate noise levels. We piloted this project in Entebbe city, installing 7 sensors across different locations.

Our team of engineers built autonomous solar-powered noise sensors that pick up different types of noise. The noise is then assessed to determine whether its level exceeds the statutory thresholds, measured in decibels (dB). We then share access to a public dashboard that continuously shows noise levels across Entebbe, and we train community champions and technical officers to read it and develop actionable plans.

Some key terms to note:

1. dB: A decibel is a logarithmic unit used to measure sound pressure levels.

2. Exceedances: The total number of times noise levels go beyond a given threshold in a given area.
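
To make the two terms concrete, here is a minimal Python sketch. It assumes readings arrive either as RMS sound pressure in pascals or already in dB, and that the threshold is supplied by the caller. The 20 µPa reference is the standard reference pressure for sound in air; the function names are purely illustrative and are not taken from our sensor code.

```python
import math

# Standard reference pressure for sound pressure level in air (20 micropascals).
REFERENCE_PRESSURE_PA = 20e-6


def sound_pressure_level_db(pressure_pa: float) -> float:
    """Convert an RMS sound pressure in pascals to a level in dB."""
    return 20 * math.log10(pressure_pa / REFERENCE_PRESSURE_PA)


def count_exceedances(levels_db, threshold_db: float) -> int:
    """Count how many readings go beyond the given threshold."""
    return sum(1 for level in levels_db if level > threshold_db)
```

Because the scale is logarithmic, every tenfold increase in sound pressure adds 20 dB to the level.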


About the Hackathon.
The hackathon, which ran from July 7th to 26th, was open exclusively to participants from African countries. It lasted six weeks and drew over 100 participants, who worked on ways to classify noise categories using machine learning. The first, second and third place solutions won $500, $300 and $200 respectively. Sunbird AI’s goal is to integrate parts of the winning solution into our autonomous sensors to enable classification of noise on the edge.

It was interesting to see participants come to the realization that noise, especially in capital cities, not just in Uganda but across Africa, has the potential to cause serious harm. It was also good to see them work with a fairly new type of data, with the winning solution achieving 88% accuracy on this machine learning task.

“It is indeed a problem that needs to be addressed. Deeper and further analysis and processing should be done on the data in order to provide more insights,” remarks Dan Ofula, a Machine Learning Engineer and Data Scientist who emerged the winner.

The second runner-up, Youssef FADHLOUN, an Embedded Telecommunications Engineer and Data Scientist, went on to suggest ways the new noise datasets can be used: “Use the data to create a database of sound levels and locations for future reference. This can be used to inform policy makers so that they can come up with measures to reduce noise pollution.”

Zindi’s competition lead, Amy Bray, who was in charge of coordinating and monitoring the noise challenge on the Zindi platform, shared, “You don’t think about noise pollution in Africa. Africa has noisy cities. That’s it. However this data and this challenge has shown me that it doesn’t have to be like this. Why can’t Africa have the same systems other countries have, even better, why can’t Africa have solutions created by Africans for Africa that are tailored to her needs.”

Lucky for you, this data will stay on the Zindi platform, where other data scientists interested in this problem can look at it. The data isn’t limited to noise classification and can be used to explore sound data more broadly.

We’re hoping that the solutions shared will spur the conversation on how, as Africans, we can continue leveraging better data technology to mitigate some of our most pressing challenges, especially those that are detrimental to our wellbeing over the long term.

Sunbird AI and ioTec Limited to provide inclusive financial services

We’re excited to officially embark on our partnership with ioTec Limited, a financial technology company and software consultancy firm building secure enterprise software solutions to unlock economic potential by bringing services closer to individuals and businesses through intuitive, simple, secure, and instant digital channels.

ioTec Limited aims to build a customer-centric and interoperable platform that democratizes access to services and strengthens service linkages between financial service providers, manufacturers and SMEs while subsidizing the high platform maintenance costs, thereby accelerating business growth and improving quality of life and the economy at large.

Mr. Kenneth Kwesiga, ioTec’s CEO, stated: “We are particularly excited about this partnership because it is strategically aligned with our objective of advancing the financial services sector through increasing access and reach of services equitably. We strongly believe that by collaborating with Sunbird AI to apply evidence-based artificial intelligence models to existing financial services data, we are able to unveil uncharted addressable markets – particularly those regarded as un-credit worthy by conventional methods that require collateral.”

Dr. Ernest Mwebaze, Sunbird AI’s Executive Director, reflects on this partnership: “We’re definitely looking forward to synergising with ioTec limited. The commendable efforts they are making in extending financial services aligns with the work we do in building practical systems to ensure there is a clear benefit to society. At Sunbird AI, we are intent on co-creating ethically sound and inclusive solutions with our partners and this collaboration with ioTec comes right on time.”

Find out more about the work ioTec is doing here.

Wondering how you can collaborate with Sunbird AI? Check out opportunities for collaboration here.

Inside Sunbird AI’s official launch

Sunbird AI launch collage

Sunbird AI aims at providing high quality, practical, inclusive and afro-centric Artificial Intelligence systems. We officially opened our doors in August 2019 with a focus on developing open-source applications centered on improving citizen well-being by providing and using better data technology to inform and guide policy decisions.

With the onset of the pandemic, it was extremely difficult to bring together key stakeholders in the AI space and introduce Sunbird AI. With the continued easing of restrictions, we finally got a chance to share in detail what we are all about, what we stand for and our ongoing projects, and to engage with key figures in the public and private sectors, in development and in government.

Our Executive Director, Dr. Ernest Mwebaze, kicked us off with a plenary address, highlighting the three ongoing projects in Environmental Sensing, African Language Technology and Green Mini-Grid site identification. Sunbird hopes to lead the efforts in AI for social good; in his words, “We really want to see whether we can build a portfolio of cases that exemplify the use of AI for social good.” Have a look at his speech here.

Sunbird AI tech team

The second segment of our evening was an enlightening discussion on AI and evidence-informed policy. Our moderator, Martin Mubangizi of Pulse Lab Uganda, guided the conversation on how different leaders in the AI space, such as Peter Kahiigi of Cente-Tech, Susan Serumaga of UNICEF and Dr. Joyce Nabende of Makerere AI lab, are leveraging Artificial Intelligence to create evidence-based enabling systems.

Susan Serumaga emphasized how language technology can be effective on service delivery feedback platforms such as U-report, particularly for translating people’s queries and concerns into the commonly spoken local languages. Peter Kahiigi highlighted the importance of data in providing financial services and innovative business solutions to the rural poor. Joyce Nabende shared how AI can be used to help smallholder farmers easily find solutions to some of their most common challenges.

Follow this link to view this important panel discussion.

Sunbird AI launch panel

Our guest of honor, Dr. Aminah Zawedde, Permanent Secretary of the Ministry of ICT and National Guidance, officiated the launch. She remarked on the usefulness of Sunbird AI’s African language technology and its contribution to equity in communication, particularly during public service delivery. Take a look at Dr. Zawedde’s closing remarks.

Make sure to also have a go at our translation platform here and send us your feedback!

Sunbird AI launch tweets

Parallel Text Language Corpus Dataset for Key Ugandan Languages

One of the ways that artificial intelligence shapes society is through language technology. Neural networks that can process language are the basis for being able to search the web, translate between languages, provide recommendations and carry out large scale analysis of text.

Machine translation has for a long time been the de facto NLP task. Unfortunately, many African languages have not benefited from the advances in NLP because of limited language resources. Uganda is home to 43 languages and dialects, most of them more spoken than written.

To contribute to language resources in Africa, we set out to collect parallel text sentences for the top five languages in Uganda, enough to enable a start on machine translation for Ugandan languages. Our approach was to build on existing efforts in this regard and make this the principal dataset for Ugandan language resources.

Uganda has 43 local/native languages used by large sections of the population. The map shows the spatial distribution of the languages and dialects spoken in Uganda. Within a region the languages and dialects are relatively similar, and people are able to understand and speak across them, but the differences become more pronounced between regions.


Africa is a very linguistically diverse continent, with at least 1500 languages (compared to around 200 in Europe), for most of which no AI language technology has ever been developed. Almost all current effort on AI language technology is focused on English and a handful of other languages.

Another issue with the way this technology has evolved is that existing large models are generally trained with text trawled from the Internet, and have shown a tendency to reflect the harmful biases and divisiveness common in online speech.

Starting with our work in 2020 on social media monitoring, we’re interested in showing what’s possible with language AI technology in African languages, and how it can be done responsibly and inclusively.


This project started in October 2020 and was completed in January 2021. It was structured as a collaboration between Sunbird AI and the Makerere AI lab. Among the lab’s strengths is a solid track record of doing applied AI research and interfacing with the eventual clients of that research, particularly government agencies in Uganda and direct beneficiaries like smallholder farmers. The lab is also located at Makerere University, the leading university in Uganda, and has access to a pool of good graduate and undergraduate research students.


We collected a multilingual parallel text corpus of 60,000 phrases/sentences, comprising 10,000 English sentences/phrases and their corresponding translations in five under-resourced languages in Uganda: Luganda, Lugbara, Runyakitara, Acholi and Ateso.

To capture the variety and context of language use, the English sentences were obtained from the data sources listed below. The most likely use of this corpus will be for applications in these same source domains or similar ones.

  • Social media (Facebook and Twitter)
  • English Transcripts from radio data
  • Online newspapers, articles, blogs and websites, e.g., Uganda Legal Information Institute (ULII).
  • Text contributions from the Makerere University NLP community.
  • Farmer responses from surveys.

To mitigate privacy and copyright concerns, we used some of the sources only as inspiration for writing similar but different phrases, taking account of any privacy and bias concerns. For social media, for example, we removed identifying tags and explicit references to sensitive attributes like religion and politics. The dataset can be downloaded here.

Interview on ‘The Groove with Crystal: Podcast’

On 17th May 2021, Ernest Mwebaze, one of our directors and founders, was hosted by one of Uganda’s radio legends, Crystal Newman. In this podcast they discussed how we are using Artificial Intelligence to curb noise pollution in Kampala.

You can listen to it here: The Groove Podcast – Sunbird

Sunbird AI Sets Out to Help Curb Noise Pollution in Kampala

The widespread exposure of Ugandans to noise pollution continues unabated because of failures in the country’s monitoring and control framework.

Yet there is evidence of the effects of noise pollution on human health and on the general environment. While KCCA and NEMA have made efforts to control noise pollution within the capital city, and courts have weighed in on the issue, there is a need to approach the problem in more innovative and effective ways. These should empower citizens to be at the forefront of noise monitoring and control using simple technology tools.

Further, it is Sunbird AI’s belief that agencies mandated to control and monitor noise pollution should be empowered with technological tools that enable response and enforcement in more efficient ways.

Against this background, Sunbird AI, an artificial intelligence company, has developed tools that use artificial intelligence to support the monitoring and enforcement of noise pollution controls in Uganda. Sunbird AI envisages that these tools will empower the public to be vigilant actors in detecting and reporting noise pollution.

Ten noise collection agents have been dispatched to 66 parishes within the 5 divisions of Kampala city. An additional 100 will be dispatched at the end of the month of May 2021. These agents will enable assessments on general levels of exposure and provide the requisite data regulatory agencies need for decision-making on generating best practices.

Ernest Mwebaze, Sunbird AI’s Director, shared that, ‘identifying areas of high noise pressure is a key element for an effective environmental management and for mitigating impacts, identifying noise hotspots and areas of potential conflicts helps gather baseline knowledge on noise-producing human activities and mapping these areas.’

‘Currently, our focus is limited to more urban and industrialized towns because they have more population and so more human activities going on that are highly considered to be responsible for the high noise production. For example, in Kampala and Wakiso, owing to the level of industrialization, the population, and traffic dynamics (road, air, and railway), the noise pollution continues to increase, unabated,’ Ernest added.

Sunbird AI’s Lydia Sanyu training the noise collection agents on how to use the artificial intelligence to capture noise levels

Equipping citizens to detect, report, and control noise pollution would go a long way in empowering Ugandans to take part in decision-making on a critical issue that affects their lives and health. For the general public to be involved in the regulation of noise pollution, they need technology that helps them sense and measure their personal exposure to noise in their everyday environments. With Sunbird AI’s artificial intelligence technology, this is now possible.

Although KCCA and NEMA monitor and control noise pollution, there is a paucity of documented data and trends, and it is reported that there are no established systems to manage and track noise pollution data. This poses challenges and risks for noise pollution monitoring and for designing mitigation mechanisms. Sunbird AI is keen on supporting KCCA and NEMA with artificial intelligence to manage and track noise pollution data.

Sunbird AI 2020 Annual Report

Good news: the 2020 Sunbird AI annual report is out.

Our annual report contains the work we did as an organization last year. Considering that 2020 was dominated by a global pandemic with lockdowns, curfews and stay-at-home orders among other things, we started out as a completely remote team. In the absence of the ability to move about physically to coordinate our initially planned projects, we turned to the work we could do at that moment: aiding in the analysis of COVID-related data, on social media and on radio.

We also began research and implementation of AI language technology for five Ugandan languages, starting with a dataset of translations from English to these languages.

You can read more about these projects and about our organisation by downloading the report here: Sunbird 2020 Annual Report.

Happy reading!

COVID-19 Analysis (March 2021)

March 2021 has been a special month: it marks a year since the world began to see the effects of a rapidly spreading global pandemic. From freezing air travel to closing schools to lockdowns to curfews, the pandemic began to change the way we lived our lives. Worse still were the hospitalizations and deaths of so many people, the losses among our families and friends.

One year later, we are still living through these issues in some form. The rollout of vaccinations offers some hope, but the effects of the pandemic are far from over.

For most of the second half of 2020, we worked with the Ministry of Health (Uganda) to do social media analysis on public discussions about COVID-19. At this one-year mark, the Ministry of Health requested a follow-up analysis to find out what the Ugandan public generally thinks of the current state of events, including the recent drop in recorded COVID-19 cases in Uganda, the rollout of vaccinations, the continuing curfew, and any other COVID-related issues.

We did the analysis on social media data (Twitter and Facebook) covering a period of about 12 months, from March 2020 to the end of February 2021.

To carry out the analysis, we developed a two-part system:

  1. A pipeline to fetch tweets from the Twitter API and posts from Facebook (through the CrowdTangle API) and store them for analysis.
  2. A machine learning model (a BERT-based classifier we named SunBERT) that we trained using these tweets/posts to predict whether a tweet or post is COVID-related or not.
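
The way the two parts fit together can be sketched roughly as follows. This is only an illustration under stated assumptions: posts are assumed to be already fetched and stored as (timestamp, text) pairs, and a hypothetical keyword matcher stands in for the trained SunBERT classifier, whose code is not shown here. Any callable that takes a text and returns True or False can be plugged in instead.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical keyword list; a simple stand-in for a trained classifier.
COVID_KEYWORDS = ("covid", "corona", "lockdown", "curfew", "vaccine")


def keyword_classifier(text: str) -> bool:
    """Placeholder classifier: True if the text mentions any keyword."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in COVID_KEYWORDS)


def monthly_covid_counts(
    posts: Iterable[Tuple[datetime, str]],
    classifier: Callable[[str], bool] = keyword_classifier,
) -> Dict[str, Tuple[int, int]]:
    """Return {"YYYY-MM": (covid_related_count, total_count)} per month."""
    covid = Counter()
    total = Counter()
    for posted_at, text in posts:
        month = posted_at.strftime("%Y-%m")
        total[month] += 1
        if classifier(text):
            covid[month] += 1
    return {month: (covid[month], total[month]) for month in sorted(total)}
```

Keeping the classifier as a parameter means the counting logic stays the same whether the labels come from keywords or from a trained model.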

Using Twitter’s new Academic Research API, we collected over 1.9 million Ugandan tweets in the period between March 2020 and February 2021. Using the SunBERT classification model we developed, we found that approximately 50,000 of the 1.9 million tweets were related to COVID-19, and most of those were posted during March and April 2020. Below is the monthly distribution of COVID-related tweets from our analysis:

Let’s look at an analysis of COVID-related tweets in Uganda over this time period:

It is evident that the discussion of COVID-related issues on Twitter was very high when the pandemic had just begun in March 2020 and has been steadily falling as the Ugandan public has become less and less interested in the pandemic discussion.

Comparing this trend alongside the number of new COVID cases in Uganda (according to Worldometers) reveals a surprising lack of correlation between the two. For example, there was a spike of cases in November but people had got tired of discussing COVID by then, as shown in the graph below:


Despite the relatively few tweets about COVID-19 in Uganda, there were still quite a number of interesting ones that revealed some underlying sentiments that Ugandans had about the pandemic. Let’s explore this in the following case study:


In February 2021, only around 0.6% of tweets in Uganda were related to COVID-19.

Of these, some messages suggested that the pandemic in Uganda was over, or was not significant, as the examples below show:

“Uganda Covid free 💪🏽”

“Now we register only 12 new cases of Covid 19? Small small 12?”

“What was the fear for. Covid was really hyped”


There were also some other themes of discussion that came up repeatedly. Below are a few examples:


“Why is there still a curfew in Uganda?”

“When will curfew be lifted? Asking on behalf of everyone.”

“Sincerely tweeting, why do we still have curfew in Uganda????”



“Are we sure vaccines are safe anyway?”

“Do we really need #COVID19 vaccine as UGANDA?”



“Testing for covid19 in uganda… it’s like a privilege!”

“I’ve tested for covid 8 times since covid came. never tested positive. tonight i sit here to think about my ka money 😫”



“Do our leaders know that covid19 isn’t just  a “period” …. this thing is gonna hit us for a long time. We are gonna have to eventually learn how to live with it ….just like we did with HIV n a bunch of other diseases n conditions”

“Why would we import covid vaccines when covid does not exist in the country 🤔”


Let’s also take a look at the popular tweets within this time period. These mostly discussed the topics above, some in a very grave tone and others in a comedic way. A look at a few of these tweets shows this:


“One day I will tell you guys how my mom nearly married me off during the lockdown and how I had to sit her down and give her the “I am not like that kind of girl” speech 😹”

“I have seen great businesses and enterprises close during this Covid19. I have seen renown rich people struggle to provide for their families because their income has been frustrated. If you still have a meal everyday and a roof over your head, count yourself blessed.”

“If there is just one thing I pray for every day is that covid ends and SOPs for taxis are removed. The fact that the common man has to pay twice as much as they used to in order to go to work is heart breaking. I really pray that taxi fares go back to normal soon 🥺.”

“Aaaahhh… but African governments bought tents and cars when rich countries were investing in vaccine research.”



The analysis above has shown the declining interest in discussing COVID-19 over the past months. One can only hope that this does not translate into the dismissal of Standard Operating Procedures (SOPs) and all other safety measures, for the sake of our health in the midst of this pandemic.

Radio Advert Analysis for Covid-19

Ever been seated there listening to the radio and then you hear an advert about how COVID-19 spreads and how to stay safe? At Sunbird AI, that’s music to our ears.

Our most recent project has been monitoring and analyzing Ministry of Health adverts on the spread of COVID-19 and related safety measures.

The aim of the analysis is to track whether COVID-19 adverts are played on radio stations and how frequently. This is important because broadcasting information about COVID-19 to the public is a priority, in order to keep us healthy and safe.

This project was implemented in a number of steps:


First, we had to get the radio data, and that meant listening to a whole lot of radio. More than 300 stations, to be exact. And yes, I’m joking: of course we did not manually listen to all that radio.

We went through a digital data collection process as described below:

  • Compiling a file of streaming URLs for the radio stations whose streams were easily accessible
  • Writing a Python script to connect to each of those streaming URLs and record the data for an hour at a time
  • Writing a cron job to run this script at the top of every hour, for most of the hours of the day
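
The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not our production script: it assumes each station's stream can be read as plain bytes over HTTP, that stations are passed in as a name-to-URL mapping, and that files get an `.mp3` extension. The function names, the file naming scheme and the crontab path are all hypothetical.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from pathlib import Path

# Example crontab entry to start a run at the top of every hour (path is hypothetical):
#   0 * * * * /usr/bin/python3 /path/to/record_radio.py


def record_stream(url, output_path, duration_seconds=3600, chunk_size=8192):
    """Copy raw bytes from a streaming URL into a file for a fixed duration."""
    deadline = time.monotonic() + duration_seconds
    bytes_written = 0
    with urllib.request.urlopen(url, timeout=30) as stream, open(output_path, "wb") as out:
        while time.monotonic() < deadline:
            chunk = stream.read(chunk_size)
            if not chunk:  # the station closed the stream early
                break
            out.write(chunk)
            bytes_written += len(chunk)
    return bytes_written


def recording_name(station, started_at):
    """Build a file name from the station and start hour (extension is assumed)."""
    return f"{station}_{started_at:%Y%m%d_%H}.mp3"


def record_all(stations, output_dir, duration_seconds=3600):
    """Record every station in parallel; `stations` maps a name to its stream URL."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    started_at = datetime.now()
    with ThreadPoolExecutor(max_workers=max(1, len(stations))) as pool:
        futures = {
            name: pool.submit(
                record_stream,
                url,
                output_dir / recording_name(name, started_at),
                duration_seconds,
            )
            for name, url in stations.items()
        }
    # Leaving the `with` block waits for all recordings to finish.
    results = {}
    for name, future in futures.items():
        try:
            results[name] = future.result()  # number of bytes saved
        except OSError as error:  # an unreachable stream should not stop the others
            results[name] = error
    return results
```

Recording the stations in parallel threads matters here: with an hour per recording, looping over hundreds of stations one after another would never finish before the next cron run.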


By now you could be wondering about the huge amount of storage space that all this data would take up with time. In order to save storage space, a data retention policy is required. A data retention policy within an organization is a set of guidelines that describes which data will be archived, how long it will be kept, what happens to the data at the end of the retention period, and other factors concerning the retention of the data. In our case here, we store the radio recordings on our server only for the current day. At the end of each day, the recordings are backed up to cloud storage and then deleted from the server.
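
A retention rule like this one can be sketched in a few lines of Python. The sketch below makes some assumptions: recordings sit in a single directory, a file's age is judged by its last-modified time, and a local backup directory stands in for the cloud storage upload, which would need a provider-specific client. The function name is hypothetical.

```python
import shutil
from datetime import date, datetime
from pathlib import Path


def apply_retention(recordings_dir, backup_dir, today=None):
    """Back up, then delete, every recording last modified before today."""
    today = today or date.today()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    archived = []
    for path in sorted(Path(recordings_dir).iterdir()):
        if not path.is_file():
            continue
        modified = datetime.fromtimestamp(path.stat().st_mtime).date()
        if modified < today:
            # Stand-in for the cloud upload: copy to the backup location first...
            shutil.copy2(path, backup_dir / path.name)
            # ...and delete from the server only once the copy has succeeded.
            path.unlink()
            archived.append(path.name)
    return archived
```

Copying before deleting means a failed backup raises an error and leaves the original recording in place.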


Before fingerprinting the recordings, we had to find a way to test the accuracy of that method. If the results of fingerprinting say that there are two instances of the advert in a certain recording, we had to know for sure how true that was.

This meant that we needed some labeled data, labeled by us human beings, in order to check the computer’s results. We picked samples from our huge pile of data and annotated them using Audacity, an amazing audio editor.

A glimpse into what the data annotation process looks like:

Here is a sample of annotated data for an hour of radio:

As the image shows, there are a lot of different things that go on in just an hour of radio, but what we are looking out for are the Ministry of Health COVID-19 adverts that run for just about a minute. Now that we know that the advert features in this particular hour, we can run the fingerprinting and see if it comes up with the same result.
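
One way to run that comparison is sketched below. It assumes the annotations were exported from an Audacity label track, which is a text file with one `start<TAB>end<TAB>label` line per label, with times in seconds. It also assumes the fingerprinting step yields one timestamp per detected advert. The label text `moh_advert`, the tolerance and the function names are hypothetical.

```python
def parse_audacity_labels(text, wanted_label):
    """Return (start, end) pairs in seconds for labels matching wanted_label."""
    intervals = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        try:
            start, end = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # skip lines that do not begin with two numbers
        if parts[2].strip().lower() == wanted_label.lower():
            intervals.append((start, end))
    return intervals


def score_detections(annotated, detected_times, tolerance=5.0):
    """Match detections to annotated intervals.

    Returns (true_positives, false_positives, false_negatives). Each annotated
    interval can be matched once, so a repeated detection of the same advert
    counts as a false positive.
    """
    matched = set()
    false_positives = 0
    for detected_at in detected_times:
        hit = next(
            (
                index
                for index, (start, end) in enumerate(annotated)
                if index not in matched
                and start - tolerance <= detected_at <= end + tolerance
            ),
            None,
        )
        if hit is None:
            false_positives += 1
        else:
            matched.add(hit)
    return len(matched), false_positives, len(annotated) - len(matched)
```

The three counts are enough to work out precision and recall for a sample of annotated hours.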


First, what does fingerprinting even mean?

An audio fingerprint is a condensed digital summary of an audio signal, generated by extracting the acoustically relevant characteristics of a piece of audio content.

The short version of this is that it finds a way of identifying a piece of audio.

For our project, we ran a fingerprinting script using a Python tool called dejavu, with the aim of identifying the instances of the COVID-19 adverts played within the radio recordings.


After this process, we are able to choose any radio recording and find out the number of times the COVID-19 adverts are played. This can be extended according to what is required at a given time, for example checking how frequently the adverts play over an entire day on a particular radio station, or checking how they are spread throughout the day.
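
Those two roll-ups can be sketched as below, assuming the detections have already been collected into a list of (station name, timestamp) pairs. That layout and the function names are assumptions made for illustration, not the output format of the fingerprinting tool.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, List, Tuple

Detection = Tuple[str, datetime]  # (station name, time the advert was detected)


def plays_per_station_day(detections: Iterable[Detection]) -> Counter:
    """Count advert plays for every (station, date) pair."""
    counts = Counter()
    for station, played_at in detections:
        counts[(station, played_at.date())] += 1
    return counts


def plays_by_hour(detections: Iterable[Detection], station: str, day: date) -> List[int]:
    """Return 24 counts, one per hour, for one station on one day."""
    hours = [0] * 24
    for name, played_at in detections:
        if name == station and played_at.date() == day:
            hours[played_at.hour] += 1
    return hours
```

The hourly list makes it easy to see whether the adverts cluster at certain times or are spread evenly through the day.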

In that way, we achieve the goal of tracking the broadcasting of crucial COVID-19 information to the public, to make sure that safety information becomes common knowledge.