Google partners with African universities to launch WAXAL, a speech dataset for African languages

WAXAL contains 1,250 hours of transcribed natural speech and more than 20 hours of high-quality studio recordings.

Google, in partnership with African research institutions, has launched WAXAL, a large-scale open speech dataset designed to enhance artificial intelligence tools for African languages.

The dataset includes speech data for 21 Sub-Saharan African languages, such as Hausa, Yoruba, Igbo, Luganda, Swahili, and Acholi. According to Google, WAXAL is designed to support over 100 million speakers who have largely been left out of voice-based technologies due to a lack of quality language data.

Voice assistants, transcription tools, and other speech-powered technologies are common in many parts of the world. However, the more than 2,000 languages spoken in Africa remain underrepresented in AI systems because speech datasets for them are limited. This digital divide prevents millions of people from using voice commands for education, healthcare, or business.

To address this gap, WAXAL was developed over three years with funding from Google. It contains 1,250 hours of transcribed natural speech and more than 20 hours of high-quality studio recordings that can be used to create realistic synthetic voices.

“The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset provides the critical foundation for students, researchers, and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people,” Aisha Walcott-Bryant, Head of Google Research Africa, says.

Community involvement was a key part of the initiative. African universities and organisations, including Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda, led the data collection process with support from Google researchers.

In contrast to many global datasets, the WAXAL data is owned by the partner institutions. This ensures that African researchers and students have the power to build their own apps and tools rather than relying on outside companies.

“For AI to have a real impact in Africa, it must speak our languages and understand our contexts. The WAXAL dataset gives our researchers the high-quality data they need to build speech technologies that reflect our unique communities,” Joyce Nakatumba-Nabende, Senior Lecturer at Makerere University, notes.

At the University of Ghana, over 7,000 volunteers contributed their voices to the project. Isaac Wiafe, an Associate Professor at the University of Ghana, says the effort is helping drive innovation in areas such as health, education, and agriculture.

Victoria Fakiya – Senior Writer

The WAXAL dataset is now publicly available, giving developers, researchers, and startups access to foundational speech data for building more inclusive AI tools across the continent.
