UCT Researchers Launch AI Model for All 11 South African Languages
Summary
A team at the University of Cape Town has created an AI language model covering all 11 of South Africa's official written languages, addressing a major gap for millions of speakers poorly served by mainstream AI tools. Researchers Anri Lombard and Jan Buys led the project and presented it at the Language Resources and Evaluation Conference in Mallorca, Spain. They developed MzansiText, a multilingual dataset, and MzansiLM, a language model trained from scratch. Most South African languages are considered low-resource, meaning limited data exists to train AI models; isiNdebele and Sepedi, for instance, often yield inconsistent results from popular AI assistants. The new model aims to change that by offering a more inclusive alternative. At 125 million parameters, MzansiLM is far smaller than commercial AI systems, but it marks a significant step forward. The bottom line? This development not only improves accessibility for speakers of underrepresented languages but also makes the AI landscape more equitable and diverse.
This is an AI-generated audio summary. Always check the original source for complete reporting.