ASR for the Somali Language

Overview
An end-to-end automatic speech recognition (ASR) system that transcribes spoken Somali into text. The fine-tuned model reaches a word error rate (WER) below 30%.
The Somali ASR System is a team-driven initiative to build a reliable automatic speech recognition model for Somali, a language with limited digital speech resources. Rather than relying on existing datasets, as many ASR projects do, we started from scratch and collected and annotated hours of Somali speech data ourselves.
After preprocessing the data for training, we fine-tuned Meta’s Wav2Vec2 model using Hugging Face Transformers, PyTorch, and TensorFlow. Through multiple training iterations and careful tuning, we brought the WER below 30%, a significant milestone for a low-resource language.
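For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that calculation in plain Python. It is not the project's evaluation script, the function name `word_error_rate` is hypothetical, and the Somali strings in the usage note are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings: one misspelled word out of three.
    print(word_error_rate("subax wanaagsan saaxiib", "subax wanagsan saaxiib"))
```

In this example one of three reference words is wrong, so the WER is 1/3. A score below 30% therefore means fewer than 3 word errors per 10 reference words. Because insertions count as errors, WER can exceed 100% on very poor output.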
The model is now publicly available on Hugging Face to support future research and development in Somali NLP and speech tech.
🛠️ Technologies Used
- Model: Wav2Vec2 (fine-tuned)
- Frameworks: PyTorch, TensorFlow, Hugging Face Transformers
- Data Collection: Custom Somali speech dataset (recorded and annotated in-house)
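Fine-tuning Wav2Vec2 with a CTC head usually requires turning the annotated transcripts into a character-level vocabulary. The sketch below shows one way that step might look. It is an assumption, not the project's actual preprocessing: the function names are hypothetical, the normalization rules (lowercasing, keeping only `a`-`z`, the apostrophe, and spaces) are a guess based on Somali's Latin orthography, and the `[UNK]` and `[PAD]` token names are illustrative choices.

```python
import re


def normalize_transcript(text: str) -> str:
    """Lowercase a transcript and keep only letters, apostrophes, and spaces."""
    text = text.lower().replace("\u2019", "'")  # unify curly apostrophes
    text = re.sub(r"[^a-z' ]+", " ", text)      # assumed character set
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def build_vocab(transcripts: list[str]) -> dict[str, int]:
    """Map every character seen in the normalized transcripts to an integer id."""
    chars = {ch for t in transcripts for ch in normalize_transcript(t)}
    chars.discard(" ")  # spaces are represented by the "|" word delimiter

    vocab = {"|": 0}
    for ch in sorted(chars):
        vocab[ch] = len(vocab)
    vocab["[UNK]"] = len(vocab)  # unknown-character token (illustrative name)
    vocab["[PAD]"] = len(vocab)  # padding token, often reused as the CTC blank
    return vocab


if __name__ == "__main__":
    print(normalize_transcript("Subax  Wanaagsan!"))
    print(build_vocab(["Subax wanaagsan", "Mahadsanid"]))
```

The `|` word delimiter matches the default `word_delimiter_token` of the `Wav2Vec2CTCTokenizer` class in Hugging Face Transformers. A vocabulary like this would typically be saved as JSON and passed to that tokenizer together with the chosen unknown and padding token names.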
🚀 Key Highlights
- Built a Somali ASR system as a team from the ground up
- Collected and labeled a high-quality Somali speech dataset
- Fine-tuned Wav2Vec2 to achieve <30% Word Error Rate
- Published the model on Hugging Face for open use