ASR for the Somali Language

Overview

An end-to-end automatic speech recognition (ASR) system that transcribes spoken Somali into text. It achieves a word error rate (WER) below 30%.

The Somali ASR System is a team-driven initiative to create a reliable automatic speech recognition model for Somali—a language with limited digital speech resources. Unlike many ASR projects that rely on existing datasets, we started from scratch, collecting and annotating hours of Somali speech data ourselves.

After preprocessing the data and preparing it for training, we fine-tuned Meta’s Wav2Vec2 model using Hugging Face Transformers, PyTorch, and TensorFlow. Through multiple iterations and careful tuning, we achieved a WER of less than 30%, which is a significant milestone for a low-resource language.
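For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The sketch below is a minimal plain-Python illustration, not the project's evaluation code; the function name `word_error_rate` and the placeholder tokens in the usage note are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

With placeholder tokens, `word_error_rate("w1 w2 w3 w4", "w1 wX w3")` counts one substitution and one deletion against four reference words, giving 0.5. A WER below 30% therefore means fewer than 3 word-level errors per 10 reference words on average.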

The model is now publicly available on Hugging Face to support future research and development in Somali NLP and speech tech.
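As a rough sketch of how a published Wav2Vec2 checkpoint is typically used, the snippet below wraps the Hugging Face Transformers `pipeline` API. The model ID `your-org/somali-wav2vec2` is a placeholder, not the project's actual repository name, and the helper name `transcribe` is hypothetical; substitute the real model ID from the Hugging Face page.

```python
def transcribe(audio_path: str, model_id: str = "your-org/somali-wav2vec2") -> str:
    """Transcribe one audio file with a fine-tuned ASR checkpoint.

    model_id is a PLACEHOLDER (assumption); replace it with the real
    repository name of the published Somali model.
    """
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use and builds an ASR pipeline.
    asr = pipeline("automatic-speech-recognition", model=model_id)

    # The pipeline returns a dict whose "text" field holds the transcript.
    return asr(audio_path)["text"]
```

Calling `transcribe("clip.wav")` requires `transformers`, a backend such as PyTorch, and an audio decoder such as ffmpeg to be installed. Building the pipeline once and reusing it is preferable when transcribing many files.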

🛠️ Technologies Used

Model: Wav2Vec2 (fine-tuned)
Frameworks: PyTorch, TensorFlow, Hugging Face Transformers
Data Collection: Custom Somali speech dataset (recorded and annotated in-house)

🚀 Key Highlights

Built a Somali ASR system as a team from the ground up
Collected and labeled a high-quality Somali speech dataset
Fine-tuned Wav2Vec2 to achieve <30% Word Error Rate
Published the model on Hugging Face for open use

Tech Stack

Python
PyTorch
Wav2Vec2
