Cypriot Greek Joins the AI Voice Revolution

A New Era for Cypriot Greek Speech Recognition

Cypriot Greek speakers may soon be understood by the voice-activated systems that have long struggled to recognize the island’s dialect, thanks to a speech-to-text AI model developed by a small team. The model could change how technology handles one of Cyprus’ most distinctive linguistic features.

The project was led by Igor Akimov, an AI product manager at a foreign-interest company, who partnered with two interns, Hussein Khadra and Nikita Markov, students at the University of Nicosia and UCLan. Together, they set out to solve a long-overlooked problem: existing voice recognition systems cannot accurately interpret Cypriot Greek.

The team created a speech-to-text AI model designed specifically to understand and transcribe Cypriot Greek. The system converts spoken words into written text, which can then be used in AI voice agents, translation services, or automated phone support. This opens up new possibilities for people who rely on voice-based technologies but have been held back by the dialect's distinct pronunciation and vocabulary.

Broader Applications Across Industries

Beyond just improving voice recognition, the technology has potential applications in various sectors. In healthcare, it can automatically transcribe patients’ speech, especially that of older adults, and input it directly into medical systems without manual typing. This could save time and reduce errors in documentation.

In business, the system enables automated voice agents that can interact naturally with Cypriot customers, offering a more personalized experience. In education, it could help preserve the Cypriot dialect and culture by digitizing the island’s audio archives, ensuring that future generations can access and learn from these historical recordings.

The approach taken by the team could also be applied to other underrepresented languages and dialects. One of their main goals was to develop a methodology for working with languages that lack sufficient data, a strategy they believe could be replicated globally.

Challenges in Data Collection

Despite their success, the team faced significant challenges, particularly in gathering data. When they reached out to researchers for resources, they found little help. Responses ranged from reports that the data had been lost, to requests for high fees, to outright refusal.

They scoured dictionaries, texts, and audio samples, but could not find high-quality, accessible datasets that paired speech with transcribed and validated text. Even Meta, which has collected data for 1,600 languages, had zero hours of Cypriot speech available.

So, the team decided to gather all the available Cypriot audio from TV shows, radio stations, podcasts, and books. Step by step, they created the largest Cypriot Greek speech collection ever assembled.

Training the AI System

Training the AI was a gradual process. In the first phase, the system absorbed everyday Cypriot Greek speech, its sounds, rhythms, and unique traits, to get a sense of how the dialect naturally sounds.

Next, the team fed it clearer, professional speech from news broadcasts and radio shows, helping the AI refine its understanding and reduce errors. A language model built with KenLM, an open-source n-gram language modeling toolkit, was also added to act almost like a tutor, suggesting the most likely word sequences and boosting recognition accuracy.
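The role KenLM plays above can be illustrated with a toy example. The sketch below is not the team's code and does not use the KenLM library: it trains a tiny add-one-smoothed bigram model in pure Python as a stand-in and uses it to choose between two candidate transcripts. The English sentences, the acoustic scores, the `lm_weight` value, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count bigrams and their left-hand contexts in a small text corpus."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)

def lm_log_prob(sentence, lm):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total

def rescore(hypotheses, lm, lm_weight=0.5):
    """Pick the transcript with the best combined acoustic + language model score.

    hypotheses: list of (text, acoustic_log_prob) pairs from the recognizer.
    """
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * lm_log_prob(h[0], lm))
    return best[0]

# Hypothetical data: a tiny text corpus and two candidate transcripts.
corpus = ["the weather is nice today", "the weather is hot", "today is hot"]
lm = train_bigram_lm(corpus)
hypotheses = [
    ("the whether is nice today", -4.0),  # slightly better acoustic score
    ("the weather is nice today", -4.2),  # more plausible word sequence
]
print(rescore(hypotheses, lm))
```

Without the language model (`lm_weight=0.0`), the acoustically stronger but misspelled candidate wins. With it, the more plausible word sequence is preferred, which is the "tutor" effect the article describes.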

To keep the model improving, the team built a platform where native speakers can correct the AI’s transcripts. These corrections are fed back into training, making the system increasingly accurate and faithful to the Cypriot dialect over time.
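One way to picture this feedback loop is sketched below. The article does not say how the platform stores corrections or measures accuracy, so the record fields (`audio_id`, `model_text`, `corrected_text`), the function names, and the choice of word error rate as the metric are all assumptions. The sketch compares each AI transcript with the speaker's correction and keeps the corrected text as a new training pair.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word insertions, deletions, and substitutions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def word_error_rate(reference, hypothesis):
    """Edit distance divided by the number of words in the reference."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must not be empty")
    return word_edit_distance(reference, hypothesis) / ref_len

def build_training_pairs(reviews):
    """Turn reviewed transcripts into (audio_id, corrected_text) pairs."""
    return [(r["audio_id"], r["corrected_text"]) for r in reviews]

# Hypothetical review records, with English placeholders for readability.
reviews = [
    {"audio_id": "clip_001",
     "model_text": "good morning every one",
     "corrected_text": "good morning everyone"},
    {"audio_id": "clip_002",
     "model_text": "thank you",
     "corrected_text": "thank you"},
]

for review in reviews:
    wer = word_error_rate(review["corrected_text"], review["model_text"])
    print(review["audio_id"], round(wer, 2))

training_pairs = build_training_pairs(reviews)
```

Tracking an error rate like this over time would show whether each round of corrections is actually making the transcripts more accurate.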

Achieving More with Less

Remarkably, all of this was accomplished on a budget of just $150, thanks to creative approaches and accessible cloud technology.

Yet, the project is far from finished. “With only a few hours of high-quality transcribed audio, we couldn’t create the world’s best model yet – but it’s absolutely achievable,” Akimov explained. “Right now, it’s more of a technological proof-of-concept waiting for more data.”

So far, the team has collected about 300 hours of Cypriot speech and is seeking help from volunteers. If enough volunteers each spend just 15 minutes validating transcriptions on the project website, the team could have enough data to build a state-of-the-art model for Cypriot speech recognition, and potentially even a text-to-speech system that speaks in authentic Cypriot Greek.

Join the Effort

Interested individuals can visit voiceofcyprus.org to validate audio recordings. “This will help us – and Cyprus – tremendously. Even just 10-15 minutes makes a difference,” Akimov said. “We want every Cypriot to be able to speak in their own dialect and still be understood by technology.”

HAWX TECH