Tallinn University of Technology

To bring as many people as possible into one shared information space, Estonian Public Broadcasting (ERR) has been using automatic subtitles for a little over a year. This makes topical programs easier to follow for people with hearing loss, for example, and for viewers who want subtitles as textual support alongside the audio.


Automatic subtitles were first added to ETV’s live programs “Terevisioon”, “Aktuaalne kaamera”, “Ringvaade”, “Esimene stuudio”, “UV Faktor” and “Ukraina stuudio”.

The project was carried out jointly by ERR, the Ministry of Education and Research and Tallinn University of Technology, and is based on Kiirkirjutaja, a tool created at the university.

The need was there long ago

At Tallinn University of Technology, the Kiirkirjutaja project was coordinated by Tenured Associate Professor Tanel Alumäe, Head of the Laboratory of Language Technology at the Department of Software Science.

According to Alumäe, people with hearing loss had been raising the idea of subtitles for live TV broadcasts for years. Thanks to the rapid development of Estonian speech recognition over the past decade, researchers finally felt that the idea had matured and was entirely feasible. Surprisingly, the birth of Kiirkirjutaja was also given a boost by COVID-19, says Alumäe.

“Due to the pandemic, it became very important to pass on information to the entire population quickly, and extraordinary news programs and press conferences were held every day. To this end, the state found some extraordinary funds to support the project,” Alumäe notes. 

Last year, Urmas Oru, board member of ERR, said that the public broadcaster aims to unite as many Estonian people as possible in a single information and discussion space. He explained that ERR has increased the number of programs with subtitles for the hearing impaired year by year, but that subtitling had been possible only for pre-recorded programs and repeats.

“Automatic subtitles are a big leap forward in making sure that topical information reaches those with hearing loss as quickly as possible. The tense international situation we have now, as well as the last two hectic years, attest to this need even more,” Oru said.

Systematic work at the lab

Alumäe says that Kiirkirjutaja is largely based on the systematic work done at the language technology lab over the past 20 years or so. Its artificial intelligence builds, for example, on the fully automatic transcription technology that is available to anyone interested at tekstiks.ee.

In addition to speech recognition, Kiirkirjutaja contains many other technological components as well. 

  • Speech/non-speech detection (a model that tells whether someone is actually speaking or whether the sound is, for example, background noise).
  • Estonian language detection (decides whether the language currently spoken is Estonian or another language).
  • Speaker identification.
  • Automatic punctuation. 
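
How components like these might fit together can be illustrated with a minimal sketch. It is not Kiirkirjutaja's actual code: the class and function names, the order of the steps and the toy stand-ins for the trained models are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Subtitle:
    speaker: str
    text: str


@dataclass
class SubtitlePipeline:
    # Each field is a hypothetical callable standing in for a trained model.
    is_speech: Callable[[bytes], bool]        # speech/non-speech detection
    is_estonian: Callable[[bytes], bool]      # Estonian language detection
    identify_speaker: Callable[[bytes], str]  # speaker identification
    recognize: Callable[[bytes], str]         # speech recognition
    punctuate: Callable[[str], str]           # automatic punctuation

    def process(self, chunk: bytes) -> Optional[Subtitle]:
        # Skip background noise and foreign-language speech, so that
        # no Estonian subtitles are generated for them.
        if not self.is_speech(chunk) or not self.is_estonian(chunk):
            return None
        speaker = self.identify_speaker(chunk)
        words = self.recognize(chunk)
        return Subtitle(speaker, self.punctuate(words))

    def run(self, chunks: Iterable[bytes]) -> Iterator[Subtitle]:
        # Handle audio chunk by chunk, as a live system would.
        for chunk in chunks:
            subtitle = self.process(chunk)
            if subtitle is not None:
                yield subtitle


# Toy stand-ins: "audio" is just tagged text, so the flow can be followed.
toy_pipeline = SubtitlePipeline(
    is_speech=lambda c: c != b"noise",
    is_estonian=lambda c: c.startswith(b"et:"),
    identify_speaker=lambda c: "Speaker 1",
    recognize=lambda c: c.split(b":", 1)[1].decode("utf-8"),
    punctuate=lambda t: t[:1].upper() + t[1:] + ".",
)

toy_chunks = [b"noise", b"en:how do you feel", b"et:tere hommikust"]
toy_result = list(toy_pipeline.run(toy_chunks))
```

In this toy run, only the chunk tagged as Estonian speech produces a subtitle. The noise and the English question are dropped before recognition, which is the role the article describes for the first two components.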

In Alumäe’s words, TalTech researchers had already worked with most of these components before, but getting them to work reliably in real time was somewhat difficult.

The development of the Estonian language detector, for example, proved more complicated than initially thought. This component is needed to ensure that Estonian subtitles are not generated for foreign-language speech. According to Alumäe, it turned out that the detector generally works very well for so-called ordinary speech but is often wrong when the speech is accented, which is rather common in news programs, for example.

As a result, the English-language questions of an Estonian sports reporter were often classified as Estonian speech, and Estonian speech with a Russian accent as non-Estonian speech.

“Luckily, our laboratory had just completed the Estonian language accent corpus, which helped us make the respective models better,” Alumäe says, explaining how the issue was solved.

To save on costs, the use of a shadow speaker had to be avoided

Alumäe points out that subtitling systems based on speech recognition have actually been in use in Europe for a long time, but they usually rely on a so-called shadow speaker, i.e. a trained expert who repeats everything heard on air in their own words, rewording the sentences a little where necessary.

This makes the task of the speech recognition system easier, as it avoids background noise and very spontaneous speech, which is usually where speech recognition errors come from. 

In the Kiirkirjutaja project, however, the aim was to do without a shadow speaker in order to save on costs, even if this causes errors in the subtitles in certain situations. According to Alumäe, even the hearing-impaired people who had a say in the project pointed out that poorer-quality subtitles were better than no subtitles at all.

Another key difference from existing systems is that Kiirkirjutaja is based on free software only and is free of charge for everyone to use.

A very important service

Alumäe says that a qualitative study was recently conducted with hearing-impaired people, which showed that for this target group, the subtitles generated by Kiirkirjutaja are a crucial service that is used daily. 

People with hearing loss emphasised that subtitles allow them to watch TV with their family, for example, without having to turn the sound up so much that it disturbs others, which makes them feel more a part of society.

“Many people who do not have hearing loss also use the subtitles prepared by Kiirkirjutaja, for example, when there is household noise or other people are speaking in the background,” Alumäe says, pointing out another use of the subtitles.

At present, Kiirkirjutaja is used at ETV to create subtitles for most live programs in Estonian, as well as for the Riigikogu’s YouTube broadcasts. Kanal 2 is also working on integrating Kiirkirjutaja, and several companies that produce press conferences or other online broadcasts have shown interest in it.

When asked whether Kiirkirjutaja is finished or whether, like the city of Tallinn, it will never be complete, Alumäe replies that Kiirkirjutaja is still under active development.

“Right now, we are working on integrating a new speech recognition model, which will reduce errors by about one-third and will also drastically improve the quality of punctuation and thus the readability of subtitles,” he says.
