Laboratory of Language Technology

About the Laboratory of Language Technology

The laboratory focuses on the following topics:

Speech recognition
Speaker recognition
Language and accent identification
Speech corpora
Phonetics (Estonian prosody and sound system, L2 speech)
Various subtopics in natural language processing

We also work on making speech technology more accessible to the general public by creating end-user speech recognition applications and by packaging speech recognition software components in a more accessible form. Our main focus is on Estonian speech recognition, but most of the components are not specific to Estonian. We are firm supporters of open source.

The lab was formerly part of the Institute of Cybernetics. Our old web pages are available here.

News

ERR Novaator: New research offers tips for making arias more intelligible

Opera singers are criticized time and again because the text they sing remains unintelligible to listeners. Joint work by researchers from the Estonian Academy of Music and Theatre and Tallinn University of Technology now helps to understand the root causes of this and outlines strategies for making singing more intelligible.

Tanel Alumäe: Is artificial general intelligence near, and what does it mean for us

Tanel Alumäe gave a talk titled "Is artificial general intelligence near, and what does it mean for us?" at the ETAG science policy conference.

Events

The 36th Finnic Phonetics Symposium

25.04.2024 09:00 - 26.04.2024 17:00

TalTech Mektory, Raja 15, 12616 Tallinn

Speech Technology Workshop

21.02.2023 10:00 - 21.02.2023 17:00

Küberneetika maja (Akadeemia tee 21/1, Tallinn, room 101) / MS Teams

Kääriku NLP seminar

10.06.2024 10:00 - 10.06.2024 18:30

Kääriku spordibaas

People

Past members
Tiia Sildam
Andra Velve
Henry Härm
Yuta Yanagi
Andres Käver
Kunnar Kukk
Jörgen Valk
Asadullah
Ottokar Tilk
Kairit Sirts
Rena Nemoto
Rainer Metsvahi

Studies

Courses

ITS8040 - Natural Language and Speech Processing

Lecturer: Tanel Alumäe
Language: English
Level: Master
Course details
Slides from the 2025 Spring Semester

ITS8035 - Speech Processing by Humans and Computers (Kõnetöötlus inimeses ja arvutis)

Lecturer: Einar Meister
Language: Estonian
Level: Master
Course details

Master Theses

We offer supervision of Master theses on topics that are related to our research.

Here is a selection of already supervised theses:

Helena Grete Lillepalu, Tanel Alumäe (sup). Võrdlusanalüüs suurte keelemudelite jõudluse hindamiseks eesti keeles. 2025.
Erik Illaste, Tanel Alumäe (sup). Detection of Cognitive Decline from Spontaneous Speech: Comparing Model Performance and Interpretability. 2025.
Tiia Sildam, Andra Velve, Tanel Alumäe (sup). Kaskaad- ja otsemeetodi võrdlus eesti keele suulise kõne tõlkesüsteemide näitel. 2024.
Oleksandra Zamana, Tanel Alumäe (sup). Using Pretrained Language Models for Improved Speaker Identification. 2024.
Priit Käärd, Tanel Alumäe (sup). Weakly Supervised Speaker Identification System Implementation based on Estonian Public Figures. 2023.
Artem Filipenko, Tanel Alumäe (sup). Data Augmentation Techniques for Advanced End-to-End Keyphrase Extraction from Text. 2023.
Erko Peterson, Einar Meister (sup). Veebirakendus eesti keele häälduse treeninguks. 2023.
Ilja Samoilov, Tanel Alumäe (sup). Converting Automatic Transcriptions for Television Programs to Readable Subtitles. 2022. Nominated as one of the best MSc theses of the School of IT, received special prizes.
Andres Käver, Tanel Alumäe (sup). Efficient Population-based Data Augmentation in Speaker Verification. 2021.
Henry Härm, Tanel Alumäe (sup). Abstractive Summarization of News Broadcasts for Low Resource Languages. 2021.
Anu Käver, Tanel Alumäe (sup). Extractive Question Answering for Estonian Language. 2021.
Fred-Eric Kirsi, Tanel Alumäe (sup). End-to-end Phoneme Segmentation. 2020.
Jörgen Valk, Tanel Alumäe (sup). Using Web Scraping for Building Spoken Language Identification Models. 2020.
Hendrik Kivi, Tanel Alumäe (sup). Identification and Localization of Foreign Accent in Speech. 2020.
Aivo Olev, Tanel Alumäe (sup). Web Application for Authoring Speech Transcriptions. 2019.
Siim Kaspar Uustalu, Tanel Alumäe (sup). Automated Detection and Sentiment Analysis of Registered Entity Mentions in Estonian Language News Media. 2019. Nominated as one of the best MSc theses of the School of IT.
Siim Talts, Tanel Alumäe (sup). Analysing Election Candidate Exposure in Broadcast Media Using Weakly Supervised Training. 2019.
Leo Kristopher Piel, Tanel Alumäe (sup). Speech-based Identification of Children's Gender and Age with Neural Networks. 2018. Nominated as one of the best MSc theses of the School of IT.
Margus Baumann, Tanel Alumäe (sup). Identification of Foreign Language Accent from Speech Using Neural Networks. 2018.
Martin Talimets, Tanel Alumäe (sup). End-to-End Speech Recognition for Estonian. 2018.
Martin Väljaots, Einar Meister (sup). Computer Aided Pronunciation Training Tool for Estonian. 2018.
Roman Hrushchak, Einar Meister (sup). Visualization of Tongue and Lip Movements. 2018.
Evgeniia Rykova, Einar Meister (sup). Perceptual and acoustic similarities between the voices of family members: an approach to synthesize a voice based on family-shared F0 characteristics. 2018.
Thales Santos Ribeiro, Einar Meister (sup). Online Recording of Speech Corpora. 2018.
Lasha Amashukeli, Einar Meister (sup). Online Perception Experiments. 2018.
Martin Karu, Tanel Alumäe (sup). Weakly Supervised Training of Speaker Identification Models. 2017. Best MSc thesis of the School of IT.
Anton Malmi, Einar Meister (sup), Pärtel Lippus (sup). Intervokaalse /l/-i kvaliteet ja palatograafia. 2016.
Rainer Metsvahi, Einar Meister (sup). Kõnesüntees Markovi peitmudelitega. 2012.
Ervin Veber, Tanel Tammet (sup), Einar Meister (sup). HTML dokumentide teisendamine kõnesünteesi keelde. 2007.

PhD Studies

We are looking for talented and hardworking people to do their doctoral studies on topics that are related to our research.

All PhD students at our lab become members of our team. You will be hired as an Early Stage Researcher and will receive a salary from the university in addition to the doctoral scholarship. The full compensation depends on the person (better skills and better research output result in a better salary), but the minimum is 2000 EUR (after taxes). This is about 25% more than the average salary in Estonia. Living costs in Estonia are significantly lower than in most Western European countries.

We can admit new PhD students any time.

Several topics in the fields of speech recognition, speech translation, speaker recognition and text summarization are possible (the exact topic can be determined based on the student's interests and skills).

Requirements:

Interest in scientific research (and understanding about what research is)
Master's degree in computer science (or a related field)
Good background in mathematics, statistics, probability theory and linear algebra
Good background in some subfield of speech technology (e.g., speech recognition, speaker recognition)
Knowledge of modern approaches in machine learning (incl. deep learning)
Excellent programming skills (Python, C++, bash scripting)
Experience with modern deep learning toolkits (PyTorch, TensorFlow)
Excellent academic writing skills
Previous academic or industry experience in speech or language processing is beneficial (but not strictly needed)

Current PhD Students

Aivo Olev, supervisor Tanel Alumäe. Speech processing of non-native speech.
Joonas Kalda, supervisor Tanel Alumäe. New algorithms in speaker segmentation and identification.
Martin Verrev, supervisors Tanel Alumäe and Tanel Tammet. Knowledge extraction from natural language using both machine learning and common sense knowledge systems.

Graduated PhD Students

Anton Malmi, supervisors Pärtel Lippus (University of Tartu) and Einar Meister. Vene emakeelega keelejuhtide eesti keele palatalisatsiooni akustika, taju ja produktsioon. 2022.
Ottokar Tilk, supervisors Tanel Alumäe and Leo Võhandu. Neural Networks for Language Modeling and Related Tasks in Low-Resourced Domains and Languages. 2018.

Open positions

18.12.2023

Postdoc Position in Speech Processing

We are looking to fill a postdoc position at the Laboratory of Language Technology, led by Prof. Tanel Alumäe. The position is funded by...

Projects

Current Projects

EXAI: Estonian Centre of Excellence in Artificial Intelligence (2024−2030)

EXAI focuses on advancing innovative methodologies for:

leveraging foundation models in building efficient and trustworthy analysis and prediction systems;
implementing control mechanisms and guardrails to ensure that the advanced AI systems follow their specification;
adapting and enhancing AI systems for improved performance in targeted application contexts;
achieving end-to-end security and privacy assurance of AI systems.

EXAI is a joint project of the University of Tartu (lead), Tallinn University of Technology and Cybernetica.

Estonian and Multilingual Speech Translation (2023-2024)

The aim of the project is to develop solutions for speech translation, i.e. machine translation of speech signals, and to explore optimal approaches to this task. We will test different approaches, including end-to-end models (which translate speech input directly into text output) and pipeline approaches (separate speech recognition, machine translation and speech synthesis components). We will also add Estonian speech input and text output to existing speech translation benchmarks, and will create a working server solution that exposes the project results through an API and integrates with the MS Teams and Zoom communication programs. We will focus on Estonian <-> English/Russian translation directions; if possible, we will also add German, Finnish and Latvian. The results of the project will be shared as open data and open source software. Research results will be published in scientific conferences and journals.

Estonian Speech Recognition (2018-2022)

Speech recognition is a technology for converting natural speech to text. It is used for dictating documents and for automatic transcription of speech recordings. Estonian speech recognition has improved significantly in recent years. On broadcast speech data, a word error rate of 10% has been reached. These improvements were made possible by our work on collecting and transcribing new speech corpora and by recent advances in deep neural networks. The goal of this project is to further improve the state of Estonian speech recognition. We focus on the kinds of speech data that currently cause many recognition errors: noisy data, multi-speaker meetings, speech from seniors, and speech with a high degree of code-switching. To fulfill this goal, we will improve the currently used speech recognition methods and algorithms and transcribe new speech corpora. We will also improve the flexibility and usability of our open-source Estonian speech recognition systems.
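For readers unfamiliar with the metric, word error rate (WER) is conventionally computed as the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the lab's evaluation code; the function name and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (excluding the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of ten reference words gives a WER of 10%.
print(word_error_rate("a b c d e f g h i j", "a b c d e f g h i x"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.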

Improving the Intelligibility of Sung Text: the Problems and the Scientific Basis (2022-2026)

Joint project with the Estonian Academy of Music and Theatre

We expect vocalists to sing with intelligible text, but singers also have to obey the constraints dictated by the music. Thus, the methods used to enhance diction in speaking may not be fully applicable to singing. Singers' views on how to achieve clear pronunciation diverge, and investigations on the subject are scarce. This project aims to create a scientific basis for the further development of strategies to achieve a good balance between intelligibility and the requirements of the music, such as cantilena and phrasing, when singing in various acoustic environments and in the presence of accompaniment. The research method includes acoustic analysis of vocal performances and perception tests of vocal stimuli with systematically modified phonetic and musical parameters. The results are applicable to voice training and could help text writers and composers to reduce problems of text intelligibility.

CEES Centre of Excellence in Estonian Studies (2016-2023)

The Centre of Excellence in Estonian Studies (CEES) is supported by the European Union through the European Regional Development Fund in the years 2016–2023. The centre brings together 15 research groups with more than 60 researchers and more than 50 post-graduate students from different research institutions (including our lab).

The research of the Centre of Excellence in Estonian Studies is connected with focal phenomena of

Estonian society and culture, some of them carrying emblematic connotations: the Estonian language itself and its wide array of sublanguages and dialects, unique regilaul verse, song festivals and the choir movement, original poetic culture, sacred sites;
the Estonian diasporas and ethnic groups (primarily Estonian Russians, Old Believers, Finno-Ugric minorities, neighboring and contact groups);
global cultural trends and local variations of global cultural phenomena (epic(s), humour, mythology, etc.), reinvented and modernized forms (e.g. punk song festivals);
contemporary culture, incl. transmedia texts and behaviour.

Text Summarization for Estonian (2022-2024)

The goal of the project is to create a system for generating abstractive summaries for Estonian texts. Based on a (long) input text, the system will generate a short and concise text that will reproduce the most important information in the input text, without necessarily using the sentences and expressions present in the input text.

During the project, a large corpus of Estonian document-summary pairs will be annotated, and experiments with different end-to-end summarization models will be performed. Special attention will be given to transfer learning and multilingual models, which are unavoidable under low-resource constraints. A method for fine-tuning the general model to a specific domain will also be developed.

The implemented models, code and datasets will be distributed under open source licenses. A graphical user interface and a REST API will be implemented for the summarization system and it will also be distributed as a Docker image.

Past Projects

Subtitles for Estonian Live Broadcasts using ASR (2021)

The goal of this project is to develop technology that can produce real-time subtitles for live TV broadcasts (such as news, debates), press conferences and other live events, using speech recognition.

The subtitling system is available here. It is currently used by the Estonian Parliament (Riigikogu) for live captioning parliament sessions (available here, if a session is live), and by Estonian Public Broadcasting (ERR) for realtime captioning of live news programmes and talk shows. News story (in Estonian): https://diktor.geenius.ee/rubriik/tele/etv-otsesaadetele-saab-tanasest-valida-automaatsubtiitreid/

Rich Transcription System for the Estonian Parliament (2018-2020)

Within this project, we implemented a system for producing transcripts for the Estonian Parliament using speech recognition, automatic punctuation and speaker identification technology. The system outputs fully punctuated and "nicely" formatted text and identifies members of the parliament based on their voice. The system has been deployed to production. News story (in Estonian): https://www.err.ee/1134953/eesti-viimaste-stenografistide-too-vottis-ule-robot

Publications

2025

Joonas Kalda, Séverin Baroudi, Martin Lebourdais, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin. Design choices for PixIT-based speaker-attributed ASR: Team ToTaTo at the NOTSOFAR-1 challenge. Computer Speech & Language (2025)
Tanel Alumäe, Artem Fedorchenko. TalTech Systems for the Interspeech 2025 ML-SUPERB 2.0 Challenge. Interspeech 2025.
Erik Illaste, Tanel Alumäe. TalTech Systems for the PROCESS Signal Processing Grand Challenge. ICASSP 2025.
Tanel Alumäe, Allison Koenecke. Striving for open-source and equitable speech-to-speech translation. Nature (News & Views Forum).
Artem Fedorchenko, Tanel Alumäe. Optimizing Estonian TV Subtitles with Semi-supervised Learning and LLMs. NoDaLiDa-BalticHLT 2025.

2024

Aivo Olev, Tanel Alumäe. Open Source Platform for Estonian Speech Transcription. Language Resources & Evaluation.
Joonas Kalda, Tanel Alumäe, Séverin Baroudi, Martin Lebourdais, Hervé Bredin, Ricard Marxer. ToTaTo System Descriptions for the NOTSOFAR1 Challenge. CHiME 2024.
Joonas Kalda, Tanel Alumäe, Martin Lebourdais, Hervé Bredin, Séverin Baroudi, Ricard Marxer. TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024. Interspeech 2024.
Tiia Sildam, Andra Velve, Tanel Alumäe. Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation. LoResMT Workshop @ ACL 2024.
Oleksandra V Zamana, Priit Käärd, Tanel Alumäe. Using Pretrained Language Models for Improved Speaker Identification. Speaker Odyssey 2024.
Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin. PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings. arXiv:2403.02288, Speaker Odyssey 2024.
Henry Härm, Tanel Alumäe. TalTech Systems for the Odyssey 2024 Emotion Recognition Challenge. Speaker Odyssey 2024.
Daniil Rõbnikov, Tanel Alumäe. Single-Stage TTS with Adapted Vocoder and Cross-Attention: TalTech Systems for the LIMMITS'24 Challenge. ICASSP 2024.
Allan Vurma, Einar Meister, Lya Meister, Jaan Ross, Marju Raju, Veeda Kala, Tuuri Dede. Coping with reverberant acoustics in singing by extending the plosive closures in vowel-plosive-vowel sequences. ISAPh 2024.

2023

Alumäe, T.; Kong, J.; Rõbnikov, D. Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge. ASRU 2023.
Vurma, A.; Meister, E.; Meister, L.; Ross, J.; Raju, M.; Kala V.; Dede, T. The intensities of vowels and plosive bursts and their impact on text intelligibility in singing. Journal of the Acoustical Society of America (Vol. 154, Issue 4).
Alumäe, T.; Kalda, J.; Bode, K.; Kaitsa, M. Automatic Closed Captioning for Estonian Live Broadcasts. NoDaLiDa 2023.
Alumäe, T., Kukk, K., Le, V.-B., Barras, C., Messaoudi, A., Ben Kheder, W. Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation. Interspeech 2023.
Meister, E., Vurma, A., Dede, T., Kala, V., Meister, L., Raju, M., Ross, J. The Impact of the Intensity Ratio between Vowels and Voiceless Plosives on the Intelligibility of Sung Text. ICPhS 2023.
Meister, E., Meister, L. Developmental Changes of Fundamental Frequency and Temporal Characteristics in Estonian Adolescent Speech. ICPhS 2023.
Ebrahimi, A., ..., Alumäe, T. et al. Findings of the Second AmericasNLP Competition on Speech-to-Text Translation. NeurIPS 2022 Competition Track. PMLR, 2022.
Vurma, A.; Ross, J.; Meister, E.; Meister, L.; Raju, M.; Kala, V.; Dede, T. Intelligibility of plosives in operatic singing. ICMPC/APSCOM, 2023.

2022

Malmi, Anton; Lippus, Pärtel; Meister, Einar. Spectral and temporal properties of Estonian palatalization. Journal of the International Phonetic Association. 2022 May 13:1-26.
Kalda, Joonas; Alumäe, Tanel. Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech. Speaker Odyssey 2022.
Alumäe, Tanel; Kukk, Kunnar. Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge. Speaker Odyssey 2022.
Kukk, Kunnar; Alumäe, Tanel. Improving Language Identification of Accented Speech. Interspeech 2022.
Malmi, Anton; Lippus, Pärtel; Meister, Einar. Articulatory properties of Estonian palatalization by Russian L1 speakers. Eesti Ja Soome-Ugri Keeleteaduse Ajakiri. Journal of Estonian and Finno-Ugric Linguistics, 13(2), 79–118.
Härm, Henry; Alumäe, Tanel. Abstractive Summarization of Broadcast News Stories for Estonian. Baltic HLT 2022.
Olev, Aivo; Alumäe, Tanel. Estonian Speech Recognition and Transcription Editing Service. Baltic HLT 2022.
Meister, Einar; Meister, Lya. Estonian Elderly Speech Corpus – Design, Collection and Preliminary Acoustic Analysis. Baltic HLT 2022.
Yuta Yanagi, Ryohei Orihara, Yasuyuki Tahara, Yuichi Sei, Tanel Alumäe, Akihiko Ohsuga. Inspection of The Classifying Performance of The Deepfake Voices by The Latest Text-to-Speech Models. ICMECE 2022.

2021

Valk, Jörgen; Alumäe, Tanel. VoxLingua107: a Dataset for Spoken Language Recognition. IEEE SLT 2021 (video presentation).
Alumäe, Tanel; Jiaming Kong. Combining Hybrid and End-to-End Approaches for the OpenASR20 Challenge. Interspeech 2021.
Leier, Mairo; Riid, Andri; Alumäe, Tanel; Reinsalu, Uljana; Pihlak, René; Udal, Andres; Heinsar, Risto; Vainküla, Sven. Smart Elevator with Unsupervised Learning for Visitor Profiling and Personalised Destination Prediction. IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA) 2021.
Meister, Einar; Meister, Lya. Developmental Changes of Vowel Acoustics in Adolescents. Interspeech 2021.
Tavi, Lauri; Tomi Kinnunen; Einar Meister; Rosa González-Hautamäki; Anton Malmi. Articulation During Voice Disguise: A Pilot Study. SPECOM 2021.

2020

Alumäe, Tanel; Valk, Jörgen. The TalTech Systems for the Short-duration Speaker Verification Challenge 2020. Interspeech 2020.
Łańcucki, Adrian; Chorowski, Jan; Sanchez, Guillaume; Marxer, Ricard; Chen, Nanxin; Dolfing, Hans; Khurana, Sameer; Alumäe, Tanel; Laurent, Antoine. Robust training of vector quantized bottleneck models. IJCNN 2020.
Talts, Siim; Alumäe, Tanel. Analyzing candidate speaking time in Estonian parliament election debates. DHN 2020.

2019

Meister, Einar; Meister, Lya. Eesti laste kõne II. Vokaalide akustiline analüüs. Keel ja Kirjandus, 62 (4), 282−295.
Meister, Einar; Meister, Lya. Production of Estonian vowels by Finnish speakers. Eesti ja soome-ugri keeleteaduse ajakiri / Journal of Estonian and Finno-Ugric Linguistics, 10 (1), 129−143.
Alumäe, Tanel; Meister, Einar. Kõnetehnoloogia- ja foneetikauuringutest Tallinna Tehnikaülikoolis. Teadusmõte Eestis (X). Tehnikateadused III (177−189).
Tavi, Lauri; Alumäe, Tanel; Werner, Stefan. Recognition of creaky voice from emergency calls. Interspeech 2019.
Chorowski, Jan K.; Laurent, Antoine; Chen, Nanxin; Dolfing, Hans J.G.A.; Łańcucki, Adrian; Sanchez, Guillaume; Khurana, Sameer; Alumäe, Tanel. Unsupervised neural segmentation and clustering for unit discovery in sequential data. Perception as generative reasoning: Structure, Causality, Probability, NeurIPS 2019 workshop.

2018

Paats, A.; Alumäe, T.; Meister, E.; Fridolin, I. Retrospective analysis of clinical performance of an Estonian speech recognition system for radiology: effects of different acoustic and language models. Journal of Digital Imaging, 31 (5). Best paper of the year of the School of IT.
Asadullah; Alumäe, Tanel. Data augmentation and teacher-student training for LF-MMI based robust speech recognition. TSD 2018. Best student paper award.
Karu, Martin; Alumäe, Tanel. Weakly supervised training of speaker identification models. Odyssey 2018 The Speaker and Language Recognition Workshop.
Alumäe, Tanel; Tilk, Ottokar; Asadullah. Advanced rich transcription system for Estonian speech. Baltic HLT 2018.
Piel, Leo Kristopher; Alumäe, Tanel. Speech-Based Identification of Children’s Gender and Age with Neural networks. Baltic HLT 2018.
Alumäe, Tanel. Training speaker recognition models with recording-level labels. SLT 2018.
Kelli, Aleksei; Vider, Kadri; Kull, Irene; Siil, Triin; Lindén, Krister; Tavast, Arvi; Värv, Age; Ginter, Carri; Meister, Einar. Keeleressursside loomise ja kasutamisega seonduvaid isikuandmete kaitse küsimusi. Eesti Rakenduslingvistika Ühingu aastaraamat.
Meister, Einar. Keeletehnoloogia ja eesti keel. Raag, Raimo; Valge, Jüri (Toim.). Sõida tasa üle silla : raamat eesti keelest ja meelest (223−234). Tallinn; Tartu: EKSA.

Software

End-user applications

Web-based Estonian speech transcription system

A web application for transcribing long speech recordings, such as interviews and conference speeches. It uses our latest Estonian speech recognition technology. It also performs automatic punctuation and identifies Estonian public figures based on their voice.

The application can be used via a fully web-based interface that also provides post-editing capabilities.

Source code:

Transcription system: https://github.com/taltechnlp/est-asr-pipeline/tree/whisper-gpu

Offline subtitling system for Estonian

The offline subtitling system can be used to produce same-language subtitles for Estonian speech recordings, including YouTube videos. The produced subtitles have timecodes, are segmented into not-too-long chunks and are sometimes post-edited for better readability. The model is trained on 900 hours of manually produced hard-of-hearing subtitles.
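To make "timecodes" and "not-too-long chunks" concrete, here is a small sketch of how word-level timestamps could be grouped into subtitle blocks in the widely used SubRip (.srt) layout. It is purely illustrative and is not how the lab's system works: the system described above uses a trained model, whereas this sketch uses a simple greedy character limit. The `Word` class, the function names and the 42-character limit are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def srt_timecode(seconds: float) -> str:
    """Format seconds as the SubRip timecode HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunk_words(words: list[Word], max_chars: int = 42) -> list[list[Word]]:
    """Greedily group words so that each chunk stays within max_chars.

    The limit of 42 characters is an assumed value, not one taken
    from the subtitling system described in the text.
    """
    chunks: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return chunks


def to_srt(words: list[Word], max_chars: int = 42) -> str:
    """Render timed words as numbered SubRip blocks."""
    blocks = []
    for index, chunk in enumerate(chunk_words(words, max_chars), start=1):
        text = " ".join(w.text for w in chunk)
        span = f"{srt_timecode(chunk[0].start)} --> {srt_timecode(chunk[-1].end)}"
        blocks.append(f"{index}\n{span}\n{text}\n")
    return "\n".join(blocks)
```

A real subtitling model would also consider syntax, pauses and reading speed when choosing break points; the greedy limit here only shows the output structure.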

Links:

Subtitling system: http://bark.cs.taltech.ee/subtitreeri/
Models: https://huggingface.co/TalTechNLP/whisper-large-v3-et-subs, https://huggingface.co/TalTechNLP/whisper-large-v3-turbo-et-subs

Live captioning system for Estonian (Kiirkirjutaja)

Kiirkirjutaja is a real-time speech-to-text tool designed for subtitling Estonian TV broadcasts and streaming media. It consists of the following components: speech activity detector, online speaker change detector, speech recognition, punctuator and words-to-numbers converter.

Estonian speech recognition for Android (Kõnele)

For many years, Estonian speech recognition was not natively available on Android. Therefore, we developed an Android application, Kõnele. Kõnele works as a virtual keyboard: when users want to dictate text in any application (e.g. Gmail), they can switch to the Kõnele keyboard and use speech recognition to input text.

Kõnele uses client-server based speech recognition: Kõnele records user's speech and sends it to the lab's server, and the server sends recognized text back to the user's device.

Links:

Kõnele in Google Play store
Kõnele source code and documentation
Full-duplex real-time speech recognition server based on Kaldi: https://github.com/alumae/kaldi-gstreamer-server

Automatic phonetic segmentation for Estonian speech

The web application uses speech recognition models to generate phoneme boundaries for Estonian speech, based on a provided orthographic transcript. It is primarily used by phoneticians to generate initial phonetic segmentations for phonetic corpus annotation.

Transcribed Speech Archive Browser (TSAB)

An interface to a large collection of automatically transcribed Estonian radio broadcasts and some podcasts. Serves mostly as a showcase of our Estonian speech recognition technology.

Links:

Application: http://bark.phon.ioc.ee/tsab
Source code: https://github.com/alumae/tsab
Publication: Tanel Alumäe, Ahti Kitsik. TSAB - web interface for transcribed speech collections. Interspeech 2011.

Datasets

VoxLingua107

VoxLingua107 is a speech dataset for training spoken language identification models. The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according to the language of the video title and description, with some post-processing steps to filter out false positives. VoxLingua107 contains data for 107 languages. The total amount of speech in the training set is 6628 hours. The average amount of data per language is 62 hours.

A spoken language identification model trained on this dataset is available in the Hugging Face model repository.
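The per-language average quoted for VoxLingua107 follows directly from the two totals given in the description. The short check below uses only those two figures; the variable names are illustrative. Note that this is a mean, and it says nothing about how evenly the hours are distributed across languages.

```python
# Figures taken from the dataset description above.
total_training_hours = 6628
num_languages = 107

# Mean amount of training speech per language, in hours.
average_hours = total_training_hours / num_languages
print(f"{average_hours:.1f} hours per language on average")
```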

ERR2020

ERR2020 is a speech corpus that contains 389 hours of manually transcribed TV and radio shows from the archive of ERR (Estonian Public Broadcasting).

TalTech Estonian Speech Dataset 1.0

A union of all sharable long-form speech datasets transcribed in our lab, to be used for training speech recognition models. Includes ~1300 h of training data, and additional development and test data.

Department of Software Science
