About the Laboratory of Language Technology
The laboratory focuses on the following topics:
- Speech recognition
- Speaker recognition
- Language and accent identification
- Speech corpora
- Phonetics (Estonian prosody and sound system, L2 speech)
- Various subtopics in natural language processing
We also work on making speech technology more accessible to the general public by creating end-user-oriented speech recognition applications and packaging speech recognition software components in a more accessible form. Our main focus is on Estonian speech recognition, but most of the components are not specific to Estonian. We are firm supporters of open source.
The lab was formerly part of the Institute of Cybernetics. Our old web pages are available here.
News
ERR Novaator: New research offers tips for making arias more intelligible
Opera singers are criticized time and again because the text they sing remains unintelligible to listeners. A joint study by researchers from the Estonian Academy of Music and Theatre and Tallinn University of Technology now helps to understand the root causes and outlines strategies for making singing more intelligible.
Tanel Alumäe: Is artificial general intelligence near, and what does it mean for us
Tanel Alumäe gave a talk at the ETAG science policy conference titled "Is artificial general intelligence near, and what does it mean for us?"
People
Past members
- Henry Härm
- Yuta Yanagi
- Andres Käver
- Kunnar Kukk
- Jörgen Valk
- Asadullah
- Ottokar Tilk
- Kairit Sirts
- Rena Nemoto
- Rainer Metsvahi
Studies
Courses
ITS8040 - Natural Language and Speech Processing
- Lecturer: Tanel Alumäe
- Language: English
- Level: Master
- Course details
- Slides from the 2020 Spring Semester
ITS8035 - Speech Processing by Humans and Computers (Kõnetöötlus inimeses ja arvutis)
- Lecturer: Einar Meister
- Language: Estonian
- Level: Master
- Course details
Master Theses
We offer supervision of Master's theses on topics related to our research.
A selection of theses supervised so far:
- Tiia Sildam, Andra Velve, Tanel Alumäe (sup). Kaskaad- ja otsemeetodi võrdlus eesti keele suulise kõne tõlkesüsteemide näitel. 2024.
- Oleksandra Zamana, Tanel Alumäe (sup). Using Pretrained Language Models for Improved Speaker Identification. 2024.
- Priit Käärd, Tanel Alumäe (sup). Weakly Supervised Speaker Identification System Implementation based on Estonian Public Figures. 2023.
- Artem Filipenko, Tanel Alumäe (sup). Data Augmentation Techniques for Advanced End-to-End Keyphrase Extraction from Text. 2023.
- Erko Peterson, Einar Meister (sup). Veebirakendus eesti keele häälduse treeninguks. 2023.
- Ilja Samoilov, Tanel Alumäe (sup). Converting Automatic Transcriptions for Television Programs to Readable Subtitles. 2022. Nominated as one of the best MSc theses of the School of IT, received special prizes.
- Andres Käver, Tanel Alumäe (sup). Efficient Population-based Data Augmentation in Speaker Verification. 2021.
- Henry Härm, Tanel Alumäe (sup). Abstractive Summarization of News Broadcasts for Low Resource Languages. 2021.
- Anu Käver, Tanel Alumäe (sup). Extractive Question Answering for Estonian Language. 2021.
- Fred-Eric Kirsi, Tanel Alumäe (sup). End-to-end Phoneme Segmentation. 2020.
- Jörgen Valk, Tanel Alumäe (sup). Using Web Scraping for Building Spoken Language Identification Models. 2020.
- Hendrik Kivi, Tanel Alumäe (sup). Identification and Localization of Foreign Accent in Speech. 2020.
- Aivo Olev, Tanel Alumäe (sup). Web Application for Authoring Speech Transcriptions. 2019.
- Siim Kaspar Uustalu, Tanel Alumäe (sup). Automated Detection and Sentiment Analysis of Registered Entity Mentions in Estonian Language News Media. 2019. Nominated as one of the best MSc theses of the School of IT.
- Siim Talts, Tanel Alumäe (sup). Analysing Election Candidate Exposure in Broadcast Media Using Weakly Supervised Training. 2019.
- Leo Kristopher Piel, Tanel Alumäe (sup). Speech-based Identification of Children's Gender and Age with Neural Networks. 2018. Nominated as one of the best MSc theses of the School of IT.
- Margus Baumann, Tanel Alumäe (sup). Identification of Foreign Language Accent from Speech Using Neural Networks. 2018.
- Martin Talimets, Tanel Alumäe (sup). End-to-End Speech Recognition for Estonian. 2018.
- Martin Väljaots, Einar Meister (sup). Computer Aided Pronunciation Training Tool for Estonian. 2018.
- Roman Hrushchak, Einar Meister (sup). Visualization of Tongue and Lip Movements. 2018.
- Evgeniia Rykova, Einar Meister (sup). Perceptual and acoustic similarities between the voices of family members: an approach to synthesize a voice based on family-shared F0 characteristics. 2018.
- Thales Santos Ribeiro, Einar Meister (sup). Online Recording of Speech Corpora. 2018.
- Lasha Amashukeli, Einar Meister (sup). Online Perception Experiments. 2018.
- Martin Karu, Tanel Alumäe (sup). Weakly Supervised Training of Speaker Identification Models. 2017. Best MSc thesis of the School of IT.
- Anton Malmi, Einar Meister (sup), Pärtel Lippus (sup). Intervokaalse /l/-i kvaliteet ja palatograafia. 2016.
- Rainer Metsvahi, Einar Meister (sup). Kõnesüntees Markovi peitmudelitega. 2012.
- Ervin Veber, Tanel Tammet (sup), Einar Meister (sup). HTML dokumentide teisendamine kõnesünteesi keelde. 2007.
PhD Studies
We are looking for talented and hardworking people to do their doctoral studies on topics that are related to our research.
All PhD students at our lab become members of our team. You will be hired as an Early Stage Researcher and will receive a salary from the university in addition to the doctoral scholarship. The full compensation depends on the person (better skills and better research output result in a better salary), but the minimum is 2000 EUR (after taxes). This is about 25% more than the average salary in Estonia. Living costs in Estonia are significantly lower than in most Western European countries.
We can admit new PhD students any time.
Several topics in the fields of speech recognition, speech translation, speaker recognition and text summarization are possible (the exact topic can be determined based on the student's interests and skills).
Requirements:
- Interest in scientific research (and an understanding of what research is)
- Master's degree in computer science (or a related field)
- Good background in mathematics, statistics, probability theory and linear algebra
- Good background in some subfield of speech technology (e.g., speech recognition, speaker recognition)
- Knowledge of modern approaches in machine learning (incl. deep learning)
- Excellent programming skills (Python, C++, bash scripting)
- Experience with modern deep learning toolkits (PyTorch, TensorFlow)
- Excellent academic writing skills
- Previous academic or industry experience in speech or language processing is beneficial (but not strictly needed)
Current PhD Students
- Aivo Olev, supervisor Tanel Alumäe. Speech processing of non-native speech.
- Joonas Kalda, supervisor Tanel Alumäe. New algorithms in speaker segmentation and identification.
- Martin Verrev, supervisors Tanel Alumäe and Tanel Tammet. Knowledge extraction from natural language using both machine learning and common sense knowledge systems.
Graduated PhD Students
- Anton Malmi, supervisors Pärtel Lippus (University of Tartu) and Einar Meister. Vene emakeelega keelejuhtide eesti keele palatalisatsiooni akustika, taju ja produktsioon. 2022.
- Ottokar Tilk, supervisors Tanel Alumäe and Leo Võhandu. Neural Networks for Language Modeling and Related Tasks in Low-Resourced Domains and Languages. 2018.
Projects
Current Projects
EXAI: Estonian Centre of Excellence in Artificial Intelligence (2024−2030)
EXAI focuses on advancing innovative methodologies for:
- leveraging foundation models in building efficient and trustworthy analysis and prediction systems;
- implementing control mechanisms and guardrails to ensure that the advanced AI systems follow their specification;
- adapting and enhancing AI systems for improved performance in targeted application contexts;
- achieving end-to-end security and privacy assurance of AI systems.
EXAI is a joint project by the University of Tartu (lead), Tallinn University of Technology and Cybernetica.
Estonian and Multilingual Speech Translation (2023-2024)
The aim of the project is to develop solutions for speech translation (machine translation of speech signals) and to explore optimal approaches to this task. We will test different approaches, including end-to-end systems (directly translating a speech input into a text translation output) and pipeline systems (separate speech recognition, machine translation and speech synthesis). We will also add Estonian speech input and text output to existing speech translation benchmarks, and we will create a working server solution that makes the project results usable through an API and integrates them into the MS Teams and Zoom communication programs. We will focus on the Estonian <-> English/Russian translation directions; if possible, we will also add German, Finnish and Latvian. The results of the project will be shared as open data and open source software. Research results will be published in scientific conferences and journals.
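The difference between the two approaches named above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: `make_cascade`, `toy_recognizer` and `toy_translator` are hypothetical names, and the stand-in components are toys rather than anything from the project itself.

```python
from typing import Callable

Audio = bytes  # placeholder type; a real system would use waveform samples

SpeechTranslator = Callable[[Audio], str]


def make_cascade(
    recognize: Callable[[Audio], str],
    translate: Callable[[str], str],
) -> SpeechTranslator:
    """Build a pipeline system: speech recognition followed by text translation."""

    def speech_to_translated_text(audio: Audio) -> str:
        transcript = recognize(audio)  # intermediate source-language text
        return translate(transcript)

    return speech_to_translated_text


# Toy stand-ins (assumptions for illustration only).
def toy_recognizer(audio: Audio) -> str:
    # Pretends the audio bytes already hold the transcript.
    return audio.decode("utf-8")


def toy_translator(text: str) -> str:
    # Tiny Estonian -> English lookup table.
    return {"tere": "hello"}.get(text, text)


cascade = make_cascade(toy_recognizer, toy_translator)
print(cascade(b"tere"))
```

An end-to-end system would implement the same `SpeechTranslator` signature as a single model, with no intermediate transcript. That shared signature is what lets the two approaches be compared on the same benchmark.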
Estonian Speech Recognition (2018-2022)
Speech recognition is a technology for converting natural speech to text. It is used for dictating documents and for automatic transcription of speech recordings. Estonian speech recognition has improved significantly in recent years: on broadcast speech data, a word error rate of 10% has been reached. These improvements were made possible by our work on collecting and transcribing new speech corpora and by recent advances in deep neural networks. The goal of this project is to further improve the state of Estonian speech recognition. We focus on the kinds of speech data that currently cause many recognition errors: noisy data, multi-speaker meetings, speech from seniors and speech with a high amount of code-switching. To reach this goal, we will improve the currently used speech recognition methods and algorithms and transcribe new speech corpora. We will also improve the flexibility and usability of our open-source Estonian speech recognition systems.
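For readers unfamiliar with the metric mentioned above, word error rate (WER) is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition; the function name and the sample sentences are illustrative and are not taken from the lab's evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of ten reference words.
wer = word_error_rate("a b c d e f g h i j", "a b c d e f g h i x")
print(f"{wer:.0%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.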
Improving the Intelligibility of Sung Text: the Problems and the Scientific Basis (2022-2026)
Joint project with the Estonian Academy of Music and Theatre
We expect vocalists to sing with intelligible text, but singers also have to obey the constraints dictated by the music. Thus, the methods used to enhance diction in speaking may not be fully applicable to singing. Singers' views on how to achieve clear pronunciation are controversial, and investigations on the subject are scarce. This project aims to create a scientific basis for the further development of strategies to achieve a good balance between intelligibility and the requirements of the music, such as cantilena and phrasing, when singing in various acoustics and in the presence of accompaniment. The research method includes acoustic analysis of vocal performances and perception tests of vocal stimuli with systematically modified phonetic and musical parameters. The results are applicable to voice training and could help text writers and composers reduce problems of text intelligibility.
CEES Centre of Excellence in Estonian Studies (2016-2023)
The Centre of Excellence in Estonian Studies (CEES) is supported by the European Union through the European Regional Development Fund in the years 2016–2023. The centre assembles 15 research groups with more than 60 researchers and more than 50 post-graduate students from different research institutions (including our lab).
The research of the Centre of Excellence in Estonian Studies is connected with focal phenomena of
- Estonian society and culture, some of them carrying emblematic connotations: the Estonian language itself and its wide array of sublanguages and dialects, unique regilaul-verse, song festivals and the choir movement, original poetic culture, sacred sites;
- the Estonian diasporas and ethnic groups (primarily Estonian Russians, Old Believers, Finno-Ugric minorities, neighboring and contact groups);
- global cultural trends and local variations of global cultural phenomena (epic(s), humour, mythology, etc.), reinvented and modernized forms (e.g. punk song festivals);
- contemporary culture, incl. transmedia texts and behaviour.
Text Summarization for Estonian (2022-2024)
The goal of the project is to create a system for generating abstractive summaries for Estonian texts. Based on a (long) input text, the system will generate a short and concise text that will reproduce the most important information in the input text, without necessarily using the sentences and expressions present in the input text.
During the project, a large corpus of Estonian document-summary pairs will be annotated, and experiments with different end-to-end summarization models will be performed. Special attention will be given to transfer learning and multilingual models, which are indispensable under low-resource constraints. A method for finetuning the general model to a specific domain will also be developed.
The implemented models, code and datasets will be distributed under open source licenses. A graphical user interface and a REST API will be implemented for the summarization system and it will also be distributed as a Docker image.
Past Projects
Subtitles for Estonian Live Broadcasts using ASR (2021)
The goal of this project is to develop technology that can produce real-time subtitles for live TV broadcasts (such as news, debates), press conferences and other live events, using speech recognition.
The subtitling system is available here. It is currently used by the Estonian Parliament (Riigikogu) for live captioning parliament sessions (available here, if a session is live), and by Estonian Public Broadcasting (ERR) for realtime captioning of live news programmes and talk shows. News story (in Estonian): https://diktor.geenius.ee/rubriik/tele/etv-otsesaadetele-saab-tanasest-valida-automaatsubtiitreid/
Rich Transcription System for the Estonian Parliament (2018-2020)
Within this project, we implemented a system for producing transcripts for the Estonian Parliament using speech recognition, automatic punctuation and speaker identification technology. The system outputs fully punctuated and "nicely" formatted text and identifies members of the parliament based on their voice. The system has been deployed to production. News story (in Estonian): https://www.err.ee/1134953/eesti-viimaste-stenografistide-too-vottis-ule-robot
Publications
- Tiia Sildam, Andra Velve, Tanel Alumäe. Finetuning End-to-End Models for Estonian Conversational Spoken Language Translation. LoResMT Workshop @ ACL 2024.
- Oleksandra V Zamana, Priit Käärd, Tanel Alumäe. Using Pretrained Language Models for Improved Speaker Identification. Speaker Odyssey 2024.
- Joonas Kalda, Clément Pagés, Ricard Marxer, Tanel Alumäe, Hervé Bredin. PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings. arXiv:2403.02288, Speaker Odyssey 2024.
- Henry Härm, Tanel Alumäe. TalTech Systems for the Odyssey 2024 Emotion Recognition Challenge. Speaker Odyssey 2024.
- Daniil Rõbnikov, Tanel Alumäe. Single-Stage TTS with Adapted Vocoder and Cross-Attention: TalTech Systems for the LIMMITS'24 Challenge. ICASSP 2024.
- Alumäe, T.; Kong, J.; Robnikov, D. Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge. ASRU 2023.
- Vurma, A.; Meister, E.; Meister, L.; Ross, J.; Raju, M.; Kala, V.; Dede, T. The intensities of vowels and plosive bursts and their impact on text intelligibility in singing. Journal of the Acoustical Society of America (Vol. 154, Issue 4).
- Alumäe, T.; Kalda, J.; Bode, K.; Kaitsa, M. Automatic Closed Captioning for Estonian Live Broadcasts. NoDaLiDa 2023.
- Alumäe, T., Kukk, K., Le, V.-B., Barras, C., Messaoudi, A., Ben Kheder, W. Exploring the Impact of Pretrained Models and Web-Scraped Data for the 2022 NIST Language Recognition Evaluation. Interspeech 2023.
- Meister, E., Vurma, A., Dede, T., Kala, V., Meister, L., Raju, M., Ross, J. The Impact of the Intensity Ratio between Vowels and Voiceless Plosives on the Intelligibility of Sung Text. ICPhS 2023.
- Meister, E., Meister, L. Developmental Changes of Fundamental Frequency and Temporal Characteristics in Estonian Adolescent Speech. ICPhS 2023.
- Ebrahimi, A., ..., Alumäe, T. et al. Findings of the Second AmericasNLP Competition on Speech-to-Text Translation. NeurIPS 2022 Competition Track. PMLR, 2022.
- Vurma, A.; Ross, J.; Meister, E.; Meister, L.; Raju, M.; Kala, V.; Dede, T. Intelligibility of plosives in operatic singing. ICMPC/APSCOM, 2023.
- Malmi, Anton; Lippus, Pärtel; Meister, Einar. Spectral and temporal properties of Estonian palatalization. Journal of the International Phonetic Association. 2022 May 13:1-26.
- Kalda, Joonas; Alumäe, Tanel. Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech. Speaker Odyssey 2022.
- Alumäe, Tanel; Kukk, Kunnar. Pretraining Approaches for Spoken Language Recognition: TalTech Submission to the OLR 2021 Challenge. Speaker Odyssey 2022.
- Kukk, Kunnar; Alumäe, Tanel. Improving Language Identification of Accented Speech. Interspeech 2022.
- Malmi, Anton; Lippus, Pärtel; Meister, Einar. Articulatory properties of Estonian palatalization by Russian L1 speakers. Eesti Ja Soome-Ugri Keeleteaduse Ajakiri. Journal of Estonian and Finno-Ugric Linguistics, 13(2), 79–118.
- Härm, Henry; Alumäe, Tanel. Abstractive Summarization of Broadcast News Stories for Estonian. Baltic HLT 2022.
- Olev, Aivo; Alumäe, Tanel. Estonian Speech Recognition and Transcription Editing Service. Baltic HLT 2022.
- Meister, Einar; Meister, Lya. Estonian Elderly Speech Corpus – Design, Collection and Preliminary Acoustic Analysis. Baltic HLT 2022.
- Yuta Yanagi, Ryohei Orihara, Yasuyuki Tahara, Yuichi Sei, Tanel Alumäe, Akihiko Ohsuga. Inspection of The Classifying Performance of The Deepfake Voices by The Latest Text-to-Speech Models. ICMECE 2022.
- Valk, Jörgen; Alumäe, Tanel. VoxLingua107: a Dataset for Spoken Language Recognition. IEEE SLT 2021 (video presentation).
- Alumäe, Tanel; Jiaming Kong. Combining Hybrid and End-to-End Approaches for the OpenASR20 Challenge. Interspeech 2021.
- Leier, Mairo; Riid, Andri; Alumäe, Tanel; Reinsalu, Uljana; Pihlak, René; Udal, Andres; Heinsar, Risto; Vainküla, Sven. Smart Elevator with Unsupervised Learning for Visitor Profiling and Personalised Destination Prediction. IEEE Conference on Cognitive and Computational Aspects of Situation Management (CogSIMA) 2021.
- Meister, Einar; Meister, Lya. Developmental Changes of Vowel Acoustics in Adolescents. Interspeech 2021.
- Tavi, Lauri; Tomi Kinnunen; Einar Meister; Rosa González-Hautamäki; Anton Malmi. Articulation During Voice Disguise: A Pilot Study. SPECOM 2021.
- Alumäe, Tanel; Valk, Jörgen. The TalTech Systems for the Short-duration Speaker Verification Challenge 2020. Interspeech 2020.
- Lancucki, Adrian; Chorowski, Jan; Sanchez, Guillaume; Marxer, Ricard; Chen, Nanxin; Dolfing, Hans; Khurana, Sameer; Alumae, Tanel; Laurent, Antoine. Robust training of vector quantized bottleneck models. IJCNN 2020.
- Talts, Siim; Alumäe, Tanel. Analyzing candidate speaking time in Estonian parliament election debates. DHN 2020.
- Meister, Einar; Meister, Lya. Eesti laste kõne II. Vokaalide akustiline analüüs. Keel ja Kirjandus, 62 (4), 282−295.
- Meister, Einar; Meister, Lya. Production of Estonian vowels by Finnish speakers. Eesti ja soome-ugri keeleteaduse ajakiri / Journal of Estonian and Finno-Ugric Linguistics, 10 (1), 129−143.
- Alumäe, Tanel; Meister, Einar. Kõnetehnoloogia- ja foneetikauuringutest Tallinna Tehnikaülikoolis. Teadusmõte Eestis (X). Tehnikateadused III (177−189).
- Tavi, Lauri; Alumäe, Tanel; Werner, Stefan. Recognition of creaky voice from emergency calls. Interspeech 2019.
- Chorowski, Jan K.; Laurent, Antoine; Chen, Nanxin; Dolfing, Hans J.G.A.; Łańcucki, Adrian; Sanchez, Guillaume; Khurana, Sameer; Alumäe, Tanel; Laurent, Antoine. Unsupervised neural segmentation and clustering for unit discovery in sequential data. Perception as generative reasoning: Structure, Causality, Probability, NeurIPS 2019 workshop.
- Paats, A.; Alumäe, T.; Meister, E.; Fridolin, I. Retrospective analysis of clinical performance of an Estonian speech recognition system for radiology: effects of different acoustic and language models. Journal of Digital Imaging, 31 (5). Best paper of the year of the School of IT.
- Asadullah; Alumäe, Tanel. Data augmentation and teacher-student training for LF-MMI based robust speech recognition. TSD 2018. Best student paper award.
- Karu, Martin; Alumäe, Tanel. Weakly supervised training of speaker identification models. Odyssey 2018 The Speaker and Language Recognition Workshop.
- Alumäe, Tanel; Tilk, Ottokar; Asadullah. Advanced rich transcription system for Estonian speech. Baltic HLT 2018.
- Piel, Leo Kristopher; Alumäe, Tanel. Speech-Based Identification of Children’s Gender and Age with Neural networks. Baltic HLT 2018.
- Alumäe, Tanel. Training speaker recognition models with recording-level labels. SLT 2018.
- Kelli, Aleksei; Vider, Kadri; Kull, Irene; Siil, Triin; Lindén, Krister; Tavast, Arvi; Värv, Age; Ginter, Carri; Meister, Einar. Keeleressursside loomise ja kasutamisega seonduvaid isikuandmete kaitse küsimusi. Eesti Rakenduslingvistika Ühingu aastaraamat.
- Meister, Einar. Keeletehnoloogia ja eesti keel. Raag, Raimo; Valge, Jüri (Toim.). Sõida tasa üle silla : raamat eesti keelest ja meelest (223−234). Tallinn; Tartu: EKSA.
Software
End-user applications
Web-based Estonian speech transcription system
A web application for transcribing long speech recordings, such as interviews and conference speeches. It uses our latest Estonian speech recognition technology. It also performs automatic punctuation and identifies Estonian public figures based on their voice.
The application can be used via an old interface (transcripts are sent by e-mail) or a new, fully web-based interface that also provides post-editing capabilities.
Source code:
- Transcription system: https://github.com/alumae/kaldi-offline-transcriber
- REST API to the transcription system: https://bitbucket.org/alumae/kaldi-offline-transcriber-web
Live captioning system for Estonian (Kiirkirjutaja)
Kiirkirjutaja is a speech-to-text tool designed for real-time subtitling of Estonian TV broadcasts and streaming media. It consists of the following components: speech activity detector, online speaker change detector, speech recognition, punctuator and words-to-numbers converter.
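The chained design described above, where each component consumes the output of the previous one, can be sketched in a few lines of Python. This is a toy illustration only and not Kiirkirjutaja's actual code: it covers just the last two text-level stages, the function names are hypothetical, and the number vocabulary is deliberately tiny.

```python
from functools import reduce
from typing import Callable, Iterable

Stage = Callable[[str], str]

# Toy vocabulary (assumption); a real converter handles full Estonian numerals.
NUMBER_WORDS = {"üks": "1", "kaks": "2", "kolm": "3"}


def punctuate(text: str) -> str:
    """Toy punctuator: capitalize the first letter and add a final period."""
    text = text.strip()
    if not text:
        return text
    return text[0].upper() + text[1:] + "."


def words_to_numbers(text: str) -> str:
    """Replace known number words with digits, keeping trailing punctuation."""
    converted = []
    for token in text.split():
        core = token.rstrip(".,!?")
        tail = token[len(core):]
        converted.append(NUMBER_WORDS.get(core.lower(), core) + tail)
    return " ".join(converted)


def run_pipeline(text: str, stages: Iterable[Stage]) -> str:
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, text)


# Input stands in for raw recognizer output (lowercase, unpunctuated).
caption = run_pipeline("täna sajab kaks tundi", [punctuate, words_to_numbers])
print(caption)
```

Keeping every stage behind the same `str -> str` signature means stages can be reordered, replaced or tested in isolation.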
Estonian speech recognition for Android (Kõnele)
For many years, Estonian speech recognition was not natively available for Android. Therefore, we developed the Android application Kõnele. Kõnele works as a virtual keyboard: when users want to dictate text in any application (e.g. Gmail), they can switch to the Kõnele keyboard and use speech recognition to input text.
Kõnele uses client-server speech recognition: it records the user's speech and sends it to the lab's server, and the server sends the recognized text back to the user's device.
Links:
- Kõnele in Google Play store
- Kõnele source code and documentation
- Full-duplex real-time speech recognition server based on Kaldi: https://github.com/alumae/kaldi-gstreamer-server
Automatic phonetic segmentation for Estonian speech
The web application uses speech recognition models to generate phoneme boundaries for Estonian speech, based on a provided orthographic transcript. It is primarily used by phoneticians to generate initial phonetic segmentations for phonetic corpus annotation.
Transcribed Speech Archive Browser (TSAB)
An interface to a large collection of automatically transcribed Estonian radio broadcasts and some podcasts. Serves mostly as a showcase of our Estonian speech recognition technology.
Links:
- Application: http://bark.phon.ioc.ee/tsab
- Source code: https://github.com/alumae/tsab
- Publication: Tanel Alumäe, Ahti Kitsik. TSAB - web interface for transcribed speech collections. Interspeech 2011.
Datasets
VoxLingua107
VoxLingua107 is a speech dataset for training spoken language identification models. The dataset consists of short speech segments automatically extracted from YouTube videos and labeled according to the language of the video title and description, with some post-processing steps to filter out false positives. VoxLingua107 contains data for 107 languages. The total amount of speech in the training set is 6628 hours. The average amount of data per language is 62 hours. See more...
A spoken language identification model trained on this dataset is available in the Hugging Face model repository.
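As a quick sanity check, the per-language average quoted for VoxLingua107 follows directly from the two totals given above. The Python below only redoes that arithmetic; the variable names are illustrative.

```python
total_hours = 6628    # training set size stated above
num_languages = 107   # number of languages stated above

average_hours = total_hours / num_languages
print(round(average_hours))  # rounds to the 62 hours quoted above
```

This is a mean, so it does not imply that every language has a similar amount of data.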
ERR2020
ERR2020 is a speech corpus that contains 389 of manually transcribed TV and radio shows from the archive of ERR (Estonian Public Broadcasting). See more...
TalTech Estonian Speech Dataset 1.0
A union of all sharable long-form speech datasets transcribed in our lab, to be used for training speech recognition models. Includes ~1300 h of training data, and additional development and test data. See more...