Tommi Jauhiainen
Person information
- affiliation: University of Helsinki, Finland
2020 – today
- 2024
- [c25] Marcos Zampieri, Kai North, Tommi Jauhiainen, Mariano Felice, Neha Kumari, Nishant Nair, Yash Mahesh Bangera:
Language Variety Identification with True Labels. LREC/COLING 2024: 10100-10109
- 2023
- [j3] Krister Lindén, Tommi Jauhiainen, Sam Hardwick:
FinnSentiment: a Finnish social media corpus for sentiment polarity annotation. Lang. Resour. Evaluation 57(2): 581-609 (2023)
- [c24] Heidi Jauhiainen, Tommi Jauhiainen:
Automatic Word Segmentation for Egyptian Hieroglyphic Texts. DH 2023
- [c23] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
Tuning HeLI-OTS for Guarani-Spanish Code Switching Analysis. IberLEF@SEPLN 2023
- [c22] Noëmi Aepli, Çagri Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubesic, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri:
Findings of the VarDial Evaluation Campaign 2023. VarDial@EACL 2023: 251-261
- [e2] Yves Scherrer, Tommi Jauhiainen, Nikola Ljubesic, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri:
Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial@EACL 2023, Dubrovnik, Croatia, May 5, 2023. Association for Computational Linguistics 2023, ISBN 978-1-959429-50-0
- [i8] Marcos Zampieri, Kai North, Tommi Jauhiainen, Mariano Felice, Neha Kumari, Nishant Nair, Yash Bangera:
Language Variety Identification with True Labels. CoRR abs/2303.01490 (2023)
- [i7] Noëmi Aepli, Çagri Çöltekin, Rob van der Goot, Tommi Jauhiainen, Mourhaf Kazzaz, Nikola Ljubesic, Kai North, Barbara Plank, Yves Scherrer, Marcos Zampieri:
Findings of the VarDial Evaluation Campaign 2023. CoRR abs/2305.20080 (2023)
- 2022
- [c21] Ute Dieckmann, Mietta Lennes, Jussi Piitulainen, Jyrki Niemi, Erik Axelson, Tommi Jauhiainen, Krister Lindén:
The Pipeline for Publishing Resources in the Language Bank of Finland. CLARIN Annual Conference 2022: 33-43
- [c20] Tommi Jauhiainen, Jussi Piitulainen, Erik Axelson, Krister Lindén:
Language Identification as part of the Text Corpus Creation Pipeline at the Language Bank of Finland. DHNB 2022: 251-259
- [c19] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
HeLI-OTS, Off-the-shelf Language Identifier for Text. LREC 2022: 3912-3922
- [c18] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
Optimizing Naive Bayes for Arabic Dialect Identification. WANLP@EMNLP 2022: 409-414
- 2021
- [c17] Bharathi Raja Chakravarthi, Mihaela Gaman, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubesic, Niko Partanen, Ruba Priyadharshini, Christoph Purschke, Eswari Rajagopal, Yves Scherrer, Marcos Zampieri:
Findings of the VarDial Evaluation Campaign 2021. VarDial@EACL 2021: 1-11
- [c16] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
Naive Bayes-based Experiments in Romanian Dialect Identification. VarDial@EACL 2021: 76-83
- [c15] Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri:
Comparing Approaches to Dravidian Language Identification. VarDial@EACL 2021: 120-127
- [e1] Marcos Zampieri, Preslav Nakov, Nikola Ljubesic, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen:
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, VarDial@EACL 2021, Kyiv, Ukraine, April 20, 2021. Association for Computational Linguistics 2021, ISBN 978-1-954085-12-1
- [i6] Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri:
Comparing Approaches to Dravidian Language Identification. CoRR abs/2103.05552 (2021)
- 2020
- [c14] Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén:
Building Web Corpora for Minority Languages. WAC@LREC 2020: 23-32
- [c13] Matias Lindgren, Tommi Jauhiainen, Mikko Kurimo:
Releasing a Toolkit and Comparing the Performance of Language Embeddings Across Various Spoken Language Identification Datasets. INTERSPEECH 2020: 467-471
- [c12] Mihaela Gaman, Dirk Hovy, Radu Tudor Ionescu, Heidi Jauhiainen, Tommi Jauhiainen, Krister Lindén, Nikola Ljubesic, Niko Partanen, Christoph Purschke, Yves Scherrer, Marcos Zampieri:
A Report on the VarDial Evaluation Campaign 2020. VarDial@COLING 2020: 1-14
- [c11] Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén:
Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpora. VarDial@COLING 2020: 173-185
- [c10] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
Experiments in Language Variety Geolocation and Dialect Identification. VarDial@COLING 2020: 220-231
- [i5] Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén:
Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus. CoRR abs/2008.12169 (2020)
- [i4] Krister Lindén, Tommi Jauhiainen, Sam Hardwick:
FinnSentiment - A Finnish Social Media Corpus for Sentiment Polarity Annotation. CoRR abs/2012.02613 (2020)
2010 – 2019
- 2019
- [j2] Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén:
Automatic Language Identification in Texts: A Survey. J. Artif. Intell. Res. 65: 675-782 (2019)
- [j1] Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen:
Language model adaptation for language and dialect identification of text. Nat. Lang. Eng. 25(5): 561-583 (2019)
- [i3] Tommi Jauhiainen, Heidi Jauhiainen, Tero Alstola, Krister Lindén:
Language and Dialect Identification of Cuneiform Texts. CoRR abs/1903.01891 (2019)
- [i2] Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen:
Language Model Adaptation for Language and Dialect Identification of Text. CoRR abs/1903.10915 (2019)
- 2018
- [c9] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
Iterative Language Model Adaptation for Indo-Aryan Language Identification. VarDial@COLING 2018: 66-75
- [c8] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles. VarDial@COLING 2018: 137-144
- [c7] Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén:
HeLI-based Experiments in Swiss German Dialect Identification. VarDial@COLING 2018: 254-262
- [i1] Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén:
Automatic Language Identification in Texts: A Survey. CoRR abs/1804.08186 (2018)
- 2017
- [c6] Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen:
Evaluation of language identification methods using 285 languages. NODALIDA 2017: 183-191
- [c5] Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen:
Evaluating HeLI with Non-Linear Mappings. VarDial 2017: 102-108
- 2016
- [c4] Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen:
HeLI, a Word-Based Backoff Method for Language Identification. VarDial@COLING 2016: 153-162
- 2015
- [c3] Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen:
Language Set Identification in Noisy Synthetic Multilingual Documents. CICLing (1) 2015: 633-643
2000 – 2009
- 2002
- [c2] Kristiina Jokinen, Antti Kerminen, Tommi Jauhiainen, Jukka Kuusisto, Graham Wilcock, Markku Turunen, Jaakko Hakulinen, Krista Lagus:
Adaptive Dialogue Systems - Interaction with Interact. SIGDIAL Workshop 2002: 64-73
- 2001
- [c1] Tommi Jauhiainen:
Using existing written language analyzers in understanding natural spoken Finnish. NODALIDA 2001
last updated on 2024-08-25 19:15 CEST by the dblp team
all metadata released as open data under CC0 1.0 license