Benoît Sagot
2020 – today
- 2024
- [c141]Wissam Antoun, Benoît Sagot, Djamé Seddah:
From Text to Source: Results in Detecting Large Language Model-Generated Content. LREC/COLING 2024: 7531-7543 - [c140]Lydia Nishimwe, Benoît Sagot, Rachel Bawden:
Making Sentence Embeddings Robust to User-Generated Content. LREC/COLING 2024: 10984-10998 - [c139]Nathan Godey, Éric de la Clergerie, Benoît Sagot:
On the Scaling Laws of Geographical Representation in Language Models. LREC/COLING 2024: 12416-12422 - [c138]Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden:
When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages. LREC/COLING 2024: 17544-17556 - [c137]Benoît Sagot:
Mieux comprendre les modèles de langue et les textes qu'ils produisent. CORIA 2024 - [c136]Nathan Godey, Éric Villemonte de la Clergerie, Benoît Sagot:
Anisotropy Is Inherent to Self-Attention in Transformers. EACL (1) 2024: 35-48 - [c135]Armel Zebaze, Benoît Sagot, Rachel Bawden:
Tree of Problems: Improving structured problem solving with compositionality. EMNLP 2024: 18028-18047 - [c134]Nathan Godey, Éric Villemonte de la Clergerie, Benoît Sagot:
Headless Language Models: Learning without Predicting with Contrastive Weight Tying. ICLR 2024 - [c133]You Zuo, Kim Gerdes, Éric de la Clergerie, Benoît Sagot:
PatentEval: Understanding Errors in Patent Generation. NAACL-HLT 2024: 2687-2710 - [i54]Nathan Godey, Éric de la Clergerie, Benoît Sagot:
Anisotropy Is Inherent to Self-Attention in Transformers. CoRR abs/2401.12143 (2024) - [i53]Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-jussà, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoît Sagot, Emmanuel Dupoux:
SpiRit-LM: Interleaved Spoken and Written Language Model. CoRR abs/2402.05755 (2024) - [i52]Nathan Godey, Éric de la Clergerie, Benoît Sagot:
On the Scaling Laws of Geographical Representation in Language Models. CoRR abs/2402.19406 (2024) - [i51]Lydia Nishimwe, Benoît Sagot, Rachel Bawden:
Making Sentence Embeddings Robust to User-Generated Content. CoRR abs/2403.17220 (2024) - [i50]Nathan Godey, Éric de la Clergerie, Benoît Sagot:
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck. CoRR abs/2404.07647 (2024) - [i49]You Zuo, Kim Gerdes, Éric Villemonte de la Clergerie, Benoît Sagot:
PatentEval: Understanding Errors in Patent Generation. CoRR abs/2406.06589 (2024) - [i48]Matthieu Futeral, Armel Zebaze, Pedro Ortiz Suarez, Julien Abadji, Rémi Lacroix, Cordelia Schmid, Rachel Bawden, Benoît Sagot:
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus. CoRR abs/2406.08707 (2024) - [i47]Matthieu Futeral, Cordelia Schmid, Benoît Sagot, Rachel Bawden:
Towards Zero-Shot Multimodal Machine Translation. CoRR abs/2407.13579 (2024) - [i46]Armel Zebaze, Benoît Sagot, Rachel Bawden:
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation. CoRR abs/2408.00397 (2024) - [i45]Rasul Dent, Juliette Janes, Thibault Clérice, Pedro Ortiz Suarez, Benoît Sagot:
Molyé: A Corpus-based Approach to Language Contact in Colonial France. CoRR abs/2408.04554 (2024) - [i44]Armel Zebaze, Benoît Sagot, Rachel Bawden:
Tree of Problems: Improving structured problem solving with compositionality. CoRR abs/2410.06634 (2024) - [i43]Wissam Antoun, Francis Kulumba, Rian Touchent, Éric de la Clergerie, Benoît Sagot, Djamé Seddah:
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection. CoRR abs/2411.08868 (2024) - [i42]Thibault Clérice, Juliette Janes, Hugo Scheithauer, Sarah Bénière, Florian Cafiero, Laurent Romary, Simon Gabay, Benoît Sagot:
Diachronic Document Dataset for Semantic Layout Analysis. CoRR abs/2411.10068 (2024) - 2023
- [j16]Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoît Sagot, Abdelrahman Mohamed, Emmanuel Dupoux:
Generative Spoken Dialogue Language Modeling. Trans. Assoc. Comput. Linguistics 11: 250-266 (2023) - [c132]Wissam Antoun, Benoît Sagot, Djamé Seddah:
Data-Efficient French Language Modeling with CamemBERTa. ACL (Findings) 2023: 5174-5185 - [c131]Matthieu Futeral, Cordelia Schmid, Ivan Laptev, Benoît Sagot, Rachel Bawden:
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation. ACL (1) 2023: 5394-5413 - [c130]Paul-Ambroise Duquenne, Hongyu Gong, Ning Dong, Jingfei Du, Ann Lee, Vedanuj Goswami, Changhan Wang, Juan Pino, Benoît Sagot, Holger Schwenk:
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations. ACL (1) 2023: 16251-16269 - [c129]Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux:
Generative Spoken Language Model based on continuous word-sized audio tokens. EMNLP 2023: 3008-3028 - [c128]Robin Algayres, Pablo Diego-Simon, Benoît Sagot, Emmanuel Dupoux:
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words. EMNLP (Findings) 2023: 12103-12112 - [c127]Valentin Taillandier, Dieuwke Hupkes, Benoît Sagot, Emmanuel Dupoux, Paul Michel:
Neural Agents Struggle to Take Turns in Bidirectional Emergent Communication. ICLR 2023 - [c126]Paul-Ambroise Duquenne, Holger Schwenk, Benoît Sagot:
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer. INTERSPEECH 2023: 32-36 - [c125]Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah:
Towards a Robust Detection of Language Model-Generated Text: Is ChatGPT that easy to detect? CORIA-TALN (1) 2023: 14-27 - [c124]Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden:
Cross-lingual Strategies for Low-resource Language Modeling: A Study on Five Indic Dialects. CORIA-TALN (1) 2023: 28-42 - [c123]You Zuo, Benoît Sagot, Kim Gerdes, Houda Mouzoun, Samir Ghamri-Doudane:
Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons. CORIA-TALN (1) 2023: 349-365 - [c122]Rachel Bawden, Benoît Sagot:
RoCS-MT: Robustness Challenge Set for Machine Translation. WMT 2023: 198-216 - [i41]Niyati Bafna, Cristina España-Bonet, Josef van Genabith, Benoît Sagot, Rachel Bawden:
A Simple Method for Unsupervised Bilingual Lexicon Induction for Data-Imbalanced, Closely Related Language Pairs. CoRR abs/2305.14012 (2023) - [i40]Wissam Antoun, Benoît Sagot, Djamé Seddah:
Data-Efficient French Language Modeling with CamemBERTa. CoRR abs/2306.01497 (2023) - [i39]Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah:
Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect? CoRR abs/2306.05871 (2023) - [i38]Nathan Godey, Éric de la Clergerie, Benoît Sagot:
Is Anisotropy Inherent to Transformers? CoRR abs/2306.07656 (2023) - [i37]Paul-Ambroise Duquenne, Holger Schwenk, Benoît Sagot:
SONAR: Sentence-Level Multimodal and Language-Agnostic Representations. CoRR abs/2308.11466 (2023) - [i36]Nathan Godey, Éric de la Clergerie, Benoît Sagot:
Headless Language Models: Learning without Predicting with Contrastive Weight Tying. CoRR abs/2309.08351 (2023) - [i35]Wissam Antoun, Benoît Sagot, Djamé Seddah:
From Text to Source: Results in Detecting Large Language Model-Generated Content. CoRR abs/2309.13322 (2023) - [i34]Paul-Ambroise Duquenne, Holger Schwenk, Benoît Sagot:
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer. CoRR abs/2310.03724 (2023) - [i33]Robin Algayres, Yossi Adi, Tu Anh Nguyen, Jade Copet, Gabriel Synnaeve, Benoît Sagot, Emmanuel Dupoux:
Generative Spoken Language Model based on continuous word-sized audio tokens. CoRR abs/2310.05224 (2023) - [i32]Robin Algayres, Pablo Diego-Simon, Benoît Sagot, Emmanuel Dupoux:
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words. CoRR abs/2310.05235 (2023) - 2022
- [j15]Tu Anh Nguyen, Benoît Sagot, Emmanuel Dupoux:
Are Discrete Units Necessary for Spoken Language Modeling? IEEE J. Sel. Top. Signal Process. 16(6): 1415-1423 (2022) - [j14]Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Balli, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi:
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. Trans. Assoc. Comput. Linguistics 10: 50-72 (2022) - [j13]Robin Algayres, Tristan Ricoul, Julien Karadayi, Hugo Laurençon, Mohamed Salah Zaïem, Abdelrahman Mohamed, Benoît Sagot, Emmanuel Dupoux:
DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon. Trans. Assoc. Comput. Linguistics 10: 1051-1065 (2022) - [c121]Clémentine Fourrier, Benoît Sagot:
Probing Multilingual Cognate Prediction Models. ACL (Findings) 2022: 3786-3801 - [c120]Nathan Godey, Roman Castagné, Éric de la Clergerie, Benoît Sagot:
MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling. EMNLP (Findings) 2022: 2859-2870 - [c119]Paul-Ambroise Duquenne, Hongyu Gong, Benoît Sagot, Holger Schwenk:
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation. EMNLP 2022: 5794-5806 - [c118]Robin Algayres, Adel Nabli, Benoît Sagot, Emmanuel Dupoux:
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning. INTERSPEECH 2022: 2123-2127 - [c117]Loïc Grobol, Mathilde Regnault, Pedro Javier Ortiz Suárez, Benoît Sagot, Laurent Romary, Benoît Crabbé:
BERTrade: Using Contextual Embeddings to Parse Old French. LREC 2022: 1104-1113 - [c116]Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, Benoît Sagot:
MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases. LREC 2022: 1651-1664 - [c115]Rachel Bawden, Jonathan Poinhos, Eleni Kogkitsidou, Philippe Gambette, Benoît Sagot, Simon Gabay:
Automatic Normalisation of Early Modern French. LREC 2022: 3354-3366 - [c114]Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot:
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French. LREC 2022: 3367-3374 - [c113]Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. LREC 2022: 4344-4355 - [c112]Thibault Charmet, Inès Cherichi, Matthieu Allain, Urszula Czerwinska, Amaury Fouret, Benoît Sagot, Rachel Bawden:
Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France's Court of Cassation Rulings. LREC 2022: 4754-4766 - [c111]Simon Gabay, Pedro Javier Ortiz Suárez, Rachel Bawden, Alexandre Bartz, Philippe Gambette, Benoît Sagot:
Le projet FREEM : ressources, outils et enjeux pour l'étude du français d'Ancien Régime (The FreEM project: Resources, tools and challenges for the study of Ancien Régime French). TALN-RECITAL 2022: 154-165 - [c110]Benjamin Muller, Antonios Anastasopoulos, Benoît Sagot, Djamé Seddah:
Quand être absent de mBERT n'est que le commencement : Gérer de nouvelles langues à l'aide de modèles de langues multilingues (When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models). TALN-RECITAL 2022: 450-451 - [c109]Jesujoba Alabi, Lydia Nishimwe, Benjamin Muller, Camille Rey, Benoît Sagot, Rachel Bawden:
Inria-ALMAnaCH at WMT 2022: Does Transcription Help Cross-Script Machine Translation? WMT 2022: 233-243 - [i31]Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. CoRR abs/2201.06642 (2022) - [i30]Simon Gabay, Pedro Javier Ortiz Suárez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot:
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French. CoRR abs/2202.09452 (2022) - [i29]Tu Anh Nguyen, Benoît Sagot, Emmanuel Dupoux:
Are discrete units necessary for Spoken Language Modeling? CoRR abs/2203.05936 (2022) - [i28]Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, Ali Elkahky, Paden Tomasello, Robin Algayres, Benoît Sagot, Abdelrahman Mohamed, Emmanuel Dupoux:
Generative Spoken Dialogue Language Modeling. CoRR abs/2203.16502 (2022) - [i27]Robin Algayres, Adel Nabli, Benoît Sagot, Emmanuel Dupoux:
Speech Sequence Embeddings using Nearest Neighbors Contrastive Learning. CoRR abs/2204.05148 (2022) - [i26]Paul-Ambroise Duquenne, Hongyu Gong, Benoît Sagot, Holger Schwenk:
T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation. CoRR abs/2205.12216 (2022) - [i25]Yu Lu Liu, Rachel Bawden, Thomas Scialom, Benoît Sagot, Jackie Chi Kit Cheung:
MaskEval: Weighted MLM-Based Evaluation for Text Summarization and Simplification. CoRR abs/2205.12394 (2022) - [i24]Robin Algayres, Tristan Ricoul, Julien Karadayi, Hugo Laurençon, Mohamed Salah Zaïem, Abdelrahman Mohamed, Benoît Sagot, Emmanuel Dupoux:
DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon. CoRR abs/2206.11332 (2022) - [i23]Paul-Ambroise Duquenne, Hongyu Gong, Ning Dong, Jingfei Du, Ann Lee, Vedanuj Goswami, Changhan Wang, Juan Miguel Pino, Benoît Sagot, Holger Schwenk:
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations. CoRR abs/2211.04508 (2022) - [i22]Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilic, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, et al.:
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. CoRR abs/2211.05100 (2022) - [i21]Nathan Godey, Roman Castagné, Éric de la Clergerie, Benoît Sagot:
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling. CoRR abs/2212.07284 (2022) - [i20]Matthieu Futeral, Cordelia Schmid, Ivan Laptev, Benoît Sagot, Rachel Bawden:
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation. CoRR abs/2212.10140 (2022) - 2021
- [j12]Yacine Mohamed Idir, Olivier Orfila, Vincent Judalet, Benoît Sagot, Patrice Chatellier:
Mapping Urban Air Quality from Mobile Sensors Using Spatio-Temporal Geostatistics. Sensors 21(14): 4717 (2021) - [c108]Clémentine Fourrier, Rachel Bawden, Benoît Sagot:
Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? ACL/IJCNLP (Findings) 2021: 847-861 - [c107]Arij Riabi, Benoît Sagot, Djamé Seddah:
Can Character-based Language Models Improve Downstream Task Performances In Low-Resource And Noisy Language Scenarios? W-NUT 2021: 423-436 - [c106]Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah:
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT. EACL 2021: 2214-2231 - [c105]Arij Riabi, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, Jacopo Staiano:
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering. EMNLP (1) 2021: 7016-7030 - [c104]Benjamin Muller, Antonios Anastasopoulos, Benoît Sagot, Djamé Seddah:
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models. NAACL-HLT 2021: 448-462 - [i19]Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah:
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT. CoRR abs/2101.11109 (2021) - [i18]Isaac Caswell, Julia Kreutzer, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Rubungo Andre Niyongabo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Balli, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi:
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. AfricaNLP 2021 - [i17]Thomas Scialom, Louis Martin, Jacopo Staiano, Éric Villemonte de la Clergerie, Benoît Sagot:
Rethinking Automatic Evaluation in Sentence Simplification. CoRR abs/2104.07560 (2021) - [i16]Arij Riabi, Benoît Sagot, Djamé Seddah:
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios? CoRR abs/2110.13658 (2021) - [i15]Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan:
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP. CoRR abs/2112.10508 (2021) - 2020
- [c103]Djamé Seddah, Farah Essaidi, Amal Fethi, Matthieu Futeral, Benjamin Muller, Pedro Javier Ortiz Suárez, Benoît Sagot, Abhishek Srivastava:
Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell. ACL 2020: 1139-1150 - [c102]Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. ACL 2020: 1703-1714 - [c101]Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia:
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations. ACL 2020: 4668-4679 - [c100]Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric de la Clergerie, Djamé Seddah, Benoît Sagot:
CamemBERT: a Tasty French Language Model. ACL 2020: 7203-7219 - [c99]Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux:
Evaluating the Reliability of Acoustic Speech Embeddings. INTERSPEECH 2020: 4621-4625 - [c98]Clémentine Fourrier, Benoît Sagot:
Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0. LREC 2020: 3207-3216 - [c97]Gaël Guibon, Benoît Sagot:
OFrLex: A Computational Morphological and Syntactic Lexicon for Old French. LREC 2020: 3217-3225 - [c96]Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot:
Establishing a New State-of-the-Art for French Named Entity Recognition. LREC 2020: 4631-4638 - [c95]Louis Martin, Éric de la Clergerie, Benoît Sagot, Antoine Bordes:
Controllable Sentence Simplification. LREC 2020: 4689-4698 - [c94]Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah:
Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity). JEP-TALN-RECITAL (2) 2020: 54-65 - [i14]Benjamin Muller, Benoît Sagot, Djamé Seddah:
Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi. CoRR abs/2005.00318 (2020) - [i13]Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, Benoît Sagot:
Multilingual Unsupervised Sentence Simplification. CoRR abs/2005.00352 (2020) - [i12]Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia:
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations. CoRR abs/2005.00481 (2020) - [i11]Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot:
Establishing a New State-of-the-Art for French Named Entity Recognition. CoRR abs/2005.13236 (2020) - [i10]Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. CoRR abs/2006.06202 (2020) - [i9]Robin Algayres, Mohamed Salah Zaïem, Benoît Sagot, Emmanuel Dupoux:
Evaluating the reliability of acoustic speech embeddings. CoRR abs/2007.13542 (2020) - [i8]Arij Riabi, Thomas Scialom, Rachel Keraron, Benoît Sagot, Djamé Seddah, Jacopo Staiano:
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering. CoRR abs/2010.12643 (2020) - [i7]Benjamin Muller, Antonis Anastasopoulos, Benoît Sagot, Djamé Seddah:
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models. CoRR abs/2010.12858 (2020)
2010 – 2019
- 2019
- [c93]Ganesh Jawahar, Benoît Sagot, Djamé Seddah:
What Does BERT Learn about the Structure of Language? ACL (1) 2019: 3651-3657 - [c92]Benjamin Muller, Benoît Sagot, Djamé Seddah:
Enhancing BERT for Lexical Normalization. W-NUT@EMNLP 2019: 297-306 - [c91]Benoît Sagot:
Développement d'un lexique morphologique et syntaxique de l'ancien français (Development of a morphological and syntactic lexicon of Old French). PFIA (Articles courts) 2019: 265-274 - [i6]Louis Martin, Samuel Humeau, Pierre-Emmanuel Mazaré, Antoine Bordes, Éric Villemonte de la Clergerie, Benoît Sagot:
Reference-less Quality Estimation of Text Simplification Systems. CoRR abs/1901.10746 (2019) - [i5]Louis Martin, Benoît Sagot, Éric de la Clergerie, Antoine Bordes:
Controllable Sentence Simplification. CoRR abs/1910.02677 (2019) - [i4]Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot:
CamemBERT: a Tasty French Language Model. CoRR abs/1911.03894 (2019) - [i3]Charlotte Rochereau, Benoît Sagot, Emmanuel Dupoux:
Modeling German Verb Argument Structures: LSTMs vs. Humans. CoRR abs/1912.00239 (2019) - 2018
- [b1]Benoît Sagot:
Informatiser le lexique - Modélisation, développement et exploitation de lexiques morphologiques, syntaxiques et sémantiques. (Computerising the lexicon - Modelling, development and use of morphological, syntactic and semantic lexicons). Sorbonne University, Paris, France, 2018 - [c90]Ganesh Jawahar, Benjamin Muller, Amal Fethi, Louis Martin, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah:
ELMoLex: Connecting ELMo and Lexicon Features for Dependency Parsing. CoNLL Shared Task (2) 2018: 223-237 - [c89]Amir More, Özlem Çetinoglu, Çagri Çöltekin, Nizar Habash, Benoît Sagot, Djamé Seddah, Dima Taji, Reut Tsarfaty:
CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing. LREC 2018 - [c88]Benoît Sagot:
A multilingual collection of CoNLL-U-compatible morphological lexicons. LREC 2018 - [c87]Djamé Seddah, Éric Villemonte de la Clergerie, Benoît Sagot, Héctor Martínez Alonso, Marie Candito:
Cheating a Parser to Death: Data-driven Cross-Treebank Annotation Transfer. LREC 2018 - 2017
- [j11]Sacha Beniamine, Olivier Bonami, Benoît Sagot:
Inferring Inflection Classes with Description Length. J. Lang. Model. 5(3): 465-525 (2017) - [c86]Héctor Martínez Alonso, Amaury Delamaire, Benoît Sagot:
Annotating omission in statement pairs. LAW@ACL 2017: 41-45 - [c85]Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah:
The ParisNLP entry at the ConLL UD Shared Task 2017: A Tale of a #ParsingTragedy. CoNLL Shared Task (2) 2017: 243-252 - [c84]Benoît Sagot, Héctor Martínez Alonso:
Improving neural tagging with lexical information. IWPT 2017: 25-31 - [c83]