
Florian Metze
Person information
- affiliation: Carnegie Mellon University, Pittsburgh, USA
2020 – today
- 2021
- [i55]Abhilasha Ravichander, Siddharth Dalmia, Maria Ryskina, Florian Metze, Eduard H. Hovy, Alan W. Black:
NoiseQA: Challenge Set Evaluation for User-Centric Question Answering. CoRR abs/2102.08345 (2021)
- 2020
- [j15]Shruti Palaskar, Ramon Sanabria, Florian Metze:
Transfer learning for multimodal dialog. Comput. Speech Lang. 64: 101093 (2020) - [j14]Lucia Specia, Loïc Barrault, Ozan Caglayan, Amanda Cardoso Duarte, Desmond Elliott, Spandana Gella, Nils Holzenberger, Chiraag Lala, Sun Jae Lee, Jindrich Libovický, Pranava Madhyastha, Florian Metze, Karl Mulligan, Alissa Ostapenko, Shruti Palaskar, Ramon Sanabria, Josiah Wang, Raman Arora:
Grounded Sequence to Sequence Transduction. IEEE J. Sel. Top. Signal Process. 14(3): 577-591 (2020) - [j13]Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller:
Speech Technology for Unwritten Languages. IEEE ACM Trans. Audio Speech Lang. Process. 28: 964-975 (2020) - [j12]Fengquan Dong, Kun Qian, Zhao Ren, Alice Baird, Xinjian Li, Zhenyu Dai, Bo Dong, Florian Metze, Yoshiharu Yamamoto, Björn W. Schuller:
Machine Listening for Heart Status Monitoring: Introducing and Benchmarking HSS - The Heart Sounds Shenzhen Corpus. IEEE J. Biomed. Health Informatics 24(7): 2082-2092 (2020) - [c181]Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze:
Towards Zero-Shot Learning for Automatic Phonemic Transcription. AAAI 2020: 8261-8268 - [c180]Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott:
Fine-Grained Grounding for Multimodal Speech Recognition. EMNLP (Findings) 2020: 2667-2677 - [c179]Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze:
On Long-Tailed Phenomena in Neural Machine Translation. EMNLP (Findings) 2020: 3088-3095 - [c178]Tejas Srinivasan, Ramon Sanabria, Florian Metze:
Looking Enhances Listening: Recovering Missing Speech Using Images. ICASSP 2020: 6304-6308 - [c177]Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze:
ASR Error Correction and Domain Adaptation Using Machine Translation. ICASSP 2020: 6344-6348 - [c176]Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze:
Universal Phone Recognition with a Multilingual Allophone System. ICASSP 2020: 8249-8253 - [c175]Mahaveer Jain, Gil Keren, Jay Mahadeokar, Geoffrey Zweig, Florian Metze, Yatharth Saraf:
Contextual RNN-T for Open Domain ASR. INTERSPEECH 2020: 11-15 - [c174]Zimeng Qiu, Yiyuan Li, Xinjian Li, Florian Metze, William M. Campbell:
Towards Context-Aware End-to-End Code-Switching Speech Recognition. INTERSPEECH 2020: 4776-4780 - [c173]David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig:
AlloVera: A Multilingual Allophone Database. LREC 2020: 5329-5336 - [c172]Vikas Raunak, Vaibhav Kumar, Vivek Gupta, Florian Metze:
On Dimensional Linguistic Properties of the Word Embedding Space. RepL4NLP@ACL 2020: 156-165 - [i54]Zhong Zhou, Isak Czeresnia Etinger, Florian Metze, Alexander G. Hauptmann, Alexander H. Waibel:
Gun Source and Muzzle Head Detection. CoRR abs/2001.11120 (2020) - [i53]Tejas Srinivasan, Ramon Sanabria, Florian Metze:
Looking Enhances Listening: Recovering Missing Speech Using Images. CoRR abs/2002.05639 (2020) - [i52]Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze:
Towards Zero-shot Learning for Automatic Phonemic Transcription. CoRR abs/2002.11781 (2020) - [i51]Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze:
Universal Phone Recognition with a Multilingual Allophone System. CoRR abs/2002.11800 (2020) - [i50]Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze:
ASR Error Correction and Domain Adaptation Using Machine Translation. CoRR abs/2003.07692 (2020) - [i49]David R. Mortensen, Xinjian Li, Patrick Littell, Alexis Michaud, Shruti Rijhwani, Antonios Anastasopoulos, Alan W. Black, Florian Metze, Graham Neubig:
AlloVera: A Multilingual Allophone Database. CoRR abs/2004.08031 (2020) - [i48]Amanda Cardoso Duarte, Shruti Palaskar, Deepti Ghadiyaram, Kenneth DeHaan, Florian Metze, Jordi Torres, Xavier Giró-i-Nieto:
How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language. CoRR abs/2008.08143 (2020) - [i47]Ze Cheng, Juncheng Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze:
Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations. CoRR abs/2009.05739 (2020) - [i46]Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott:
Fine-Grained Grounding for Multimodal Speech Recognition. CoRR abs/2010.02384 (2020) - [i45]Mandela Patrick, Po-Yao Huang, Yuki Markus Asano, Florian Metze, Alexander G. Hauptmann, João F. Henriques, Andrea Vedaldi:
Support-set bottlenecks for video-text representation learning. CoRR abs/2010.02824 (2020) - [i44]Vikas Raunak, Siddharth Dalmia, Vivek Gupta, Florian Metze:
On Long-Tailed Phenomena in Neural Machine Translation. CoRR abs/2010.04924 (2020) - [i43]Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott:
Multimodal Speech Recognition with Unstructured Audio Masking. CoRR abs/2010.08642 (2020) - [i42]Juncheng B. Li, Kaixin Ma, Shuhui Qu, Po-Yao Huang, Florian Metze:
Audio-Visual Event Recognition through the lens of Adversary. CoRR abs/2011.07430 (2020)
2010 – 2019
- 2019
- [j11]Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury:
Joint embeddings with multimodal cues for video-text retrieval. Int. J. Multim. Inf. Retr. 8(1): 3-18 (2019) - [j10]Okko Räsänen, Shreyas Seshadri, Julien Karadayi, Eric Riebling, John Bunce, Alejandrina Cristià, Florian Metze, Marisa Casillas, Celia Rosemberg, Elika Bergelson, Melanie Soderstrom:
Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech. Speech Commun. 113: 63-80 (2019) - [c171]Suyoun Kim, Siddharth Dalmia, Florian Metze:
Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion. ACL (1) 2019: 1131-1141 - [c170]Shruti Palaskar, Jindrich Libovický, Spandana Gella, Florian Metze:
Multimodal Abstractive Summarization for How2 Videos. ACL (1) 2019: 6587-6596 - [c169]Yun Wang, Juncheng Li, Florian Metze:
A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling. ICASSP 2019: 31-35 - [c168]Yun Wang, Florian Metze:
Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling. ICASSP 2019: 745-749 - [c167]Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze:
Phoneme Level Language Models for Sequence Based Low Resource ASR. ICASSP 2019: 6091-6095 - [c166]Shruti Palaskar, Vikas Raunak, Florian Metze:
Learned in Speech Recognition: Contextual Acoustic Word Embeddings. ICASSP 2019: 6530-6534 - [c165]Nils Holzenberger, Shruti Palaskar, Pranava Madhyastha, Florian Metze, Raman Arora:
Learning from Multiview Correlations in Open-domain Videos. ICASSP 2019: 8628-8632 - [c164]Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze:
Multimodal Grounding for Sequence-to-sequence Speech Recognition. ICASSP 2019: 8648-8652 - [c163]Vikas Raunak, Sang Keun Choe, Quanyang Lu, Yi Xu, Florian Metze:
On Leveraging the Visual Modality for Neural Machine Translation. INLG 2019: 147-151 - [c162]Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze:
Multilingual Speech Recognition with Corpus Relatedness Sampling. INTERSPEECH 2019: 2120-2124 - [c161]Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze:
SANTLR: Speech Annotation Toolkit for Low Resource Languages. INTERSPEECH 2019: 3681-3682 - [c160]Suyoun Kim, Siddharth Dalmia, Florian Metze:
Cross-Attention End-to-End ASR for Two-Party Conversations. INTERSPEECH 2019: 4380-4384 - [c159]Florian Metze:
Survey Talk: Multimodal Processing of Speech and Language. INTERSPEECH 2019 - [c158]Yasufumi Moriya, Ramon Sanabria, Florian Metze, Gareth J. F. Jones:
MediaEval 2019: Eyes and Ears Together. MediaEval 2019 - [c157]Suyoun Kim, Florian Metze:
Acoustic-to-Word Models with Conversational Context Information. NAACL-HLT (1) 2019: 2766-2771 - [c156]Juncheng Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze:
Adversarial Music: Real world Audio Adversary against Wake-word Detection System. NeurIPS 2019: 11908-11918 - [c155]Vikas Raunak, Vivek Gupta, Florian Metze:
Effective Dimensionality Reduction for Word Embeddings. RepL4NLP@ACL 2019: 235-243 - [i41]Shruti Palaskar, Vikas Raunak, Florian Metze:
Learned In Speech Recognition: Contextual Acoustic Word Embeddings. CoRR abs/1902.06833 (2019) - [i40]Siddharth Dalmia, Xinjian Li, Alan W. Black, Florian Metze:
Phoneme Level Language Models for Sequence Based Low Resource ASR. CoRR abs/1902.07613 (2019) - [i39]Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori S. Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard H. Hovy, Alan W. Black, Jaime G. Carbonell, Graham Horwood, Shabnam Tafreshi, Mona T. Diab, Efsun Sarioglu, Noura Farra, Kathleen R. McKeown:
The ARIEL-CMU Systems for LoReHLT18. CoRR abs/1902.08899 (2019) - [i38]Suyoun Kim, Florian Metze:
Acoustic-to-Word Models with Conversational Context Information. CoRR abs/1905.08796 (2019) - [i37]Yasufumi Moriya, Ramon Sanabria, Florian Metze, Gareth J. F. Jones:
Grounding Object Detections With Transcriptions. CoRR abs/1906.06147 (2019) - [i36]Shruti Palaskar, Jindrich Libovický, Spandana Gella, Florian Metze:
Multimodal Abstractive Summarization for How2 Videos. CoRR abs/1906.07901 (2019) - [i35]Suyoun Kim, Siddharth Dalmia, Florian Metze:
Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion. CoRR abs/1906.11604 (2019) - [i34]Tejas Srinivasan, Ramon Sanabria, Florian Metze:
Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions. CoRR abs/1907.00477 (2019) - [i33]Suyoun Kim, Siddharth Dalmia, Florian Metze:
Cross-Attention End-to-End ASR for Two-Party Conversations. CoRR abs/1907.10726 (2019) - [i32]Xinjian Li, Siddharth Dalmia, Alan W. Black, Florian Metze:
Multilingual Speech Recognition with Corpus Relatedness Sampling. CoRR abs/1908.01060 (2019) - [i31]Xinjian Li, Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze:
SANTLR: Speech Annotation Toolkit for Low Resource Languages. CoRR abs/1908.01067 (2019) - [i30]Vikas Raunak, Vaibhav Kumar, Vivek Gupta, Florian Metze:
On Dimensional Linguistic Properties of the Word Embedding Space. CoRR abs/1910.02211 (2019) - [i29]Vikas Raunak, Sang Keun Choe, Quanyang Lu, Yi Xu, Florian Metze:
On Leveraging the Visual Modality for Neural Machine Translation. CoRR abs/1910.02754 (2019) - [i28]Tejas Srinivasan, Ramon Sanabria, Florian Metze:
Multitask Learning For Different Subword Segmentations In Neural Machine Translation. CoRR abs/1910.12368 (2019) - [i27]Juncheng B. Li, Shuhui Qu, Xinjian Li, J. Zico Kolter, Florian Metze:
Adversarial Music: Real World Audio Adversary Against Wake-word Detection System. CoRR abs/1911.00126 (2019) - [i26]Vikas Raunak, Vaibhav Kumar, Florian Metze:
On Compositionality in Neural Machine Translation. CoRR abs/1911.01497 (2019) - [i25]Siddharth Dalmia, Abdelrahman Mohamed, Mike Lewis, Florian Metze, Luke Zettlemoyer:
Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models. CoRR abs/1911.03782 (2019)
- 2018
- [c154]Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black:
Sequence-Based Multi-Lingual Low Resource Speech Recognition. ICASSP 2018: 4909-4913 - [c153]Odette Scharenborg, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux:
Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop. ICASSP 2018: 4979-4983 - [c152]Neville Ryant, Elika Bergelson, Kenneth Church, Alejandrina Cristià, Jun Du, Sriram Ganapathy, Sanjeev Khudanpur, Diana Kowalski, Mahesh Krishnamoorthy, Rajat Kulshreshta, Mark Liberman, Yu-Ding Lu, Matthew Maciejewski, Florian Metze, Jan Profant, Lei Sun, Yu Tsao, Zhou Yu:
Enhancement and Analysis of Conversational Speech: JSALT 2017. ICASSP 2018: 5154-5158 - [c151]Shruti Palaskar, Ramon Sanabria, Florian Metze:
End-to-end Multimodal Speech Recognition. ICASSP 2018: 5774-5778 - [c150]Juncheng Li, Yun Wang, Joseph Szurley, Florian Metze, Samarjit Das:
A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging. ICASSP 2018: 6832-6836 - [c149]Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel:
Subword and Crossword Units for CTC Acoustic Models. INTERSPEECH 2018: 396-400 - [c148]Yun Wang, Juncheng Li, Florian Metze:
Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks. INTERSPEECH 2018: 1339-1343 - [c147]Adrien Le Franc, Eric Riebling, Julien Karadayi, Yun Wang, Camila Scaff, Florian Metze, Alejandrina Cristià:
The ACLEW DiViMe: An Easy-to-use Diarization Tool. INTERSPEECH 2018: 1383-1387 - [c146]Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley, Samarjit Das:
Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection. INTERSPEECH 2018: 3279-3283 - [c145]Boyang Li, Beth Cardier, Tong Wang, Florian Metze:
Annotating High-Level Structures of Short Stories and Personal Anecdotes. LREC 2018 - [c144]Yasufumi Moriya, Ramon Sanabria, Florian Metze, Gareth J. F. Jones:
Eyes and Ears Together: New Task for Multimodal Spoken Content Analysis. MediaEval 2018 - [c143]Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury:
Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval. ICMR 2018: 19-27 - [c142]Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black:
Domain Robust Feature Extraction for Rapid Low Resource ASR Development. SLT 2018: 258-265 - [c141]Shruti Palaskar, Florian Metze:
Acoustic-to-Word Recognition with Sequence-to-Sequence Models. SLT 2018: 397-404 - [c140]Suyoun Kim, Florian Metze:
Dialog-Context Aware end-to-end Speech Recognition. SLT 2018: 434-440 - [c139]Ramon Sanabria, Florian Metze:
Hierarchical Multitask Learning With CTC. SLT 2018: 485-490 - [i24]Eduard H. Hovy, Taylor Berg-Kirkpatrick, Jaime G. Carbonell, Hans Chalupsky, Anatole Gershman, Alexander G. Hauptmann, Florian Metze, Teruko Mitamura, Aditi Chaudhary, Xianyang Chen, Bernie Po-Yao Huang, Hector Zhengzhong Liu, Xuezhe Ma, Shruti Palaskar, Dheeraj Rajagopal, Maria Ryskina, Ramon Sanabria:
OPERA: Operations-oriented Probabilistic Extraction, Reasoning, and Analysis. TAC 2018 - [i23]Odette Scharenborg, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux:
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop. CoRR abs/1802.05092 (2018) - [i22]Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W. Black:
Sequence-based Multi-lingual Low Resource Speech Recognition. CoRR abs/1802.07420 (2018) - [i21]Yun Wang, Juncheng Li, Florian Metze:
Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks. CoRR abs/1804.01146 (2018) - [i20]Shruti Palaskar, Ramon Sanabria, Florian Metze:
End-to-End Multimodal Speech Recognition. CoRR abs/1804.09713 (2018) - [i19]Ramon Sanabria, Florian Metze:
Hierarchical Multi Task Learning With CTC. CoRR abs/1807.07104 (2018) - [i18]Shruti Palaskar, Florian Metze:
Acoustic-to-Word Recognition with Sequence-to-Sequence Models. CoRR abs/1807.09597 (2018) - [i17]Siddharth Dalmia, Xinjian Li, Florian Metze, Alan W. Black:
Domain Robust Feature Extraction for Rapid Low Resource ASR Development. CoRR abs/1807.10984 (2018) - [i16]Suyoun Kim, Florian Metze:
Dialog-context aware end-to-end speech recognition. CoRR abs/1808.02171 (2018) - [i15]Ankit Shah, Harini Kesavamoorthy, Poorva Rane, Pramati Kalwad, Alexander G. Hauptmann, Florian Metze:
Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset. CoRR abs/1809.00241 (2018) - [i14]Yun Wang, Juncheng Li, Florian Metze:
A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling. CoRR abs/1810.09050 (2018) - [i13]Yun Wang, Florian Metze:
Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling. CoRR abs/1810.09052 (2018) - [i12]Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze:
How2: A Large-scale Dataset for Multimodal Language Understanding. CoRR abs/1811.00347 (2018) - [i11]Ozan Caglayan, Ramon Sanabria, Shruti Palaskar, Loïc Barrault, Florian Metze:
Multimodal Grounding for Sequence-to-Sequence Speech Recognition. CoRR abs/1811.03865 (2018) - [i10]Nils Holzenberger, Shruti Palaskar, Pranava Madhyastha, Florian Metze, Raman Arora:
Learning from Multiview Correlations in Open-Domain Videos. CoRR abs/1811.08890 (2018)
- 2017
- [c138]Juncheng Li, Wei Dai, Florian Metze, Shuhui Qu, Samarjit Das:
A comparison of Deep Learning methods for environmental sound detection. ICASSP 2017: 126-130 - [c137]Yun Wang, Florian Metze:
A first attempt at polyphonic sound event detection using connectionist temporal classification. ICASSP 2017: 2986-2990 - [c136]Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze:
Visual features for context-aware speech recognition. ICASSP 2017: 5020-5024 - [c135]Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel:
Comparison of Decoding Strategies for CTC Acoustic Models. INTERSPEECH 2017: 513-517 - [c134]Yun Wang, Florian Metze:
A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification. INTERSPEECH 2017: 3097-3101 - [c133]Niluthpol Chowdhury Mithun, Juncheng B. Li, Florian Metze, Amit K. Roy-Chowdhury, Samarjit Das:
CMU-UCR-BOSCH @ TRECVID 2017: VIDEO TO TEXT RETRIEVAL. TRECVID 2017 - [p4]Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey:
Preliminaries. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 3-17 - [p3]Yajie Miao, Florian Metze:
End-to-End Architectures for Speech Recognition. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 299-323 - [p2]Shinji Watanabe, Takaaki Hori, Yajie Miao, Marc Delcroix, Florian Metze, John R. Hershey:
Toolkits for Robust Speech Processing. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 369-382 - [e3]Shinji Watanabe, Marc Delcroix, Florian Metze, John R. Hershey:
New Era for Robust Speech Recognition, Exploiting Deep Learning. Springer 2017, ISBN 978-3-319-64679-4 [contents] - [i9]Juncheng Li, Wei Dai, Florian Metze, Shuhui Qu, Samarjit Das:
A Comparison of deep learning methods for environmental sound. CoRR abs/1703.06902 (2017) - [i8]Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel:
Comparison of Decoding Strategies for CTC Acoustic Models. CoRR abs/1708.04469 (2017) - [i7]Boyang Li, Beth Cardier, Tong Wang, Florian Metze:
Annotating High-Level Structures of Short Stories and Personal Anecdotes. CoRR abs/1710.06917 (2017) - [i6]Abhinav Gupta, Yajie Miao, Leonardo Neves, Florian Metze:
Visual Features for Context-Aware Speech Recognition. CoRR abs/1712.00489 (2017) - [i5]Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel:
Subword and Crossword Units for CTC Acoustic Models. CoRR abs/1712.06855 (2017) - [i4]Shao-Yen Tseng, Juncheng Li, Yun Wang, Joseph Szurley, Florian Metze, Samarjit Das:
Multiple Instance Deep Learning for Weakly Supervised Audio Event Detection. CoRR abs/1712.09673 (2017) - [i3]Juncheng Li, Yun Wang, Joseph Szurley, Florian Metze, Samarjit Das:
A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging. CoRR abs/1712.09680 (2017)
- 2016
- [c132]Marvin Ritter, Markus Müller, Sebastian Stüker, Florian Metze, Alex Waibel:
Training Deep Neural Networks for Reverberation Robust Speech Recognition. ITG Symposium on Speech Communication 2016: 1-5 - [c131]Yajie Miao, Mohammad Gowayyed, Xingyu Na, Tom Ko, Florian Metze, Alexander H. Waibel:
An empirical exploration of CTC acoustic models. ICASSP 2016: 2623-2627 - [c130]Yun Wang, Leonardo Neves, Florian Metze:
Audio-based multimedia event detection using deep recurrent neural networks. ICASSP 2016: 2742-2746 - [c129]Florian Metze, Eric Riebling, Anne S. Warlaumont, Elika Bergelson:
Virtual Machines and Containers as a Platform for Experimentation. INTERSPEECH 2016: 1603-1607 - [c128]Rebecca Bates, Eric Fosler-Lussier, Florian Metze, Martha A. Larson, Gina-Anne Levow, Emily Mower Provost:
Experiences with Shared Resources for Research and Education in Speech and Language Processing. INTERSPEECH 2016: 1627-1631 - [c127]Yashesh Gaur, Florian Metze, Jeffrey P. Bigham:
Manipulating Word Lattices to Incorporate Human Corrections. INTERSPEECH 2016: 3062-3065 - [c126]Yajie Miao, Florian Metze:
Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach. INTERSPEECH 2016: 3414-3418 - [c125]Yun Wang, Florian Metze:
Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection. ICMR 2016: 265-269 - [c124]Yashesh Gaur, Walter S. Lasecki, Florian Metze, Jeffrey P. Bigham:
The effects of automatic speech recognition quality on human transcription latency. W4A 2016: 23:1-23:8 - [i2]Ramon Sanabria, Florian Metze, Fernando De la Torre:
Robust end-to-end deep audiovisual speech recognition. CoRR abs/1611.06986 (2016)
- 2015
- [j9]Yajie Miao, Hao Zhang, Florian Metze:
Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors. IEEE ACM Trans. Audio Speech Lang. Process. 23(11): 1938-1949 (2015) - [c123]Yajie Miao, Mohammad Gowayyed, Florian Metze:
EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. ASRU 2015: 167-174 - [c122]Florian Metze, Ankur Gandhe, Yajie Miao, Zaid A. W. Sheikh, Yun Wang, Di Xu, Hao Zhang, Jungsuk Kim, Ian R. Lane, Wonkyum Lee, Sebastian Stüker, Markus Müller:
Semi-supervised training in low-resource ASR and KWS. ICASSP 2015: 4699-4703 - [c121]Hao Zhang, Yajie Miao, Florian Metze:
Regularizing DNN acoustic models with Gaussian stochastic neurons. ICASSP 2015: 4964-4968 - [c120]Xavier Anguera, Luis Javier Rodríguez-Fuentes, Andi Buzo, Florian Metze, Igor Szöke, Mikel Peñagarikano:
QUESST2014: Evaluating Query-by-Example Speech Search in a zero-resource setting with real-life queries. ICASSP 2015: 5833-5837 - [c119]Yajie Miao, Florian Metze:
Distance-aware DNNs for robust speech recognition. INTERSPEECH 2015: 761-765 - [c118]