


Mark Hasegawa-Johnson (also published as: Mark A. Hasegawa-Johnson)
2020 – today
2024
- [j41] Bashima Islam, Nancy L. McElwain, Jialu Li, Maria I. Davila, Yannan Hu, Kexin Hu, Jordan M. Bodway, Ashutosh Dhekne, Romit Roy Choudhury, Mark Hasegawa-Johnson: Preliminary Technical Validation of LittleBeats™: A Multimodal Sensing Platform to Capture Cardiac Physiology, Motion, and Vocalizations. Sensors 24(3): 901 (2024)
- [c222] Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Wontae Nam, Daejin Jo, Kyoung-Woon On, Mark Hasegawa-Johnson, Sungwoong Kim, Chang Dong Yoo: TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback. ACL (Findings) 2024: 14969-14981
- [c221] Mohammad Nur Hossain Khan, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam: InfantMotion2Vec: Unlabeled Data-Driven Infant Pose Estimation Using a Single Chest IMU. BSN 2024: 1-4
- [c220] Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam: Sound Tagging in Infant-centric Home Soundscapes. CHASE 2024: 142-146
- [c219] Maliha Jahan, Helin Wang, Thomas Thebaud, Yinglun Sun, Giang Ha Le, Zsuzsanna Fagyal, Odette Scharenborg, Mark Hasegawa-Johnson, Laureano Moro-Velázquez, Najim Dehak: Finding Spoken Identifications: Using GPT-4 Annotation for an Efficient and Fast Dataset Creation Pipeline. LREC/COLING 2024: 7296-7306
- [c218] SooHwan Eom, Jay Shim, Gwanhyeong Koo, Haebin Na, Mark Hasegawa-Johnson, Sungwoong Kim, Chang Dong Yoo: Query-based Cross-Modal Projector Bolstering Mamba Multimodal LLM. EMNLP (Findings) 2024: 14158-14167
- [c217] Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Pranaw Kumar, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich: LIMMITS'24: Multi-Speaker, Multi-Lingual Indic TTS with Voice Cloning. ICASSP Workshops 2024: 61-62
- [c216] Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain: Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations. ICASSP Workshops 2024: 550-554
- [c215] Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo: G2PU: Grapheme-To-Phoneme Transducer with Speech Units. ICASSP 2024: 10061-10065
- [c214] Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo: Unsupervised Speech Recognition with N-skipgram and Positional Unigram Matching. ICASSP 2024: 10936-10940
- [c213] SooHwan Eom, Eunseop Yoon, Hee Suk Yoon, Chanwoo Kim, Mark Hasegawa-Johnson, Chang D. Yoo: AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition. ICASSP 2024: 12707-12711
- [c212] Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark A. Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo: C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion. ICLR 2024
- [c211] Heting Gao, Kaizhi Qian, Junrui Ni, Chuang Gan, Mark A. Hasegawa-Johnson, Shiyu Chang, Yang Zhang: Speech Self-Supervised Learning Using Diffusion Model Synthetic Data. ICML 2024
- [i62] Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain: Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations. CoRR abs/2402.06888 (2024)
- [i61] Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Mark Hasegawa-Johnson, Yingzhen Li, Chang D. Yoo: C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion. CoRR abs/2403.14119 (2024)
- [i60] Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo: Towards Unsupervised Speech Recognition Without Pronunciation Models. CoRR abs/2406.08380 (2024)
- [i59] Mohammad Nur Hossain Khan, Jialu Li, Nancy L. McElwain, Mark Hasegawa-Johnson, Bashima Islam: Sound Tagging in Infant-centric Home Soundscapes. CoRR abs/2406.17190 (2024)
- [i58] Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Wontae Nam, Daejin Jo, Kyoung-Woon On, Mark A. Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo: TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback. CoRR abs/2407.16574 (2024)
- [i57] Eunseop Yoon, Hee Suk Yoon, John B. Harvill, Mark Hasegawa-Johnson, Chang D. Yoo: LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition. CoRR abs/2408.05769 (2024)
- [i56] Junkai Wu, Xulin Fan, Bo-Ru Lu, Xilin Jiang, Nima Mesgarani, Mark Hasegawa-Johnson, Mari Ostendorf: Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue. CoRR abs/2409.04927 (2024)
- [i55] Xiuwen Zheng, Bornali Phukon, Mark Hasegawa-Johnson: Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility. CoRR abs/2409.19818 (2024)
- [i54] Sandeep Nagar, Mark Hasegawa-Johnson, David G. Beiser, Narendra Ahuja: R2I-rPPG: A Robust Region of Interest Selection Method for Remote Photoplethysmography to Extract Heart Rate. CoRR abs/2410.15851 (2024)

2023
- [j40] Oshane O. Thomas, Hongyu Shen, Ryan L. Raaum, William E. H. Harcourt-Smith, John D. Polk, Mark Hasegawa-Johnson: Automated morphological phenotyping using learned shape descriptors and functional maps: A novel approach to geometric morphometrics. PLoS Comput. Biol. 19(1) (2023)
- [c210] Liming Wang, Mark Hasegawa-Johnson, Chang Dong Yoo: A Theory of Unsupervised Speech Recognition. ACL (1) 2023: 1192-1215
- [c209] Liming Wang, Junrui Ni, Heting Gao, Jialu Li, Kai Chieh Chang, Xulin Fan, Junkai Wu, Mark Hasegawa-Johnson, Chang Dong Yoo: Listen, Decipher and Sign: Toward Unsupervised Speech-to-Sign Language Recognition. ACL (Findings) 2023: 6785-6800
- [c208] Eunseop Yoon, Hee Suk Yoon, John B. Harvill, Mark Hasegawa-Johnson, Chang Dong Yoo: INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition. ACL (Findings) 2023: 9893-9902
- [c207] Kai Chieh Chang, Mark Hasegawa-Johnson, Nancy L. McElwain, Bashima Islam: Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data. APSIPA ASC 2023: 2370-2377
- [c206] Abhayjeet Singh, Amala Nagireddi, Deekshitha G, Jesuraja Bandekar, Roopa R., Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A. Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich: Lightweight, Multi-Speaker, Multi-Lingual Indic Text-to-Speech. ICASSP 2023: 1-2
- [c205] Zhongweiyang Xu, Xulin Fan, Mark Hasegawa-Johnson: Dual-Path Cross-Modal Attention for Better Audio-Visual Speech Extraction. ICASSP 2023: 1-5
- [c204] Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain: Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio. INTERSPEECH 2023: 1035-1039
- [c203] Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John B. Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo: Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction. INTERSPEECH 2023: 2028-2032
- [c202] Wonjune Kang, Mark Hasegawa-Johnson, Deb Roy: End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions. INTERSPEECH 2023: 2303-2307
- [c201] Wanyue Zhai, Mark Hasegawa-Johnson: Wav2ToBI: a new approach to automatic ToBI transcription. INTERSPEECH 2023: 2748-2752
- [c200] John B. Harvill, Mark Hasegawa-Johnson, Hee Suk Yoon, Chang D. Yoo, Eunseop Yoon: One-Shot Exemplification Modeling via Latent Sense Representations. RepL4NLP@ACL 2023: 303-314
- [i53] Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain: Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio. CoRR abs/2305.12530 (2023)
- [i52] Eunseop Yoon, Hee Suk Yoon, John B. Harvill, Mark Hasegawa-Johnson, Chang D. Yoo: INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition. CoRR abs/2305.16371 (2023)
- [i51] Liming Wang, Mark A. Hasegawa-Johnson, Chang D. Yoo: A Theory of Unsupervised Speech Recognition. CoRR abs/2306.07926 (2023)
- [i50] Kai Chieh Chang, Mark Hasegawa-Johnson, Nancy L. McElwain, Bashima Islam: Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data. CoRR abs/2306.15808 (2023)
- [i49] Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John B. Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo: Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction. CoRR abs/2308.08442 (2023)
- [i48] Jialu Li, Mark Hasegawa-Johnson, Karrie Karahalios: Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features. CoRR abs/2309.07287 (2023)
- [i47] Liming Wang, Mark Hasegawa-Johnson, Chang D. Yoo: Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching. CoRR abs/2310.02382 (2023)
- [i46] Zhonghao Wang, Wei Wei, Yang Zhao, Zhisheng Xiao, Mark Hasegawa-Johnson, Humphrey Shi, Tingbo Hou: HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models. CoRR abs/2312.00079 (2023)

2022
- [j39] Piotr Zelasko, Siyuan Feng, Laureano Moro-Velázquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak: Discovering phonetic inventories with crosslingual automatic speech recognition. Comput. Speech Lang. 74: 101358 (2022)
- [j38] Heting Gao, Junrui Ni, Yang Zhang, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson: Domain Generalization for Language-Independent Automatic Speech Recognition. Frontiers Artif. Intell. 5: 806274 (2022)
- [j37] Heting Gao, Xiaoxuan Wang, Sunghun Kang, Rusty Mina, Dias Issa, John B. Harvill, Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo: Seamless equal accuracy ratio for inclusive CTC speech recognition. Speech Commun. 136: 76-83 (2022)
- [j36] Jialu Li, Mark Hasegawa-Johnson: Autosegmental Neural Nets 2.0: An Extensive Study of Training Synchronous and Asynchronous Phones and Tones for Under-Resourced Tonal Languages. IEEE ACM Trans. Audio Speech Lang. Process. 30: 1918-1926 (2022)
- [c199] Junghyun Lee, Gwangsu Kim, Mahbod Olfat, Mark Hasegawa-Johnson, Chang D. Yoo: Fast and Efficient MMD-Based Fair PCA via Optimization over Stiefel Manifold. AAAI 2022: 7363-7371
- [c198] Liming Wang, Siyuan Feng, Mark Hasegawa-Johnson, Chang Dong Yoo: Self-supervised Semantic-driven Phoneme Discovery for Zero-resource Speech Recognition. ACL (1) 2022: 8027-8047
- [c197] Raymond A. Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, Alexander G. Schwing: Equivariance Discovery by Learned Parameter-Sharing. AISTATS 2022: 1527-1545
- [c196] John B. Harvill, Yash R. Wani, Mustafa Alam, Narendra Ahuja, Mark Hasegawa-Johnson, David Chestek, David G. Beiser: Estimation of Respiratory Rate from Breathing Audio. EMBC 2022: 4599-4603
- [c195] Hee Suk Yoon, Eunseop Yoon, John B. Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang Dong Yoo: SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation. EMNLP (Findings) 2022: 1493-1502
- [c194] John B. Harvill, Yash R. Wani, Moitreya Chatterjee, Mustafa Alam, David G. Beiser, David Chestek, Mark Hasegawa-Johnson, Narendra Ahuja: Detection of Covid-19 from Joint Time and Frequency Analysis of Speech, Breathing and Cough Audio. ICASSP 2022: 3683-3687
- [c193] Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson: SpeechSplit2.0: Unsupervised Speech Disentanglement for Voice Conversion without Tuning Autoencoder Bottlenecks. ICASSP 2022: 6332-6336
- [c192] Haeyong Kang, Rusty John Lloyd Mina, Sultan Rizky Hikmawan Madjid, Jaehong Yoon, Mark Hasegawa-Johnson, Sung Ju Hwang, Chang D. Yoo: Forget-free Continual Learning with Winning Subnetworks. ICML 2022: 10734-10750
- [c191] Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David D. Cox, Mark Hasegawa-Johnson, Shiyu Chang: ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers. ICML 2022: 18003-18017
- [c190] Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson: Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition. INTERSPEECH 2022: 461-465
- [c189] Mahir Morshed, Mark Hasegawa-Johnson: Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks. INTERSPEECH 2022: 2298-2302
- [c188] Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson: WavPrompt: Towards Few-Shot Spoken Language Understanding with Frozen Language Models. INTERSPEECH 2022: 2738-2742
- [c187] John B. Harvill, Mark Hasegawa-Johnson, Chang D. Yoo: Frame-Level Stutter Detection. INTERSPEECH 2022: 2843-2847
- [c186] John B. Harvill, Roxana Girju, Mark Hasegawa-Johnson: Syn2Vec: Synset Colexification Graphs for Lexical Semantic Similarity. NAACL-HLT 2022: 5259-5270
- [i45] Piotr Zelasko, Siyuan Feng, Laureano Moro-Velázquez, Ali Abavisani, Saurabhchand Bhati, Odette Scharenborg, Mark Hasegawa-Johnson, Najim Dehak: Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition. CoRR abs/2201.11207 (2022)
- [i44] Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson: SpeechSplit 2.0: Unsupervised Speech Disentanglement for Voice Conversion Without Tuning Autoencoder Bottlenecks. CoRR abs/2203.14156 (2022)
- [i43] Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain: Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features. CoRR abs/2203.15183 (2022)
- [i42] Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson: Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition. CoRR abs/2203.15796 (2022)
- [i41] Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson: WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models. CoRR abs/2203.15863 (2022)
- [i40] Raymond A. Yeh, Yuan-Ting Hu, Mark Hasegawa-Johnson, Alexander G. Schwing: Equivariance Discovery by Learned Parameter-Sharing. CoRR abs/2204.03640 (2022)
- [i39] Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David D. Cox, Mark Hasegawa-Johnson, Shiyu Chang: Improving Self-Supervised Speech Representations by Disentangling Speakers. CoRR abs/2204.09224 (2022)
- [i38] Zhongweiyang Xu, Xulin Fan, Mark Hasegawa-Johnson: Dual-path Attention is All You Need for Audio-Visual Speech Extraction. CoRR abs/2207.04213 (2022)
- [i37] Hee Suk Yoon, Eunseop Yoon, John B. Harvill, Sunjae Yoon, Mark Hasegawa-Johnson, Chang D. Yoo: SMSMix: Sense-Maintained Sentence Mixup for Word Sense Disambiguation. CoRR abs/2212.07072 (2022)

2021
- [j35] Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain: Analysis of acoustic and voice quality features for the classification of infant and mother vocalizations. Speech Commun. 133: 41-61 (2021)
- [j34] Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas: Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection. IEEE ACM Trans. Audio Speech Lang. Process. 29: 324-333 (2021)
- [j33] Xinsheng Wang, Justin van der Hout, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg: Synthesizing Spoken Descriptions of Images. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3242-3254 (2021)
- [j32] Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo: Counterfactually Fair Automatic Speech Recognition. IEEE ACM Trans. Audio Speech Lang. Process. 29: 3515-3525 (2021)
- [c185] Liming Wang, Mark Hasegawa-Johnson: A Translation Framework for Visually Grounded Spoken Unit Discovery. ACSCC 2021: 1419-1425
- [c184] Junzhe Zhu, Raymond A. Yeh, Mark Hasegawa-Johnson: Multi-Decoder DPRNN: Source Separation for Variable Number of Speakers. ICASSP 2021: 3420-3424
- [c183] Hui Shi, Yang Zhang, Hao Wu, Shiyu Chang, Kaizhi Qian, Mark Hasegawa-Johnson, Jishen Zhao: Continuous CNN for Nonuniform Time Series. ICASSP 2021: 3550-3554
- [c182] Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg: Show and Speak: Directly Synthesize Spoken Description of Images. ICASSP 2021: 4190-4194
- [c181] John B. Harvill, Dias Issa, Mark Hasegawa-Johnson, Chang Dong Yoo: Synthesis of New Words for Improved Dysarthric Speech Recognition on an Expanded Vocabulary. ICASSP 2021: 6428-6432
- [c180] Junzhe Zhu, Mark Hasegawa-Johnson, Nancy L. McElwain: A Comparison Study on Infant-Parent Voice Diarization. ICASSP 2021: 7178-7182
- [c179] Siyuan Feng, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak: How Phonotactics Affect Multilingual and Zero-Shot ASR Performance. ICASSP 2021: 7238-7242
- [c178] Liming Wang, Xinsheng Wang, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak: Align or Attend? Toward More Efficient and Accurate Spoken Word Discovery Using Speech-to-Image Retrieval. ICASSP 2021: 7603-7607
- [c177] Zhonghao Wang, Kai Wang, Mo Yu, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi: Interpretable Visual Reasoning via Induced Symbolic Space. ICCV 2021: 1858-1867
- [c176] Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David D. Cox, Mark Hasegawa-Johnson: Global Prosody Style Transfer Without Text Transcriptions. ICML 2021: 8650-8660
- [c175] John B. Harvill, Yash R. Wani, Mark Hasegawa-Johnson, Narendra Ahuja, David G. Beiser, David Chestek: Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation. Interspeech 2021: 926-930
- [c174] Heting Gao, Junrui Ni, Yang Zhang, Kaizhi Qian, Shiyu Chang, Mark Hasegawa-Johnson: Zero-Shot Cross-Lingual Phonetic Recognition with External Language Embedding. Interspeech 2021: 1304-1308
- [c173] Kiran Ramnath, Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo: Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering. NAACL-HLT 2021: 1908-1919
- [i36] Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David D. Cox, Mark Hasegawa-Johnson: Global Rhythm Style Transfer Without Text Transcriptions. CoRR abs/2106.08519 (2021)
- [i35] Junghyun Lee, Gwangsu Kim, Matt Olfat, Mark Hasegawa-Johnson, Chang D. Yoo: Fast and Efficient MMD-based Fair PCA via Optimization over Stiefel Manifold. CoRR abs/2109.11196 (2021)

2020
- [j31] Odette Scharenborg, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux, Laurent Besacier, Alan W. Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stüker, Pierre Godard, Markus Müller: Speech Technology for Unwritten Languages. IEEE ACM Trans. Audio Speech Lang. Process. 28: 964-975 (2020)
- [j30] Liming Wang, Mark Hasegawa-Johnson: Multimodal Word Discovery and Retrieval With Spoken Descriptions and Visual Concepts. IEEE ACM Trans. Audio Speech Lang. Process. 28: 1560-1573 (2020)
- [c172] Tarek Sakakini, Jong Yoon Lee, Aditya Duri, Renato Ferreira Leitão Azevedo, Victor Sadauskas, Kuangxiao Gu, Suma Bhat, Daniel G. Morrow, James Graumlich, Saqib Walayat, Mark Hasegawa-Johnson, Thomas S. Huang, Ann Willemsen-Dunlap, Donald Halpin: Context-Aware Automatic Text Simplification of Health Materials in Low-Resource Domains. LOUHI@EMNLP 2020: 115-126
- [c171] Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore: F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder. ICASSP 2020: 6284-6288
- [c170] Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson: Training Spoken Language Understanding Systems with Non-Parallel Speech and Text. ICASSP 2020: 8109-8113
- [c169] Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson, David D. Cox: Unsupervised Speech Decomposition via Triple Information Bottleneck. ICML 2020: 7836-7846
- [c168] Jialu Li, Mark Hasegawa-Johnson: Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? INTERSPEECH 2020: 1027-1031
- [c167] Ali Abavisani, Mark Hasegawa-Johnson: Automatic Estimation of Intelligibility Measure for Consonants in Speech. INTERSPEECH 2020: 1161-1165
- [c166] Liming Wang, Mark Hasegawa-Johnson: A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions. INTERSPEECH 2020: 1456-1460
- [c165] Leda Sari, Mark Hasegawa-Johnson: Deep F-Measure Maximization for End-to-End Speech Understanding. INTERSPEECH 2020: 1580-1584
- [c164] Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg: Evaluating Automatically Generated Phoneme Captions for Images. INTERSPEECH 2020: 2317-2321
- [c163] Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari: Identify Speakers in Cocktail Parties with End-to-End Attention. INTERSPEECH 2020: 3092-3096
- [c162] Piotr Zelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak: That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages. INTERSPEECH 2020: 3705-3709
- [c161] Mark Hasegawa-Johnson, Leanne Rolston, Camille Goudeseune, Gina-Anne Levow, Katrin Kirchhoff: Grapheme-to-Phoneme Transduction for Cross-Language ASR. SLSP 2020: 3-19
- [i34] Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore: F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder. CoRR abs/2004.07370 (2020)
- [i33] Kaizhi Qian, Yang Zhang, Shiyu Chang, David D. Cox, Mark Hasegawa-Johnson: Unsupervised Speech Decomposition via Triple Information Bottleneck. CoRR abs/2004.11284 (2020)
- [i32] Ali Abavisani, Mark Hasegawa-Johnson: Automatic Estimation of Intelligibility Measure for Consonants in Speech. CoRR abs/2005.06065 (2020)
- [i31] Piotr Zelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak: That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages. CoRR abs/2005.08118 (2020)
- [i30] Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari: Identify Speakers in Cocktail Parties with End-to-End Attention. CoRR abs/2005.11408 (2020)
- [i29] Jialu Li, Mark Hasegawa-Johnson: Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? CoRR abs/2007.14351 (2020)
- [i28] Justin van der Hout, Zoltán D'Haese, Mark Hasegawa-Johnson, Odette Scharenborg: Evaluating Automatically Generated Phoneme Captions for Images. CoRR abs/2007.15916 (2020)
- [i27] Leda Sari, Mark Hasegawa-Johnson: Deep F-measure Maximization for End-to-End Speech Understanding. CoRR abs/2008.03425 (2020)
- [i26] Wenda Chen, Jonathan Huang, Mark Hasegawa-Johnson: Utterance-level Intent Recognition from Keywords. CoRR abs/2009.08064 (2020)
- [i25] Siyuan Feng, Piotr Zelasko, Laureano Moro-Velázquez, Ali Abavisani, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak: How Phonotactics Affect Multilingual and Zero-shot ASR Performance. CoRR abs/2010.12104 (2020)
- [i24] Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg: Show and Speak: Directly Synthesize Spoken Description of Images. CoRR abs/2010.12267 (2020)
- [i23] Junzhe Zhu, Mark Hasegawa-Johnson, Nancy McElwain: A Comparison Study on Infant-Parent Voice Diarization. CoRR abs/2011.02698 (2020)
- [i22] Zhonghao Wang, Mo Yu, Kai Wang, Jinjun Xiong, Wen-Mei Hwu, Mark Hasegawa-Johnson, Humphrey Shi: Interpretable Visual Reasoning via Induced Symbolic Space. CoRR abs/2011.11603 (2020)
- [i21] Junzhe Zhu, Raymond A. Yeh, Mark Hasegawa-Johnson: Multi-Decoder DPRNN: High Accuracy Source Counting and Separation. CoRR abs/2011.12022 (2020)
- [i20]