I am currently a Postdoctoral Researcher at the Research Center for Information Technology Innovation, Academia Sinica, in Taipei, Taiwan. I received a Ph.D. degree in Computer Science and Information Engineering from National Taiwan University, Taipei, Taiwan.
I serve as a reviewer for leading journals and conferences, including IEEE/ACM TASLP, IEEE SPL, IEEE J-STSP, IEEE ICASSP, Interspeech, IEEE ASRU, IEEE SLT, IEEE ICME, and Speech Communication. I am also a co-organizer of the VoiceMOS Challenge 2024.
My research interests include deep learning, speech processing, speech recognition, and non-intrusive speech assessment. For a more up-to-date list of publications, please see the following link.
Selected Publications
R. E. Zezario, S. M. Siniscalchi, H.-M. Wang, and Y. Tsao, “A Study on Zero-shot Non-intrusive Speech Assessment using Large Language Models,” to appear in ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). [pdf]
D. A. M. G. Wisnu, S. Rini, R. E. Zezario, H.-M. Wang, and Y. Tsao, “HAAQI-Net: A Non-intrusive Neural Music Audio Quality Assessment Model for Hearing Aids,” to appear in IEEE/ACM Transactions on Audio, Speech, and Language Processing. [pdf]
W.-C. Huang, S.-W. Fu, E. Cooper, R. E. Zezario, T. Toda, H.-M. Wang, J. Yamagishi, and Y. Tsao, “The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction,” to appear in IEEE Workshop on Spoken Language Technology (SLT 2024). [pdf]
R. E. Zezario, F. Chen, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata,” INTERSPEECH 2024, pp. 3844-3848, 2024. [pdf]
R. E. Zezario, Y.-W. Chen, S.-W. Fu, Y. Tsao, H.-M. Wang, and C.-S. Fuh, “A Study on Incorporating Whisper for Robust Speech Assessment,” IEEE ICME 2024. (Top performance on Track 3 of the VoiceMOS Challenge 2023) [pdf] [dataset] [github]
R. E. Zezario, B.-R. B. Bai, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model,” ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 831-835, 2024. [pdf]
R. E. Zezario, S.-W. Fu, F. Chen, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 54-70, 2023. (One of the IEEE Signal Processing Society's top 25 most downloaded articles, Sep. 2022 to Sep. 2023) [pdf] [code]
R. E. Zezario, S.-W. Fu, F. Chen, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “MTI-Net: A Multi-Target Speech Intelligibility Prediction Model,” INTERSPEECH 2022, pp. 5463-5467, 2022. [pdf] [code]
R. E. Zezario, F. Chen, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids,” INTERSPEECH 2022, pp. 3944-3948, 2022. (Gold Prize for the best non-intrusive systems at the Clarity Prediction Challenge 2022) [pdf] [code]
R. E. Zezario, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “Speech Enhancement with Zero-Shot Model Selection,” 2021 29th European Signal Processing Conference (EUSIPCO), pp. 491-495, 2021. [pdf] [code]
R. E. Zezario, S.-W. Fu, C.-S. Fuh, Y. Tsao, and H.-M. Wang, “STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model,” 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 482-486, 2020. [pdf] [code]
C. Yu*, R. E. Zezario*, S.-S. Wang, J. Sherman, Y.-Y. Hsieh, X. Lu, H.-M. Wang, and Y. Tsao, “Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2756-2769, 2020. (* equal contribution) [pdf]
R. E. Zezario, T. Hussain, X. Lu, H.-M. Wang, and Y. Tsao, “Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement,” ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6669-6673, 2020. [pdf]
R. E. Zezario, S.-W. Fu, X. Lu, H.-M. Wang, and Y. Tsao, “Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric,” INTERSPEECH 2019, pp. 3168-3172, 2019. [pdf] [code]
R. E. Zezario, J. Huang, X. Lu, Y. Tsao, H. Hwang, and H. Wang, “Deep Denoising Autoencoder Based Post Filtering for Speech Enhancement,” 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 373-377, 2018. [pdf]
C.-Y. Hsu, R. E. Zezario, J.-C. Wang, C.-W. Ho, X. Lu, and Y. Tsao, “Incorporating local environment information with ensemble neural networks to robust automatic speech recognition,” 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5, 2016. [pdf]