I am currently a Postdoctoral Researcher at the Research Center for Information Technology Innovation, Academia Sinica, in Taipei, Taiwan. I received a Ph.D. degree in Computer Science and Information Engineering from National Taiwan University, Taipei, Taiwan.

I serve as a reviewer for leading journals and conferences, including IEEE/ACM TASLP, IEEE SPL, IEEE ICASSP, Interspeech, IEEE ASRU, IEEE SLT, IEEE ICME, and Speech Communication. I am also a co-organizer of the VoiceMOS Challenge 2024.

My research interests include deep learning, speech processing, and deep learning-based non-intrusive speech assessment models. Please see the following link for an up-to-date list of publications.

Selected Publications

R. E. Zezario, F. Chen, C.-S. Fuh, H.-M. Wang, and Y. Tsao, “Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata,” INTERSPEECH 2024, pp. 3844-3848, 2024. [pdf]

W.-C. Huang, S.-W. Fu, E. Cooper, R. E. Zezario, T. Toda, H.-M. Wang, J. Yamagishi, and Y. Tsao, “The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction,” to appear in IEEE Workshop on Spoken Language Technology (SLT 2024). [pdf]

R. E. Zezario, Y.-W. Chen, S.-W. Fu, Y. Tsao, H. -M. Wang and C. -S. Fuh, “A Study on Incorporating Whisper for Robust Speech Assessment,” IEEE ICME 2024. (Top performance in Track 3 of the VoiceMOS Challenge 2023) [pdf] [dataset] [github]

R. E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, and Yu Tsao, “Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model,” ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 831-835, 2024. [pdf]

R. E. Zezario, S.-W. Fu, F. Chen, C. -S. Fuh, H. -M. Wang and Y. Tsao, “Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 54-70, 2023. (One of the IEEE Signal Processing Society's top 25 downloaded articles, Sep. 2022 - Sep. 2023) [pdf] [code]

R. E. Zezario, S.-W. Fu, F. Chen, C. -S. Fuh, H. -M. Wang and Y. Tsao, “MTI-Net: A Multi-Target Speech Intelligibility Prediction Model,” INTERSPEECH 2022, pp. 5463-5467, 2022. [pdf] [code]

R. E. Zezario, F. Chen, C. -S. Fuh, H. -M. Wang and Y. Tsao, “MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids,” INTERSPEECH 2022, pp. 3944-3948, 2022. (Gold Prize for the best non-intrusive systems at Clarity Prediction Challenge 2022) [pdf] [code]

R. E. Zezario, C. -S. Fuh, H. -M. Wang and Y. Tsao, “Speech Enhancement with Zero-Shot Model Selection,” 2021 29th European Signal Processing Conference (EUSIPCO), pp. 491-495, 2021. [pdf] [code]

R. E. Zezario, S. -W. Fu, C. -S. Fuh, Y. Tsao and H. -M. Wang, “STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model,” 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 482-486, 2020. [pdf] [code]

C. Yu*, R. E. Zezario*, S.-S. Wang, J. Sherman, Y.-Y. Hsieh, X. Lu, H.-M. Wang, and Y. Tsao, “Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2756-2769, 2020. (* equal contribution) [pdf]

R. E. Zezario, T. Hussain, X. Lu, H. -M. Wang and Y. Tsao, “Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement,” ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6669-6673, 2020. [pdf]

R. E. Zezario, J. W. C. Sigalingging, T. Hussain, J. -C. Wang and Y. Tsao, “Comparative Study of Masking and Mapping Based on Hierarchical Extreme Learning Machine for Speech Enhancement,” 2019 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 1-2, 2019. [pdf]

R. E. Zezario, S.-W. Fu, X. Lu, H.-M. Wang, and Y. Tsao, “Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric,” INTERSPEECH 2019, pp. 3168-3172, 2019. [pdf] [code]

R. E. Zezario, J. Huang, X. Lu, Y. Tsao, H. Hwang and H. Wang, “Deep Denoising Autoencoder Based Post Filtering for Speech Enhancement,” 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 373-377, 2018. [pdf]

C. -Y. Hsu, R. E. Zezario, J. -C. Wang, C. -W. Ho, X. Lu and Y. Tsao, “Incorporating Local Environment Information with Ensemble Neural Networks to Robust Automatic Speech Recognition,” 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5, 2016. [pdf]