I am currently a Postdoctoral Researcher at the Research Center for Information Technology Innovation, Academia Sinica, in Taipei, Taiwan. I was previously a Research Assistant at the same institute and also worked as an Applied Scientist II Intern at Amazon in California, USA.

I received my Ph.D. in Computer Science and Information Engineering from National Taiwan University, Taipei, Taiwan.

I received the Gold Prize for the best non-intrusive systems and first place in the Hearing Industry Research Consortium student prizes at the Clarity Prediction Challenge, held at the 2nd Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2022). I also received the Best Reviewer Award at IEEE ASRU 2023.

I also serve as a reviewer for leading journals and conferences, including IEEE/ACM TASLP, IEEE SPL, IEEE ICASSP, Interspeech, IEEE ASRU, IEEE ICME, and Expert Systems with Applications.

My research interests include deep learning, speech processing, and deep learning-based non-intrusive speech assessment models. Please see the following link for an up-to-date list of publications.

Selected Publications

R. E. Zezario, B.-R. B. Bai, C.-S. Fuh, H.-M. Wang and Y. Tsao, “Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model,” to appear in IEEE ICASSP 2024. [pdf]

R. E. Zezario, Y.-W. Chen, S.-W. Fu, Y. Tsao, H.-M. Wang and C.-S. Fuh, “A Study on Incorporating Whisper for Robust Speech Assessment,” to appear in IEEE ICME 2024. (Top performance on Track 3 of the VoiceMOS Challenge 2023) [pdf] [dataset] [github]

R. E. Zezario, S.-W. Fu, F. Chen, C.-S. Fuh, H.-M. Wang and Y. Tsao, “Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 54-70, 2023. [pdf] [code]

R. E. Zezario, S.-W. Fu, F. Chen, C.-S. Fuh, H.-M. Wang and Y. Tsao, “MTI-Net: A Multi-Target Speech Intelligibility Prediction Model,” INTERSPEECH 2022, pp. 5463-5467, 2022. [pdf] [code]

R. E. Zezario, F. Chen, C.-S. Fuh, H.-M. Wang and Y. Tsao, “MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids,” INTERSPEECH 2022, pp. 3944-3948, 2022. (Gold Prize for the best non-intrusive systems at Clarity Prediction Challenge 2022) [pdf] [code]

R. E. Zezario, C.-S. Fuh, H.-M. Wang and Y. Tsao, “Speech Enhancement with Zero-Shot Model Selection,” 2021 29th European Signal Processing Conference (EUSIPCO), pp. 491-495, 2021. [pdf] [code]

R. E. Zezario, S.-W. Fu, C.-S. Fuh, Y. Tsao and H.-M. Wang, “STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model,” 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 482-486, 2020. [pdf] [code]

C. Yu*, R. E. Zezario*, S.-S. Wang, J. Sherman, Y.-Y. Hsieh, X. Lu, H.-M. Wang and Y. Tsao, “Speech Enhancement Based on Denoising Autoencoder With Multi-Branched Encoders,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2756-2769, 2020. (* equal contribution) [pdf]

R. E. Zezario, T. Hussain, X. Lu, H.-M. Wang and Y. Tsao, “Self-Supervised Denoising Autoencoder with Linear Regression Decoder for Speech Enhancement,” ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6669-6673, 2020. [pdf]

R. E. Zezario, J. W. C. Sigalingging, T. Hussain, J.-C. Wang and Y. Tsao, “Comparative Study of Masking and Mapping Based on Hierarchical Extreme Learning Machine for Speech Enhancement,” 2019 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), pp. 1-2, 2019. [pdf]

R. E. Zezario, S.-W. Fu, X. Lu, H.-M. Wang and Y. Tsao, “Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric,” INTERSPEECH 2019, pp. 3168-3172, 2019. [pdf] [code]

R. E. Zezario, J. Huang, X. Lu, Y. Tsao, H. Hwang and H. Wang, “Deep Denoising Autoencoder Based Post Filtering for Speech Enhancement,” 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 373-377, 2018. [pdf]

C.-Y. Hsu, R. E. Zezario, J.-C. Wang, C.-W. Ho, X. Lu and Y. Tsao, “Incorporating local environment information with ensemble neural networks to robust automatic speech recognition,” 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP), pp. 1-5, 2016. [pdf]