Xueyao Zhang

Xueyao Zhang (张雪遥)

Ph.D. student,
School of Data Science,
The Chinese University of Hong Kong, Shenzhen

E-mail: xueyaozhang [AT] link.cuhk.edu.cn

Curriculum Vitae / Google Scholar / DBLP / GitHub / Zhihu

About me

I'm a third-year Ph.D. student at the Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), supervised by Professor Zhizheng Wu. I'm a co-founder of Amphion, an open-source toolkit for Audio, Music, and Speech Generation.

Before CUHK-Shenzhen, I received my master's degree (2019-2022) from the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS), where I researched Fake News Detection and Fact Checking.

My research interests include:

Applications: Speech Generation, AI Music & AI for Music, AI for Social Good
Technologies: Generative Models, Representation Learning

Milestones

2025/04	My first experience in organizing an academic competition, Singing Voice Conversion Challenge 2025.
2025/01	My first paper about speech generation got accepted by ICLR 2025.
2024/08	My first paper about singing voice processing got accepted by IEEE SLT 2024.
2024/05	I performed research at Meta (California, USA) as an intern during summer 2024.
2023/12	My first attempt at leading the development of a large-scale open-source project, Amphion.
2023/04	I was admitted to the Tencent Rhino-Bird Talent Program (腾讯犀牛鸟精英人才计划; Top 50+ of China).
2022/09	I entered the Chinese University of Hong Kong (Shenzhen) as a Ph.D. student.
2022/06	My first paper about music generation got accepted by ACM MM 2022.
2021/11	I received the National Graduate Scholarship 2021 (funded by the Ministry of Education of China).
2021/01	My first paper about fake news detection got accepted by WWW 2021.
2020/01	I won third place at the Campus Singer Competition, University of Chinese Academy of Sciences.

✍ Call for Collaboration

Our team is broadly interested in audio/speech processing and synthesis, DeepFake detection, and AI + Music. We publish our work in top conferences and journals, and turn research output into products by collaborating with industry. Our team has open positions for postdocs, Ph.D. students, research assistants, and visiting research students/researchers (more information can be seen here). If you are interested in joining our team, feel free to contact wuzhizheng [AT] cuhk.edu.cn. You are also always welcome to discuss any ideas with me.

Representative Works

Speech, Music, and Audio Generation

ACL 2025 Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang*, Yuancheng Wang*, Chaoren Wang, Ziniu Li, Zhuo Chen, Zhizheng Wu (*: Equal Contribution)
Proceedings of the Annual Meeting of the Association for Computational Linguistics 2025
Preprint / Demo
TL;DR: We propose an intelligibility preference speech dataset with specially designed DPO extensions, to improve zero-shot TTS intelligibility and overall quality.

ICLR 2025 Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
Xueyao Zhang, Xiaohui Zhang, Kainan Peng, Zhenyu Tang, Vimal Manohar, Yingru Liu, Jeff Hwang, Dangna Li, Yuhao Wang, Julian Chan, Yuan Huang, Zhizheng Wu, Mingbo Ma
Proceedings of the International Conference on Learning Representations 2025
Paper / Poster / Code / HuggingFace / Demo
TL;DR: We propose a versatile zero-shot voice imitation framework, with controllable timbre and style.

SLT 2024 Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang*, Liumeng Xue*, Yicheng Gu*, Yuancheng Wang*, Jiaqi Li*, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Zihao Fang, Haopeng Chen, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu (*: Equal Contribution)
Proceedings of the IEEE Spoken Language Technology Workshop 2024
Preprint / GitHub / HuggingFace / OpenXLab
TL;DR: We develop a unified audio generation open-source toolkit.

SLT 2024 Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion
Xueyao Zhang, Zihao Fang, Yicheng Gu, Haopeng Chen, Lexiao Zou, Liumeng Xue, Zhizheng Wu
Proceedings of the IEEE Spoken Language Technology Workshop 2024
Preprint / Code / Demo / Pretrained Model / HuggingFace Space / OpenXLab App
TL;DR: We propose to utilize multiple content features for singing voice conversion.

MM 2022 Structure-Enhanced Pop Music Generation via Harmony-Aware Learning
Xueyao Zhang, Jinchao Zhang, Yao Qiu, Li Wang, Jie Zhou
Proceedings of the ACM International Conference on Multimedia 2022
PDF / Preprint / Code / Slides / Demo
TL;DR: We propose to learn harmony for generating form- and texture-enhanced pop music.

Fake News Detection

WWW 2021 Mining Dual Emotion for Fake News Detection (> 400 citations)
Xueyao Zhang, Juan Cao, Xirong Li, Qiang Sheng, Lei Zhong, and Kai Shu
Proceedings of the Web Conference 2021
PDF / Code / Slides / Video / Chinese Video
TL;DR: We leverage both publisher emotion and social emotion for fake news detection.

CIKM 2021 Integrating Pattern- and Fact-based Fake News Detection via Model Preference Learning
Qiang Sheng*, Xueyao Zhang*, Juan Cao, and Lei Zhong (*: Equal Contribution)
Proceedings of the ACM International Conference on Information and Knowledge Management 2021
PDF / Poster / Code / Chinese Blog
TL;DR: We propose a graph-based model preference learning framework to separately handle the pattern and fact indicators in fake news detection.

ACL 2022 Zoom Out and Observe: News Environment Perception for Fake News Detection
Qiang Sheng, Juan Cao, Xueyao Zhang, Rundong Li, Danding Wang, and Yongchun Zhu
Proceedings of the Annual Meeting of the Association for Computational Linguistics 2022
PDF / Poster / Code / Chinese Video / Chinese Blog
TL;DR: For the first time, we propose to perceive signals from the news environment for fake news detection.

👉 Full Publications

Presentations and Talks

2025/04	I was honored to be invited to give a talk, Controllable and Unified Speech and Singing Voice Generation [Slides], at Professor Ye Wang's Lab at the National University of Singapore.
2024/12	I was honored to be invited to give a talk, A Comprehensive Guide to Amphion's Singing Voice Conversion [Slides], at the Singing Voice Deepfake Detection (SVDD) Workshop @ IEEE SLT 2024 [Website].
2024/04	I was honored to be invited to give a guest lecture, Introduction of Sound, Speech, and Singing Voice [Slides], in Professor Kanyuan Huang's course at CUHK-Shenzhen (2024/04/24).
2024/02	I was honored to be invited to give a series of talks, A Comprehensive Guide to Amphion's Singing Voice Conversion (Amphion的歌声转换指南) [Slides], at Professor Yong Qin's Lab at Nankai University (2024/01/04), Speech Home 语音之家 (2024/01/12), BAAI Talk 智源社区 (2024/01/16) [Recording], TechBeat 将门创投 (2024/02/07) [Recording], and Professor Li Liu's course at the Hong Kong University of Science and Technology, Guangzhou (2024/02/27).
2023/01	I drafted a singing voice conversion tutorial for CSC3160 (CUHK-Shenzhen). [Blog]
2022/10	I received the Best Presentation Award at the Huawei-CUHKSZ NLP/Speech Workshop for presenting my research work, Structure-Enhanced Pop Music Generation via Harmony-Aware Learning. [Slides] [Award]
2022/06	I gave a talk at two labs of ICT as an outstanding graduate, 如何做有新意的研究——以“做问题”的视角. [Chinese Blog] [Chinese Slides] The talk was also presented at SDS Early Career Colloquium of CUHK-Shenzhen, Towards a novel research in a problem-driven way. [Slides]
2022/05	My master's thesis defense, Research on Fake News Detection Based on Emotion (基于情感的虚假新闻检测方法研究). [Chinese Slides]
2022/01	As a co-producer, I took part in releasing Ru Wei (入微), the theme song of WeChat Open Class 2022, which was composed by AI and sung by humans.
2021/05	I gave a talk about composition at the Pattern Recognition Center of WeChat AI, How to create a pop song? (一首流行歌是如何创作的). [Chinese Slides]

Professional Experiences

Meta Platforms, Inc.	Research Scientist Intern, Generative AI, @Menlo Park, California, USA (2024/05 ~ 2024/09) #Speech Generation#
WeChat, Tencent	Research Intern, Pattern Recognition Center of WeChat AI, @Beijing, China (2021/04 ~ 2022/02, 2023/06 ~ 2024/05) #Music Generation#, #Singing Voice Conversion#

Services

Reviewer	Conferences: ACL Rolling Review; CSCW 2021; EMNLP 2021; ICASSP 2023 (valuable reviewer), 2024 (valuable reviewer), 2025; ICLR 2025; ICMC 2023; MM 2023, 2024; NCMMSC 2023, 2024. Journals: EURASIP Journal on Audio, Speech, and Music Processing; IEEE Signal Processing Letters (SPL); IEEE Transactions on Audio, Speech and Language Processing (TASLP); IEEE Transactions on Computational Social Systems (TCSS); Information Processing and Management (IP&M); Journal of Chinese Information Processing (中文信息学报).
Student Volunteer	IEEE Spoken Language Technology Workshop 2024
Competition Organizer	The Singing Voice Conversion Challenge 2025. Organized by researchers from Nagoya University, CUHK-Shenzhen, Carnegie Mellon University, and National Institute of Informatics.
Teaching Assistant	2017 Fall, Object-Oriented Programming (Java), Wuhan University; 2022 Fall, CSC3100 Data Structures, CUHK-Shenzhen; 2023 Spring, CSC3160/MDS6002 Fundamentals of Speech and Language Processing, CUHK-Shenzhen (I received the Best Teaching Assistant Award); 2023 Fall, CSC4130 Introduction to Human-Computer Interaction, CUHK-Shenzhen; 2024 Spring, CSC4050 Computing Capstone, CUHK-Shenzhen; 2024 Fall, CSC1001 Introduction to Computer Science: Programming Methodology, CUHK-Shenzhen.

Education

2022-	Ph.D. student in Data Science, supervised by Professor Zhizheng Wu, School of Data Science, The Chinese University of Hong Kong, Shenzhen
2019-2022	Master in Computer Application Technology (Research-based), supervised by Professor Juan Cao, working closely with Professor Qiang Sheng, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences
2015-2019	B.Eng. in Software Engineering, School of Computer Science, Wuhan University
2012-2015	Jiyuan No.1 Middle School of Henan

Honors and Awards

2023	Admitted to the Tencent Rhino-Bird Talent Program (腾讯犀牛鸟精英人才计划; Top 50+ of China)
2022	Outstanding Graduate, University of Chinese Academy of Sciences & Beijing Municipal Education Commission (Top 5%)
2021	National Graduate Scholarship, Ministry of Education of China (Top 0.2%)
2020	Third place at Campus Singer Competition, University of Chinese Academy of Sciences (Top 3 among over 50,000)
2019	Outstanding Graduate, Wuhan University (Top 10%)
2019	Excellent Bachelor Thesis, Wuhan University (Top 5%)
2016	National Undergraduate Scholarship, Ministry of Education of China (Top 0.2%)
2014	First prize in the Chinese High School Mathematics League (Top 50 in Henan Province)

Blogs

#Year-End Reviews#	2024 《2024与龙年：世界、视野与身边》; 2023 《2023与兔年：开端、挑战与不经意间》; 2022 《2022年：伤痛、时间与起点》; 2021 《2021年：收获、决定与困惑》; 2020 《去年》; 2019 《2019大事记》; 2018 《请回答2018》; 2017 《2017流水账》
#Philosophical Reflections#	2020/02/10 《对人工智能艺术创作的哲学思考》; 2016/11/24 《为什么有有，没有没有》
#About University#	2019/10/02 《后大学时代》; 2018/08/04 《的大学》
#About the Gaokao#	2017/10/20 《后高考时代｜感情篇》; 2016/05/02 《后高考时代｜思辨篇》; 2016/04/26 《后高考时代｜离别篇》