<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>THUHCSI</title>
<!-- mobile responsive meta -->
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">
<!-- ** Plugins Needed for the Project ** -->
<!-- Bootstrap -->
<link rel="stylesheet" href="plugins/bootstrap/bootstrap.min.css">
<!-- slick slider -->
<link rel="stylesheet" href="plugins/slick/slick.css">
<!-- themify-icons -->
<link rel="stylesheet" href="plugins/themify-icons/themify-icons.css">
<!-- animation css -->
<link rel="stylesheet" href="plugins/animate/animate.css">
<!-- aos -->
<link rel="stylesheet" href="plugins/aos/aos.css">
<!-- venobox popup -->
<link rel="stylesheet" href="plugins/venobox/venobox.css">
<!-- Main Stylesheet -->
<link href="css/style.css" rel="stylesheet">
<!--Favicon-->
<link rel="shortcut icon" href="images/favicon.jpg" type="image/x-icon">
<link rel="icon" href="images/favicon.jpg" type="image/x-icon">
</head>
<body>
<!-- preloader start --
<div class="preloader">
<img src="images/hcsi-pre.gif" alt="preloader" width="500px">
</div>
<!-- preloader end -->
<!-- header -->
<header class="fixed-top header">
<!-- navbar -->
<div class="navigation w-100">
<div class="container">
<nav class="navbar navbar-expand-lg navbar-dark p-0">
<!-- logo -->
<a class="navbar-brand" href="index.html"><img src="images/logo.png" alt="logo" width="250"></a>
<button class="navbar-toggler rounded-0" type="button" data-toggle="collapse" data-target="#navigation" aria-controls="navigation" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<!-- menu -->
<div class="collapse navbar-collapse" id="navigation">
<ul class="navbar-nav ml-auto text-center">
<!-- about -->
<li class="nav-item dropdown view @@about">
<a class="nav-link dropdown-toggle" href="#" id="navbarAbout" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
About
</a>
<div class="dropdown-menu" aria-labelledby="navbarAbout">
<a class="dropdown-item" href="labintro.html">Introduction</a>
<a class="dropdown-item" href="researches.html">Research Areas</a>
<a class="dropdown-item" href="joinus.html">Join Us</a>
<a class="dropdown-item" href="collaborators.html">Collaborators</a>
</div>
</li>
<!-- members -->
<li class="nav-item dropdown view @@members">
<a class="nav-link dropdown-toggle" href="#" id="navbarMembers" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Members
</a>
<div class="dropdown-menu" aria-labelledby="navbarMembers">
<a class="dropdown-item" href="zywu.html">Director</a>
<a class="dropdown-item" href="members.html">Students</a>
<a class="dropdown-item" href="alumni.html">Alumni</a>
<a class="dropdown-item" href="stories.html">Stories</a>
<a class="dropdown-item" href="honors.html">Honors</a>
</div>
</li>
<!-- news -->
<li class="nav-item @@news">
<a class="nav-link" href="news.html">News</a>
</li>
<!-- research -->
<li class="nav-item dropdown view active">
<a class="nav-link dropdown-toggle" href="#" id="navbarResearch" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Research
</a>
<div class="dropdown-menu" aria-labelledby="navbarResearch">
<a class="dropdown-item active" href="publications.html">Publications</a>
<a class="dropdown-item" href="patents.html">Patents</a>
<a class="dropdown-item" href="projects.html">Projects</a>
</div>
</li>
<!-- awards -->
<li class="nav-item @@awards">
<a class="nav-link" href="awards.html">Awards</a>
</li>
<!-- demos -->
<li class="nav-item @@demos">
<a class="nav-link" href="demos.html">Demos</a>
</li>
</ul>
</div>
<!-- /menu -->
</nav>
</div>
</div>
<!-- /navbar -->
</header>
<!-- /header -->
<!-- page title -->
<section class="page-title-section overlay" data-background="images/backgrounds/hcsi-gather.jpg">
<div class="container">
<div class="row">
<div class="col-md-8">
<ul class="list-inline custom-breadcrumb">
<li class="list-inline-item"><p class="h2 text-primary font-secondary">Publications</p></li>
<li class="list-inline-item text-white h3 font-secondary"></li>
</ul>
<p class="text-lighten"> </p>
</div>
</div>
</div>
</section>
<!-- /page title -->
<!-- papers -->
<section class="section">
<div class="container">
<!-- paper category navigation bar -->
<div class="row">
<div class="col-12">
<!-- nav tab -->
<div class="border-bottom">
<ul class="nav nav-pills text-center">
<li class="navi-item">
<a class="nav-link active" href="#year" data-toggle="pill">
by year
</a>
</li>
<li class="navi-item">
<a class="nav-link" href="#cat" data-toggle="pill">
by category
</a>
</li>
</ul>
</div>
<!-- nav tab content -->
<div class="tab-content" id="pills-tabContent">
<div class="tab-pane fade show active" id="year">
<div class="col-12">
<div class="mb-3"></div>
<!-- paper category list -->
<ul class="list-inline text-center filter-controls mb-3">
<li class="list-inline-item px-2 mb-3 active" data-filter="all">All</li>
<li class="list-inline-item px-2 mb-3" data-filter="2026">2026</li>
<li class="list-inline-item px-2 mb-3" data-filter="2025">2025</li>
<li class="list-inline-item px-2 mb-3" data-filter="2024">2024</li>
<li class="list-inline-item px-2 mb-3" data-filter="2023">2023</li>
<li class="list-inline-item px-2 mb-3" data-filter="2022">2022</li>
<li class="list-inline-item px-2 mb-3" data-filter="2021">2021</li>
<li class="list-inline-item px-2 mb-3" data-filter="2020">2020</li>
<li class="list-inline-item px-2 mb-3" data-filter="2019">2019</li>
<li class="list-inline-item px-2 mb-3" data-filter="2018">2018</li>
<li class="list-inline-item px-2 mb-3" data-filter="2017">2017</li>
<li class="list-inline-item px-2 mb-3" data-filter="2016">2016</li>
<li class="list-inline-item px-2 mb-3" data-filter="2015">2015</li>
<li class="list-inline-item px-2 mb-3" data-filter="2014">2014</li>
<li class="list-inline-item px-2 mb-3" data-filter="2013">2013</li>
<li class="list-inline-item px-2 mb-3" data-filter="2012">2012</li>
<li class="list-inline-item px-2 mb-3" data-filter="2011">2011</li>
<li class="list-inline-item px-2 mb-3" data-filter="2010">2010</li>
<li class="list-inline-item px-2 mb-3" data-filter="2009">2009</li>
<li class="list-inline-item px-2 mb-3" data-filter="2008">2008</li>
<li class="list-inline-item px-2 mb-3" data-filter="2007">2007</li>
<li class="list-inline-item px-2 mb-3" data-filter="2006">2006</li>
<li class="list-inline-item px-2 mb-3" data-filter="2005">2005</li>
</ul>
</div>
</div>
<div class="tab-pane fade" id="cat">
<div class="col-12">
<div class="mb-3"></div>
<!-- paper category list -->
<ul class="list-inline text-center filter-controls mb-3">
<li class="list-inline-item px-2 mb-3 active" data-filter="all">All</li>
<li class="list-inline-item px-2 mb-3" data-filter="jnl">Journal</li>
<li class="list-inline-item px-2 mb-3" data-filter="cf">Conference</li>
<li class="list-inline-item px-2 mb-3" data-filter="sel">Selected</li>
</ul>
</div>
</div>
</div>
</div>
</div>
<!-- papers list -->
<div class="filtr-container">
<!-- PUBLICATIONS-PH -->
<!-- paper -->
<div data-category="2026,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Liyang Chen, Tianxiang Ma, Jiawei Liu, Bingchuan Li, Zhuowei Chen, Lijie Liu, Xu He, Gen Li, Qian He, Zhiyong Wu. "Human-Centric Video Generation via Collaborative Multi-Modal Conditioning," [in] <i>AAAI Conference on Artificial Intelligence (AAAI)</i>, vol. 40, no. 4, pp. 2939-2947. AAAI, Singapore, January 20-27, 2026. <span class="text-lighten">(EI, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2026,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yuanyuan Wang, Dongchao Yang, Yiwen Shao, Hangting Chen, Jiankun Zhao, Zhiyong Wu, Helen Meng, Xixin Wu. "DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models," [in] <i>AAAI Conference on Artificial Intelligence (AAAI)</i>, vol. 40, no. 40, pp. 33728-33736. AAAI, Singapore, January 20-27, 2026. <span class="text-lighten">(EI, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,jnl" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Haiwei Xue, Xiangyang Luo, Zhanghao Hu, Xin Zhang, Xunzhi Xiang, Yuqin Dai, Jianzhuang Liu, Zhensong Zhang, Minglei Li, Jian Yang, Fei Ma, Zhiyong Wu, Changpeng Yang, Zonghong Dai, Fei Richard Yu. "Human Motion Video Generation: A Survey," <i>IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)</i>, vol. 47, no. 11, pp. 10709-10730. IEEE, July 31, 2025. <span class="text-lighten">(SCI, EI: 0253218942512, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,jnl" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jingbei Li, Weihao Wu, Yi Meng, Luwen Zhang, Qiao Tian, Yuping Wang, Yuxuan Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "Inferring Speaking Styles for Conversational Speech Synthesis by Learning Contextual Dependencies," <i>IEEE Transactions on Audio, Speech, and Language Processing (TASLP)</i>, vol. 33, pp. 3160-3173. IEEE, July 16, 2025. <span class="text-lighten">(SCI, EI, CCF-B, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,jnl" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jinjiang Liu, Hao Li, Fei Chen, Zhiyong Wu, Xueliang Zhang. "Inplace Frequency Filtering and Cepstral Speech Modeling in Binaural Speech Enhancement," <i>IEEE Transactions on Audio, Speech, and Language Processing (TASLP)</i>, vol. 33, pp. 2775-2787. IEEE, June 16, 2025. <span class="text-lighten">(SCI, EI, CCF-B, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,jnl" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, Shiyin Kang, Haozhi Huang, Helen Meng. "AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation," <i>IEEE Transactions on Multimedia (TMM)</i>, vol. 27, pp. 3598-3609. IEEE, February 13, 2025. <span class="text-lighten">(SCI, EI: 20250817931440, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Haiwei Xue, Yanbo Fan, Xuan Wang, Zhiyong Wu. "Echo: Enhancing Conversational Behavior Generation via Hierarchical Semantic Comprehension with Large Language Models," [in] <i>SIGGRAPH Asia Conference Papers (SA)</i>, pp. 1-9. ACM, Hong Kong, China, December 15-18, 2025. <span class="text-lighten">(EI, CCF-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zhisheng Zhang, Derui Wang, Yifan Mi, Zhiyong Wu, Jie Gao, Yuxin Cao, Kai Ye, Jason Xue, Jie Hao. "E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis," [in] <i>Annual Conference on Neural Information Processing Systems (NeurIPS)</i>, pp. XXXX-XXXX. MIT Press, San Diego, USA, December 2-7, 2025. <span class="text-lighten">(EI, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Shun Lei, Yaoxun Xu, Zhiwei Lin, Huaicheng Zhang, Wei Tan, Hangting Chen, Yixuan Zhang, Chenyu Yang, Haina Zhu, Shuai Wang, Zhiyong Wu, Dong Yu. "LeVo: High-Quality Song Generation with Multi-Preference Alignment," [in] <i>Annual Conference on Neural Information Processing Systems (NeurIPS)</i>, pp. XXXX-XXXX. MIT Press, San Diego, USA, December 2-7, 2025. <span class="text-lighten">(EI, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Shun Lei, Zhiwei Lin, Rongzhi Gu, Zhiyong Wu. "MuCodec: Ultra Low-Bitrate Music Codec for Music Generation," [in] <i>ACM International Conference on Multimedia (ACM MM)</i>, pp. 689-698. ACM, Dublin, Ireland, October 27-31, 2025. <span class="text-lighten">(EI: 20255019681816, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Songtao Zhou, Xiaoyu Qin, Yixuan Zhou, Qixin Wang, Zeyu Jin, Zixuan Wang, Zhiyong Wu, Jia Jia. "HarmoniVox: Painting Voices to Match the Avatar's Soul," [in] <i>ACM International Conference on Multimedia (ACM MM)</i>, pp. 6720-6729. ACM, Dublin, Ireland, October 27-31, 2025. <span class="text-lighten">(EI: 20255019681841, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Haiwei Xue, Zhensong Zhang, Minglei Li, Zonghong Dai, Fei Yu, Fei Ma, Zhiyong Wu. "VideoHumanMIB: Unlocking Appearance Decoupling for Video Human Motion In-betweening," [in] <i>International Joint Conference on Artificial Intelligence (IJCAI)</i>, pp. 4254-4262. Morgan Kaufmann, Montreal, Canada, August 16-22, 2025. <span class="text-lighten">(EI: 20254719524923, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Peng Liu, Dongyang Dai, Zhiyong Wu. "RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction," [in] <i>International Conference on Learning Representations (ICLR)</i>, pp. 39921-39953. Singapore, April 24-28, 2025. <span class="text-lighten">(EI: 20252818762417, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xu He, Zhiyong Wu, Xiaoyu Li, Di Kang, Chaopeng Zhang, Jiangnan Ye, Liyang Chen, Xiangjun Gao, Han Zhang, Haolin Zhuang. "MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement," [in] <i>AAAI Conference on Artificial Intelligence (AAAI)</i>, pp. 3437-3445. AAAI, Philadelphia, USA, February 25-March 4, 2025. <span class="text-lighten">(EI: 20251818357154, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Renjie Yu, Runrui Cai, Yixuan Zhou, Runchuan Ye, Zhiyong Wu. "A Dual-Branch Ensemble Framework for Personality Recognition Based on Multimodal Emotion Features," [in] <i>International Workshop on Multimodal and Responsible Affective Computing (MRAC)</i>, pp. 51-57. Dublin, Ireland, October 31, 2025. <span class="text-lighten">(EI, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yinlong Zhang, Jinjiang Liu, Jiawei Jin, Jiuxin Lin, Zhiyong Wu. "CDSS: Innovating Cross Differential Attention for Robust Monaural Multi-Speaker Audio-Visual Speech Separation," [in] <i>International Conference on Intelligent Computing (ICIC)</i>, pp. 1-18. Springer, Ningbo, China, July 26-29, 2025. <span class="text-lighten">(EI, CCF-C)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zijian Lin, Yang Zhang, Yougen Yuan, Yuming Yan, Jinjiang Liu, Zhiyong Wu, Pengfei Hu, Qun Yu. "Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 5533-5537. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419420178, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jiawei Jin, Zhihan Yang, Yixuan Zhou, Zhiyong Wu. "In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 1393-1397. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419419838, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Fengjin Li, Jie Wang, Yadong Niu, Yongqing Wang, Meng Meng, Jian Luan, Zhiyong Wu. "StarVC: A Unified Auto-Regressive Framework for Joint Text and Speech Generation in Voice Conversion," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 4593-4597. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419419786, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Wei Chen, Binzhu Sha, Dan Luo, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu. "DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 1263-1267. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419419812, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xueyuan Chen, Dongchao Yang, Wenxuan Wu, Minglin Wu, Jing Xu, Xixin Wu, Zhiyong Wu, Helen Meng. "DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 2113-2117. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419419432, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yaoxun Xu, Jianwei Yu, Hangting Chen, Zhiyong Wu, Xixin Wu, Dong Yu, Rongzhi Gu, Yi Luo. "WAKE: Watermarking Audio with Key Enrichment," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 5093-5097. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419420040, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Haiyun Li, Zhiyong Wu, Xiaofeng Xie, Jingran Xie, Yaoxun Xu, Hanyang Peng. "VoiceMark: Zero-Shot Voice Cloning-Resistant Watermarking Approach Leveraging Speaker-Specific Latents," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 5108-5112. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419420043, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jingran Xie, Xiang Li, Hui Wang, Yue Yu, Yang Xiang, Xixin Wu, Zhiyong Wu. "Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 2430-2434. ISCA, Rotterdam, The Netherlands, August 17-21, 2025. <span class="text-lighten">(EI: 20254419420499, CCF-B, <font color="#FF0000">Best Student Paper Finalist</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Dan Luo, Chengyuan Ma, Weiqin Li, Jun Wang, Wei Chen, Zhiyong Wu. "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis," [in] <i>IEEE International Conference on Multimedia and Expo (ICME)</i>, pp. 1-6. IEEE, Nantes, France, June 30-July 4, 2025. <span class="text-lighten">(EI: 20254819583122, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Rui Niu, Weihao Wu, Jie Chen, Long Ma, Zhiyong Wu. "A Multi-Stage Framework for Multimodal Controllable Speech Synthesis," [in] <i>IEEE International Conference on Multimedia and Expo (ICME)</i>, pp. 1-6. IEEE, Nantes, France, June 30-July 4, 2025. <span class="text-lighten">(EI: 20254819583370, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yuanyuan Wang, Hangting Chen, Dongchao Yang, Weiqin Li, Dan Luo, Guangzhi Li, Shan Yang, Zhiyong Wu, Helen Meng, Xixin Wu. "UniSep: Universal Target Audio Separation with Language Models at Scale," [in] <i>IEEE International Conference on Multimedia and Expo (ICME)</i>, pp. 1-6. IEEE, Nantes, France, June 30-July 4, 2025. <span class="text-lighten">(EI: 20254819583283, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jingran Xie, Shun Lei, Yue Yu, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu. "Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20252718723593, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jie Gao, Haiyun Li, Zhisheng Zhang, Zhiyong Wu. "Black-Box Adversarial Defense Against Voice Conversion Using Latent Space Perturbation," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20252718723520, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Weihao Wu, Zhiwei Lin, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu. "DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20252718725633, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Wei Chen, Binzhu Sha, Jing Yang, Zhuo Wang, Fan Fan, Zhiyong Wu. "Singing Voice Conversion with Accompaniment Using Self-Supervised Representation-Based Melody Features," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20251818342489, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Haiwei Xue, Zhensong Zhang, Minglei Li, Zonghong Dai, Zhiyong Wu. "Identity-Preserving Audio-Driven Holistic Human Motion Video Generation," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20252818737665, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Rui Niu, Jie Chen, Long Ma, Changhe Song, Weihao Wu, Zhiyong Wu. "Binary Representation Learning for Discriminative Acoustic Unit Discovery," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20252718725458, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zhiqi Huang, Dan Luo, Jun Wang, Huan Liao, Zhiheng Li, Zhiyong Wu. "Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20251818340869, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2025,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yuanyuan Wang, Hangting Chen, Dongchao Yang, Zhiyong Wu, Xixin Wu. "AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Hyderabad, India, April 6-11, 2025. <span class="text-lighten">(EI: 20251818339733, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng. "SongCreator: Lyrics-based Universal Song Generation," [in] <i>Annual Conference on Neural Information Processing Systems (NeurIPS)</i>, pp. 1-34. MIT Press, Vancouver, Canada, December 10-15, 2024. <span class="text-lighten">(EI:20240405449, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yixuan Zhou, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou, Shun Lei, Songtao Zhou, Zhiyong Wu, Jia Jia. "VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling," [in] <i>ACM International Conference on Multimedia (ACM MM)</i>, pp. 554-563. ACM, Melbourne, Australia, October 28-November 1, 2024. <span class="text-lighten">(EI:20244817417008, CCF-A, THU-A, <font color="#FF0000">Top 4% Paper, Travel Grant</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu. "SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description," [in] <i>ACM International Conference on Multimedia (ACM MM)</i>, pp. 1255-1264. ACM, Melbourne, Australia, October 28-November 1, 2024.. <span class="text-lighten">(SCI:INSPEC:25550569, EI:20244817417002, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, Sicheng Yang, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu. "Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model," [in] <i>IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</i>, pp. 2263-2273. IEEE/CVF, Seattle, USA, June 16-22, 2024. <span class="text-lighten">(SCI:WOS:001322555902059, EI:20240166196, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu. "SECap: Speech Emotion Captioning with Large Language Model," [in] <i>AAAI Conference on Artificial Intelligence (AAAI)</i>, pp. 19323-19331. AAAI, Vancouver, Canada, February 20-27, 2024.. <span class="text-lighten">(EI:20241515874366, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen, Yu Yang, Boshi Tang, Zhiyong Wu. "Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations," [in] <i>AAAI Conference on Artificial Intelligence (AAAI)</i>, pp. 301-309. AAAI, Vancouver, Canada, February 20-27, 2024.. <span class="text-lighten">(EI:20241515854020, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng. "SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes," [in] <i>AAAI Conference on Artificial Intelligence (AAAI)</i>, pp. 15267-15275. AAAI, Vancouver, Canada, February 20-27, 2024.. <span class="text-lighten">(EI:20241515875846, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yunrui Cai, Runchuan Ye, Jingran Xie, Yixuan Zhou, Yaoxun Xu, Zhiyong Wu. "Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup," [in] <i>International Workshop on Multimodal and Responsible Affective Computing (MRAC)</i>, pp. 93-97. Melbourne, Australia, November 1, 2024.. <span class="text-lighten">(CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yaoxun Xu, Yixuan Zhou, Yunrui Cai, Jingran Xie, Runchuan Ye, Zhiyong Wu. "Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering," [in] <i>International Workshop on Multimodal and Responsible Affective Computing (MRAC)</i>, pp. 104-109. Melbourne, Australia, November 1, 2024.. <span class="text-lighten">(CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jingran Xie, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "ERVQ: Leverage Residual Vector Quantization for Speech Emotion Recognition," [in] <i>International Symposium on Chinese Spoken Language Processing (ISCSLP)</i>, pp. 456-460. Beijing, China, November 7-10, 2024..
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jingran Xie, Changhe Song, Yang Xiang, Hui Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "CMAST: Efficient Speech-Text Joint Training Method to Enhance Linguistic Features Learning of Speech Representations," [in] <i>International Symposium on Chinese Spoken Language Processing (ISCSLP)</i>, pp. 656-660. Beijing, China, November 7-10, 2024..
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Shuoyi Zhou, Yixuan Zhou, Weiqing Li, Jun Chen, Runchuan Ye, Weihao Wu, Zijian Lin, Shun Lei, Zhiyong Wu. "The Codec Language Model-Based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024," [in] <i>International Symposium on Chinese Spoken Language Processing (ISCSLP)</i>, pp. 496-500. Beijing, China, November 7-10, 2024..
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Rui Niu, Changhe Song, Zhiyong Wu. "NLPP: A Natural Language Prosodic Prominence Dataset Assisted by ChatGPT," [in] <i>International Symposium on Chinese Spoken Language Processing (ISCSLP)</i>, pp. 441-445. Beijing, China, November 7-10, 2024..
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Wei Chen, Xintao Zhao, Jun Chen, Binzhu Sha, Zhiwei Lin, Zhiyong Wu. "RobustSVC: HuBERT-Based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion," [in] <i>International Symposium on Chinese Spoken Language Processing (ISCSLP)</i>, pp. 164-168. Beijing, China, November 7-10, 2024..
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zhihan Yang, Chunfeng Wang, Zhiyong Wu, Jia Jia. "Inferring Agent Speaking Styles for Auditory-Visual User-Agent Conversation," [in] <i>International Symposium on Chinese Spoken Language Processing (ISCSLP)</i>, pp. 421-425. Beijing, China, November 7-10, 2024..
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yaoxun Xu, Shixiong Zhang, Jianwei Yu, Zhiyong Wu, Dong Yu. "Comparing Discrete and Continuous Space LLMs for Speech Recognition," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 2509-2513. ISCA, Kos, Greece, September 1-5, 2024.. <span class="text-lighten">(EI:20240390229, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng. "Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 1785-1789. ISCA, Kos, Greece, September 1-5, 2024.. <span class="text-lighten">(EI:20240315609, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Shuochen Gao, Shun Lei, Fan Zhuo, Hangyu Liu, Feng Liu, Boshi Tang, Qiaochu Huang, Shiyin Kang, Zhiyong Wu. "An End-to-end Approach for Chord-Conditioned Song Generation," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 1890-1894. ISCA, Kos, Greece, September 1-5, 2024.. <span class="text-lighten">(EI:20240417880, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yunrui Cai, Zhiyong Wu, Jia Jia, Helen Meng. "LoRA-MER: Low-Rank Adaptation of Pre-Trained Speech Models for Multimodal Emotion Recognition Using Mutual Information," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 4658-4662. ISCA, Kos, Greece, September 1-5, 2024.. <span class="text-lighten">(CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng. "CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 4129-4133. ISCA, Kos, Greece, September 1-5, 2024.. <span class="text-lighten">(EI:20240265774, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Hang Su, Yuxiang Kong, Lichun Fan, Peng Gao, Yujun Wang, Zhiyong Wu. "Speaker Change Detection with Weighted-sum Knowledge Distillation based on Self-supervised Pre-trained Models," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 1655-1659. ISCA, Kos, Greece, September 1-5, 2024.. <span class="text-lighten">(CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang. "Hydraformer: One Encoder for All Subsampling Rates," [in] <i>IEEE International Conference on Multimedia and Expo (ICME)</i>, pp. 1-6. IEEE, Niagara Falls, Canada, July 15-19, 2024.. <span class="text-lighten">(CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Tianjiao Du, Jun Chen, Jiasheng Lu, Qinmei Xu, Huan Liao, Yupeng Chen, Zhiyong Wu. "Controllable Text-to-Audio Generation with Training-Free Temporal Guidance Diffusion," [in] <i>IEEE International Conference on Multimedia and Expo (ICME)</i>, pp. 1-6. IEEE, Niagara Falls, Canada, July 15-19, 2024.. <span class="text-lighten">(CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Rui Niu, Zhiyong Wu, Changhe Song. "Representation Space Maintenance: Against Forgetting in Continual Learning," [in] <i>IEEE International Joint Conference on Neural Networks (IJCNN)</i>, pp. 1-7. IEEE, Yokohama, Japan, June 30-July 5, 2024.. <span class="text-lighten">(CCF-C, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Ming Cheng, Shun Lei, Dongyang Dai, Zhiyong Wu, Dading Chong. "NRAdapt: Noise-Robust Adaptive Text to Speech Using Untranscribed Data," [in] <i>IEEE International Joint Conference on Neural Networks (IJCNN)</i>, pp. 1-8. IEEE, Yokohama, Japan, June 30-July 5, 2024.. <span class="text-lighten">(CCF-C, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yixuan Zhou, Shuoyi Zhou, Shun Lei, Zhiyong Wu, Menglin Wu. "The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 71-72. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(CCF-B, THU-B, <font color="#FF0000">1st Place in Speaker Similarity</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng. "Improving Language Model-based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 12662-12666. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(EI:20242416240666, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xingda Li, Fan Zhuo, Dan Luo, Jun Chen, Shiyin Kang, Zhiyong Wu, Tao Jiang, Yang Li, Han Fang, Yahui Zhou. "Generating Stereophonic Music with Single-Stage Language Models," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1471-1475. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, Yaolong Ju, Fan Fan, Shiyin Kang, Zhiyong Wu, Helen Meng. "Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 941-945. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(EI:20240038542, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, Yang Li, Zhiyong Wu, Helen Meng. "SCNet: Sparse Compression Network for Music Source Separation," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1276-1280. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, Zhiyong Wu, Helen Meng. "Consistent and Relevant: Rethink the Query Embedding in General Sound Separation," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 961-965. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng. "Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 12577-12581. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(EI:20230450875, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu, Helen Meng. "Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 11141-11145. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng. "StyleSpeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 12316-12320. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(EI:20240002562, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng. "Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 12341-12345. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng. "Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 8185-8189. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(EI:20242416239330, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu. "FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 7945-7949. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(EI:20242416241075, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2024,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu, Minglei Li, Zonghong Dai, Helen Meng. "Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 8296-8300. IEEE, Seoul, Korea, April 14-19, 2024.. <span class="text-lighten">(EI:20240010380, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,jnl" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jingbei Li, Sipan Li, Ping Chen, Luwen Zhang, Yi Meng, Zhiyong Wu, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang. "Joint Multiscale Cross-Lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing," <i>IEEE Transactions on Audio, Speech, and Language Processing (TASLP)</i>, vol. 32, pp. 517-528. IEEE, November 10, 2023.. <span class="text-lighten">(SCI:, EI:20230175143, CCF-B, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,jnl" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xixin Wu, Hui Lu, Kun Li, Zhiyong Wu, Xunying Liu, Helen Meng. "Hiformer: Sequence Modeling Networks with Hierarchical Attention Mechanisms," <i>IEEE Transactions on Audio, Speech, and Language Processing (TASLP)</i>, vol. 31, pp. 3993-4003. IEEE, September 8, 2023.. <span class="text-lighten">(SCI:INSPEC:23688081, EI:20233814764513, CCF-B, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,jnl" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Xixin Wu, Shiyin Kang, Helen Meng. "MSStyleTTS: Multi-scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis," <i>IEEE Transactions on Audio, Speech, and Language Processing (TASLP)</i>, vol. 31, pp. 3290-3303. IEEE, August 2, 2023.. <span class="text-lighten">(SCI:, EI:20230281293, CCF-B, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Hui Lu, Xixin Wu, Zhiyong Wu, Helen Meng. "SpeechTripleNet: End-to-end Disentangled Speech Representation Learning for Content, Timbre and Prosody," [in] <i>ACM International Conference on Multimedia (ACM MM)</i>, pp. 2829-2837. ACM, Ottawa, Canada, October 29-November 3, 2023.. <span class="text-lighten">(EI:20235015224410, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Sicheng Yang, Zilin Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Qiaochu Huang, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai. "UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons," [in] <i>ACM International Conference on Multimedia (ACM MM)</i>, pp. 1033-1044. ACM, Ottawa, Canada, October 29-November 3, 2023. <span class="text-lighten">(EI:20230332184, CCF-A, THU-A, <font color="#FF0000">Top 2.5%</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao. "DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models," [in] <i>International Joint Conference on Artificial Intelligence (IJCAI)</i>, pp. 5860-5868. Morgan Kaufmann, Macao, China, August 19-25, 2023.. <span class="text-lighten">(EI:20233714713734, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Haolin Zhuang. "QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation," [in] <i>IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</i>, pp. 2321-2330. IEEE/CVF, Vancouver, Canada, June 18-22, 2023. <span class="text-lighten">(EI:20230186667, CCF-A, THU-A, <font color="#FF0000">Highlight, Top 2.5%</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zhihan Yang, Zhiyong Wu, Ying Shan, Jia Jia. "What Does Your Face Sound Like? 3D Face Shape Towards Voice," [in] <i>AAAI Conference on Artificial Intelligence (AAAI)</i>, pp. 13905-13913. AAAI, Washington DC, USA, February 7-14, 2023.. <span class="text-lighten">(EI:20233414581264, CCF-A, THU-A)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai. "The DiffuseStyleGesture+ entry to the GENEA Challenge 2023," [in] <i>ACM International Conference on Multimodal Interaction (ICMI)</i>, pp. 779-785. ACM, Paris, France, October 9-13, 2023.. <span class="text-lighten">(EI:20230317714, CCF-C, THU-B, <font color="#FF0000">Reproducibility Award</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Liyang Chen, Zhiyong Wu, Runnan Li, Weihong Bao, Jun Ling, Xu Tan, Sheng Zhao. "VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer," [in] <i>IEEE/CVF International Conference on Computer Vision (ICCV) Workshops</i>, pp. 2977-2987. Paris, France, October 2-6, 2023.. <span class="text-lighten">(EI:20230292957, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yunrui Cai, Jingran Xie, Boshi Tang, Yuanyuan Wang, Jun Chen, Haiwei Xue, Zhiyong Wu. "First-order Multi-label Learning with Cross-modal Interactions for Multimodal Emotion Recognition," [in] <i>International Workshop on Multimodal and Responsible Affective Computing (MRAC)</i>, pp. 13-20. Ottawa, Canada, October 29, 2023.. <span class="text-lighten">(CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Yunrui Cai, Changhe Song, Boshi Tang, Dongyang Dai, Zhiyong Wu, Helen Meng. "Robust Representation Learning for Speech Emotion Recognition with Moment Exchange," [in] <i>Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)</i>, pp. 1002-1007. APSIPA, Taipei, China, October 31-November 3, 2023.. <span class="text-lighten">(EI:20235115257009)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang. "A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis," [in] <i>China Multimedia (ChinaMM)</i>, pp. 1-9. Kunming, China, August 2-4, 2023.. <span class="text-lighten">(EI:20230345194, <font color="#FF0000">Best Paper</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng. "Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 4858-4862. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20230201740, CCF-B, <font color="#FF0000">Best Student Paper</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng. "Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 3377-3381. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20230331605, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zhihan Yang, Shansong Liu, Xu Li, Haozhe Wu, Zhiyong Wu, Ying Shan, Jia Jia. "Prosody Modeling with 3D Visual Information for Expressive Video Dubbing," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 4863-4867. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20233814760588, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jiuxin Lin, Peng Wang, Heinrich Dinkel, Jun Chen, Zhiyong Wu, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Yujun Wang. "Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 2488-2492. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20230232439, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Yukai Ju, Shulin He, Yannan Wang, Zhiyong Wu. "MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 4034-4038. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20230233757, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xingchen Song, Di Wu, Binbin Zhang, Zhendong Peng, Bo Dang, Fuping Pan, Zhiyong Wu. "ZeroPrompt: Streaming Acoustic Encoders are Zero-Shot Masked LMs," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 1648-1652. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20230191878, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng. "SememeASR: Boosting Performance of End-to-end Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 3272-3276. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20230330956, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng. "Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 1334-1338. ISCA, Dublin, Ireland, August 20-24, 2023.. <span class="text-lighten">(EI:20230333401, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Wenzhe Liu, Yupeng Shi, Jun Chen, Wei Rao, Shulin He, Andong Li, Yannan Wang, Zhiyong Wu. "Gesper: A Restoration-Enhancement Framework for General Speech Reconstruction," [in] <i>Annual Conference of the International Speech Communication Association (INTERSPEECH)</i>, pp. 4044-4048. ISCA, Dublin, Ireland, August 20-24, 2023. <span class="text-lighten">(EI:20230216014, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng. "SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias," [in] <i>IEEE International Conference on Multimedia and Expo (ICME)</i>, pp. 1703-1708. IEEE, Brisbane, Australia, July 10-14, 2023. <span class="text-lighten">(EI:20230340577, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Xintao Zhao, Shuai Wang, Yang Chao, Zhiyong Wu, Helen Meng. "Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion," [in] <i>IEEE International Conference on Multimedia and Expo (ICME)</i>, pp. 1691-1696. IEEE, Brisbane, Australia, July 10-14, 2023. <span class="text-lighten">(EI:20230198155, CCF-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,sel,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng. "Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023. <span class="text-lighten">(EI:20230134346, CCF-B, THU-B, <font color="#FF0000">Top 3% Paper</font>)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu. "LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023. <span class="text-lighten">(EI:20230340208, CCF-B, THU-B)</span>
</div>
</div>
</div>
<!-- paper -->
<div data-category="2023,cf" class="col-12 filtr-item">
<div class="card border-0 rounded-0 hover-shadow bg-gray p-2 mb-2">
<div class="text-dark">
Zilin Wang, Peng Liu, Jun Chen, Sipan Li, Jinfeng Bai, Gang He, Zhiyong Wu, Helen Meng. "A Synthetic Corpus Generation Method for Neural Vocoder Training," [in] <i>IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</i>, pp. 1-5. IEEE, Rhodes Island, Greece, June 4-10, 2023. <span class="text-lighten">(EI:20234715106132, CCF-B, THU-B)</span>
</div>
</div>