List of publications in fiscal year 2025
Journal papers
- S. Luan, Y. Wakabayashi, T. Toda, "Generalized sound field interpolation for freely spaced microphone arrays in rotation-robust beamforming," Applied Acoustics, Vol. 236, Article 110706, pp. 1-15, Apr. 2025.
- M. Eshghi, T. Toda, "Predicting fundamental frequency patterns in electrolaryngeal speech using automated phoneme extraction," IEEE Access, Vol. 13, pp. 73831-73847, Apr. 2025.
- Y. Ohtani, T. Okamoto, T. Toda, H. Kawai, "Fast neural vocoder with fundamental frequency control using finite impulse response filters," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 1893-1906, Apr. 2025.
- D. Ma, Y. Choi, T. Fujimura, F. Li, C. Xie, K. Kobayashi, T. Toda, "Sequence-to-sequence voice conversion-based techniques for electrolaryngeal speech enhancement in noisy and reverberant conditions," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e8, pp. 1-40, May 2025.
- C. Xie, T. Toda, "An investigation of noisy-to-noisy voice conversion performance in various noisy conditions," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e10, pp. 1-30, June 2025.
- T. Fujimura, T. Toda, "Analysis and extension of noisy-target training for unsupervised target signal enhancement," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e12, pp. 1-27, June 2025.
- I. Kuroyanagi, T. Fujimura, K. Takeda, T. Toda, "Improving anomalous sound detection through pseudo-anomalous set selection and pseudo-label utilization under unlabeled conditions," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e13, pp. 1-28, June 2025.
- J. He, T. Toda, "PMF-CEC: phoneme-augmented multimodal fusion for context-aware ASR error correction with error-specific selective decoding," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2402-2417, June 2025.
- Y. Choi, C. Xie, T. Toda, "Noise and reverberation-controllable voice conversion," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2430-2443, June 2025.
- T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai, "Phoneme-level duration controllable neural text-to-speech with phoneme embedding skip connection and modified Gaussian duration modeling," IEEE Access, Vol. 13, pp. 118369-118380, July 2025.
- Y. Hashizume, L. Li, A. Miyashita, T. Toda, "Learning separated representations for instrument-based music similarity," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e16, pp. 1-32, July 2025.
- D. Ma, L.P. Violeta, K. Kobayashi, T. Toda, "Pretraining and fine-tuning techniques for electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3189-3201, July 2025.
- S. Chen, T. Toda, "QHARMA-GAN: quasi-harmonic neural vocoder based on autoregressive moving average model," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3703-3719, Sep. 2025.
- Y. Yasuda, T. Toda, "Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment," Computer Speech and Language, Vol. 96, Article 101888, pp. 1-16, Sep. 2025.
- D. Yoshioka, Y. Nakata, Y. Yasuda, T. Toda, "Text- and speech-style control for lecture speech generation focusing on disfluency," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e26, pp. 1-31, Sep. 2025.
- J. He, X. Shi, C.-H. Hu, J. Mi, X. Li, T. Toda, "M4SER: multimodal, multirepresentation, multitask, and multistrategy learning for speech emotion recognition," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 4055-4070, Sep. 2025.
- 西尾 直樹, 小林 和弘, 戸田 智基, "喉頭摘出者における自己音声の再獲得 ~Save the Voice Project~," 日本気管食道科学会会報, Vol. 76, No. 5, pp. 255-263, Oct. 2025.
International conferences
- Y. Hashizume, T. Toda, "Investigation of perceptual music similarity focusing on each instrumental part," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- T. Fujimura, I. Kuroyanagi, T. Toda, "Improvements of discriminative feature space training for anomalous sound detection in unlabeled conditions," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- K. Nishizawa, R. Yamamoto, W.-C. Huang, T. Toda, "Investigating factors related to the naturalness of synthesized unison singing," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai, "Mora-level prosody prediction for text-to-speech using Japanese BERT without accentual labels," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- D. Ma, J. Mi, F. Li, L.P. Violeta, K. Kobayashi, T. Toda, "Improving electrolaryngeal speech enhancement via a representation learning method based on integrated text and speech representations," Proc. IEEE EMBC, 6 pages, Copenhagen, Denmark, July 2025.【3rd Place Award in EMBC 2025 Student Paper Competition (recipient: Ding Ma)】
- T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai, "GST-BERT-TTS: prosody prediction without accentual labels for multi-speaker TTS using BERT with global style tokens," Proc. INTERSPEECH, pp. 444-448, Rotterdam, the Netherlands, Aug. 2025.
- X. Shi, X. Li, T. Toda, "Who, When, and What: leveraging the "Three Ws" concept for emotion recognition in conversation," Proc. INTERSPEECH, pp. 1763-1767, Rotterdam, the Netherlands, Aug. 2025.
- W.-C. Huang, E. Cooper, T. Toda, "SHEET: a multi-purpose open-source speech human evaluation estimation toolkit," Proc. INTERSPEECH, pp. 2355-2359, Rotterdam, the Netherlands, Aug. 2025.
- J. He, N. Sawada, K. Miyazaki, T. Toda, "CMT-LLM: context-aware multi-talker ASR utilizing large language models," Proc. INTERSPEECH, pp. 2575-2579, Rotterdam, the Netherlands, Aug. 2025.
- J. He, J. Mi, T. Toda, "GIA-MIC: multimodal emotion recognition with gated interactive attention and modality-invariant learning constraints," Proc. INTERSPEECH, pp. 2695-2699, Rotterdam, the Netherlands, Aug. 2025.
- B. Halpern, T. Tienkamp, T. Rebernik, R. van Son, M. Wieling, D. Abur, T. Toda, "Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer," Proc. INTERSPEECH, pp. 3733-3737, Rotterdam, the Netherlands, Aug. 2025.
- X. Shi, X. Li, T. Toda, "Speaker-aware multi-task learning for speech emotion recognition," Proc. INTERSPEECH, pp. 4333-4337, Rotterdam, the Netherlands, Aug. 2025.
- X. Shi, J. Mi, X. Li, T. Toda, "Advancing emotion recognition via ensemble learning: integrating speech, context, and text representations," Proc. INTERSPEECH, pp. 4693-4697, Rotterdam, the Netherlands, Aug. 2025.
- R. Yoneyama, M. Kawamura, R. Terashima, R. Yamamoto, T. Toda, "Comparative analysis of fast and high-fidelity neural vocoders for low-latency streaming synthesis in resource-constrained environments," Proc. INTERSPEECH, pp. 4888-4892, Rotterdam, the Netherlands, Aug. 2025.
- Z. Zhang, W.-C. Huang, X. Wang, X. Miao, J. Yamagishi, "Mitigating language mismatch in SSL-based speaker anonymization," Proc. INTERSPEECH, pp. 5133-5137, Rotterdam, the Netherlands, Aug. 2025.
- C.-H. Hu, Y. Yasuda, A. Yoshimoto, T. Toda, "Unifying listener scoring scales: comparison learning framework for speech quality assessment and continuous speech emotion recognition," Proc. INTERSPEECH, pp. 5428-5432, Rotterdam, the Netherlands, Aug. 2025.
- M. Murata, K. Miyazaki, T. Koriyama, T. Toda, "Eigenvoice synthesis based on model editing for speaker generation," Proc. INTERSPEECH, pp. 5523-5527, Rotterdam, the Netherlands, Aug. 2025.
- Y. Yasuda, J. Yamagishi, T. Toda, "Continual subjective evaluation method of speech by merging sort-based preference tests towards ever-expanding corpus of human ratings," Proc. SSW, pp. 14-20, Leeuwarden, the Netherlands, Aug. 2025.
- L.P. Violeta, W.-C. Huang, T. Toda, "Serenade: a singing style conversion framework based on audio infilling," Proc. EUSIPCO, pp. 411-415, Palermo, Italy, Sep. 2025.
- K. Ogita, R. Yoneyama, W.-C. Huang, T. Toda, "VAE-SiFiGAN: source-filter HiFi-GAN based on variational autoencoder representations with enhanced pitch controllability," Proc. EUSIPCO, pp. 531-535, Palermo, Italy, Sep. 2025.【Finalists of EUSIPCO Best Student Paper Award (finalist: Kenichi Ogita)】
- K. Sawada, W.-C. Huang, T. Toda, "Hierarchical symbolic music generation with variational autoencoder-based bar-wise feature sequences," Proc. APSIPA ASC, pp. 299-304, Singapore, Oct. 2025.
- S. Tang, Z. Liu, L. Chen, K.A. Lee, T. Toda, Z.-H. Ling, "A preliminary study on sectional voice anonymization and detection," Proc. APSIPA ASC, pp. 318-323, Singapore, Oct. 2025.
- K. Hattori, W.-C. Huang, K. Takeda, T. Toda, "An evaluation of supervised virtual microphone estimators in reverberant sound fields," Proc. APSIPA ASC, pp. 517-522, Singapore, Oct. 2025.
- M. Kaneko, W.-C. Huang, T. Toda, "Estimating speaker's seating position from monaural speech in a simulated vehicle interior sound field," Proc. APSIPA ASC, pp. 625-629, Singapore, Oct. 2025.
- H. Miyaji, K. Sawada, W.-C. Huang, T. Toda, "Designing a music difficulty measure for controllable automatic piano rearrangement," Proc. APSIPA ASC, pp. 834-839, Singapore, Oct. 2025.
- K. Niwa, K. Kobayashi, T. Toda, "Investigation of the effectiveness of converted speech auditory feedback in low-latency real-time voice conversion," Proc. APSIPA ASC, pp. 905-910, Singapore, Oct. 2025.
- Y. Nakata, D. Yoshioka, W.-C. Huang, T. Toda, "Disfluency disentanglement enhancement in spoken-text-style transfer for spontaneous speech synthesis," Proc. APSIPA ASC, pp. 2254-2259, Singapore, Oct. 2025.
- D. Yoon, T. Toda, "Neural semi-fragile watermarking for proactive deepfake speech detection," Proc. APSIPA ASC, pp. 2396-2401, Singapore, Oct. 2025.
- W.-C. Huang, "Advancing speech quality assessment through scientific challenges and open-source activities," Perspective paper, Proc. APSIPA ASC, pp. 2547-2552, Singapore, Oct. 2025.
- L. Chen, K.A. Lee, Z.-H. Ling, X. Wang, R.K. Das, T. Toda, H. Li, "Speaker privacy and security in the big data era: protection and defense against deepfake," Perspective paper, Proc. APSIPA ASC, pp. 2565-2570, Singapore, Oct. 2025.
- K. Wilkinghoff, T. Fujimura, K. Imoto, J. Le Roux, Z.-H. Tan, T. Toda, "Handling domain shifts for anomalous sound detection: a review of DCASE-related work," Proc. DCASE Workshop, pp. 20-24, Barcelona, Spain, Oct. 2025.
- M. Matsumoto, T. Fujimura, W.-C. Huang, T. Toda, "Adjusting bias in anomaly scores via variance minimization for domain-generalized discriminative anomalous sound detection," Proc. DCASE Workshop, pp. 25-29, Barcelona, Spain, Oct. 2025.
- T. Fujimura, K. Wilkinghoff, K. Imoto, T. Toda, "ASDKit: a toolkit for comprehensive evaluation of anomalous sound detection methods," Proc. DCASE Workshop, pp. 40-44, Barcelona, Spain, Oct. 2025.
- T. Fujimura, I. Kuroyanagi, T. Toda, "Discriminative anomalous sound detection using pseudo labels, target signal enhancement, and ensemble feature extractors," Proc. DCASE Workshop, pp. 180-184, Barcelona, Spain, Oct. 2025.
Tutorials
- W.-C. Huang, E. Cooper, J. Shi, "Automatic quality assessment for speech and beyond," Tutorial, INTERSPEECH, Rotterdam, the Netherlands, Aug. 2025.
Invited talks
- 米山 怜於, "ニューラルボコーダ概説:生成モデルと実用性の観点から," 音学シンポジウム, 招待講演, 東京, June 2025.
- 戸田 智基, "音声研究の知見がニューラルボコーダの発展にもたらす効果," 音学シンポジウム, 招待講演, 東京, June 2025.
- T. Toda, "Recent advances and future directions in voice conversion," Survey Talk, INTERSPEECH, Rotterdam, the Netherlands, Aug. 2025.
Technical meetings
- 藤村 拓弥, "ICASSP2025における異常音検知の動向," 信学技報, Vol. 125, No. 36, EA2025-1, pp. 1-6, 応用音響研究会, オーガナイズドセッション, May 2025.
- 橋爪 優果, "ICASSP2025における音楽情報処理の動向," 信学技報, Vol. 125, No. 36, EA2025-3, pp. 13-17, May 2025.
- 米山 怜於, "ニューラルボコーダ概説:生成モデルと実用性の観点から," 情報処理研報, Vol. 2025-SLP-156, No. 3, 1 page, June 2025.
- 戸田 智基, "音声研究の知見がニューラルボコーダの発展にもたらす効果," 情報処理研報, Vol. 2025-SLP-156, No. 4, 1 page, June 2025.
- 宮司 光梨, 澤田 桂都, ホワン ウェンチン, 戸田 智基, "制御性の高いピアノ自動編曲に向けた楽曲難易度指標の設計," 情報処理研報, Vol. 2025-MUS-143, No. 8, pp. 1-7, June 2025.
- 山下 陽生, 岡本 拓磨, 高島 遼一, 大谷 大和, 滝口 哲也, 戸田 智基, 河井 恒, "重み付きAttentionのアライメント機構を用いた系列変換型声質変換," 情報処理研報, Vol. 2025-SLP-143, No. 75, pp. 1-6, June 2025.【音学シンポジウム2025優秀発表賞(受賞者:山下 陽生)】
- 服部 公宏, ホワン ウェンチン, 武田 一哉, 戸田 智基, "多様なシミュレーション音場における教師あり仮想マイクアレイ信号推定の汎化性能評価," 信学技報, Vol. 125, No. 74, SP2025-20, pp. 107-112, June 2025.
- W.-C. Huang, L.P. Violeta, T. Toda, "JATTS: a comparison-oriented Japanese text-to-speech open-sourced toolkit," 信学技報, Vol. 125, No. 74, SP2025-22, pp. 119-124, June 2025.
Domestic conference presentations
- 中井 淳一, 藤村 拓弥, 高田 将典, 浅野 憲司, 若松 智之, 戸田 智基, "説明性向上マルチモーダルAIによるMOCの潜在的異常見える化," 第24回情報科学技術フォーラム(FIT2025), CH-006, 第3分冊, pp. 21-24, Sep. 2025.
- 小椋 忠志, 岡本 拓磨, 大谷 大和, Erica Cooper, 戸田 智基, 河井 恒, "GST-BERT-TTS:アクセントラベル不要な複数話者日本語TTS," 音講論, 1-1-1, pp. 1115-1118, Sep. 2025.
- 安田 裕介, 井本 桂右, 深山 覚, 戸田 智基, "アクセント制御音声合成と主観比較評価最適化による専門家非依存アクセントアノテーション法," 音講論, 1-1-3, pp. 1123-1126, Sep. 2025.
- Huang Wen-Chin, Wang Hui, Liu Cheng, Wu Yi-Chiao, Tjandra Andros, Hsu Wei-Ning, Cooper Erica, Yong Qin, 戸田 智基, "The AudioMOS Challenge 2025," 音講論, 1-1-16, pp. 1167-1170, Sep. 2025.
- 山下 陽生, 岡本 拓磨, 高島 遼一, 大谷 大和, 滝口 哲也, 戸田 智基, 河井 恒, "系列変換型声質変換モデルのモバイル端末実装," 音講論, 3-Q-21, pp. 1365-1366, Sep. 2025.
Other presentations
- 西尾 直樹, 小林 和弘, 戸田 智基, 横井 紗矢香, 向山 宣昭, 和田 明久, 横井 麻衣, 重山 真由, 三谷 壮平, 曾根 三千彦, "電気のコエから自分のコエへ -Save the Voice Project-," 日本気管食道科学会会報, 特集5 パネルディスカッション1:喉頭摘出後のコミュニケーション支援, Vol. 76, No. 2, p. 108, Apr. 2025.
- W.-C. Huang, "Automatic quality assessment for speech and beyond," Talk, Conversational AI Reading Group, Mila/Concordia University, May 2025.
- T. Fujimura, I. Kuroyanagi, T. Toda, "The NU systems for DCASE 2025 Challenge Task 2," Technical report, DCASE Task 2, 5 pages, July 2025.
- F. Li, F. Shen, D. Ma, J. Zhou, L. Wang, F. Fan, X. Chen, T. Toda, H. Niu, "Mandarin speech reconstruction from neck and facial surface electromyography," Proc. IEEE EMBC, Research poster presentation, 4 pages, Copenhagen, Denmark, July 2025.
- L.P. Violeta, D. Ma, W.-C. Huang, T. Toda, "Pretraining and adaptation techniques for electrolaryngeal speech recognition," EUSIPCO, SPS journal paper presentation, Palermo, Italy, Sep. 2025.
- "名古屋大学「AIP加速課題:発声技能拡張PJ」," CEATEC 2025, 展示, 千葉, Oct. 2025.
Doctoral dissertations
- Shuming Luan, "Generalized sound field interpolation in rotation-robust microphone array signal processing," Doctoral dissertation, Department of Intelligent Systems, Graduate School of Informatics, July 2025.