List of publications in fiscal year 2025
Journal papers
- S. Luan, Y. Wakabayashi, T. Toda, "Generalized sound field interpolation for freely spaced microphone arrays in rotation-robust beamforming," Applied Acoustics, Vol. 236, Article 110706, pp. 1-15, Apr. 2025.
- M. Eshghi, T. Toda, "Predicting fundamental frequency patterns in electrolaryngeal speech using automated phoneme extraction," IEEE Access, Vol. 13, pp. 73831-73847, Apr. 2025.
- Y. Ohtani, T. Okamoto, T. Toda, H. Kawai, "Fast neural vocoder with fundamental frequency control using finite impulse response filters," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 1893-1906, Apr. 2025.
- D. Ma, Y. Choi, T. Fujimura, F. Li, C. Xie, K. Kobayashi, T. Toda, "Sequence-to-sequence voice conversion-based techniques for electrolaryngeal speech enhancement in noisy and reverberant conditions," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e8, pp. 1-40, May 2025.
- C. Xie, T. Toda, "An investigation of noisy-to-noisy voice conversion performance in various noisy conditions," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e10, pp. 1-30, June 2025.
- T. Fujimura, T. Toda, "Analysis and extension of noisy-target training for unsupervised target signal enhancement," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e12, pp. 1-27, June 2025.
- I. Kuroyanagi, T. Fujimura, K. Takeda, T. Toda, "Improving anomalous sound detection through pseudo-anomalous set selection and pseudo-label utilization under unlabeled conditions," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e13, pp. 1-28, June 2025.
- J. He, T. Toda, "PMF-CEC: phoneme-augmented multimodal fusion for context-aware ASR error correction with error-specific selective decoding," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2402-2417, June 2025.
- Y. Choi, C. Xie, T. Toda, "Noise and reverberation-controllable voice conversion," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 2430-2443, June 2025.
- T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai, "Phoneme-level duration controllable neural text-to-speech with phoneme embedding skip connection and modified Gaussian duration modeling," IEEE Access, Vol. 13, pp. 118369-118380, July 2025.
- Y. Hashizume, L. Li, A. Miyashita, T. Toda, "Learning separated representations for instrument-based music similarity," APSIPA Transactions on Signal and Information Processing, Vol. 14, No. 1, e16, pp. 1-32, July 2025.
- D. Ma, L.P. Violeta, K. Kobayashi, T. Toda, "Pretraining and fine-tuning techniques for electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion," IEEE Transactions on Audio, Speech and Language Processing, Vol. 33, pp. 3189-3201, July 2025.
International conferences
- Y. Hashizume, T. Toda, "Investigation of perceptual music similarity focusing on each instrumental part," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- T. Fujimura, I. Kuroyanagi, T. Toda, "Improvements of discriminative feature space training for anomalous sound detection in unlabeled conditions," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- K. Nishizawa, R. Yamamoto, W.-C. Huang, T. Toda, "Investigating factors related to the naturalness of synthesized unison singing," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai, "Mora-level prosody prediction for text-to-speech using Japanese BERT without accentual labels," Proc. IEEE ICASSP, 5 pages, Hyderabad, India, Apr. 2025.
- D. Ma, J. Mi, F. Li, L.P. Violeta, K. Kobayashi, T. Toda, "Improving electrolaryngeal speech enhancement via a representation learning method based on integrated text and speech representations," Proc. IEEE EMBC, 6 pages, Copenhagen, Denmark, July 2025. [3rd Place Award in EMBC 2025 Student Paper Competition (recipient: Ding Ma)]
- T. Ogura, T. Okamoto, Y. Ohtani, E. Cooper, T. Toda, H. Kawai, "GST-BERT-TTS: prosody prediction without accentual labels for multi-speaker TTS using BERT with global style tokens," Proc. INTERSPEECH, pp. 444-448, Rotterdam, the Netherlands, Aug. 2025.
- X. Shi, X. Li, T. Toda, "Who, When, and What: leveraging the 'Three Ws' concept for emotion recognition in conversation," Proc. INTERSPEECH, pp. 1763-1767, Rotterdam, the Netherlands, Aug. 2025.
- W.-C. Huang, E. Cooper, T. Toda, "SHEET: a multi-purpose open-source speech human evaluation estimation toolkit," Proc. INTERSPEECH, pp. 2355-2359, Rotterdam, the Netherlands, Aug. 2025.
- J. He, N. Sawada, K. Miyazaki, T. Toda, "CMT-LLM: context-aware multi-talker ASR utilizing large language models," Proc. INTERSPEECH, pp. 2575-2579, Rotterdam, the Netherlands, Aug. 2025.
- J. He, J. Mi, T. Toda, "GIA-MIC: multimodal emotion recognition with gated interactive attention and modality-invariant learning constraints," Proc. INTERSPEECH, pp. 2695-2699, Rotterdam, the Netherlands, Aug. 2025.
- B. Halpern, T. Tienkamp, T. Rebernik, R. van Son, M. Wieling, D. Abur, T. Toda, "Relationship between objective and subjective perceptual measures of speech in individuals with head and neck cancer," Proc. INTERSPEECH, pp. 3733-3737, Rotterdam, the Netherlands, Aug. 2025.
- X. Shi, X. Li, T. Toda, "Speaker-aware multi-task learning for speech emotion recognition," Proc. INTERSPEECH, pp. 4333-4337, Rotterdam, the Netherlands, Aug. 2025.
- X. Shi, J. Mi, X. Li, T. Toda, "Advancing emotion recognition via ensemble learning: integrating speech, context, and text representations," Proc. INTERSPEECH, pp. 4693-4697, Rotterdam, the Netherlands, Aug. 2025.
- R. Yoneyama, M. Kawamura, R. Terashima, R. Yamamoto, T. Toda, "Comparative analysis of fast and high-fidelity neural vocoders for low-latency streaming synthesis in resource-constrained environments," Proc. INTERSPEECH, pp. 4888-4892, Rotterdam, the Netherlands, Aug. 2025.
- Z. Zhang, W.-C. Huang, X. Wang, X. Miao, J. Yamagishi, "Mitigating language mismatch in SSL-based speaker anonymization," Proc. INTERSPEECH, pp. 5133-5137, Rotterdam, the Netherlands, Aug. 2025.
- C.-H. Hu, Y. Yasuda, A. Yoshimoto, T. Toda, "Unifying listener scoring scales: comparison learning framework for speech quality assessment and continuous speech emotion recognition," Proc. INTERSPEECH, pp. 5428-5432, Rotterdam, the Netherlands, Aug. 2025.
- M. Murata, K. Miyazaki, T. Koriyama, T. Toda, "Eigenvoice synthesis based on model editing for speaker generation," Proc. INTERSPEECH, pp. 5523-5527, Rotterdam, the Netherlands, Aug. 2025.
- Y. Yasuda, J. Yamagishi, T. Toda, "Continual subjective evaluation method of speech by merging sort-based preference tests towards ever-expanding corpus of human ratings," Proc. SSW, pp. 14-20, Leeuwarden, the Netherlands, Aug. 2025.
Tutorials
- W.-C. Huang, E. Cooper, J. Shi, "Automatic quality assessment for speech and beyond," Tutorial, INTERSPEECH, Rotterdam, the Netherlands, Aug. 2025.
Invited talks
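- 米山 怜於, "ニューラルボコーダ概説:生成モデルと実用性の観点から" [An overview of neural vocoders: from the perspectives of generative models and practicality], 音学シンポジウム, Invited talk, Tokyo, June 2025.
- 戸田 智基, "音声研究の知見がニューラルボコーダの発展にもたらす効果" [Effects of knowledge from speech research on the development of neural vocoders], 音学シンポジウム, Invited talk, Tokyo, June 2025.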
- T. Toda, "Recent advances and future directions in voice conversion," Survey Talk, INTERSPEECH, Rotterdam, the Netherlands, Aug. 2025.
Technical meetings
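- 藤村 拓弥, "ICASSP2025における異常音検知の動向" [Trends in anomalous sound detection at ICASSP 2025], IEICE Technical Report, Vol. 125, No. 36, EA2025-1, pp. 1-6, May 2025.
IEICE Technical Committee on Engineering Acoustics (応用音響研究会), Organized session, May 2025.
- 橋爪 優果, "ICASSP2025における音楽情報処理の動向" [Trends in music information processing at ICASSP 2025], IEICE Technical Report, Vol. 125, No. 36, EA2025-3, pp. 13-17, May 2025.
- 米山 怜於, "ニューラルボコーダ概説:生成モデルと実用性の観点から" [An overview of neural vocoders: from the perspectives of generative models and practicality], IPSJ SIG Technical Report, Vol. 2025-SLP-156, No. 3, 1 page, June 2025.
- 戸田 智基, "音声研究の知見がニューラルボコーダの発展にもたらす効果" [Effects of knowledge from speech research on the development of neural vocoders], IPSJ SIG Technical Report, Vol. 2025-SLP-156, No. 4, 1 page, June 2025.
- 宮司 光梨, 澤田 桂都, ホワン ウェンチン, 戸田 智基, "制御性の高いピアノ自動編曲に向けた楽曲難易度指標の設計" [Design of a piece difficulty measure toward highly controllable automatic piano arrangement], IPSJ SIG Technical Report, Vol. 2025-MUS-143, No. 8, pp. 1-7, June 2025.
- 山下 陽生, 岡本 拓磨, 高島 遼一, 大谷 大和, 滝口 哲也, 戸田 智基, 河井 恒, "重み付きAttentionのアライメント機構を用いた系列変換型声質変換" [Sequence-to-sequence voice conversion using a weighted-attention alignment mechanism], IPSJ SIG Technical Report, Vol. 2025-SLP-143, No. 75, pp. 1-6, June 2025. [音学シンポジウム2025 Excellent Presentation Award (recipient: 山下 陽生)]
- 服部 公宏, ホワン ウェンチン, 武田 一哉, 戸田 智基, "多様なシミュレーション音場における教師あり仮想マイクアレイ信号推定の汎化性能評価" [Evaluation of the generalization performance of supervised virtual microphone array signal estimation in diverse simulated sound fields], IEICE Technical Report, Vol. 125, No. 74, SP2025-20, pp. 107-112, June 2025.
- W.-C. Huang, L.P. Violeta, T. Toda, "JATTS: a comparison-oriented Japanese text-to-speech open-sourced toolkit," IEICE Technical Report, Vol. 125, No. 74, SP2025-22, pp. 119-124, June 2025.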
Other presentations
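- 西尾 直樹, 小林 和弘, 戸田 智基, 横井 紗矢香, 向山 宣昭, 和田 明久, 横井 麻衣, 重山 真由, 三谷 壮平, 曾根 三千彦, "電気のコエから自分のコエへ -Save the Voice Project-" [From an electric voice to one's own voice: Save the Voice Project], 日本気管食道科学会会報 (Journal of the Japan Broncho-Esophagological Society), Special Feature 5, Panel Discussion 1: Communication support after laryngectomy, Vol. 76, No. 2, p. 108, Apr. 2025.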
- W.-C. Huang, "Automatic quality assessment for speech and beyond," Talk, Conversational AI Reading Group, Mila/Concordia University, May 2025.
- T. Fujimura, I. Kuroyanagi, T. Toda, "The NU systems for DCASE 2025 Challenge Task 2," Technical report, DCASE Task 2, 5 pages, July 2025.
- F. Li, F. Shen, D. Ma, J. Zhou, L. Wang, F. Fan, X. Chen, T. Toda, H. Niu, "Mandarin speech reconstruction from neck and facial surface electromyography," Proc. IEEE EMBC, Research poster presentation, 4 pages, Copenhagen, Denmark, July 2025.