Publications

2025

  1. Gaze Beyond the Frame: Forecasting Egocentric 3D Visual Span
    Heeseung Yun, Joonil Na, Jaeyeon Kim, Calvin Murdock, and Gunhee Kim
    In NeurIPS (Spotlight), 2025
  2. WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations
    Jaeyeon Kim, Heeseung Yun, Sang Hoon Woo, Chao-Han Huck Yang, and Gunhee Kim
    arXiv preprint arXiv:2508.20976, 2025
  3. Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge
    Chao-Han Huck Yang, Sreyan Ghosh, Qing Wang, Jaeyeon Kim, Hengyi Hong, and 6 more authors
    arXiv preprint arXiv:2505.07365, 2025
  4. ViSAGe: Video-to-Spatial Audio Generation
    Jaeyeon Kim, Heeseung Yun, and Gunhee Kim
    In ICLR, 2025

2024

  1. EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
    Jaeyeon Kim, Minjeong Jeon, Jaeyoon Jung, Sang Hoon Woo, and Jinjoo Lee
    In DCASE2024 Workshop, 2024
  2. Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
    Jaeyeon Kim, Jaeyoon Jung, Minjeong Jeon, Sang Hoon Woo, and Jinjoo Lee
    DCASE2024 Challenge Technical Report, 2024
  3. Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
    Jaeyeon Kim, Injune Hwang, and Kyogu Lee
    In ICASSP, 2024
  4. EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
    Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, and Sang Hoon Woo
    In ICASSP, 2024

2023

  1. PITS: Variational pitch inference without fundamental frequency for end-to-end pitch-controllable TTS
    Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, and Jaehwan Kim
    In ICML Workshop on Structured Probabilistic Inference & Generative Modeling, 2023