Publications

2025

  1. ViSAGe: Video-to-Spatial Audio Generation
    Jaeyeon Kim, Heeseung Yun, and Gunhee Kim
    In ICLR, 2025

2024

  1. EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
    Jaeyeon Kim, Minjeong Jeon, Jaeyoon Jung, Sang Hoon Woo, and Jinjoo Lee
    In DCASE2024 Workshop, 2024
  2. Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
    Jaeyeon Kim, Jaeyoon Jung, Minjeong Jeon, Sang Hoon Woo, and Jinjoo Lee
    DCASE2024 Challenge Technical Report, 2024
  3. Learning Semantic Information from Raw Audio Signal Using Both Contextual and Phonetic Representations
    Jaeyeon Kim, Injune Hwang, and Kyogu Lee
    In ICASSP, 2024
  4. EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
    Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, and Sang Hoon Woo
    In ICASSP, 2024

2023

  1. PITS: Variational pitch inference without fundamental frequency for end-to-end pitch-controllable TTS
    Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, and Jaehwan Kim
    In ICML Workshop on Structured Probabilistic Inference & Generative Modeling, 2023