Audio tour guide website with K-POP celebrity's voice using Text-To-Speech model


PSY222/CelebVoiceTour


Tour AI Hackathon: CelebVoiceTour ⭐🔊

'내귀에 셀럽' (CelebVoiceTour, literally "Celeb in My Ear") applies voice conversion AI to the Korea Tourism Organization's audio guide API to give foreign tourists visiting Korea a unique travel experience. It is a web service in which tourists hear audio guides narrated in the voice of their favorite K-POP idol, together with a playlist of that artist's songs, adding something special to their journey.

🔨 Tech Stack

  • IDE: IntelliJ IDEA
  • Build automation tool: Gradle
  • Database: MySQL
  • Web framework: Spring Boot

📌 AI Tech

Bark model: a transformer-based text-to-audio model by Suno AI
Bark chains three transformer models, each handling a distinct stage of converting text into audio. First, the input text is tokenized with a BERT tokenizer from Hugging Face, and a causal transformer (the semantic, or "text", model) predicts semantic tokens that encode the content of the audio to be generated. Next, a second causal transformer (the coarse acoustics model) converts the semantic tokens into coarse tokens, namely the first two codebooks of Meta's (Facebook's) EnCodec audio codec. Finally, a non-causal transformer (the fine acoustics model) predicts the remaining codebooks from those first two, yielding all 8 EnCodec codebooks for finer audio detail, and the EnCodec decoder turns these codes into the output waveform. (See the Hugging Face model details.)
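To make the three-stage token flow concrete, here is a minimal Python sketch in which each transformer is replaced by a hypothetical stub that only emits random token IDs of the right shape. The constants (10,000 semantic tokens, 1,024 entries per codebook, about 49.9 semantic tokens per second, 75 EnCodec frames per second at 24 kHz) are assumed to mirror Bark's reference implementation and should be checked against the suno-ai/bark source; the function names are illustrative and are not Bark's API.

```python
import random
from typing import List

# Assumed to mirror constants in Bark's reference implementation (verify before relying on them).
SEMANTIC_VOCAB_SIZE = 10_000   # size of the semantic-token vocabulary
CODEBOOK_SIZE = 1024           # entries per EnCodec codebook
N_COARSE_CODEBOOKS = 2         # codebooks predicted by the coarse model
N_FINE_CODEBOOKS = 8           # total codebooks after the fine model
SEMANTIC_RATE_HZ = 49.9        # semantic tokens per second of audio
COARSE_RATE_HZ = 75            # EnCodec frames per second (24 kHz model)
SAMPLE_RATE = 24_000           # output sample rate of the EnCodec decoder

Codes = List[List[int]]  # one row per codebook, one column per EnCodec frame


def text_to_semantic(text_ids: List[int], duration_s: float, rng: random.Random) -> List[int]:
    """Stage 1 stub: a causal transformer would map BERT token IDs to semantic tokens."""
    # A real model conditions on text_ids and decides when to stop; this stub only uses duration_s.
    n_tokens = round(duration_s * SEMANTIC_RATE_HZ)
    return [rng.randrange(SEMANTIC_VOCAB_SIZE) for _ in range(n_tokens)]


def semantic_to_coarse(semantic: List[int], rng: random.Random) -> Codes:
    """Stage 2 stub: predict the first two EnCodec codebooks from semantic tokens."""
    n_frames = round(len(semantic) * COARSE_RATE_HZ / SEMANTIC_RATE_HZ)
    return [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
            for _ in range(N_COARSE_CODEBOOKS)]


def coarse_to_fine(coarse: Codes, rng: random.Random) -> Codes:
    """Stage 3 stub: keep the two coarse codebooks and fill in the remaining six."""
    n_frames = len(coarse[0])
    extra = [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
             for _ in range(N_FINE_CODEBOOKS - N_COARSE_CODEBOOKS)]
    return [row[:] for row in coarse] + extra


def decoded_num_samples(fine: Codes) -> int:
    """EnCodec's decoder (not stubbed here) turns each frame into SAMPLE_RATE / COARSE_RATE_HZ samples."""
    return len(fine[0]) * (SAMPLE_RATE // COARSE_RATE_HZ)


if __name__ == "__main__":
    rng = random.Random(0)
    text_ids = [101, 7592, 102]  # placeholder BERT-style token IDs
    semantic = text_to_semantic(text_ids, duration_s=2.0, rng=rng)
    coarse = semantic_to_coarse(semantic, rng)
    fine = coarse_to_fine(coarse, rng)
    print(len(semantic), len(coarse), len(fine), len(fine[0]), decoded_num_samples(fine))
```

Running the script prints `100 2 8 150 48000`: two seconds of audio become 100 semantic tokens, then 2 and finally 8 codebooks of 150 frames each, which decode to 48,000 samples at 24 kHz.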

This service's voice cloning builds on the excellent work from Serp-Ai. The K-POP celebrity's voice was extracted from YouTube videos, and background noise was reduced with Audacity. I then converted the celebrity's audio files into semantic tokens using a HuBERT model and tokenizer. Finally, I ran numerous experiments generating a custom voice by processing pre-generated voice with the target celebrity's .npz token file.
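Packaging a celebrity's tokens as an `.npz` voice prompt can be sketched in Python as below. The sketch assumes the key layout Bark uses for its history prompts (`semantic_prompt`, `coarse_prompt`, `fine_prompt`) and assumes, as in the Serp-Ai voice-clone workflow to my understanding, that the coarse prompt is simply the first two rows of the 8-codebook EnCodec codes. The input arrays stand in for the outputs of the HuBERT tokenizer and the EnCodec encoder; `build_voice_prompt` and `save_voice_prompt` are hypothetical helper names.

```python
import numpy as np

N_FINE_CODEBOOKS = 8    # EnCodec codebooks used by Bark (assumed)
N_COARSE_CODEBOOKS = 2  # leading codebooks reused as the coarse prompt (assumed)


def build_voice_prompt(semantic_tokens, encodec_codes):
    """Bundle HuBERT-derived semantic tokens and EnCodec codes into a Bark-style prompt dict."""
    semantic = np.asarray(semantic_tokens, dtype=np.int64)
    fine = np.asarray(encodec_codes, dtype=np.int64)
    if semantic.ndim != 1:
        raise ValueError("semantic_tokens must be a 1-D sequence")
    if fine.ndim != 2 or fine.shape[0] != N_FINE_CODEBOOKS:
        raise ValueError(f"encodec_codes must have shape ({N_FINE_CODEBOOKS}, n_frames)")
    return {
        "semantic_prompt": semantic,
        "coarse_prompt": fine[:N_COARSE_CODEBOOKS, :],
        "fine_prompt": fine,
    }


def save_voice_prompt(path, prompt):
    """Write the prompt as an uncompressed .npz archive, one array per key."""
    np.savez(path, **prompt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    semantic = rng.integers(0, 10_000, size=150)   # placeholder for HuBERT semantic tokens
    codes = rng.integers(0, 1024, size=(8, 225))   # placeholder for EnCodec codes
    save_voice_prompt("celeb_voice.npz", build_voice_prompt(semantic, codes))
```

Validating the shapes before saving catches the most common mistake in this step, a codes array with the wrong number of codebooks, before it surfaces as a confusing error at generation time.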

I have also shared a 'VITS_model_with Whisper tutorial' for anyone who wants to try custom voice cloning with a VITS model and Whisper.
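One common reason to pair Whisper with VITS is to transcribe short voice clips automatically, because single-speaker VITS training reads a filelist of `audio_path|transcript` lines. The sketch below assumes that workflow and is not taken from the tutorial. `transcribe_clips` uses the open-source `whisper` package (`whisper.load_model`, `model.transcribe`) and imports it lazily; `to_vits_filelist` is a hypothetical helper that formats the result.

```python
from pathlib import Path
from typing import Dict, List


def transcribe_clips(clip_paths: List[str], model_name: str = "base") -> Dict[str, str]:
    """Transcribe each clip with OpenAI's open-source Whisper package (pip install openai-whisper)."""
    import whisper  # imported lazily so the formatting helper works without Whisper installed

    model = whisper.load_model(model_name)
    return {path: model.transcribe(path)["text"] for path in clip_paths}


def to_vits_filelist(transcripts: Dict[str, str]) -> str:
    """Format transcripts as VITS-style 'path|text' lines, skipping empty transcripts."""
    lines = []
    for path, text in sorted(transcripts.items()):
        # '|' is the field separator, so replace it; then collapse Whisper's stray spaces and newlines.
        cleaned = " ".join(text.replace("|", " ").split())
        if cleaned:
            lines.append(f"{Path(path).as_posix()}|{cleaned}")
    return "\n".join(lines) + "\n" if lines else ""
```

For example, `to_vits_filelist({"clips/001.wav": " Hello there."})` returns `"clips/001.wav|Hello there.\n"`; the transcripts dictionary would normally come from `transcribe_clips`.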

Refer to this article for a more detailed, step-by-step walkthrough.
