AI Voiceover with GPT4V

This Streamlit application, along with a Jupyter notebook implementation, demonstrates the use of AI and machine learning to automate the process of generating voiceovers for videos. The solution involves processing video, generating narratives based on the video content, converting the narratives to audio, and then merging the audio back into the video for a complete voiceover experience.

Demo

huggingface.co/spaces/martintmv/gpt4v-voiceover

Input Video

input.mp4

Output Video

AI-output.mp4

Jupyter Notebook

Features

Video Processing: Converts a video into frames using OpenCV.
Narrative Generation: Utilizes OpenAI's GPT-4 Vision model to create stories or scripts based on the video frames.
Voiceover Generation: Converts the generated text into a voiceover using ElevenLabs's text-to-speech API.
Audio and Video Merging: Combines the generated voiceover with the original video, extending or trimming the video as needed to match the voiceover duration.

Workflow

Environment Setup: Load necessary API keys and configurations.
Video to Frames: Convert a video into individual frames suitable for AI processing.
AI-Generated Script: Use OpenAI's GPT-4 model to create a script based on the video frames.
Text to Speech: Convert the script to audio with OpenAI's or ElevenLabs's TTS service.
Video Finalization: Merge the audio back into the video, adjusting the video duration to match the audio if necessary.

Jupyter Notebook Implementation

The Jupyter notebook voiceover_jupyter-notebook.ipynb includes the full implementation of the AI voiceover process:

Extracting Video Frames: Load a video file and extract frames as base64-encoded images.
AI Script Generation: Send the frames to OpenAI's GPT-4 model to generate a voiceover script.
Text-to-Speech Conversion: Convert the script into a voiceover audio file using OpenAI's or ElevenLabs's TTS service.

The notebook provides a step-by-step guide, complete with code and markdown explanations, to illustrate the entire process of creating an AI-generated voiceover for video content.

Dependencies

python-dotenv: For loading environment variables.
moviepy: For video and audio processing.
opencv-python: For handling video frames.
openai: For accessing OpenAI's GPT-4 API.
requests: For making HTTP requests to the TTS API.
streamlit: For creating the web-based UI (for the Streamlit app).

Requirements

An OpenAI API key and/or ElevenLabs API key are required.
Python 3.x and the above-mentioned libraries.

Disclaimer

This project is for demonstration purposes and showcases the integration of AI models with video and audio processing in Python, using both a Streamlit app and a Jupyter Notebook.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt
voiceover.ipynb		voiceover.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

app.py

app.py

requirements.txt

requirements.txt

voiceover.ipynb

voiceover.ipynb

Repository files navigation

AI Voiceover with GPT4V

Demo

Input Video

Output Video

Jupyter Notebook

Features

Workflow

Jupyter Notebook Implementation

Dependencies

Requirements

Disclaimer

About

Releases

Packages

Languages

License

martintmv-git/gpt4v-streamlit-voiceover

Folders and files

Latest commit

History

Repository files navigation

AI Voiceover with GPT4V

Demo

Input Video

Output Video

Jupyter Notebook

Features

Workflow

Jupyter Notebook Implementation

Dependencies

Requirements

Disclaimer

About

Topics

Resources

License

Stars

Watchers

Forks

Languages