Speech to Text to Image Generation

I built an app that lets you record your voice, see the text transcribed from it, and view an image generated from that text.
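The app's flow is a two-step pipeline: audio goes in, text comes out of the first step, and that text becomes the prompt for the second. Below is a minimal sketch of that composition. The step functions are passed in so either service can be swapped or mocked. The names `speechToImage`, `Transcribe`, and `GenerateImage` are hypothetical and are not taken from the repository.

```typescript
// Hypothetical step signatures; the real app may structure these differently.
type Transcribe = (audio: Blob) => Promise<string>;
type GenerateImage = (prompt: string) => Promise<string>; // resolves to an image URL

interface SpeechToImageResult {
  text: string;     // text transcribed from the recording
  imageUrl: string; // image generated from that text
}

// Run speech-to-text, then use the resulting text as the image prompt.
async function speechToImage(
  audio: Blob,
  transcribe: Transcribe,
  generateImage: GenerateImage,
): Promise<SpeechToImageResult> {
  const text = (await transcribe(audio)).trim();
  // An empty transcript (silence, noise) would make a meaningless prompt.
  if (text.length === 0) {
    throw new Error("No speech detected; nothing to generate an image from.");
  }
  const imageUrl = await generateImage(text);
  return { text, imageUrl };
}
```

Keeping the steps as parameters also makes the pipeline easy to test with stub functions instead of live API calls.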

I turn my audio into text using Whisper, OpenAI's speech recognition model, which transcribes audio with a reported accuracy of up to 99%. Whisper comes from the creators of ChatGPT, and the model is open source, so anyone can use it free of charge. It was trained on 680,000 hours of speech data collected from the web and recognizes 99 languages.
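One way to call Whisper from TypeScript is OpenAI's hosted transcription endpoint, which accepts a multipart upload with a `file` and a `model` field and returns JSON with a `text` field. The sketch below assumes the app uses that endpoint with the `whisper-1` model rather than running the open-source model locally. Unlike the open-source model, the hosted endpoint is a paid API. The function names and the default file name are hypothetical.

```typescript
// Assumption: the app calls OpenAI's hosted Whisper API, not a local model.
const TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Build the multipart request separately so it can be inspected without a network call.
function buildTranscriptionRequest(
  apiKey: string,
  audio: Blob,
  filename = "recording.webm", // hypothetical; use an extension matching the recording's format
): { url: string; init: RequestInit } {
  const form = new FormData();
  form.append("file", audio, filename); // the audio to transcribe
  form.append("model", "whisper-1");    // OpenAI's hosted Whisper model
  return {
    url: TRANSCRIPTIONS_URL,
    // No Content-Type header: fetch sets the multipart boundary itself.
    init: { method: "POST", headers: { Authorization: `Bearer ${apiKey}` }, body: form },
  };
}

// Send the request and return the transcribed text.
async function transcribe(apiKey: string, audio: Blob, filename?: string): Promise<string> {
  const { url, init } = buildTranscriptionRequest(apiKey, audio, filename);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);
  const data = (await res.json()) as { text: string }; // default JSON response shape
  return data.text;
}
```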

I generate images from the transcribed text using Replicate, a service that runs machine learning models in the cloud. It hosts a library of open-source models that you can run with a few lines of code.
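Replicate's HTTP API runs a model as a "prediction": you POST the model version and its inputs, then poll the prediction's `get` URL until its status is `succeeded`, `failed`, or `canceled`. This is a minimal sketch of that loop. It assumes the chosen model takes a `prompt` input and returns a list of image URLs, which is common for text-to-image models but varies by model. The `version` argument is a placeholder for the version id shown on the model's page; `generateImage` and its parameters are hypothetical, and `fetchFn` is injectable so the loop can be tested offline.

```typescript
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

// Subset of the prediction object this sketch relies on.
interface Prediction {
  id: string;
  status: "starting" | "processing" | "succeeded" | "failed" | "canceled";
  output: string[] | null; // assumption: an array of image URLs for this model
  error: string | null;
  urls: { get: string };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function generateImage(
  prompt: string,
  token: string,
  version: string, // placeholder: copy the real version id from the model's page
  fetchFn: FetchFn = fetch,
  pollMs = 1000,
): Promise<string> {
  const headers = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };
  // 1. Start the prediction; Replicate responds right away with its current status.
  let res = await fetchFn("https://api.replicate.com/v1/predictions", {
    method: "POST",
    headers,
    body: JSON.stringify({ version, input: { prompt } }), // `prompt` is model-specific
  });
  if (!res.ok) throw new Error(`Replicate returned HTTP ${res.status}`);
  let prediction = (await res.json()) as Prediction;

  // 2. Poll until the prediction reaches a terminal state.
  while (prediction.status === "starting" || prediction.status === "processing") {
    await sleep(pollMs);
    res = await fetchFn(prediction.urls.get, { headers });
    if (!res.ok) throw new Error(`Replicate returned HTTP ${res.status}`);
    prediction = (await res.json()) as Prediction;
  }

  const output = prediction.output;
  if (prediction.status !== "succeeded" || !output || output.length === 0) {
    throw new Error(`Prediction ${prediction.status}: ${prediction.error ?? "no output"}`);
  }
  return output[0]; // URL of the first generated image
}
```

A fixed one-second poll keeps the sketch simple; a real app might back off or cap the total wait.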

To get started:

       Clone the repository

       git clone https://github.com/Ashot72/Speech-to-Text-to-Image
       cd Speech-to-Text-to-Image

       Add your keys to the .env file

       # install dependencies
       npm install

       # run locally
       npm start
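Because the app needs keys for both OpenAI and Replicate, it can help to fail fast at startup when one is missing from `.env`. A minimal sketch follows. The variable names `OPENAI_API_KEY` and `REPLICATE_API_TOKEN` are hypothetical; check the repository's `index.ts` for the names it actually reads. It also assumes the `.env` file is loaded into `process.env` beforehand, for example with the `dotenv` package.

```typescript
// Hypothetical key names; a matching .env file would contain lines like:
//   OPENAI_API_KEY=...
//   REPLICATE_API_TOKEN=...
const REQUIRED_KEYS = ["OPENAI_API_KEY", "REPLICATE_API_TOKEN"] as const;

// Return the names of required keys that are absent or blank.
function missingKeys(
  env: Record<string, string | undefined>,
  keys: readonly string[] = REQUIRED_KEYS,
): string[] {
  return keys.filter((k) => !env[k]?.trim());
}
```

At startup the app would call `missingKeys(process.env)` and exit with a clear message if the result is non-empty, instead of failing later on the first API request.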

Go to the Speech to Text to Image Generation video page

Go to the Speech to Text to Image Generation description page
