Engage with Google's Gemini AI directly from your terminal, with colored output. Switch between text queries and interactive image inputs without leaving the session. Perfect for Linux enthusiasts, developers, and AI enthusiasts alike!
- Terminal-Based Interface - Interact with Google's Gemini AI directly from the comfort of your terminal.
- Multimodal Support - Accepts both images and text as input.
- Text-to-Text (T2T) Model - Engage in conversations by entering text queries and receiving text responses.
- ANSI Formatting - Enhanced user experience with colorful text outputs.
- Dynamic Interactions - Seamless switching between text and image modes for versatile interactions.
- Configuration - Easily configure settings, including the API key, through the key.json file.
- As part of my Go learning journey, I've decided to recreate this project using Go. This serves as a practical exercise to enhance my skills and gain hands-on experience. Expect it in the near future on my GitHub profile.
- To use the Gemini API, you need an API key. You can create one with a single click in Google AI Studio. To read the documentation, visit ai.google.dev.
- To get the API key, visit Google AI Studio.
- Python 3.x
- Pip
- google-generativeai
  Enables developers to use Google's state-of-the-art generative AI models to build AI-powered features and applications.
  pip install google-generativeai
- pillow
  Provides image processing capabilities.
  pip install pillow
- pyperclip
  A cross-platform Python module for interacting with the clipboard.
  pip install pyperclip
git clone https://github.com/mr-alham/Google-Gemini-AI-on-the-Terminal.git
pip install -r requirements.txt
- API key
  - Locate the key.json file in this project's directory.
  - Inside the file, you'll find a line that looks like this:
    "GEMINI_API_KEY": "your gemini key here"
  - Replace "your gemini key here" with the API key you obtained in the step above. Be sure to keep the quotation marks around your key.
  - Save the key.json file.
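As a rough illustration of how a script could read that key, here is a minimal sketch. It assumes key.json is a flat JSON object; the helper name `load_api_key` is hypothetical and not taken from this project's main.py.

```python
import json
from pathlib import Path

# Placeholder text that key.json ships with, as described above.
PLACEHOLDER = "your gemini key here"


def load_api_key(config_path="key.json"):
    """Return the Gemini API key stored in the JSON config file.

    Hypothetical helper: assumes the file is a JSON object with a
    "GEMINI_API_KEY" entry.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    api_key = config.get("GEMINI_API_KEY", "").strip()
    # Fail early if the key is missing or was never filled in.
    if not api_key or api_key == PLACEHOLDER:
        raise ValueError(f"Set GEMINI_API_KEY in {config_path} before running.")
    return api_key
```

Failing early on the untouched placeholder gives a clearer message than an authentication error from the API later on.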
- Gemini Model *not required
  - The Gemini API offers different models that are optimized for specific use cases.
  - Read the documentation and select your preferred one.
  - The default model is gemini-1.5-pro-latest.
- Safety Settings & Generation Configuration *not required
  - Read the documentation and edit these as needed.
  - Model Parameters Documentation
  - Safety Settings Documentation
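For orientation, the sketch below shows roughly what these optional settings look like when passed to the google-generativeai library. The numeric values and the threshold are illustrative assumptions, not this project's defaults, and `build_model` is a hypothetical helper name.

```python
# Generation parameters. The keys follow the Gemini API; the values
# here are illustrative, not this project's defaults.
GENERATION_CONFIG = {
    "temperature": 0.7,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 2048,
}

HARM_CATEGORIES = (
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
)

# One entry per harm category, all using the same (assumed) threshold.
SAFETY_SETTINGS = [
    {"category": category, "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
    for category in HARM_CATEGORIES
]


def build_model(api_key, model_name="gemini-1.5-pro-latest"):
    """Hypothetical helper: create a model with the settings above."""
    # Imported lazily so the settings can be inspected without the library.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    return genai.GenerativeModel(
        model_name=model_name,
        generation_config=GENERATION_CONFIG,
        safety_settings=SAFETY_SETTINGS,
    )
```

Check the Model Parameters and Safety Settings documentation for the values each field accepts before changing them.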
Run the script using Python
python3 main.py
Command-Line Arguments
--image: starts the script in multimodal mode, where you provide an image file path and a prompt.
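A flag like this could be handled with the standard argparse module. The sketch below is an assumption about the approach, not a copy of main.py; it treats --image as a boolean switch, and the function names are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Chat with Gemini from the terminal."
    )
    # Assumed to be a plain on/off switch; the image path is asked for later.
    parser.add_argument(
        "--image",
        action="store_true",
        help="start in image (multimodal) mode",
    )
    return parser.parse_args(argv)


def starting_mode(argv=None):
    """Return "image" if --image was given, otherwise "text"."""
    return "image" if parse_args(argv).image else "text"
```

With this sketch, `python3 main.py` would start in text mode and `python3 main.py --image` in image mode.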
Text-to-Text (T2T) Model
- If the script receives no arguments, it starts in Text-to-Text mode.
- In Text-to-Text mode, you can hold a conversation with Gemini.
- To switch to multimodal mode, type Image Mode.
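The text-mode behavior described above can be sketched as a small loop. This is a simplified illustration, not the project's actual code: `run_text_chat` and its parameters are hypothetical, and it assumes the switch phrase must match "Image Mode" exactly.

```python
def run_text_chat(send, read=input, write=print):
    """Relay prompts to `send` until the user asks for image mode.

    `send` takes the user's text and returns the reply text.
    Returns "image" when the user types "Image Mode", or "quit" when
    input ends (for example, Ctrl+D).
    """
    while True:
        try:
            query = read("You: ").strip()
        except EOFError:
            return "quit"
        if not query:
            continue  # ignore empty lines
        if query == "Image Mode":
            return "image"
        write(send(query))
```

With google-generativeai, `send` could wrap a chat session, for example `chat = model.start_chat(history=[])` followed by `run_text_chat(lambda text: chat.send_message(text).text)`, so the conversation history is kept between turns.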
Multimodal Mode
- If the script receives the --image argument, it starts in multimodal mode.
- Or, if the query in T2T mode is equal to Image Mode, the script switches to multimodal mode.
- You can give the path to the image file by:
  - Manually entering the path to the image
  - Copying the path to the clipboard and entering clip
  - Or copying the path and pressing the Enter key
- To switch back to normal mode, type Text Mode as the file path.
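The rules for the image-path prompt can be sketched as one small function. This is an illustration under assumptions: `resolve_image_input` is a hypothetical name, and the clipboard reader is passed in so the logic works without a clipboard (in practice it could be `pyperclip.paste`).

```python
from pathlib import Path


def resolve_image_input(raw, read_clipboard):
    """Interpret what the user typed at the image-path prompt.

    Returns ("text", None) to switch back to text mode, or
    ("image", Path) with the path of the image to load.
    """
    entry = raw.strip()
    if entry == "Text Mode":
        return ("text", None)
    # "clip" or a bare Enter both mean: take the path from the clipboard.
    if entry in ("", "clip"):
        entry = read_clipboard().strip()
    return ("image", Path(entry).expanduser())
```

A caller could then open the returned path with `PIL.Image.open(path)` and send it together with the prompt, for example `model.generate_content([prompt, image])`.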
This project is licensed under the MIT License.
For any inquiries or support, please contact us at alham@duck.com.
Feel free to contribute by submitting pull requests.