A robust, lightweight Python wrapper for the Google Gemini API.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Gemini API Toolkit

Python wrapper for the Google Gemini API (using the official google-genai SDK).

Comparison: Manual vs. Toolkit

The following example demonstrates a complex multimodal workflow: processing two images and a PDF to generate an insurance claim decision with the gemini-3-pro-preview model.

Raw SDK (Manual Implementation):

import time
import mimetypes
import pathlib
from google import genai
from google.genai import types

# 1. Setup Client (Requires v1alpha for specific Gemini 3 features)
client = genai.Client(http_options={'api_version': 'v1alpha'})

# 2. Handle Inputs
media_files = ["front_bumper.jpg", "side_panel.jpg", "police_report.pdf"]
parts = []

for path_str in media_files:
    path = pathlib.Path(path_str)
    mime_type, _ = mimetypes.guess_type(path)
    
    if mime_type == "application/pdf":
        print(f"Uploading {path}...")
        with open(path, "rb") as f:
            # Upload via Files API
            uploaded_file = client.files.upload(file=f, config={'mime_type': mime_type})
        
        # Poll for processing
        while True:
            file_meta = client.files.get(name=uploaded_file.name)
            if file_meta.state.name == "ACTIVE":
                print("File Active.")
                break
            elif file_meta.state.name == "FAILED":
                raise RuntimeError("File processing failed")
            time.sleep(2)
            
        parts.append(types.Part(
            file_data=types.FileData(file_uri=uploaded_file.uri, mime_type=mime_type),
            media_resolution={"level": "media_resolution_high"}
        ))
    else:
        # Send Images Inline
        parts.append(types.Part(
            inline_data=types.Blob(
                data=path.read_bytes(), 
                mime_type=mime_type
            ),
            media_resolution={"level": "media_resolution_high"}
        ))

# 3. Add Prompt
parts.append(types.Part(text="Analyze these images and the report. Determine if the insurance claim should be approved and explain why."))

# 4. Configure Gemini 3 Specifics
generate_config = types.GenerateContentConfig(
    temperature=1.0,  # Recommended default for Gemini 3
    thinking_config=types.ThinkingConfig(
        thinking_level="HIGH",
        include_thoughts=True
    )
)

# 5. Generate
print("Generating...")
response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents=[types.Content(parts=parts)],
    config=generate_config
)

print(response.text)
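
The polling loop in the manual example waits indefinitely if the file never leaves the processing state. A minimal sketch of the same loop with a deadline follows. The helper name `wait_until_active` and its parameters are hypothetical and not part of the SDK or the toolkit; the state strings are the ones used in the example above.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_state() until it returns "ACTIVE", or fail.

    Hypothetical helper: get_state is any zero-argument callable that
    returns the current file state name as a string.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("File processing failed")
        if clock() >= deadline:
            raise TimeoutError(f"File still in state {state!r} after {timeout_s} s")
        sleep(interval_s)
```

With the raw SDK client from the example, the call would look like `wait_until_active(lambda: client.files.get(name=uploaded_file.name).state.name)`.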

With Gemini API Toolkit:

from gemini_kit import prompt_gemini_3

# 1. Define Inputs
media = ["front_bumper.jpg", "side_panel.jpg", "police_report.pdf"]
prompt_text = "Analyze these images and the report. Determine if the insurance claim should be approved and explain why."

# 2. Call the Function
result, tokens = prompt_gemini_3(
    model="gemini-3-pro-preview",
    prompt=prompt_text,
    media_attachments=media,
    media_resolution="high", 
    thinking_level="high"
)

print(result)
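
Comparing the two versions, the toolkit call takes short values such as "high", while the manual example passes "HIGH" and "media_resolution_high" to the SDK. The sketch below shows one way such a translation could be written. The function `to_sdk_options` is hypothetical and is not the toolkit's actual code; the accepted values are taken from the argument descriptions in this document.

```python
# Accepted short values, as listed in this document's argument descriptions.
THINKING_LEVELS = {"low", "high"}
MEDIA_RESOLUTIONS = {"low", "medium", "high"}


def to_sdk_options(thinking_level: str = "high", media_resolution: str = "high") -> dict:
    """Translate short option values into the forms used in the manual example.

    Hypothetical helper for illustration only.
    """
    if thinking_level not in THINKING_LEVELS:
        raise ValueError(f"thinking_level must be one of {sorted(THINKING_LEVELS)}")
    if media_resolution not in MEDIA_RESOLUTIONS:
        raise ValueError(f"media_resolution must be one of {sorted(MEDIA_RESOLUTIONS)}")
    return {
        # The manual example passes the thinking level in upper case.
        "thinking_level": thinking_level.upper(),
        # The manual example passes the resolution as {"level": "media_resolution_<value>"}.
        "media_resolution": {"level": f"media_resolution_{media_resolution}"},
    }
```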

Installation

Using Pip:

pip install gemini-api-toolkit

From Source:

git clone https://github.com/Danielnara24/gemini-api-toolkit.git
cd gemini-api-toolkit
pip install -e .

Usage

1. Basic Text & Google Search

from gemini_kit import prompt_gemini

prompt = "What are the latest specs of the Steam Deck OLED vs the ROG Ally X?"

response, tokens = prompt_gemini(
    model="gemini-2.5-flash",
    prompt=prompt,
    google_search=True,  # Enables Search Tool
    thinking=True        # Enables Thinking
)

print(response)

2. Mixed Media (Video, PDF, Images, Audio)

Pass local file paths or YouTube URLs. The kit handles upload/inline logic automatically.

from gemini_kit import prompt_gemini

files = ["./downloads/tutorial.mp4", "./documents/specification.pdf"]

response, tokens = prompt_gemini(
    model="gemini-2.5-pro",
    prompt="Compare the specifications in the PDF with the device shown in the video.",
    media_attachments=files
)

print(response)
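
To make the upload/inline decision concrete, here is a rough sketch of how an attachment could be routed. This is an assumption about the general approach, not the toolkit's implementation: `plan_attachment`, the host list, and the 20 MB default are all illustrative.

```python
import os
from urllib.parse import urlparse

# Illustrative host list; the toolkit's own URL detection may differ.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def plan_attachment(item: str, upload_threshold_mb: float = 20.0) -> str:
    """Return "youtube_url", "upload", or "inline" for one attachment.

    Hypothetical helper: the default threshold is an illustrative value.
    """
    host = urlparse(item).netloc.lower()
    if host in YOUTUBE_HOSTS:
        return "youtube_url"
    # Local file: compare its size in MB against the threshold.
    size_mb = os.path.getsize(item) / (1024 * 1024)
    return "upload" if size_mb > upload_threshold_mb else "inline"
```

Files strictly larger than the threshold go through the Files API; everything else is sent inline, matching the description of upload_threshold_mb.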

3. Structured Output (Pydantic)

Enforce a JSON schema on the output. Note: In Gemini 2.5, you cannot combine Structured Output with Tools (Search/Code).

from pydantic import BaseModel
from gemini_kit import prompt_gemini

class MovieIdea(BaseModel):
    title: str
    logline: str
    estimated_budget: int

response_obj, tokens = prompt_gemini(
    model="gemini-2.5-flash",
    prompt="Generate a movie idea about a robot learning to paint.",
    response_schema=MovieIdea
)

# Returns a MovieIdea object directly
print(f"Title: {response_obj.title}")
print(f"Budget: ${response_obj.estimated_budget}")

4. Gemini 3: Search + Code + JSON

from pydantic import BaseModel
from gemini_kit import prompt_gemini_3

class CryptoRatio(BaseModel):
    btc_price: float
    eth_price: float
    ratio: float
    summary: str

response_obj, tokens = prompt_gemini_3(
    prompt="Find current BTC and ETH prices and calculate the ETH/BTC ratio.",
    response_schema=CryptoRatio, 
    google_search=True,
    code_execution=True,
    thinking_level="high"
)

print(f"Ratio: {response_obj.ratio} | Summary: {response_obj.summary}")

5. Cleanup

Free up server storage space (deletes files uploaded via Files API).

from gemini_kit import delete_all_uploads

delete_all_uploads()
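
For comparison, a manual cleanup with the google-genai SDK might look roughly like the sketch below, which relies on the SDK's `client.files.list()` and `client.files.delete(name=...)` calls. Whether delete_all_uploads works this way internally is an assumption; `delete_all_files` is a hypothetical name.

```python
def delete_all_files(client) -> int:
    """Delete every file visible to the client and return how many were removed.

    Hypothetical helper: `client` is expected to expose files.list() and
    files.delete(name=...), as a google-genai Client does.
    """
    # Collect the names first so deletion does not interfere with listing.
    names = [remote_file.name for remote_file in client.files.list()]
    for name in names:
        client.files.delete(name=name)
    return len(names)


# Usage with the real SDK (requires an API key in the environment):
#   from google import genai
#   print(delete_all_files(genai.Client()))
```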

Tip: The examples/ folder in this repository contains scripts demonstrating specific use cases.

Arguments for prompting functions

model: The name of the Gemini model to use (e.g., "gemini-2.5-flash", "gemini-3-pro-preview").
prompt: The text instruction sent to the model.
response_schema: Pydantic model or Enum class to enforce structured JSON output. (Note: In prompt_gemini, this disables tools).
media_attachments: List of file paths (audio, images, videos, PDFs) or YouTube URLs to analyze.
upload_threshold_mb: Files larger than this size (in MB) are uploaded via the Files API; smaller files are sent inline.
thinking_level: Controls reasoning depth for Gemini 3 ("low" or "high").
thinking: Boolean to enable/disable the thinking process for Gemini 2.5.
media_resolution: Sets token usage/quality for inputs ("low", "medium", "high") for Gemini 3.
temperature: Controls output randomness (0.0 to 2.0).
google_search: Boolean to enable Grounding with Google Search.
code_execution: Boolean to enable the Python code interpreter tool.
url_context: Boolean to enable the model to read/process content from URLs in the prompt.
max_retries: Number of times to retry the API call if it fails. Defaults to 0.
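
The response_schema entry above mentions Enum classes as well as Pydantic models. As a rough illustration of what constraining output to an Enum means, the sketch below maps a raw text reply onto an Enum member without calling the API. `Verdict` and `parse_enum_reply` are hypothetical names, and how the toolkit itself performs this conversion is not documented here.

```python
import json
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    DENY = "deny"
    NEEDS_REVIEW = "needs_review"


def parse_enum_reply(raw_text: str, enum_cls):
    """Map a model reply such as '"approve"' or 'approve' to an Enum member.

    Hypothetical helper: raises ValueError if the reply is not a valid value.
    """
    value = raw_text.strip()
    try:
        # A JSON-encoded reply arrives with surrounding quotes.
        decoded = json.loads(value)
    except json.JSONDecodeError:
        # Otherwise treat the reply as bare text.
        decoded = value
    return enum_cls(decoded)
```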

Spatial Understanding

The toolkit provides dedicated functions for 2D detection (bounding boxes), pointing, and segmentation generation. These functions return both the raw JSON data and a visualized Pillow image.

1. 2D Object Detection

Detect objects with bounding boxes using any Gemini model.

from gemini_kit import detect_2d

# Returns JSON data and a PIL Image with drawn boxes
json_data, visual_image = detect_2d(
    model="gemini-2.5-pro", 
    prompt="Detect all faces in the image. Label what they are wearing.",
    image_path="street.jpg",
    visual=True
)

visual_image.show()
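
If you want to draw or crop boxes yourself, the coordinates usually need rescaling. The sketch below assumes the raw JSON follows the Gemini API convention of boxes given as [ymin, xmin, ymax, xmax], normalized to a 0-1000 range; whether detect_2d returns exactly this layout is an assumption, and `box_to_pixels` is a hypothetical helper.

```python
def box_to_pixels(box_2d, image_width: int, image_height: int):
    """Convert a [ymin, xmin, ymax, xmax] box on a 0-1000 scale to pixels.

    Returns (left, top, right, bottom), the ordering Pillow uses for boxes.
    """
    ymin, xmin, ymax, xmax = box_2d
    left = round(xmin * image_width / 1000)
    top = round(ymin * image_height / 1000)
    right = round(xmax * image_width / 1000)
    bottom = round(ymax * image_height / 1000)
    return left, top, right, bottom
```

For a 2000 x 1000 image, `box_to_pixels([100, 250, 500, 750], 2000, 1000)` gives `(500, 100, 1500, 500)`.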

(Image: 2D detection example)

2. Pointing

Identify the precise location of objects (y, x coordinates).

from gemini_kit import pointing

json_data, visual_image = pointing(
    model="gemini-3-pro-preview", 
    prompt="Label each part of the motherboard in the image.",
    image_path="motherboard.png",
    visual=True
)

visual_image.show()
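
Because points come back in (y, x) order, it is easy to swap the axes when post-processing. The sketch below assumes the raw JSON is a list of objects with a "point" field of [y, x] on a 0-1000 scale and a "label" field, as in the Gemini API's pointing examples; the exact structure returned by pointing is an assumption, and `points_to_pixels` is a hypothetical helper.

```python
import json


def points_to_pixels(raw_json: str, image_width: int, image_height: int):
    """Convert labeled [y, x] points on a 0-1000 scale to (label, x_px, y_px)."""
    results = []
    for entry in json.loads(raw_json):
        y, x = entry["point"]  # note the (y, x) order
        x_px = round(x * image_width / 1000)
        y_px = round(y * image_height / 1000)
        results.append((entry["label"], x_px, y_px))
    return results
```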

(Image: pointing example)

3. Segmentation

Generate pixel-level masks for objects.
Note: Only supported on Gemini 2.5 models.

from gemini_kit import segmentation

# visual=True returns a combined overlay image
# output_path saves individual mask files to disk
json_data, visual_image = segmentation(
    model="gemini-2.5-pro", 
    prompt="Segment all cupcakes in the image, label 'sprinkles' or 'no sprinkles'",
    image_path="cupcakes.jpeg",
    visual=True,
    output_path="output_samples" 
)

visual_image.show()
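
If you prefer to handle the raw masks yourself rather than use the saved files, the sketch below assumes each JSON entry carries its mask as a base64-encoded PNG, optionally prefixed with "data:image/png;base64,", which is how the Gemini API documents segmentation output. The field layout returned by segmentation is an assumption, and `decode_mask` is a hypothetical helper.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
DATA_URI_PREFIX = "data:image/png;base64,"


def decode_mask(mask_field: str) -> bytes:
    """Decode a base64 PNG mask, with or without a data-URI prefix, to bytes."""
    if mask_field.startswith(DATA_URI_PREFIX):
        mask_field = mask_field[len(DATA_URI_PREFIX):]
    png_bytes = base64.b64decode(mask_field)
    # Check the PNG file signature before handing the bytes to an image library.
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("Decoded mask is not a PNG image")
    return png_bytes
```

The returned bytes can then be opened with Pillow via `Image.open(io.BytesIO(png_bytes))`.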

(Image: segmentation example)

Arguments for Spatial Understanding Functions

model: The name of the Gemini model to use (e.g., "gemini-2.5-flash", "gemini-3-pro-preview").
prompt: The text instruction sent to the model.
image_path: Local path or URL of the image to use.
visual: If True, returns a PIL Image with the visualization.
output_path: Directory in which to save the PIL images, masks, and overlays. Nothing is saved if it is not specified.
temperature: Controls output randomness (0.0 to 2.0). 0.5 is recommended.
max_retries: Number of times to retry the API call if it fails. Defaults to 0.
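
To illustrate what a max_retries setting typically means, here is a minimal retry wrapper. The exponential backoff and the name `call_with_retries` are assumptions made for this sketch; the toolkit's actual retry policy is not described in this document.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 0,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on failure, retry up to max_retries more times.

    Hypothetical helper: waits base_delay_s, then doubles the delay after
    each failed attempt. With max_retries=0 the first error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay_s * 2 ** attempt)
            attempt += 1
```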

Disclaimer

This is an unofficial open-source utility and is not affiliated with, endorsed by, or connected to Google. The code is provided "as is" to help developers interact with the Gemini API more easily. Users are responsible for their own API usage, costs, and adherence to Google's Terms of Service.

Project details

Release history

- 0.5.5: Dec 13, 2025
- 0.5.4: Dec 12, 2025 (the version described on this page)
- 0.4.0: Dec 10, 2025
- 0.3.0: Dec 3, 2025
- 0.2.0: Dec 3, 2025
- 0.1.3: Nov 29, 2025
- 0.1.2: Nov 29, 2025
- 0.1.1: Nov 29, 2025
- 0.1.0: Nov 29, 2025

Download files

Source Distribution

gemini_api_toolkit-0.5.4.tar.gz (333.8 kB), uploaded Dec 12, 2025

Built Distribution

gemini_api_toolkit-0.5.4-py3-none-any.whl (18.6 kB), uploaded Dec 12, 2025, Python 3

File details

Details for the file gemini_api_toolkit-0.5.4.tar.gz.

File metadata

Download URL: gemini_api_toolkit-0.5.4.tar.gz
Upload date: Dec 12, 2025
Size: 333.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for gemini_api_toolkit-0.5.4.tar.gz
Algorithm	Hash digest
SHA256	`fb1388d27ad87416b84b3b7cc351fb0da55c9df97153106f9a1965005d005e49`
MD5	`6f5a4f2876fe119da3e6ac3df45886ab`
BLAKE2b-256	`f745e57c6eb405d6ed27853f1223bf70ab5f2cf2e86c74736411ca30b3b890c9`

File details

Details for the file gemini_api_toolkit-0.5.4-py3-none-any.whl.

File metadata

Download URL: gemini_api_toolkit-0.5.4-py3-none-any.whl
Upload date: Dec 12, 2025
Size: 18.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for gemini_api_toolkit-0.5.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`043d4fa29126e6e140582c1b67c8dad43ab3b605f1759b65cea865b14f75b616`
MD5	`9e6320d9c2c7f364651d557b0f1f25a6`
BLAKE2b-256	`ddf2a13dad34ec0944ec2f0b07c68920460d1674e0a48350fa3ab10bd37e7711`
