Smart Gemini API key manager with token-aware sliding window scheduling
Project description
gemini-flux ๐ฅ
Smart Gemini API key manager. Give it N keys. It handles the rest.
Author: Muhammad Ali โ malikasana2810@gmail.com
Is this for you?
You're hitting 429 RESOURCE_EXHAUSTED on Gemini's free tier. You've noticed that creating multiple API keys in the same Google Cloud project doesn't help (because rate limits are per-project, not per-key). You don't want to pay yet. You want your request-heavy app to just keep running.
gemini-flux is for you. Give it 8 keys from 8 projects and it squeezes ~10,000 free requests/day out of Gemini with zero manual babysitting.
Install
pip install gemini-flux
30-second example
from gemini_flux import GeminiFlux
flux = GeminiFlux(keys=["key1", "key2", "key3", "key4", "key5", "key6", "key7", "key8"])
response = flux.generate("Translate this transcript to Spanish...")
print(response["response"])
That's it. gemini-flux picks the right key, the right model, waits exactly as long as it has to, and falls back through models when one runs out for the day.
Loading keys from .env (recommended)
Hardcoding keys is fine for testing but not for real use. The recommended way is a .env file:
GEMINI_KEY_1=AIza...
GEMINI_KEY_2=AIza...
GEMINI_KEY_3=AIza...
GEMINI_KEY_4=AIza...
GEMINI_KEY_5=AIza...
GEMINI_KEY_6=AIza...
GEMINI_KEY_7=AIza...
GEMINI_KEY_8=AIza...
GEMINI_MODE=both
GEMINI_LOG=true
Then load them in code:
import os
from dotenv import load_dotenv
from gemini_flux import GeminiFlux
load_dotenv()
keys = []
i = 1
while True:
key = os.environ.get(f"GEMINI_KEY_{i}")
if not key:
break
keys.append(key)
i += 1
flux = GeminiFlux(
keys=keys,
mode=os.environ.get("GEMINI_MODE", "both"),
log=os.environ.get("GEMINI_LOG", "true").lower() == "true"
)
response = flux.generate("your prompt here")
print(response["response"])
Copy .env.example from the repo as your starting template โ it has the right format with instructions.
Who this is built for
- Translation and dubbing pipelines
- Long-running batch jobs over large documents
- RAG systems with high request volume
- Any app that burns through Gemini quota faster than the free tier allows
- Anyone stuck in "can't justify paying yet, but the free tier keeps dying" purgatory
The problem Google doesn't tell you about
Gemini rate limits are per Google Cloud project, not per API key. Ten keys in one project = one quota, shared. Useless.
The fix: create multiple projects. Each Google account gets up to 10. Each project has its own independent quota. Two accounts with a few projects each gets you 8 independent rate limits without touching a credit card.
With 8 keys on the free tier:
| Model | RPD per key | ร 8 keys | Daily total |
|---|---|---|---|
| gemini-2.5-pro | 100 | ร 8 | 800 |
| gemini-2.5-flash | 250 | ร 8 | 2,000 |
| gemini-2.5-flash-lite | 1,000 | ร 8 | 8,000 |
| Total | ~10,800/day, free |
Managing 8 keys manually is hell. That's what gemini-flux automates.
How it schedules โ the math
Naive rotators cycle keys on a fixed timer (e.g. "use the next key every 30 seconds"). That's wrong, because the real cooldown depends on how many tokens you sent.
Gemini's free tier allows 250,000 tokens per minute (TPM) per project. So:
cooldown = token_count / 250,000
1M tokens โ 4 min cooldown
500k tokens โ 2 min cooldown
100k tokens โ 24 sec cooldown
10k tokens โ 2.4 sec cooldown
With N keys rotating:
interval_between_requests = cooldown / N
1M token request, 8 keys โ 30 sec between requests
10k token request, 8 keys โ 0.3 sec โ nearly instant
gemini-flux counts tokens using Google's free count_tokens endpoint (zero quota cost) before every request, maintains a 60-second sliding window per key, and sends via whichever key has capacity right now. No fixed timers. No wasted seconds.
What you get out of the box
Token-aware scheduling โ Every request routed to the key with real-time capacity. If none are ready, waits precisely as long as needed, not a second more.
Model exhaustion chain โ When a key hits its daily cap on one model, gemini-flux falls through to the next:
1. gemini-2.5-pro โ 100 RPD per key
2. gemini-2.5-flash โ 250 RPD per key โ main workhorse
3. gemini-2.5-flash-lite โ 1000 RPD per key
4. gemini-3.1-pro-preview โ newest pro generation
5. gemini-3-flash-preview โ newest flash generation
6. gemini-3.1-flash-lite-preview โ newest lite generation
You don't lose the key, just that model on that key, until midnight PT reset.
Self-updating policy โ On startup, gemini-flux asks Gemini what the current free-tier limits are and caches them for 7 days. When Google changes limits (they do, without warning), gemini-flux catches it.
Key health report on startup โ Invalid keys removed. Exhausted keys flagged. You know the state of your pool before you send a single request.
Automatic daily reset โ Exhausted keys come back online at midnight Pacific without any manual intervention.
Setup in 3 steps
1. Create projects. Go to console.cloud.google.com and create up to 10 projects. Each gets independent quota. Use a second Google account to get more.
2. Get one API key per project. APIs & Services โ Credentials โ Create Credentials โ API Key.
3. Drop them in .env:
GEMINI_KEY_1=AIza...
GEMINI_KEY_2=AIza...
# ... up to as many as you want
GEMINI_MODE=both
GEMINI_LOG=true
Full usage
response = flux.generate(
prompt="Translate this transcript to Urdu with natural dubbing tone...",
images=["base64_image..."],
files=["base64_pdf..."],
mode="flash_only",
preferred_key=3,
max_tokens=2000,
temperature=0.5,
retry=True
)
Every response includes:
{
"response": "Gemini's reply...",
"key_used": 3,
"model_used": "gemini-2.5-flash",
"tokens_used": 45231,
"wait_applied": 1.8,
"retried": False
}
Runtime controls
flux.set_mode("flash_only") # change mode anytime
flux.disable_key(3) # disable key #3
flux.enable_key(3) # re-enable key #3
flux.refresh_policy() # force re-fetch Gemini policy
flux.status() # full key pool status
Modes
| Mode | Description |
|---|---|
both |
Full exhaustion chain, pro โ lite (default) |
pro_only |
Only Pro models |
flash_only |
Only Flash models |
flash_lite_only |
Only Flash-Lite models |
Other ways to run it
Git clone (development)
git clone https://github.com/malikasana/gemini-flux
cd gemini-flux
pip install -r requirements.txt
cp .env.example .env
from core import GeminiFlux
Docker microservice
docker build -t gemini-flux .
docker run -p 8000:8000 --env-file .env gemini-flux
from gemini_flux import GeminiFluxClient
client = GeminiFluxClient(base_url="http://localhost:8000")
response = client.generate("...")
Kaggle notebook
!pip install gemini-flux
import os
from gemini_flux import GeminiFlux
# paste keys directly or load from Kaggle secrets
keys = ["key1", "key2", ...]
flux = GeminiFlux(keys=keys)
response = flux.generate("your prompt here")
HTTP API (when running as microservice)
| Endpoint | Method | Description |
|---|---|---|
/generate |
POST | Send prompt, get response |
/status |
GET | Key pool status and usage |
/refresh-policy |
POST | Force policy re-fetch |
/config |
POST | Change mode, enable/disable keys |
/health |
GET | Health check |
What it looks like running
==================================================
gemini-flux ๐ฅ Starting up with 8 keys
==================================================
[STARTUP] Checking 8 keys...
[KEY 1] โ
Healthy
[KEY 2] โ
Healthy
[KEY 3] โ ๏ธ Exhausted โ resets at midnight PT
[KEY 4] โ Invalid โ removed from pool
[STARTUP] Pool ready: 6 healthy, 1 exhausted, 1 invalid
[POLICY] Using cached policy (1.2 days old)
[STARTUP] Dynamic interval: 240s / 6 keys = 40.0s (worst case)
[STARTUP] โ
gemini-flux ready! Mode: BOTH
[REQUEST] Incoming โ 450,000 tokens detected
[SCHEDULER] Key #2 selected โ sending via gemini-2.5-flash
[RESPONSE] โ
Success via Key #2 (gemini-2.5-flash)
[KEY 2] gemini-2.5-flash: 1/250 requests used today
Project structure
gemini-flux/
โโโ core/
โ โโโ __init__.py # Public interface
โ โโโ flux.py # Main GeminiFlux class
โ โโโ scheduler.py # Token-aware sliding window brain
โ โโโ key_pool.py # Key validation and tracking
โ โโโ policy.py # Smart policy fetcher
โโโ service/
โ โโโ main.py # FastAPI microservice
โโโ client/
โ โโโ __init__.py
โ โโโ client.py # Lightweight HTTP client
โโโ .env.example # Environment template
โโโ Dockerfile
โโโ pyproject.toml
โโโ requirements.txt
โโโ test.py
โโโ README.md
The backstory
I'm building a video dubbing application. Continuous transcript โ LLM โ translated transcript, chunk after chunk, video after video. Gemini free tier seemed perfect until I hit the 429 wall.
I created a second key in the same project. Same error. That's when I learned rate limits are per-project, not per-key. Made ten keys โ still useless. Then I found the multi-project trick, built a naive rotator, realized that was also wrong because it ignored how many tokens I was actually sending.
So I wrote the math down. Then I wrote the code. Then I realized other people are going to hit this exact wall, and nobody should have to lose a weekend to it twice.
That's gemini-flux. Built out of frustration. Powered by math. Open-sourced so the next person doesn't have to rebuild it.
Security
- Never commit your
.envโ it's in.gitignoreby default - Use
.env.exampleas a template - Every key validated on startup โ invalid ones removed before any request is sent
License
MIT. Use it, fork it, ship it.
Author
Muhammad Ali โ malikasana2810@gmail.com
Built out of frustration with rate limits. Powered by math.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gemini_flux-1.1.1.tar.gz.
File metadata
- Download URL: gemini_flux-1.1.1.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a1916b55a10a3d8d7db5b64e9f28c9aec1552c4141fb72365a31ecbaeb5bddf5
|
|
| MD5 |
3e46670c686540782c371e4d961df2af
|
|
| BLAKE2b-256 |
a75b3c00dac3afef6e2d869cdfe93a9b97c026467c097a58a7c4f8069686da33
|
File details
Details for the file gemini_flux-1.1.1-py3-none-any.whl.
File metadata
- Download URL: gemini_flux-1.1.1-py3-none-any.whl
- Upload date:
- Size: 15.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ecd60219cf19bb0310dcad3310214edb9eeb7b88f492306d55b0b3a3096a1bd1
|
|
| MD5 |
d4eef749a4e8ac81a0afa022ada2ca64
|
|
| BLAKE2b-256 |
7fd301ff95c0b1c822cd777b47b3cc4582ebaad762c89afd130677c14e2653c1
|