
Declarative REST API ingestion for PySpark


Polymo

Welcome to Polymo

Polymo is a helper for PySpark that turns everyday web APIs into tables you can analyse. Point it at an API, tell it what you want to grab, and Polymo does the heavy lifting of fetching the data and lining it up neatly.

Why people use Polymo

  • No custom code required. Describe your API once in a short, friendly YAML file or through the point-and-click Builder.
  • See results before you commit. Preview the real responses, record-by-record, so you can fix issues early.
  • Works with Spark-based tools. When you are ready, Polymo serves the data to your analytics stack using the same interface Spark already understands.
  • Designed for teams. Save reusable connectors, share them across projects, and keep secrets (like tokens) out of files.

Pick your path

  • Mostly clicking? Open the Builder UI and follow the guided screens. It is the easiest way to create a connector from scratch.
  • Prefer a checklist? Read the Configuration guide for a plain-language tour of every field in the YAML file.
  • Power user? Jump straight to the CLI or the Python helpers to automate things.

Before you start

  • Install Polymo with pip install polymo. If you want the Builder UI, add the extras: pip install "polymo[builder]".
  • Make sure you have access to the API you care about (base URL, token if needed, and any sample request parameters).
  • Check that PySpark version 4 or newer is available. Polymo uses Spark under the hood to keep data consistent.

Quick tour

  1. Launch the Builder (optional but recommended). Run polymo builder --port 9000 and open the provided link in your browser.
  2. Describe your API. Fill in a base URL like https://jsonplaceholder.typicode.com, pick the endpoint /posts, and add filters such as _limit: 20 if you only need a sample.
  3. Preview the data. Press the Preview button to see a table of records, the raw API replies, and any error messages.
  4. Save the connector. Download the YAML config or write it directly to your project folder. Tokens stay out of the file and are passed in later.
  5. Use it in Spark. Load the file with the short code snippet below or copy/paste from the Builder’s tips panel.

The Builder keeps a local library of every connector you work on. Use the header’s connector picker to hop between drafts, open the library to rename or export them, and never worry about losing your place. The header even shows the installed Polymo version for quick support checks.

from pyspark.sql import SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

df = (
    spark.read.format("polymo")
    .option("config_path", "./config.yml")  # YAML you saved from the Builder
    .option("token", "YOUR_TOKEN")  # Only if the API needs one
    .load()
)

df.show()
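
Because Polymo registers as a standard Spark data source, the resulting DataFrame behaves like any other. A minimal sketch of what comes next (the column names assume the JSONPlaceholder /posts payload from the quick tour and are illustrative only):

from pyspark.sql import functions as F

# Ordinary DataFrame operations apply to the loaded data.
titles = df.select("id", "userId", "title").orderBy(F.col("id"))
titles.show(5, truncate=False)

# Write anywhere Spark can write, e.g. Parquet.
titles.write.mode("overwrite").parquet("./posts_sample")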

Streaming too

Structured Streaming works out of the box:

stream_df = (
    spark.readStream.format("polymo")
    .option("config_path", "./config.yml")
    .option("stream_batch_size", 100)
    .option("stream_progress_path", "/tmp/polymo-progress.json")
    .load()
)

writer = stream_df.writeStream.format("memory").outputMode("append").queryName("polymo")
query = writer.start()  # StreamingQuery handle; call query.stop() to end the stream

Use the same runtime options as the batch read (tokens, OAuth2 client secrets, incremental state paths, and so on). stream_batch_size caps the number of rows per micro-batch, and stream_progress_path stores a tiny JSON file so restarts resume from the same offset.
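
Once the stream is running you can inspect the in-memory sink with ordinary Spark calls. A minimal sketch (the temporary table takes its name from queryName("polymo") above):

# Query the memory sink's temporary table between micro-batches.
spark.sql("SELECT COUNT(*) AS rows_so_far FROM polymo").show()
spark.table("polymo").show(5, truncate=False)

# Stop the stream when you are done.
query.stop()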

Want a quick check without writing code? Run polymo smoke --streaming and the CLI will execute a one-off micro-batch using the bundled JSONPlaceholder example (or a YAML you pass in).

Incremental syncs in one minute

  • Add cursor_param and cursor_field under incremental: in your YAML to tell Polymo which API field to track.
  • Pass .option("incremental_state_path", ...) when reading with Spark; local paths and remote URLs (S3, GCS, Azure, etc.) work out of the box. See the sketch after this list for a full example.
  • On the first run, seed a starting value with .option("incremental_start_value", "..."). Future runs reuse the stored cursor automatically.
  • Override the stored entry name with .option("incremental_state_key", "...") if you share a state file across connectors.
  • Skip the state path to keep cursors only in memory during the Spark session, or disable that cache with .option("incremental_memory_state", "false") if you always want a cold start.
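
Putting those options together, an incremental read might look like the sketch below. The option names come from the list above; the state path, state key, and timestamp are placeholder values, and the cursor itself is defined by the incremental: block in your YAML:

df = (
    spark.read.format("polymo")
    .option("config_path", "./config.yml")  # YAML containing the incremental: block
    .option("incremental_state_path", "s3://my-bucket/polymo/state.json")  # local path or remote URL
    .option("incremental_start_value", "2024-01-01T00:00:00Z")  # seed for the very first run
    .option("incremental_state_key", "orders")  # optional: entry name in a shared state file
    .load()
)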

Handling flaky APIs with retries

  • Add an error_handler block under stream: when you want to customise retries. By default Polymo retries 5× on HTTP 5XX and 429 responses with exponential backoff.
  • Override the defaults to catch extra status codes or adjust the timing:
stream:
  path: /orders
  error_handler:
    max_retries: 6
    retry_statuses:
      - 5XX
      - 429
      - 404
    retry_on_timeout: true
    retry_on_connection_errors: true
    backoff:
      initial_delay_seconds: 1
      max_delay_seconds: 60
      multiplier: 1.8
  • Omit the block to keep the safe defaults. The Builder UI exposes the same fields if you prefer toggles over YAML edits.

What’s inside this project

  • src/polymo/ keeps the logic that speaks to APIs and hands data to Spark.
  • polymo builder is a small web app (FastAPI + React) that guides you through every step. There is no need to run npm; the app ships inside the pip package, ready to go.
  • examples/ contains ready-made configs you can copy, tweak, and use for smoke tests.
  • tests/test_datasource.py::test_stream_reader_batches exercises the streaming reader end to end; run it with pytest -k stream_reader_batches for a quick smoke test.

Run the Builder in Docker

  • Build the dev-friendly image and launch the Builder with hot reload:
docker compose up --build builder
  • The service listens on port 8000; open http://localhost:8000 once Uvicorn reports it is running.
  • The image already bundles PySpark and OpenJDK 21.
  • Stop with docker compose down and restart quickly using the cached image via docker compose up builder.

Have fun building connectors!

Where to next

Read the docs here

Contributions and early feedback welcome!
