API ingestion for PySpark
Project description
Polymo turns REST APIs into Spark DataFrames with a single declarative configuration file. The library builds on top of the DataSource V2 implementation for PySpark 4, while the companion builder UI helps you design, validate, and preview connectors without writing code.
Highlights
- Visual builder keeps a form-based editor and live YAML in sync, with validation and previews.
- Configuration is plain YAML: describe the base URL, pagination, query parameters, and optional record selectors—no custom code required.
- Spark-native DataSource exposes spark.read.format("polymo"), so connectors slot into existing ETL jobs, notebooks, or scheduled (lakeflow) pipelines.
- Fast sampling pipeline shows both DataFrame output and raw API pages, making it easy to debug response shapes and headers.
- Jinja templating, environment variable lookups, and runtime Spark options let you parameterise connectors for different environments.
- Incremental sync support seeds API cursors from JSON state files (local or remote via fsspec) and updates them automatically between runs.
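The incremental sync idea from the last highlight can be illustrated with a small, library-independent sketch. It is not Polymo's implementation: the state file layout (`{"cursor": ...}`), the helper names, and the `updated_at` field are all assumptions chosen for illustration, and it only handles a local file rather than a remote fsspec path.

```python
import json
from pathlib import Path


def load_cursor(state_path: Path, default: str) -> str:
    """Return the saved cursor, or `default` when no state file exists yet."""
    if not state_path.exists():
        return default
    state = json.loads(state_path.read_text(encoding="utf-8"))
    return state.get("cursor", default)


def save_cursor(state_path: Path, cursor: str) -> None:
    """Persist the newest cursor so the next run can resume from it."""
    state_path.write_text(json.dumps({"cursor": cursor}), encoding="utf-8")


def newest_cursor(records: list, field: str, current: str) -> str:
    """Pick the largest value of `field` in this batch, never moving backwards.

    Assumes cursor values compare correctly as strings, which holds for
    ISO 8601 timestamps that share one format and time zone.
    """
    values = [record[field] for record in records if field in record]
    return max(values + [current])
```

A run would call `load_cursor` before requesting data, pass the result to the API as a filter, and call `save_cursor(path, newest_cursor(batch, "updated_at", cursor))` once the batch has been written.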
Install
# Lightweight core package with only httpx, pydantic and jinja2 dependencies
pip install polymo
# Adds Spark, FastAPI/uvicorn and frontend assets for the builder UI
pip install "polymo[builder]"
Polymo requires PySpark 4.x. The CLI enforces this requirement before launching the builder or smoke test helpers.
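To confirm the requirement yourself before running the CLI, a check along the following lines may help. This is a sketch using only the standard library; the helper names are hypothetical and it does not reproduce how the Polymo CLI performs its own check.

```python
from importlib import metadata
from typing import Optional


def is_supported(version: str, required_major: int = 4) -> bool:
    """True when the leading component of `version` equals `required_major`."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) == required_major


def installed_pyspark_version() -> Optional[str]:
    """Return the installed PySpark version string, or None if it is missing."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


version = installed_pyspark_version()
if version is None:
    print("PySpark is not installed")
elif is_supported(version):
    print(f"PySpark {version} satisfies the 4.x requirement")
else:
    print(f"PySpark {version} found, but 4.x is required")
```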
Quick Start
- Describe a stream in YAML:

  version: 0.1
  source:
    type: rest
    base_url: https://jsonplaceholder.typicode.com
  stream:
    path: /posts
    params:
      _limit: 20
    infer_schema: true
- Read the API with Spark:

  from pyspark.sql import SparkSession
  from polymo import ApiReader

  spark = SparkSession.builder.getOrCreate()
  # Register the Polymo data source so that format("polymo") resolves
  spark.dataSource.register(ApiReader)

  df = (
      spark.read.format("polymo")
      .option("config_path", "./config.yml")
      .option("token", "<YOUR_BEARER_TOKEN>")
      .load()
  )
  df.show()
- Use the builder UI to iterate faster:

  polymo builder --port 9000
The browser app walks you through the same settings, validates the YAML against the Python backend, runs sample requests, and lets you save polished configs.
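To make the Quick Start configuration more concrete, the sketch below shows the HTTP request that the base URL, path, and params from step 1 would plausibly correspond to. It uses only the standard library and is an illustration of the declarative idea; `build_url` is a hypothetical helper, and how Polymo actually assembles requests is not described here.

```python
from urllib.parse import urlencode, urljoin

# The same values as the YAML in the Quick Start, written as a Python dict.
config = {
    "source": {"type": "rest", "base_url": "https://jsonplaceholder.typicode.com"},
    "stream": {"path": "/posts", "params": {"_limit": 20}},
}


def build_url(config: dict) -> str:
    """Join base URL and stream path, then append the query parameters."""
    base = config["source"]["base_url"].rstrip("/") + "/"
    path = config["stream"]["path"].lstrip("/")
    url = urljoin(base, path)
    params = config["stream"].get("params") or {}
    return f"{url}?{urlencode(params)}" if params else url


print(build_url(config))
# prints: https://jsonplaceholder.typicode.com/posts?_limit=20
```

Opening that URL in a browser is a quick way to compare the raw JSON with what `df.show()` returns.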
Project Pieces
- src/polymo/ – PySpark DataSource, config validation, and REST client.
- polymo builder – FastAPI backend with a React/Tailwind single-page app under builder-ui/.
- examples/ – Ready-to-run connector samples used by the smoke test and the builder landing screen.
Where to Next
Read the docs here
Contributions and early feedback welcome!
File details
Details for the file polymo-0.2.0.tar.gz.
File metadata
- Download URL: polymo-0.2.0.tar.gz
- Upload date:
- Size: 172.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d167a74957ce2f3232f0dd3c6db665b26eea94a621ea45481f3758844d672568 |
| MD5 | 318803e64f28d6fd1960b2adf4570043 |
| BLAKE2b-256 | 99c430f9425161cdba50acee55c7fe321e9ff80d51f0a1eb5a661c81872cc997 |
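A downloaded file can be checked against the published SHA256 digest with a few lines of standard-library Python. This is a minimal sketch; the helper names are hypothetical and the local file path is an assumption about where you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex string, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("polymo-0.2.0.tar.gz"), "d167a74957ce2f3232f0dd3c6db665b26eea94a621ea45481f3758844d672568")` should return True for an intact copy of the source distribution.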
Provenance
The following attestation bundles were made for polymo-0.2.0.tar.gz:
Publisher: release.yml on dan1elt0m/polymo
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polymo-0.2.0.tar.gz
  - Subject digest: d167a74957ce2f3232f0dd3c6db665b26eea94a621ea45481f3758844d672568
- Sigstore transparency entry: 569650438
- Sigstore integration time:
- Permalink: dan1elt0m/polymo@c06dd735ce4ccd5b3dcb28f9254e78e5ebf0d0c8
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/dan1elt0m
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c06dd735ce4ccd5b3dcb28f9254e78e5ebf0d0c8
- Trigger Event: release
File details
Details for the file polymo-0.2.0-py3-none-any.whl.
File metadata
- Download URL: polymo-0.2.0-py3-none-any.whl
- Upload date:
- Size: 177.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 09a20bfdb44878846aa5b0b4b0d8fbdbf1b60885a0936f9329b8f3cad1be711a |
| MD5 | 48dfe1ee7de300dbbe9a138c4eec79e8 |
| BLAKE2b-256 | 4c01349814d9223420070a41daf2464ea3d1a8c2c49b216f4af2ec1d7e9c245b |
Provenance
The following attestation bundles were made for polymo-0.2.0-py3-none-any.whl:
Publisher: release.yml on dan1elt0m/polymo
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: polymo-0.2.0-py3-none-any.whl
  - Subject digest: 09a20bfdb44878846aa5b0b4b0d8fbdbf1b60885a0936f9329b8f3cad1be711a
- Sigstore transparency entry: 569650446
- Sigstore integration time:
- Permalink: dan1elt0m/polymo@c06dd735ce4ccd5b3dcb28f9254e78e5ebf0d0c8
- Branch / Tag: refs/tags/0.2.0
- Owner: https://github.com/dan1elt0m
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@c06dd735ce4ccd5b3dcb28f9254e78e5ebf0d0c8
- Trigger Event: release