
Declarative REST API ingestion for PySpark

Project description

Polymo

Turn REST APIs into Spark DataFrames with just a YAML file

License: BSD 3-Clause

Welcome to Polymo

Polymo makes it super easy to ingest REST APIs with PySpark. All you need is a YAML file or a Pydantic config.

My vision is that API ingestion shouldn't require heavy third-party tools or hard-to-maintain custom code. Heck, you don't even need PySpark skills.

[Screenshot: Polymo Builder UI, connector preview screen]

How does it work?

Define a config file by hand, or use the recommended lightweight Builder UI. Once you are happy with your config, register the Polymo reader and tell Spark where to find the config:

from pyspark.sql import SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

df = (
    spark.read.format("polymo")
    .option("config_path", "./config.yml")  # YAML you saved from the Builder
    .option("token", "YOUR_TOKEN")  # Only if the API needs one
    .load()
)

df.show()
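Hardcoding `"YOUR_TOKEN"` in source is easy to leak. A minimal sketch of one alternative, assuming you keep the token in an environment variable: the helper `polymo_options` and the variable name `POLYMO_API_TOKEN` are hypothetical (not part of Polymo), and the helper only uses the `config_path` and `token` option names shown above. It omits `token` entirely when the variable is unset, matching the "only if the API needs one" note.

```python
import os


def polymo_options(config_path: str, token_env: str = "POLYMO_API_TOKEN") -> dict[str, str]:
    """Build reader options, adding a token only if the (hypothetical) env var is set."""
    options = {"config_path": config_path}
    token = os.environ.get(token_env)
    if token:  # skip empty or missing tokens for APIs that need no auth
        options["token"] = token
    return options


# Usage with Spark (requires pyspark and polymo; DataFrameReader.options accepts **kwargs):
# df = spark.read.format("polymo").options(**polymo_options("./config.yml")).load()
```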

Streaming works too:

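# Assumes the same config_path option as the batch example above
stream_df = spark.readStream.format("polymo").option("config_path", "./config.yml").load()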

Prefer to keep everything in code? Build the config with the PolymoConfig Pydantic model:

from pyspark.sql import SparkSession
from polymo import ApiReader, PolymoConfig

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

jp_posts = PolymoConfig(
    base_url="https://jsonplaceholder.typicode.com",
    path="/posts",
)

df = (
    spark.read.format("polymo")
    .option("config_json", jp_posts.config_json())
    .load()
)
df.show()

Does it perform? Polymo reads in batches (fetching pages in parallel), so it is much faster than row-based approaches such as calling the API from a UDF.
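To illustrate why page-level batching helps, here is a minimal stdlib sketch, not Polymo's implementation: `fetch_page` is a hypothetical stand-in for one HTTP request to a paginated endpoint (95 records, 10 per page), and pages are fetched concurrently with a thread pool. Reading everything takes 10 calls, whereas a row-based UDF would make one call per record (95 here). In Spark the equivalent work would be spread across partitions and executors rather than local threads; this only shows the page-level idea.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "API": 95 records served 10 per page.
# A real connector would issue an HTTP GET per page instead.
RECORDS = [{"id": i, "title": f"post {i}"} for i in range(1, 96)]
PAGE_SIZE = 10


def fetch_page(page: int) -> list[dict]:
    """Return one 1-indexed page of records, standing in for a single API call."""
    start = (page - 1) * PAGE_SIZE
    return RECORDS[start:start + PAGE_SIZE]


def read_all_pages(num_pages: int, max_workers: int = 4) -> list[dict]:
    """Fetch all pages concurrently and flatten them, preserving page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so page 1's rows come first.
        pages = pool.map(fetch_page, range(1, num_pages + 1))
        return [row for page in pages for row in page]


if __name__ == "__main__":
    num_pages = -(-len(RECORDS) // PAGE_SIZE)  # ceiling division: 10 pages
    rows = read_all_pages(num_pages)
    print(len(rows), rows[0]["id"], rows[-1]["id"])  # 95 1 95
```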

How to start?

Locally, you'll probably want to install Polymo together with the Builder UI:

pip install "polymo[builder]"

This pulls in all dependencies the UI needs, such as PySpark.

When running Polymo on a Spark cluster you usually don't need the UI dependencies. In that case, install only the core package:

pip install polymo

Launch the Builder UI:

polymo builder

(Optional) Run the Builder in Docker

docker compose up --build builder

Where to Next

Read the Polymo documentation.

Contributing

It's still early days, but Polymo already supports a lot of features! Is there something missing? Raise an issue or contribute!

Contributions and early feedback welcome!



Download files


Source Distribution

polymo-0.9.1.tar.gz (203.6 kB)

Uploaded Source

Built Distribution


polymo-0.9.1-py3-none-any.whl (217.1 kB)

Uploaded Python 3

File details

Details for the file polymo-0.9.1.tar.gz.

File metadata

  • Download URL: polymo-0.9.1.tar.gz
  • Size: 203.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for polymo-0.9.1.tar.gz
Algorithm Hash digest
SHA256 be48137992b06d18af3cf48ab0a9eeb76df2790a20e2f2b2cf404fd007b34ec5
MD5 1ee00fd1d621b3a817b7632648b79c3f
BLAKE2b-256 9962b5f1accfda44dd6d84d3bce328a52cdd09530f00b4463feecf2eac13c04f


Provenance

The following attestation bundles were made for polymo-0.9.1.tar.gz:

Publisher: release.yml on dan1elt0m/polymo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file polymo-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: polymo-0.9.1-py3-none-any.whl
  • Size: 217.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for polymo-0.9.1-py3-none-any.whl
Algorithm Hash digest
SHA256 24f571ce3269a978f4859182771d1544aa109c5b9fca153eb13bc3112bb7d78f
MD5 7da05e5eed05e6fb671527522331de41
BLAKE2b-256 4e980e35c7c3eaa39444c37cd22f05e01fab4b144dc8390c2fd0229109e96104


Provenance

The following attestation bundles were made for polymo-0.9.1-py3-none-any.whl:

Publisher: release.yml on dan1elt0m/polymo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
