API ingestion for PySpark

Polymo

Polymo turns REST APIs into Spark DataFrames with a single declarative configuration file. The library builds on top of the DataSource V2 implementation for PySpark 4, while the companion builder UI helps you design, validate, and preview connectors without writing code.

Highlights

  • Visual builder keeps a form-based editor and live YAML in sync, with validation and previews.
  • Configuration is plain YAML: describe the base URL, pagination, query parameters, and optional record selectors—no custom code required.
  • Spark-native DataSource exposes spark.read.format("polymo"), so connectors slot into existing ETL jobs, notebooks, or scheduled (Lakeflow) pipelines.
  • Fast sampling pipeline shows both DataFrame output and raw API pages, making it easy to debug response shapes and headers.
  • Jinja templating, environment variable lookups, and runtime Spark options let you parameterise connectors for different environments.
  • Incremental sync support seeds API cursors from JSON state files (local or remote via fsspec) and updates them automatically between runs.
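
The incremental sync idea in the last highlight can be sketched in a few lines of standard-library Python. This is only an illustration of the general pattern (read a cursor from a JSON state file, fetch, then write the newest cursor back); the file layout, the `cursor` key, and the helper names `load_cursor`, `save_cursor`, and `next_cursor` are assumptions made for this example, not Polymo's actual state format or API. It also handles local files only, whereas Polymo is described as supporting remote state via fsspec.

```python
import json
import tempfile
from pathlib import Path


def load_cursor(state_path: Path, default: str) -> str:
    """Return the saved cursor, or `default` when no state file exists yet."""
    if not state_path.exists():
        return default
    state = json.loads(state_path.read_text(encoding="utf-8"))
    return state.get("cursor", default)


def save_cursor(state_path: Path, cursor: str) -> None:
    """Persist the newest cursor so the next run can resume from it."""
    state_path.write_text(json.dumps({"cursor": cursor}), encoding="utf-8")


def next_cursor(records: list[dict], field: str, current: str) -> str:
    """Pick the largest value of `field` seen in this batch.

    Assumes the values are ISO 8601 UTC timestamps in one consistent format,
    which sort correctly as plain strings.
    """
    return max([current, *(r[field] for r in records if field in r)])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "state.json"
        cursor = load_cursor(state_file, "1970-01-01T00:00:00Z")
        # Hypothetical records standing in for one API response.
        batch = [{"updated_at": "2024-03-01T00:00:00Z"}]
        save_cursor(state_file, next_cursor(batch, "updated_at", cursor))
        print(load_cursor(state_file, cursor))
```

Keeping the cursor in a small JSON document makes the state easy to inspect and to reset by hand between runs.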

Install

# Lightweight core package with only httpx, pydantic and jinja2 dependencies
pip install polymo
# Adds Spark, FastAPI/uvicorn and frontend assets for the builder UI
pip install "polymo[builder]"

Polymo requires PySpark 4.x. The CLI enforces this requirement before launching the builder or smoke test helpers.
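
If you want to fail fast in your own scripts before Spark starts, a similar guard is easy to write. The sketch below is not Polymo's actual check; the helper names `major_version`, `is_supported`, and `check_pyspark` are hypothetical, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Extract the leading major number from a version string such as '4.0.1'."""
    return int(version.split(".", 1)[0])


def is_supported(version: str, required_major: int = 4) -> bool:
    """True when the version's major component matches the required one."""
    return major_version(version) == required_major


def check_pyspark(required_major: int = 4) -> str:
    """Return the installed PySpark version, or raise if it is missing or unsupported."""
    try:
        installed = metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        raise RuntimeError("PySpark is not installed") from None
    if not is_supported(installed, required_major):
        raise RuntimeError(
            f"PySpark {required_major}.x is required, found {installed}"
        )
    return installed
```

Calling `check_pyspark()` at the top of a job turns a confusing import or registration error into a clear message.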

Quick Start

  1. Describe a stream in YAML:

    version: 0.1
    source:
      type: rest
      base_url: https://jsonplaceholder.typicode.com
    stream:
      path: /posts
      params:
        _limit: 20
      infer_schema: true
    
  2. Read the API with Spark:

    from pyspark.sql import SparkSession
    from polymo import ApiReader
    
    spark = SparkSession.builder.getOrCreate()
    # Register the Polymo reader so Spark recognises the "polymo" format.
    spark.dataSource.register(ApiReader)
    
    df = (
        spark.read.format("polymo")
        .option("config_path", "./config.yml")
        .option("token", "<YOUR_BEARER_TOKEN>")
        .load()
    )
    
    df.show()
    
  3. Use the builder UI to iterate faster:

    polymo builder --port 9000
    

    The browser app walks you through the same settings, validates the YAML against the Python backend, runs sample requests, and lets you save polished configs.
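
To make the step 1 configuration concrete, the sketch below shows the HTTP request URL that those settings describe. It is an illustration only: the `build_url` helper is hypothetical, the config is written as a Python dict mirroring the YAML rather than parsed from the file, and Polymo's own request building (pagination, templating, authentication) is not reproduced here.

```python
from urllib.parse import urlencode, urljoin

# Python mirror of the YAML from step 1 (hand-written, not parsed from the file).
config = {
    "version": 0.1,
    "source": {"type": "rest", "base_url": "https://jsonplaceholder.typicode.com"},
    "stream": {"path": "/posts", "params": {"_limit": 20}, "infer_schema": True},
}


def build_url(config: dict) -> str:
    """Join base_url and path, then append the query parameters, if any."""
    base = config["source"]["base_url"].rstrip("/") + "/"
    path = config["stream"]["path"].lstrip("/")
    url = urljoin(base, path)
    params = config["stream"].get("params") or {}
    return f"{url}?{urlencode(params)}" if params else url


if __name__ == "__main__":
    print(build_url(config))
```

Pasting the resulting URL into a browser or `curl` is a quick way to confirm the response shape before loading it through Spark.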

Project Pieces

  • src/polymo/ – PySpark DataSource, config validation, and REST client.
  • polymo builder – FastAPI backend with a React/Tailwind single-page app under builder-ui/.
  • examples/ – Ready-to-run connector samples used by the smoke test and the builder landing screen.

Where to Next

Read the docs here

Contributions and early feedback welcome!
