A package for managing singer.io taps and targets

Alto (WIP)

A lightweight yet intelligent way to manage Singer based ELT.

How is this different than what exists today?

Using Meltano as the baseline of comparison, there are some noteworthy differences.

  • A dependency footprint smaller by an order of magnitude. Alto has only 4 direct dependencies, with no C or Rust extensions anywhere in the dependency tree. The comparison below includes transitive dependencies:
    • Meltano: 151
    • Alto: 7
  • Because of its small dependency footprint, it can be installed in very small containers, and packaged formats such as PEX are cross-platform compatible. It can also be used with PyOxidizer or Nuitka.
  • We use PEX (Python EXecutable) for all plugins instead of loose venvs, which makes each plugin a single file that is straightforward to cache.
  • We use a (simple) caching algorithm that makes the plugins reusable across machines when combined with a remote filesystem.
  • We use fsspec as a filesystem abstraction layer, which gives the exact same experience locally on a single machine as when plugged into a remote blob store such as S3, GCS, or any other storage fsspec supports.
  • Roughly an order of magnitude less code (about 88% fewer lines of Python code, per the counts below), which in theory makes iteration, maintenance, and forking easier.
  • We use Dynaconf to manage configuration
    • This gives us uniform support for json, toml, and yaml out of the box
    • We get environment management
    • We get configuration inheritance / deep merging
    • We get .env support
    • We get unique ways to render vars with @format tokens
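
To make "configuration inheritance / deep merging" concrete, here is a minimal pure-Python sketch of layering an environment on top of the default settings. It is not Dynaconf's implementation and does not call Dynaconf; the deep_merge helper and the "prod" layer are hypothetical names used only for illustration.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with `override` layered on top of `base`.

    Nested dicts are merged key by key; any other value in `override`
    replaces the corresponding value from `base`.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical layers: the first mirrors a [default] table in alto.toml,
# the second is an invented "prod" environment that only states what differs.
default_layer = {
    "namespace": "raw",
    "targets": {
        "target-jsonl": {
            "pip_url": "target-jsonl==0.1.4",
            "config": {"destination_path": "output"},
        }
    },
}
prod_layer = {
    "targets": {"target-jsonl": {"config": {"destination_path": "/data/prod"}}}
}

settings = deep_merge(default_layer, prod_layer)
print(settings["targets"]["target-jsonl"])
```

The point of the sketch is that the override only needs to name the keys that change: pip_url and namespace are inherited from the default layer, while destination_path is replaced.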

Meltano

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python                     154     26842     2402      4262    20178       1106

Alto

───────────────────────────────────────────────────────────────────────────────
Language                 Files     Lines   Blanks  Comments     Code Complexity
───────────────────────────────────────────────────────────────────────────────
Python                      12      2892      226       164     2502        190

Example

An entire timed end-to-end example can be carried out via the below command.

From start to finish, it will:

  1. Create a directory
  2. Initialize an alto project (create the alto.toml file)
  3. Run an extract -> load from an open API into target-jsonl
    1. Build PEX plugins for tap-carbon-intensity and target-jsonl
    2. Dynamically generate config for the Singer plugin based on the toml file (supports toml/yaml/json)
    3. Run discovery and cache catalog to ~/.alto/(project-name)/catalog
    4. Apply user configuration to the catalog
    5. Run the pipeline
    6. Clean up the staging directory
    7. Manage and persist the state
# Create a dir, init a project, run an end-2-end pipeline, show some output as proof
mkdir example_project \
&& cd $_; yes | alto init; \
time alto tap-carbon-intensity:target-jsonl; \
cat output/* | head -8; ls -l output; cd -; \
tree example_project

Resulting in the below output:

example_project
├── .alto
│   ├── logs
│   │   └── dev
│   └── plugins
│       ├── 263b729b56cf48f4bc3d08b687045ad3f81713ce
│       └── 60e33af4f316a41812ee404136d7a747011ba811
├── .alto.json
├── alto.secrets.toml
├── alto.toml
└── output
    ├── entry-20230228T205342.jsonl
    ├── generationmix-20230228T205342.jsonl
    └── region-20230228T205342.jsonl

5 directories, 8 files
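
The two 40-character names under .alto/plugins have the length of a SHA-1 hex digest. The source does not say what is hashed, so the following is only a sketch of how a content-addressed plugin cache could work, assuming the key is derived from the plugin's pip_url. The functions plugin_cache_key and plugin_cache_path are hypothetical and are not Alto's API.

```python
import hashlib


def plugin_cache_key(pip_url: str) -> str:
    """Hypothetical cache key: the SHA-1 hex digest of the plugin's pip_url."""
    return hashlib.sha1(pip_url.encode("utf-8")).hexdigest()


def plugin_cache_path(cache_root: str, pip_url: str) -> str:
    """Join a cache root (local directory or remote URL) with the plugin key."""
    return f"{cache_root.rstrip('/')}/plugins/{plugin_cache_key(pip_url)}"


# The same pip_url always maps to the same file name, on any machine.
local_path = plugin_cache_path(".alto", "target-jsonl==0.1.4")
# "s3://my-bucket/alto" is an invented remote root, shown only as a string.
remote_path = plugin_cache_path("s3://my-bucket/alto", "target-jsonl==0.1.4")
print(local_path)
print(remote_path)
```

Because the key depends only on the plugin specification, a build step could check the remote path first and skip building the PEX when a file with that name already exists.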

>>> cat alto.toml

[default]
project_name = "4c167d53"
extensions = []
namespace = "raw"

[default.taps.tap-carbon-intensity]
pip_url = "git+https://gitlab.com/meltano/tap-carbon-intensity.git#egg=tap_carbon_intensity"
namespace = "carbon_intensity"
capabilities = ["state", "catalog"]
select = ["*.*"]

[default.taps.tap-carbon-intensity.config]

[default.targets.target-jsonl]
pip_url = "target-jsonl==0.1.4"

[default.targets.target-jsonl.config]
destination_path = "output"
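
The select = ["*.*"] entry above corresponds to the "apply user configuration to the catalog" step. As a rough sketch of what that step could look like, the code below marks Singer catalog metadata entries as selected when they match "stream.property" glob patterns. The apply_select function, the sample catalog, and its property names are assumptions made for illustration; Alto's actual selection logic may differ.

```python
from copy import deepcopy
from fnmatch import fnmatchcase


def apply_select(catalog: dict, patterns: list) -> dict:
    """Return a copy of `catalog` with `selected` flags set from glob patterns.

    Each pattern has the form "<stream>.<property>". A property is selected
    when any pattern matches; a stream is selected when any of its
    properties is. Only top-level properties are handled in this sketch.
    """
    result = deepcopy(catalog)
    for stream in result["streams"]:
        stream_id = stream["tap_stream_id"]
        stream_entry = None
        any_selected = False
        for entry in stream["metadata"]:
            breadcrumb = entry["breadcrumb"]
            if not breadcrumb:
                # An empty breadcrumb is the stream-level metadata entry.
                stream_entry = entry
                continue
            name = f"{stream_id}.{breadcrumb[-1]}"
            selected = any(fnmatchcase(name, p) for p in patterns)
            entry["metadata"]["selected"] = selected
            any_selected = any_selected or selected
        if stream_entry is not None:
            stream_entry["metadata"]["selected"] = any_selected
    return result


# A tiny hand-written catalog; the property names are invented.
catalog = {
    "streams": [
        {
            "tap_stream_id": "entry",
            "metadata": [
                {"breadcrumb": [], "metadata": {}},
                {"breadcrumb": ["properties", "forecast"], "metadata": {}},
            ],
        },
        {
            "tap_stream_id": "region",
            "metadata": [
                {"breadcrumb": [], "metadata": {}},
                {"breadcrumb": ["properties", "shortname"], "metadata": {}},
            ],
        },
    ]
}

everything = apply_select(catalog, ["*.*"])
only_region = apply_select(catalog, ["region.*"])
print([s["metadata"][0]["metadata"]["selected"] for s in only_region["streams"]])
```

With "*.*" every stream and property is selected, which matches the intent of the generated alto.toml; a narrower pattern such as "region.*" would leave the other streams unselected.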

The tale of a tiny binary

One can produce a sub-50 MB binary with Nuitka. It can be built in a multi-stage Docker image and copied into the final stage, producing incredibly small containers.

nuitka3 --standalone --onefile --output-dir=build --output-filename=alto alto/main.py

The resulting image, built from the bundled Dockerfile and inspected with dive:

❯ CI=true dive tinysinger:test
  Using default CI config
Image Source: docker://tinysinger:test
Fetching image... (this can take a while for large images)
Analyzing image...
  efficiency: 100.0000 %
  wastedBytes: 0 bytes (0 B)
  userWastedPercent: 0.0000 %
Inefficient Files:
Count  Wasted Space  File Path
None
Results:
  PASS: highestUserWastedPercent
  SKIP: highestWastedBytes: rule disabled
  PASS: lowestEfficiency
Result:PASS [Total:3] [Passed:2] [Failed:0] [Warn:0] [Skipped:1]

So the example above could be run like this:

mkdir example_project \
&& cd $_; yes | docker run -i -v$(pwd):/stage z3z1ma/alto:test-1 -- --root /stage init; \
time docker run -v$(pwd):/stage z3z1ma/alto:test-1 -- --root /stage tap-carbon-intensity:target-jsonl; \
cat output/* | head -8; ls -l output; cd -; \
tree example_project
