A package for managing singer.io taps and targets
Alto (WIP)
A lightweight yet intelligent way to manage Singer-based ELT.
How is this different from what exists today?
Using Meltano as the baseline of comparison, there are some noteworthy differences.
- A dependency footprint that is an order of magnitude smaller. Alto has only 4 direct dependencies and no C or Rust extensions anywhere in its dependency tree. Including transitive dependencies:
  - Meltano: 151
  - Alto: 7
- Because of its small dependency footprint, Alto can be installed in very small containers, and packaged formats such as PEX are cross-platform compatible. It can also be used with PyOxidizer or Nuitka.
- We use PEX (Python EXecutable) for all plugins instead of loose virtualenvs, so each plugin is a single file that is straightforward to cache.
- We use a simple caching algorithm that makes plugins reusable across machines when combined with a remote filesystem.
- We use fsspec as a filesystem abstraction layer, which gives exactly the same experience locally on a single machine as when plugged into a remote blob store such as S3, GCS, or any other storage fsspec supports.
- Roughly an order of magnitude (>85%) less code, which in theory makes iteration, maintenance, and forking easier.
- We use Dynaconf to manage configuration. This gives us:
  - Uniform support for JSON, TOML, and YAML out of the box
  - Environment management
  - Configuration inheritance / deep merging
  - .env support
  - Unique ways to render variables with @format tokens
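The environment layering and deep merging that Dynaconf provides can be illustrated with a small standard-library sketch. This is not Dynaconf's API or implementation: it assumes settings are plain nested dicts in which an environment table (a hypothetical `prod` here) overrides the `default` table key by key, and `deep_merge` and `settings_for` are hypothetical names.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where nested tables in `override` are merged into `base`.

    Scalars and lists in `override` replace the value in `base`; nested dicts
    are merged recursively instead of being replaced wholesale.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def settings_for(env: str, raw: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Layer the named environment table on top of the `default` table."""
    return deep_merge(raw.get("default", {}), raw.get(env, {}))


# Hypothetical settings: only `prod` changes the target's destination_path.
raw_settings = {
    "default": {
        "namespace": "raw",
        "targets": {"target-jsonl": {"config": {"destination_path": "output"}}},
    },
    "prod": {
        "targets": {"target-jsonl": {"config": {"destination_path": "/data/output"}}},
    },
}

prod = settings_for("prod", raw_settings)
print(prod["namespace"])  # raw
print(prod["targets"]["target-jsonl"]["config"]["destination_path"])  # /data/output
```

Merging recursively rather than replacing whole tables is what lets an environment override a single nested key without restating its siblings.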
Code size (Python only):

Project | Files | Lines | Blanks | Comments | Code | Complexity
---|---|---|---|---|---|---
Meltano | 154 | 26842 | 2402 | 4262 | 20178 | 1106
Alto | 12 | 2892 | 226 | 164 | 2502 | 190
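The >85% figure follows from the `Code` column of the code-size comparison above (blank and comment lines excluded):

```python
# Counts taken from the "Code" column of the comparison above.
meltano_code = 20178
alto_code = 2502

reduction = 1 - alto_code / meltano_code
ratio = meltano_code / alto_code

print(f"{reduction:.1%} less code")  # 87.6% less code
print(f"{ratio:.1f}x smaller")       # 8.1x smaller
```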
Example
A complete, timed end-to-end example can be run with the command below.
From start to finish, it will:
- Create a directory
- Initialize an alto project (create the alto.toml file)
- Run an extract -> load from an open API into target-jsonl, which will:
  - Build PEX plugins for tap-carbon-intensity and target-jsonl
  - Dynamically generate config for the Singer plugins based on the toml file (toml/yaml/json are supported)
  - Run discovery and cache the catalog to ~/.alto/(project-name)/catalog
  - Apply user configuration to the catalog
  - Run the pipeline
  - Clean up the staging directory
  - Manage and persist the state
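The "Run the pipeline" and "Manage and persist the state" steps follow the standard Singer pattern: the tap's stdout is piped into the target's stdin, and the target writes state values on its own stdout for the orchestrator to save. The sketch below shows that pattern with the standard library only. It is not Alto's implementation: `run_singer_pipeline` is a hypothetical helper, and the inline fake tap and target stand in for the real PEX plugins, which would also receive `--config`, `--catalog`, and `--state` arguments.

```python
import json
import subprocess
import sys
from typing import Optional


def run_singer_pipeline(tap_cmd: list[str], target_cmd: list[str]) -> Optional[dict]:
    """Pipe the tap's stdout into the target's stdin; return the last state emitted.

    Hypothetical helper: a real run would pass --config/--catalog/--state to the
    tap and persist the returned state so the next run resumes from it.
    """
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(
        target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE, text=True
    )
    # Drop the parent's handle so the tap sees a broken pipe if the target dies.
    tap.stdout.close()

    last_state = None
    for line in target.stdout:
        if line.strip():
            # Targets conventionally write each flushed state value as a JSON line.
            last_state = json.loads(line)

    target_rc = target.wait()
    tap_rc = tap.wait()
    if tap_rc != 0 or target_rc != 0:
        raise RuntimeError(f"pipeline failed: tap={tap_rc}, target={target_rc}")
    return last_state


# Stand-ins for real plugins: a tap emitting SCHEMA/RECORD/STATE messages and a
# target that echoes the value of every STATE message it receives.
FAKE_TAP = """
import json
for msg in [
    {"type": "SCHEMA", "stream": "region", "schema": {"type": "object"}, "key_properties": ["id"]},
    {"type": "RECORD", "stream": "region", "record": {"id": 1}},
    {"type": "STATE", "value": {"bookmarks": {"region": {"id": 1}}}},
]:
    print(json.dumps(msg))
"""

FAKE_TARGET = """
import json, sys
for line in sys.stdin:
    msg = json.loads(line)
    if msg["type"] == "STATE":
        print(json.dumps(msg["value"]), flush=True)
"""

state = run_singer_pipeline(
    [sys.executable, "-c", FAKE_TAP], [sys.executable, "-c", FAKE_TARGET]
)
print(state)  # {'bookmarks': {'region': {'id': 1}}}
```

Keeping only the last state line matters because a target emits state after each flush; the final one is the bookmark the next run should start from.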
# Create a dir, init a project, run an end-2-end pipeline, show some output as proof
mkdir example_project \
&& cd $_; yes | alto init; \
time alto tap-carbon-intensity:target-jsonl; \
cat output/* | head -8; ls -l output; cd -; \
tree example_project
This produces the following output:
example_project
├── .alto
│ ├── logs
│ │ └── dev
│ └── plugins
│ ├── 263b729b56cf48f4bc3d08b687045ad3f81713ce
│ └── 60e33af4f316a41812ee404136d7a747011ba811
├── .alto.json
├── alto.secrets.toml
├── alto.toml
└── output
├── entry-20230228T205342.jsonl
├── generationmix-20230228T205342.jsonl
└── region-20230228T205342.jsonl
5 directories, 8 files
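The files under `.alto/plugins` are named by a 40-character hex digest, which is consistent with a content-addressed cache: the same plugin spec always maps to the same file name, so any machine sharing that directory (or a remote filesystem behind it) can reuse a previously built PEX. The sketch below illustrates the idea under loudly stated assumptions: the key is a SHA-1 over the plugin's `pip_url` plus the Python version, and `plugin_cache_key`, `get_or_build`, and `fake_build` are hypothetical names. Alto's real key may hash different inputs, so these digests will not match the ones in the tree.

```python
import hashlib
import sys
import tempfile
from pathlib import Path
from typing import Callable


def plugin_cache_key(pip_url: str, python_version: str) -> str:
    """Derive a stable 40-char hex key from the inputs that determine the build.

    Assumption: hashing pip_url + Python version is illustrative only.
    """
    payload = f"{pip_url}|{python_version}".encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def get_or_build(cache_dir: Path, pip_url: str, build: Callable[[Path], None]) -> Path:
    """Return the cached plugin file, calling `build` only on a cache miss."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    target = cache_dir / plugin_cache_key(pip_url, version)
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Build under a temporary name so a crashed build never looks like a hit.
        tmp = target.with_suffix(".tmp")
        build(tmp)
        tmp.replace(target)
    return target


builds = []


def fake_build(path: Path) -> None:
    """Hypothetical stand-in for building a real PEX file."""
    builds.append(path)
    path.write_bytes(b"#!/usr/bin/env python3\n")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "plugins"
    first = get_or_build(cache, "target-jsonl==0.1.4", fake_build)
    second = get_or_build(cache, "target-jsonl==0.1.4", fake_build)
    print(first == second, len(builds))  # True 1
```

Because the key depends only on the plugin spec, pointing `cache_dir` at shared storage is enough to make a build on one machine a cache hit on another.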
>>> cat alto.toml
[default]
project_name = "4c167d53"
extensions = []
namespace = "raw"
[default.taps.tap-carbon-intensity]
pip_url = "git+https://gitlab.com/meltano/tap-carbon-intensity.git#egg=tap_carbon_intensity"
namespace = "carbon_intensity"
capabilities = ["state", "catalog"]
select = ["*.*"]
[default.taps.tap-carbon-intensity.config]
[default.targets.target-jsonl]
pip_url = "target-jsonl==0.1.4"
[default.targets.target-jsonl.config]
destination_path = "output"
The tale of a tiny binary
You can produce a sub-50 MB binary with Nuitka. It can be built in a multistage Docker image and copied into the final stage, producing very small containers.
nuitka3 --standalone --onefile --output-dir=build --output-filename=alto alto/main.py
The resulting image, built from the bundled Dockerfile and inspected with dive:
❯ CI=true dive tinysinger:test
Using default CI config
Image Source: docker://tinysinger:test
Fetching image... (this can take a while for large images)
Analyzing image...
efficiency: 100.0000 %
wastedBytes: 0 bytes (0 B)
userWastedPercent: 0.0000 %
Inefficient Files:
Count Wasted Space File Path
None
Results:
PASS: highestUserWastedPercent
SKIP: highestWastedBytes: rule disabled
PASS: lowestEfficiency
Result:PASS [Total:3] [Passed:2] [Failed:0] [Warn:0] [Skipped:1]
With that image, the example above could be run like this:
mkdir example_project \
&& cd $_; yes | docker run -i -v$(pwd):/stage z3z1ma/alto:test-1 -- --root /stage init; \
time docker run -v$(pwd):/stage z3z1ma/alto:test-1 -- --root /stage tap-carbon-intensity:target-jsonl; \
cat output/* | head -8; ls -l output; cd -; \
tree example_project
File details
Details for the file singer_alto-0.1.3.tar.gz.
File metadata
- Download URL: singer_alto-0.1.3.tar.gz
- Upload date:
- Size: 30.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.9.10 Darwin/21.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | d11fd585f3e58c375219d2ffa6345daca7c28536e9aa1a2cacb1239f6af3a43e
MD5 | b115df4742765800ca653e75d1900594
BLAKE2b-256 | ecf377edf4a6fd0ce6f66c7fa75cadcb6ebea7c7a2c42038cce46edc9a51e3dc
File details
Details for the file singer_alto-0.1.3-py3-none-any.whl.
File metadata
- Download URL: singer_alto-0.1.3-py3-none-any.whl
- Upload date:
- Size: 30.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.9.10 Darwin/21.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | f0f0e4ff2b600f98577bce9a712dbc487dc7987835ddbd3440f6e3c1629a5cd3
MD5 | 6a92bf05d7d2d54562e9eb78684a7ec5
BLAKE2b-256 | 8ee70fe068aed1e23ced15ba1bd935954937d7fe20f49d251fd2f14b5a80e929