A Singer target for CrateDB, built with the Meltano SDK, and based on the Meltano PostgreSQL target.
Meltano/Singer Target for CrateDB
About
A Singer target for CrateDB, built with the Meltano SDK for custom extractors and loaders, and based on the Meltano PostgreSQL target. It connects a library of 600+ connectors with CrateDB, and vice versa.
In Singer ELT jargon, a "target" conceptually wraps a data sink into which you "load" data.
Singer, Meltano, and PipelineWise provide foundational components and an integration engine for composable Open Source ETL with 600+ connectors. On the database integration side, they are heavily based on SQLAlchemy.
CrateDB
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, and based on Apache Lucene.
CrateDB offers a Python SQLAlchemy dialect, which plugs it into the comprehensive Python data science and data wrangling ecosystems.
Singer
The open-source standard for writing scripts that move data.
Singer is an open source specification and software framework for ETL/ELT data exchange between a range of different systems. For talking to SQL databases, it employs a metadata subsystem based on SQLAlchemy.
Singer reads and writes Singer-formatted messages, following the Singer Spec. Effectively, these are JSON Lines (JSONL) files, with one JSON message per line.
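To make the message format concrete, here is a minimal sketch that emits a SCHEMA, RECORD, and STATE message sequence as JSON Lines, using the message types defined by the Singer Spec. The helper names, the stream name, the schema, and the rows are illustrative placeholders, not taken from the example data used further below.

```python
import json
import sys


def singer_messages(stream, key_properties, schema, records, state):
    """Yield Singer SCHEMA, RECORD, and STATE messages as plain dicts."""
    # The SCHEMA message must precede any RECORD of the same stream.
    yield {"type": "SCHEMA", "stream": stream, "schema": schema,
           "key_properties": key_properties}
    for record in records:
        yield {"type": "RECORD", "stream": stream, "record": record}
    # STATE carries a bookmark that lets a later run resume.
    yield {"type": "STATE", "value": state}


def write_jsonl(messages, out):
    """Serialize each message as one JSON document per line."""
    for message in messages:
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    # Hypothetical example stream with two rows.
    schema = {
        "type": "object",
        "properties": {"code": {"type": "string"}, "name": {"type": "string"}},
    }
    rows = [{"code": "AT", "name": "Austria"}, {"code": "DE", "name": "Germany"}]
    write_jsonl(singer_messages("countries", ["code"], schema, rows, {"bookmarks": {}}),
                sys.stdout)
```

The output of such a script is exactly the kind of stream a target consumes on standard input.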
Meltano
Unlock all the data that powers your data platform.
Say goodbye to writing, maintaining, and scaling your own API integrations with Meltano's declarative code-first data integration engine, bringing 600+ APIs and DBs to the table.
Meltano builds upon Singer technologies, uses configuration files in YAML syntax instead of JSON, adds an improved SDK and other components, and runs the central add-on registry, Meltano Hub.
PipelineWise
PipelineWise is another data pipeline framework that uses the Singer.io specification to ingest and replicate data from various sources to various destinations. The list of PipelineWise Taps includes another 20+ high-quality data source and data sink components.
SQLAlchemy
SQLAlchemy is the leading Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
It provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.
Install
Usually, you will not install this package directly, but through a Meltano project definition instead. A corresponding snippet is outlined in the next section. After adding it to your meltano.yml configuration file, you can install all defined components and their dependencies:
meltano install
Usage
You can run the CrateDB Singer target, target-cratedb, by itself, or in a pipeline using Meltano.
Meltano
Using the meltano add subcommand, you can add the plugin to your Meltano project.
meltano add loader target-cratedb
Note: This will only work once the plugin is released and registered on Meltano Hub. In the meantime, please add the configuration snippet manually.
CrateDB Cloud
In order to connect to CrateDB Cloud, configure the sqlalchemy_url setting of the loader entry in the plugins.loaders section of your meltano.yml configuration file like this.
- name: target-cratedb
  namespace: cratedb
  variant: cratedb
  pip_url: meltano-target-cratedb
  config:
    sqlalchemy_url: "crate://admin:K4IgMXNvQBJM3CiElOiPHuSp6CiXPCiQYhB4I9dLccVHGvvvitPSYr1vTpt4@example.aks1.westeurope.azure.cratedb.net:4200?ssl=true"
    add_record_metadata: true
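Passwords such as generated CrateDB Cloud credentials may contain characters that have a special meaning in a URL (for example @, /, or :), so they should be percent-encoded inside sqlalchemy_url. A minimal sketch using only the Python standard library; cratedb_sqlalchemy_url is a hypothetical helper, and 4200 is CrateDB's default HTTP port.

```python
from urllib.parse import quote, urlencode


def cratedb_sqlalchemy_url(host, user, password=None, port=4200, ssl=False):
    """Assemble a crate:// SQLAlchemy URL with percent-encoded credentials.

    Hypothetical helper for illustration; safe="" makes quote() encode
    every reserved character, including "/" and "@".
    """
    credentials = quote(user, safe="")
    if password is not None:
        credentials += ":" + quote(password, safe="")
    url = f"crate://{credentials}@{host}:{port}"
    if ssl:
        # CrateDB Cloud connections use TLS, as in the snippet above.
        url += "?" + urlencode({"ssl": "true"})
    return url
```

For example, cratedb_sqlalchemy_url("example.net", "admin", "p@ss/word", ssl=True) yields crate://admin:p%40ss%2Fword@example.net:4200?ssl=true, which can be pasted into the sqlalchemy_url setting.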
On localhost
In order to connect to a standalone or on-premises instance of CrateDB, configure the sqlalchemy_url setting within your meltano.yml configuration file like this.
- name: target-cratedb
  namespace: cratedb
  variant: cratedb
  pip_url: meltano-target-cratedb
  config:
    sqlalchemy_url: crate://crate@localhost/
    add_record_metadata: true
Then, invoke the pipeline by using meltano run, like this.
meltano run tap-xyz target-cratedb
Standalone
You can also invoke it standalone by using the target-cratedb program.
This example demonstrates how to load a file into the database.
First, download an example file in Singer format that contains the list of countries of the world.
wget https://github.com/MeltanoLabs/target-postgres/raw/v0.0.9/target_postgres/tests/data_files/tap_countries.singer
Now, define the database connection string, including credentials, in SQLAlchemy URL format, and write it to a configuration file.
echo '{"sqlalchemy_url": "crate://crate@localhost/"}' > settings.json
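If the connection string contains quotes or other characters that are awkward to escape in a shell one-liner, the same file can be written from Python, letting json.dumps handle the escaping. A minimal sketch; write_settings is a hypothetical helper, and passing add_record_metadata assumes the standalone target accepts the same setting names as the meltano.yml config block.

```python
import json
from pathlib import Path


def write_settings(path, sqlalchemy_url, **extra_settings):
    """Write a JSON settings file for target-cratedb and return its contents.

    Hypothetical helper: extra keyword arguments are passed through as
    additional top-level settings.
    """
    settings = {"sqlalchemy_url": sqlalchemy_url, **extra_settings}
    # json.dumps takes care of quoting and escaping special characters.
    Path(path).write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling write_settings("settings.json", "crate://crate@localhost/") produces a file equivalent to the one created by the echo command.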
By using Unix pipes, load the data file into the database, referencing the path to the configuration file.
cat tap_countries.singer | target-cratedb --config=settings.json
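Conceptually, the target reads Singer messages from standard input, one JSON document per line, and loads the records into its sink. The following toy target is a simplified illustration of that flow, not the implementation of target-cratedb, which builds on the Meltano SDK and writes through SQLAlchemy. It keeps records in memory and ignores message types other than SCHEMA, RECORD, and STATE; load_singer_stream is a hypothetical name.

```python
import json
from collections import defaultdict


def load_singer_stream(lines):
    """Consume Singer messages and collect records per stream in memory.

    Returns (schemas, tables, last_state). A real target would create or
    alter database tables from SCHEMA messages and insert RECORD batches.
    """
    schemas = {}
    tables = defaultdict(list)
    last_state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD before SCHEMA for stream {stream!r}")
            tables[stream].append(message["record"])
        elif kind == "STATE":
            # Only the most recent state matters for resuming a sync.
            last_state = message["value"]
        # Other message types (e.g. ACTIVATE_VERSION) are ignored here.
    return schemas, dict(tables), last_state
```

To try it on the example file, open tap_countries.singer and pass the file object to load_singer_stream; it yields the same records that target-cratedb would write to the database.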
Using crash, the interactive terminal program, you can run SQL statements on CrateDB.
pip install crash
crash --hosts localhost:4200
Now, you can verify that the data has been loaded correctly.
SELECT
"code", "name", "capital", "emoji", "languages[1]"
FROM
"melty"."countries"
ORDER BY
"name"
LIMIT
42;
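The same check can be scripted from Python. A minimal sketch, assuming the crate DB API client (pip install crate) and a CrateDB instance on localhost:4200; fetch_countries is a hypothetical helper that works with any DB-API 2.0 connection, and the query omits the "languages[1]" array column to keep it portable.

```python
QUERY = """
SELECT "code", "name", "capital"
FROM "melty"."countries"
ORDER BY "name"
LIMIT 42
"""


def fetch_countries(connection):
    """Run QUERY on a DB-API 2.0 connection and return rows as dicts."""
    cursor = connection.cursor()
    try:
        cursor.execute(QUERY)
        # cursor.description holds one tuple per column; index 0 is its name.
        columns = [column[0] for column in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()
```

With the crate client, this would be invoked as fetch_countries(client.connect("http://localhost:4200")) after from crate import client.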
Development
In order to work on this target within a real pipeline definition, link your sandbox to a development installation of meltano-target-cratedb by configuring the pip_url of the component to point to a location other than the vanilla package on PyPI.
Use this URL to directly point to a specific Git repository reference.
pip_url: git+https://github.com/crate-workbench/meltano-target-cratedb.git@main
Use pip's --editable notation to link the CrateDB Singer target in development mode, so you can work on it while running the pipeline and iterate on its definition.
pip_url: --editable=/path/to/sources/meltano-target-cratedb
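To confirm which variant a plugin environment actually uses, the installation metadata can be inspected. A minimal sketch based on the direct_url.json file that pip records for direct, VCS, and editable installs (PEP 610); install_origin is a hypothetical helper. Since Meltano installs each plugin into its own virtual environment (typically under .meltano/loaders/target-cratedb/venv), run it with that environment's Python interpreter.

```python
import json
from importlib.metadata import PackageNotFoundError, distribution


def install_origin(package="meltano-target-cratedb"):
    """Describe where `package` was installed from, using PEP 610 metadata.

    Returns None if the package is not installed, "index" if there is no
    direct_url.json (a regular install from PyPI), else a short description.
    """
    try:
        dist = distribution(package)
    except PackageNotFoundError:
        return None
    raw = dist.read_text("direct_url.json")  # None if the file is absent
    if raw is None:
        return "index"
    info = json.loads(raw)
    if info.get("dir_info", {}).get("editable"):
        return f"editable: {info['url']}"
    if "vcs_info" in info:
        vcs = info["vcs_info"]
        revision = vcs.get("requested_revision") or vcs.get("commit_id", "?")
        return f"{vcs['vcs']}: {info['url']}@{revision}"
    return f"direct: {info['url']}"
```

After switching pip_url to the editable path, the result should start with "editable:"; with the Git URL, it should start with "git:".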
Download files
Hashes for meltano-target-cratedb-0.0.1.tar.gz (source distribution)
Algorithm | Hash digest
---|---
SHA256 | ae7d3a6ece37a38dd39e3ecb45d7d7104d17648ec297195ea9196dc745ab0f1f
MD5 | 5701d71272fdafecdbfc9d7de647994f
BLAKE2b-256 | 29a4bc68a560ecd794b09177eb8b5e4ab3234d09cf5810f9c0394a7a5112d583

Hashes for meltano_target_cratedb-0.0.1-py3-none-any.whl (built distribution)
Algorithm | Hash digest
---|---
SHA256 | ee338d60f8ae30111ba968ab7eb84829bbfd6408646471d1e1ae7c97e7007a20
MD5 | 88bd793c2cf453527d1c4315e6af0441
BLAKE2b-256 | 288d58e9fdc80484f832cb429b6a59f4a0e0822438f1e7dded596878f686b459