Singer.io target for loading data to iomete
singer-target-iomete
Singer target that loads data into iomete following the Singer spec.
How to use it
If you want to run this Singer Target independently please read further.
Install
First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.
It's recommended to use a virtualenv:
python3 -m venv venv
source venv/bin/activate
pip install singer-target-iomete
or, from a local checkout of the source:
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install .
To run
Like any other target that follows the Singer specification:
some-singer-tap | singer-target-iomete --config [config.json]
It reads incoming messages from STDIN and uses the properties in config.json to upload data into iomete.
Note: To avoid version conflicts, run taps and targets in separate virtual environments.
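To illustrate what "reading incoming messages from STDIN" means in practice, here is a minimal sketch of how a Singer message stream can be parsed. It is not the code this target uses: the helper names `read_singer_messages` and `count_records_per_stream` and the sample messages are hypothetical, and only the SCHEMA, RECORD and STATE message shapes from the Singer spec are assumed.

```python
import io
import json


def read_singer_messages(stream):
    """Yield (message_type, message) pairs from a stream of JSON lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        message = json.loads(line)
        yield message["type"], message


def count_records_per_stream(stream):
    """Count RECORD messages per stream name, ignoring SCHEMA and STATE."""
    counts = {}
    for message_type, message in read_singer_messages(stream):
        if message_type == "RECORD":
            name = message["stream"]
            counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical tap output: one SCHEMA, two RECORD and one STATE message.
sample = io.StringIO(
    '{"type": "SCHEMA", "stream": "users", '
    '"schema": {"properties": {"id": {"type": "integer"}}}, '
    '"key_properties": ["id"]}\n'
    '{"type": "RECORD", "stream": "users", "record": {"id": 1}}\n'
    '{"type": "RECORD", "stream": "users", "record": {"id": 2}}\n'
    '{"type": "STATE", "value": {"bookmarks": {"users": {"id": 2}}}}\n'
)
record_counts = count_records_per_stream(sample)
print(record_counts)
```

In a real pipeline the same generator could be fed `sys.stdin` instead of the in-memory sample.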
Pre-requirements
You need to create a few objects in iomete, all in one schema, before you start using this target.
Configuration settings
Running the target connector requires a config.json file. Example with the minimal settings:
{
  "host": "<cluster_id>.iomete.cloud",
  "workspace_id": "abcde-123",
  "lakehouse": "lakehouse",
  "user": "username",
  "password": "password",
  "database": "default",
  "default_target_schema": "singer",
  "add_metadata_columns": true,
  "hard_delete": true,
  "aws_access_key_id": "key_id",
  "aws_secret_access_key": "access_key",
  "s3_bucket": "iom-lakehouse-000000000000",
  "s3_key_prefix": "external/singer/",
  "primary_key_required": false,
  "no_compression": false,
  "temp_dir": "../tempdir"
}
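Before piping a tap into the target, it can help to check that the config contains the properties the options table marks as required. The sketch below is only an illustration, not the target's own validation: `REQUIRED_KEYS` is taken from the rows marked "Yes" in the Required? column, and the helper `missing_required_keys` is a hypothetical name.

```python
import json

# Properties marked "Yes" in the Required? column of the options table.
REQUIRED_KEYS = ("host", "workspace_id", "user", "password", "lakehouse")


def missing_required_keys(config):
    """Return the required properties that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Inline stand-in for the contents of a config.json file (lakehouse omitted).
config_text = """
{
  "host": "<cluster_id>.iomete.cloud",
  "workspace_id": "abcde-123",
  "user": "username",
  "password": "password",
  "default_target_schema": "singer"
}
"""
config = json.loads(config_text)
missing = missing_required_keys(config)
print(missing)
```

With a real file, `json.load(open("config.json"))` would replace the inline string.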
Full list of options in config.json:
Property | Type | Required? | Description |
---|---|---|---|
host | String | Yes | iomete host (e.g. xyz12.iomete.com) |
workspace_id | String | Yes | iomete workspace ID (e.g. abcde-123) |
user | String | Yes | iomete user |
password | String | Yes | iomete password |
lakehouse | String | Yes | iomete lakehouse name |
database | String | No | Database name (or 'default') |
aws_access_key_id | String | No | S3 Access Key Id. If not provided, the AWS_ACCESS_KEY_ID environment variable or IAM role will be used. |
aws_secret_access_key | String | No | S3 Secret Access Key. If not provided, the AWS_SECRET_ACCESS_KEY environment variable or IAM role will be used. |
aws_session_token | String | No | AWS session token. If not provided, the AWS_SESSION_TOKEN environment variable will be used. |
aws_profile | String | No | AWS profile name for profile-based authentication. If not provided, the AWS_PROFILE environment variable will be used. |
s3_bucket | String | No | S3 bucket name. Required to use an S3 external stage. When this is defined then stage has to be defined as well. |
s3_key_prefix | String | No | (Default: None) A static prefix before the generated S3 key names. Using prefixes, you can upload files into specific directories in the S3 bucket. |
s3_endpoint_url | String | No | The complete URL to use for the constructed client. This allows the use of a non-native S3 account. |
s3_region_name | String | No | Default region when creating new connections. |
s3_acl | String | No | S3 ACL name to set on the uploaded files. |
batch_size_rows | Integer | | (Default: 100000) Maximum number of rows in each batch. At the end of each batch, the rows in the batch are loaded into iomete. |
batch_wait_limit_seconds | Integer | | (Default: None) Maximum time to wait for a batch to reach batch_size_rows. |
flush_all_streams | Boolean | | (Default: False) Flush and load every stream into iomete when one batch is full. Warning: This may trigger the COPY command to use files with a low number of records, and may cause performance problems. |
parallelism | Integer | | (Default: 0) The number of threads used to flush tables. 0 will create a thread for each stream, up to parallelism_max. -1 will create a thread for each CPU core. Any other positive number will create that number of threads, up to parallelism_max. Parallelism works only with external stages. If no s3_bucket is defined with an external stage, then flushing tables is forced to use a single thread. |
parallelism_max | Integer | | (Default: 16) Maximum number of parallel threads to use when flushing tables. |
default_target_schema | String | | Name of the schema where the tables will be created, without database prefix. If schema_mapping is not defined, then every stream sent by the tap is loaded into this schema. |
schema_mapping | Object | | Useful if you want to load multiple streams from one tap to multiple iomete schemas. |
add_metadata_columns | Boolean | | (Default: False) Metadata columns add extra row-level information about data ingestion (e.g. when the row was read in the source, when it was inserted or deleted in iomete). Metadata columns are created automatically by adding extra columns to the tables with the column prefix _SDC_. The column names follow the Stitch naming conventions documented at https://www.stitchdata.com/docs/data-structure/integration-schemas#sdc-columns. Enabling metadata columns will flag deleted rows by setting the _SDC_DELETED_AT metadata column. Without the add_metadata_columns option, rows deleted by Singer taps will not be recognisable in iomete. |
hard_delete | Boolean | | (Default: False) When the hard_delete option is true, DELETE SQL commands will be performed in iomete to delete rows in tables. This is achieved by continuously checking the _SDC_DELETED_AT metadata column sent by the Singer tap. Because deleting rows requires metadata columns, the hard_delete option automatically enables the add_metadata_columns option as well. |
data_flattening_max_level | Integer | | (Default: 0) Object type RECORD items from taps can be loaded into STRUCT columns as JSON (default), or the schema can be flattened by creating columns automatically. When the value is 0 (default), flattening is turned off. |
primary_key_required | Boolean | | (Default: True) Log-based and incremental replication on tables with no primary key causes duplicates when merging UPDATE events. When set to true, stop loading data if no primary key is defined. |
validate_records | Boolean | | (Default: False) Validate every single record message against the corresponding JSON schema. This option is disabled by default, and invalid RECORD messages will fail only at load time in iomete. Enabling this option will detect invalid records earlier but could cause performance degradation. |
temp_dir | String | | (Default: platform-dependent) Directory of temporary files with RECORD messages. |
no_compression | Boolean | | (Default: False) Generate uncompressed files when loading to iomete. By default, GZIP compressed files are generated. |
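The parallelism rules in the table can be easier to follow as code. The function below is a sketch of one reading of the parallelism and parallelism_max descriptions, not the target's actual implementation: the name `resolve_thread_count` and its parameters `stream_count` and `has_s3_bucket` are hypothetical.

```python
import os


def resolve_thread_count(parallelism, parallelism_max, stream_count, has_s3_bucket):
    """Return the number of flush threads implied by the option descriptions."""
    if not has_s3_bucket:
        return 1  # without an external stage, flushing is single-threaded
    if parallelism == -1:
        return os.cpu_count() or 1  # one thread per CPU core
    if parallelism == 0:
        wanted = stream_count  # one thread per stream
    else:
        wanted = parallelism  # an explicit number of threads
    # Both of the last two cases are capped by parallelism_max.
    return max(1, min(wanted, parallelism_max))


# Defaults from the table: parallelism=0, parallelism_max=16.
few_streams = resolve_thread_count(0, 16, stream_count=3, has_s3_bucket=True)
many_streams = resolve_thread_count(0, 16, stream_count=40, has_s3_bucket=True)
no_stage = resolve_thread_count(8, 16, stream_count=40, has_s3_bucket=False)
print(few_streams, many_streams, no_stage)
```

The cap is applied only to the 0 and positive cases because the description of -1 does not mention parallelism_max; whether the target caps that case too is not stated in the table.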
License
Apache License Version 2.0
See LICENSE to see the full text.