Singer.io target for loading data to iomete

singer-target-iomete

Singer target that loads data into iomete following the Singer spec.

How to use it

To run this Singer target on its own, read on.

Install

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

It's recommended to use a virtualenv:

  python3 -m venv venv
  source venv/bin/activate
  pip install singer-target-iomete

or, to install from a local checkout of the source:

  python3 -m venv venv
  source venv/bin/activate
  pip install --upgrade pip
  pip install .

To run

Like any other target that follows the Singer specification:

some-singer-tap | singer-target-iomete --config [config.json]

It reads incoming messages from STDIN and uses the properties in config.json to upload data into iomete.

Note: To avoid version conflicts, run taps and targets in separate virtual environments.
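To make the STDIN contract concrete, here is a minimal sketch of a toy tap that prints Singer messages (SCHEMA, RECORD, STATE) as JSON lines. The stream name `users`, its columns, the bookmark layout inside STATE and the helper name `singer_messages` are all assumptions made for illustration; they are not part of this package.

```python
import json
import sys


def singer_messages(stream, rows, key_properties):
    """Yield Singer SCHEMA, RECORD and STATE messages for a list of dict rows."""
    # SCHEMA describes the stream before any of its records are sent.
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": ["null", "string"]},
            },
        },
        "key_properties": key_properties,
    }
    # One RECORD message per row.
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}
    # STATE carries a free-form bookmark; this layout is only an example.
    last_id = rows[-1]["id"] if rows else None
    yield {"type": "STATE", "value": {"bookmarks": {stream: {"last_id": last_id}}}}


def main():
    rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
    for message in singer_messages("users", rows, ["id"]):
        sys.stdout.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as toy_tap.py, it could be piped into the target the same way as a real tap: `python toy_tap.py | singer-target-iomete --config config.json`.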

Pre-requirements

Before you start using this target, you need to create a few objects in iomete, all in one schema.

Configuration settings

Running the target connector requires a config.json file. Example with the minimal settings:

  {
    "host": "<cluster_id>.iomete.cloud",
    "workspace_id": "abcde-123",
    "lakehouse": "lakehouse",
    "user": "username",
    "password": "password",
    "database": "default",
    "default_target_schema": "singer",
    "add_metadata_columns": true,
    "hard_delete": true,
    "aws_access_key_id": "key_id",
    "aws_secret_access_key": "access_key",
    "s3_bucket": "iom-lakehouse-000000000000",
    "s3_key_prefix": "external/singer/",
    "primary_key_required": false,
    "no_compression": false,
    "temp_dir": "../tempdir"
  }
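Before piping data into the target, it can help to check that config.json contains the properties this README's options table marks as required (host, workspace_id, user, password, lakehouse). The sketch below is a standalone pre-flight check. The helper names `missing_required_keys` and `load_config` are hypothetical, and the target's own validation may differ.

```python
import json

# Keys marked "Yes" in the Required? column of the options table.
REQUIRED_KEYS = ("host", "workspace_id", "user", "password", "lakehouse")


def missing_required_keys(config):
    """Return the required keys that are absent or empty in a config dict."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path):
    """Load a config.json file and fail early if a required key is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_required_keys(config)
    if missing:
        raise ValueError("config is missing required keys: " + ", ".join(missing))
    return config
```

Calling `load_config("config.json")` either returns the parsed settings or raises a `ValueError` naming the missing keys.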

Full list of options in config.json:

Property	Type	Required?	Description
host	String	Yes	iomete host (e.g. xyz12.iomete.com)
workspace_id	String	Yes	iomete workspace ID (e.g. abcde-123)
user	String	Yes	iomete User
password	String	Yes	iomete Password
lakehouse	String	Yes	iomete lakehouse name
database	String	No	database (or 'default')
aws_access_key_id	String	No	S3 Access Key Id. If not provided, `AWS_ACCESS_KEY_ID` environment variable or IAM role will be used
aws_secret_access_key	String	No	S3 Secret Access Key. If not provided, `AWS_SECRET_ACCESS_KEY` environment variable or IAM role will be used
aws_session_token	String	No	AWS Session token. If not provided, `AWS_SESSION_TOKEN` environment variable will be used
aws_profile	String	No	AWS profile name for profile based authentication. If not provided, `AWS_PROFILE` environment variable will be used.
s3_bucket	String	No	S3 bucket name. Required to use an S3 external stage. When this is defined, `stage` has to be defined as well.
s3_key_prefix	String	No	(Default: None) A static prefix before the generated S3 key names. Using prefixes you can upload files into specific directories in the S3 bucket.
s3_endpoint_url	String	No	The complete URL to use for the constructed client. This allows the use of a non-native S3 account.
s3_region_name	String	No	Default region when creating new connections
s3_acl	String	No	S3 ACL name to set on the uploaded files
batch_size_rows	Integer		(Default: 100000) Maximum number of rows in each batch. At the end of each batch, the rows in the batch are loaded into iomete.
batch_wait_limit_seconds	Integer		(Default: None) Maximum time to wait for a batch to reach `batch_size_rows`.
flush_all_streams	Boolean		(Default: False) Flush and load every stream into iomete when one batch is full. Warning: This may trigger the COPY command to use files with a low number of records, and may cause performance problems.
parallelism	Integer		(Default: 0) The number of threads used to flush tables. 0 will create a thread for each stream, up to parallelism_max. -1 will create a thread for each CPU core. Any other positive number will create that number of threads, up to parallelism_max. Parallelism works only with external stages. If no s3_bucket is defined for an external stage, flushing tables is forced to use a single thread.
parallelism_max	Integer		(Default: 16) Max number of parallel threads to use when flushing tables.
default_target_schema	String		Name of the schema where the tables will be created, without database prefix. If `schema_mapping` is not defined then every stream sent by the tap is loaded into this schema.
schema_mapping	Object		Useful if you want to load multiple streams from one tap to multiple iomete schemas
add_metadata_columns	Boolean		(Default: False) Metadata columns add extra row-level information about data ingestion (e.g. when the row was read from the source, when it was inserted or deleted in iomete). Metadata columns are created automatically by adding extra columns to the tables with the column prefix `_SDC_`. The column names follow the Stitch naming conventions documented at https://www.stitchdata.com/docs/data-structure/integration-schemas#sdc-columns. Enabling metadata columns flags deleted rows by setting the `_SDC_DELETED_AT` metadata column. Without the `add_metadata_columns` option, rows deleted by Singer taps will not be recognisable in iomete.
hard_delete	Boolean		(Default: False) When the `hard_delete` option is true, DELETE SQL commands are run in iomete to delete rows in tables. This is achieved by continuously checking the `_SDC_DELETED_AT` metadata column sent by the Singer tap. Because deleting rows requires metadata columns, the `hard_delete` option automatically enables the `add_metadata_columns` option as well.
data_flattening_max_level	Integer		(Default: 0) Object type RECORD items from taps can be loaded into STRUCT columns as JSON (default) or we can flatten the schema by creating columns automatically. When value is 0 (default) then flattening functionality is turned off.
primary_key_required	Boolean		(Default: True) Log based and Incremental replications on tables with no Primary Key cause duplicates when merging UPDATE events. When set to true, stop loading data if no Primary Key is defined.
validate_records	Boolean		(Default: False) Validate every single record message against the corresponding JSON schema. This option is disabled by default, and invalid RECORD messages will fail only at load time in iomete. Enabling this option will detect invalid records earlier but could cause performance degradation.
temp_dir	String		(Default: platform-dependent) Directory of temporary files with RECORD messages.
no_compression	Boolean		(Default: False) Generate uncompressed files when loading to iomete. By default, GZIP-compressed files are generated.
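The rules for `parallelism` and `parallelism_max` in the table above can be read as a small decision function. The sketch below only restates those documented rules; the function name `flush_thread_count` and its arguments are hypothetical, and it is not taken from the target's source. The table does not say whether `parallelism_max` also caps the -1 case, so the sketch leaves that case uncapped.

```python
import os


def flush_thread_count(parallelism, parallelism_max, n_streams, has_s3_bucket):
    """Return the number of flush threads implied by the documented rules."""
    # Without an S3 bucket (external stage), flushing uses a single thread.
    if not has_s3_bucket:
        return 1
    # 0: one thread per stream, up to parallelism_max (at least one thread).
    if parallelism == 0:
        return max(1, min(n_streams, parallelism_max))
    # -1: one thread per CPU core (assumption: no cap, the table states none).
    if parallelism == -1:
        return os.cpu_count() or 1
    # Any other positive number: that many threads, up to parallelism_max.
    if parallelism > 0:
        return min(parallelism, parallelism_max)
    raise ValueError("parallelism must be -1, 0 or a positive integer")
```

For example, with the defaults (`parallelism` 0, `parallelism_max` 16) and 40 streams to flush, this reading gives 16 threads; with 3 streams it gives 3.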

License

Apache License Version 2.0

See LICENSE for the full text.

Release history

1.1.0 (this version): May 9, 2023

1.0.1: Oct 24, 2022

Download files

Source Distribution

singer-target-iomete-1.1.0.tar.gz (26.8 kB)

Uploaded May 9, 2023 Source

Built Distribution

singer_target_iomete-1.1.0-py3-none-any.whl (26.7 kB)

Uploaded May 9, 2023 Python 3

File details

Details for the file singer-target-iomete-1.1.0.tar.gz.

File metadata

Download URL: singer-target-iomete-1.1.0.tar.gz
Upload date: May 9, 2023
Size: 26.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for singer-target-iomete-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ebc774a886b468fb50786c73bbadd718a34e2d9b84edb0168973f902416c2621`
MD5	`1de79d17f861a7a11a221564506753d4`
BLAKE2b-256	`1323c08026a97074bc52949e75b7ce32896deee572850ff0a280ef46e59667d1`

File details

Details for the file singer_target_iomete-1.1.0-py3-none-any.whl.

File metadata

Download URL: singer_target_iomete-1.1.0-py3-none-any.whl
Upload date: May 9, 2023
Size: 26.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for singer_target_iomete-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8bb599531b3d05d2f4b5bd0f1e2b693fd89a01025c6a232a14710b4ac246679`
MD5	`0abd878dbc2eb2733275dd859ff78dc5`
BLAKE2b-256	`6cd832ffeda846b47e50cf03a88cf48e5eb9f60c004506abb115a834f7ee406c`
