
Singer.io target for loading data to iomete


singer-target-iomete

Singer target that loads data into iomete following the Singer spec.

How to use it

If you want to run this Singer target independently, read on.

Install

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

It's recommended to use a virtualenv:

  python3 -m venv venv
  source venv/bin/activate
  pip install singer-target-iomete

or, from a local checkout of the source code:

  python3 -m venv venv
  source venv/bin/activate
  pip install --upgrade pip
  pip install .

To run

Like any other target that follows the Singer specification:

  some-singer-tap | singer-target-iomete --config [config.json]

It reads incoming messages from STDIN and uses the properties in config.json to upload data into iomete.
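To see what the target expects on STDIN, here is a minimal stand-in for "some-singer-tap": a Python sketch that prints one SCHEMA, two RECORD and one STATE message as JSON lines, using the message types defined by the Singer specification. The stream name users, its columns and the bookmark layout are made up for illustration; a real tap emits whatever its source dictates.

```python
import json
import sys
from typing import Iterator


def singer_messages() -> Iterator[dict]:
    """Yield a tiny, hypothetical Singer message stream for one table."""
    # SCHEMA: describes the stream's columns and its primary key(s).
    yield {
        "type": "SCHEMA",
        "stream": "users",
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": ["null", "string"]},
            },
        },
        "key_properties": ["id"],
    }
    # RECORD: one message per row.
    for row in ({"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}):
        yield {"type": "RECORD", "stream": "users", "record": row}
    # STATE: an opaque bookmark; targets conventionally emit it back
    # once the records before it have been loaded.
    yield {"type": "STATE", "value": {"bookmarks": {"users": {"last_id": 2}}}}


if __name__ == "__main__":
    # One JSON document per line on STDOUT, which is what the target reads.
    for message in singer_messages():
        sys.stdout.write(json.dumps(message) + "\n")
```

Saved as a hypothetical fake_tap.py, it can be piped in like a real tap: python fake_tap.py | singer-target-iomete --config config.json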

Note: To avoid version conflicts, run taps and targets in separate virtual environments.
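One way to follow this advice is to install the tap and the target into two different virtual environments and connect them yourself. The Python sketch below reproduces the shell pipe with subprocess. The helper names are hypothetical, and it assumes a Linux/macOS layout where console scripts live in <venv>/bin (Windows uses Scripts\ instead).

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def venv_cmd(venv_dir: str, executable: str, *args: str) -> list[str]:
    """Build a command that runs a console script from a specific virtualenv."""
    # Assumes a POSIX venv layout: <venv>/bin/<executable>.
    return [str(Path(venv_dir) / "bin" / executable), *args]


def run_pipeline(tap_cmd: list[str], target_cmd: list[str]) -> tuple[int, int]:
    """Equivalent of `tap_cmd | target_cmd`; returns (tap exit code, target exit code)."""
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(target_cmd, stdin=tap.stdout)
    # Close our copy of the pipe so the tap gets SIGPIPE if the target exits early.
    tap.stdout.close()
    target_rc = target.wait()
    tap_rc = tap.wait()
    return tap_rc, target_rc
```

For example, run_pipeline(venv_cmd("/opt/venvs/tap-postgres", "tap-postgres", "--config", "tap_config.json"), venv_cmd("/opt/venvs/target-iomete", "singer-target-iomete", "--config", "config.json")), where the paths and the tap are placeholders for your own setup (most taps also need a catalog or properties argument).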

Pre-requirements

You need to create a few objects in iomete, in one schema, before you start using this target.

Configuration settings

Running the target connector requires a config.json file. An example config.json:

  {
    "host": "<cluster_id>.iomete.cloud",
    "workspace_id": "abcde-123",
    "lakehouse": "lakehouse",
    "user": "username",
    "password": "password",
    "database": "default",
    "default_target_schema": "singer",
    "add_metadata_columns": true,
    "hard_delete": true,
    "aws_access_key_id": "key_id",
    "aws_secret_access_key": "access_key",
    "s3_bucket": "iom-lakehouse-000000000000",
    "s3_key_prefix": "external/singer/",
    "primary_key_required": false,
    "no_compression": false,
    "temp_dir": "../tempdir"
  }
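Before starting a long sync it can help to check config.json up front. The sketch below is a hypothetical helper, not part of singer-target-iomete: it checks the five keys that the options table marks as required, and mirrors the documented fallback from the aws_* settings to the matching AWS_* environment variables, returning None when neither is set so that credential lookup (for example an IAM role) is left to the AWS SDK.

```python
from __future__ import annotations

import json
import os
from pathlib import Path

# Keys marked "Required? Yes" in the options table.
REQUIRED_KEYS = ("host", "workspace_id", "user", "password", "lakehouse")


def validate_config(config: dict) -> dict:
    """Raise ValueError if any required key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json is missing required keys: " + ", ".join(missing))
    return config


def load_config(path: str | Path) -> dict:
    """Read and validate a config.json file."""
    return validate_config(json.loads(Path(path).read_text()))


def aws_setting(config: dict, key: str) -> str | None:
    """Value from config.json, else the matching env var (aws_profile -> AWS_PROFILE).

    None means neither is set, leaving credential resolution to the AWS SDK.
    """
    return config.get(key) or os.environ.get(key.upper())
```

Calling load_config("config.json") before starting the pipe fails fast with a list of any missing required keys.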

Full list of options in config.json:

| Property | Type | Required? | Description |
|---|---|---|---|
| host | String | Yes | iomete host (e.g. xyz12.iomete.com) |
| workspace_id | String | Yes | iomete workspace ID (e.g. abcde-123) |
| user | String | Yes | iomete user |
| password | String | Yes | iomete password |
| lakehouse | String | Yes | iomete lakehouse name |
| database | String | No | Database (or 'default') |
| aws_access_key_id | String | No | S3 access key ID. If not provided, the AWS_ACCESS_KEY_ID environment variable or the IAM role will be used. |
| aws_secret_access_key | String | No | S3 secret access key. If not provided, the AWS_SECRET_ACCESS_KEY environment variable or the IAM role will be used. |
| aws_session_token | String | No | AWS session token. If not provided, the AWS_SESSION_TOKEN environment variable will be used. |
| aws_profile | String | No | AWS profile name for profile-based authentication. If not provided, the AWS_PROFILE environment variable will be used. |
| s3_bucket | String | No | S3 bucket name. Required if you use an S3 external stage. When this is defined, the stage has to be defined as well. |
| s3_key_prefix | String | No | (Default: None) A static prefix prepended to the generated S3 key names. Using prefixes, you can upload files into specific directories in the S3 bucket. |
| s3_endpoint_url | String | No | The complete URL to use for the constructed client. This allows you to use an S3-compatible endpoint other than native AWS S3. |
| s3_region_name | String | No | Default region when creating new connections. |
| s3_acl | String | No | S3 ACL name to set on the uploaded files. |
| batch_size_rows | Integer | | (Default: 100000) Maximum number of rows in each batch. At the end of each batch, the rows in the batch are loaded into iomete. |
| batch_wait_limit_seconds | Integer | | (Default: None) Maximum time to wait for a batch to reach batch_size_rows. |
| flush_all_streams | Boolean | | (Default: False) Flush and load every stream into iomete when one batch is full. Warning: this may trigger the COPY command to use files with a low number of records and may cause performance problems. |
| parallelism | Integer | | (Default: 0) The number of threads used to flush tables. 0 creates a thread for each stream, up to parallelism_max. -1 creates a thread for each CPU core. Any other positive number creates that number of threads, up to parallelism_max. Parallelism works only with external stages: if no s3_bucket is defined with an external stage, flushing tables is forced to use a single thread. |
| parallelism_max | Integer | | (Default: 16) Maximum number of parallel threads to use when flushing tables. |
| default_target_schema | String | | Name of the schema where the tables will be created, without the database prefix. If schema_mapping is not defined, every stream sent by the tap is loaded into this schema. |
| schema_mapping | Object | | Useful if you want to load multiple streams from one tap into multiple iomete schemas. |
| add_metadata_columns | Boolean | | (Default: False) Metadata columns add extra row-level information about data ingestion (e.g. when the row was read in the source, when it was inserted or deleted in iomete, etc.). Metadata columns are created automatically by adding extra columns with the _SDC_ prefix to the tables. The column names follow the Stitch naming conventions documented at https://www.stitchdata.com/docs/data-structure/integration-schemas#sdc-columns. Enabling metadata columns flags deleted rows by setting the _SDC_DELETED_AT metadata column. Without the add_metadata_columns option, rows deleted by Singer taps are not recognisable in iomete. |
| hard_delete | Boolean | | (Default: False) When hard_delete is true, DELETE SQL commands are run in iomete to delete rows from tables. This is achieved by continuously checking the _SDC_DELETED_AT metadata column sent by the Singer tap. Because deleting rows requires metadata columns, the hard_delete option automatically enables add_metadata_columns as well. |
| data_flattening_max_level | Integer | | (Default: 0) Object-type RECORD items from taps can be loaded into STRUCT columns as JSON (default), or the schema can be flattened by creating columns automatically. When the value is 0 (default), flattening is turned off. |
| primary_key_required | Boolean | | (Default: True) Log-based and incremental replication on tables with no primary key causes duplicates when merging UPDATE events. When set to true, loading stops if no primary key is defined. |
| validate_records | Boolean | | (Default: False) Validate every RECORD message against the corresponding JSON schema. This option is disabled by default, in which case invalid RECORD messages fail only at load time in iomete. Enabling it detects invalid records earlier but can degrade performance. |
| temp_dir | String | | (Default: platform-dependent) Directory for temporary files containing RECORD messages. |
| no_compression | Boolean | | (Default: False) Generate uncompressed files when loading into iomete. By default, GZIP-compressed files are generated. |
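
The batching and parallelism options above interact in ways that are easy to misread. The following Python sketch restates the documented rules as two small functions. The function names are hypothetical, batch_wait_limit_seconds is left out, and the real target may implement these rules differently; for instance, the table does not say whether -1 is also capped at parallelism_max, so this sketch does not cap it.

```python
from __future__ import annotations

import os


def streams_to_flush(row_counts: dict[str, int], batch_size_rows: int = 100_000,
                     flush_all_streams: bool = False) -> list[str]:
    """Given buffered row counts per stream, return the streams to load now."""
    full = sorted(s for s, n in row_counts.items() if n >= batch_size_rows)
    if full and flush_all_streams:
        # One full batch triggers a load of every non-empty stream.
        return sorted(s for s, n in row_counts.items() if n > 0)
    return full


def flush_threads(parallelism: int = 0, parallelism_max: int = 16,
                  n_streams: int = 1, s3_bucket: str | None = None) -> int:
    """Number of threads used to flush tables, following the parallelism rules."""
    if parallelism < -1:
        raise ValueError("parallelism must be -1, 0 or a positive number")
    if not s3_bucket:
        # No external stage: flushing is forced onto a single thread.
        return 1
    if parallelism == 0:
        # One thread per stream, capped at parallelism_max.
        return max(1, min(n_streams, parallelism_max))
    if parallelism == -1:
        # One thread per CPU core (uncapped in this sketch).
        return os.cpu_count() or 1
    return min(parallelism, parallelism_max)
```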

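The record-level options (hard_delete, add_metadata_columns, data_flattening_max_level and primary_key_required) can be illustrated with a hedged Python sketch as well. All helper names are hypothetical. The "__" separator used for flattened column names is an assumption, since the table does not specify the naming scheme, and the deleted-at check accepts both _sdc_deleted_at and _SDC_DELETED_AT because the Stitch conventions use lowercase names while this README uses uppercase.

```python
from __future__ import annotations


def effective_options(config: dict) -> dict:
    """hard_delete relies on _SDC_DELETED_AT, so it switches metadata columns on."""
    opts = dict(config)
    if opts.get("hard_delete"):
        opts["add_metadata_columns"] = True
    return opts


def flatten(record: dict, max_level: int, sep: str = "__",
            _level: int = 0, _prefix: str = "") -> dict:
    """Flatten nested objects into prefixed columns, at most max_level levels deep.

    max_level=0 leaves the record untouched (objects stay as a single value).
    """
    out: dict = {}
    for key, value in record.items():
        name = f"{_prefix}{sep}{key}" if _prefix else key
        if isinstance(value, dict) and _level < max_level:
            out.update(flatten(value, max_level, sep, _level + 1, name))
        else:
            out[name] = value
    return out


def is_deleted(row: dict) -> bool:
    # Stitch documents the column as _sdc_deleted_at; this README writes _SDC_DELETED_AT.
    return bool(row.get("_sdc_deleted_at") or row.get("_SDC_DELETED_AT"))


def split_deletes(rows: list[dict], hard_delete: bool) -> tuple[list[dict], list[dict]]:
    """Return (rows to upsert, rows to DELETE). Without hard_delete nothing is deleted."""
    if not hard_delete:
        return list(rows), []
    return [r for r in rows if not is_deleted(r)], [r for r in rows if is_deleted(r)]


def check_primary_key(stream: str, key_properties: list[str],
                      primary_key_required: bool = True) -> None:
    """Refuse to load a stream without key_properties when primary_key_required is true."""
    if primary_key_required and not key_properties:
        raise ValueError(f"stream {stream!r} has no primary key and primary_key_required is true")
```
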
License

Apache License Version 2.0

See LICENSE for the full text.

Download files

Source distribution: singer-target-iomete-1.1.0.tar.gz (26.8 kB)

Built distribution (Python 3): singer_target_iomete-1.1.0-py3-none-any.whl (26.7 kB)
