Singer target for S3Tables, built with the Meltano Singer SDK.


target-s3tables

Singer target (Meltano Singer SDK) which loads Singer streams into Amazon S3 Tables using Apache Iceberg via PyIceberg and the Iceberg REST catalog with AWS SigV4 signing.

This is not a “write Parquet files to an S3 bucket” target — it uses Iceberg catalog operations and is intended for S3 Tables table buckets.

Install

From PyPI:

pip install target-s3tables

Local dev (from a checkout):

pip install -e .

Or with uv:

uv sync
uv run target-s3tables --version

AWS auth

By default, the target relies on the standard AWS credential chain (environment variables, profiles, ECS/EC2 instance roles, etc.). You can optionally set aws_access_key_id, aws_secret_access_key, and aws_session_token in the config to override the default chain.
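For example, to authenticate via environment variables before running the target (all values below are placeholders):

```shell
# Option 1: use a named profile from ~/.aws/credentials
export AWS_PROFILE=my-profile

# Option 2: explicit credentials (picked up by the default chain)
export AWS_ACCESS_KEY_ID=AKIAEXAMPLE
export AWS_SECRET_ACCESS_KEY=example-secret
export AWS_SESSION_TOKEN=example-token   # only needed for temporary credentials
```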

Catalog modes

1) Glue Iceberg REST endpoint (recommended)

Uses the AWS Glue Iceberg REST endpoint (centralized governance).

Example config (config.glue.json):

{
  "catalog_mode": "glue_rest",
  "region": "us-east-1",
  "namespace": "default",
  "account_id": "123456789012",
  "table_bucket_name": "my-table-bucket",
  "sigv4_enabled": true,
  "signing_name": "glue",
  "signing_region": "us-east-1",
  "write_mode": "append",
  "batch_size_rows": 5000
}

Notes:

  • glue_uri defaults to https://glue.<region>.amazonaws.com/iceberg
  • glue_warehouse defaults to <account-id>:s3tablescatalog/<table-bucket-name>
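The two defaults above are derived from region, account_id, and table_bucket_name. A minimal sketch of that derivation (illustrative only, not the target's actual code):

```python
# Sketch: how the documented glue_uri and glue_warehouse defaults
# are built from region / account_id / table_bucket_name.

def glue_defaults(region: str, account_id: str, table_bucket_name: str) -> dict:
    return {
        # Glue Iceberg REST endpoint for the region
        "glue_uri": f"https://glue.{region}.amazonaws.com/iceberg",
        # Warehouse string pointing at the S3 Tables catalog for one table bucket
        "glue_warehouse": f"{account_id}:s3tablescatalog/{table_bucket_name}",
    }

print(glue_defaults("us-east-1", "123456789012", "my-table-bucket"))
```

Setting glue_uri or glue_warehouse explicitly in config bypasses this derivation.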

2) S3 Tables Iceberg REST endpoint (direct)

Uses the S3 Tables Iceberg REST endpoint (direct access to a single table bucket).

Example config (config.s3tables.json):

{
  "catalog_mode": "s3tables_rest",
  "region": "us-east-1",
  "namespace": "default",
  "table_bucket_arn": "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket",
  "sigv4_enabled": true,
  "signing_name": "s3tables",
  "signing_region": "us-east-1",
  "write_mode": "append",
  "batch_size_rows": 5000
}

Notes:

  • s3tables_uri defaults to https://s3tables.<region>.amazonaws.com/iceberg
  • S3 Tables direct mode supports single-level namespaces only (no foo.bar).
  • The required REST path prefix is the URL-encoded table bucket ARN (handled automatically).
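To illustrate the last point, the path prefix is simply the bucket ARN percent-encoded with no characters left unescaped (the target does this for you; shown only so the resulting REST URLs are recognizable in debug logs):

```python
# Sketch: the S3 Tables REST path prefix is the URL-encoded table bucket ARN.
from urllib.parse import quote

arn = "arn:aws:s3tables:us-east-1:123456789012:bucket/my-table-bucket"
prefix = quote(arn, safe="")  # encode ':' and '/' too
print(prefix)
# arn%3Aaws%3As3tables%3Aus-east-1%3A123456789012%3Abucket%2Fmy-table-bucket
```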

Usage

Run directly:

target-s3tables --about
target-s3tables --version
tap-smoke-test | target-s3tables --config config.glue.json

Environment-variable config (loads .env in the working directory when --config=ENV is used):

tap-smoke-test | target-s3tables --config=ENV
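An example .env for the Glue mode might look like the following. Setting names follow the Singer SDK convention of TARGET_S3TABLES_ plus the upper-cased setting name (an assumption here; verify the exact names against your SDK version):

```dotenv
TARGET_S3TABLES_CATALOG_MODE=glue_rest
TARGET_S3TABLES_REGION=us-east-1
TARGET_S3TABLES_NAMESPACE=default
TARGET_S3TABLES_ACCOUNT_ID=123456789012
TARGET_S3TABLES_TABLE_BUCKET_NAME=my-table-bucket
```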

Schema evolution

  • If create_tables=true, tables are created on first sight of a stream schema.
  • If evolve_schema=true, schema updates are applied via table.update_schema().union_by_name(...).

If you write to an existing partitioned table and the append fails, the target raises an error that lists your options (use unpartitioned tables, use dynamic partition overwrite for compatible cases, or write with another engine).

Nullability notes (maps/arrays)

  • Top-level column nullability follows Singer JSON Schema: fields listed in required whose type does not include "null" (i.e. not "type": ["null", ...]) become required Iceberg columns; everything else becomes optional.
  • Singer object fields without explicit properties are treated as Iceberg map<string, ...> columns:
    • If additionalProperties is omitted/true/{}, values are treated as nullable strings (map<string, string?>).
    • If you specify a non-nullable value schema (e.g. {"additionalProperties": {"type": "string"}}) and the tap emits null map values, those key/value pairs are dropped to satisfy the declared schema.
  • For arrays with non-nullable items, null elements are dropped.
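The dropping behavior in the last two bullets can be pictured with a small standalone sketch (a hypothetical helper, not the target's API; it assumes the declared value/item schemas are non-nullable):

```python
# Sketch: drop null map values and null array elements so records satisfy
# a declared schema with non-nullable map values / array items.

def drop_forbidden_nulls(record: dict) -> dict:
    cleaned = {}
    for col, value in record.items():
        if isinstance(value, dict):
            # map<string, V> with non-nullable V: drop pairs whose value is null
            cleaned[col] = {k: v for k, v in value.items() if v is not None}
        elif isinstance(value, list):
            # array with non-nullable items: drop null elements
            cleaned[col] = [v for v in value if v is not None]
        else:
            cleaned[col] = value
    return cleaned

print(drop_forbidden_nulls({"tags": {"a": "1", "b": None}, "ids": [1, None, 3]}))
# {'tags': {'a': '1'}, 'ids': [1, 3]}
```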

Troubleshooting: ArrowInvalid on maps

If you see:

pyarrow.lib.ArrowInvalid: Can't view array of type map<...> as map<...>: nulls in input cannot be viewed as non-nullable

it means your records contain nulls inside a map, but the table/schema says the map values are non-nullable. Fix by:

  • Updating to a version of target-s3tables which treats untyped object maps as nullable values (and keeping evolve_schema=true for existing tables), or
  • Adjusting the stream schema to allow null map values (e.g. {"type": ["null", "string"]} for additionalProperties), or
  • Ensuring the tap never emits null map values.
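The second fix, allowing null map values in the stream schema, looks like this as a Singer JSON Schema fragment (the property name attributes is a placeholder for your affected field):

```json
{
  "properties": {
    "attributes": {
      "type": ["null", "object"],
      "additionalProperties": {"type": ["null", "string"]}
    }
  }
}
```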

Meltano (custom loader plugin)

Add to meltano.yml:

plugins:
  loaders:
  - name: target-s3tables
    namespace: target_s3tables
    pip_url: -e .
    settings:
    - name: catalog_mode
    - name: region
    - name: namespace
    - name: account_id
    - name: table_bucket_name
    - name: table_bucket_arn
    - name: write_mode
    - name: batch_size_rows
    - name: sanitize_names
    - name: create_tables
    - name: evolve_schema
    - name: signing_name
    - name: signing_region
    - name: sigv4_enabled
    - name: table_properties
    - name: snapshot_properties
    - name: debug_http
    - name: aws_access_key_id
    - name: aws_secret_access_key
      kind: password
    - name: aws_session_token
      kind: password

Run:

meltano run <tap-name> target-s3tables

Settings reference (--about)

Copy-paste of target-s3tables --about --format=markdown:

target-s3tables

Load Singer streams into Amazon S3 Tables via PyIceberg REST catalogs.

Built with the Meltano Singer SDK.

Capabilities

  • about
  • stream-maps
  • schema-flattening
  • structured-logging
  • validate-records

Supported Python Versions

  • 3.10
  • 3.11
  • 3.12
  • 3.13
  • 3.14

Settings

| Setting | Required | Default | Description |
|---|---|---|---|
| catalog_mode | False | glue_rest | Iceberg REST catalog mode to use (AWS Glue recommended). |
| region | True | None | AWS region for the Iceberg REST endpoint (e.g. us-east-1). |
| namespace | False | default | Iceberg namespace (database). |
| write_mode | False | append | Write mode: append for incremental; overwrite to replace table contents. |
| batch_size_rows | False | 5000 | Max rows per Iceberg commit. |
| batch_max_bytes | False | None | Optional approximate byte limit for an in-memory batch. |
| sanitize_names | False | True | Sanitize stream/table/column names to Iceberg/AWS-friendly identifiers. |
| create_tables | False | True | Create Iceberg tables when missing. |
| evolve_schema | False | True | Evolve Iceberg schema when stream schema changes. |
| table_name_prefix | False |  | Prefix applied to all Iceberg table names. |
| table_name_mapping | False | {} | Mapping of Singer stream name -> Iceberg table name. |
| glue_uri | False | None | Glue Iceberg REST endpoint URI. Defaults to https://glue.<region>.amazonaws.com/iceberg. |
| glue_warehouse | False | None | Glue warehouse string: <account-id>:s3tablescatalog/<table-bucket-name>. |
| account_id | False | None | AWS account id (used to build glue_warehouse if not provided). |
| table_bucket_name | False | None | S3 Tables table bucket name (used to build glue_warehouse if not provided). |
| s3tables_uri | False | None | S3 Tables Iceberg REST endpoint URI. Defaults to https://s3tables.<region>.amazonaws.com/iceberg. |
| table_bucket_arn | False | None | Table bucket ARN: arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>. |
| sigv4_enabled | False | True | Enable AWS SigV4 request signing for the Iceberg REST catalog. |
| signing_name | False | None | SigV4 signing name (defaults to glue or s3tables based on mode). |
| signing_region | False | None | SigV4 signing region (defaults to region). |
| aws_access_key_id | False | None | Optional AWS access key id override (otherwise use default AWS credential chain). |
| aws_secret_access_key | False | None | Optional AWS secret access key override (otherwise use default AWS credential chain). |
| aws_session_token | False | None | Optional AWS session token override. |
| table_properties | False | {} | Iceberg table properties passed at create_table time. |
| snapshot_properties | False | {} | Snapshot properties passed to append/overwrite calls (when supported). |
| debug_http | False | False | Enable debug logging for HTTP/SigV4 interactions. |
| log_level | False | None | Optional log level override for this process (e.g. DEBUG, INFO). |
| add_record_metadata | False | None | Whether to add metadata fields to records. |
| load_method | False | TargetLoadMethods.APPEND_ONLY | The method to use when loading data into the destination. append-only always writes all input records, whether or not they already exist; upsert updates existing records and inserts new ones; overwrite deletes all existing records and inserts all input records. |
| validate_records | False | True | Whether to validate the schema of the incoming streams. |
| stream_maps | False | None | Config object for the stream maps capability. For more information, check out Stream Maps. |
| stream_maps.else | False | None | Currently, only setting this to __NULL__ is supported. This will remove all other streams. |
| stream_map_config | False | None | User-defined config values to be used within map expressions. |
| faker_config | False | None | Config for the Faker instance variable fake used within map expressions. Only applicable if the plugin specifies faker as an additional dependency (through the singer-sdk faker extra or directly). |
| faker_config.seed | False | None | Value to seed the Faker generator for deterministic output: https://faker.readthedocs.io/en/master/#seeding-the-generator |
| faker_config.locale | False | None | One or more LCID locale strings to produce localized output for: https://faker.readthedocs.io/en/master/#localization |
| flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
| flattening_max_depth | False | None | The max depth to flatten schemas. |
| flattening_max_key_length | False | None | The maximum length of a flattened key. |

A full list of supported settings and capabilities is available by running: target-s3tables --about
