Singer tap for JsonFile, built with the Meltano Singer SDK.

tap-jsonfile

Singer tap that reads JSON files from the local filesystem or from S3-compatible object storage (including MinIO).

Built with the Meltano Singer SDK.

Features

  • Glob patterns to match files across directories (data/**/*.json)
  • S3 and MinIO support via fsspec / s3fs
  • Automatic schema inference by sampling the first N files
  • JSON, JSON arrays, and JSONL formats detected automatically
  • Incremental sync: tracks file content hashes in Singer state, skips unchanged files on subsequent runs
  • Adds _sdc_source_file to every record for lineage
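To make the format detection and lineage features concrete, here is a minimal sketch of one way to handle a single JSON object, a JSON array, or JSONL input and tag each record with its source file. The function name `parse_json_text` and the fallback order are assumptions for illustration; this is not the tap's actual implementation.

```python
import json


def parse_json_text(text, source_file):
    """Yield dict records from a JSON object, JSON array, or JSONL text.

    Hypothetical helper: each record gets an `_sdc_source_file` key for lineage.
    """
    stripped = text.strip()
    if not stripped:
        return  # empty file: nothing to emit
    try:
        # A single JSON document: either one object or an array of objects.
        parsed = json.loads(stripped)
    except json.JSONDecodeError:
        # Not one document, so assume JSONL: one JSON value per non-empty line.
        parsed = [json.loads(line) for line in stripped.splitlines() if line.strip()]
    items = parsed if isinstance(parsed, list) else [parsed]
    for item in items:
        # Only objects become records; scalars and nested arrays are skipped here.
        if isinstance(item, dict):
            yield {**item, "_sdc_source_file": source_file}


if __name__ == "__main__":
    for record in parse_json_text('{"id": 1}\n{"id": 2}\n', "data/events.jsonl"):
        print(record)
```

Trying a full parse first and falling back to line-by-line parsing keeps the detection independent of the file extension, which is one plausible reading of "detected automatically".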

Configuration

Setting     | Required | Default | Description
paths       | Yes      |         | List of glob patterns for JSON files (local or s3://…)
stream_name | No       | records | Name of the output Singer stream
samples     | No       | 20      | Number of files to sample for schema inference
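As an illustration of the settings above, the following sketch builds a config dictionary using the documented defaults and prints it as JSON suitable for a `config.json` file. The helper name `build_config` is hypothetical and exists only for this example; the tap itself only needs the resulting JSON.

```python
import json


def build_config(paths, stream_name="records", samples=20):
    """Return a config dict for tap-jsonfile (hypothetical helper).

    Defaults mirror the documented ones: stream_name="records", samples=20.
    """
    if not paths:
        raise ValueError("'paths' is required and must contain at least one glob pattern")
    return {"paths": list(paths), "stream_name": stream_name, "samples": samples}


if __name__ == "__main__":
    config = build_config(["data/**/*.json"], stream_name="events")
    # Redirect this output to config.json and pass it with --config config.json.
    print(json.dumps(config, indent=2))
```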

S3 / MinIO credentials

Set these environment variables. The standard AWS names are used; the equivalent S3_*-prefixed names are also accepted:

Variable              | Description
AWS_ACCESS_KEY_ID     | Access key (or S3_ACCESS_KEY_ID)
AWS_SECRET_ACCESS_KEY | Secret key (or S3_SECRET_ACCESS_KEY)
AWS_ENDPOINT_URL      | Custom endpoint for MinIO (or S3_ENDPOINT_URL)
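The fallback between the two naming schemes could look roughly like the sketch below. It assumes the AWS_* name wins when both are set, which the documentation above does not state, and the function name `resolve_s3_settings` is hypothetical. The returned keys (`key`, `secret`, `endpoint_url`) were chosen to match keyword arguments of `s3fs.S3FileSystem`, but how the tap actually passes credentials is not confirmed here.

```python
import os


def resolve_s3_settings(env=None):
    """Resolve S3 credentials from AWS_* variables, falling back to S3_*.

    Assumption: AWS_* takes precedence over S3_* when both are present.
    """
    env = os.environ if env is None else env

    def pick(suffix):
        # Try the standard AWS name first, then the S3_-prefixed alternative.
        return env.get(f"AWS_{suffix}") or env.get(f"S3_{suffix}")

    return {
        "key": pick("ACCESS_KEY_ID"),
        "secret": pick("SECRET_ACCESS_KEY"),
        "endpoint_url": pick("ENDPOINT_URL"),
    }


if __name__ == "__main__":
    settings = resolve_s3_settings()
    # Avoid printing secrets; only report which settings were found.
    print({name: value is not None for name, value in settings.items()})
```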

Usage

Standalone

# Local files
tap-jsonfile --config '{"paths": ["data/**/*.json"]}' > output.jsonl

# S3 / MinIO (credentials via env vars)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_ENDPOINT_URL=https://minio.example.com
tap-jsonfile --config '{"paths": ["s3://bucket/prefix/**/*.json"]}'

Incremental sync

Pass state from a previous run to skip unchanged files:

tap-jsonfile --config config.json > output.jsonl 2>/dev/null

# Extract state for next run
grep '"type":"STATE"' output.jsonl | tail -1 | python3 -c \
  "import sys,json; print(json.dumps(json.loads(sys.stdin.read())['value']))" > state.json

# Next run skips files whose content hash hasn't changed
tap-jsonfile --config config.json --state state.json
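The state extraction step above can also be done in plain Python, which avoids depending on the exact whitespace of the serialized messages. The sketch below reads Singer output lines and returns the `value` of the last STATE message. It also shows, purely as an illustration, how content hashes could be compared to decide which files to re-read. The SHA-256 algorithm and the `path -> hash` mapping are assumptions, not the tap's documented state layout, and all function names are hypothetical.

```python
import hashlib
import json


def last_state_value(lines):
    """Return the `value` of the last Singer STATE message, or None if absent."""
    state = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON message
        if isinstance(message, dict) and message.get("type") == "STATE":
            state = message.get("value")
    return state


def content_hash(data):
    """Hash file bytes (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def files_to_sync(current, previous_hashes):
    """Return paths whose content hash differs from the previously stored one.

    current: mapping of path -> file bytes
    previous_hashes: mapping of path -> hash recorded by an earlier run
    """
    return [
        path
        for path, data in current.items()
        if previous_hashes.get(path) != content_hash(data)
    ]


if __name__ == "__main__":
    output = [
        '{"type": "RECORD", "stream": "records", "record": {"id": 1}}',
        '{"type": "STATE", "value": {"example": 1}}',
    ]
    print(json.dumps(last_state_value(output)))
```

To reproduce the shell pipeline, pass the open `output.jsonl` file object to `last_state_value` and write the result to `state.json` with `json.dump`.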

With Meltano

uv tool install meltano
meltano install
meltano run tap-jsonfile target-jsonl

See meltano.yml for the default configuration (paths: ["data/**/*.json"]).

Development

Prerequisites: Python 3.10+, uv

uv sync
uv run pytest          # 41 tests
uv run tap-jsonfile --about

Download files

Source Distribution

tap_jsonfile-0.1.0.tar.gz (193.5 kB view details)

Uploaded Source

Built Distribution

tap_jsonfile-0.1.0-py3-none-any.whl (11.7 kB view details)

Uploaded Python 3

File details

Details for the file tap_jsonfile-0.1.0.tar.gz.

File metadata

  • Download URL: tap_jsonfile-0.1.0.tar.gz
  • Size: 193.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for tap_jsonfile-0.1.0.tar.gz:

Algorithm   | Hash digest
SHA256      | eaeda15352082a19fea6dd3f72ad063d8932fb1b042a31a4c9ab6372b953bef7
MD5         | a26546fe7ceb825381f0deb44ab541e1
BLAKE2b-256 | 7e4d63e29370aa732da10537883e5b12f511ef4a4e49823ffa679749ad41e45b

Provenance

The following attestation bundles were made for tap_jsonfile-0.1.0.tar.gz:

Publisher: build.yml on celine-eu/tap-jsonfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tap_jsonfile-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tap_jsonfile-0.1.0-py3-none-any.whl
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for tap_jsonfile-0.1.0-py3-none-any.whl:

Algorithm   | Hash digest
SHA256      | a32a4486b359a53dbd86c6d1b79b9e6decfe8e72b2fcc3cce04ec84ce41f5bcc
MD5         | 1b3c1c3f71632d103e7d307f77d17a2c
BLAKE2b-256 | a64c3c2a0ab5497b75f53120b706967e36d74ff80f8a98eb47bb83c29d86e6f1

Provenance

The following attestation bundles were made for tap_jsonfile-0.1.0-py3-none-any.whl:

Publisher: build.yml on celine-eu/tap-jsonfile
