Singer tap for JsonFile, built with the Meltano Singer SDK.
tap-jsonfile
Singer tap that reads JSON files from local filesystem or S3-compatible storage (including MinIO).
Built with the Meltano Singer SDK.
Features
- Glob patterns to match files across directories (data/**/*.json)
- S3 and MinIO support via fsspec / s3fs
- Automatic schema inference by sampling the first N files
- JSON, JSON arrays, and JSONL formats detected automatically
- Incremental sync: tracks file content hashes in Singer state, skips unchanged files on subsequent runs
- Adds _sdc_source_file to every record for lineage
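As a rough illustration of the format detection and lineage features above, the sketch below parses a text payload as a single JSON object, a JSON array, or JSONL, then tags each record with its source file. The function name load_records and the fall-back order (whole document first, then line by line) are assumptions made for this example, not the tap's actual implementation, and it assumes every record is a JSON object.

```python
import json


def load_records(text: str, source_file: str) -> list[dict]:
    """Parse text as a JSON object, a JSON array, or JSONL, and tag each record."""
    try:
        parsed = json.loads(text)
        # One valid document: either a single object or an array of objects.
        records = parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        # Not a single document, so treat every non-blank line as its own JSON value.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    # Hypothetical lineage step: copy each record and add the source file name.
    return [{**record, "_sdc_source_file": source_file} for record in records]


if __name__ == "__main__":
    print(load_records('{"id": 1}', "a.json"))
    print(load_records('[{"id": 1}, {"id": 2}]', "b.json"))
    print(load_records('{"id": 1}\n{"id": 2}\n', "c.jsonl"))
```

Trying the whole payload first keeps pretty-printed, multi-line JSON documents from being misread as JSONL.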
Configuration
| Setting | Required | Default | Description |
|---|---|---|---|
| paths | Yes | (none) | List of glob patterns for JSON files (local or s3://…) |
| stream_name | No | records | Name of the output Singer stream |
| samples | No | 20 | Number of files to sample for schema inference |
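To make the samples setting more concrete, here is a minimal sketch of how a schema could be inferred by merging the JSON types seen across sampled records. The helper names json_type and infer_schema are hypothetical, and the merging rule (union of observed types, plus "null" for keys missing from some records) is an assumption; the tap's real inference logic may differ.

```python
def json_type(value) -> str:
    """Map a Python value decoded from JSON to its JSON Schema type name."""
    if value is None:
        return "null"
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records: list[dict]) -> dict:
    """Build a flat JSON Schema from the union of types seen for each key."""
    seen: dict[str, set[str]] = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(json_type(value))
    for key, types in seen.items():
        # A key absent from some sampled record is treated as nullable.
        if any(key not in record for record in records):
            types.add("null")
    properties = {key: {"type": sorted(types)} for key, types in seen.items()}
    return {"type": "object", "properties": properties}


if __name__ == "__main__":
    sample = [{"id": 1, "name": "a"}, {"id": 2.5}]
    print(infer_schema(sample))
```

Sampling more files widens the set of keys and types the schema can see, at the cost of more reads before the sync starts.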
S3 / MinIO credentials
Set these environment variables (standard AWS names; the S3_* prefix is also accepted):
| Variable | Description |
|---|---|
| AWS_ACCESS_KEY_ID | Access key (or S3_ACCESS_KEY_ID) |
| AWS_SECRET_ACCESS_KEY | Secret key (or S3_SECRET_ACCESS_KEY) |
| AWS_ENDPOINT_URL | Custom endpoint for MinIO (or S3_ENDPOINT_URL) |
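The fallback between the two prefixes can be pictured with the sketch below. It is only an illustration: the function name resolve_s3_settings, the returned dictionary keys, and the choice to prefer AWS_* over S3_* when both are set are all assumptions, since the source only states that both prefixes are accepted.

```python
import os


def resolve_s3_settings(env=None) -> dict:
    """Read S3 / MinIO settings from AWS_* variables, falling back to S3_*."""
    env = os.environ if env is None else env

    def pick(suffix: str):
        # Assumed precedence: AWS_<suffix> wins, S3_<suffix> is the fallback.
        return env.get(f"AWS_{suffix}") or env.get(f"S3_{suffix}")

    return {
        "access_key": pick("ACCESS_KEY_ID"),
        "secret_key": pick("SECRET_ACCESS_KEY"),
        "endpoint_url": pick("ENDPOINT_URL"),
    }


if __name__ == "__main__":
    # Print only which settings were found, never the secret values themselves.
    settings = resolve_s3_settings()
    print({name: value is not None for name, value in settings.items()})
```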
Usage
Standalone
# Local files
tap-jsonfile --config '{"paths": ["data/**/*.json"]}' > output.jsonl
# S3 / MinIO (credentials via env vars)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_ENDPOINT_URL=https://minio.example.com
tap-jsonfile --config '{"paths": ["s3://bucket/prefix/**/*.json"]}'
Incremental sync
Pass state from a previous run to skip unchanged files:
tap-jsonfile --config config.json > output.jsonl 2>/dev/null
# Extract state for next run
grep '"type":"STATE"' output.jsonl | tail -1 | python3 -c \
"import sys,json; print(json.dumps(json.loads(sys.stdin.read())['value']))" > state.json
# Next run skips files whose content hash hasn't changed
tap-jsonfile --config config.json --state state.json
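The idea behind skipping unchanged files can be sketched as follows. This is an illustration under stated assumptions: the state is modeled as a plain mapping from file path to hex digest, SHA-256 is used as the hash, and files_to_sync is a hypothetical helper; the tap's actual state layout and hash algorithm are not documented here.

```python
import hashlib


def files_to_sync(files: dict[str, bytes], state: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return the paths whose content hash changed, plus the updated state."""
    changed = []
    new_state = dict(state)  # copy so the caller's state is left untouched
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if state.get(path) != digest:
            # New file, or content differs from what the previous run recorded.
            changed.append(path)
        new_state[path] = digest
    return changed, new_state


if __name__ == "__main__":
    files = {"data/a.json": b'{"id": 1}', "data/b.json": b'{"id": 2}'}
    first_run, state = files_to_sync(files, {})
    files["data/b.json"] = b'{"id": 3}'  # simulate an edit between runs
    second_run, state = files_to_sync(files, state)
    print(first_run, second_run)
```

Hashing content rather than comparing modification times means a file that is rewritten with identical bytes is still skipped.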
With Meltano
uv tool install meltano
meltano install
meltano run tap-jsonfile target-jsonl
See meltano.yml for the default configuration (paths: ["data/**/*.json"]).
Development
Prerequisites: Python 3.10+, uv
uv sync
uv run pytest # 41 tests
uv run tap-jsonfile --about
File details
Details for the file tap_jsonfile-0.1.0.tar.gz.
File metadata
- Download URL: tap_jsonfile-0.1.0.tar.gz
- Upload date:
- Size: 193.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eaeda15352082a19fea6dd3f72ad063d8932fb1b042a31a4c9ab6372b953bef7 |
| MD5 | a26546fe7ceb825381f0deb44ab541e1 |
| BLAKE2b-256 | 7e4d63e29370aa732da10537883e5b12f511ef4a4e49823ffa679749ad41e45b |
Provenance
The following attestation bundles were made for tap_jsonfile-0.1.0.tar.gz:
Publisher: build.yml on celine-eu/tap-jsonfile
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tap_jsonfile-0.1.0.tar.gz
- Subject digest: eaeda15352082a19fea6dd3f72ad063d8932fb1b042a31a4c9ab6372b953bef7
- Sigstore transparency entry: 1592335856
- Sigstore integration time:
- Permalink: celine-eu/tap-jsonfile@4343f15a18f1e5583b25afe715bba70c80da368a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/celine-eu
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@4343f15a18f1e5583b25afe715bba70c80da368a
- Trigger Event: push
File details
Details for the file tap_jsonfile-0.1.0-py3-none-any.whl.
File metadata
- Download URL: tap_jsonfile-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a32a4486b359a53dbd86c6d1b79b9e6decfe8e72b2fcc3cce04ec84ce41f5bcc |
| MD5 | 1b3c1c3f71632d103e7d307f77d17a2c |
| BLAKE2b-256 | a64c3c2a0ab5497b75f53120b706967e36d74ff80f8a98eb47bb83c29d86e6f1 |
Provenance
The following attestation bundles were made for tap_jsonfile-0.1.0-py3-none-any.whl:
Publisher: build.yml on celine-eu/tap-jsonfile
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tap_jsonfile-0.1.0-py3-none-any.whl
- Subject digest: a32a4486b359a53dbd86c6d1b79b9e6decfe8e72b2fcc3cce04ec84ce41f5bcc
- Sigstore transparency entry: 1592336375
- Sigstore integration time:
- Permalink: celine-eu/tap-jsonfile@4343f15a18f1e5583b25afe715bba70c80da368a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/celine-eu
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@4343f15a18f1e5583b25afe715bba70c80da368a
- Trigger Event: push