Singer.io target that writes Avro files
target-avro
This is a Singer target that reads JSON-formatted data following the Singer spec.
Features
- Output Avro files for Singer streams
- Output to cloud storage such as Google Cloud Storage and Amazon S3 is supported, powered by smart_open.
Install
pip install target-avro
Usage
# simple
cat <<EOF | target-avro -c sample_config.json
{"type":"STATE","value": {}}
{"key_properties":["id"],"schema":{"properties":{"assignee":{"properties":{},"type":["null","object"]},"created_at":{"format":"date-time","type":["null","string"]},"id":{"type":["null","integer"]},"labels":{"items":{"properties":{"id":{"type":["null","integer"]},"name":{"type":["null","string"]}},"type":"object"},"type":["null","array"]},"locked":{"type":["null","boolean"]},"pull_request":{"properties":{"url":{"type":["null","string"]}},"type":["null","object"]},"title":{"type":"string"}},"selected":true,"type":["null","object"]},"stream":"issues","type":"SCHEMA"}
{"type": "RECORD", "stream": "issues", "record": {"created_at":"2020-11-24T23:49:24.000000Z","id":12,"labels":[{"id":238,"name":"ABCDEFGHIJKLMNOPQRSTUV"}],"locked":true,"pull_request":{"url":"https://api.github.com/repos/sample/issues/pulls/999999"},"title":"ABCDEFGHIJKLMNOPQRSTUVWXY"}, "time_extracted": "2021-03-25T12:53:51.817781Z"}
{"type": "STATE", "value": {"bookmarks": {"singer-io/singer-python": {"issues": {"since": "2020-11-24T23:49:24.000000Z"}}}}}
EOF
# complex
cat ./tests/data/github.jsonl | target-avro -c sample_config.json
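The message stream in the simple example can also be produced programmatically. The following is a minimal sketch that uses only the Python standard library to emit SCHEMA, RECORD, and STATE messages as JSON lines. The function names `singer_messages` and `write_messages` are hypothetical helpers for illustration and are not part of target-avro.

```python
import json
import sys


def singer_messages(stream, schema, key_properties, records, state):
    """Yield Singer messages: one SCHEMA, then each RECORD, then one STATE."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    for record in records:
        yield {"type": "RECORD", "stream": stream, "record": record}
    yield {"type": "STATE", "value": state}


def write_messages(messages, out=sys.stdout):
    """Write each message as a single JSON line."""
    for message in messages:
        out.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    # A reduced version of the "issues" schema from the simple example.
    schema = {
        "type": ["null", "object"],
        "properties": {
            "id": {"type": ["null", "integer"]},
            "title": {"type": "string"},
            "locked": {"type": ["null", "boolean"]},
        },
    }
    records = [{"id": 12, "title": "ABCDEFGHIJKLMNOPQRSTUVWXY", "locked": True}]
    write_messages(singer_messages("issues", schema, ["id"], records, {}))
```

Assuming the script is saved under a name of your choosing (for example `emit_issues.py`), its output can be piped to the target in the same way as the heredoc: `python emit_issues.py | target-avro -c sample_config.json`.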
Configuration
The fields that can be specified in the config file are listed below.
Field | Type | Default | Details |
---|---|---|---|
prefix | ["string"] | N/A | The output URI prefix. See smart_open for information about valid values and credentials. |
disable_collection | ["boolean", "null"] | false | Include true in your config to disable Singer Usage Logging. |
logging_level | ["string", "null"] | "INFO" | The level for logging. Set to DEBUG to get things like HTTP requests executed, JSON and Avro schemas, etc. See Python's Logger Levels for information about valid values. |
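As an illustration of the fields above, the sketch below builds a config dictionary and writes it to `sample_config.json`. The helper `build_config` and the prefix value `./output/` are hypothetical placeholders; consult smart_open for the URI forms and credentials that apply to your storage.

```python
import json

# Standard level names from Python's logging module.
VALID_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET"}


def build_config(prefix, disable_collection=False, logging_level="INFO"):
    """Return a config dict using the three documented fields."""
    if not isinstance(prefix, str) or not prefix:
        raise ValueError("prefix is required and must be a non-empty string")
    if logging_level not in VALID_LEVELS:
        raise ValueError(f"unknown logging level: {logging_level!r}")
    return {
        "prefix": prefix,
        "disable_collection": disable_collection,
        "logging_level": logging_level,
    }


if __name__ == "__main__":
    # "./output/" is a placeholder value, not a documented default.
    config = build_config("./output/", disable_collection=True, logging_level="DEBUG")
    with open("sample_config.json", "w") as f:
        json.dump(config, f, indent=2)
```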
Known Limitations
- Requires a JSON Schema for every stream.
- Only string, string with date-time format, integer, number, boolean,
object, and array types with or without null are supported. Arrays can
have any of the other types listed, including objects as types within
items.
  - Examples of JSON Schema types that work:
    - ['number']
    - ['string']
    - ['string', 'null']
  - Examples of JSON Schema types that DO NOT work:
    - ['string', 'integer']
    - ['integer', 'number']
    - ['any']
    - ['null']
- JSON Schema combinations such as anyOf and oneOf are not supported.
- JSON Schema $ref is not supported.
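The limitations above can be checked before a stream is sent to the target. The following is a minimal sketch of such a pre-flight check, written from the rules as documented here; it is not the target's own validation code, and the names `is_supported_type` and `find_unsupported` are hypothetical.

```python
# Types listed as supported. Assumption: a date-time string is a plain
# "string" type with a "format" keyword, so it needs no separate entry.
SUPPORTED = {"string", "integer", "number", "boolean", "object", "array"}


def is_supported_type(json_type):
    """True if the type is exactly one supported type, with or without null."""
    types = [json_type] if isinstance(json_type, str) else list(json_type)
    non_null = [t for t in types if t != "null"]
    return len(non_null) == 1 and non_null[0] in SUPPORTED


def find_unsupported(schema, path="$"):
    """Return a list of human-readable problems found in a JSON Schema."""
    problems = []
    for keyword in ("anyOf", "oneOf", "$ref"):
        if keyword in schema:
            problems.append(f"{path}: {keyword} is not supported")
    if "type" in schema and not is_supported_type(schema["type"]):
        problems.append(f"{path}: unsupported type {schema['type']!r}")
    # Recurse into object properties and array items.
    for name, sub in schema.get("properties", {}).items():
        problems.extend(find_unsupported(sub, f"{path}.{name}"))
    if isinstance(schema.get("items"), dict):
        problems.extend(find_unsupported(schema["items"], f"{path}[]"))
    return problems
```

An empty list from `find_unsupported` means the schema uses only the constructs listed as supported; each entry otherwise names the offending path.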
Usage Logging
Singer.io requires official taps and targets to collect anonymous usage data. This data is only used in aggregate to report on individual taps/targets, as well as the Singer community at large. IP addresses are recorded to detect unique tap/target users but are not shared with third parties.
To disable anonymous data collection, set disable_collection to true in the configuration JSON file.
Copyright © 2021 Kageboushi