
Singer.io target for writing JSON Line files via webhdfs

target-jsonl-webhdfs

A Singer target that writes data to an HDFS cluster in JSONL (JSON Lines) format. This is a fork of the target-jsonl repo.

How to use it

target-jsonl works together with any other Singer Tap to move data from sources like Braintree, Freshdesk and Hubspot to JSONL formatted files.
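
To illustrate what flows between a tap and a target: a Singer tap prints one JSON message per line (SCHEMA, RECORD, and STATE messages), and a JSONL target keeps the payload of each RECORD message as one output line. The following is only a minimal sketch of that idea, not the code this package uses; the function name `records_to_jsonl` and the sample messages are made up for illustration.

```python
import io
import json


def records_to_jsonl(lines):
    """Group the payload of each Singer RECORD message by stream name.

    Returns a dict mapping stream name to a list of JSON Lines strings.
    """
    out = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            # Each record becomes exactly one line of JSON.
            out.setdefault(message["stream"], []).append(json.dumps(message["record"]))
        # SCHEMA and STATE messages are ignored in this sketch.
    return out


# Hypothetical tap output, shortened to a single record.
tap_output = io.StringIO(
    '{"type": "SCHEMA", "stream": "exchange_rate", "schema": {"type": "object"}, "key_properties": ["date"]}\n'
    '{"type": "RECORD", "stream": "exchange_rate", "record": {"USD": 1.0, "date": "2020-04-29T00:00:00Z"}}\n'
    '{"type": "STATE", "value": {"start_date": "2020-04-29"}}\n'
)
jsonl_by_stream = records_to_jsonl(tap_output)
print(jsonl_by_stream["exchange_rate"][0])
```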

Install

We will use tap-exchangeratesapi to pull currency exchange rate data from a public data set as an example.

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

It is recommended to install each Tap and Target in a separate Python virtual environment to avoid conflicting dependencies between any Taps and Targets.

# Install tap-exchangeratesapi in its own virtualenv
python3 -m venv ~/.virtualenvs/tap-exchangeratesapi
source ~/.virtualenvs/tap-exchangeratesapi/bin/activate
pip install tap-exchangeratesapi
deactivate

# Install target-jsonl in its own virtualenv
python3 -m venv ~/.virtualenvs/target-jsonl
source ~/.virtualenvs/target-jsonl/bin/activate
pip install target-jsonl
deactivate

Run

We can now run tap-exchangeratesapi and pipe the output to target-jsonl.

~/.virtualenvs/tap-exchangeratesapi/bin/tap-exchangeratesapi | ~/.virtualenvs/target-jsonl/bin/target-jsonl

By default, the data is written to a file named exchange_rate-{timestamp}.jsonl in your working directory.

cat exchange_rate-{timestamp}.jsonl
{"CAD": 1.3954067515, "HKD": 7.7503228187, "ISK": 147.1130787678, "PHP": 50.5100534957, "DKK": 6.8779745434, "HUF": 327.9376498801, "CZK": 25.018446781, "GBP": 0.8059214167, "RON": 4.4673491976, "SEK": 9.9002029146, "IDR": 15321.0016602103, "INR": 75.6516325401, "BRL": 5.4711307877, "RUB": 73.6220254566, "HRK": 6.9765725881, "JPY": 106.548607268, "THB": 32.420217672, "CHF": 0.9750046117, "EUR": 0.9223390518, "MYR": 4.3475373547, "BGN": 1.8039107176, "TRY": 6.988286294, "CNY": 7.0764619074, "NOK": 10.3973436635, "NZD": 1.6446227633, "ZAR": 18.4316546763, "USD": 1.0, "MXN": 24.1217487548, "SGD": 1.4152370411, "AUD": 1.5361556908, "ILS": 3.5102379635, "KRW": 1218.9540675152, "PLN": 4.1912931194, "date": "2020-04-29T00:00:00Z"}
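
Because every line of the output file is a standalone JSON object, the file can be read back one line at a time. Here is a small sketch in Python; the helper name `read_jsonl` and the sample file are assumptions made for this example, and the sample content is a shortened version of the record shown above.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    return records


# Write a tiny sample file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "exchange_rate-sample.jsonl"
    sample.write_text(
        '{"USD": 1.0, "EUR": 0.9223390518, "date": "2020-04-29T00:00:00Z"}\n',
        encoding="utf-8",
    )
    rates = read_jsonl(sample)

print(rates[0]["date"], rates[0]["EUR"])
```

To read a real output file, pass its path to `read_jsonl` instead of the sample.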

Optional Configuration

target-jsonl takes an optional configuration file that can be used to set formatting parameters like the delimiter; see config.sample.json for examples. To run target-jsonl with the configuration file, use this command:

~/.virtualenvs/tap-exchangeratesapi/bin/tap-exchangeratesapi | ~/.virtualenvs/target-jsonl/bin/target-jsonl -c my-config.json
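
As an illustration, a configuration file could be generated like this. The keys are the optional ones documented in this section; every value is a placeholder chosen for the example (the path, the custom name, and the user are hypothetical), and config.sample.json remains the authoritative reference.

```python
import json

# All values below are placeholders; adjust them to your environment.
config = {
    "destination_path": "/tmp/singer-output",  # hypothetical output location
    "custom_name": "rates",                    # hypothetical file name prefix
    "do_timestamp_file": False,
    "webhdfs": True,
    "webhdfs_url": "http://hostname:port",     # placeholder, as in the key description
    "webhdfs_user": "hdfs",                    # hypothetical user name
}

config_text = json.dumps(config, indent=2)
print(config_text)
```

Saving the printed JSON as my-config.json gives a file that can be passed with the -c option.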

Here is a brief description of the optional config keys:

destination_path - Specifies where to write the resulting .jsonl file to. By default, the file gets written in your working directory.

custom_name - Specifies a custom name for the file, used instead of the stream name (i.e. {custom_name}-{timestamp}.jsonl, assuming do_timestamp_file is true). By default, the stream name is used.

do_timestamp_file - Specifies whether the file name should include a timestamp. By default, it does (i.e. exchange_rate-{timestamp}.jsonl, as described in the Run section above). If this option is set to false, the file name has no timestamp (i.e. exchange_rate.jsonl in our example).

webhdfs - Boolean variable to enable webhdfs writing.

webhdfs_url - Specifies the URL used to connect to the WebHDFS service (e.g. http://hostname:port).

webhdfs_user - Specifies the user used to connect to the WebHDFS service.
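
The sketch below shows how these options could relate to the resulting file name and to a WebHDFS endpoint. It is an illustration under stated assumptions, not the package's implementation: both function names are hypothetical, the timestamp is passed in because its format is not specified here, and the URL follows the general WebHDFS REST layout (`/webhdfs/v1/<path>?op=CREATE&user.name=<user>`), whereas the target may well use a client library instead of building URLs itself.

```python
from urllib.parse import quote, urlencode


def expected_filename(stream, timestamp, custom_name=None, do_timestamp_file=True):
    """Derive the output file name from the documented naming options."""
    base = custom_name or stream
    if do_timestamp_file:
        return f"{base}-{timestamp}.jsonl"
    return f"{base}.jsonl"


def webhdfs_create_url(webhdfs_url, hdfs_path, webhdfs_user):
    """Build a WebHDFS REST URL for creating a file (op=CREATE)."""
    query = urlencode({"op": "CREATE", "user.name": webhdfs_user})
    path = quote(hdfs_path.lstrip("/"))
    return f"{webhdfs_url.rstrip('/')}/webhdfs/v1/{path}?{query}"


# Hypothetical host, port, path, and user, for illustration only.
name = expected_filename("exchange_rate", "TIMESTAMP", do_timestamp_file=False)
url = webhdfs_create_url("http://namenode.example:9870", f"/data/{name}", "hdfs")
print(name)
print(url)
```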


Copyright © 2022 Andy Huynh, Stanislav Lysikov
