Skip to main content

Singer.io target for writing JSON Line files via webhdfs

Project description

target-jsonl-webhdfs

A Singer target that writes data to HDFS cluster in the JSONL (JSON Lines) format. This is fork of the Target-jsonl repo.

How to use it

target-jsonl works together with any other Singer Tap to move data from sources like Braintree, Freshdesk and Hubspot to JSONL formatted files.

Install

We will use tap-exchangeratesapi to pull currency exchange rate data from a public data set as an example.

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

It is recommended to install each Tap and Target in a separate Python virtual environment to avoid conflicting dependencies between any Taps and Targets.

 # Install tap-exchangeratesapi in its own virtualenv
python3 -m venv ~/.virtualenvs/tap-exchangeratesapi
source ~/.virtualenvs/tap-exchangeratesapi/bin/activate
pip install tap-exchangeratesapi
deactivate

# Install target-jsonl in its own virtualenv
python3 -m venv ~/.virtualenvs/target-jsonl
source ~/.virtualenvs/target-jsonl/bin/activate
pip install target-jsonl
deactivate

Run

We can now run tap-exchangeratesapi and pipe the output to target-jsonl.

~/.virtualenvs/tap-exchangeratesapi/bin/tap-exchangeratesapi | ~/.virtualenvs/target-jsonl/bin/target-jsonl

The data by default will be written to a file called exchange_rate-{timestamp}.jsonl in your working directory.

 cat exchange_rate-{timestamp}.jsonl
{"CAD": 1.3954067515, "HKD": 7.7503228187, "ISK": 147.1130787678, "PHP": 50.5100534957, "DKK": 6.8779745434, "HUF": 327.9376498801, "CZK": 25.018446781, "GBP": 0.8059214167, "RON": 4.4673491976, "SEK": 9.9002029146, "IDR": 15321.0016602103, "INR": 75.6516325401, "BRL": 5.4711307877, "RUB": 73.6220254566, "HRK": 6.9765725881, "JPY": 106.548607268, "THB": 32.420217672, "CHF": 0.9750046117, "EUR": 0.9223390518, "MYR": 4.3475373547, "BGN": 1.8039107176, "TRY": 6.988286294, "CNY": 7.0764619074, "NOK": 10.3973436635, "NZD": 1.6446227633, "ZAR": 18.4316546763, "USD": 1.0, "MXN": 24.1217487548, "SGD": 1.4152370411, "AUD": 1.5361556908, "ILS": 3.5102379635, "KRW": 1218.9540675152, "PLN": 4.1912931194, "date": "2020-04-29T00:00:00Z"}

Optional Configuration

target-jsonl takes an optional configuration file that can be used to set formatting parameters like the delimiter - see config.sample.json for examples. To run target-jsonl with the configuration file, use this command:

~/.virtualenvs/tap-exchangeratesapi/bin/tap-exchangeratesapi | ~/.virtualenvs/target-jsonl/bin/target-jsonl -c my-config.json

Here is a brief description of the optional config keys

destination_path - Specifies where to write the resulting .jsonl file to. By default, the file gets written in your working directory.

custom_name - Specifies a custom name for the filename, instead of the stream name (i.e. {custom_name}-{timestamp}.jsonl, asumming do_timestamp_file is true). By default, the stream name will be used.

do_timestamp_file - Specifies if the file should get timestamped. By default, the resulting file will have a timestamp in the file name (i.e. exchange_rate-{timestamp}.jsonl as described above in the Run section). If this option gets set to false, the resulting file will not have a timestamp associated with it (i.e. exchange_rate.jsonl in our example).

webhdfs - Boolean variable to enable webhdfs writing.

webhdfs_url - Specifies url for connection to the webhdfs service (i.e. http://hostname:port).

webhdfs_user - Specifies user that will be use for connect to the webhdfs service.


Copyright © 2022 Andy Huynh, Stanislav Lysikov

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

target_jsonl_webhdfs-0.1.4.4.tar.gz (16.0 kB view hashes)

Uploaded Source

Built Distribution

target_jsonl_webhdfs-0.1.4.4-py3-none-any.whl (16.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page