An agent that can be installed inside a firewall or VPN and used to push data to Datateer


Datateer upload-agent

This is a command-line tool for uploading data into your Datateer data lake.

The upload agent pushes files into an AWS S3 bucket, where they are picked up for ingestion and further processing.

Quick start

Ensure you have python and pip installed, then follow these steps:

  1. Install with pip install datateer-upload-agent
  2. Do one-time agent configuration with datateer config upload-agent
  3. Do one-time feed configuration with datateer config feed
  4. Upload data with datateer upload <feed_key> <path>
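Putting the quick-start steps together, a typical first session might look like the following sketch. The feed answers implied here (feed key orders_feed, path ./my_exported_data/orders.csv) are the examples used later in this document, not required values.

```shell
# One-time setup, then a first upload. The feed key and file path below
# are illustrative; datateer config prompts you for your own values.
pip install datateer-upload-agent

datateer config upload-agent   # prompts for client code, bucket, and keys
datateer config feed           # prompts for provider, source, feed, feed key

# Upload a point-in-time extract using the feed key chosen above
datateer upload orders_feed ./my_exported_data/orders.csv
```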

Concepts

All data in the data lake has the following metadata:

  • A provider is an organization that is providing data. This could be your organization if you are pushing data from an internal database or application.
  • A source is the system or application that is providing data. A provider can provide data from one or more systems.
  • A feed is an independent data feed. A source can provide one or more feeds. For example, if the source is a database, each feed could represent a single table or view. If the source is an API, each feed could represent a single entity.
  • A file is a data file like a CSV file. It is a point-in-time extraction of a feed, and it is what you upload using the agent.

Commands

Uploading

Upload a file

datateer upload orders_feed ./my_exported_data/orders.csv will upload the file at ./my_exported_data/orders.csv using the feed key orders_feed.
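If you export several feeds at once, the upload command composes naturally with a shell loop. This sketch assumes feed keys are derived from file names (orders.csv → orders_feed) — that naming convention is an assumption for the example, not something the tool enforces.

```shell
# Upload every CSV in the export directory. Deriving the feed key from
# the file name is an assumption for this sketch; use your real feed keys.
for f in ./my_exported_data/*.csv; do
  feed_key="$(basename "$f" .csv)_feed"
  datateer upload "$feed_key" "$f"
done
```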

Configuring

Configure the upload agent

datateer config upload-agent will ask you a series of questions to configure your agent.

Datateer client code:
Raw bucket name:
Access key:
Access secret:

If you need to reconfigure the agent, just rerun datateer config upload-agent.

Configure a new feed

datateer config feed will ask you a series of questions to configure a new feed.

Provider: xyz
Data Source: internal_app1
Feed: orders
Feed key [orders]: orders_feed

Reconfigure an existing feed

datateer config feed --update orders_feed will rerun the configuration questions for the feed with the key orders_feed.

Show config

datateer config upload-agent --show will show you your existing configuration.

client-code: xyz
raw-bucket: xyz-pipeline-raw-202012331213123432341213
access-key: ABC***
access-secret: 123***
feeds: 3
1) Feed "customers" will upload to xyz/internal_app1/customers/
2) Feed "orders_feed" will upload to xyz/internal_app1/orders/
3) Feed "leads" will upload to salesforce/salesforce/leads/
In general, each feed uploads to a path of the form provider/source/feed/.

Data File Requirements

  • The data lake supports CSV, TSV, and JSONL files
  • The first row of the data file must contain header names
  • Adding and removing data fields are both supported
  • You should strive to keep your header names consistent over time. The data lake can handle changes, but they will likely confuse anyone using the feeds
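For example, a minimal file satisfying these requirements is a CSV whose first row is the header row, followed by data rows. The file name and column names here are illustrative only.

```shell
# Write a small, requirement-conforming CSV: first row is headers,
# subsequent rows are data. Column names are illustrative.
mkdir -p my_exported_data
printf 'order_id,customer_id,total\n' >  my_exported_data/orders.csv
printf '1001,42,19.99\n'              >> my_exported_data/orders.csv
printf '1002,7,5.00\n'                >> my_exported_data/orders.csv

# It could then be uploaded with:
#   datateer upload orders_feed my_exported_data/orders.csv
head -n 1 my_exported_data/orders.csv
```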

Configuration - detailed info

Configuration can be handled entirely through the datateer config commands. This section provides more detail on how configuration works and where it is stored.

Location

Here is where the Datateer upload agent will look for configuration information, in order of preference:

  1. In a relative directory named .datateer, in a file named config.yml.
  2. In the future, we may add global configuration in the user's home directory or in environment variables
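Concretely, the agent looks for .datateer/config.yml relative to the directory where you run it. A sketch of creating that layout by hand follows; normally the datateer config commands write this file for you, and the placeholder values here are illustrative.

```shell
# Create the relative config location by hand. The YAML keys follow the
# schema documented in this README; the values are placeholders.
mkdir -p .datateer
cat > .datateer/config.yml <<'EOF'
client-code: xyz
upload-agent:
  raw-bucket: example-raw-bucket
  access-key: EXAMPLE
  access-secret: EXAMPLE
  feeds: {}
EOF
```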

Schema

An example configuration file looks like this:

client-code: xyz
upload-agent:
  raw-bucket: xyz-pipeline-raw-202012331213123432341213
  access-key: ABC***
  access-secret: 123***
  feeds:
    customers:
      provider: xyz
      source: internal_app1
      feed: customers
    orders_feed:
      provider: xyz
      source: internal_app1
      feed: orders
    leads:
      provider: salesforce
      source: salesforce
      feed: leads

Development

To develop in this repo:

  1. Install poetry and activate a shell with poetry shell
  2. Run poetry install
  3. To run the tests, run pytest or ptw
  4. To run locally, install with pip install -e .
