
dabapush

Database pusher for social media data (starting with Twitter) – pre-alpha version

Using dabapush

dabapush is a tool to read long-running data collections and write them to another file format or persist them into a database. It is designed to run periodically, e.g. controlled by cron; thus, for convenience, it uses project-based configurations which contain all required information on what to read, where to read it, and what to do with it. A project may have one or more jobs; each job consists of a reader and a writer configuration, e.g. read JSON files from the Twitter API that we stored in the folder /home/user/fancy-project/twitter/ and write the flattened and compiled data set into /some/where/else as CSV files.
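Since dabapush is meant to be scheduled externally, a periodic invocation could look like this hypothetical crontab entry (the schedule and project path are placeholders, not part of dabapush itself):

# run the project's default pipeline at the start of every hour
0 * * * * cd /home/user/fancy-project && dabapush run default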

First steps

To run a first dabapush job, we'll need to create a project configuration. This is done by calling:

dabapush create

By default this walks you through the configuration process in a step-by-step manner. Alternatively, you could call:

dabapush create --non-interactive

This will create an empty configuration; you'll have to fill in the required information yourself, e.g. by calling:

dabapush reader add NDJSON default
dabapush writer add CSV default

Here, reader add or writer add is the verb, NDJSON or CSV is the plugin to add, and default is the pipeline name.

Of course you can edit the configuration after creation in your favorite editor, but BEWARE NOT TO TAMPER WITH THE YAML TAGS!

To run the newly configured job, please call:

dabapush run default
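Putting the first steps together, a minimal non-interactive setup and run might look like this (reader and writer types as in the example above):

dabapush create --non-interactive
dabapush reader add NDJSON default
dabapush writer add CSV default
dabapush run default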

Command Reference

Invocation Pattern

dabapush <command> <subcommand?> <options>

Commands

create -- creates a dabapush project (invokes interactive prompt)

Options:

--non-interactive: create an empty configuration and exit

--interactive: this is the default behavior; prompts for user input on

  • the project name,
  • the project author's name,
  • the project author's email address(es) for notifications,
  • whether to configure targets manually or run discover.

run all -- collect all known items and execute targets/destinations

run <target> -- run a single writer and/or named target

Options:

--force-rerun, -r: forces all data to be read, ignoring already logged data
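For example, to reprocess the default pipeline from scratch, ignoring the log of already processed data:

dabapush run default --force-rerun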


reader -- interact with readers

reader configure <name> -- configure the reader for one or more subproject(s); reader configuration is inherited from the global to the local level; throws if the configuration is incomplete and defaults are missing

reader list: returns a table of all configured readers, with <path> <target> <class> <id>

reader list_all: returns a table of all registered reader plugins

reader add <type> <name>: add a reader to the project configuration

Options:

--input-directory <path>: directory to be read

--pattern <pattern>: pattern to match file names against

reader remove <name>: remove a reader from the project configuration

reader register <path>: not implemented yet
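For example, to add an NDJSON reader for the Twitter folder from the introduction (the glob pattern is an illustrative assumption; use whatever matches your files):

dabapush reader add NDJSON default --input-directory /home/user/fancy-project/twitter/ --pattern '*.ndjson'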


discover -- discover (possible) targets in the project directory and configure them automagically -- yeah, you dream of that, don't you?


writer -- interact with writers

writer add <type> <name>: add a writer to the project configuration

writer remove <name>: removes the writer for the given name

writer list: returns a table of all configured writers, with <path> <subproject-name> <class> <id>

writer list_all: returns a table of all registered writer plugins

writer configure <name> (or writer configure all): configure a single writer or all writers

Options:

--output-dir, -o <path>: default for all targets: <project-dir>/output/<target-name>

--output-pattern, -p <pattern>: pattern used for file name creation, e.g. 'YYYY-MM-dd'; the file extension is added by the writer and cannot be overridden

--roll-over, -r <file-size>|<lines>|None: should the output be chunked? Give either a file size or a number of lines for roll-over, or None to disable chunking
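For example, a complete writer configuration call might look like this (the output directory mirrors the documented default, and the roll-over value of 10000 lines is an illustrative assumption):

dabapush writer configure default --output-dir ./output/default --output-pattern 'YYYY-MM-dd' --roll-over 10000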

Extending dabapush: developer's guide

Dabapush's reader and writer plug-ins are registered via entry points: dabapush_readers for readers and dabapush_writers for writers. Both expect a Configuration subclass.
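Since dabapush itself is managed with poetry, a plug-in package could expose its configurations through a pyproject.toml fragment along these lines (the package, module, and class names are hypothetical; only the entry-point group names dabapush_readers and dabapush_writers come from the paragraph above):

[tool.poetry.plugins."dabapush_readers"]
MyFormat = "my_plugin.readers:MyFormatReaderConfiguration"

[tool.poetry.plugins."dabapush_writers"]
MyFormat = "my_plugin.writers:MyFormatWriterConfiguration"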

Developer Installation

  1. Install poetry
  2. Clone repository
  3. In the cloned repository's root directory run poetry install
  5. Run poetry shell to start a development virtualenv
  5. Run dabapush create to create your first project.
  6. Run pytest to run all tests
