dabapush
Database pusher for social media data (Twitter for the beginning) – pre-alpha version
Using dabapush
dabapush is a tool that reads data from longer-running data collections and writes it to another file format or persists it in a database. It is designed to run periodically, e.g. controlled by cron. For convenience, it uses project-based configurations that contain all required information on what to read, where to read it, and what to do with it.
A project may have one or more jobs. Each job consists of a reader and a writer configuration, e.g. read JSON files from the Twitter API that we stored in the folder /home/user/fancy-project/twitter/ and write the flattened and compiled data set to /some/where/else as CSV files.
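To make the idea of a job more concrete, here is a minimal sketch of what "read NDJSON, flatten, write CSV" can look like in plain Python. It is not dabapush's implementation: the helper names flatten and ndjson_to_csv, the dotted column names, and the sample records are all assumptions made for illustration.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def ndjson_to_csv(lines, out):
    """Parse NDJSON lines, flatten each record, and write one CSV table."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    # Collect column names in first-seen order so the header is stable.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical sample data standing in for collected API responses.
sample = [
    '{"id": 1, "text": "hello", "user": {"name": "a"}}',
    '{"id": 2, "text": "world", "user": {"name": "b"}}',
]
buffer = io.StringIO()
ndjson_to_csv(sample, buffer)
print(buffer.getvalue())
```

In dabapush itself, the reading half and the writing half are configured separately as a reader and a writer, so either side can be swapped without touching the other.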
First steps
In order to run a first dabapush job, we need to create a project configuration. This is done by calling:
dabapush create
By default this walks you through the configuration process in a step-by-step manner. Alternatively, you could call:
dabapush create --non-interactive
This will create an empty configuration. You will then have to fill in the required information, e.g. by calling:
dabapush reader add NDJSON default
dabapush writer add CSV default
Here, reader add / writer add is the verb, NDJSON or CSV is the plugin to add, and default is the pipeline name.
Of course, you can edit the configuration after creation in your favorite editor, but BEWARE NOT TO TAMPER WITH THE YAML TAGS!
To run the newly configured job, please call:
dabapush run default
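Since dabapush is meant to be run periodically, the call above can be wrapped in a small script that a scheduler such as cron invokes. The following is only a sketch: build_run_command and run_job are hypothetical helper names, and it assumes that the dabapush executable is on the PATH and that it should be started from inside the project directory. The --force-rerun option is the one listed in the command reference.

```python
import subprocess


def build_run_command(target="all", force_rerun=False):
    """Assemble the argument list for `dabapush run <target>`."""
    command = ["dabapush", "run", target]
    if force_rerun:
        command.append("--force-rerun")
    return command


def run_job(project_dir, target="all", force_rerun=False):
    """Run one job from inside project_dir (assumed layout) and return the exit code."""
    result = subprocess.run(
        build_run_command(target, force_rerun),
        cwd=project_dir,
        check=False,  # let the caller decide how to react to failures
    )
    return result.returncode


# Example call from a scheduled script (path is a placeholder):
# exit_code = run_job("/home/user/fancy-project", target="default")
```

Returning the exit code instead of raising keeps the wrapper usable from a scheduler, which typically only looks at the exit status.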
Command Reference
Invocation Pattern
dabapush <command> <subcommand?> <options>
Commands
create -- creates a dabapush project (invokes interactive prompt)
Options:
--non-interactive: create an empty configuration and exit
--interactive: this is the default behavior; prompts for user input on
- project name,
- project author's name,
- project author email address(es) for notifications,
- whether to configure targets manually or run discover.
run all -- collect all known items and execute targets/destinations
run <target> -- run a single writer and/or named target
Options:
--force-rerun, -r: forces all data to be read, ignores already logged data
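The idea behind --force-rerun can be illustrated with a small log of already processed files. This is a sketch of the general technique only, not dabapush's actual bookkeeping: the JSON log file and the helper names select_unprocessed and mark_processed are assumptions.

```python
import json
import tempfile
from pathlib import Path


def select_unprocessed(files, log_path, force_rerun=False):
    """Return the files not yet recorded in the log, or all files on a forced rerun."""
    log_path = Path(log_path)
    seen = set()
    if log_path.exists() and not force_rerun:
        seen = set(json.loads(log_path.read_text()))
    return [f for f in files if str(f) not in seen]


def mark_processed(files, log_path):
    """Add the given files to the log so later runs skip them."""
    log_path = Path(log_path)
    seen = set(json.loads(log_path.read_text())) if log_path.exists() else set()
    seen.update(str(f) for f in files)
    log_path.write_text(json.dumps(sorted(seen)))


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "processed.json"  # hypothetical log location
    files = ["a.ndjson", "b.ndjson"]
    first = select_unprocessed(files, log)            # nothing logged yet
    mark_processed(first, log)
    second = select_unprocessed(files + ["c.ndjson"], log)  # only the new file
    forced = select_unprocessed(files + ["c.ndjson"], log, force_rerun=True)
```

A periodic run then only touches files that arrived since the last run, while a forced rerun ignores the log and reads everything again.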
reader -- interact with readers
reader configure <name> -- configure the reader for one or more subproject(s). Reader configuration is inherited from global to local level; throws if the configuration is incomplete and defaults are missing
reader list -- returns a table of all configured readers, with <path> <target> <class> <id>
reader list_all -- returns a table of all registered reader plugins
reader add <type> <name> -- add a reader to the project configuration
Options:
--input-directory <path>: directory to be read
--pattern <pattern>: pattern to match file names against
reader remove <name> -- remove a reader from the project configuration
register <path> -- not there yet
discover -- discover (possible) targets in the project directory and configure them automagically -- yeah, you dream of that, don't you?
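How --input-directory and --pattern of reader add could work together is sketched below. The pattern syntax is not specified in this document, so the sketch assumes shell-style wildcards as implemented by Python's fnmatch and a recursive search of the input directory; matching_files and the file names are hypothetical.

```python
import tempfile
from fnmatch import fnmatch
from pathlib import Path


def matching_files(input_directory, pattern):
    """Return sorted files below input_directory whose names match pattern."""
    root = Path(input_directory)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and fnmatch(path.name, pattern)
    )


with tempfile.TemporaryDirectory() as tmp:
    # Create a few placeholder files to match against.
    for name in ("tweets-1.ndjson", "tweets-2.ndjson", "notes.txt"):
        (Path(tmp) / name).write_text("")
    found = [path.name for path in matching_files(tmp, "*.ndjson")]
```

Matching on the file name rather than the full path keeps the pattern independent of where the input directory lives.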
writer -- interact with writers
writer add <type> <name> -- add a writer to the project configuration
writer remove <name> -- removes the writer for the given name
writer list -- returns a table of all writers, with <path> <subproject-name> <class> <id>
writer list_all -- returns a table of all registered writer plugins
writer configure <name> or writer configure all
Options:
--output-dir, -o <path>: default for all targets: <project-dir>/output/<target-name>
--output-pattern, -p <pattern>: pattern used for file name creation, e.g. 'YYYY-MM-dd'; the file extension is added by the writer and cannot be overwritten
--roll-over, -r <file-size> | <lines> | None: should the output be chunked? Give either a file size or a number of lines for roll-over, or None to disable chunking
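The line-based variant of roll-over can be sketched as follows. This only illustrates the chunking idea and is not the writer plugin's code: write_chunked, the numbered file names, and the sample rows are assumptions. Passing max_lines=None disables chunking, mirroring the None setting above.

```python
import csv
import tempfile
from pathlib import Path


def write_chunked(rows, fieldnames, output_dir, stem, max_lines=None):
    """Write rows to CSV files, starting a new file every max_lines rows."""
    output_dir = Path(output_dir)
    paths, handle, writer, count = [], None, None, 0
    try:
        for row in rows:
            # Open the first file, or roll over once the current one is full.
            if handle is None or (max_lines is not None and count >= max_lines):
                if handle is not None:
                    handle.close()
                path = output_dir / f"{stem}-{len(paths):04d}.csv"
                handle = path.open("w", newline="")
                writer = csv.DictWriter(handle, fieldnames=fieldnames)
                writer.writeheader()
                paths.append(path)
                count = 0
            writer.writerow(row)
            count += 1
    finally:
        if handle is not None:
            handle.close()
    return paths


rows = [{"id": i} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    chunked = write_chunked(rows, ["id"], tmp, "chunked", max_lines=2)
    single = write_chunked(rows, ["id"], tmp, "single", max_lines=None)
    # Count data rows per file (total lines minus the header line).
    chunk_sizes = [len(p.read_text().splitlines()) - 1 for p in chunked]
    single_size = len(single[0].read_text().splitlines()) - 1
```

A size-based roll-over would follow the same structure, checking the bytes written so far instead of the row count.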
Extending dabapush and developers guide
Dabapush's reader and writer plug-ins are registered via entry points: dabapush_readers for readers and dabapush_writers for writers. Both expect a Configuration subclass.
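Entry point groups like these can be inspected with the standard library. The sketch below lists whatever is registered under a group name, using importlib.metadata. It does not show how dabapush itself loads plug-ins, and discover_plugins is a hypothetical helper; only the two group names are taken from the text above.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Map entry point names to entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry points API.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


readers = discover_plugins("dabapush_readers")
writers = discover_plugins("dabapush_writers")
# Calling .load() on an entry point imports the object it refers to.
```

If dabapush is not installed in the current environment, both dictionaries are simply empty.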
Developer Installation
- Install poetry
- Clone the repository
- In the cloned repository's root directory, run poetry install
- Run poetry shell to start the development virtualenv
- Run dabapush create to create your first project
- Run pytest to run all tests