
# DataFlows Shell

DataFlows Shell enhances [DataFlows](https://github.com/datahq/dataflows) with shell integration.

## Introduction

A lot of shell work, especially "DevOps" / automation work, is data processing.
The first command a shell user learns is `ls` - which returns a set of data.
The second might be `grep` or `cp` - which filter and act on that data set.
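That list-filter-act pattern is already a pipeline over data, using nothing but standard tools:

```
# List files one per line, keep only the CSVs, then count lines in each match
$ ls -1 | grep '\.csv$' | xargs wc -l
```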

DataFlows Shell brings the power of the [DataFlows](https://github.com/datahq/dataflows) data processing framework to the shell.

DataFlows Shell acts as a minimal, intuitive layer between the shell, the DataFlows framework, and the [Frictionless Data Ecosystem](https://frictionlessdata.io/).

## Quickstart

The only required dependencies are Python 3 and Bash.

Install the dataflows-shell package:

```
$ python3 -m pip install -U dataflows-shell
```
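To verify the installation, you can inspect the installed package metadata (standard pip, nothing project-specific):

```
$ python3 -m pip show dataflows-shell
```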

Import the required dfs processors into the current shell:

```
$ source <(dfs import printer filter_rows kubectl)
```
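The `source <(...)` pattern implies that `dfs import` writes shell definitions to stdout, which the current shell then loads. If you want to see what gets loaded before sourcing it, you can print it first (an illustrative check, assuming the output is plain shell code):

```
# Print the generated shell code instead of sourcing it
$ dfs import printer | head
```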

Run a processor chain to list only the pods that match a condition:

```
$ kubectl get pods -c -q \
| dfs 'lambda row: row.update(is_ckan="ckan" in str(row["volumes"]))' --fields=+is_ckan:boolean -q \
| filter_rows --args='[[{"is_ckan":true}]]' -q \
| printer --kwargs='{"fields":["kind","name","namespace"]}'
```

```
{'count_of_rows': 12, 'bytes': 57584, 'hash': '5febe0c3cfe75d174e242f290f00c289', 'dataset_name': None}
checkpoint:1
{'count_of_rows': 12, 'bytes': 57876, 'hash': '17f446a8f562f10cccc1de1a33c48d91', 'dataset_name': None}
checkpoint:2
{'count_of_rows': 6, 'bytes': 40797, 'hash': '6ab4290efd82478b1677d1f226c4199a', 'dataset_name': None}
checkpoint:3
saving checkpoint to: .dfs-checkpoints/__9
using checkpoint data from .dfs-checkpoints/__8
res_1:
  #  kind        name                          namespace
     (string)    (string)                      (string)
---  ----------  ----------------------------  -----------
  1  Pod         ckan-5d74747649-92z9x         odata-blue
  2  Pod         ckan-5d74747649-fzvd6         odata-blue
  3  Pod         ckan-jobs-5d895695cf-wgrzr    odata-blue
  4  Pod         datastore-db-944bfbc74-2nc7b  odata-blue
  5  Pod         db-7dd99b8547-vpf57           odata-blue
  6  Pod         pipelines-9f4466db-vlzm8      odata-blue
checkpoint saved: __9
{'count_of_rows': 6, 'bytes': 40798, 'hash': 'adc31744dfc99a0d8cbe7b081f31d78b', 'dataset_name': None}
checkpoint:9
```
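For reference, the same pipeline can be written directly against the DataFlows Python API. This is a rough sketch, not taken from the project: the `load("pods.csv")` source is an illustrative assumption standing in for the `kubectl` processor's output, and the field names mirror the example above.

```python
from dataflows import Flow, load, add_field, filter_rows, printer

def mark_ckan(row):
    # Same logic as the inline shell lambda above
    row["is_ckan"] = "ckan" in str(row["volumes"])

Flow(
    load("pods.csv"),                          # hypothetical stand-in for `kubectl get pods`
    add_field("is_ckan", "boolean"),           # --fields=+is_ckan:boolean
    mark_ckan,
    filter_rows(equals=[{"is_ckan": True}]),   # --args='[[{"is_ckan":true}]]'
    printer(fields=["kind", "name", "namespace"]),
).process()
```

Each `dfs` shell step maps onto one step of the Python flow, which is what makes the shell syntax a thin layer rather than a separate language.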

## Documentation

* [DataFlows Shell Tutorial](TUTORIAL.md)
* [DataFlows Shell Reference](REFERENCE.md)
* [DataFlows Shell Processors Reference](processors/README.md)
* [DataFlows Processors Reference](https://github.com/datahq/dataflows/blob/master/PROCESSORS.md)
