Package for building data pipelines from a YAML file

Project description

DataSika :deer:

Have you ever gotten stuck in the data mud and had to write piles of code just to handle a tiny data type problem? Or have you spent a long time digging through your spaghetti code :spaghetti: to find out what went wrong while cleaning your data? Don't worry! :relieved: Here comes DataSika :deer: for you! DataSika is a simple Python package that lets you build your own data pipeline locally by writing some basic, standard YAML syntax. You can do web scraping and API requests with our built-in functions. We also provide filters if you want to filter content by XPath (for HTML responses), JSONPath (for JSON responses), or SQL (for manipulating dataframes). Can't wait to try? Just install it and test it with the examples we provide! :satisfied: :sparkles:
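To give a feel for the three kinds of filters mentioned above, here is a minimal sketch that uses only the Python standard library. It is not DataSika's API or YAML syntax: the sample data, the function names (filter_html, filter_json, filter_sql) and the gems table are all hypothetical, and the JSONPath step is written by hand because the standard library has no JSONPath engine.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; not taken from DataSika or its examples.
HTML = (
    "<html><body><ul>"
    "<li class='gem'>rails</li><li class='gem'>rake</li><li class='ad'>sponsored</li>"
    "</ul></body></html>"
)
JSON_TEXT = '{"gems": [{"name": "rails", "downloads": 500}, {"name": "rake", "downloads": 900}]}'


def filter_html(markup):
    """XPath-style filter: text of every <li class="gem"> element."""
    # ElementTree needs well-formed markup and supports only a subset of XPath.
    root = ET.fromstring(markup)
    return [li.text for li in root.findall(".//li[@class='gem']")]


def filter_json(text):
    """Hand-written equivalent of the JSONPath expression $.gems[*].name."""
    return [gem["name"] for gem in json.loads(text)["gems"]]


def filter_sql(rows, min_downloads):
    """SQL filter over tabular rows, using an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE gems (name TEXT, downloads INTEGER)")
        conn.executemany("INSERT INTO gems VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT name FROM gems WHERE downloads > ? ORDER BY name",
            (min_downloads,),
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()


if __name__ == "__main__":
    print(filter_html(HTML))
    print(filter_json(JSON_TEXT))
    rows = [(g["name"], g["downloads"]) for g in json.loads(JSON_TEXT)["gems"]]
    print(filter_sql(rows, 600))
```

In a real pipeline the equivalent filters would be declared in your yaml file; see the files under examples/ for the actual syntax.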

Python compatibility

  • Python version > 3.7

PyPI Package

Environment Setup

Clone this project (if you want to run the examples)

  • Using the command line: git clone git@github.com:rainyjonne/DataSika.git
  • Manual download: click Download ZIP under the green Code button

Install Python

Upgrade your pip if its version is not new enough

  • Install pip
    • macOS: python -m ensurepip --upgrade
    • WSL, Linux: python -m ensurepip --upgrade

Install our package

  • Just execute this command: pip install DataSika, then you can happily use this command with your yaml files! :tada: :confetti_ball:
  • Sika Command usage:
usage: sika [-h] [--input INPUT] [--output OUTPUT] [--rerun]

Build a simple pipeline by a yaml file

optional arguments:
  -h, --help       show this help message and exit
  --input INPUT    put in an input yaml file path
  --output OUTPUT  put a path for your output db
  --rerun          rerun the whole pipeline again, delete all data tables in your db file
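The help text above has the shape of Python's argparse output. As an illustration only, here is a sketch of a parser that accepts the same three options. It is not DataSika's actual source; build_parser is a hypothetical name, and the defaults (None for --input and --output) are assumptions.

```python
import argparse


def build_parser():
    """Return a parser that accepts the same options as the sika help text."""
    parser = argparse.ArgumentParser(
        prog="sika",
        description="Build a simple pipeline by a yaml file",
    )
    parser.add_argument("--input", help="put in an input yaml file path")
    parser.add_argument("--output", help="put a path for your output db")
    parser.add_argument(
        "--rerun",
        action="store_true",  # flag without a value; False unless given
        help="rerun the whole pipeline again, delete all data tables in your db file",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv so the sketch
    # can be run as-is.
    args = build_parser().parse_args(["--input", "examples/repominer.yaml", "--rerun"])
    print(args.input, args.output, args.rerun)
```

Note that --rerun takes no value: passing it is enough to request a clean run, and according to the help text that deletes the existing data tables in your db file.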

Running examples

  • Make this package's command-line tool work: python setup.py install
  • Running our four examples:
    • Using command line tools:
      1. (ETL) Getting Ruby Gem Details Example: sika --input examples/repominer.yaml
      2. (ETL) Airbnb UK Hostings + UK Crime Data Example: sika --input examples/airbnb-uk-crime.yaml
      3. (EL) Getting Ruby Gem Lists Example: sika --input examples/repominer-el.yaml
      4. (EL) Airbnb Japan Hostings Example: sika --input examples/airbnb-tokyo.yaml
    • Using python scripts:
      1. (ETL) Getting Ruby Gem Details Example: python sika/main.py --input examples/repominer.yaml
      2. (ETL) Airbnb UK Hostings + UK Crime Data Example: python sika/main.py --input examples/airbnb-uk-crime.yaml
      3. (EL) Getting Ruby Gem Lists Example: python sika/main.py --input examples/repominer-el.yaml
      4. (EL) Airbnb Japan Hostings Example: python sika/main.py --input examples/airbnb-tokyo.yaml
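If you want to run all four examples in one go, a small helper script can build the commands for you. This is a sketch, not part of DataSika: build_commands and run_all are hypothetical names, and it assumes you run it from the root of the cloned project with the sika command already installed.

```python
import shutil
import subprocess

# Example files listed above, relative to the root of the cloned project.
EXAMPLES = [
    "examples/repominer.yaml",
    "examples/airbnb-uk-crime.yaml",
    "examples/repominer-el.yaml",
    "examples/airbnb-tokyo.yaml",
]


def build_commands(examples, rerun=False):
    """Build one sika argument list per example file."""
    commands = []
    for path in examples:
        argv = ["sika", "--input", path]
        if rerun:
            argv.append("--rerun")
        commands.append(argv)
    return commands


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    if shutil.which("sika") is None:
        raise RuntimeError("sika is not on PATH; install DataSika first")
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands. Call run_all(...) to execute them.
    for argv in build_commands(EXAMPLES):
        print(" ".join(argv))
```

Replace the dry-run loop with run_all(build_commands(EXAMPLES)) once the printed commands look right.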

Download files

Download the file for your platform.

Source Distribution

  • DataSika-1.0.3.tar.gz (20.6 kB)

Built Distributions

  • DataSika-1.0.3-py3.9.egg (66.5 kB)
  • DataSika-1.0.3-py3.8.egg (66.6 kB)
  • DataSika-1.0.3-py3.7.egg (66.7 kB)
  • DataSika-1.0.3-py3-none-any.whl (31.3 kB, Python 3)
