A package for building data pipelines from a YAML file

Project description

DataSika :deer:

Have you ever been stuck in the data mud, writing piles of code just to handle a tiny data type problem? Or spent ages combing through your spaghetti code :spaghetti: to find out what went wrong while cleaning your data? Don't worry! :relieved: Here comes DataSika :deer: for you! DataSika is a simple Python package that lets you build your own data pipeline locally by writing basic, standard YAML syntax. You can do web scraping and API requests with our built-in functions. We also provide filters for extracting content with XPath (for HTML responses), JSONPath (for JSON responses), and SQL (for manipulating dataframes). Can't wait to try it? Just install it and test it with the examples we provide! :satisfied: :sparkles:
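To give a feel for what XPath and SQL filtering mean in practice, here is a minimal sketch that uses only the Python standard library. It does not use DataSika's YAML syntax or its internal functions, which are not shown on this page; the sample markup, the `items` table, and the helper names `xpath_filter` and `sql_filter` are all hypothetical. JSONPath is left out because the standard library has no JSONPath implementation.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for an HTML response.
PAGE = """
<html><body>
  <ul>
    <li class="gem">rails</li>
    <li class="gem">rake</li>
    <li class="other">ignored</li>
  </ul>
</body></html>
"""


def xpath_filter(markup, path):
    """Return the text of every element matching a (limited) XPath expression."""
    root = ET.fromstring(markup.strip())
    return [element.text for element in root.findall(path)]


def sql_filter(rows, query):
    """Load names into an in-memory SQLite table called items and run a query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE items (name TEXT)")
        conn.executemany("INSERT INTO items (name) VALUES (?)", [(r,) for r in rows])
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


# Keep only the <li> elements whose class attribute is "gem".
names = xpath_filter(PAGE, ".//li[@class='gem']")
print(names)
# Then narrow the result further with SQL.
print(sql_filter(names, "SELECT name FROM items WHERE name LIKE 'ra%' ORDER BY name"))
```

ElementTree supports only a subset of XPath and requires well-formed markup, so real-world HTML would likely need a dedicated parser.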

Python compatibility

  • Python version > 3.7

PyPI Package

Environment Setup

Clone this project (if you want to run the examples!)

  • Using the command line: git clone git@github.com:rainyjonne/DataSika.git
  • Manual download: click Download ZIP under the green Code button

Install Python

Upgrade pip if your version is not recent enough

  • Install pip
    • macOS: python -m ensurepip --upgrade
    • WSL, Linux: python -m ensurepip --upgrade

Install our package

  • Just run this command: pip install DataSika. Then you can happily use the sika command with your YAML files! :tada: :confetti_ball:
  • Sika command usage:
usage: sika [-h] [--input INPUT] [--output OUTPUT] [--rerun]

Build a simple pipeline by a yaml file

optional arguments:
  -h, --help       show this help message and exit
  --input INPUT    put in an input yaml file path
  --output OUTPUT  put a path for your output db
  --rerun          rerun the whole pipeline again, delete all data tables in your db file
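
The help text above describes three options. As a rough illustration, an interface with the same shape can be declared with argparse as below. This is a sketch reconstructed from the help output only, not DataSika's actual source: the function name `build_parser` is hypothetical, and the real tool's defaults for --input and --output are not stated here, so this sketch leaves them as None.

```python
import argparse


def build_parser():
    """Declare a parser mirroring the options listed in the sika help text."""
    parser = argparse.ArgumentParser(
        prog="sika",
        description="Build a simple pipeline by a yaml file",
    )
    parser.add_argument("--input", help="put in an input yaml file path")
    parser.add_argument("--output", help="put a path for your output db")
    # store_true makes --rerun a plain switch that takes no value.
    parser.add_argument(
        "--rerun",
        action="store_true",
        help="rerun the whole pipeline again, delete all data tables in your db file",
    )
    return parser


# Parse a sample argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["--input", "examples/repominer.yaml", "--rerun"])
print(args.input, args.output, args.rerun)
```

Note that --rerun is described as deleting all data tables in the output db file, so it is worth keeping a copy of any results you still need before passing it.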

Running examples

  • To make this package's command line tool work: python setup.py install
  • Running our four examples:
    • Using command line tools:
      1. (ETL) Getting Ruby Gem Details Example: sika --input examples/repominer.yaml
      2. (ETL) Airbnb UK Hostings + UK Crime Data Example: sika --input examples/airbnb-uk-crime.yaml
      3. (EL) Getting Ruby Gem Lists Example: sika --input examples/repominer-el.yaml
      4. (EL) Airbnb Japan Hostings Example: sika --input examples/airbnb-tokyo.yaml
    • Using python scripts:
      1. (ETL) Getting Ruby Gem Details Example: python sika/main.py --input examples/repominer.yaml
      2. (ETL) Airbnb UK Hostings + UK Crime Data Example: python sika/main.py --input examples/airbnb-uk-crime.yaml
      3. (EL) Getting Ruby Gem Lists Example: python sika/main.py --input examples/repominer-el.yaml
      4. (EL) Airbnb Japan Hostings Example: python sika/main.py --input examples/airbnb-tokyo.yaml
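
If you would rather run all four examples in one go, the command line invocations listed above can be scripted. The following is a minimal sketch, assuming the sika command is on your PATH and that you run it from the root of the cloned repository; the helper names `build_commands` and `run_all` are hypothetical and not part of DataSika.

```python
import shutil
import subprocess

# Example files named in the list above.
EXAMPLES = [
    "repominer.yaml",
    "airbnb-uk-crime.yaml",
    "repominer-el.yaml",
    "airbnb-tokyo.yaml",
]


def build_commands(examples_dir="examples", rerun=False):
    """Build one sika invocation per example file."""
    commands = []
    for name in EXAMPLES:
        cmd = ["sika", "--input", f"{examples_dir}/{name}"]
        if rerun:
            cmd.append("--rerun")
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    if shutil.which("sika") is None:
        raise RuntimeError("sika not found on PATH; install DataSika first")
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Print the commands rather than running them, so this is safe to try anywhere.
for cmd in build_commands():
    print(" ".join(cmd))
```

Calling run_all(build_commands()) would then execute the examples one after another.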

Download files

Source Distribution

DataSika-1.0.3.tar.gz (20.6 kB view details)

Uploaded Source

Built Distributions

DataSika-1.0.3-py3.9.egg (66.5 kB view details)

Uploaded Egg

DataSika-1.0.3-py3.8.egg (66.6 kB view details)

Uploaded Egg

DataSika-1.0.3-py3.7.egg (66.7 kB view details)

Uploaded Egg

DataSika-1.0.3-py3-none-any.whl (31.3 kB view details)

Uploaded Python 3

File details

Details for the file DataSika-1.0.3.tar.gz.

File metadata

  • Download URL: DataSika-1.0.3.tar.gz
  • Upload date:
  • Size: 20.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.11

File hashes

Hashes for DataSika-1.0.3.tar.gz
Algorithm Hash digest
SHA256 432b5d7df4196c74baa3aa47452dd3891520ba561e2b20e90081e02743437a83
MD5 483c6715a1836d0b3481f7b9778e0456
BLAKE2b-256 e045a0ba4cbc8bb0ba3680d2e0e3c9558b110542e5a1fab620d590c1294c1368
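
A downloaded file can be checked against the SHA256 digest listed above with Python's hashlib. This is a minimal sketch: the helper names `sha256_of` and `matches` are hypothetical, and the local path you pass in is whatever location you saved the download to.

```python
import hashlib
import hmac

# SHA256 digest listed above for DataSika-1.0.3.tar.gz.
EXPECTED_SHA256 = "432b5d7df4196c74baa3aa47452dd3891520ba561e2b20e90081e02743437a83"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

For example, matches("DataSika-1.0.3.tar.gz", EXPECTED_SHA256) should return True for an intact copy of the source distribution saved in the current directory.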


File details

Details for the file DataSika-1.0.3-py3.9.egg.

File metadata

  • Download URL: DataSika-1.0.3-py3.9.egg
  • Upload date:
  • Size: 66.5 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.11

File hashes

Hashes for DataSika-1.0.3-py3.9.egg
Algorithm Hash digest
SHA256 aaee81da904ae10be4a018fd283650a5a074f175903c0f30e46d64c87803ecae
MD5 6aaa7b76588f0e324a08f2722cad9f4f
BLAKE2b-256 b2a1e4303829aa319b35659afabddf67253cd00b30df78b91f31283a7240d598


File details

Details for the file DataSika-1.0.3-py3.8.egg.

File metadata

  • Download URL: DataSika-1.0.3-py3.8.egg
  • Upload date:
  • Size: 66.6 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.11

File hashes

Hashes for DataSika-1.0.3-py3.8.egg
Algorithm Hash digest
SHA256 c038f17387c366c6f04d8c99c8b5ac37aeceaf6b1f7f2491de16c1c442a22e85
MD5 2e2b741b8341296bdef39dcf3344b32d
BLAKE2b-256 f84976dc832d349826117b28f1318d508606a6299aa5b66d6b681858430b8e33


File details

Details for the file DataSika-1.0.3-py3.7.egg.

File metadata

  • Download URL: DataSika-1.0.3-py3.7.egg
  • Upload date:
  • Size: 66.7 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.11

File hashes

Hashes for DataSika-1.0.3-py3.7.egg
Algorithm Hash digest
SHA256 38928b3419ee24fc44f68e4952268be51f434e1cb2902fc5dadf726d7e973d98
MD5 1f21dc9641853bf9c75edf41140e7d36
BLAKE2b-256 29c21aa68bdbabb611a7b6debfce731d2f072e31b1b863a3d816247ce5ef5fb2


File details

Details for the file DataSika-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: DataSika-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 31.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.11

File hashes

Hashes for DataSika-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 60e18e0332ad6a592563a3a05c68893d2fd7e422f783a99b1770bf4f19928195
MD5 ff3ce0d7375bdc7b22bff5e03f78b4a2
BLAKE2b-256 8bf68197ec43f1224540616a820f4f8521a022100a73deb26e4122877ea7a6c7
