Package for building data pipelines from a yaml file
Project description
DataSika :deer:
Have you ever been stuck in data mud, writing piles of code just to handle a tiny data type problem? Or have you spent a long time digging through your spaghetti code :spaghetti: to find out what went wrong while cleaning your data? Don't worry! :relieved: Here comes DataSika :deer:! DataSika is a simple Python package that lets you build your own data pipeline locally by writing basic, standard yaml syntax. You can do web scraping and API requests with our built-in functions. We also provide filters if you want to extract content by xpath (for html responses), jsonpath (for json responses), or sql (for manipulating dataframes). Can't wait to try? Install it and test it with the examples we provide! :satisfied: :sparkles:
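To give a feel for what the three kinds of filters do, here is a minimal sketch that uses only the Python standard library. It is not DataSika's own code or yaml syntax: the sample data is made up, the JSONPath expression is walked by hand because the standard library has no JSONPath engine, and an in-memory SQLite table stands in for a dataframe.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# 1. XPath-style filter on a (well-formed) html response: collect every link target.
#    The markup below is made-up sample data.
html = "<html><body><a href='/gems/rails'>rails</a><a href='/gems/rake'>rake</a></body></html>"
root = ET.fromstring(html)
links = [a.get("href") for a in root.findall(".//a")]

# 2. JSONPath-style filter on a json response.
#    The path $.gems[*].name is walked by hand here.
payload = json.loads(
    '{"gems": [{"name": "rails", "downloads": 300}, {"name": "rake", "downloads": 500}]}'
)
names = [gem["name"] for gem in payload["gems"]]

# 3. SQL filter on tabular data (SQLite table used as a stand-in for a dataframe).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gems (name TEXT, downloads INTEGER)")
conn.executemany(
    "INSERT INTO gems VALUES (?, ?)",
    [(gem["name"], gem["downloads"]) for gem in payload["gems"]],
)
popular = [row[0] for row in conn.execute("SELECT name FROM gems WHERE downloads > 400")]
conn.close()

print(links, names, popular)
```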
Compatibility of python
- Python version > 3.7
PyPI Package
Environment Setup
Clone this project (if you want to run the examples!)
- Using the command:
  git clone git@github.com:rainyjonne/DataSika.git
- Manual download: click Download ZIP from the green Code button
Install Python
Upgrade your pip if its version is not new enough
- Install pip
  - macOS:
    python -m ensurepip --upgrade
  - WSL, Linux:
    python -m ensurepip --upgrade
Install our package
- Just execute this command:
  pip install DataSika
  Then you can happily use the sika command with your yaml files! :tada: :confetti_ball:
- Sika command usage:
  usage: sika [-h] [--input INPUT] [--output OUTPUT] [--rerun]
  Build a simple pipeline by a yaml file
  optional arguments:
    -h, --help       show this help message and exit
    --input INPUT    put in an input yaml file path
    --output OUTPUT  put a path for your output db
    --rerun          rerun the whole pipeline again, delete all data tables in your db file
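The flags in the help text above can be mirrored with a small argparse parser, which makes it easier to see how each option is read. This is a hypothetical reconstruction from the help output only, not DataSika's actual source; in particular, the defaults (no output path, rerun off) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the sika help text (not the real implementation)."""
    parser = argparse.ArgumentParser(
        prog="sika",
        description="Build a simple pipeline by a yaml file",
    )
    parser.add_argument("--input", help="put in an input yaml file path")
    # Assumption: no default output path is set here; the real tool may choose one.
    parser.add_argument("--output", help="put a path for your output db")
    # --rerun takes no value, so it is modelled as a boolean switch.
    parser.add_argument(
        "--rerun",
        action="store_true",
        help="rerun the whole pipeline again, delete all data tables in your db file",
    )
    return parser


# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--input", "examples/repominer.yaml", "--rerun"])
```

Here `args.input` holds the yaml path and `args.rerun` is a plain boolean, which matches the way `--rerun` appears without a value in the usage line.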
Running examples
- Making this package's command line tool work:
  python setup.py install
- Running our four examples:
  - Using command line tools:
    - (ETL) Getting Ruby Gem Details Example:
      sika --input examples/repominer.yaml
    - (ETL) Airbnb UK Hostings + UK Crime Data Example:
      sika --input examples/airbnb-uk-crime.yaml
    - (EL) Getting Ruby Gem Lists Example:
      sika --input examples/repominer-el.yaml
    - (EL) Airbnb Japan Hostings Example:
      sika --input examples/airbnb-tokyo.yaml
  - Using python scripts:
    - (ETL) Getting Ruby Gem Details Example:
      python sika/main.py --input examples/repominer.yaml
    - (ETL) Airbnb UK Hostings + UK Crime Data Example:
      python sika/main.py --input examples/airbnb-uk-crime.yaml
    - (EL) Getting Ruby Gem Lists Example:
      python sika/main.py --input examples/repominer-el.yaml
    - (EL) Airbnb Japan Hostings Example:
      python sika/main.py --input examples/airbnb-tokyo.yaml
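If you would rather run all four examples in one go, a small wrapper script can call the command line tool for each yaml file. This is a sketch under a few assumptions: the helper names `build_command` and `run_all` are made up for illustration, `sika` is assumed to be on your PATH, and the script is assumed to run from the root of the cloned repository so the `examples/` paths resolve.

```python
import shutil
import subprocess
from typing import List, Optional

# The four example pipelines listed above.
EXAMPLES = [
    "examples/repominer.yaml",
    "examples/airbnb-uk-crime.yaml",
    "examples/repominer-el.yaml",
    "examples/airbnb-tokyo.yaml",
]


def build_command(yaml_path: str, output: Optional[str] = None, rerun: bool = False) -> List[str]:
    """Assemble a sika command line using only the flags shown in the usage text."""
    cmd = ["sika", "--input", yaml_path]
    if output is not None:
        cmd += ["--output", output]
    if rerun:
        cmd.append("--rerun")
    return cmd


def run_all(examples: List[str] = EXAMPLES) -> None:
    """Run each example in turn and stop at the first failure."""
    if shutil.which("sika") is None:
        raise RuntimeError("sika not found on PATH; install DataSika first")
    for path in examples:
        subprocess.run(build_command(path), check=True)
```

Calling `run_all()` from the repository root runs the examples one after another; `check=True` makes a failing pipeline raise immediately rather than silently moving on to the next one.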
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file DataSika-1.0.3.tar.gz.
File metadata
- Download URL: DataSika-1.0.3.tar.gz
- Upload date:
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 432b5d7df4196c74baa3aa47452dd3891520ba561e2b20e90081e02743437a83 |
| MD5 | 483c6715a1836d0b3481f7b9778e0456 |
| BLAKE2b-256 | e045a0ba4cbc8bb0ba3680d2e0e3c9558b110542e5a1fab620d590c1294c1368 |
File details
Details for the file DataSika-1.0.3-py3.9.egg.
File metadata
- Download URL: DataSika-1.0.3-py3.9.egg
- Upload date:
- Size: 66.5 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aaee81da904ae10be4a018fd283650a5a074f175903c0f30e46d64c87803ecae |
| MD5 | 6aaa7b76588f0e324a08f2722cad9f4f |
| BLAKE2b-256 | b2a1e4303829aa319b35659afabddf67253cd00b30df78b91f31283a7240d598 |
File details
Details for the file DataSika-1.0.3-py3.8.egg.
File metadata
- Download URL: DataSika-1.0.3-py3.8.egg
- Upload date:
- Size: 66.6 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c038f17387c366c6f04d8c99c8b5ac37aeceaf6b1f7f2491de16c1c442a22e85 |
| MD5 | 2e2b741b8341296bdef39dcf3344b32d |
| BLAKE2b-256 | f84976dc832d349826117b28f1318d508606a6299aa5b66d6b681858430b8e33 |
File details
Details for the file DataSika-1.0.3-py3.7.egg.
File metadata
- Download URL: DataSika-1.0.3-py3.7.egg
- Upload date:
- Size: 66.7 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 38928b3419ee24fc44f68e4952268be51f434e1cb2902fc5dadf726d7e973d98 |
| MD5 | 1f21dc9641853bf9c75edf41140e7d36 |
| BLAKE2b-256 | 29c21aa68bdbabb611a7b6debfce731d2f072e31b1b863a3d816247ce5ef5fb2 |
File details
Details for the file DataSika-1.0.3-py3-none-any.whl.
File metadata
- Download URL: DataSika-1.0.3-py3-none-any.whl
- Upload date:
- Size: 31.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60e18e0332ad6a592563a3a05c68893d2fd7e422f783a99b1770bf4f19928195 |
| MD5 | ff3ce0d7375bdc7b22bff5e03f78b4a2 |
| BLAKE2b-256 | 8bf68197ec43f1224540616a820f4f8521a022100a73deb26e4122877ea7a6c7 |
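The SHA256 digests listed for each file can be used to check a download before installing it. The following is a minimal sketch using Python's hashlib; the helper names `sha256_of` and `matches` are made up for illustration, and the file is assumed to be saved in the current directory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("DataSika-1.0.3-py3-none-any.whl", "60e18e0332ad6a592563a3a05c68893d2fd7e422f783a99b1770bf4f19928195")` should return True for an intact copy of the wheel.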