Data Quality Framework provided by Jabar Digital Service
Project description
DataSae
Data Quality Framework provided by Jabar Digital Service
Configuration Files
Checker for Data Quality
[!NOTE]
You can use DataSae's column functions, grouped by data type, to add column checker functions for data quality in the config file.
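For orientation, a config file maps each config name to a data source definition that includes a checker section. The outline below is a hypothetical sketch in YAML; its keys are illustrative assumptions, so refer to DataSae/tests/data/config.yaml in the repository for the actual schema.
# Hypothetical config outline (keys are illustrative assumptions, not the real schema)
test_local:        # config name, as used in config('test_local')
  type: local      # assumed key: the kind of data source
  checker:         # assumed key: the data quality checks to run per column
    ...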
pip install 'DataSae[converter,gsheet,s3,sql]'
Python Code
from datasae.converter import Config
# From JSON
config = Config('DataSae/tests/data/config.json')
# From YAML
config = Config('DataSae/tests/data/config.yaml')
# Check all data qualities on configuration
config.checker # dict result
# Check data quality by config name
config('test_local').checker # list of dict result
config('test_gsheet').checker # list of dict result
config('test_s3').checker # list of dict result
config('test_mariadb_or_mysql').checker # list of dict result
config('test_postgresql').checker # list of dict result
Example results: https://github.com/jabardigitalservice/DataSae/blob/46ef80072b98ca949084b4e1ae50bcf23d07d646/tests/data/checker.json#L1-L432
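Because each per-config result is a list of dicts, you can inspect the checks directly in Python; a minimal sketch:
from datasae.converter import Config

config = Config('DataSae/tests/data/config.yaml')
for check in config('test_local').checker:  # each item is one check's result dict
    print(check)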
Command Line Interface (CLI)
datasae --help
Usage: datasae [OPTIONS] FILE_PATH
Checker command.
Creates checker result based on the configuration provided in the checker section of the data source's configuration file.
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * file_path TEXT The source path of the .json or .yaml file [default: None] [required] │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --config-name TEXT If the config name is not set, it will create all of the checker results [default: None] │
│ --yaml-display --json-display [default: yaml-display] │
│ --save-to-file-path TEXT [default: None] │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Example commands:
datasae DataSae/tests/data/config.yaml # Check all data qualities on configuration
datasae DataSae/tests/data/config.yaml --config-name test_local # Check data quality by config name
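The options shown in the help above can be combined; for example, to display JSON instead of YAML and save the result to a file (the output file name here is only an example):
datasae DataSae/tests/data/config.yaml --json-display --save-to-file-path checker_result.json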
[!TIP] We also have an example of a DataSae implementation in Apache Airflow, but for now it is for internal use only. Internal developers can see it at this git repository.
Converter from Any Data Source to a pandas DataFrame
[!NOTE]
Currently supports converting from CSV, JSON, Parquet, Excel, Google Spreadsheet, and SQL.
pip install 'DataSae[converter]'
Local Computer
from datasae.converter import Config
# From JSON
config = Config('DataSae/tests/data/config.json')
# From YAML
config = Config('DataSae/tests/data/config.yaml')
# Local computer file to DataFrame
local = config('test_local')
df = local('path/file_name.csv', sep=',')  # Default: sep = ','
df = local('path/file_name.json')
df = local('path/file_name.parquet')
df = local('path/file_name.xlsx', sheet_name='Sheet1')  # Default: sheet_name = 'Sheet1'
Google Spreadsheet
pip install 'DataSae[converter,gsheet]'
from datasae.converter import Config
# From JSON
config = Config('DataSae/tests/data/config.json')
# From YAML
config = Config('DataSae/tests/data/config.yaml')
# Google Spreadsheet to DataFrame
gsheet = config('test_gsheet')
df = gsheet('Sheet1')
df = gsheet('Sheet1', 'gsheet_id')
S3
pip install 'DataSae[converter,s3]'
from datasae.converter import Config
# From JSON
config = Config('DataSae/tests/data/config.json')
# From YAML
config = Config('DataSae/tests/data/config.yaml')
# S3 object to DataFrame
s3 = config('test_s3')
df = s3('path/file_name.csv', sep=',')
df = s3('path/file_name.json')
df = s3('path/file_name.parquet')
df = s3('path/file_name.xlsx', sheet_name='Sheet1')
df = s3('path/file_name.csv', 'bucket_name') # Default: sep = ','
df = s3('path/file_name.json', 'bucket_name')
df = s3('path/file_name.parquet', 'bucket_name')
df = s3('path/file_name.xlsx', 'bucket_name') # Default: sheet_name = 'Sheet1'
SQL
pip install 'DataSae[converter,sql]'
[!IMPORTANT]
For macOS users, if pip install fails at mysqlclient, install MySQL with Homebrew and then retry the installation, as shown below.
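On macOS, that is:
brew install mysql  # installs the MySQL client libraries that mysqlclient builds against
pip install 'DataSae[converter,sql]'  # retry the installation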
MariaDB or MySQL
from datasae.converter import Config
# From JSON
config = Config('DataSae/tests/data/config.json')
# From YAML
config = Config('DataSae/tests/data/config.yaml')
# MariaDB or MySQL to DataFrame
mariadb_or_mysql = config('test_mariadb_or_mysql')
df = mariadb_or_mysql('select 1 column_name from schema_name.table_name;')
df = mariadb_or_mysql('path/file_name.sql')
PostgreSQL
from datasae.converter import Config
# From JSON
config = Config('DataSae/tests/data/config.json')
# From YAML
config = Config('DataSae/tests/data/config.yaml')
# PostgreSQL to DataFrame
postgresql = config('test_postgresql')
df = postgresql('select 1 column_name from schema_name.table_name;')
df = postgresql('path/file_name.sql')
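When you pass a .sql file path instead of a query string, the file is presumed to hold the query to run; a hypothetical path/file_name.sql might contain:
select 1 column_name from schema_name.table_name;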
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution: datasae-0.5.2.tar.gz
Built Distribution: DataSae-0.5.2-py3-none-any.whl
File details
Details for the file datasae-0.5.2.tar.gz.
File metadata
- Download URL: datasae-0.5.2.tar.gz
- Upload date:
- Size: 36.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | e9b3f5faed23e1cf3663f9467d58b6622e718f5bfaaaede69d258e1ede88cd56
MD5 | 22ea5ae50379cb812fcf2e6df5f5318c
BLAKE2b-256 | 8059535312823de020a760cbc9d5eab7dcc1fc8e21a2a2b907b9ea987681bddf
File details
Details for the file DataSae-0.5.2-py3-none-any.whl.
File metadata
- Download URL: DataSae-0.5.2-py3-none-any.whl
- Upload date:
- Size: 37.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | ec647de81588cc40c5bdb9aa6375ee880211add3739f7a153cb12befb94c50c4
MD5 | fe7dc71bfd06f3a1045c4f2e2fc2faba
BLAKE2b-256 | 7a22252fb4be144d3be03900affcc2ca9a9f8d456d47dc5e499f19307552c593