Csv Schema Inference
A tool to automatically infer column data types in .csv files
Installing csv-schema-inference 🔧
pip install csv-schema-inference
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting csv-schema-inference
Downloading csv_schema_inference-0.0.2-py3-none-any.whl (5.2 kB)
Installing collected packages: csv-schema-inference
Successfully installed csv-schema-inference-0.0.2
Importing csv-schema-inference library ⚡
from csv_schema_inference import csv_schema_inference
Setting csv-schema-inference configuration ✍
csv_infer = csv_schema_inference.CsvSchemaInference(portion=0.7, max_length=100, seed=2, header=True, sep=",")
pathfile = "/content/data.csv"
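To try the configuration above without an existing dataset, a small sample file can be generated with the standard library. This is only a sketch: the column names mirror some of those in the output shown further down, but the helper `write_sample_csv`, the values, the row count, and the temporary file location are all made up for illustration.

```python
import csv
import os
import random
import tempfile

def write_sample_csv(path, rows=100, seed=2):
    """Write a small CSV with string, date, float and integer columns (hypothetical helper)."""
    rng = random.Random(seed)
    header = ["key_1", "date_2", "cont_3", "disc_5", "disc_6", "cat_7"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for i in range(rows):
            writer.writerow([
                f"k{i}",                                  # string key
                f"2022-01-{rng.randint(1, 28):02d}",      # ISO-formatted date
                round(rng.uniform(0, 1), 4),              # continuous value
                rng.randint(0, 17),                       # discrete value
                rng.choice(["na", rng.randint(0, 17)]),   # discrete value with a missing marker
                rng.choice(["a", "b", "c"]),              # category
            ])
    return header

# Assumed location; any writable path works.
pathfile = os.path.join(tempfile.gettempdir(), "data.csv")
header = write_sample_csv(pathfile)
```

The resulting `pathfile` can then be passed to `run_inference` in place of "/content/data.csv".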
Run inference 🏃
aprox_schema = csv_infer.run_inference(pathfile)
Showing the approximate data type inference for each column 🔍
csv_infer.pretty(aprox_schema)
0
    name
        key_1
    type
        STRING
    nullable
        False
1
    name
        date_2
    type
        DATE
    nullable
        False
2
    name
        cont_3
    type
        FLOAT
    nullable
        False
3
    name
        cont_4
    type
        FLOAT
    nullable
        False
4
    name
        disc_5
    type
        INTEGER
    nullable
        False
5
    name
        disc_6
    type
        INTEGER
    nullable
        True
6
    name
        cat_7
    type
        STRING
    nullable
        False
7
    name
        cat_8
    type
        STRING
    nullable
        False
8
    name
        cont_9
    type
        FLOAT
    nullable
        False
9
    name
        cont_10
    type
        FLOAT
    nullable
        True
Checking schema values for specific columns ✔
result = csv_infer.get_schema_columns(columns = {"disc_6"})
csv_infer.pretty(result)
5
    _name
        disc_6
    values
        na
            cnt
                70755
            _type
                STRING
        14
            cnt
                34732
            _type
                INTEGER
        17
            cnt
                35237
            _type
                INTEGER
        12
            cnt
                35408
            _type
                INTEGER
        10
            cnt
                35174
            _type
                INTEGER
        4
            cnt
                34924
            _type
                INTEGER
        8
            cnt
                34861
            _type
                INTEGER
        7
            cnt
                35270
            _type
                INTEGER
        13
            cnt
                35274
            _type
                INTEGER
        5
            cnt
                35024
            _type
                INTEGER
        0
            cnt
                35325
            _type
                INTEGER
        2
            cnt
                35265
            _type
                INTEGER
        16
            cnt
                35250
            _type
                INTEGER
        6
            cnt
                34961
            _type
                INTEGER
        15
            cnt
                35132
            _type
                INTEGER
        11
            cnt
                35250
            _type
                INTEGER
        3
            cnt
                35063
            _type
                INTEGER
        1
            cnt
                35237
            _type
                INTEGER
        9
            cnt
                35078
            _type
                INTEGER
    nullable
        True
    approximate_type
        INTEGER
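The output above pairs each distinct value with a count and a type label: "na" is counted as STRING and the numeric values as INTEGER. A minimal sketch of how such a per-value summary could be built is shown below. It is not the library's implementation; the functions `classify` and `value_counts`, the order of the checks, and the single date format are assumptions.

```python
from collections import Counter
from datetime import datetime

def classify(value):
    """Return a coarse type label for one raw CSV cell (hypothetical rule set)."""
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")  # only ISO dates are recognised here
        return "DATE"
    except ValueError:
        return "STRING"

def value_counts(cells):
    """Count each distinct cell value and attach its type label."""
    counts = Counter(cells)
    return {v: {"cnt": n, "_type": classify(v)} for v, n in counts.items()}

cells = ["14", "na", "17", "14", "na", "3"]
summary = value_counts(cells)
```

Python's `float` also accepts strings such as "nan" and "inf", so a stricter rule would be needed wherever those should count as missing values or plain strings.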
Explore all possible data types for a specific column ✅
result = csv_infer.explore_schema_column(column = "disc_6")
csv_infer.pretty(result)
5
    name
        disc_6
    types
        STRING
            10.061573902903785
        INTEGER
            89.93842609709621
    nullable
        True
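The percentages appear to be each type's share of the sampled values: the counts listed by `get_schema_columns` for disc_6 reproduce them. The check below only redoes that arithmetic with counts copied from the earlier output; the variable names are illustrative and this is not the library's code.

```python
# Count of the single STRING value ("na") reported for disc_6.
string_count = 70755

# Counts of the eighteen INTEGER values reported for disc_6, in the order listed.
integer_counts = [
    34732, 35237, 35408, 35174, 34924, 34861,
    35270, 35274, 35024, 35325, 35265, 35250,
    34961, 35132, 35250, 35063, 35237, 35078,
]

integer_count = sum(integer_counts)
total = string_count + integer_count

# Share of each type as a percentage of all sampled values.
shares = {
    "STRING": 100 * string_count / total,
    "INTEGER": 100 * integer_count / total,
}
print(total, shares)
```

The STRING share comes out at roughly 10.06 and the INTEGER share at roughly 89.94, which agrees with the reported figures, so INTEGER is the dominant type for the column.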
Contributing and Feedback
Any ideas or feedback about this repository? Help me improve it.
Authors
- Created by Ramses Alexander Coraspe Valdez
- Created in 2022
License
This project is licensed under the terms of the MIT License.