A tool to automatically infer column data types in .csv files
Csv Schema Inference
Installing csv-schema-inference 🔧
pip install csv-schema-inference
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting csv-schema-inference
Downloading csv_schema_inference-0.0.2-py3-none-any.whl (5.2 kB)
Installing collected packages: csv-schema-inference
Successfully installed csv-schema-inference-0.0.2
Importing csv-schema-inference library ⚡
from csv_schema_inference import csv_schema_inference
Setting csv-schema-inference configuration ✍
csv_infer = csv_schema_inference.CsvSchemaInference(portion=0.7, max_length=100, seed=2, header=True, sep=",")
pathfile = "/content/data.csv"
Run inference 🏃
aprox_schema = csv_infer.run_inference(pathfile)
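To give an intuition for what sample-based inference involves, here is a minimal sketch that votes on a type per column over a random portion of the rows. It is not the library's implementation: `guess_type`, `infer_schema`, the ISO-only date format, and the majority-vote rule are all assumptions made for illustration, and only the `portion` and `seed` names are borrowed from the configuration above.

```python
import csv
import io
import random
from collections import Counter
from datetime import datetime


def guess_type(value: str) -> str:
    """Classify one raw CSV cell (hypothetical helper, not the library's code)."""
    for caster, label in ((int, "INTEGER"), (float, "FLOAT")):
        try:
            caster(value)
            return label
        except ValueError:
            pass
    try:
        datetime.strptime(value, "%Y-%m-%d")  # this sketch only knows ISO dates
        return "DATE"
    except ValueError:
        return "STRING"


def infer_schema(text: str, portion: float = 0.7, seed: int = 2) -> dict:
    """Vote on a type per column using a random sample of the rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    sample = rng.sample(rows, max(1, int(len(rows) * portion)))

    votes = {name: Counter() for name in header}
    nullable = {name: False for name in header}
    for row in sample:
        for name, cell in zip(header, row):
            if cell == "":
                nullable[name] = True  # an empty cell marks the column nullable
            else:
                votes[name][guess_type(cell)] += 1

    return {
        i: {
            "name": name,
            # Majority vote; fall back to STRING when every sampled cell was empty.
            "type": votes[name].most_common(1)[0][0] if votes[name] else "STRING",
            "nullable": nullable[name],
        }
        for i, name in enumerate(header)
    }


sample_csv = "key,day,score\na,2022-01-01,1.5\nb,2022-01-02,\nc,2022-01-03,2.5\n"
schema = infer_schema(sample_csv, portion=1.0)
print(schema)
```

A smaller `portion` trades accuracy for speed, which is presumably why the result is described as approximate.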
Showing the approximate data type inference for each column 🔍
csv_infer.pretty(aprox_schema)
index | name | type | nullable
---|---|---|---
0 | key_1 | STRING | False
1 | date_2 | DATE | False
2 | cont_3 | FLOAT | False
3 | cont_4 | FLOAT | False
4 | disc_5 | INTEGER | False
5 | disc_6 | INTEGER | True
6 | cat_7 | STRING | False
7 | cat_8 | STRING | False
8 | cont_9 | FLOAT | False
9 | cont_10 | FLOAT | True
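One possible use of the inferred schema is to build loading options for another tool. The sketch below assumes that `run_inference` returns a dictionary shaped like the printed output (index mapped to `name`, `type`, `nullable`); that shape, the `PANDAS_DTYPES` mapping, and the `to_read_csv_kwargs` helper are illustrative assumptions, not part of the library.

```python
# Three entries copied from the printed schema; the dict shape is an assumption.
schema = {
    0: {"name": "key_1", "type": "STRING", "nullable": False},
    1: {"name": "date_2", "type": "DATE", "nullable": False},
    5: {"name": "disc_6", "type": "INTEGER", "nullable": True},
}

# Hypothetical mapping from inferred type names to pandas dtype names.
PANDAS_DTYPES = {"STRING": "string", "FLOAT": "float64", "INTEGER": "int64"}


def to_read_csv_kwargs(schema: dict) -> dict:
    """Turn an inferred schema into keyword arguments for pandas.read_csv."""
    dtype, parse_dates = {}, []
    for col in schema.values():
        if col["type"] == "DATE":
            parse_dates.append(col["name"])  # dates go through parse_dates
        elif col["type"] == "INTEGER" and col["nullable"]:
            dtype[col["name"]] = "Int64"  # pandas nullable integer dtype
        else:
            dtype[col["name"]] = PANDAS_DTYPES.get(col["type"], "object")
    return {"dtype": dtype, "parse_dates": parse_dates}


kwargs = to_read_csv_kwargs(schema)
print(kwargs)
```

With pandas installed, the result could then be passed along as `pd.read_csv(pathfile, **kwargs)`.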
Checking schema values for specific columns ✔
result = csv_infer.get_schema_columns(columns = {"disc_6"})
csv_infer.pretty(result)
index: 5
_name: disc_6
nullable: True
approximate_type: INTEGER
values:
value | cnt | _type
---|---|---
na | 70755 | STRING
14 | 34732 | INTEGER
17 | 35237 | INTEGER
12 | 35408 | INTEGER
10 | 35174 | INTEGER
4 | 34924 | INTEGER
8 | 34861 | INTEGER
7 | 35270 | INTEGER
13 | 35274 | INTEGER
5 | 35024 | INTEGER
0 | 35325 | INTEGER
2 | 35265 | INTEGER
16 | 35250 | INTEGER
6 | 34961 | INTEGER
15 | 35132 | INTEGER
11 | 35250 | INTEGER
3 | 35063 | INTEGER
1 | 35237 | INTEGER
9 | 35078 | INTEGER
Explore all possible data types for a specific column ✅
result = csv_infer.explore_schema_column(column = "disc_6")
csv_infer.pretty(result)
index: 5
name: disc_6
nullable: True
types:
type | percentage
---|---
STRING | 10.061573902903785
INTEGER | 89.93842609709621
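The percentages reported for `disc_6` appear to be each type's share of the counted values. As a quick check under that assumption, the sketch below recomputes them from the `cnt` figures shown in the `get_schema_columns` output; the variable names are illustrative only.

```python
# Counts copied from the get_schema_columns output for disc_6.
na_count = 70755  # the "na" entry, reported with _type STRING
integer_counts = [
    34732, 35237, 35408, 35174, 34924, 34861, 35270, 35274, 35024,
    35325, 35265, 35250, 34961, 35132, 35250, 35063, 35237, 35078,
]

total = na_count + sum(integer_counts)
shares = {
    "STRING": 100 * na_count / total,
    "INTEGER": 100 * sum(integer_counts) / total,
}
print(total)
print(shares)
```

The recomputed shares agree with the reported values of roughly 10.06% STRING and 89.94% INTEGER, which is consistent with the "na" entries being counted as STRING while the column's approximate type stays INTEGER and `nullable` is True.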
Contributing and Feedback
Any ideas or feedback about this repository? Help me improve it.
Authors
- Created by Ramses Alexander Coraspe Valdez
- Created in 2022
License
This project is licensed under the terms of the MIT License.