A tool to automatically infer column data types in .csv files

Project description

Csv Schema Inference

Installing csv-schema-inference 🔧

pip install csv-schema-inference
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting csv-schema-inference
  Downloading csv_schema_inference-0.0.2-py3-none-any.whl (5.2 kB)
Installing collected packages: csv-schema-inference
Successfully installed csv-schema-inference-0.0.2

Importing csv-schema-inference library

from csv_schema_inference import csv_schema_inference

Setting csv-schema-inference configuration

csv_infer = csv_schema_inference.CsvSchemaInference(portion=0.7, max_length=100, seed=2, header=True, sep=",")
pathfile = "/content/data.csv"
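
To try the steps below without an existing dataset, a small test file can be generated with the standard library. This is only a sketch: the file name `sample_data.csv`, the rows, and the helper `write_sample_csv` are made up for illustration, with column names borrowed from the example output in this README.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; column names mirror those in the example output.
HEADER = ["key_1", "date_2", "cont_3", "disc_5", "disc_6"]
ROWS = [
    ["a1", "2021-01-05", "3.14", "7", "14"],
    ["b2", "2021-02-11", "2.72", "3", "na"],  # "na" marks a missing value
    ["c3", "2021-03-23", "1.41", "9", "17"],
]


def write_sample_csv(path, sep=","):
    """Write the sample rows to `path` using `sep` as the delimiter."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter=sep)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


sample_path = write_sample_csv(os.path.join(tempfile.gettempdir(), "sample_data.csv"))
print(sample_path)
```

The resulting path can then be used as `pathfile` in place of `/content/data.csv`.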

Run inference 🏃

approx_schema = csv_infer.run_inference(pathfile)

Show the inferred data type for each column 🔍

csv_infer.pretty(approx_schema)
0
	name
		key_1
	type
		STRING
	nullable
		False
1
	name
		date_2
	type
		DATE
	nullable
		False
2
	name
		cont_3
	type
		FLOAT
	nullable
		False
3
	name
		cont_4
	type
		FLOAT
	nullable
		False
4
	name
		disc_5
	type
		INTEGER
	nullable
		False
5
	name
		disc_6
	type
		INTEGER
	nullable
		True
6
	name
		cat_7
	type
		STRING
	nullable
		False
7
	name
		cat_8
	type
		STRING
	nullable
		False
8
	name
		cont_9
	type
		FLOAT
	nullable
		False
9
	name
		cont_10
	type
		FLOAT
	nullable
		True
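
One possible use of the inferred schema is to generate a table definition from it. The sketch below assumes, based only on the printed output above, that the result of `run_inference` is a dictionary keyed by column index with `name`, `type`, and `nullable` entries; that shape is not confirmed here. The helper `schema_to_ddl`, the `SQL_TYPES` mapping, and the table name are hypothetical.

```python
# Assumed shape of the inference result (inferred from the printed output,
# not confirmed): {column_index: {"name": str, "type": str, "nullable": bool}}
example_schema = {
    0: {"name": "key_1", "type": "STRING", "nullable": False},
    1: {"name": "date_2", "type": "DATE", "nullable": False},
    2: {"name": "cont_3", "type": "FLOAT", "nullable": False},
    5: {"name": "disc_6", "type": "INTEGER", "nullable": True},
}

# Illustrative mapping from the inferred type labels to generic SQL types.
SQL_TYPES = {
    "STRING": "TEXT",
    "DATE": "DATE",
    "FLOAT": "DOUBLE PRECISION",
    "INTEGER": "BIGINT",
}


def schema_to_ddl(schema, table_name):
    """Build a CREATE TABLE statement from a schema in the assumed shape."""
    lines = []
    for index in sorted(schema):
        col = schema[index]
        sql_type = SQL_TYPES.get(col["type"], "TEXT")  # unknown labels fall back to TEXT
        null_clause = "" if col["nullable"] else " NOT NULL"
        lines.append(f"    {col['name']} {sql_type}{null_clause}")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(lines) + "\n);"


ddl = schema_to_ddl(example_schema, "data")
print(ddl)
```

Nullable columns simply omit the `NOT NULL` constraint, which matches how `disc_6` is reported above.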

Checking schema values for specific columns

result = csv_infer.get_schema_columns(columns = {"disc_6"})
csv_infer.pretty(result)
5
	_name
		disc_6
	values
		na
			cnt
				70755
			_type
				STRING
		14
			cnt
				34732
			_type
				INTEGER
		17
			cnt
				35237
			_type
				INTEGER
		12
			cnt
				35408
			_type
				INTEGER
		10
			cnt
				35174
			_type
				INTEGER
		4
			cnt
				34924
			_type
				INTEGER
		8
			cnt
				34861
			_type
				INTEGER
		7
			cnt
				35270
			_type
				INTEGER
		13
			cnt
				35274
			_type
				INTEGER
		5
			cnt
				35024
			_type
				INTEGER
		0
			cnt
				35325
			_type
				INTEGER
		2
			cnt
				35265
			_type
				INTEGER
		16
			cnt
				35250
			_type
				INTEGER
		6
			cnt
				34961
			_type
				INTEGER
		15
			cnt
				35132
			_type
				INTEGER
		11
			cnt
				35250
			_type
				INTEGER
		3
			cnt
				35063
			_type
				INTEGER
		1
			cnt
				35237
			_type
				INTEGER
		9
			cnt
				35078
			_type
				INTEGER
	nullable
		True
	approximate_type
		INTEGER
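
The listing above pairs each distinct value with a count and a per-value type. A structure like this could be produced by classifying each raw cell and tallying the results. The following is a simplified sketch of that idea, not the library's actual implementation: `guess_type`, `summarize`, the rule order, and the date format are all assumptions.

```python
from collections import Counter
from datetime import datetime


def guess_type(value):
    """Classify one raw CSV cell using a simplified, hypothetical rule set."""
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")  # assumed date format
        return "DATE"
    except ValueError:
        return "STRING"


def summarize(values):
    """Count each distinct value and attach its guessed type."""
    counts = Counter(values)
    return {v: {"cnt": n, "_type": guess_type(v)} for v, n in counts.items()}


summary = summarize(["14", "17", "na", "14", "12", "na", "na"])
for value, info in summary.items():
    print(value, info)
```

Under these rules a placeholder such as `na` is classified as STRING, which is consistent with how it appears in the listing for `disc_6`.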

Explore all possible data types for a specific column

result = csv_infer.explore_schema_column(column = "disc_6")
csv_infer.pretty(result)
5
	name
		disc_6
	types
		STRING
			10.061573902903785
		INTEGER
			89.93842609709621
	nullable
		True
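
The percentages above are consistent with each type's share of the counted values for `disc_6`: the `na` entries (typed STRING) against all INTEGER entries. The arithmetic can be checked with the counts from the earlier `values` listing. This is an observation about the numbers, not a statement about how the library computes them internally.

```python
# Counts copied from the `values` listing for disc_6.
na_count = 70755  # "na" entries, typed STRING
integer_counts = [
    34732, 35237, 35408, 35174, 34924, 34861,
    35270, 35274, 35024, 35325, 35265, 35250,
    34961, 35132, 35250, 35063, 35237, 35078,
]

integer_total = sum(integer_counts)
total = na_count + integer_total

# Share of each type as a percentage of all counted values.
shares = {
    "STRING": 100 * na_count / total,
    "INTEGER": 100 * integer_total / total,
}

for type_name, share in shares.items():
    print(f"{type_name}: {share:.2f}%")
```

The rounded shares come out at 10.06% for STRING and 89.94% for INTEGER, in line with the reported values.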

Contributing and Feedback

Any ideas or feedback about this repository? Help me improve it.

Authors

License

This project is licensed under the terms of the MIT License.

Download files

Source Distribution

csv-schema-inference-0.0.4.tar.gz (41.3 kB)

Built Distribution

csv_schema_inference-0.0.4-py3-none-any.whl (6.8 kB, Python 3)
