Automatically walks through folders and subfolders, finds all CSV and XLSX files, detects and fixes data issues, and saves the results as Parquet files while keeping the exact same folder structure.

Project description

deepcsv

Stop losing your data types when working with CSV files.
deepcsv automatically cleans messy CSV/XLSX data and converts it into ML-ready Parquet format.

Installation

pip install deepcsv

Example

Before

# CSV column value
"['a', 'b', 'c']"

After

# Automatically converted
['a', 'b', 'c']
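
The conversion shown above can be illustrated with a minimal sketch. It assumes the list strings are valid Python literals and parses them with the standard library's `ast.literal_eval`; the helper name `parse_list_string` is hypothetical and is not part of deepcsv, whose internal implementation may differ.

```python
import ast


def parse_list_string(value):
    """Return a list if `value` is a string holding a list literal, else `value` unchanged.

    Hypothetical helper for illustration only; not deepcsv's API.
    """
    if not isinstance(value, str):
        return value
    text = value.strip()
    # Cheap pre-check so ordinary strings are never handed to the parser.
    if not (text.startswith("[") and text.endswith("]")):
        return value
    try:
        parsed = ast.literal_eval(text)  # parses literals only, never executes code
    except (ValueError, SyntaxError):
        return value  # looks like a list but is not a valid literal
    return parsed if isinstance(parsed, list) else value


print(parse_list_string("['a', 'b', 'c']"))  # ['a', 'b', 'c']
```

Values that are not list strings pass through unchanged, so the helper can be applied to every cell of a column.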

Usage

import deepcsv

df = deepcsv.ConvertListStrToList("path/to/file.csv")

What it does

Walks through all folders and subfolders automatically
Finds every CSV and XLSX file
Detects columns that contain list strings like "['item1', 'item2']" and converts them into real Python arrays for faster performance
Detects columns with mixed data types and tries to fix them automatically
Warns you when a column has mixed types so you know what was changed
Saves the results as Parquet files to preserve the converted data types
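
As a rough illustration of the detection step, the sketch below flags columns whose non-empty values all parse as list literals. It uses only the standard library and represents rows as dictionaries; the function names and the "every non-empty value must be a list string" rule are assumptions made for this example, not deepcsv's documented behavior.

```python
import ast


def is_list_string(value):
    """True if `value` is a string that parses as a Python list literal."""
    if not isinstance(value, str):
        return False
    text = value.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return False
    try:
        return isinstance(ast.literal_eval(text), list)
    except (ValueError, SyntaxError):
        return False


def find_list_string_columns(rows):
    """Return the column names where every non-empty value is a list string.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    Hypothetical helper; the detection rule is an assumption.
    """
    flags_by_column = {}
    for row in rows:
        for name, value in row.items():
            if value is None or value == "":
                continue  # empty cells do not count for or against a column
            flags_by_column.setdefault(name, []).append(is_list_string(value))
    return [name for name, flags in flags_by_column.items() if all(flags)]


rows = [
    {"id": "1", "tags": "['a', 'b']"},
    {"id": "2", "tags": "['c']"},
    {"id": "3", "tags": ""},
]
print(find_list_string_columns(rows))  # ['tags']
```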

Why Parquet?

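CSV stores every value as plain text, so arrays and column data types are lost when the file is written.
Parquet stores a typed schema alongside the data, so the converted types survive a save and reload, and its columnar layout is generally faster to read in data processing workflows.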
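
The type loss in CSV is easy to reproduce with the standard library alone. The sketch below writes one row with an integer, a float, and a list, then reads it back; the row contents are made up for the example.

```python
import csv
import io

# One row with three different Python types (example data).
row = {"id": 1, "score": 0.5, "tags": ["a", "b", "c"]}

# Write the row to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

# Read it back: every value comes back as a string.
buffer.seek(0)
restored = next(csv.DictReader(buffer))

for name, value in restored.items():
    print(name, type(value).__name__, repr(value))
```

After the round trip, `restored["tags"]` is the string `"['a', 'b', 'c']"` rather than a list, which is the situation deepcsv is meant to repair.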

Why arrays instead of Python lists?

Arrays are typically faster than Python lists for numerical operations and machine learning workflows, especially on large datasets.

Functions

`ConvertListStrToList(file_path)`

Reads a CSV file, converts list strings to arrays, fixes mixed-type columns, and returns a clean DataFrame.

import deepcsv

df = deepcsv.ConvertListStrToList("path/to/file.csv")

`ReadAllCSVData(path)`

Walks through all folders and subfolders, applies `ConvertListStrToList` to every CSV and XLSX file, and saves the results as Parquet files in a new folder called `All CSV Data is Converted Here`.

import deepcsv

deepcsv.ReadAllCSVData("path/to/folder")
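
To make the "same folder structure" idea concrete, here is a minimal sketch of how source files could be mapped to mirrored Parquet paths using `pathlib`. It only plans the paths and does not read or write any data. The function name `plan_conversions` is hypothetical, and placing the output folder inside the input folder is an assumption; deepcsv may place it elsewhere.

```python
from pathlib import Path

# Folder name taken from the description above; its location is an assumption.
OUTPUT_FOLDER_NAME = "All CSV Data is Converted Here"
SUPPORTED_SUFFIXES = {".csv", ".xlsx"}


def plan_conversions(root):
    """Return (source, destination) path pairs that mirror the layout under `root`.

    Hypothetical helper for illustration; it does not convert anything.
    """
    root = Path(root)
    output_root = root / OUTPUT_FOLDER_NAME
    plan = []
    for source in sorted(root.rglob("*")):
        if not source.is_file():
            continue
        if source.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        if output_root in source.parents:
            continue  # skip anything already inside the output folder
        relative = source.relative_to(root)
        destination = (output_root / relative).with_suffix(".parquet")
        plan.append((source, destination))
    return plan
```

Under these assumptions, a file at `path/to/folder/2024/sales.csv` would map to `path/to/folder/All CSV Data is Converted Here/2024/sales.parquet`.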

Notes

Only files that contain list string columns are saved as Parquet
Mixed-type columns are converted to float automatically when possible
Skips NaN values without breaking
Requires pyarrow for Parquet support
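
The notes on mixed-type columns and NaN handling can be sketched as follows. This is an illustrative standard-library version that works on a plain list of values; the helper names, the warning text, and the rule "leave the column untouched if any value is not numeric" are assumptions, not deepcsv's actual logic.

```python
import math
import warnings


def is_missing(value):
    """True for None, empty strings, and float NaN."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def fix_mixed_column(name, values):
    """Convert a mixed-type column to float when every present value allows it.

    Hypothetical helper for illustration only.
    """
    kinds = {type(v).__name__ for v in values if not is_missing(v)}
    if len(kinds) <= 1:
        return values  # a single type: nothing to fix
    warnings.warn(f"column {name!r} has mixed types: {sorted(kinds)}")
    converted = []
    for v in values:
        if is_missing(v):
            converted.append(math.nan)  # keep missing values as NaN
            continue
        try:
            converted.append(float(v))
        except (TypeError, ValueError):
            return values  # at least one value is not numeric: leave the column as-is
    return converted


print(fix_mixed_column("price", [1, "2.5", None, 3.0]))  # [1.0, 2.5, nan, 3.0]
```

Returning the original values when conversion fails keeps the operation safe to run on every column, at the cost of leaving genuinely mixed columns unresolved.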

Requirements

Python >= 3.7
pandas
pyarrow

Project details

Release history

0.7.0: May 1, 2026
0.7.0b2 (pre-release): Apr 27, 2026
0.7.0b1 (pre-release): Apr 24, 2026
0.6.9: Apr 15, 2026
0.6.9b2 (pre-release): Apr 13, 2026
0.6.9b1 (pre-release): Apr 12, 2026
0.6.8: Apr 5, 2026
0.6.7: Apr 5, 2026
0.6.6: Apr 5, 2026
0.6.5: Apr 5, 2026
0.6.4: Apr 5, 2026
0.6.3: Apr 4, 2026
0.6.3b1 (pre-release): Mar 29, 2026
0.6.2: Mar 28, 2026
0.6.2b2 (pre-release): Mar 27, 2026
0.6.2b1 (pre-release): Mar 27, 2026
0.6.1: Mar 26, 2026
0.6.0: Mar 26, 2026
0.5.0: Mar 25, 2026
0.5.0b1 (pre-release): Mar 25, 2026
0.4.0 (this version): Mar 24, 2026
0.3.0: Mar 23, 2026
0.2.0: Mar 22, 2026
0.1.0: Mar 21, 2026

Download files

Source Distribution

deepcsv-0.4.0.tar.gz (4.1 kB view details)

Uploaded Mar 24, 2026 Source

Built Distribution

deepcsv-0.4.0-py3-none-any.whl (4.4 kB view details)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file deepcsv-0.4.0.tar.gz.

File metadata

Download URL: deepcsv-0.4.0.tar.gz
Upload date: Mar 24, 2026
Size: 4.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for deepcsv-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`30d19114ed3dc17216bf37a3b183f8e2c19ed81facbe9fd52a4b3a40ea409ab6`
MD5	`5f2aee1a2a8644fb2bc3a26d4ea687a6`
BLAKE2b-256	`a7c034eba24a1d33bdad4eb087ef6792c797de5178e7c2aedf2cef6f684a7e7c`

File details

Details for the file deepcsv-0.4.0-py3-none-any.whl.

File metadata

Download URL: deepcsv-0.4.0-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 4.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for deepcsv-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7452b6316a402bd35c0f9f5c46641f21c8aceccbe709a6524314ada907e8528f`
MD5	`99e79d52a639e98e994b7d6f62f5fb9c`
BLAKE2b-256	`8a3039318812eea38a0df236ef51840a2b175892a2cee4545abfe00e267cf791`
