Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files.

Project description

deepcsv

Ever loaded a CSV file and found your carefully structured lists turned into useless strings?

"['Action', 'Sci-Fi', 'Thriller']"  # This is a string, not a list

deepcsv fixes this automatically.


The Solution

deepcsv handles these cases automatically:

  • Reads CSV/XLSX files or existing DataFrames
  • Converts string values that look like lists (anything beginning with "[") into real NumPy arrays (fast and lightweight)
  • Detects and fixes mixed-type columns by safely converting them to numeric
  • Recursively processes all CSV/XLSX files in each directory
  • Saves results as Parquet format to preserve types and speed up analysis
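The package's internals aren't shown here, but the core conversion idea can be sketched in a few lines with `ast.literal_eval` and NumPy (`parse_list_strings` is a hypothetical helper name, not part of deepcsv's API):

```python
import ast

import numpy as np
import pandas as pd

def parse_list_strings(series: pd.Series) -> pd.Series:
    """Turn string-encoded lists like "['a', 'b']" into NumPy arrays."""
    def convert(value):
        if isinstance(value, str) and value.strip().startswith("["):
            try:
                return np.array(ast.literal_eval(value))
            except (ValueError, SyntaxError):
                return value  # leave unparseable strings untouched
        return value
    return series.map(convert)

df = pd.DataFrame({"genres": ["['Action', 'Sci-Fi']", "['Drama']"]})
df["genres"] = parse_list_strings(df["genres"])
```

After the call, each cell holds a real `numpy.ndarray` instead of a string, so element access and vectorized operations work as expected.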

Installation

pip install deepcsv

Usage

Single file processing (process_file)

import deepcsv

df = deepcsv.process_file('path/to/file.csv')
  • Accepts str (file path) or pd.DataFrame
  • Returns pd.DataFrame with columns converted to arrays

Batch directory processing (process_all_files)

import deepcsv

deepcsv.process_all_files('path/to/folder')
  • Processes all .csv and .xlsx files recursively
  • Saves converted files as Parquet in a folder named "All CSV Files is Converted Here"
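The recursive discovery step can be sketched with `pathlib` (the function name and return type here are assumptions for illustration, not deepcsv's implementation):

```python
from pathlib import Path

def find_data_files(directory: str) -> list:
    """Recursively collect every .csv and .xlsx file under `directory`."""
    root = Path(directory)
    return sorted(p for p in root.rglob("*")
                  if p.suffix.lower() in {".csv", ".xlsx"})
```

Each discovered file would then be read, converted, and written out as Parquet into the output folder.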

Utilities

read_any(file_path)

Reads any supported file and returns a pandas DataFrame. No need to manually pick the reader.

from deepcsv import read_any

df = read_any('data/users.csv')
df = read_any('reports/sales.xlsx')
df = read_any('warehouse/orders.parquet')

Supported formats: .csv, .txt, .tsv, .xls, .xlsx, .json, .parquet, .pkl, .feather, .db, .sqlite
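Conceptually, this is extension-based dispatch to the matching pandas reader. A minimal sketch (database formats omitted; `read_any_sketch` and its reader table are assumptions, not deepcsv's code):

```python
from pathlib import Path

import pandas as pd

# Map file extensions to the pandas reader that handles them.
_READERS = {
    ".csv": pd.read_csv,
    ".txt": pd.read_csv,
    ".tsv": lambda p: pd.read_csv(p, sep="\t"),
    ".xls": pd.read_excel,
    ".xlsx": pd.read_excel,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
    ".pkl": pd.read_pickle,
    ".feather": pd.read_feather,
}

def read_any_sketch(file_path: str) -> pd.DataFrame:
    suffix = Path(file_path).suffix.lower()
    try:
        reader = _READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix}")
    return reader(file_path)
```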


clean_values(data_input, ...)

Cleans a DataFrame by removing nulls from specific columns or rows, or dropping rows by index.

from deepcsv import clean_values

# Drop the listed columns if they are entirely null (default ax_0=False)
df = clean_values('data.csv', cols=['age', 'salary'])

# Drop rows that have nulls in specific cols
df = clean_values('data.csv', cols=['age', 'salary'], ax_0=True)

# Drop rows by index
df = clean_values(df, index=[0, 5, 12])

# Apply on all columns except some
df = clean_values('data.csv', all_cols_except=['id', 'name'])

Parameters:

Parameter        Type             Default   Description
data_input       str | DataFrame  required  File path or DataFrame
cols             list             None      Columns to apply on
ax_0             bool             False     If True, drop rows with nulls; if False, drop fully-null columns
index            list             None      Row indexes to drop
all_cols_except  list             None      Apply to all columns except these
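The parameter interactions above can be made concrete with a small re-implementation of the documented behaviour (a sketch under the assumption that the table describes the full logic; `clean_values_sketch` is not deepcsv's function):

```python
import pandas as pd

def clean_values_sketch(df, cols=None, ax_0=False, index=None,
                        all_cols_except=None):
    # Dropping rows by index takes a plain index list.
    if index is not None:
        return df.drop(index=index)
    # all_cols_except inverts the column selection.
    if all_cols_except is not None:
        cols = [c for c in df.columns if c not in all_cols_except]
    if cols is None:
        cols = list(df.columns)
    if ax_0:
        # ax_0=True: drop rows with nulls in the selected columns.
        return df.dropna(subset=cols)
    # ax_0=False: drop selected columns that are entirely null.
    empty = [c for c in cols if df[c].isna().all()]
    return df.drop(columns=empty)
```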

What it does

  • Auto-detects files in directory and subdirectories
  • Converts values like:
    • "['item1', 'item2']" → array(['item1', 'item2']) (NumPy array)
    • Mixed numeric/string columns → single numeric type (float)
  • Handles NaN values without breaking
  • Stores results in Parquet format for type safety and performance
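deepcsv's exact coercion logic isn't documented here, but pandas' own `to_numeric` illustrates the mixed-type fix described above:

```python
import pandas as pd

# A column that mixes ints, floats, and numeric strings ends up as dtype object.
s = pd.Series([1, "2", 3.5, "oops"])

# errors="coerce" converts what it can and turns the rest into NaN,
# giving the whole column a single float dtype.
fixed = pd.to_numeric(s, errors="coerce")
```

The NaN introduced for unconvertible values is why a numeric column produced this way is float rather than int.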

Function Signatures

  • process_file(data_input: Union[str, pd.DataFrame]) -> pd.DataFrame
  • process_all_files(directory_path: str) -> None
  • read_any(file_path: str) -> pd.DataFrame
  • clean_values(data_input, cols=None, ax_0=False, index=None, all_cols_except=None) -> pd.DataFrame

Output arrays are NumPy arrays for optimal performance in machine learning workflows.


Key Features

  • Fast NumPy array conversion instead of slow Python lists
  • Mixed-type detection with automatic fixes
  • Parquet storage for data integrity
  • Recursive directory traversal
  • Warning messages for transparency
  • Built-in file reader supporting 10+ formats (read_any)
  • Flexible null/index cleaning (clean_values)

Notes

  • Requires pyarrow for Parquet support
  • Only saves files that contain converted array columns

Requirements

  • Python >= 3.7
  • pandas
  • pyarrow

Changelog


Added

  • finding_value parameter in clean_values(data_input, finding_value=...): finds and removes rows that contain this specific value
  • finding_type parameter in clean_values(data_input, finding_type=...): finds and removes rows whose values are of this specific type (e.g. str, int)
  • condition parameter in clean_values(data_input, condition=[operator, value]), e.g. ['>=', 500]: applied only together with finding_value or finding_type
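One plausible reading of the [operator, value] condition form, sketched with the `operator` module (the helper name and exact semantics are assumptions, not deepcsv's API):

```python
import operator

import pandas as pd

# Map the documented operator strings to their Python equivalents.
_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
        "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def drop_rows_matching(df: pd.DataFrame, column: str,
                       condition: list) -> pd.DataFrame:
    op_symbol, value = condition          # e.g. ['>=', 500]
    mask = _OPS[op_symbol](df[column], value)
    return df[~mask]                      # keep rows that do NOT match
```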

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deepcsv-0.6.2b1.tar.gz (8.6 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

deepcsv-0.6.2b1-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file deepcsv-0.6.2b1.tar.gz.

File metadata

  • Download URL: deepcsv-0.6.2b1.tar.gz
  • Upload date:
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for deepcsv-0.6.2b1.tar.gz
Algorithm Hash digest
SHA256 deaa6697def36092027e5789585ce11a062db07183d0777e2437a2b5bc730be5
MD5 c7feeb64dc25b76c9f6024da3c015db1
BLAKE2b-256 3078dcef40ff424a84f9e785d39a5f46954ef9a5f8507de41b27367ac5aabe40

See more details on using hashes here.

File details

Details for the file deepcsv-0.6.2b1-py3-none-any.whl.

File metadata

  • Download URL: deepcsv-0.6.2b1-py3-none-any.whl
  • Upload date:
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for deepcsv-0.6.2b1-py3-none-any.whl
Algorithm Hash digest
SHA256 c92bbf75bffb24cbc69d640768951d872f09e18630cc6d742c85505bc79adff8
MD5 969089096d5b67578848f26d6a9447a8
BLAKE2b-256 ba74d4611befde3db4029bb24859f05dba4878f064636fe251704cd08e36c30a

See more details on using hashes here.
