Automatically processes data files in directories, converts array-like strings to NumPy arrays, detects and fixes data type issues, and saves results as optimized Parquet files.

deepcsv

Ever loaded a CSV file and found your carefully structured lists turned into useless strings?

"['Action', 'Sci-Fi', 'Thriller']"  # This is a string, not a list

deepcsv fixes this automatically.
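The problem is easy to reproduce. The following minimal sketch uses only the Python standard library (not deepcsv) to write a list to CSV and read it back. The `ast.literal_eval` step is one common manual fix, shown purely for illustration; it is an assumption and not necessarily how deepcsv works internally.

```python
import ast
import csv
import io

# Write one row whose second cell is a Python list.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "genres"])
writer.writerow(["Example Movie", ["Action", "Sci-Fi", "Thriller"]])

# Read the row back: CSV has no list type, so the cell is now a string.
buffer.seek(0)
rows = list(csv.reader(buffer))
raw_genres = rows[1][1]
print(type(raw_genres).__name__)  # str

# Manual fix: safely evaluate the string as a Python literal.
genres = ast.literal_eval(raw_genres)
print(genres)
```

Doing this by hand for every list-like column in every file is the chore the library aims to remove.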


The Solution

deepcsv handles these cases automatically:

  • Reads CSV/XLSX files or existing DataFrames
  • Converts any string value that looks like a list (starts with "[") into a real NumPy array (fast and lightweight)
  • Detects and fixes mixed-type columns by safely converting them to numeric
  • Recursively processes all CSV/XLSX files in a directory and its subdirectories
  • Saves results in Parquet format to preserve types and speed up analysis
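The list-string conversion described above can be sketched roughly as follows. This is an illustrative approximation using `ast.literal_eval` and NumPy, not deepcsv's actual code; the helper name `parse_list_string` is hypothetical.

```python
import ast

import numpy as np
import pandas as pd


def parse_list_string(value):
    """Return a NumPy array for strings like "['a', 'b']"; pass other values through."""
    if isinstance(value, str) and value.strip().startswith("["):
        try:
            return np.array(ast.literal_eval(value.strip()))
        except (ValueError, SyntaxError):
            return value  # not a valid literal, leave untouched
    return value


df = pd.DataFrame({"genres": ["['Action', 'Sci-Fi']", "['Drama']", None]})
df["genres"] = df["genres"].apply(parse_list_string)
print(type(df.loc[0, "genres"]).__name__)  # ndarray
```

Values that do not start with "[" or that fail to parse are returned unchanged, so missing entries and ordinary strings survive the pass.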

Installation

pip install deepcsv

Usage

Single file processing (process_file)

import deepcsv

df = deepcsv.process_file('path/to/file.csv')
  • Accepts str (file path) or pd.DataFrame
  • Returns pd.DataFrame with list-like string columns converted to NumPy arrays

Batch directory processing (process_all_files)

import deepcsv

deepcsv.process_all_files('path/to/folder')
  • Processes all .csv and .xlsx files recursively
  • Saves converted files as Parquet in a folder named "All CSV Files is Converted Here"
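Recursive discovery of this kind can be approximated with `pathlib`. The sketch below only illustrates the idea; the helper `find_tabular_files` is hypothetical and is not part of deepcsv's API.

```python
import tempfile
from pathlib import Path


def find_tabular_files(root):
    """Return all .csv and .xlsx files under root, including subdirectories."""
    extensions = {".csv", ".xlsx"}
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "a.csv").write_text("x\n1\n")
    (root / "nested" / "b.xlsx").write_bytes(b"")
    (root / "nested" / "notes.txt").write_text("ignored")
    found = [p.relative_to(root).as_posix() for p in find_tabular_files(root)]

print(found)  # ['a.csv', 'nested/b.xlsx']
```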

What it does

  • Auto-detects files in directory and subdirectories
  • Converts values like:
    • "['item1', 'item2']" → array(['item1', 'item2']) (NumPy array)
    • Mixed numeric/string columns → single numeric type (float)
  • Handles NaN values without breaking
  • Stores results in Parquet format for type safety and performance
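The mixed-type fix can be pictured with pandas' own `pd.to_numeric`. This is a sketch under the assumption that unparseable values are coerced to NaN; the source does not state exactly how deepcsv treats them.

```python
import pandas as pd

# A column where numbers arrive as a mix of ints, floats, and numeric strings.
mixed = pd.Series([1, "2", 3.5, None, "4"], dtype=object)

# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric = pd.to_numeric(mixed, errors="coerce")
print(numeric.dtype)  # float64
```

The missing entry stays NaN, which is why the resulting column is float rather than int.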

Function Signatures

  • process_file(data_input: Union[str, pd.DataFrame]) -> pd.DataFrame
  • process_all_files(directory_path: str) -> None

Output arrays are NumPy arrays for optimal performance in machine learning workflows.


Key Features

  • Fast NumPy array conversion instead of slow Python lists
  • Mixed-type detection with automatic fixes
  • Parquet storage for data integrity
  • Recursive directory traversal
  • Warning messages for transparency

Notes

  • Requires pyarrow for Parquet support
  • Only saves files that contain converted array columns
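A check along the following lines could decide whether a DataFrame contains any converted array columns. It is a hypothetical illustration (`has_array_columns` is not a deepcsv function) of what "contains converted array columns" might mean in practice.

```python
import numpy as np
import pandas as pd


def has_array_columns(df):
    """Return True if any column holds at least one NumPy array cell."""
    return any(
        df[col].map(lambda v: isinstance(v, np.ndarray)).any()
        for col in df.columns
    )


with_arrays = pd.DataFrame(
    {"id": [1, 2], "tags": [np.array(["a", "b"]), np.array(["c"])]}
)
plain = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})

print(has_array_columns(with_arrays))
print(has_array_columns(plain))
```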

Requirements

  • Python >= 3.7
  • pandas
  • pyarrow

Changelog


Added

  • Users can now customize the output folder name in process_all_files

Download files

Source Distribution

deepcsv-0.5.0b1.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

deepcsv-0.5.0b1-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file deepcsv-0.5.0b1.tar.gz.

File metadata

  • Download URL: deepcsv-0.5.0b1.tar.gz
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for deepcsv-0.5.0b1.tar.gz

  • SHA256: 32531d57d9e1fe7eaf8456c8905d0eace4f88a8d768a7b47875c1c6dbea4ab2d
  • MD5: d41f3ab9ccc703d04c35cc69a7a4223d
  • BLAKE2b-256: 17af3b90eee8b09e691e06d1d73860e4e9f36667aef3c3876ce2c7033280d03e

File details

Details for the file deepcsv-0.5.0b1-py3-none-any.whl.

File metadata

  • Download URL: deepcsv-0.5.0b1-py3-none-any.whl
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for deepcsv-0.5.0b1-py3-none-any.whl

  • SHA256: a010c08371c0b62e048d05fa288aa0407e27774574d78bdde3ec2931518f8ff9
  • MD5: aa338667b794575138e3149f4ad95eaa
  • BLAKE2b-256: 7d23a4a8de89d99728c064150b9ad53ec1482cec93f6686c0f768526f116b2c4
