deepcsv

A Python library that automatically walks through folders and subfolders, finds all CSV and XLSX files, detects and fixes data issues, and saves the results as Parquet files while keeping the exact same folder structure.

Installation

pip install deepcsv

What it does

  • Walks through all folders and subfolders automatically
  • Finds every CSV and XLSX file
  • Detects columns that contain list strings such as "['item1', 'item2']" and converts them into real arrays for faster processing
  • Detects columns with mixed data types and tries to fix them automatically
  • Warns you when a column has mixed types so you know what was changed
  • Saves the results as Parquet files to preserve the converted data types
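
The list-string detection described above can be sketched with the standard library alone. This is an illustration, not deepcsv's actual implementation: the helper names `parse_list_string` and `is_list_string_column` are hypothetical, and the sketch produces plain Python lists rather than arrays.

```python
import ast
import math


def parse_list_string(value):
    """Return a list if `value` is a string holding a list literal, else return it unchanged."""
    if not isinstance(value, str):
        return value  # NaN, numbers, and other non-strings pass through
    text = value.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return value
    try:
        # literal_eval only accepts Python literals, so it never runs arbitrary code
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return value  # looks like a list but is not a valid literal
    return parsed if isinstance(parsed, list) else value


def is_list_string_column(values):
    """True if the column has at least one non-NaN value and all of them parse to lists."""
    present = [v for v in values if not (isinstance(v, float) and math.isnan(v))]
    return bool(present) and all(isinstance(parse_list_string(v), list) for v in present)


column = ["['item1', 'item2']", float("nan"), "[]"]
converted = [parse_list_string(v) for v in column]
```

Here `converted[0]` is the list `['item1', 'item2']`, while the NaN entry is passed through untouched.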

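Why Parquet? CSV is a plain-text format: it cannot store arrays, and it does not record column data types, so every value has to be parsed again on load. Parquet stores a typed schema alongside the data, so converted columns keep their exact types.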
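
To see the CSV limitation concretely, here is a small round trip using only the standard `csv` module (not deepcsv itself). pandas would infer the numeric types again on load, but a list cell still comes back as a string.

```python
import csv
import io

rows = [{"id": 1, "tags": ["item1", "item2"], "score": 0.5}]

# Write the row to an in-memory CSV file
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["id", "tags", "score"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every cell is now plain text
buffer.seek(0)
restored = list(csv.DictReader(buffer))[0]

# restored == {"id": "1", "tags": "['item1', 'item2']", "score": "0.5"}
```

The list has become the list string `"['item1', 'item2']"`, which is exactly the kind of value the conversion step has to parse back.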

Why arrays instead of Python lists? Typed arrays are generally much faster than Python lists for numerical operations, which matters in machine learning workflows.

Functions

ConvertListStrToList(file_path)

Reads a CSV file, converts list strings to arrays, fixes mixed-type columns, and returns a clean DataFrame.

import deepcsv

df = deepcsv.ConvertListStrToList("path/to/file.csv")

ReadAllCSVData(path)

Walks through all folders and subfolders, applies ConvertListStrToList to every CSV and XLSX file, and saves the results as Parquet files in a new folder called "All CSV Data is Converted Here".

import deepcsv

deepcsv.ReadAllCSVData("path/to/folder")
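
The folder-mirroring idea can be sketched as follows. This is not deepcsv's code: `plan_outputs` is a hypothetical helper, and placing the output folder inside the input folder is an assumption, since the text above does not say where it is created. The sketch only computes the target paths and writes nothing.

```python
from pathlib import Path

OUTPUT_DIR_NAME = "All CSV Data is Converted Here"  # folder name given in the text above
SOURCE_SUFFIXES = {".csv", ".xlsx"}


def plan_outputs(root):
    """Map each CSV/XLSX file under `root` to a Parquet path that mirrors its subfolder."""
    root = Path(root)
    out_root = root / OUTPUT_DIR_NAME  # assumption: output lives inside `root`
    plan = {}
    for source in sorted(root.rglob("*")):
        if out_root in source.parents:
            continue  # skip anything already inside the output folder
        if source.is_file() and source.suffix.lower() in SOURCE_SUFFIXES:
            relative = source.relative_to(root)  # e.g. a/b/data.csv
            plan[source] = (out_root / relative).with_suffix(".parquet")
    return plan
```

Under these assumptions, `a/b/data.csv` maps to `All CSV Data is Converted Here/a/b/data.parquet`, so the relative layout is preserved.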

Notes

  • Only files that contain at least one list-string column are saved as Parquet
  • Mixed-type columns are converted to float automatically when possible
  • NaN values are skipped without raising errors
  • Requires pyarrow for Parquet support
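
One way to read "converted to float when possible" is sketched below: convert the column only if every non-missing value parses as a float, and leave missing values alone. The helper `coerce_column_to_float` is hypothetical, and deepcsv's actual rules may differ.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def coerce_column_to_float(values):
    """Return (new_values, changed); convert only if every present value parses as a float."""
    converted = []
    for value in values:
        if _is_missing(value):
            converted.append(value)  # missing entries are kept as they are
            continue
        try:
            converted.append(float(value))
        except (TypeError, ValueError):
            return list(values), False  # one unparseable value: keep the column unchanged
    # Report a change only if some present value was not already a float
    changed = any(type(v) is not float for v in values if not _is_missing(v))
    return converted, changed


fixed, changed = coerce_column_to_float([1, "2.5", None, 3.0])
```

The `changed` flag is what a caller could use to emit a warning that the column was modified.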

Requirements

  • Python >= 3.7
  • pandas
  • pyarrow
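
If it helps, the requirements above can be checked before running a conversion. This is a small standalone sketch, not part of deepcsv; `missing_requirements` is a hypothetical helper.

```python
import importlib.util
import sys

REQUIRED_MODULES = ("pandas", "pyarrow")


def missing_requirements():
    """Return a list naming each requirement that is not satisfied in this environment."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python >= 3.7")
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append(name)
    return problems


missing = missing_requirements()
if missing:
    print("Missing requirements:", ", ".join(missing))
```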
