A simple Python package for loading data from CSV and XLSX files

Simple Data Loader

Load CSV and Excel files into a single pandas DataFrame — from one file or by concatenating many files in a folder. Simple defaults, clear errors, and optional consistency checks.

TL;DR Quick Start

Install

pip install simple-data-loader

Load a single file

from simple_data_loader import load_data

df = load_data("path\\to\\file.csv")

Load all files in a folder

from simple_data_loader import SimpleDataLoader

loader = SimpleDataLoader(
    "path\\to\\folder",          # file or folder
    include_subfolders=False,        # set True to recurse into subfolders
    verbose=True,                    # print progress/info
    column_consistency='error'       # 'error' | 'warning' | 'ignore'
)

df = loader.load()

Parameters at a glance

  • file_path: str — path to a file or a folder
  • include_subfolders: bool — include subfolders when loading a folder (default False)
  • verbose: bool — print per-file and summary info (default True)
  • column_consistency: 'error' | 'warning' | 'ignore' — how to handle column mismatches when loading a folder (default 'error')

Supported formats: .csv, .xlsx, .xls

Tip: Use 'warning' to proceed while notifying about mismatched columns, or 'ignore' to skip the check entirely.

Below are the full details, examples, and testing instructions.

Features

  • Single File Loading: Read an individual CSV or Excel (.xlsx, .xls) file
  • Folder Loading: Automatically concatenate all supported files in a folder
  • Subfolder Support: Option to include files from subfolders
  • Verbose Output: Control the level of detail in console output
  • Error Handling: Graceful handling of file errors and unsupported formats
  • Flexible Usage: Both class-based and function-based interfaces

Installation

Install from PyPI

pip install simple-data-loader

Install dependencies only

pip install pandas openpyxl xlrd

Dependencies

  • pandas (>=1.3.0): For data manipulation and DataFrame operations
  • openpyxl (>=3.0.0): For reading Excel (.xlsx) files
  • xlrd (>=2.0.0): For reading legacy Excel (.xls) files

Quick Start

Basic Usage

from simple_data_loader import SimpleDataLoader

# Load a single file
loader = SimpleDataLoader("data.csv")
df = loader.load()

# Load all files from a folder
loader = SimpleDataLoader("data_folder")
df = loader.load()

Using the Convenience Function

from simple_data_loader import load_data

# Direct loading
df = load_data("data.csv")
df = load_data("data_folder")

Detailed Usage

Class Initialization

from simple_data_loader import SimpleDataLoader
SimpleDataLoader(file_path, include_subfolders=False, verbose=True, column_consistency='error')

Parameters:

  • file_path (str): Path to a file or folder
  • include_subfolders (bool): Whether to include files from subfolders (default: False)
  • verbose (bool): Whether to print detailed information (default: True)
  • column_consistency (str): How to handle column consistency ('error', 'warning', 'ignore') (default: 'error')

Examples

1. Single File Loading

from simple_data_loader import SimpleDataLoader

# Load a CSV file
loader = SimpleDataLoader("sales_data.csv")
df = loader.load()

# Load an Excel file
loader = SimpleDataLoader("financial_report.xlsx")
df = loader.load()

Output:

sales_data.csv is imported with 1000 rows and 5 columns

2. Folder Loading (No Subfolders)

loader = SimpleDataLoader("data_folder", include_subfolders=False)
df = loader.load()

Output:

Found 3 files to process
data_1.csv is imported with 500 rows and 4 columns
data_2.csv is imported with 300 rows and 4 columns
data_3.xlsx is imported with 200 rows and 4 columns

Summary:
Successfully loaded 3 files
Combined dataset has 1000 rows and 4 columns

3. Folder Loading (With Subfolders)

loader = SimpleDataLoader("data_folder", include_subfolders=True)
df = loader.load()

This searches all subfolders recursively and loads every supported file (.csv, .xlsx, .xls).
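
As an illustration of the difference between the two folder modes, here is a minimal sketch of file discovery using only the standard library. It is not the package's actual implementation: `find_data_files` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path

# Extensions listed under "Supported formats" in this README.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls"}


def find_data_files(folder, include_subfolders=False):
    """Return supported data files under `folder`, sorted for a stable order.

    Hypothetical helper for illustration; not part of simple_data_loader.
    """
    root = Path(folder)
    # rglob("*") walks the whole tree; glob("*") stays at the top level.
    candidates = root.rglob("*") if include_subfolders else root.glob("*")
    return sorted(
        p for p in candidates
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

With include_subfolders=False only files directly inside the folder are returned; with True, files in nested folders such as data/reports/ are included as well.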

4. Quiet Mode

loader = SimpleDataLoader("data.csv", verbose=False)
df = loader.load()

No console output will be displayed.

5. Column Consistency Control

# Error mode (default) - stops if columns don't match
loader = SimpleDataLoader("data_folder", column_consistency='error')
df = loader.load()

# Warning mode - shows warning but continues
loader = SimpleDataLoader("data_folder", column_consistency='warning')
df = loader.load()

# Ignore mode - skips consistency check entirely
loader = SimpleDataLoader("data_folder", column_consistency='ignore')
df = loader.load()

Column Consistency Modes:

  • 'error' (default): Raises ValueError if files have different column counts or names
  • 'warning': Shows a warning but continues processing
  • 'ignore': Skips consistency check entirely
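
The three modes can be sketched as a small standalone check. This is an assumption about how such a check might work, not the package's code: `check_column_consistency` is a hypothetical helper, and it treats the first file's columns as the reference and compares names in order, which may differ from what the library does.

```python
import warnings


def check_column_consistency(columns_by_file, mode="error"):
    """Compare each file's columns against the first file's columns.

    `columns_by_file` maps a file name to its list of column names.
    Returns the names of files whose columns differ from the reference.
    Hypothetical helper for illustration; not part of simple_data_loader.
    """
    if mode not in ("error", "warning", "ignore"):
        raise ValueError(f"Unknown mode: {mode!r}")
    if mode == "ignore" or not columns_by_file:
        return []  # the check is skipped entirely

    items = iter(columns_by_file.items())
    reference_name, reference_cols = next(items)
    mismatched = [
        name for name, cols in items if list(cols) != list(reference_cols)
    ]

    if mismatched:
        message = f"Columns differ from {reference_name}: {', '.join(mismatched)}"
        if mode == "error":
            raise ValueError(message)
        warnings.warn(message)  # 'warning' mode: report and carry on
    return mismatched
```

In 'warning' mode the caller still gets the list of mismatched files back, which is handy for logging which inputs need cleaning before the next run.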

6. Convenience Function

from simple_data_loader import load_data

# All parameters except file_path are optional
df = load_data("data.csv")  # Uses defaults
df = load_data("data_folder", include_subfolders=True, verbose=False, column_consistency='warning')

Supported File Formats

  • CSV files: .csv
  • Excel files: .xlsx, .xls

Error Handling

The SimpleDataLoader handles various error scenarios:

  • File not found: Raises FileNotFoundError
  • Unsupported format: Raises ValueError with format information
  • Invalid path: Raises ValueError if path is neither file nor directory
  • Column consistency errors: Raises ValueError when column_consistency='error' and files have mismatched columns
  • Individual file errors: Continues processing other files and reports errors in verbose mode
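
To make the first three cases concrete, the sketch below reproduces the same exception types with a standalone path check. `classify_path` is a hypothetical helper written for this example; the order of the checks and the error messages are assumptions, not the package's actual behavior.

```python
from pathlib import Path

# Extensions listed under "Supported File Formats" in this README.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls"}


def classify_path(path):
    """Return "file" or "folder", raising the exception types listed above.

    Hypothetical helper for illustration; not part of simple_data_loader.
    """
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"Path not found: {p}")
    if p.is_dir():
        return "folder"
    if not p.is_file():
        raise ValueError(f"Path is neither a file nor a directory: {p}")
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(
            f"Unsupported format {p.suffix!r}; "
            f"expected one of {sorted(SUPPORTED_EXTENSIONS)}"
        )
    return "file"
```

Calling code can then catch FileNotFoundError and ValueError separately, for example to prompt for a new path in the first case and to skip the file in the second.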

Example Project Structure

project/
├── simple_data_loader/
│   ├── __init__.py
│   └── simple_data_loader.py
├── tests/
│   └── test_data_loader_pytest.py
├── requirements.txt
├── example_usage.py
├── README.md
├── data/
│   ├── sales_2023.csv
│   ├── sales_2024.csv
│   └── reports/
│       ├── monthly_report.xlsx
│       └── quarterly_summary.csv
└── single_file.csv

Running Examples

To see the SimpleDataLoader in action, run the example script:

python example_usage.py

This will create sample data files and demonstrate various usage patterns.

Testing

Run the comprehensive test suite:

# Run all tests
python -m pytest

# Run tests with verbose output
python -m pytest -v

# Run specific test class
python -m pytest tests/test_data_loader_pytest.py::TestSingleFileLoading -v

The test suite includes 20 tests covering:

  • Single file loading
  • Folder loading with consistent files
  • Column consistency validation
  • Error handling
  • Data integrity checks

API Reference

SimpleDataLoader Class

Methods

  • load(): Load data from the specified path
    • Returns: pandas.DataFrame

Internal Methods

  • _load_single_file(): Load data from a single file
  • _load_folder(): Load and concatenate data from folder
  • _load_single_file_from_path(): Internal method for loading individual files

Convenience Function

  • load_data(file_path, include_subfolders=False, verbose=True, column_consistency='error'): Direct data loading function

Performance Notes

  • Large files are loaded into memory entirely
  • For very large datasets, consider processing files individually
  • Concatenation happens in memory, so ensure sufficient RAM for large folder operations
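
One way to process files individually is to load each file, keep only a small result, and release the data before moving on. The sketch below is a hypothetical pattern, not a feature of the package: `summarize_individually` is an invented name, and `load_one` stands for any callable that returns an object with a `.shape` attribute, such as a pandas DataFrame.

```python
def summarize_individually(paths, load_one):
    """Load each file on its own and keep only a small per-file summary.

    Hypothetical helper for illustration; not part of simple_data_loader.
    Only one loaded file is referenced at a time, so peak memory is bounded
    by the largest single file rather than by the combined dataset.
    """
    summaries = {}
    for path in paths:
        frame = load_one(path)
        summaries[str(path)] = frame.shape  # (rows, columns)
        del frame  # drop the reference before loading the next file
    return summaries
```

With the package installed, the loader could be passed as lambda p: load_data(p, verbose=False), using the documented convenience function.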

License

This project is open source and available under the MIT License.
