CSV Process Tool Kit

Tabular Tool Kit - Tabular Data Processing Toolkit

English

Introduction

CPTK (CSV Process Tool Kit) is a high-performance toolkit for processing large CSV files. It splits a large CSV file into multiple smaller files and automatically converts them to Excel (XLSX) format, which works around Excel's limit of 1,048,576 rows per worksheet.
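
To check whether a given file actually runs into Excel's row limit, one can count its records before splitting. The following is a minimal sketch and not part of the package: `count_csv_rows`, `needs_split`, and `EXCEL_MAX_ROWS` are hypothetical names, and the file is assumed to be UTF-8 encoded.

```python
import csv

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in the XLSX format


def count_csv_rows(path: str) -> int:
    """Count CSV records (header included); quoted line breaks stay inside one record."""
    with open(path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.reader(handle))


def needs_split(path: str) -> bool:
    """Return True when the file has more records than one Excel worksheet can hold."""
    return count_csv_rows(path) > EXCEL_MAX_ROWS
```

Counting with `csv.reader` rather than counting raw lines keeps multi-line quoted fields from inflating the total.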

Features

  • Multi-threaded processing: Detects the number of CPU cores automatically and spreads the work across threads
  • Intelligent splitting: Measures the file size and line count up front and uses them to decide how many files to produce
  • Strict file size control: Ensures each split file is smaller than the specified maximum size
  • Automatic format conversion: Converts each split CSV file to XLSX format
  • Progress visualization: Displays processing progress in real time
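
The splitting features above can be approximated with a small planning calculation. This is a sketch under stated assumptions, not the package's actual algorithm: `plan_split` is a hypothetical helper, rows are assumed to be roughly uniform in size, and one row per part is reserved for a repeated header.

```python
import math

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in the XLSX format


def plan_split(total_bytes: int, total_rows: int, max_size_mb: float) -> tuple[int, int]:
    """Return (num_parts, rows_per_part) for a CSV of the given size and data-row count."""
    if total_rows <= 0:
        return 0, 0
    max_bytes = int(max_size_mb * 1024 * 1024)
    # Average row size, rounded up so the estimate errs on the large side.
    avg_row_bytes = max(1, math.ceil(total_bytes / total_rows))
    # Rows that fit in the size budget, capped by Excel's row limit minus a header row.
    rows_per_part = max(1, min(max_bytes // avg_row_bytes, EXCEL_MAX_ROWS - 1))
    num_parts = math.ceil(total_rows / rows_per_part)
    return num_parts, rows_per_part
```

Because this estimate relies on an average row size, it cannot guarantee the size limit on its own. A strict guarantee would also need to track the bytes actually written to each part.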

Installation

pip install tabular_tool_kit

Usage

Command Line Usage

# Basic usage
cptk input.csv output_directory

# Specify maximum file size (MB)
cptk input.csv output_directory --max-size 50

# Quiet mode, no progress bars
cptk input.csv output_directory --quiet

# Show Chinese help
cptk -h --zh

# Show English help
cptk -h --en

# Show version information
cptk --version
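
When the command needs to be driven from a script, the same flags can be assembled programmatically. The sketch below uses only the options shown above. `build_cptk_command` and `run_cptk` are hypothetical helpers, and it is an assumption that `cptk` exits with a non-zero status on failure.

```python
import shutil
import subprocess


def build_cptk_command(input_csv, output_dir, max_size_mb=None, quiet=False):
    """Assemble the argument list for the cptk command-line tool."""
    command = ["cptk", input_csv, output_dir]
    if max_size_mb is not None:
        command += ["--max-size", str(max_size_mb)]
    if quiet:
        command.append("--quiet")
    return command


def run_cptk(input_csv, output_dir, **options):
    """Run cptk and return its exit code; raises if the tool is missing or fails."""
    if shutil.which("cptk") is None:
        raise FileNotFoundError("cptk is not on PATH; try: pip install tabular_tool_kit")
    command = build_cptk_command(input_csv, output_dir, **options)
    # check=True raises CalledProcessError on a non-zero exit status (assumed failure signal).
    return subprocess.run(command, check=True).returncode
```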

Use as a Python Module

from cptk import CSVSplitterConverter

# Create processor
processor = CSVSplitterConverter(
    input_file="large_file.csv",
    output_dir="output",
    max_size_mb=95,  # Maximum file size (MB)
    verbose=True  # Whether to display detailed output
)

# Execute splitting and conversion
num_files = processor.split_csv()
print(f"Generated {num_files} files")
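
To process several files in one run, the class can be wrapped in a loop. This is a minimal sketch: `split_all` is a hypothetical helper, the constructor arguments are the ones shown above, and `split_csv()` is assumed to return the number of generated files as in the example. The processor class is passed in as a parameter so the helper does not import `cptk` itself.

```python
from pathlib import Path


def split_all(input_dir, output_root, make_processor, max_size_mb=95):
    """Split every *.csv in input_dir into its own subdirectory of output_root."""
    results = {}
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        out_dir = Path(output_root) / csv_path.stem
        # Creating the directory up front is harmless if the processor also creates it.
        out_dir.mkdir(parents=True, exist_ok=True)
        processor = make_processor(
            input_file=str(csv_path),
            output_dir=str(out_dir),
            max_size_mb=max_size_mb,
            verbose=False,
        )
        results[csv_path.name] = processor.split_csv()
    return results
```

With the package installed, a call would look like `split_all("data", "output", CSVSplitterConverter)` after `from cptk import CSVSplitterConverter`.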

Output Directory Structure

output_directory/
├── split_csv/  # Split CSV files
│   ├── original-0001.csv
│   ├── original-0002.csv
│   └── ...
└── split_xlsx/  # Converted XLSX files
    ├── original-0001.xlsx
    ├── original-0002.xlsx
    └── ...
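
Given this layout, a quick consistency check can confirm that every split CSV file has a matching XLSX file. The sketch assumes only the `split_csv/` and `split_xlsx/` directory names shown above. `missing_conversions` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def missing_conversions(output_dir):
    """Return the base names of split CSV files that have no matching XLSX file."""
    root = Path(output_dir)
    csv_stems = {p.stem for p in (root / "split_csv").glob("*.csv")}
    xlsx_stems = {p.stem for p in (root / "split_xlsx").glob("*.xlsx")}
    return sorted(csv_stems - xlsx_stems)
```

An empty list means each CSV part was converted.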

License

Licensed under the Apache License, Version 2.0

