A Python library for handling the multi-CSV format.

MultiCSV

The multicsv Python library is designed for handling multi-CSV format files. It provides an interface for reading, writing, and manipulating sections of a CSV file as individual text file objects.

Key Features

  • Efficient Section Management: Read and write multiple independent sections within a single CSV file.
  • TextIO Interface: Sections are treated as TextIO objects, enabling familiar file operations.
  • Flexible Operations: Supports reading, writing, iterating, and deleting sections.
  • Context Management: Ensures resource safety through support for Python's with statement.
  • Integrated Testing: Includes comprehensive unit tests, covering 100% of the functionality.

The Multi-CSV Format

The multi-CSV format is an extension of the traditional CSV (Comma-Separated Values) format that supports dividing a single file into multiple independent sections. Each section is demarcated by a header enclosed in square brackets, e.g., [section_name]. The format is best known from Illumina MiSeq sample sheet files.

Conceptually, this file format can hold several independent tables, much like a small SQL database, in a single human-readable file.

Example

Here's a simplified example of a multi-CSV file:

[section1]
header1,header2,header3
value1,value2,value3

[section2]
headerA,headerB,headerC
valueA,valueB,valueC

In the example above, the file contains two sections: section1 and section2. Each section has its own headers and rows of data.
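
To make the layout above concrete, here is a minimal standard-library sketch that splits multi-CSV text into sections and parses one of them with csv.reader. It is not how multicsv is implemented. The helper name split_sections is hypothetical, and the sketch assumes that any line consisting solely of [name] is a section header and that blank lines between sections can be dropped.

```python
import csv
import io


def split_sections(text):
    """Split multi-CSV text into {section_name: section_body} (hypothetical helper)."""
    sections = {}
    current = None
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A header line such as "[section1]" opens a new section.
            current = stripped[1:-1]
            sections[current] = ""
        elif current is not None and stripped:
            # Non-empty lines belong to the most recently opened section.
            sections[current] += line
    return sections


example = """\
[section1]
header1,header2,header3
value1,value2,value3

[section2]
headerA,headerB,headerC
valueA,valueB,valueC
"""

sections = split_sections(example)

# Each section body is ordinary CSV, so the csv module can parse it.
rows = list(csv.reader(io.StringIO(sections["section1"])))

print(list(sections))  # ['section1', 'section2']
print(rows)
```

This sketch has known gaps: a quoted field that sits alone on a line and looks like [text] would be misread as a header, and header lines followed by trailing commas are not recognized. For real files, the library is the better choice.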

Usage

Here's a quick example of how to use the multicsv library:

import csv
import multicsv

with multicsv.open('example.csv', mode='w+') as csv_file:
    # Write the CSV content to the file
    csv_file.section('section1').write("header1,header2,header3\nvalue1,value2,value3\n")
    csv_file.section('section2').write("header4,header5,header6\nvalue4,value5,value6\n")

    # Read a section using the csv module
    csv_reader = csv.reader(csv_file['section1'])
    assert list(csv_reader) == [['header1', 'header2', 'header3'],
                                ['value1', 'value2', 'value3']]

multicsv exports only two functions: open and wrap. This is how the latter is meant to be used:

import io
import multicsv

# Initialize the MultiCSVFile with a base CSV string
csv_content = io.StringIO("""\
[section1]
a,b,c
1,2,3
[section2]
d,e,f
4,5,6
""")

csv_file = multicsv.wrap(csv_content)

# Accessing a section
section1 = csv_file["section1"]
print(section1.read())  # Outputs: "a,b,c\n1,2,3\n"

# Adding a new section
new_section = io.StringIO("g,h,i\n7,8,9\n")
csv_file["section3"] = new_section
csv_file.flush()

# Verify the new section is added
csv_content.seek(0)
print(csv_content.read())
# Outputs:
# [section1]
# a,b,c
# 1,2,3
# [section2]
# d,e,f
# 4,5,6
# [section3]
# g,h,i
# 7,8,9

Both exported functions return a MultiCSVFile object. Objects of that class are MutableMappings from section names (str) to section contents (TextIO).

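So, for instance, this is how to print the names of all sections in a multi-CSV file:

import multicsv

# Iterating over the mapping yields section names; the with statement closes the file afterwards
with multicsv.open("example.csv") as csv_file:
    for section in csv_file:
        print(section)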
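
To show what the MutableMapping interface implies in practice, the sketch below defines a small in-memory stand-in and exercises the usual dictionary-style operations on it. SectionStore is a hypothetical class written only for illustration. It is not part of multicsv and does no file handling; it only demonstrates that a mapping from str to TextIO supports iteration over names, items(), membership tests, and deletion.

```python
import io
from collections.abc import MutableMapping
from typing import Dict, Iterator, TextIO


class SectionStore(MutableMapping):
    """Hypothetical in-memory stand-in: section name -> TextIO."""

    def __init__(self) -> None:
        self._sections: Dict[str, TextIO] = {}

    def __getitem__(self, name: str) -> TextIO:
        return self._sections[name]

    def __setitem__(self, name: str, content: TextIO) -> None:
        self._sections[name] = content

    def __delitem__(self, name: str) -> None:
        del self._sections[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._sections)

    def __len__(self) -> int:
        return len(self._sections)


store = SectionStore()
store["section1"] = io.StringIO("a,b,c\n1,2,3\n")
store["section2"] = io.StringIO("d,e,f\n4,5,6\n")

# Iterating a mapping yields its keys, which here are the section names.
names = list(store)

# items() is provided by the MutableMapping mixin methods.
contents = {name: section.read() for name, section in store.items()}

# Deleting a key removes the whole section.
del store["section2"]
remaining = list(store)

print(names)
print(remaining)
```

Only the five special methods are written by hand; keys(), items(), pop(), update(), and the in operator come from the abstract base class, which is why a MutableMapping can be used much like a dict.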

Installation

Install the library using pip:

pip install multicsv

Development

Setting Up

Set up your environment for development as follows:

  1. Clone the repository:

    git clone https://github.com/cfe-lab/multicsv.git
    
  2. Navigate to the project directory:

    cd multicsv
    
  3. Create a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
    
  4. Install dependencies:

    pip install -e ".[dev,test]"
    

Running Tests

Run the test suite to ensure everything is functioning correctly:

pytest

Contributing

Contributions are welcome! Please follow these steps for contributions:

  1. Fork the repository.
  2. Create a new branch with a descriptive name.
  3. Make your changes and ensure the test suite passes.
  4. Open a pull request with a clear description of what you've done.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

Download files

Source Distribution

multicsv-1.0.5.tar.gz (20.9 kB view details)

Uploaded Source

Built Distribution

multicsv-1.0.5-py2.py3-none-any.whl (23.4 kB view details)

Uploaded Python 2, Python 3

File details

Details for the file multicsv-1.0.5.tar.gz.

File metadata

  • Download URL: multicsv-1.0.5.tar.gz
  • Upload date:
  • Size: 20.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.11

File hashes

Hashes for multicsv-1.0.5.tar.gz
Algorithm Hash digest
SHA256 5220e1ad45fa7bebf6c76c66d36aae8d79ac5dfcd3b1073bb30130e991a37aac
MD5 b2034c13a269287cda37a062f3692e48
BLAKE2b-256 de9e7a41248690b0f912ef67dc3de16e080f7b3018055aba6d501aa0491b02a3

File details

Details for the file multicsv-1.0.5-py2.py3-none-any.whl.

File metadata

  • Download URL: multicsv-1.0.5-py2.py3-none-any.whl
  • Upload date:
  • Size: 23.4 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.11

File hashes

Hashes for multicsv-1.0.5-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 0cf83ff324e67675df2e33a7ffed4731f35918874661a8ff4e07b30b8dd0adf1
MD5 6cbf7c506a5d97804cd0cbe69798d875
BLAKE2b-256 f4863157ecdd7818be0d9772de2b2dacc8c15eb441bddedb0537c2da7fcddee2
