A Python library for handling files in the multi-CSV format.

MultiCSV

The multicsv Python library is designed for handling files in the multi-CSV format. It provides an interface for reading, writing, and manipulating the sections of a CSV file as individual text file objects.

Key Features

  • Efficient Section Management: Read and write multiple independent sections within a single CSV file.
  • TextIO Interface: Sections are treated as TextIO objects, enabling familiar file operations.
  • Flexible Operations: Supports reading, writing, iterating, and deleting sections.
  • Context Management: Supports the with statement, so files are closed reliably.
  • Integrated Testing: Includes comprehensive unit tests, covering 100% of the functionality.

The Multi-CSV Format

The multi-CSV format is an extension of the traditional CSV (Comma-Separated Values) format that divides a single file into multiple independent sections. Each section is introduced by a header enclosed in square brackets, e.g., [section_name]. The format is best known from Illumina MiSeq sample sheet files.

Conceptually, this file format makes it possible to store a whole SQL database, one table per section, in a single human-readable file.
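
To make the database analogy concrete, here is a minimal sketch that loads two sections into an in-memory SQLite database and joins them. It uses only the standard library; the section names, column names, and values are hypothetical, and the sections are given as plain strings rather than read through multicsv.

```python
import csv
import sqlite3

# Hypothetical section contents, keyed by section name (one table per section).
sections = {
    "samples": "id,name\n1,alpha\n2,beta\n",
    "runs": "run_id,sample_id\n10,1\n11,2\n",
}

conn = sqlite3.connect(":memory:")
for table, text in sections.items():
    rows = list(csv.reader(text.splitlines()))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # The first CSV row becomes the column list; the rest become table rows.
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)

# Join across the two sections as if they were database tables.
query = """
    SELECT runs.run_id, samples.name
    FROM runs JOIN samples ON runs.sample_id = samples.id
    ORDER BY runs.run_id
"""
result = conn.execute(query).fetchall()
print(result)  # [('10', 'alpha'), ('11', 'beta')]
conn.close()
```

All values stay strings here because CSV carries no type information; a real loader would have to decide on column types itself.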

Example

Here's a simplified example of a multi-CSV file:

[section1]
header1,header2,header3
value1,value2,value3

[section2]
headerA,headerB,headerC
valueA,valueB,valueC

In the example above, the file contains two sections: section1 and section2. Each section has its own headers and rows of data.
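
The structure described above can be sketched with the standard library alone. The following is an illustration of the format, not the way multicsv is implemented: the helper name split_sections is hypothetical, and it only handles the simple form shown here (a bare [name] line starts a section, blank lines are skipped).

```python
import csv


def split_sections(text):
    """Split multi-CSV text into a dict of section name -> list of CSV rows."""
    sections = {}
    current = None  # lines of the section being read; None before the first header
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # A bracketed line opens a new section.
            current = sections.setdefault(stripped[1:-1], [])
        elif stripped and current is not None:
            current.append(line)
    # Parse each section's lines with the standard csv module.
    return {name: list(csv.reader(lines)) for name, lines in sections.items()}


sample = """\
[section1]
header1,header2,header3
value1,value2,value3

[section2]
headerA,headerB,headerC
valueA,valueB,valueC
"""

parsed = split_sections(sample)
print(list(parsed))           # ['section1', 'section2']
print(parsed["section2"][1])  # ['valueA', 'valueB', 'valueC']
```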

Usage

Here's a quick example of how to use the multicsv library:

import csv
import multicsv

with multicsv.open('example.csv', mode='w+') as csv_file:
    # Write the CSV content to the file
    csv_file.section('section1').write("header1,header2,header3\nvalue1,value2,value3\n")
    csv_file.section('section2').write("header4,header5,header6\nvalue4,value5,value6\n")

    # Read a section using the csv module
    csv_reader = csv.reader(csv_file['section1'])
    assert list(csv_reader) == [['header1', 'header2', 'header3'],
                                ['value1', 'value2', 'value3']]
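
Because a section is an ordinary text stream, the other tools in the csv module should work on it as well. The sketch below shows csv.writer and csv.DictReader on a text stream; io.StringIO stands in for a section object so that the snippet runs without multicsv installed, which is an assumption of this example rather than something taken from the library.

```python
import csv
import io

# io.StringIO stands in for a section object here; both are text streams.
section = io.StringIO()

# Write rows with csv.writer instead of formatting the string by hand.
writer = csv.writer(section, lineterminator="\n")
writer.writerow(["header1", "header2", "header3"])
writer.writerow(["value1", "value2", "value3"])

# Rewind, then read the rows back as dictionaries keyed by the header row.
section.seek(0)
records = list(csv.DictReader(section))
print(records)
# [{'header1': 'value1', 'header2': 'value2', 'header3': 'value3'}]
```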

multicsv exports only two functions: open and wrap. The latter wraps an existing text stream and is meant to be used like this:

import io
import multicsv

# Initialize the MultiCSVFile with a base CSV string
csv_content = io.StringIO("""\
[section1]
a,b,c
1,2,3
[section2]
d,e,f
4,5,6
""")

csv_file = multicsv.wrap(csv_content)

# Accessing a section
section1 = csv_file["section1"]
print(section1.read())  # Outputs: "a,b,c\n1,2,3\n"

# Adding a new section
new_section = io.StringIO("g,h,i\n7,8,9\n")
csv_file["section3"] = new_section
csv_file.flush()

# Verify the new section is added
csv_content.seek(0)
print(csv_content.read())
# Outputs:
# [section1]
# a,b,c
# 1,2,3
# [section2]
# d,e,f
# 4,5,6
# [section3]
# g,h,i
# 7,8,9
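
The output above follows directly from the format: each section is written as a bracketed header line followed by its content. A minimal standard-library sketch of that serialization step is shown below. The helper name dump_sections is hypothetical, and this is not a description of how multicsv implements flush.

```python
import io


def dump_sections(sections):
    """Serialize a mapping of section name -> text stream into multi-CSV text."""
    out = io.StringIO()
    for name, stream in sections.items():
        out.write(f"[{name}]\n")
        body = stream.read()
        out.write(body)
        # Make sure the next header starts on its own line.
        if body and not body.endswith("\n"):
            out.write("\n")
    return out.getvalue()


sections = {
    "section1": io.StringIO("a,b,c\n1,2,3\n"),
    "section3": io.StringIO("g,h,i\n7,8,9"),  # no trailing newline
}
text = dump_sections(sections)
print(text)
```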

Both exported functions return a MultiCSVFile object. Objects of that class are MutableMappings from section names (str) to section contents (TextIO).

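So, for instance, this is how to print the names of all sections in a multi-CSV file:

import multicsv

with multicsv.open("example.csv") as csv_file:
    # Iterating a MultiCSVFile yields section names, as with any mapping
    for section in csv_file:
        print(section)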
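
Since the interface is a MutableMapping, the usual dictionary operations (membership tests, len, item assignment, del, items) are available. The sketch below demonstrates them on a plain dict of io.StringIO objects, which stands in for a MultiCSVFile so that the snippet runs with the standard library only; that substitution is an assumption of this example.

```python
import io
from collections.abc import MutableMapping

# A plain dict of text streams stands in for a MultiCSVFile in this sketch.
sections = {
    "section1": io.StringIO("a,b,c\n1,2,3\n"),
    "section2": io.StringIO("d,e,f\n4,5,6\n"),
}
assert isinstance(sections, MutableMapping)

# Membership test and length.
print("section1" in sections)  # True
print(len(sections))           # 2

# Add a section, then delete another.
sections["section3"] = io.StringIO("g,h,i\n7,8,9\n")
del sections["section2"]

# Iterate over names and contents together, keeping each section's header row.
summary = {name: stream.read().splitlines()[0] for name, stream in sections.items()}
print(summary)  # {'section1': 'a,b,c', 'section3': 'g,h,i'}
```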

Installation

Install the library using pip:

pip install multicsv

Development

Setting Up

Set up your environment for development as follows:

  1. Clone the repository:

    git clone https://github.com/cfe-lab/multicsv.git
    
  2. Navigate to the project directory:

    cd multicsv
    
  3. Create a virtual environment:

    python3 -m venv venv
    source venv/bin/activate
    
  4. Install dependencies:

    pip install -e ".[dev,test]"
    

Running Tests

Run the test suite to ensure everything is functioning correctly:

pytest

Contributing

Contributions are welcome! Please follow these steps for contributions:

  1. Fork the repository.
  2. Create a new branch with a descriptive name.
  3. Make your changes and ensure the test suite passes.
  4. Open a pull request with a clear description of what you've done.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

