Efficient file slicing and memory-mapped line iteration.

fileslicer


fileslicer is a lightweight Python library for efficiently reading and splitting large files using memory mapping. It allows you to iterate over lines within a file slice and split files into chunks without loading the entire file into memory, making it ideal for processing very large files.


Features

  • Memory-efficient line iteration using mmap.
  • Split large files into chunks while respecting newline boundaries.
  • Simple and Pythonic API.
  • Works with files of arbitrary size.
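To make the first feature concrete, here is a minimal sketch of memory-mapped line iteration over a byte range, written against the standard library only. It is not fileslicer's actual implementation; the function name `iter_lines_in_range` and its half-open `[start, end)` convention are assumptions made for illustration.

```python
import mmap
from typing import Iterator


def iter_lines_in_range(path: str, start: int, end: int) -> Iterator[bytes]:
    """Yield the bytes in [start, end), split after each newline.

    Hypothetical helper for illustration; not part of fileslicer.
    Note that mmap raises ValueError for an empty file.
    """
    with open(path, "rb") as f:
        # Map the file read-only; pages are loaded lazily by the OS.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = start
            while pos < end:
                # Search for the next newline without leaving the range.
                newline = mm.find(b"\n", pos, end)
                stop = end if newline == -1 else newline + 1
                yield mm[pos:stop]  # slicing copies only this one line
                pos = stop
```

Only the line currently being yielded is copied into a Python `bytes` object, which is why this style of iteration keeps memory use low even for large inputs.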

Installation

Install via pip:

pip install fileslicer

Usage

Basic Example: Iterate over a file

from fileslicer import FileSlice

# Create a FileSlice for an entire file
# (named file_slice to avoid shadowing the built-in slice)
file_slice = FileSlice.from_file("large_file.txt")

# Iterate over lines in the slice; each line is a bytes object
for line in file_slice.iter_lines():
    print(line.decode().strip())

Split a File into Chunks

from fileslicer import FileSlice

# Split a file into 4 chunks
chunks = FileSlice.split_file("large_file.txt", splits=4)

for chunk in chunks:
    print(f"Processing bytes {chunk.start_offset}-{chunk.end_offset}")
    for line in chunk.iter_lines():
        print(line.decode().strip())
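
Because each chunk covers an independent byte range, chunks can be handed to separate workers. The sketch below shows that fan-out pattern using only the standard library, with plain `(start, end)` tuples standing in for the `start_offset` and `end_offset` of each chunk. The helper names `count_newlines` and `count_newlines_parallel` are hypothetical and are not part of fileslicer.

```python
import mmap
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def count_newlines(path: str, start: int, end: int) -> int:
    """Count newline bytes in [start, end) of a file (illustrative helper)."""
    with open(path, "rb") as f:
        # Each worker opens its own read-only mapping of the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b"\n", start, end)
            while pos != -1:
                count += 1
                pos = mm.find(b"\n", pos + 1, end)
            return count


def count_newlines_parallel(
    path: str, ranges: Sequence[Tuple[int, int]]
) -> List[int]:
    """Run count_newlines over every byte range, one task per range."""
    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        futures = [pool.submit(count_newlines, path, s, e) for s, e in ranges]
        # Results come back in the same order as the input ranges.
        return [future.result() for future in futures]
```

Threads keep the sketch short; for CPU-heavy per-line work, a process pool is usually the better fit in CPython because of the global interpreter lock.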

Create a Custom File Slice

from fileslicer import FileSlice

# Only read bytes 1000 to 5000
# (named file_slice to avoid shadowing the built-in slice)
file_slice = FileSlice("large_file.txt", 1000, 5000)

for line in file_slice.iter_lines():
    print(line.decode().strip())

API

FileSlice

  • FileSlice(file_path: str, start_offset: int, end_offset: int): Represents a slice of a file.

  • iter_lines() -> Generator[bytes]: Iterate over lines in the file slice as bytes.

  • @staticmethod from_file(file_path: str) -> FileSlice: Create a FileSlice covering the entire file.

  • @staticmethod split_file(file_path: str, splits: int) -> list[FileSlice]: Split a file into multiple slices, aligned to newline boundaries.
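One way newline-aligned splitting can work is to aim for evenly spaced byte offsets and then push each boundary forward to just past the next newline. The following is a rough sketch of that idea using only the standard library; it is an assumption about the general technique, not a description of how `split_file` is implemented, and `newline_aligned_ranges` is a hypothetical name.

```python
import os
from typing import List, Tuple


def newline_aligned_ranges(path: str, splits: int) -> List[Tuple[int, int]]:
    """Return up to `splits` contiguous (start, end) byte ranges.

    Every range except possibly the last ends just after a newline.
    Hypothetical helper for illustration; not part of fileslicer.
    """
    if splits < 1:
        raise ValueError("splits must be at least 1")
    size = os.path.getsize(path)
    ranges: List[Tuple[int, int]] = []
    with open(path, "rb") as f:
        start = 0
        for i in range(1, splits + 1):
            if start >= size:
                break  # the file ran out before all splits were used
            if i == splits:
                end = size  # the last range takes whatever remains
            else:
                # Jump to the evenly spaced target, then read through
                # the end of the line that contains it.
                f.seek(max(start, size * i // splits))
                f.readline()
                end = min(f.tell(), size)
            if end > start:
                ranges.append((start, end))
                start = end
    return ranges
```

Small files with few lines may yield fewer ranges than requested, since a range is never allowed to be empty or to cut a line in half.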


Why Use fileslicer?

Reading a very large file all at once (for example with read() or readlines()) pulls the entire contents into memory, which can be slow and memory-intensive. fileslicer uses memory mapping to slice and iterate over file data without reading everything into memory. Inspired by the "1 Billion Row Challenge" in Python, it is well suited to data processing pipelines, log analysis, and ETL tasks.


License

fileslicer is distributed under the terms of the MIT license.
