Efficient file slicing and memory-mapped line iteration.
fileslicer
fileslicer is a lightweight Python library for efficiently reading and splitting large files using memory mapping. It allows you to iterate over lines within a file slice and split files into chunks without loading the entire file into memory, making it ideal for processing very large files.
Features
- Memory-efficient line iteration using mmap.
- Split large files into chunks while respecting newline boundaries.
- Simple and Pythonic API.
- Works with files of arbitrary size.
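To illustrate what memory-mapped line iteration looks like in general, here is a minimal sketch built only on the standard library's mmap module. It is not fileslicer's implementation; the function name iter_lines_mmap and its parameters are assumptions made for this example.

```python
import mmap
from typing import Iterator, Optional


def iter_lines_mmap(path: str, start: int = 0, end: Optional[int] = None) -> Iterator[bytes]:
    """Yield lines (as bytes, newline included) from bytes [start, end) of a file.

    Hypothetical helper for illustration; not part of fileslicer.
    """
    with open(path, "rb") as f:
        # length=0 maps the whole file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            stop = len(mm) if end is None else min(end, len(mm))
            pos = start
            while pos < stop:
                nl = mm.find(b"\n", pos, stop)
                # No newline left: the remainder is the final, unterminated line.
                next_pos = stop if nl == -1 else nl + 1
                yield mm[pos:next_pos]
                pos = next_pos
```

Only the bytes of the line currently being yielded are copied out of the mapping; the operating system pages the rest of the file in and out as needed.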
Installation
Install via pip:
pip install fileslicer
Usage
Basic Example: Iterate over a file
from fileslicer import FileSlice
# Create a FileSlice covering the entire file
file_slice = FileSlice.from_file("large_file.txt")
# Iterate over lines in the slice; each line is a bytes object
for line in file_slice.iter_lines():
    print(line.decode().strip())
Split a File into Chunks
from fileslicer import FileSlice
# Split a file into 4 chunks
chunks = FileSlice.split_file("large_file.txt", splits=4)
for chunk in chunks:
    print(f"Processing bytes {chunk.start_offset}-{chunk.end_offset}")
    # Each chunk is a FileSlice, so its lines are iterated the same way
    for line in chunk.iter_lines():
        print(line.decode().strip())
Create a Custom File Slice
from fileslicer import FileSlice
# Only read bytes 1000 to 5000
file_slice = FileSlice("large_file.txt", 1000, 5000)
for line in file_slice.iter_lines():
    print(line.decode().strip())
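This README does not say how a hand-picked offset that falls in the middle of a line is treated. If you want custom offsets that always begin on a line boundary, one option is to snap them yourself before constructing the slice. The helper below is a standard-library sketch; align_to_next_line is a hypothetical name and not part of fileslicer.

```python
import mmap


def align_to_next_line(path: str, offset: int) -> int:
    """Return the smallest line-start position >= offset, or the file size.

    Hypothetical helper for illustration; assumes a non-empty file.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)
        if offset <= 0:
            return 0
        if offset >= size:
            return size
        # A position is a line start when the byte just before it is a newline,
        # so search for the first newline at or after offset - 1.
        nl = mm.find(b"\n", offset - 1)
        return size if nl == -1 else nl + 1
```

The returned value could then be passed as the start offset of a custom slice, under the assumption that offsets are plain byte positions as in the examples above.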
API
FileSlice
- FileSlice(file_path: str, start_offset: int, end_offset: int): Represents a slice of a file.
- iter_lines() -> Generator[bytes]: Iterate over lines in the file slice as bytes.
- @staticmethod from_file(file_path: str) -> FileSlice: Create a FileSlice covering the entire file.
- @staticmethod split_file(file_path: str, splits: int) -> list[FileSlice]: Split a file into multiple slices, aligned to newline boundaries.
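As a rough picture of what "aligned to newline boundaries" can mean, the sketch below computes byte ranges by jumping to each nominal cut point and then reading forward to the end of that line. This is one common approach and only an assumption about the general technique; it is not taken from fileslicer's source, and split_offsets is a hypothetical name.

```python
import os


def split_offsets(path: str, splits: int) -> list[tuple[int, int]]:
    """Return up to `splits` contiguous (start, end) byte ranges.

    Every range ends just after a newline or at end of file.
    Hypothetical helper for illustration; not part of fileslicer.
    """
    if splits < 1:
        raise ValueError("splits must be >= 1")
    size = os.path.getsize(path)
    ranges = []
    start = 0
    with open(path, "rb") as f:
        for i in range(1, splits + 1):
            if start >= size:
                break  # The file ran out before all splits were used.
            if i == splits:
                end = size  # The last range always runs to end of file.
            else:
                # Jump to the nominal cut point, then read to the end of that line.
                f.seek(max(size * i // splits, start))
                f.readline()
                end = min(f.tell(), size)
            if end > start:
                ranges.append((start, end))
                start = end
    return ranges
```

Because cuts are pushed forward to the next newline, the ranges are only approximately equal in size, and a file with few lines may yield fewer ranges than requested.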
Why Use fileslicer?
Reading an extremely large file in one go with standard file APIs can be slow and memory-intensive. fileslicer uses memory mapping to slice and iterate over file data without reading everything into memory. Inspired by the "1 Billion Row Challenge" in Python, it is aimed at data processing pipelines, log analysis, and ETL tasks.
License
fileslicer is distributed under the terms of the MIT license.