Python module that can split files and merge them back.
Project description
filesplit
File splitting and merging made easy for Python programmers!
This module:
- Can split files of any size into multiple chunks and merge them back.
- Can handle both structured and unstructured files.
System Requirements
Operating System: Windows/Linux/Mac
Python version: 3.x.x
Installation
The module is available on PyPI and can be installed using pip:
pip install filesplit
Split
Create an instance
from filesplit.split import Split
split = Split(inputfile: str, outputdir: str)
inputfile (str, Required) - Path to the original file.
outputdir (str, Required) - Output directory path to write the file splits.
With the instance created, the following methods can be used on it:
bysize(size: int, newline: Optional[bool] = False, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None
Splits file by size.
Args:
size (int, Required): Maximum size, in bytes, allowed in each split.
newline (bool, Optional): If True, no split will end with an incomplete line. Defaults to False.
includeheader (bool, Optional): If True, the header is included in each split. The first line is treated as the header. Defaults to False.
callback (Callable, Optional): Callback function to invoke after each split. It should accept two arguments, func(str, int): the full path to the split file and the split file size in bytes. Defaults to None.
Returns:
None
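A minimal sketch of a size-based split with a callback. Only Split and bysize come from the description above; the helper names on_split and split_by_size, the max_bytes parameter name, and the os.makedirs call that creates the output directory are assumptions made for this example.

```python
import os
from typing import List, Tuple

# (path, size) pairs reported by the callback, one per generated split.
completed: List[Tuple[str, int]] = []


def on_split(path: str, size: int) -> None:
    """Callback with the (str, int) shape described above."""
    completed.append((path, size))


def split_by_size(inputfile: str, outputdir: str, max_bytes: int) -> None:
    """Hypothetical wrapper: split inputfile into chunks of at most max_bytes."""
    # Imported lazily so on_split stays usable where filesplit is not installed.
    from filesplit.split import Split

    # Assumption of this sketch: create the output directory if it is missing.
    os.makedirs(outputdir, exist_ok=True)
    split = Split(inputfile, outputdir)
    # newline=True asks for splits that never end in an incomplete line.
    split.bysize(max_bytes, newline=True, callback=on_split)
```

A call such as split_by_size("big.log", "out", 10 * 1024 * 1024) would then leave one entry in completed per split written (the file name and size here are placeholders).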
bylinecount(linecount: int, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None
Splits file by line count.
Args:
linecount (int, Required): Maximum number of lines allowed in each split.
includeheader (bool, Optional): If True, the header is included in each split. The first line is treated as the header. Defaults to False.
callback (Callable, Optional): Callback function to invoke after each split. It should accept two arguments, func(str, int): the full path to the split file and the split file size in bytes. Defaults to None.
Returns:
None
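As an illustration of bylinecount with includeheader, the sketch below splits a CSV-like file and tallies the results through a callable object passed as the callback. SplitStats and split_csv_by_lines are hypothetical names introduced here; the sketch also assumes the library accepts any callable taking (str, int), and that the output directory already exists.

```python
from dataclasses import dataclass


@dataclass
class SplitStats:
    """Callable that tallies the splits reported through the callback."""

    count: int = 0
    total_bytes: int = 0

    def __call__(self, path: str, size: int) -> None:
        # Called once per split with its full path and size in bytes.
        self.count += 1
        self.total_bytes += size


def split_csv_by_lines(inputfile: str, outputdir: str, lines_per_split: int) -> SplitStats:
    """Hypothetical wrapper: split by line count, repeating the header in each split."""
    # Imported lazily so SplitStats stays usable where filesplit is not installed.
    from filesplit.split import Split

    stats = SplitStats()
    Split(inputfile, outputdir).bylinecount(
        lines_per_split, includeheader=True, callback=stats
    )
    return stats
```

Using an object rather than a bare function keeps the running totals together without module-level state.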
The file splits are named as follows: [original_filename]_0001.ext, [original_filename]_0002.ext, .., [original_filename]_n.ext.
A manifest file is also created in the output directory to keep track of the file splits. This manifest file is required for the merge operation.
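To make the naming pattern concrete, here is a small stand-alone sketch that reproduces it. This is not filesplit's own code: split_name and its parameters are hypothetical, the defaults mirror the underscore and four-digit pattern shown above, and how the library treats names with several dots or no extension is not stated here.

```python
import os


def split_name(filename: str, index: int, delimiter: str = "_", zerofill: int = 4) -> str:
    """Build a split file name following the pattern described above."""
    stem, ext = os.path.splitext(filename)
    # Zero-pad the index and place it between the stem and the extension.
    return f"{stem}{delimiter}{str(index).zfill(zerofill)}{ext}"


print(split_name("data.csv", 1))   # data_0001.csv
print(split_name("data.csv", 12))  # data_0012.csv
```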
Additionally:
- The delimiter for the generated splits can be changed by setting the splitdelimiter property, e.g. split.splitdelimiter='$'. Default is _ (underscore).
- The number of zero-fill digits for the generated splits can be changed by setting the splitzerofill property, e.g. split.splitzerofill=10. Default is 4.
- The manifest file name for the generated splits can be changed by setting the manfilename property, e.g. split.manfilename='man'. Default is manifest.
- To forcibly but safely terminate a running split, set the terminate property to True while the process is running.
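Because terminate has to be set while the split is running, it is typically set from another thread. The sketch below uses a standard-library threading.Timer to do that after a deadline. run_with_deadline is a hypothetical helper that works with any object exposing a terminate attribute; what the library does once terminate is set (returning early or raising) is not described above, so the sketch does not rely on either.

```python
import threading
from typing import Any, Callable


def run_with_deadline(worker: Any, action: Callable[[], None], timeout_s: float) -> None:
    """Run action(); if it is still running after timeout_s, set worker.terminate = True."""
    timer = threading.Timer(timeout_s, setattr, args=(worker, "terminate", True))
    timer.daemon = True
    timer.start()
    try:
        action()
    finally:
        # Has no effect if the timer has already fired.
        timer.cancel()


# Usage with filesplit (paths and values are placeholders):
#   from filesplit.split import Split
#   split = Split("big.log", "out")
#   split.splitdelimiter = "$"
#   split.splitzerofill = 6
#   run_with_deadline(split, lambda: split.bysize(50_000_000), timeout_s=30.0)
```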
Merge
Create an instance
from filesplit.merge import Merge
merge = Merge(inputdir: str, outputdir: str, outputfilename: str)
inputdir (str, Required) - Path to the directory containing file splits.
outputdir (str, Required) - Output directory path to write the merged file.
outputfilename (str, Required) - Name to use for the merged file.
With the instance created, the following method can be used on it:
merge(cleanup: Optional[bool] = False, callback: Optional[Callable] = None) -> None
Merges the split files back into one single file.
Args:
cleanup (bool, Optional): If True, all the split files and the manifest file are deleted after a successful merge. Defaults to False.
callback (Callable, Optional): Callback function to invoke after the merge. It should accept two arguments, func(str, int): the full path to the merged file and the merged file size in bytes. Defaults to None.
Returns:
None
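A minimal sketch of a merge whose callback checks the merged size against the size of the original file. size_checker and merge_and_check are hypothetical helpers; the check assumes the merge reproduces the original byte for byte, and how the library reacts to an exception raised inside a callback is not documented above.

```python
from typing import Callable


def size_checker(expected_size: int) -> Callable[[str, int], None]:
    """Return a merge callback that raises if the merged size is unexpected."""

    def check(path: str, size: int) -> None:
        if size != expected_size:
            raise ValueError(
                f"{path}: merged size {size} does not match expected {expected_size}"
            )

    return check


def merge_and_check(inputdir: str, outputdir: str, outputfilename: str,
                    expected_size: int) -> None:
    """Hypothetical wrapper: merge the splits in inputdir and verify the result size."""
    # Imported lazily so size_checker stays usable where filesplit is not installed.
    from filesplit.merge import Merge

    merge = Merge(inputdir, outputdir, outputfilename)
    # cleanup=False keeps the splits and manifest so the merge can be retried.
    merge.merge(cleanup=False, callback=size_checker(expected_size))
```

Keeping cleanup=False until the check has passed is a cautious default for this sketch; once the result is trusted, cleanup=True removes the splits and the manifest.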
Additionally:
- The manifest file name can be changed by setting the manfilename property, e.g. merge.manfilename='man'. The name must match the one used during the split, and the manifest file must be in the same directory as the file splits. Default is manifest.
- To forcibly but safely terminate a running merge, set the terminate property to True while the process is running.
Download files
File details
Details for the file filesplit-4.1.0.tar.gz.
File metadata
- Download URL: filesplit-4.1.0.tar.gz
- Upload date:
- Size: 7.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1aceb3a8bea84743254683e6b97056aa24593783f3b7e35dac10bac706e184b3
MD5 | e652e13dd8e8d30117694ba054e6ffc4
BLAKE2b-256 | 0f1739439b12d77c4ca76e795832b3d3209609b58bc5a0a375630e271b5d7b88
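The published digests can be used to check a downloaded archive. The sketch below computes a SHA-256 digest with the standard-library hashlib module and compares it to the value listed for filesplit-4.1.0.tar.gz. sha256_of and matches_published_hash are hypothetical helper names, and the local path passed in is whatever location the file was downloaded to.

```python
import hashlib

# SHA256 digest listed above for filesplit-4.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "1aceb3a8bea84743254683e6b97056aa24593783f3b7e35dac10bac706e184b3"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file at path has the expected SHA-256 digest."""
    return sha256_of(path) == expected
```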
File details
Details for the file filesplit-4.1.0-py3-none-any.whl.
File metadata
- Download URL: filesplit-4.1.0-py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5244718d37302b5741a7ffe11e7379bd178bcf31d8350632be200ba94c74a12c
MD5 | 7029ee516a1905807c5dbe6a608ee144
BLAKE2b-256 | ee8b8381669a91a04834c5111e0ff1d56efb5c2779ba6e7410678f4ee4799083