Python module that can split files into chunks and merge them back.

filesplit

File splitting and merging made easy for Python programmers!

This module
  • Can split files of any size into multiple chunks and also merge them back.

  • Can handle both structured and unstructured files.

System Requirements

Operating System: Windows/Linux/Mac

Python version: 3.x.x

Installation

The module is available on PyPI and can be installed using pip:

pip install filesplit

Split

Create an instance

from filesplit.split import Split

split = Split(inputfile: str, outputdir: str)

inputfile (str, Required) - Path to the original file.

outputdir (str, Required) - Output directory path to write the file splits.

Once the instance is created, the following methods are available on it:

bysize(size: int, newline: Optional[bool] = False, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Splits the file by size.

Args:

size (int, Required): Maximum size, in bytes, allowed for each split.

newline (bool, Optional): When True, no split ends with an incomplete line. Defaults to False.

includeheader (bool, Optional): Setting this to True will include header in each split. The first line is treated as a header. Defaults to False.

callback (Callable, Optional): Callback function to invoke after each split. The callback function should accept two arguments [func (str, int)] - full path to the split file, split file size (bytes). Defaults to None.

Returns:

None
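As an illustration, the sketch below splits a file into chunks of at most a given size and records each split through the callback. The names `produced`, `record_split` and `split_by_size` are made up for this example, the output directory is assumed to exist already, and `filesplit` is imported inside the function only so that the callback can be tried without the package installed.

```python
from typing import List, Tuple

# (path, size) pairs collected by the callback, one per generated split.
produced: List[Tuple[str, int]] = []


def record_split(path: str, size: int) -> None:
    """Callback with the documented (str, int) shape: split path, size in bytes."""
    produced.append((path, size))


def split_by_size(inputfile: str, outputdir: str, max_bytes: int) -> None:
    """Split inputfile into chunks of at most max_bytes bytes (hypothetical helper)."""
    # Deferred import: only needed when a split is actually performed.
    from filesplit.split import Split

    split = Split(inputfile, outputdir)
    # newline=True asks for splits that do not end with an incomplete line.
    split.bysize(size=max_bytes, newline=True, callback=record_split)
```

A call such as `split_by_size("data.csv", "out", 1_000_000)` would then leave one entry in `produced` per split file; the paths shown here are placeholders.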

bylinecount(linecount: int, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Splits the file by line count.

Args:

linecount (int, Required): Maximum number of lines allowed in each split.

includeheader (bool, Optional): Setting this to True will include header in each split. The first line is treated as a header. Defaults to False.

callback (Callable, Optional): Callback function to invoke after each split. The callback function should accept two arguments [func (str, int)] - full path to the split file, split file size (bytes). Defaults to None.

Returns:

None
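A line-count split with a repeated header might look like the following sketch. The sample CSV layout and the helper names `write_sample_csv` and `split_csv_by_lines` are assumptions made for this example; only `Split`, `bylinecount`, `linecount` and `includeheader` come from the description above. The output directory is assumed to exist.

```python
import csv
from typing import Iterable, Sequence


def write_sample_csv(path: str, rows: Iterable[Sequence[object]]) -> None:
    """Write a small CSV whose first line is a header (hypothetical sample data)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "name"])
        writer.writerows(rows)


def split_csv_by_lines(inputfile: str, outputdir: str, lines_per_split: int) -> None:
    """Split a CSV by line count, asking for the header in every split."""
    # Deferred import: only needed when a split is actually performed.
    from filesplit.split import Split

    split = Split(inputfile, outputdir)
    # The first line of inputfile is treated as the header.
    split.bylinecount(linecount=lines_per_split, includeheader=True)
```

Repeating the header this way keeps each split readable on its own by tools that expect column names on the first line.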

The file splits are named following the pattern [original_filename]_0001.ext, [original_filename]_0002.ext, ..., [original_filename]_n.ext.

A manifest file is also created in the output directory to keep track of the file splits. This manifest file is required for the merge operation.

In addition:
  • The delimiter used in split file names can be changed by setting the splitdelimiter property, for example split.splitdelimiter='$'. The default is _ (underscore).

  • The number of zero-fill digits in split file names can be changed by setting the splitzerofill property, for example split.splitzerofill=10. The default is 4.

  • The manifest file name can be changed by setting the manfilename property, for example split.manfilename='man'. The default is manifest.

  • To force a running split to stop safely, set the terminate property to True while the process is running.
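To make the naming pattern concrete, here is a small stand-alone sketch of how a split name could be derived from the original file name, the delimiter and the zero-fill width. It is not the library's own code; `split_name` is a hypothetical helper written only to mirror the pattern and the defaults described above.

```python
import os


def split_name(original: str, index: int, delimiter: str = "_", zerofill: int = 4) -> str:
    """Build a split file name such as data_0001.csv (illustrative only)."""
    # Separate "data" from ".csv" so the counter lands before the extension.
    stem, ext = os.path.splitext(os.path.basename(original))
    # zfill pads the counter with leading zeros up to the requested width.
    return f"{stem}{delimiter}{str(index).zfill(zerofill)}{ext}"


# With the defaults, the first split of data.csv is named data_0001.csv.
first = split_name("data.csv", 1)
# With delimiter '$' and a zero fill of 10, the twelfth split is data$0000000012.csv.
custom = split_name("data.csv", 12, delimiter="$", zerofill=10)
```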

Merge

Create an instance

from filesplit.merge import Merge

merge = Merge(inputdir: str, outputdir: str, outputfilename: str)

inputdir (str, Required) - Path to the directory containing file splits.

outputdir (str, Required) - Output directory path to write the merged file.

outputfilename (str, Required) - Name to use for the merged file.

Once the instance is created, the following method is available on it:

merge(cleanup: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Merges the split files back into a single file.

Args:

cleanup (bool, Optional): If True, all the split files and the manifest file are deleted after a successful merge. Defaults to False.

callback (Callable, Optional): Callback function to invoke after merge. The callback function should accept two arguments [func (str, int)] - full path to the merged file, merged file size (bytes). Defaults to None.

Returns:

None
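A merge can be sketched as follows, together with a checksum helper that may be used to compare the merged file with the original. The names `sha256_of` and `merge_splits` are hypothetical; only `Merge`, `merge` and `cleanup` come from the description above. Whether the digests match for a given set of split options is something to verify rather than assume.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def merge_splits(inputdir: str, outputdir: str, outputfilename: str) -> None:
    """Merge the splits found in inputdir into outputdir/outputfilename."""
    # Deferred import: only needed when a merge is actually performed.
    from filesplit.merge import Merge

    merge = Merge(inputdir, outputdir, outputfilename)
    # cleanup=False keeps the splits and the manifest for inspection.
    merge.merge(cleanup=False)
```

After `merge_splits(...)`, comparing `sha256_of(original_path)` with `sha256_of(merged_path)` gives a quick integrity check before the splits are removed.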

In addition:
  • The manifest file name can be changed by setting the manfilename property, for example merge.manfilename='man'. It must match the name used during the split and the manifest must be in the same directory as the file splits. The default is manifest.

  • To force a running merge to stop safely, set the terminate property to True while the process is running.
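Because terminate has to be set while the operation is running, the split or merge typically runs in one thread and the flag is set from another. The sketch below shows one way to do this with a time limit; `run_with_timeout` is a hypothetical helper, and it assumes only that the job object exposes a writable terminate attribute, as described above for both Split and Merge.

```python
import threading
from typing import Any, Callable


def run_with_timeout(job: Any, start: Callable[[], None], timeout: float) -> bool:
    """Run start() in a worker thread and request termination if it overruns.

    Returns True if start() finished within timeout seconds, False otherwise.
    """
    worker = threading.Thread(target=start)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # Ask the running operation to stop, then wait for it to wind down.
        job.terminate = True
        worker.join()
        return False
    return True
```

With a Split instance named split, a call could look like `run_with_timeout(split, lambda: split.bysize(1_000_000), timeout=30.0)`; the size and timeout are placeholder values.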

Download files

Source Distribution

filesplit-4.1.0.tar.gz (7.4 kB)

Uploaded Source

Built Distribution

filesplit-4.1.0-py3-none-any.whl (9.5 kB)

Uploaded Python 3

File details

Details for the file filesplit-4.1.0.tar.gz.

File metadata

  • Download URL: filesplit-4.1.0.tar.gz
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for filesplit-4.1.0.tar.gz
Algorithm    Hash digest
SHA256       1aceb3a8bea84743254683e6b97056aa24593783f3b7e35dac10bac706e184b3
MD5          e652e13dd8e8d30117694ba054e6ffc4
BLAKE2b-256  0f1739439b12d77c4ca76e795832b3d3209609b58bc5a0a375630e271b5d7b88


File details

Details for the file filesplit-4.1.0-py3-none-any.whl.

File metadata

  • Download URL: filesplit-4.1.0-py3-none-any.whl
  • Size: 9.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for filesplit-4.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       5244718d37302b5741a7ffe11e7379bd178bcf31d8350632be200ba94c74a12c
MD5          7029ee516a1905807c5dbe6a608ee144
BLAKE2b-256  ee8b8381669a91a04834c5111e0ff1d56efb5c2779ba6e7410678f4ee4799083

