Python module for splitting files into chunks and merging them back.

filesplit

File splitting and merging made easy for Python programmers!

This module
  • Can split files of any size into multiple chunks and also merge them back.

  • Can handle both structured and unstructured files.

System Requirements

Operating System: Windows/Linux/Mac

Python version: 3.x.x

Installation

The module is available on PyPI and can be installed with pip:

pip install filesplit

Split

Create an instance

from filesplit.split import Split

split = Split(inputfile: str, outputdir: str)

inputfile (str, Required) - Path to the original file.

outputdir (str, Required) - Output directory path to write the file splits.

With the instance created, the following methods are available:

bysize(size: int, newline: Optional[bool] = False, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Splits file by size.

Args:

size (int, Required): Maximum size, in bytes, allowed for each split.

newline (bool, Optional): If True, no split ends with an incomplete line. Defaults to False.

includeheader (bool, Optional): If True, the header is included in each split. The first line of the file is treated as the header. Defaults to False.

callback (Callable, Optional): Function to invoke after each split. It should accept two arguments, (str, int): the full path to the split file and the split file size in bytes. Defaults to None.

Returns:

None
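
As an illustration of the call described above, here is a minimal sketch that splits a file into chunks of at most a given size and records each split through the callback. The names `record_split`, `split_by_size`, and the module-level `results` list are assumptions made for this example and are not part of filesplit. The import sits inside the function only so that the callback can be exercised without the package installed.

```python
from typing import List, Tuple

# Collected (path, size) pairs, one per split file reported by the callback.
results: List[Tuple[str, int]] = []


def record_split(path: str, size: int) -> None:
    """Callback with the two-argument shape documented for bysize."""
    results.append((path, size))


def split_by_size(inputfile: str, outputdir: str, max_bytes: int) -> None:
    """Split inputfile into chunks of at most max_bytes, keeping lines whole."""
    # Lazy import: only needed when a split is actually requested.
    from filesplit.split import Split

    split = Split(inputfile, outputdir)
    split.bysize(
        size=max_bytes,
        newline=True,         # avoid incomplete lines in each split
        includeheader=True,   # repeat the first line in every split
        callback=record_split,
    )
```

A call such as `split_by_size("data.csv", "out", 5_000_000)` would then leave one entry in `results` per split file. The paths here are placeholders, and the output directory is assumed to exist already.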

bylinecount(linecount: int, includeheader: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Splits file by line count.

Args:

linecount (int, Required): Maximum number of lines allowed in each split.

includeheader (bool, Optional): If True, the header is included in each split. The first line of the file is treated as the header. Defaults to False.

callback (Callable, Optional): Function to invoke after each split. It should accept two arguments, (str, int): the full path to the split file and the split file size in bytes. Defaults to None.

Returns:

None
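
A short sketch of splitting by line count, with a callback that prints one progress line per split. The helpers `describe_split` and `split_by_lines` are hypothetical names introduced for this example. Only `Split` and `bylinecount` come from the module.

```python
from pathlib import Path


def describe_split(path: str, size: int) -> str:
    """Format one progress line from the callback's two arguments."""
    return f"{Path(path).name}: {size} bytes"


def split_by_lines(inputfile: str, outputdir: str, linecount: int) -> None:
    """Split inputfile into pieces of at most linecount lines each."""
    # Lazy import: only needed when a split is actually requested.
    from filesplit.split import Split

    split = Split(inputfile, outputdir)
    split.bylinecount(
        linecount=linecount,
        includeheader=True,  # treat the first line as a header and repeat it
        callback=lambda path, size: print(describe_split(path, size)),
    )
```

Keeping the formatting in its own function makes the callback easy to check in isolation, since the split itself needs real files on disk.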

The file splits are named [original_filename]_1.ext, [original_filename]_2.ext, ..., [original_filename]_n.ext.
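
To make the naming scheme concrete, the sketch below computes the names one would expect for a given input file. `expected_split_names` is a hypothetical helper, not a filesplit function, and it assumes the extension is whatever `os.path.splitext` returns. The module's handling of names with several dots or no extension may differ.

```python
import os
from typing import List


def expected_split_names(filename: str, count: int, delimiter: str = "_") -> List[str]:
    """Return [stem]<delimiter>1.ext ... [stem]<delimiter>n.ext for count splits."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    return [f"{stem}{delimiter}{i}{ext}" for i in range(1, count + 1)]
```

For example, `expected_split_names("data.csv", 3)` gives `data_1.csv`, `data_2.csv`, and `data_3.csv`.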

A manifest file is also created in the output directory to keep track of the file splits. This manifest file is required for the merge operation.

Moreover,
  • The delimiter for the generated splits can be changed by setting the splitdelimiter property, for example split.splitdelimiter='$'. Default is _ (underscore).

  • The manifest file name for the generated splits can be changed by setting the manfilename property, for example split.manfilename='man'. Default is manifest.

  • To forcefully and safely terminate the process, set the terminate property to True while the process is running.
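
Assuming the split call blocks the calling thread, `terminate` has to be set from elsewhere, such as another thread. A minimal sketch using the standard-library `threading.Timer` follows. `request_stop_after` is a hypothetical helper introduced for this example, and the usage shown in the comments relies on placeholder paths.

```python
import threading


def request_stop_after(task, seconds: float) -> threading.Timer:
    """Set task.terminate = True after `seconds` on a background timer thread.

    `task` is any object with a writable `terminate` attribute,
    for example a Split instance.
    """
    def _stop() -> None:
        task.terminate = True

    timer = threading.Timer(seconds, _stop)
    timer.daemon = True  # do not keep the interpreter alive just for the timer
    timer.start()
    return timer


# Usage (assumes filesplit is installed and the paths exist):
#   from filesplit.split import Split
#   split = Split("big.log", "out")
#   split.splitdelimiter = "$"   # splits named big$1.log, big$2.log, ...
#   split.manfilename = "man"    # manifest file named "man"
#   request_stop_after(split, 30.0)
#   split.bysize(size=1_000_000)
```

If the work finishes before the timer fires, calling `cancel()` on the returned timer avoids setting the flag afterwards.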

Merge

Create an instance

from filesplit.merge import Merge

merge = Merge(inputdir: str, outputdir: str, outputfilename: str)

inputdir (str, Required) - Path to the directory containing the file splits.

outputdir (str, Required) - Output directory path to write the merged file.

outputfilename (str, Required) - Name to use for the merged file.

With the instance created, the following method is available:

merge(cleanup: Optional[bool] = False, callback: Optional[Callable] = None) -> None

Merges the split files back into one single file.

Args:

cleanup (bool, Optional): If True, all the split files and the manifest file are deleted after a successful merge. Defaults to False.

callback (Callable, Optional): Function to invoke after the merge. It should accept two arguments, (str, int): the full path to the merged file and the merged file size in bytes. Defaults to None.

Returns:

None
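
A sketch of a merge call, together with one possible sanity check: comparing the SHA-256 digest of the merged file with that of the original. `sha256_of` and `merge_splits` are hypothetical helpers written for this example. Only `Merge` and its `merge` method come from the module, and the digest comparison is a suggestion rather than something filesplit performs.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def merge_splits(inputdir: str, outputdir: str, outputfilename: str) -> None:
    """Merge the splits found in inputdir into outputdir/outputfilename."""
    # Lazy import: only needed when a merge is actually requested.
    from filesplit.merge import Merge

    merge = Merge(inputdir, outputdir, outputfilename)
    merge.merge(
        cleanup=False,  # keep the splits and manifest after merging
        callback=lambda path, size: print(f"merged {path} ({size} bytes)"),
    )
```

After `merge_splits("out", "restored", "data.csv")`, comparing `sha256_of("data.csv")` with `sha256_of("restored/data.csv")` would show whether the round trip preserved the file. The paths are placeholders.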

Moreover,
  • The manifest file name can be changed by setting the manfilename property, for example merge.manfilename='man'. The name must match the one used when the file was split, and the manifest must be in the same directory as the file splits. Default is manifest.

  • To forcefully and safely terminate the process, set the terminate property to True while the process is running.

Download files

Source distribution: filesplit-4.0.0.tar.gz (7.1 kB)

Built distribution: filesplit-4.0.0-py3-none-any.whl (9.2 kB, Python 3)
