Module to split files of any size into multiple chunks

Project description

File splitting made easy for Python programmers!

A Python module that splits files of any size into multiple chunks while keeping memory use low and without compromising on performance. The module places split boundaries at newline characters, so no split ever contains an incomplete line. The splits are numbered from 1 to n as follows:

[filename]_1.ext, [filename]_2.ext, …, [filename]_n.ext
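To make the naming pattern concrete, here is a minimal sketch that builds split names the same way. The helper `split_name` is hypothetical and only illustrates the pattern; it is not part of the module.

```python
import os


def split_name(path, index):
    """Return the name of the index-th split of `path` (hypothetical helper)."""
    # Separate "report" from ".csv" so the number goes before the extension.
    base, ext = os.path.splitext(os.path.basename(path))
    return "{0}_{1}{2}".format(base, index, ext)


names = [split_name("data/report.csv", i) for i in range(1, 4)]
print(names)  # ['report_1.csv', 'report_2.csv', 'report_3.csv']
```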

System Requirements

Operating System: Windows/Linux/Mac

Python version: Python 3

Usage

The module is available on PyPI and can be installed using pip:

pip install filesplit

Create an instance of FileSplit by passing the file path and the split size as arguments:

from fsplit.filesplit import FileSplit

fs = FileSplit(file='path/to/file', splitsize=500000000, output_dir='/path/to/output directory/')
  • “file” and “splitsize” are required. “output_dir” is optional and defaults to the current directory.

  • “splitsize” should be given in bytes.
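Since “splitsize” is a raw byte count, a small conversion helper can make the intended size easier to read. This is only a sketch; the names `MB`, `MIB` and `to_bytes` are illustrative and not part of the module.

```python
MB = 1000 ** 2   # decimal megabyte, as in the 500000000 example above
MIB = 1024 ** 2  # binary mebibyte


def to_bytes(amount, unit=MB):
    """Convert a size in the given unit to a whole number of bytes."""
    return int(amount * unit)


print(to_bytes(500))       # 500000000
print(to_bytes(500, MIB))  # 524288000
```

The result can be passed straight to the constructor, for example `splitsize=to_bytes(500)`.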

With the instance created, any of the following methods can be invoked:

split (include_header=False, callback=None)

Splits the file into multiple chunks. This method works in binary mode under the hood, so each split keeps the formatting and encoding of the source file unchanged. That should be sufficient to handle any file type.

fs.split()
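For readers curious how a line-aligned split in binary mode can work, the following is a minimal sketch using only the standard library. It is not the module's actual implementation. The function name `split_on_lines` and the rule that a split closes before the line that would exceed the limit are assumptions made for illustration.

```python
import os


def split_on_lines(path, splitsize, output_dir="."):
    """Split `path` into numbered files of about `splitsize` bytes,
    breaking only at line ends. Illustrative sketch, not filesplit's code."""
    base, ext = os.path.splitext(os.path.basename(path))
    outputs = []
    out = None
    written = 0
    with open(path, "rb") as src:  # binary mode: bytes are copied untouched
        for line in src:           # yields one line at a time, so memory stays low
            # Open a new split for the first line, or when this line would
            # push a non-empty split past the limit.
            if out is None or (written > 0 and written + len(line) > splitsize):
                if out is not None:
                    out.close()
                name = "{0}_{1}{2}".format(base, len(outputs) + 1, ext)
                outputs.append(os.path.join(output_dir, name))
                out = open(outputs[-1], "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return outputs
```

In this sketch a single line longer than `splitsize` still ends up whole in its own split, which keeps the "no incomplete lines" guarantee at the cost of an oversized file.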

If the file contains a header and you want that header to appear in every split, set the “include_header” flag to True. It defaults to False.

fs.split(include_header=True)
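The effect of the flag can be pictured with a small in-memory sketch. The function `chunk_lines` is hypothetical, and whether the repeated header counts toward the split size is an assumption of this sketch rather than documented behaviour of the module.

```python
def chunk_lines(lines, splitsize, include_header=False):
    """Group byte lines into chunks of about `splitsize` bytes each.

    With include_header=True the first line is repeated at the top of
    every chunk. Assumption: the header counts toward the chunk size.
    """
    lines = list(lines)
    header = [lines[0]] if include_header and lines else []
    body = lines[len(header):]
    head_size = sum(len(h) for h in header)
    chunks = []
    current, size = None, 0
    for line in body:
        # Start a new chunk for the first line, or when the current chunk
        # already holds body lines and this one would exceed the limit.
        if current is None or (len(current) > len(header)
                               and size + len(line) > splitsize):
            current, size = list(header), head_size
            chunks.append(current)
        current.append(line)
        size += len(line)
    return chunks


rows = [b"id,name\n", b"1,a\n", b"2,b\n", b"3,c\n"]
for chunk in chunk_lines(rows, 16, include_header=True):
    print(b"".join(chunk))
```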

You can also pass an optional callback function, func(str, int, int), that accepts three arguments: the full path to the split, the split file size in bytes, and the line count. The callback is called after each file split is written.

def func(f, s, c):
    print("file: {0}, size: {1}, count: {2}".format(f, s, c))

fs.split(callback=func)
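A callback does not have to print. Any callable that takes the three arguments described above should work, so it can also collect results for later use. The class `SplitLog` below is a hypothetical helper written for this example, not something the module provides.

```python
class SplitLog:
    """Callable that records (path, size, line count) for each split."""

    def __init__(self):
        self.entries = []

    def __call__(self, path, size, count):
        # Same three arguments the callback receives: path, bytes, lines.
        self.entries.append((path, size, count))

    def total_bytes(self):
        return sum(size for _, size, _ in self.entries)

    def total_lines(self):
        return sum(count for _, _, count in self.entries)


log = SplitLog()
# With a FileSplit instance: fs.split(callback=log)
```

After the split finishes, `log.entries` would hold one tuple per split, and the totals can be compared against the source file as a sanity check.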

splitbyencoding (rencoding="utf-8", wencoding="utf-8", include_header=False, callback=None)

This method is similar to split() above, except that the encoding of the splits can be specified explicitly. This is helpful when the file chunks must be in a specific encoding. It accepts two arguments in addition to those of split():

  • “rencoding” - encoding of the source file (default : ‘utf-8’)

  • “wencoding” - encoding of the output file chunks (default: ‘utf-8’)
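To show what reading with one encoding and writing with another amounts to, here is a minimal standard-library sketch of the transcoding step alone. The function `transcode` is hypothetical, performs no splitting, and is not how the module is necessarily implemented.

```python
def transcode(src_path, dst_path, rencoding="utf-8", wencoding="utf-8"):
    """Copy a text file line by line, decoding with `rencoding` and
    encoding with `wencoding`. Illustrative sketch only."""
    # newline="" keeps the original line endings on both read and write.
    with open(src_path, "r", encoding=rencoding, newline="") as src, \
         open(dst_path, "w", encoding=wencoding, newline="") as dst:
        for line in src:
            dst.write(line)
```

With a FileSplit instance, the equivalent call would look like `fs.splitbyencoding(rencoding="latin-1", wencoding="utf-8")`. Note that the same text can occupy a different number of bytes after re-encoding, so output sizes may not match the source byte for byte.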
