Split & Merge utilities for large csv files.

Project description

Split-Merge Package

Split breaks a large CSV file into multiple smaller CSV files on your local disk so that they are easier to process. Merge combines those smaller files back into one large file. This is a first sample version.

Limitations

For now, split files are created with the marker "splitted" in their names. Make sure that your original file name does not contain the same naming pattern.
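
A quick way to guard against this limitation is to test a file name before splitting. The sketch below assumes chunk files follow the pattern N__<name>__splitted_.csv shown in this description. The helper name looks_like_split_file is hypothetical and is not part of the package.

```python
import re

# Assumed chunk naming pattern: "<N>__<name>__splitted_.csv".
# This is inferred from the naming example in this description;
# the package's own matching rules may differ.
SPLIT_NAME = re.compile(r"^\d+__.+__splitted_\.csv$")


def looks_like_split_file(file_name: str) -> bool:
    """Return True if file_name already follows the chunk naming pattern."""
    return SPLIT_NAME.match(file_name) is not None


print(looks_like_split_file("addr_det_20180112.csv"))
print(looks_like_split_file("1__addr_det_20180112__splitted_.csv"))
```

The first call reports a safe source name and the second reports a name that would collide with generated chunk files.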

For example, if your source file is named addr_det_20180112.csv, the split files will be named as follows:

  • 1__addr_det_20180112__splitted_.csv
  • 2__addr_det_20180112__splitted_.csv
  • N__addr_det_20180112__splitted_.csv

where N depends on the size of the file. By default, each chunk contains at most 30000 records.
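
The chunking and naming described above can be sketched with pandas as follows. This is a minimal illustration and not the package's actual implementation. The function name split_csv and its parameters are assumptions, and reading every column as text (dtype=str) is a choice made here so that values such as leading zeros are not altered.

```python
import os

import pandas as pd


def split_csv(src_file: str, out_dir: str, chunk_size: int = 30000) -> list:
    """Write src_file into numbered chunk files of at most chunk_size rows.

    Hypothetical helper: names follow "<N>__<name>__splitted_.csv".
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src_file))[0]
    written = []

    # chunksize makes read_csv yield DataFrames one chunk at a time,
    # so the whole file never has to fit in memory.
    reader = pd.read_csv(
        src_file, chunksize=chunk_size, dtype=str, keep_default_na=False
    )
    for n, chunk in enumerate(reader, start=1):
        out_path = os.path.join(out_dir, f"{n}__{base}__splitted_.csv")
        chunk.to_csv(out_path, index=False)  # each chunk keeps the header row
        written.append(out_path)
    return written
```

Calling split_csv("addr_det_20180112.csv", "temp") would write 1__addr_det_20180112__splitted_.csv, 2__addr_det_20180112__splitted_.csv, and so on into the temp directory, and return their paths.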

This package requires pandas to be installed in your Python environment. It also uses the regular expression module (re), which is part of the Python standard library.

Sample code showing how to use this library. You can save it as:

                     callSplitMergeFiles.py

import clsSplitFiles as t
import clsMergeFiles as cm
import re
import platform as pl
import os

def main():
    print("Calling the custom Package for large file splitting..")
    os_det = pl.system()

    print("Running on :", os_det)

    ###############################################################
    ###### User input and OS-specific paths                ########
    ###############################################################

    # Source file name, e.g. "addr_det_20180112.csv"
    srcF = input("Please enter the file name with extension:")

    # Remove all digits, then drop the trailing "_.csv" (5 characters) to get
    # the base name for the merge step: "addr_det_20180112.csv" -> "addr_det".
    # This assumes the name ends with "_<digits>.csv".
    base_name = re.sub(r'[0-9]','', srcF)
    srcFileInit = base_name[:-5]

    # Working sub-directory and script directory, using the OS path separator
    if os_det == "Windows":
        subdir = "\\temp\\"
        path = os.path.dirname(os.path.realpath(__file__)) + "\\"
    else:
        subdir = "/temp/"
        path = os.path.dirname(os.path.realpath(__file__)) + '/'

    ###############################################################
    ###### End of user input                                 ######
    ###############################################################

    x = t.clsSplitFiles(srcF, path, subdir)

    ret_val = x.split_files()

    if ret_val == 0:
        print("Splitting Successful!")
    else:
        print("Splitting Failure!")

    print("-"*30)

    print("Finally, Merging small splitted files to make the same big file!")

    y = cm.clsMergeFiles(srcFileInit)

    ret_val1 = y.merge_file()

    if ret_val1 == 0:
        print("Merge Successful!")
    else:
        print("Merge Failure!")

    print("-"*30)



if __name__ == "__main__":
    main()

             End Of Sample Code - callSplitMergeFiles.py
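
The merge step called at the end of the sample can be pictured with the sketch below. It is not the package's code: it uses the standard csv module instead of pandas so that rows are copied as text, and it assumes chunk files are named N__<name>__splitted_.csv. The function name merge_csv and its parameters are hypothetical.

```python
import csv
import glob
import os
import re

# Assumed chunk naming pattern: "<N>__<name>__splitted_.csv"
# (taken from the naming example in this description).
CHUNK_NAME = re.compile(r"^(\d+)__(.+)__splitted_\.csv$")


def merge_csv(chunk_dir: str, out_file: str) -> int:
    """Concatenate chunk files in numeric order, keeping a single header row.

    Returns the number of data rows written.
    """
    chunks = []
    for path in glob.glob(os.path.join(chunk_dir, "*__splitted_.csv")):
        match = CHUNK_NAME.match(os.path.basename(path))
        if match:
            chunks.append((int(match.group(1)), path))
    # Sort on the numeric prefix so chunk 10 follows chunk 9, not chunk 1.
    chunks.sort()

    rows = 0
    with open(out_file, "w", newline="") as dst:
        writer = csv.writer(dst)
        for i, (_, path) in enumerate(chunks):
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                # Only the first chunk contributes the header row.
                if i == 0 and header is not None:
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)
                    rows += 1
    return rows
```

Sorting on the integer prefix matters once there are ten or more chunks, because a plain alphabetical sort would place 10__... before 2__... and scramble the row order of the merged file.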

Download files

Source Distribution

SplitMerge-0.0.1.post1.tar.gz (2.3 kB)

Uploaded Source

Built Distribution

SplitMerge-0.0.1.post1-py3-none-any.whl (3.1 kB)

Uploaded Python 3

File details

Details for the file SplitMerge-0.0.1.post1.tar.gz.

File metadata

  • Download URL: SplitMerge-0.0.1.post1.tar.gz
  • Upload date:
  • Size: 2.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for SplitMerge-0.0.1.post1.tar.gz

Algorithm    Hash digest
SHA256       54ecfbf576f5ed93876944227b2e9d93a990f4ff1554400e8ed1fa36cb205d88
MD5          09518a512fe0443346eefbb727750a9f
BLAKE2b-256  50dc34d7c7bddecea1f897b662ce3beaad017a1f3fcbb97b989df7b98125d049

File details

Details for the file SplitMerge-0.0.1.post1-py3-none-any.whl.

File metadata

  • Download URL: SplitMerge-0.0.1.post1-py3-none-any.whl
  • Upload date:
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.1

File hashes

Hashes for SplitMerge-0.0.1.post1-py3-none-any.whl

Algorithm    Hash digest
SHA256       579462ca2a7a457767bc02e5a8d806bf810511f3410a55ffd839f9597d98687d
MD5          00f5dc070f979593011e79af4b7296e4
BLAKE2b-256  da93e0fca5c00d11f1d865eeb76d64beb6797e484d21f3f2797ee159b5c7f541
