
Split & Merge utilities for large CSV files.

Project description

Split-Merge Package

Split breaks a large CSV file on your local disk into multiple smaller CSV files for easier processing, and Merge combines those small files back into one large file. This is a first sample version.

Limitations

Split

Currently, Split marks every output file name with the suffix "splitted" (see the pattern below). Make sure your original file name does not already contain this naming pattern.

For example, if your source file is named customer_addr_20180112.csv, the split files will be named as follows:

1__customer_addr_20180112__splitted_.csv
2__customer_addr_20180112__splitted_.csv
....
N__customer_addr_20180112__splitted_.csv

Here N depends on the size of the file. By default, each chunk contains at most 30,000 records.
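The chunking and naming scheme described above can be sketched with pandas as follows. This is a minimal illustration, not the package's own clsSplitFiles code: the names `split_csv` and `SPLIT_MARKER` are hypothetical, and it assumes the file has a header row that should be repeated in every chunk.

```python
import os

import pandas as pd

# Hypothetical constant mirroring the "__splitted_" part of the names above
SPLIT_MARKER = "__splitted_"


def split_csv(src_path, out_dir, chunk_size=30000):
    """Hypothetical helper: split src_path into files of at most chunk_size rows.

    Files are named N__<stem>__splitted_.csv, with N starting at 1.
    Returns the chunk file paths in order.
    """
    stem = os.path.splitext(os.path.basename(src_path))[0]
    # Refuse names that already use the split pattern (see Limitations)
    if SPLIT_MARKER in stem:
        raise ValueError(f"{stem!r} already contains {SPLIT_MARKER!r}")
    os.makedirs(out_dir, exist_ok=True)

    paths = []
    # With chunksize, read_csv yields DataFrames of at most chunk_size rows
    for n, chunk in enumerate(pd.read_csv(src_path, chunksize=chunk_size), start=1):
        out_path = os.path.join(out_dir, f"{n}__{stem}{SPLIT_MARKER}.csv")
        chunk.to_csv(out_path, index=False)  # header is written to every chunk
        paths.append(out_path)
    return paths
```

For a 70,000-row customer_addr_20180112.csv and the default chunk size, this sketch would write three files holding 30,000, 30,000 and 10,000 rows. Reading with `chunksize` keeps only one chunk in memory at a time, which is the main reason to split a large file in the first place.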

Merge

Merge picks up the split files from the temp directory (see the directory structure at the bottom of this page) and creates a final merged file using this naming convention:

customer_addr_20180112_.csv

The final file is placed in the process directory. Refer to the directory structure at the bottom of this page for where to place the main calling script.
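The merge step can be sketched the same way. This is a hypothetical helper, not the package's clsMergeFiles code; it assumes the chunk files follow the N__<stem>__splitted_.csv pattern and each carries its own header row. The chunks are ordered by their numeric prefix rather than alphabetically, so that 10__ comes after 9__.

```python
import glob
import os
import re

import pandas as pd


def merge_csv(temp_dir, stem, out_path):
    """Hypothetical helper: concatenate N__<stem>__splitted_.csv files in order.

    Writes the result to out_path and returns the number of merged rows.
    """
    pattern = os.path.join(glob.escape(temp_dir), f"*__{glob.escape(stem)}__splitted_.csv")

    parts = []
    for path in glob.glob(pattern):
        m = re.match(r"(\d+)__", os.path.basename(path))
        if m:  # skip files without a numeric chunk prefix
            parts.append((int(m.group(1)), path))
    if not parts:
        raise FileNotFoundError(f"no split files match {pattern}")
    parts.sort()  # numeric order: 1, 2, ..., 9, 10

    merged = pd.concat([pd.read_csv(path) for _, path in parts], ignore_index=True)
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    merged.to_csv(out_path, index=False)
    return len(merged)
```

Sorting by the parsed integer matters once there are ten or more chunks: a plain string sort would place 10__ before 2__ and scramble the row order.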

This package requires pandas and a regular expression package to be installed in your Python environment.

Sample code for using this library. You can save it as:


                     callSplitMergeFiles.py

from SplitMerge.clsSplitFiles import clsSplitFiles
from SplitMerge.clsMergeFiles import clsMergeFiles
import re
import platform as pl
import os

def main():
    print("Calling the custom Package for large file splitting..")
    os_det = pl.system()

    print("Running on :", os_det)

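    ###############################################################
    ###### User Input & OS-specific Paths                  ########
    ###############################################################

    # Source CSV name, e.g. customer_addr_20180112.csv
    srcF = input("Please enter the file name with extension:").strip()
    # Drop all digits (the date stamp), then the trailing "_.csv" (5 chars),
    # giving the prefix the merge step looks for, e.g. customer_addr
    base_name = re.sub(r'[0-9]','', srcF)
    srcFileInit = base_name[:-5]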

    if os_det == "Windows":
        subdir = "\\temp\\"
        path = os.path.dirname(os.path.realpath(__file__)) + "\\"
    else:
        subdir = "/temp/"
        path = os.path.dirname(os.path.realpath(__file__)) + '/'

    ###############################################################
    ###### End Of User Input                                 ######
    ###############################################################

    ###############################################################
    ######             Beginning of Split Process            ######
    ###############################################################

    t = clsSplitFiles(srcF, path, subdir)
    ret_val = t.split_files()

    if ret_val == 0:
        print("Splitting Successful!")
    else:
        print("Splitting Failure!")

    print("-"*30)

    ###############################################################
    ######               End of Split Process                ######
    ###############################################################

    print("Finally, merging the small split files back into one large file!")

    ###############################################################
    ######             Beginning of Merge Process            ######
    ###############################################################

    y = clsMergeFiles(srcFileInit, path)
    ret_val1 = y.merge_file()

    if ret_val1 == 0:
        print("Merge Successful!")
    else:
        print("Merge Failure!")

    print("-"*30)

    ###############################################################
    ######               End of Merge Process                ######
    ###############################################################


if __name__ == "__main__":
    main()

             End Of Sample Code - callSplitMergeFiles.py
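The sample derives srcFileInit by deleting every digit and then cutting the last five characters, which only works for names shaped like name_YYYYMMDD.csv. The sketch below reproduces that expression next to a hypothetical alternative that drops the extension and then a trailing _<digits> stamp. Both function names are illustrative, and it is not confirmed that clsMergeFiles accepts a prefix produced any other way than the sample's.

```python
import os
import re


def sample_prefix(file_name):
    """Same steps as the sample script: remove all digits, then the last 5 chars."""
    base_name = re.sub(r"[0-9]", "", file_name)
    return base_name[:-5]


def stem_prefix(file_name):
    """Hypothetical alternative: strip the extension, then a trailing _<digits> stamp."""
    stem, _ext = os.path.splitext(os.path.basename(file_name))
    return re.sub(r"_\d+$", "", stem)


print(sample_prefix("customer_addr_20180112.csv"))  # customer_addr
print(stem_prefix("customer_addr_20180112.csv"))    # customer_addr
print(sample_prefix("customer_addr.csv"))           # customer_add (one letter lost)
print(stem_prefix("customer_addr.csv"))             # customer_addr
print(sample_prefix("q1_sales_20180112.csv"))       # q_sales (digit inside the name removed)
print(stem_prefix("q1_sales_20180112.csv"))         # q1_sales
```

For the date-stamped names this README uses, both give the same prefix; they differ once a name has no date stamp or contains digits of its own.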

Bug fixes:
  1. Module loading issue fixed.
  2. Source and target directories can now be chosen by the developer.

Dependency packages: install the following packages to run this package:

                pip install pandas
                pip install regex

The directory structure should look like this:

-> \callSplitMergeFiles.py
-> \process\
-> \src_file\
-> \temp\
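If the folders do not exist yet, a small helper like the one below can create them next to the calling script. The function name `ensure_layout` is hypothetical; the folder names are the three listed above, and using the script's own folder as the root mirrors how the sample code builds `path` from `__file__`.

```python
import os


def ensure_layout(root):
    """Hypothetical helper: create process/, src_file/ and temp/ under root.

    Existing folders are left untouched. Returns the three folder paths.
    """
    created = []
    for name in ("process", "src_file", "temp"):
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Inside callSplitMergeFiles.py it could be called as `ensure_layout(os.path.dirname(os.path.realpath(__file__)))` before the split step; because of `exist_ok=True`, running it on every start is harmless.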




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

SplitMerge-0.0.2.post3.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

SplitMerge-0.0.2.post3-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file SplitMerge-0.0.2.post3.tar.gz.

File metadata

  • Download URL: SplitMerge-0.0.2.post3.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for SplitMerge-0.0.2.post3.tar.gz
Algorithm Hash digest
SHA256 b36a1e92b7409c6b36f7b50cf1f5be5b9b2a04537778b8b8347fbcc71ee8c44d
MD5 3108bc04a5a2087f7afb20cfe39380c2
BLAKE2b-256 ca2d001581b0b1d0b30256042e02440a20cd635553af81983f893000d35e3c05


File details

Details for the file SplitMerge-0.0.2.post3-py3-none-any.whl.

File metadata

  • Download URL: SplitMerge-0.0.2.post3-py3-none-any.whl
  • Upload date:
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.2

File hashes

Hashes for SplitMerge-0.0.2.post3-py3-none-any.whl
Algorithm Hash digest
SHA256 3084641076ba698491cb004e8d94654a65e65c7fa28823f79b3ce39d5cfafa81
MD5 21d23a85f5957e9c7b89b27acdd02126
BLAKE2b-256 0e46cf57068194cca6ced34d09072fdf2fca98f4a1876ec37267b1bb4ed94da1

