A class to handle and process multiple files with identical structures within a directory.

scify-file-reader

The scify-file-reader package provides a convenient class for handling multiple files with the same structure in a directory. It offers functionality to read and process data from various file types, including CSV, XLSX, Parquet, and JSON.

Installation

You can install scify-file-reader using pip:

pip install scify-file-reader

Usage

To use scify-file-reader, follow these steps:

  1. Import the FileReader class:
from scify_file_reader import FileReader
  2. Create an instance of the FileReader class, providing the content you want to read. The content can be a string representing a file path, a Path object, or a zipfile.ZipFile object:
content = 'path/to/directory'

reader = FileReader(content)
  3. Read the files using the read_files method:
data = reader.read_files()

The read_files method returns a dictionary where the keys are the filenames (without the extension) and the values are pandas DataFrames containing the file data.
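The shape of that return value can be illustrated with a minimal standard-library sketch. This is not the package's implementation: read_csv_dir is a hypothetical helper, it handles only CSV files, and it stores each file as a list of row dictionaries rather than a pandas DataFrame.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_dir(directory):
    """Map each CSV file's stem to its rows (hypothetical helper, not the package API)."""
    result = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            # Each row becomes a dict keyed by the column names in the header line.
            result[path.stem] = list(csv.DictReader(handle))
    return result


# Build a throwaway directory with two files that share the same columns.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "file_1.csv").write_text("id,value\n1,a\n2,b\n")
    Path(tmp, "log_2.csv").write_text("id,value\n3,c\n")
    data = read_csv_dir(tmp)

print(sorted(data))  # ['file_1', 'log_2']
```

Each key is the file name without its extension, and each value holds that file's contents, which mirrors the dictionary described above.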

For more details on the available methods and parameters, refer to the package documentation.

Examples:

Here's an example that demonstrates how to use scify-file-reader:

Normal Output

from scify_file_reader import FileReader

PATH = '/path/to/directory'

"""
# Let's suppose we have these files inside our directory.
print(os.listdir(PATH))
# OUT: ['file_1.csv', 'log_2.csv', 'test_3.csv',
        'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
        'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']
"""

# Example: Reading files from a directory
reader = FileReader(PATH)
data = reader.read_files()  # read_files accepts kwargs from the pandas read_ methods

"""
OUTPUT: print(data)
{
    'file_1.csv': <pd.DataFrame>,
    'log_2.csv': <pd.DataFrame>,
    'test_3.csv': <pd.DataFrame>,
    'file_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
    'log_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
    'test_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
    'file_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
    'log_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
    'test_%Y%m%d_%H%M%S.csv': <pd.DataFrame>
}
"""

Concatenating patterns:

Use the join_prefixes argument when you need to concatenate multiple files with similar naming patterns into a single consolidated DataFrame.

The previous example used a directory containing 9 files that follow three common naming prefixes: 'file', 'log', and 'test'. Joining the files that share a prefix consolidates their data under one key per prefix, which makes it easier to analyze. The example below shows how they are joined.

from scify_file_reader import FileReader

PATH = '/path/to/directory'

"""
# Let's suppose we have these files inside our directory.
print(os.listdir(PATH))
# OUT: ['file_1.csv', 'log_2.csv', 'test_3.csv',
        'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
        'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']
"""

# Example: Joining files that share a prefix
reader = FileReader(PATH)
data = reader.read_files(join_prefixes=True)  # one DataFrame per prefix

"""
OUTPUT: print(data)
{
    'file': <pd.DataFrame>,
    'log': <pd.DataFrame>,
    'test': <pd.DataFrame>,
}
"""

Using a specific regular expression

In the example above, all files with common prefixes, such as file_1.csv, file_%Y%m%d%H%M%S.csv, and file_%Y%m%d_%H%M%S.csv, were joined together under the file key in the output.

If you want to use a specific regular expression for filtering your files, you can follow these steps:

from scify_file_reader import FileReader

PATH = '/path/to/directory'

# Example: Joining files with a custom prefix pattern
reader = FileReader(PATH)

regex = '<some_regex>'
reader.set_prefix_file_pattern_regex(regex)

data = reader.read_files(join_prefixes=True)

By default, the regular expression is: ^([A-Z]+)_\d+
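To make the role of the capture group concrete, here is a minimal sketch of how a pattern like the default one could map file names to prefixes. It is not the package's code: group_by_prefix is a hypothetical helper, the timestamped names are made-up instances of the %Y%m%d%H%M%S placeholders, and re.IGNORECASE is an assumption added so that lowercase names such as file_1 match the [A-Z]+ group.

```python
import re
from collections import defaultdict

# The default pattern quoted above. IGNORECASE is an assumption of this sketch
# so that lowercase names such as 'file_1' match the [A-Z]+ group.
PREFIX_PATTERN = re.compile(r"^([A-Z]+)_\d+", re.IGNORECASE)


def group_by_prefix(names, pattern=PREFIX_PATTERN):
    """Group file names by the first capture group of `pattern` (hypothetical helper)."""
    groups = defaultdict(list)
    for name in names:
        match = pattern.match(name)
        # Names that do not match the pattern stay under their own name.
        key = match.group(1) if match else name
        groups[key].append(name)
    return dict(groups)


names = [
    "file_1.csv",
    "file_20230101120000.csv",
    "file_20230101_120000.csv",
    "log_2.csv",
    "notes.csv",
]
groups = group_by_prefix(names)
print(groups["file"])
```

The first capture group becomes the dictionary key, so a custom pattern passed to set_prefix_file_pattern_regex would presumably need a capture group that isolates the prefix in the same way.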

Specific prefixes instead of regular expressions

If you prefer to use specific prefixes instead of regular expressions, you can utilize the join_custom_prefixes argument. This argument accepts a tuple of prefixes that you want to join together.

from scify_file_reader import FileReader

PATH = '/path/to/directory'

"""
# Let's suppose we have these files inside our directory.
print(os.listdir(PATH))
# OUT: ['file_1.csv', 'log_2.csv', 'test_3.csv',
        'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
        'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv']
"""

# Example: Joining files by specific prefixes
reader = FileReader(PATH)

specific_prefixes = ('file', 'log', 'test')

# Pass the tuple through the join_custom_prefixes argument described above
data = reader.read_files(join_prefixes=True, join_custom_prefixes=specific_prefixes)

"""
OUTPUT: print(data)
{
    'file': <pd.DataFrame>,
    'log': <pd.DataFrame>,
    'test': <pd.DataFrame>,
}
"""
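The effect of joining by explicit prefixes can be sketched with plain Python data. This is an illustration only, not the package's implementation: join_by_prefixes is a hypothetical helper, each table is a list of row dictionaries instead of a DataFrame, and keeping unmatched names under their own key is an assumption of the sketch.

```python
def join_by_prefixes(tables, prefixes):
    """Concatenate row lists whose key starts with one of `prefixes` (hypothetical helper)."""
    joined = {}
    for name, rows in tables.items():
        # Use the first matching prefix as the key; keep unmatched names unchanged.
        key = next((p for p in prefixes if name.startswith(p)), name)
        joined.setdefault(key, []).extend(rows)
    return joined


tables = {
    "file_1": [{"id": 1}],
    "file_20230101120000": [{"id": 2}],
    "log_2": [{"id": 3}],
    "notes": [{"id": 4}],
}
joined = join_by_prefixes(tables, ("file", "log", "test"))
print(joined)
```

A tuple of literal prefixes avoids writing a regular expression when the groups are known in advance, at the cost of having to list every prefix explicitly.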

Contributing

Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request on the scify-file-reader repository.
