A class to handle and process multiple files with identical structures within a directory.
Project description
scify-file-reader
The scify-file-reader package provides a convenient class for handling multiple files with the same structure in a directory. It offers functionality to read and process data from various file types, including CSV, XLSX, Parquet, and JSON.
Installation
You can install scify-file-reader using pip:
pip install scify-file-reader
Usage
To use scify-file-reader, follow these steps:
- Import the
FileReader
class:
from scify_file_reader import FileReader
- Create an instance of the FileReader class, providing the content you want to read. The content can be a string representing a
file path
, aPath
object, or azipfile.ZipFile
object:
content = 'path/to/directory'
reader = FileReader(content)
- Read the files using the read_files method:
data = reader.read_files()
The read_files
method returns a dictionary where the keys are the filenames (without the extension) and the values are pandas DataFrames containing the file data.
For more details on the available methods and parameters, refer to the package documentation.
Examples:
Here's an example that demonstrates how to use scify-file-reader:
Normal Output
from scify_file_reader import FileReader
PATH = '/path/to/directory'
"""
# Supomos que temos estes arquivos dentro do nosso diretório
print(os.listdir(PATH))
# OUT: ['file_1.csv'', 'log_2.csv', 'test_3.csv',
'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv',
"""
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
data = reader.read_files() # read_files accept kwargs from pandas read_ methods
"""
OUTPUT: print(data)
{
'file_1.csv': <pd.DataFrame>,
'log_2.csv': <pd.DataFrame>,
'test_3.csv': <pd.DataFrame>,
'file_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
'log_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
'test_%Y%m%d%H%M%S.csv': <pd.DataFrame>,
'file_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
'log_%Y%m%d_%H%M%S.csv': <pd.DataFrame>,
'test_%Y%m%d_%H%M%S.csv': <pd.DataFrame>
}
"""
Concatenating patterns:
Use this method when you need to concatenate multiple files with similar patterns into a single consolidated file.
E.g. In the last example, we demonstrate the use of scify-file-reader with a directory containing 9 files that follow common naming patterns, such as 'file', 'log', and 'test'. By joining these files, we can consolidate and analyze their data more effectively. Let's take a look at the example to understand how they are joined.
from scify_file_reader import FileReader
PATH = '/path/to/directory'
"""
# Let's suppose we have these files inside our directory.
print(os.listdir(PATH))
# OUT: ['file_1.csv'', 'log_2.csv', 'test_3.csv',
'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv',
"""
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
data = reader.read_files(join_prefixes=True) #
"""
OUTPUT: print(data)
{
'file': <pd.DataFrame>,
'log': <pd.DataFrame>,
'test': <pd.DataFrame>,
}
"""
Using a specific regular expression
In the example above, all files with common prefixes, such as file_1.csv
, file_%Y%m%d%H%M%S.csv
, and file_%Y%m%d_%H%M%S.csv
, were joined together under the file key in the output.
If you want to use a specific regular expression for filtering your files, you can follow these steps:
from scify_file_reader import FileReader
PATH = '/path/to/directory'
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
regex = '<some_regex>'
reader.set_prefix_file_pattern_regex(regex)
data = reader.read_files(join_prefixes=True)
By default the regular expression is ^([A-Z]+)_\d+
.
Speficic prefixes instead of regular expressions
If you prefer to use specific prefixes instead of regular expressions, you can utilize the join_custom_prefixes
argument. This argument accepts a tuple of prefixes that you want to join together.
from scify_file_reader import FileReader
PATH = '/path/to/directory'
"""
# Supomos que temos estes arquivos dentro do nosso diretório
print(os.listdir(PATH))
# OUT: ['file_1.csv'', 'log_2.csv', 'test_3.csv',
'file_%Y%m%d%H%M%S.csv', 'log_%Y%m%d%H%M%S.csv', 'test_%Y%m%d%H%M%S.csv',
'file_%Y%m%d_%H%M%S.csv', 'log_%Y%m%d_%H%M%S.csv', 'test_%Y%m%d_%H%M%S.csv',
"""
# Example: Reading files from a directory
reader = FileReader('/path/to/directory')
specific_prefixes = ('file', 'log', 'test')
data = reader.read_files(join_prefixes=True)
"""
OUTPUT: print(data)
{
'file': <pd.DataFrame>,
'log': <pd.DataFrame>,
'test': <pd.DataFrame>,
}
"""
Contributing
Contributions are welcome! If you have any suggestions, bug reports, or feature requests, please open an issue or submit a pull request on the scify-file-reader repository.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file scify-file-reader-0.0.2.tar.gz
.
File metadata
- Download URL: scify-file-reader-0.0.2.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | fbe78a53ad765f44f6ff8caa97874f8e24a42e8968a56af786d044a7d56299fa |
|
MD5 | 5bc6f8e41df7c087e935c8d77ab45c12 |
|
BLAKE2b-256 | 83a55f204d5864fa2b36537ff8b0457dbe01d43d2aedaffcc7e526d5a5be7bb1 |
File details
Details for the file scify_file_reader-0.0.2-py3-none-any.whl
.
File metadata
- Download URL: scify_file_reader-0.0.2-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 5d6d322f8f37f671aebac0914025c4fc9ed26950a2b2efece032a12c6ed8c219 |
|
MD5 | 18d198d5280a51ade1458fc4961c9bf4 |
|
BLAKE2b-256 | 7bf9df411496690062caea8366e61942161b375412c600acaafcd80f2974e15b |