PipelineProcessor
Overview
A framework for processing text through a pipeline of functions, built with classes and objects.
Installation
Install with Poetry:
poetry add PipelineProcessor
Usage
Use in command line
# Usage: PipelineProcessor [OPTIONS] INPUT_FILENAME YML_PATH
#
# Arguments:
#   INPUT_FILENAME  [required]
#   YML_PATH        [required]
#
# Options:
#   --additional-function-path TEXT
#   --output-filename TEXT
#   --install-completion [bash|zsh|fish|powershell|pwsh]
#                                   Install completion for the specified shell.
#   --show-completion [bash|zsh|fish|powershell|pwsh]
#                                   Show completion for the specified shell, to
#                                   copy it or customize the installation.
#   --help                          Show this message and exit.
# example usage
PipelineProcessor /path/to/input /path/to/pipeline --additional-function-path /path/to/additional_functions --output-filename /path/to/output
Sample pipeline.yml file
pipeline:
  - stream_lower_case
  - coalesce_empty_lines
  - stream_uk_to_us
  - break_lines:
      kwargs: # Use a kwargs dictionary to pass arguments
        max_length: 25
  - number_the_lines
  - stream_capitalized
  - custom_function
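The pipeline file above is a list of steps, where each step is either a bare function name or a mapping with a kwargs dictionary. As a rough illustration of how such a list could be applied to a stream of lines, here is a minimal sketch. It is not the package's implementation: `REGISTRY`, `run_pipeline`, and the two stand-in functions are hypothetical names written for this example, and the steps are given as the Python structure a YAML parser would produce.

```python
from typing import Any, Callable, Dict, Iterator, List, Union


# Hypothetical stand-ins for two of the listed functions (not the package's code).
def stream_lower_case(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        yield line.lower()


def break_lines(lines: Iterator[str], max_length: int = 20) -> Iterator[str]:
    for line in lines:
        text = line.rstrip("\n")
        if not text:
            yield "\n"
            continue
        # Cut the line into chunks of at most max_length characters.
        for start in range(0, len(text), max_length):
            yield text[start:start + max_length] + "\n"


# Hypothetical name-to-function lookup table.
REGISTRY: Dict[str, Callable[..., Iterator[str]]] = {
    "stream_lower_case": stream_lower_case,
    "break_lines": break_lines,
}

Step = Union[str, Dict[str, Dict[str, Any]]]


def run_pipeline(lines: Iterator[str], steps: List[Step]) -> Iterator[str]:
    """Chain the named generator functions, feeding each one the previous output."""
    for step in steps:
        if isinstance(step, str):
            name, kwargs = step, {}
        else:
            # A mapping step has one key (the function name) and an optional kwargs dict.
            name, options = next(iter(step.items()))
            kwargs = (options or {}).get("kwargs", {})
        lines = REGISTRY[name](lines, **kwargs)
    return lines


steps: List[Step] = ["stream_lower_case", {"break_lines": {"kwargs": {"max_length": 5}}}]
result = list(run_pipeline(iter(["Hello World\n"]), steps))
```

Because every step is a generator, nothing is read from the input until the final iterator is consumed, so a long file can be processed line by line.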
List of available functions
- number_the_lines: adds the line number as a prefix to each line.
- coalesce_empty_lines: collapses consecutive empty lines into a single empty line.
- remove_empty_lines: removes all empty lines.
- remove_even_lines: removes all even-numbered lines.
- break_lines: breaks a single long line into shorter lines (default length 20).
- stream_remove_stop_words: removes the words "a", "an", "the", "and", and "or", assuming the text contains no punctuation and the words are separated by single spaces.
- stream_capitalize: capitalizes the words in the file.
- stream_fetch_geo_ip: collects the city, region, and country as comma-separated values from https://ipinfo.io/{ip_number}/geo .
- stream_upper_case: converts the words in the file to upper case.
- stream_lower_case: converts the words in the file to lower case.
- stream_uk_to_us: uses regular expressions to convert any word ending in "sation" to "zation", in lower case.
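To make the descriptions above concrete, here is a minimal sketch of how three of the listed functions might behave. These are illustrative reimplementations based only on the descriptions, not the package's source. The `"1: "` prefix format and the exact regular expression are assumptions.

```python
import re
from typing import Iterator


def number_the_lines(lines: Iterator[str]) -> Iterator[str]:
    # Assumed prefix format: "<number>: ".
    for number, line in enumerate(lines, start=1):
        yield f"{number}: {line}"


def coalesce_empty_lines(lines: Iterator[str]) -> Iterator[str]:
    previous_empty = False
    for line in lines:
        is_empty = line.strip() == ""
        # Skip an empty line when the line before it was also empty.
        if is_empty and previous_empty:
            continue
        previous_empty = is_empty
        yield line


def stream_uk_to_us(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        # Lower-case the line, then rewrite words ending in "sation" to "zation".
        yield re.sub(r"(\w)sation\b", r"\1zation", line.lower())


numbered = list(number_the_lines(iter(["a\n", "b\n"])))
coalesced = list(coalesce_empty_lines(iter(["a\n", "\n", "\n", "b\n"])))
converted = list(stream_uk_to_us(iter(["Organisation day\n"])))
```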
Example of custom function creation
# additional function
from typing import Iterator


def custom_function(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        new_line = line.strip() + " <new custom string> \n"
        yield new_line
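A custom function of this shape can be checked on its own before it is added to a pipeline, since it only needs an iterator of strings. The snippet below is a small sketch that repeats the function so it runs standalone. The sample input lines are made up for illustration.

```python
from typing import Iterator


def custom_function(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        # Strip surrounding whitespace, then append the marker and a newline.
        yield line.strip() + " <new custom string> \n"


sample = iter(["first line\n", "  second line  \n"])
output = list(custom_function(sample))
```

Note that `strip()` removes leading whitespace as well as the trailing newline, so any indentation in the input is lost.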
Extension
To extend the functionality, add your own custom functions or replace existing ones. Example:
# In your main module
# Instantiate logger
# Instantiate FileHandler, Processor and FunctionRepositories
import logging
from logging import Logger
from PipelineProcessor.StreamFunctionRepository import StreamFunctionRepository
from PipelineProcessor.FileHandler import FileHandler
from PipelineProcessor.Processor import Processor
from PipelineProcessor.BasicStreamFunctionRepository import BasicStreamBasicFunctionRepository
from PipelineProcessor.YmlConfigLoader import YmlConfigLoader
# logging
logger: Logger = logging.getLogger()
# instantiate File handler
file_handler: FileHandler = FileHandler(logger=logger, input_filename='input/file/path', output_filename='output/file/path')
# instantiate Yml Config Loader
yml_config_loader: YmlConfigLoader = YmlConfigLoader(logger=logger, yml_path='yml/path')
# instantiate Function repositories
stream_repository: StreamFunctionRepository = StreamFunctionRepository()
extended_stream_repository: BasicStreamBasicFunctionRepository = BasicStreamBasicFunctionRepository()
# instantiate processor
processor: Processor = Processor(
    logger=logger,
    io_handler=file_handler,
    config_loader=yml_config_loader,
    function_repositories=[stream_repository, extended_stream_repository],
)
# call processor
processor.stream_process(additional_function_path='additional/function/path')
# additional function
from typing import Iterator


def custom_function(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        new_line = line.strip() + " <new custom string> \n"
        yield new_line
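The README does not describe how the additional function path is resolved internally. As one plausible approach, a function can be loaded by name from a Python file using the standard `importlib` machinery. The sketch below assumes the path points at a single `.py` file. `load_module_from_path` and the temporary file are hypothetical and exist only for this example.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(path: str, module_name: str = "additional_functions") -> ModuleType:
    """Import a Python source file from an arbitrary location."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Source of the custom function, written to a temporary file for the demo.
SOURCE = '''
from typing import Iterator


def custom_function(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        yield line.strip() + " <new custom string> \\n"
'''

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "additional_functions.py"
    path.write_text(SOURCE)
    module = load_module_from_path(str(path))
    # Look the function up by the name used in the pipeline file.
    custom = getattr(module, "custom_function")
    loaded_output = list(custom(iter(["hello\n"])))
```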
File details
Details for the file pipelineprocessor-0.1.1.tar.gz.
File metadata
- Download URL: pipelineprocessor-0.1.1.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.7 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1834396af26fa4cc8ad0e6346b032ba545e48b32f9843265f8217cbd31ccf365 |
| MD5 | 5703909ce35eac2c34f6c86d718171da |
| BLAKE2b-256 | c8007d32d7a9efb4ba7eb2a6ab53add4e76c5375ad2a5686fd0c36fece03ec60 |
File details
Details for the file pipelineprocessor-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pipelineprocessor-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.11.7 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d78f498259000a3260500bfb21353de81a3ae1fd3a4b2192ee12206d9730f61c |
| MD5 | 35e9348bcd401d5687aba113d3ccebb6 |
| BLAKE2b-256 | 80909783f5fc13c2b21bac9bdf4011bb5fc18772275ae90cb9761541bbe57e3e |