
PipelineProcessor

Overview

A framework for processing text files through a configurable pipeline of stream functions, built with classes and objects.

Installation

Install with Poetry:

poetry add PipelineProcessor

Usage

Command-line usage

#Usage: PipelineProcessor [OPTIONS] INPUT_FILENAME YML_PATH

#Arguments:
#  INPUT_FILENAME  [required]
#  YML_PATH        [required]
#
#Options:
#  --additional-function-path TEXT
#  --output-filename TEXT
#  --install-completion [bash|zsh|fish|powershell|pwsh]
#                                  Install completion for the specified shell.
#  --show-completion [bash|zsh|fish|powershell|pwsh]
#                                  Show completion for the specified shell, to
#                                  copy it or customize the installation.
#  --help                          Show this message and exit.

# example usage

PipelineProcessor /path/to/input /path/to/pipeline --additional-function-path /path/to/additional_functions --output-filename /path/to/output

Sample pipeline.yml file

pipeline:
   - stream_lower_case
   - coalesce_empty_lines
   - stream_uk_to_us
   - break_lines:
      kwargs: # Use a kwargs dictionary to pass arguments
         max_length: 25
   - number_the_lines
   - stream_capitalized
   - custom_function
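Conceptually, each entry under `pipeline:` names a stream function, and the functions are applied to the input in order, each consuming the previous one's output. A minimal sketch of that chaining idea (the function bodies and the `run_pipeline` helper are illustrative, not the library's actual implementation):

```python
from typing import Callable, Iterator

# Two sketch stream functions in the style the pipeline expects:
# each takes an iterator of lines and yields transformed lines.
def stream_lower_case(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        yield line.lower()

def number_the_lines(lines: Iterator[str]) -> Iterator[str]:
    for number, line in enumerate(lines, start=1):
        yield f"{number} {line}"

def run_pipeline(lines: Iterator[str],
                 steps: list[Callable[[Iterator[str]], Iterator[str]]]) -> Iterator[str]:
    # Wrap each step around the previous one's output, lazily.
    for step in steps:
        lines = step(lines)
    return lines

result = list(run_pipeline(iter(["Hello\n", "World\n"]),
                           [stream_lower_case, number_the_lines]))
# result == ["1 hello\n", "2 world\n"]
```

Because every step is a generator, lines flow through the whole chain one at a time rather than being materialized between steps.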

List of available functions

  1. number_the_lines: adds the line number as a prefix to each line.
  2. coalesce_empty_lines: collapses runs of empty lines into a single empty line.
  3. remove_empty_lines: removes all empty lines.
  4. remove_even_lines: removes all even-numbered lines.
  5. break_lines: breaks a single long line into shorter lines (default length 20).
  6. stream_remove_stop_words: removes the words "a", "an", "the", "and", and "or", assuming the text contains no punctuation and words are separated by single spaces.
  7. stream_capitalize: capitalizes the words in the file.
  8. stream_fetch_geo_ip: fetches city, region, and country as comma-separated values from https://ipinfo.io/{ip_number}/geo .
  9. stream_upper_case: uppercases the words in the file.
  10. stream_lower_case: like stream_upper_case, but lowercases instead.
  11. stream_uk_to_us: uses a regular expression to convert words ending in "sation" to "zation", in lower case.
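As a concrete illustration of the stream-function shape, here is a sketch of what `stream_uk_to_us` could look like based on the description above (this is an assumption about its behavior, not the library's actual source):

```python
import re
from typing import Iterator

def stream_uk_to_us(lines: Iterator[str]) -> Iterator[str]:
    # Lowercase each line, then rewrite the British "-sation" word ending.
    for line in lines:
        yield re.sub(r"sation\b", "zation", line.lower())

out = list(stream_uk_to_us(iter(["Organisation and Globalisation\n"])))
# out == ["organization and globalization\n"]
```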

Example of custom function creation

#additional function
from typing import Iterator

def custom_function(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        new_line = line.strip() + " <new custom string> \n"
        yield new_line
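Fed an iterator of lines, the generator above strips each line and appends the marker string. A quick check (restating the function so the snippet is self-contained):

```python
from typing import Iterator

def custom_function(lines: Iterator[str]) -> Iterator[str]:
    for line in lines:
        # Strip the trailing newline, append the marker, re-add the newline.
        yield line.strip() + " <new custom string> \n"

print(list(custom_function(iter(["hello\n", "world\n"]))))
# ['hello <new custom string> \n', 'world <new custom string> \n']
```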

Extension

If you want to extend the functionality, you can register your own custom functions alongside the built-in repositories. Example:

# In your main module:
# instantiate a logger, then the FileHandler, YmlConfigLoader,
# function repositories, and Processor
import logging
from logging import Logger
from PipelineProcessor.StreamFunctionRepository import StreamFunctionRepository
from PipelineProcessor.FileHandler import FileHandler
from PipelineProcessor.Processor import Processor
from PipelineProcessor.BasicStreamFunctionRepository import BasicStreamBasicFunctionRepository
from PipelineProcessor.YmlConfigLoader import YmlConfigLoader

# logging
logger: Logger = logging.getLogger()
# instantiate File handler
file_handler: FileHandler = FileHandler(logger=logger, input_filename='input/file/path', output_filename='output/file/path')
# instantiate Yml Config Loader
yml_config_loader: YmlConfigLoader = YmlConfigLoader(logger=logger, yml_path='yml/path')

# instantiate Function repositories
stream_repository: StreamFunctionRepository = StreamFunctionRepository()
extended_stream_repository: BasicStreamBasicFunctionRepository = BasicStreamBasicFunctionRepository()

# instantiate processor
processor: Processor = Processor(logger=logger, io_handler=file_handler,
                                 config_loader=yml_config_loader,
                                 function_repositories=[stream_repository, extended_stream_repository])

# call processor
processor.stream_process(additional_function_path='additional/function/path')
