Accelerate your processing pipeline

Project description

pipeline-turbo is a package that accelerates your processing pipeline. It uses multi-threading behind the scenes and has been used successfully for both CPU and GPU tasks.

The only prerequisite is to wrap the work for a single item in a function, and to set the number of threads according to your resource availability.
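Conceptually, the package maps your single-item function across an iterable using a pool of threads. Here is a minimal sketch of that idea using only the standard library (an illustration of the concept, not pipeline-turbo's actual implementation; `process_item` is a hypothetical stand-in for your function):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical single-item function: wrap the work for ONE item like this.
def process_item(x):
    return x * 2

items = list(range(10))

# Fan the items out across a pool of threads; map() keeps results in order.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(process_item, items))

print(results)  # [0, 2, 4, ..., 18]
```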

Read more about threading here: https://www.activestate.com/blog/how-to-manage-threads-in-python/

Installation

Use the package manager pip to install pipeline-turbo:

pip install pipeline-turbo

Example Usage

# let's get some data for processing
sentences = [
    "Nevertheless, Trump and other Republicans have tarred the protests as havens for terrorists intent on destroying property.",
    "Billie Eilish issues apology for mouthing an anti-Asian derogatory term in a resurfaced video.",
    "Christians should make clear that the perpetuation of objectionable vaccines and the lack of alternatives is a kind of coercion.",
    "There have been a protest by a group of people",
    "While emphasizing he’s not singling out either party, Cohen warned about the danger of normalizing white supremacist ideology.",
]

sentences = sentences * 100

# Create your process - here is an example of running a bias detection model across a few sentences
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("d4data/bias-detection-model")
model = TFAutoModelForSequenceClassification.from_pretrained("d4data/bias-detection-model")
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer) 

def bias_classification(text):
    out = classifier(text)
    #label_ = out[0]['label']
    #probability_ = out[0]['score']
    
    return out 

# without turbo: loop over all the sentences (the normal method)
out_list = []
for sent in sentences:
    out = bias_classification(sent)
    out_list.append(out)
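To see why threading pays off for workloads like this, here is a hedged timing sketch with a toy stand-in for the classifier (`time.sleep` simulates a latency-bound call such as model inference; absolute numbers will differ on your machine):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_task(x):
    time.sleep(0.05)  # stand-in for a latency-bound call (e.g. model inference)
    return x + 1

items = list(range(20))

# Plain loop: tasks run one after another (~20 * 0.05 s).
start = time.perf_counter()
sequential = [slow_task(x) for x in items]
seq_time = time.perf_counter() - start

# Thread pool: up to 5 tasks wait concurrently (~20 / 5 * 0.05 s).
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    threaded = list(pool.map(slow_task, items))
thr_time = time.perf_counter() - start

print(sequential == threaded)  # same results either way
print(thr_time < seq_time)     # threading wins for latency-bound work
```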

# with turbo, call the turbo_threading function
"""
1. Each of the item in 'sentences' list has to be iterated and that has to be defined as the first argument
2. It should be followed by the function and its other arguments (if there are additional arguments for the function)
3. Define the thread based on your resource availability (5, 10 would be ideal based on your resources)
"""
from pipeline_turbo.turbo import turbo_threading  # the function that does the threading
turbo_out = turbo_threading(sentences, bias_classification, num_threads=5)

"""
Note: You can pass any number of arguments inside the function, but the iterable list has to be defined first
The performance varies based on the processing speed of your machine/compute
"""

About

This package was created by Deepak John Reji and Afreen Aman. It was first used to speed up some deep learning pipeline projects and was later open-sourced. It can be used for normal CPU processes as well.

License

MIT License

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pipeline_turbo-0.0.6.tar.gz (3.7 kB)

Uploaded Source

Built Distribution

pipeline_turbo-0.0.6-py3-none-any.whl (3.5 kB)

Uploaded Python 3

File details

Details for the file pipeline_turbo-0.0.6.tar.gz.

File metadata

  • Download URL: pipeline_turbo-0.0.6.tar.gz
  • Upload date:
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for pipeline_turbo-0.0.6.tar.gz
Algorithm Hash digest
SHA256 0d1c854f4cd801718fe606e046ed5adf3ec10317bdce287d054cc81755225c83
MD5 05942b3fff3ac35ad71842191f37360a
BLAKE2b-256 08bef9345f8a2f17c44dd42dd739d5c2b76769d72b44abd971f38896677fcb54

See more details on using hashes here.

File details

Details for the file pipeline_turbo-0.0.6-py3-none-any.whl.

File metadata

File hashes

Hashes for pipeline_turbo-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 ee617fee736d1f030d9f312b5c112fd31931c161f74e040b1134bcdc206dae84
MD5 f74707b1c5eb40f5f2f81b1f4a5b611b
BLAKE2b-256 713835f753df0b32085d1b5076521ff89afd0f829eeab8b9fc82bbc197db81f1
