Wrapper for nicely displaying progress bars for langchain embedding components when using multiprocessing or ray.

Langchain Progress

A module that adds a context manager to wrap langchain embedding elements to better handle progress bars. This is particularly useful when using ray or multiprocessing, where it lets all remotes/processes report to a single progress bar.

Installing

The library can be installed as a Python package from this repo (it will be released on PyPI in the future):

pip install git+https://github.com/wrmthorne/langchain-progress

How to Use

This context manager can be used in a single process or across distributed processes, such as with ray, to display the progress of generating embeddings using langchain. The ProgressManager context manager requires a langchain embedding object and optionally accepts a progress bar. If no progress bar is provided, a new one is created using tqdm. Note that if show_progress=True is set when instantiating an embeddings object, any internal progress bar created within that class will be replaced with one from langchain-progress.

The following is a simple example that first relies on the automatically generated progress bar and then passes an existing progress bar.

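from langchain_community.vectorstores import FAISS
from langchain_progress import ProgressManager
from tqdm import tqdm

# `embeddings` (a langchain embedding object) and `docs` are assumed to exist.

# Let ProgressManager create its own tqdm progress bar
with ProgressManager(embeddings):
    result = FAISS.from_documents(docs, embeddings)

# Pass in an existing progress bar
pbar = tqdm(total=len(docs))
with ProgressManager(embeddings, pbar):
    result = FAISS.from_documents(docs, embeddings)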

Ray Example

The real use case for this context manager is when using ray or multiprocessing to improve embedding speed. If show_progress=True is enabled for embeddings objects, a new progress bar is created for each process. The processes then compete for the terminal, and the bars are redrawn on every update from every process. This approach also doesn't allow reporting to a single progress bar across all remotes for a unified indication of progress. The ProgressManager context manager solves both problems:

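import numpy as np
import ray
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_progress import ProgressManager
from ray.experimental import tqdm_ray

@ray.remote(num_gpus=1)
def process_shard(shard, pbar):
    # The model name must be passed as a keyword argument
    embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
    with ProgressManager(embeddings, pbar):
        result = FAISS.from_documents(shard, embeddings)
    return result

# `docs` and `num_shards` are assumed to be defined already.
# Create a single ray progress bar shared by all remotes
remote_tqdm = ray.remote(tqdm_ray.tqdm)
pbar = remote_tqdm.remote(total=len(docs))

# Split the documents into shards and embed each shard in its own remote task
doc_shards = np.array_split(docs, num_shards)
vectors = ray.get([process_shard.remote(shard, pbar) for shard in doc_shards])

pbar.close.remote()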
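
To make the "single progress bar across all workers" idea concrete without needing ray or a GPU, here is a minimal sketch using only the standard library. It is not how langchain-progress or ray work internally: threads stand in for remote processes, and `SharedProgress` and `process_shard` are hypothetical names invented for this illustration. Each worker reports to one shared counter instead of drawing its own bar.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedProgress:
    """Hypothetical stand-in for one progress bar shared by all workers."""

    def __init__(self, total):
        self.total = total
        self.count = 0
        self._lock = threading.Lock()

    def update(self, n=1):
        # Serialize updates so concurrent workers never lose an increment.
        with self._lock:
            self.count += n


def process_shard(shard, progress):
    # Placeholder for real embedding work: one fake "vector" per text.
    vectors = []
    for text in shard:
        vectors.append([float(len(text))])
        progress.update(1)
    return vectors


docs = [f"document {i}" for i in range(10)]
num_shards = 3
# Round-robin split of the documents into shards.
shards = [docs[i::num_shards] for i in range(num_shards)]

progress = SharedProgress(total=len(docs))
with ThreadPoolExecutor(max_workers=num_shards) as pool:
    results = list(pool.map(lambda shard: process_shard(shard, progress), shards))

print(f"{progress.count}/{progress.total} items processed")
```

Because every worker calls the same `update` method, the total reflects overall progress rather than per-worker progress, which is the behaviour the ray example above is after.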

Tests

To run the test suite, run the following command from the root directory. Tests are skipped if the required optional libraries are not installed:

python -m unittest
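
For readers unfamiliar with how tests can be skipped when an optional library is missing, the following is a generic sketch of one common unittest pattern. It is an assumption about the general approach, not a copy of this project's test suite; the class and test names are hypothetical, and "ray" stands in for any optional dependency.

```python
import importlib.util
import unittest

# True only if the optional dependency can be found on the import path.
RAY_AVAILABLE = importlib.util.find_spec("ray") is not None


class TestOptionalDependency(unittest.TestCase):
    @unittest.skipUnless(RAY_AVAILABLE, "ray is not installed")
    def test_requires_ray(self):
        # Only runs when ray is installed; otherwise reported as skipped.
        import ray
        self.assertTrue(callable(ray.remote))

    def test_always_runs(self):
        # Has no optional dependencies, so it runs in every environment.
        self.assertEqual(1 + 1, 2)
```

With this pattern, `python -m unittest` reports the ray-dependent test as skipped rather than failed on machines without ray.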

Limitations

This wrapper cannot create progress bars for API-based embedding tools such as HuggingFaceInferenceAPIEmbeddings, because it relies on wrapping the texts supplied to the embeddings method. That is not possible when the texts are sent to a remote API for processing. This module also doesn't currently support all of langchain's embedding classes. If your embedding class isn't yet supported, please open an issue and I'll take a look when I get time.
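
The idea of "wrapping the texts supplied to the embeddings method" can be illustrated with a small self-contained sketch. This is not the library's actual implementation: `ToyEmbeddings`, `CountingProgress`, and `track_progress` are hypothetical names, and the toy embedder simply iterates over its input locally. The context manager temporarily replaces the method so that the texts pass through a generator that reports progress as each one is consumed.

```python
from contextlib import contextmanager


class CountingProgress:
    """Hypothetical minimal progress object with a tqdm-like update()."""

    def __init__(self):
        self.n = 0

    def update(self, n=1):
        self.n += n


class ToyEmbeddings:
    """Hypothetical local embedder that iterates over the texts itself."""

    def embed_documents(self, texts):
        return [[float(len(text))] for text in texts]


def _tracked(texts, progress):
    # Yield each text, then record it once the embedder asks for the next one.
    for text in texts:
        yield text
        progress.update(1)


@contextmanager
def track_progress(embedder, progress):
    original = embedder.embed_documents

    def wrapped(texts):
        return original(_tracked(texts, progress))

    # Shadow the class method with an instance attribute for the duration.
    embedder.embed_documents = wrapped
    try:
        yield embedder
    finally:
        # Remove the instance attribute so the class method is visible again.
        del embedder.embed_documents


embedder = ToyEmbeddings()
progress = CountingProgress()
with track_progress(embedder, progress):
    vectors = embedder.embed_documents(["a", "bb", "ccc"])
print(progress.n, vectors)
```

This only works because the embedder consumes the texts one at a time in the local process. When the texts are handed to a remote API, the local code never sees the items being processed, which matches the limitation described above.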
