PineconeUtils

PineconeUtils is a Python module for processing data for embedding and indexing with Pinecone, Cohere, and OpenAI services. It simplifies loading, chunking, preparing, and upserting data into a Pinecone index, which makes it well suited to text-embedding applications and retrieval-augmented generation (RAG) systems.

Features

  • Load text data from .txt, .docx, and .pdf files.
  • Chunk text data for processing.
  • Prepare embeddings using either Cohere or OpenAI models.
  • Upsert prepared data into a Pinecone index.

Installation

To install PineconeUtils, you can use pip:

pip install pineconeutils

Usage

Here's a quick example of how to use PineconeUtils:

Setup

First, ensure you have the necessary API keys and setup information:

pinecone_api_key = "your_pinecone_api_key"
cohere_api_key = "your_cohere_api_key"
openai_api_key = "your_openai_api_key"
index_name = "your_index_name"
namespace_id = "your_namespace_id"
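Hard-coding keys is fine for a quick test, but you may prefer to read them from the environment. The following is a minimal sketch, not part of PineconeUtils: the environment variable names and the `load_settings` helper are hypothetical, and only the keyword names match the constructor arguments used in this README.

```python
import os
from typing import Dict, Mapping

# Maps PineconeUtils keyword arguments (as used in this README) to
# environment variable names. The variable names are assumptions;
# use whatever your deployment defines.
REQUIRED = {
    "pinecone_api_key": "PINECONE_API_KEY",
    "cohere_api_key": "COHERE_API_KEY",
    "openai_api_key": "OPENAI_API_KEY",
    "index_name": "PINECONE_INDEX_NAME",
    "namespace_id": "PINECONE_NAMESPACE_ID",
}


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect all settings from `env`, failing early if any are missing."""
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in REQUIRED.items()}
```

The returned dictionary can then be unpacked into the constructor, for example `PineconeUtils(**load_settings())`.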

Load Data

Load data from a supported file format:

from pineconeutils import PineconeUtils

# Create an instance of PineconeUtils
pinecone = PineconeUtils(
    pinecone_api_key=pinecone_api_key,
    openai_api_key=openai_api_key,
    cohere_api_key=cohere_api_key,
    index_name=index_name,
    namespace_id=namespace_id,
)

path = "path_to_your_file.docx"
data = pinecone.load_data(path)
print("Loaded Data:", data)
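This README does not say how `load_data` behaves when given an unsupported file type, so it can be useful to check the extension yourself first. The sketch below is an assumption of convenience, not library behavior: `check_supported` is a hypothetical helper, and the suffix set is taken from the Features list above.

```python
from pathlib import Path

# File types listed under Features in this README.
SUPPORTED_SUFFIXES = {".txt", ".docx", ".pdf"}


def check_supported(path: str) -> Path:
    """Return `path` as a Path, or raise if its extension is not supported."""
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(
            f"Unsupported file type {p.suffix!r}; expected one of "
            + ", ".join(sorted(SUPPORTED_SUFFIXES))
        )
    return p
```

You could then call `pinecone.load_data(str(check_supported(path)))` to fail early with a clear message.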

Process Data

Chunk and prepare data for embedding:

For OpenAI

chunks = pinecone.chunk_data(data, chunk_size=100, chunk_overlap=10)
print("Data Chunks:", chunks)

prepared_data = pinecone.prepare_data(chunks, model="text-embedding-ada-002", service="openai")

For Cohere

chunks = pinecone.chunk_data(data, chunk_size=100, chunk_overlap=10)
print("Data Chunks:", chunks)

prepared_data = pinecone.prepare_data(chunks, model="embed-english-v3.0", service="cohere", input_type="search_document")

For more about Cohere embeddings, see the Cohere Embeddings documentation.
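To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of overlapping chunking. It is not the implementation used by `chunk_data`: it assumes sizes are measured in characters (the library may count tokens or split on separators instead), and `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 10) -> List[str]:
    """Split `text` into windows of `chunk_size` characters, where each
    window repeats the last `chunk_overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop early enough that the last window is never pure overlap.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


sample = "".join(chr(ord("a") + i % 26) for i in range(250))
print([len(c) for c in chunk_text(sample)])  # [100, 100, 70]
```

With the defaults, each window starts 90 characters after the previous one, so neighbouring chunks share 10 characters of context.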

Upsert Data

Upsert the prepared data into the Pinecone index:

successful = pinecone.upsert_data(prepared_data)
print("Data upsertion was", "successful" if successful else "unsuccessful")
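The four steps above can be combined into a single helper. This is a sketch under the assumption that the methods accept the arguments shown in this README; `ingest_file` is a hypothetical wrapper, not part of PineconeUtils, and it takes any object exposing those four methods.

```python
def ingest_file(
    client,
    path: str,
    *,
    model: str = "text-embedding-ada-002",
    service: str = "openai",
    chunk_size: int = 100,
    chunk_overlap: int = 10,
    **prepare_kwargs,
) -> bool:
    """Load, chunk, embed, and upsert one file using the calls shown above.

    `client` is expected to be a PineconeUtils instance. Extra keyword
    arguments (such as input_type for Cohere) are passed to prepare_data.
    """
    data = client.load_data(path)
    chunks = client.chunk_data(data, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    prepared = client.prepare_data(chunks, model=model, service=service, **prepare_kwargs)
    return bool(client.upsert_data(prepared))
```

For Cohere, the call would look like `ingest_file(pinecone, path, model="embed-english-v3.0", service="cohere", input_type="search_document")`.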

Development

To contribute to the development of PineconeUtils, you can clone the repository and submit pull requests.

Support

If you encounter any issues or have questions, please file an issue on the GitHub repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

pineconeutils-0.0.4.tar.gz (7.3 kB)

Built Distribution

pineconeutils-0.0.4-py3-none-any.whl (7.5 kB, Python 3)