
Project description

PineconeUtils

PineconeUtils is a Python module designed to handle and process data for embedding and indexing using Pinecone, Cohere, and OpenAI services. This utility module makes it easy to load, chunk, prepare, and upsert data into a Pinecone index, making it ideal for text embedding and retrieval-augmented generation (RAG) applications.

Features

  • Load text data from .txt, .docx, and .pdf files.
  • Chunk text data for processing.
  • Prepare embeddings using either Cohere or OpenAI models.
  • Upsert prepared data into a Pinecone index.

Installation

To install PineconeUtils, you can use pip:

pip install pineconeutils

Usage

Here's a quick example of how to use PineconeUtils:

Setup

First, ensure you have the necessary API keys and setup information:

pinecone_api_key = "your_pinecone_api_key"
cohere_api_key = "your_cohere_api_key"
openai_api_key = "your_openai_api_key"
index_name = "your_index_name"
namespace_id = "your_namespace_id"
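Hard-coding keys is fine for a quick example, but in practice you may prefer to read them from environment variables. A minimal sketch (the environment variable names below are assumptions, not something PineconeUtils requires):

```python
import os

# Read credentials from the environment, falling back to placeholders.
# The variable names here are illustrative only.
pinecone_api_key = os.getenv("PINECONE_API_KEY", "your_pinecone_api_key")
cohere_api_key = os.getenv("COHERE_API_KEY", "your_cohere_api_key")
openai_api_key = os.getenv("OPENAI_API_KEY", "your_openai_api_key")
index_name = os.getenv("PINECONE_INDEX", "your_index_name")
namespace_id = os.getenv("PINECONE_NAMESPACE", "your_namespace_id")
```

This keeps secrets out of source control while preserving the same variable names used in the examples below.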

Load Data

Load data from a supported file format:

from pineconeutils import PineconeUtils

# Create an instance of PineconeUtils
pinecone = PineconeUtils(
    pinecone_api_key=pinecone_api_key,
    openai_api_key=openai_api_key,
    cohere_api_key=cohere_api_key,
    index_name=index_name,
    namespace_id=namespace_id,
)

path = "path_to_your_file.docx"
data = pinecone.load_data(path)
print("Loaded Data:", data)
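For context, loading a plain .txt file without the library is straightforward; a minimal stand-in (not the library's actual implementation, which also handles .docx and .pdf) might look like:

```python
import tempfile

def load_text(path: str) -> str:
    """Read a UTF-8 text file and return its contents as one string."""
    with open(path, encoding="utf-8") as f:
        return f.read()

# Quick demonstration with a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as f:
    f.write("hello world")
    sample_path = f.name

print(load_text(sample_path))  # hello world
```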

Process Data

Chunk and prepare data for embedding:

For OpenAI:

chunks = pinecone.chunk_data(data, chunk_size=100, chunk_overlap=10)
print("Data Chunks:", chunks)

prepared_data = pinecone.prepare_data(chunks, model="text-embedding-ada-002", service="openai")

For Cohere:

chunks = pinecone.chunk_data(data, chunk_size=100, chunk_overlap=10)
print("Data Chunks:", chunks)

prepared_data = pinecone.prepare_data(chunks, model="embed-english-v3.0", service="cohere", input_type="search_document")

For more about Cohere embedding models, see the Cohere Embeddings documentation.
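The chunk_size and chunk_overlap parameters used above behave like a sliding window over the text. A rough, character-based illustration of the idea (not the library's actual implementation, which may split on tokens or words) is:

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of up to chunk_size characters, where each
    chunk overlaps the previous one by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap ensures that a sentence cut at a chunk boundary still appears intact in at least one chunk, which generally improves retrieval quality.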

Upsert Data

Upsert data into Pinecone index:

successful = pinecone.upsert_data(prepared_data)
print("Data upsertion was", "successful" if successful else "unsuccessful")
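Under the hood, a Pinecone upsert takes records pairing an id with an embedding vector and optional metadata. A sketch of what such prepared records might look like (the exact format prepare_data produces is an assumption here, and the vectors are truncated for illustration):

```python
# Hypothetical shape of records ready for upserting into a Pinecone index.
prepared_data = [
    {
        "id": "doc-0",
        "values": [0.12, -0.03, 0.44],  # embedding vector (truncated)
        "metadata": {"text": "First chunk of the document."},
    },
    {
        "id": "doc-1",
        "values": [0.08, 0.21, -0.17],
        "metadata": {"text": "Second chunk of the document."},
    },
]

# Basic sanity check before sending records to the index.
assert all({"id", "values", "metadata"} <= record.keys() for record in prepared_data)
print(len(prepared_data))  # 2
```

Keeping the source text in metadata lets a RAG application recover the original passage from a vector match at query time.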

Development

To contribute to the development of PineconeUtils, you can clone the repository and submit pull requests.

Support

If you encounter any issues or have questions, please file an issue on the GitHub repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

pineconeutils-0.0.4.tar.gz (7.3 kB)

Uploaded Source

Built Distribution


pineconeutils-0.0.4-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file pineconeutils-0.0.4.tar.gz.

File metadata

  • Download URL: pineconeutils-0.0.4.tar.gz
  • Upload date:
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for pineconeutils-0.0.4.tar.gz:

  • SHA256: 21b8a6cf6089869650c06824c6e53d234182de21b6f5063ca89f7ca159683ccb
  • MD5: b06e199fedb53d413bebf9c536b8d269
  • BLAKE2b-256: fa52a57b518653f62785ac4618fd57be3cc2714687472bba7e41fccd608d84b7


File details

Details for the file pineconeutils-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: pineconeutils-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for pineconeutils-0.0.4-py3-none-any.whl:

  • SHA256: 6b464a5552de6d5e83b31e33604054bf7341dc6b31068ca4f2a5ade113ae70b5
  • MD5: 80c3192e7943b2456acb5ce2deb4b117
  • BLAKE2b-256: 13dff3b458bda18e1d266dcdd475efedf7f0aacf33448ab22ba06e5537d90926

