
Refiner is a Python package that stores text as vectors in Pinecone and searches those vectors for similar text. It uses OpenAI to generate the embeddings and Pinecone to store and query them.

Project description

refiner-python

The Refiner Python package can be used to convert and store text and metadata as vector embeddings. Embeddings are generated using OpenAI and stored as vectors in Pinecone. Stored embeddings can then be "queried" using the search method. Matched embeddings contain contextually relevant metadata that can be used for AI chatbots, semantic text search APIs, etc.

Installation

pip install refiner

OpenAI and Pinecone API Keys

You'll need API keys for OpenAI and Pinecone.

Once you have your API keys, you can either set them as environment variables in your shell:

export PINECONE_API_KEY="API_KEY"
export PINECONE_ENVIRONMENT_NAME="ENV_NAME"
export OPENAI_API_KEY="API_KEY"

or you can create a .env (dotenv) config file and pass in the file path when initializing the Embeddings class:

from refiner.embeddings import Embeddings
embeddings_client = Embeddings(config_file="/path/to/.env")

Your .env file should follow key/value format:

PINECONE_API_KEY="API_KEY"
PINECONE_ENVIRONMENT_NAME="ENV_NAME"
OPENAI_API_KEY="API_KEY"
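
Before constructing the client, it can help to confirm that all three keys are present. The following is a minimal sketch using only the standard library; `parse_env_text` and `missing_keys` are hypothetical helpers written for illustration, not part of Refiner, and the parser only handles the simple KEY="value" form shown above.

```python
# Hypothetical helpers (not part of Refiner) for checking a key/value config.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_ENVIRONMENT_NAME", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse simple KEY="value" lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty in a mapping."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


sample = 'PINECONE_API_KEY="API_KEY"\nOPENAI_API_KEY="API_KEY"\n'
print(missing_keys(parse_env_text(sample)))
```

Because `missing_keys` only needs a mapping with a `get` method, passing `os.environ` checks the shell variables in the same way.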

Create Embeddings

from refiner.embeddings import Embeddings
embeddings_client = Embeddings(config_file="/path/to/.env")
payload = [(
    "index-id", # an id for your embedding
    "Example text to embed", # some text to embed
    {"key": "value"} # add any metadata you want here
)]
embeddings_client.create(payload, "example-index")
# {'upserted_count': 1}
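
When there are several texts to embed, the payload can be assembled in a loop. This is a small sketch that follows the (id, text, metadata) tuple layout shown above; `build_payload`, the `id_prefix` parameter, and the `source` metadata key are assumptions made for illustration and are not provided by Refiner.

```python
def build_payload(texts, source, id_prefix="doc"):
    """Turn a list of strings into (id, text, metadata) tuples."""
    payload = []
    for position, text in enumerate(texts):
        text = text.strip()
        if not text:
            continue  # skip empty entries rather than embedding nothing
        # The id keeps the original position so entries stay traceable
        payload.append((f"{id_prefix}-{position}", text, {"source": source}))
    return payload


payload = build_payload(["First note", "  ", "Second note"], source="notes.txt")
for entry in payload:
    print(entry)
```

The resulting list has the same shape as the payload passed to `embeddings_client.create` above.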

Semantic Search

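from refiner.embeddings import Embeddings
embeddings_client = Embeddings(config_file="/path/to/.env")
limit = 10  # limit on the number of matches
# the second argument is the name of the index that holds the embeddings
embeddings_client.search('search text', 'example-index', limit)
# {'matches': [...]}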
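
The search result can be post-processed like any dictionary. The sketch below assumes the response behaves like a plain dict and that each match carries `id`, `score`, and `metadata` keys; only `matches` and `metadata` appear in this README, so the other two keys are assumptions. `matches_above` is a hypothetical helper, and the sample data is hand-written rather than real output.

```python
def matches_above(results, min_score):
    """Return (id, score, metadata) for matches scoring at least min_score."""
    selected = []
    for match in results.get("matches", []):
        score = match.get("score", 0.0)  # assumed key, see the note above
        if score >= min_score:
            selected.append((match.get("id"), score, match.get("metadata", {})))
    return selected


# Hand-written stand-in for a search response, not real output
fake_results = {
    "matches": [
        {"id": "a", "score": 0.91, "metadata": {"key": "value"}},
        {"id": "b", "score": 0.42, "metadata": {}},
    ]
}
print(matches_above(fake_results, min_score=0.5))
```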

Loaders

from loaders import Loaders
url = "https://news.yahoo.com/"
loaders = Loaders()
data = loaders.get_document_from_url(url)
# [
#  Document {
#    pageContent:

Transformers

from refiner.embeddings import Embeddings
from transformers import Transformers
embeddings_client = Embeddings(config_file="/path/to/.env")
transformers = Transformers()
# `data` and `url` come from the Loaders example above
data = transformers.split_text(data, chunk_size=500, chunk_overlap=0)
vectors = []
# NOTE: `openai_client` is not defined in this snippet; it is assumed to be
# an OpenAI client wrapper that exposes create_embeddings()
for index, i in enumerate(data):
    embeddings = openai_client.create_embeddings(i[1])
    vector = (
        str(index),  # id for the vector
        embeddings,
        {"page_content": i[1], "source": url},
    )
    vectors.append(vector)
created = embeddings_client.create(vectors, "test-index")
# {'upserted_count': 251}
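
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal character-based splitter. It is an illustrative sketch only: `split_by_chars` is a hypothetical function, and Refiner's own `split_text` may split differently (for example on separators or tokens).

```python
def split_by_chars(text, chunk_size=500, chunk_overlap=0):
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


for chunk in split_by_chars("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(repr(chunk))
```

A larger overlap repeats more context across neighbouring chunks at the cost of storing more vectors.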

Document Chatbot Example

question = "what are the top news stories today?"
results = embeddings_client.search(question, "test-index", 10)
document = ''
for result in results["matches"]:
    document += result['metadata']['page_content']

prompt = '''
  Q: {}\n
  Using this document answer the question as a friendly chatbot that knows about the details in the document.
  You can answer questions only with the information in the documents you've been trained on.
  {}\n
  A:
  '''.format(question, document)

payload = {
    "model": "text-davinci-003",
    "prompt": prompt,
    "max_tokens": 50,
    "temperature": 0,
    "stream": True
}

# `openai_client` is assumed to be an OpenAI client wrapper defined elsewhere
response = openai_client.create_completion(payload)
for resp in response:
    print(resp.choices[0].text)
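
Concatenating every match can produce a prompt longer than the model accepts. The sketch below caps the combined document by character count before it is placed in the prompt; `build_document` and `max_chars` are hypothetical names, characters are used as a rough stand-in for tokens, and the matches are assumed to be dicts with a `metadata.page_content` entry as in the example above.

```python
def build_document(matches, max_chars=2000):
    """Join page_content from matches, stopping before max_chars is exceeded."""
    parts = []
    used = 0
    for match in matches:
        text = match.get("metadata", {}).get("page_content", "")
        if not text:
            continue  # ignore matches without stored text
        if used + len(text) > max_chars:
            break  # matches are taken in order, so stop at the first overflow
        parts.append(text)
        used += len(text)
    return "\n".join(parts)


# Hand-written sample matches, not real search output
sample_matches = [
    {"metadata": {"page_content": "aaaa"}},
    {"metadata": {"page_content": "bbbb"}},
    {"metadata": {"page_content": "cc"}},
]
print(build_document(sample_matches, max_chars=8))
```

The returned string can be used in place of `document` when formatting the prompt.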

CLI

You can install the CLI to create and search your vectors.

pip install refiner-cli

The --help option can be used to learn about the create and search commands.

refiner --help
refiner create --help
refiner search --help
