Skip to main content

File search tool using OpenAI assistant.

Project description

File search tools using OpenAI Assistant

Work in progress, still trying polish a few features and getting some initial feedback.

Installation (pip)

pip install lumei

Usage

Example

The following is an example of processing a list of pdf files and extracting the vendor and price data from the files. The command requires an OpenAI API key which can be obtained from here https://platform.openai.com/account/api-keys.

lumei \
  --input-files ~/folder_1/*.pdf,~/folder_2/*.pdf \
  --output-file ~/output.json \
  --openai-api-key=<OPENAI_API_KEY> \
  --query="[
  	{'name': 'vendor', 'search': 'Name of the vendor who issued the invoice.'}, 
  	{'name': 'price', 'search': 'Total bill from the invoice.'},
  	{'name': 'file path', 'attribute': 'FILE_PATH'},
  	{
  	  'names': {'variable 1': 'var1', 'variable 2': 'var2' }, 
  	  'command': 'stat -f %SB %input_file_path% && var1=1 && var2=2'
  	}
  ]"

Input Parameters

--input-files

Source files to process on. Multiple files can be provided, and they are seperated by a comma "," character. File inputs can be expressed as a path to a single file or a regex.

--output-file

Path of the file that the results will be written to. Input must be a file path to a single file. Supported file formate are ".csv", ".xlsx", and ".json". Output file will only be written to when all results have been obtained.

--openai-api-key [Optional]

API key for OpenAI, necessary for file search functionalities. Key can be obtained from here https://platform.openai.com/account/api-keys.

Alternative way to provide the API key is to set it as the "OPENAI_API_KEY" environment variable.

--query

Name and the description of data to search for. Input should be an array of JSON objects. name is the name of the data to search for. Name of the data will be the column name for the result dataset. search is the description of the data to search for. attribute is a piece metadata related to the query, list of possible attributes can be found below. command is the output of a bash command. The command can reference the file path using the %input_file_path% variable. names is map of column names to environment variables names. This is only supported for commands.

Example:

[
    {
        'name': 'vendor', 
        'search': 'Name of the vendor who issued the invoice.'
    }, 
    {
        'name': 'price', 
        'search': 'Total bill from the invoice.'
    },
    {
        'name': 'file path', 
        'attribute': 'FILE_PATH'
    },
    {
        'names': {'variable 1': 'var1', 'variable 2': 'var2' }, 
        'command': 'stat -f %SB %input_file_path% && var1=1 && var2=2'
    }
]
Possible Attributes

FILE_PATH, START_TIMESTAMP, END_TIMESTAMP, START_DATETIME, END_DATETIME

Standalone Methods

openai_file_search

Example of using the file search method directly without CLI.

from lumei import openai_file_search
from typing import Optional

results: Optional[dict[str, str]] = openai_file_search(
  openai_api_key="<OPENAI_API_KEY>",
  input_file_path="~/example_invoice_file.pdf",
  file_search_query={
    "vendor": "Name of the vendor who issued the invoice.",
    "price": "Total bill from the invoice.",
  }
)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lumei-0.6.0.tar.gz (46.6 kB view details)

Uploaded Source

Built Distribution

lumei-0.6.0-py3-none-any.whl (36.7 kB view details)

Uploaded Python 3

File details

Details for the file lumei-0.6.0.tar.gz.

File metadata

  • Download URL: lumei-0.6.0.tar.gz
  • Upload date:
  • Size: 46.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for lumei-0.6.0.tar.gz
Algorithm Hash digest
SHA256 559831e4e4b1e883dd14c059c81746353577be6b5b51c90495c22ec2c63a35de
MD5 19197b34609471def463770be31ee03d
BLAKE2b-256 a2e36b575ca122b4dc80940f7a6cb8cd4c2b5a99a153b4bd48143c353804bcd0

See more details on using hashes here.

File details

Details for the file lumei-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: lumei-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 36.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for lumei-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6e7a5b6dbb7bb3c104d937d740d54d68472e9f9aeb7cb9083619ddc7d0d0444d
MD5 695919f785b5dbbe8a12eaacaa6d66b8
BLAKE2b-256 7d6da7e7b0971c5e5dbcdbfe17322d758c31f176197876265f8dcaab8921e13a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page