An integration package created by LOGYCA to interact with ChatGPT, analyze documents and files, and use other functionality of the OpenAI library.
LOGYCA public libraries
About us
LOGYCA public libraries: interact with ChatGPT, analyze documents and files, and use other functionality of the OpenAI library.
Source code | Package (PyPI) | Samples
To interact with the examples, keep the following in mind:
FastAPI example, explored through Swagger:
- Source: https://github.com/logyca/python-libraries/tree/main/logyca-ai/samples/fastapi_async
- Use the example endpoints to obtain the input schemas for the POST method and experiment with the available parameters.
- The endpoints use the asynchronous functionality of the openai SDK.
- The model currently used is GPT-4o; no other models have been tested so far.
- The formats currently supported for receiving files and extracting their text for the model are: txt, csv, pdf, images, and Microsoft Office (docx, xlsx).
Script example, run directly from code:
- Source: https://github.com/logyca/python-libraries/tree/main/logyca-ai/samples/script_app_sync
- It runs the same examples as the FastAPI sample.
- The examples use the synchronous functionality of the openai SDK.
- The model used for testing is GPT-4o.
Environment variables documentation for example: fastapi_async
The examples are built in the Microsoft Azure OpenAI environment, and the variables to use are the following:
.env.sample
# Environment variables documentation:
# API_KEY:
# The general API key used for authentication with services. This key is typically used for accessing cloud-based or other API-driven platforms. Replace '***' with the actual key.
# AZURE_OPENAI_DEPLOYMENT:
# The name or identifier of the OpenAI deployment within Azure. This defines the specific model version and configuration you are using in Azure OpenAI Service. Set this to the name of the deployed model, such as 'chatgpt3.5-turbo-1106'.
# AZURE_OPENAI_ENDPOINT:
# The base URL of the Azure OpenAI Service endpoint. This is the URL where API requests are sent, typically formatted like 'https://<your-endpoint>.openai.azure.com/'.
# AZURE_OPENAI_MODEL_NAME:
# The name of the specific OpenAI model being used in Azure, for example, 'gpt-35-turbo'. This identifies which model variant will be used for processing requests.
# AZURE_OPENAI_MODEL_VERSION:
# The version of the OpenAI model deployed in Azure. This typically reflects updates or optimizations to the model, such as '1106' to indicate a version from November 6th.
# OPENAI_API_KEY:
# The API key provided by OpenAI directly (not through Azure). This is used to authenticate and access OpenAI services outside of Azure.
# OPENAI_API_VERSION:
# The version of the OpenAI API being used. This specifies the version of the API and its capabilities, for example, '2023-03-15-preview'. It dictates the available features and request format.
API_KEY=***
AZURE_OPENAI_DEPLOYMENT=***
AZURE_OPENAI_ENDPOINT=***
AZURE_OPENAI_MODEL_NAME=***
AZURE_OPENAI_MODEL_VERSION=***
OPENAI_API_KEY=***
OPENAI_API_VERSION=***
# Example
# API_KEY=CUSTOM_ABC
# AZURE_OPENAI_DEPLOYMENT=chat4omni
# AZURE_OPENAI_ENDPOINT=azurenameforendpoint
# AZURE_OPENAI_MODEL_NAME=gpt-4o
# AZURE_OPENAI_MODEL_VERSION=2024-05-13
# OPENAI_API_KEY=AZURE_ABC
# OPENAI_API_VERSION=2024-07-01-preview
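The variables above can be validated at startup before any request is sent. The following is a minimal sketch using only the standard library; `REQUIRED_VARS` and `load_settings` are hypothetical names introduced for illustration and are not part of logyca_ai.

```python
import os
from typing import Mapping

# Variable names copied from .env.sample above.
REQUIRED_VARS = (
    "API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_MODEL_NAME",
    "AZURE_OPENAI_MODEL_VERSION",
    "OPENAI_API_KEY",
    "OPENAI_API_VERSION",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required variables; raise if one is unset, empty or still '***'."""
    missing = [name for name in REQUIRED_VARS if env.get(name, "***") in ("", "***")]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Usage with the example values from .env.sample (a plain dict stands in for os.environ).
example_env = {
    "API_KEY": "CUSTOM_ABC",
    "AZURE_OPENAI_DEPLOYMENT": "chat4omni",
    "AZURE_OPENAI_ENDPOINT": "azurenameforendpoint",
    "AZURE_OPENAI_MODEL_NAME": "gpt-4o",
    "AZURE_OPENAI_MODEL_VERSION": "2024-05-13",
    "OPENAI_API_KEY": "AZURE_ABC",
    "OPENAI_API_VERSION": "2024-07-01-preview",
}
settings = load_settings(example_env)
print(settings["AZURE_OPENAI_MODEL_NAME"])
```

Failing early with the list of missing names tends to be easier to debug than an authentication error returned later by the service.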
OCR engine to extract text from images.
- Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development was sponsored by Google in 2006.
Install
- (Source Code) https://tesseract-ocr.github.io/tessdoc/Downloads.html
- (Windows Binaries) https://github.com/UB-Mannheim/tesseract/wiki
- (Linux/Docker) apt-get -y install tesseract-ocr
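After installing, it can help to confirm that the `tesseract` executable is reachable from Python. This is a small sketch with the standard library only; the helper names `find_tesseract` and `tesseract_version` are hypothetical, and how logyca_ai itself locates the binary is not covered here.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(executable: str) -> str:
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    # Some builds print the version banner to stderr instead of stdout.
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else ""


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it with one of the options above")
else:
    print(tesseract_version(path))
```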
Example for simple conversation.
{
"system": "Voy a definirte tu personalidad, contexto y proposito.\nActua como un experto en venta de frutas.\nSe muy positivo.\nTrata a las personas de usted, nunca tutees sin importar como te escriban.",
"messages": [
{
"additional_content": "",
"type": "text",
"user": "Dime 5 frutas amarillas"
},
{
"assistant": "\n¡Claro! Aquí te van 5 frutas amarillas:\n\n1. Plátano\n2. Piña\n3. Mango\n4. Melón\n5. Papaya\n"
},
{
"additional_content": "",
"type": "text",
"user": "Dame los nombres en ingles."
}
]
}
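The same request body can be assembled in Python instead of being written by hand. The sketch below only reproduces the JSON structure shown above; the helper functions are hypothetical and are not part of the logyca_ai API.

```python
import json


def user_message(text: str) -> dict:
    """A plain-text user turn, using the field names from the example above."""
    return {"additional_content": "", "type": "text", "user": text}


def assistant_message(text: str) -> dict:
    """A previous model reply, replayed as conversation history."""
    return {"assistant": text}


def build_request(system: str, messages: list) -> str:
    """Serialize the request body; ensure_ascii=False keeps accented characters readable."""
    return json.dumps({"system": system, "messages": messages}, ensure_ascii=False, indent=2)


body = build_request(
    "Actua como un experto en venta de frutas.",
    [
        user_message("Dime 5 frutas amarillas"),
        assistant_message("1. Plátano\n2. Piña\n3. Mango\n4. Melón\n5. Papaya"),
        user_message("Dame los nombres en ingles."),
    ],
)
print(body)
```

Replaying the earlier assistant reply is what gives the final question its context: "Dame los nombres en ingles." only makes sense if the list of fruits is part of the request.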
Example for image conversation.
Using public published URL for image
{
"system": "Actua como una maquina lectora de imagenes.\nDevuelve la información sin lenguaje natural, sólo responde lo que se está solicitando.\nEl dispositivo que va a interactuar contigo es una api, y necesita la información sin markdown u otros caracteres especiales.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "https://raw.githubusercontent.com/logyca/python-libraries/main/logyca-ai/logyca_ai/assets_for_examples/file_or_documents/image.png",
"image_format": "image_url",
"image_resolution": "auto"
},
"type": "image_url",
"user": "Extrae el texto que recibas en la imagen y devuelvelo en formato json."
}
]
}
Using image content in base64
{
"system": "Actua como una maquina lectora de imagenes.\nDevuelve la información sin lenguaje natural, sólo responde lo que se está solicitando.\nEl dispositivo que va a interactuar contigo es una api, y necesita la información sin markdown u otros caracteres especiales.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "<base64 image png content>",
"image_format": "png",
"image_resolution": "auto"
},
"type": "image_base64",
"user": "Extrae el texto que recibas en la imagen y devuelvelo en formato json."
}
]
}
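The `<base64 image png content>` placeholder stands for the image bytes encoded as base64 text. A minimal sketch of producing it with the standard library follows; `image_base64_message` is a hypothetical helper that mirrors the JSON above, and the eight bytes used here are only the PNG file signature, standing in for a real image.

```python
import base64


def image_base64_message(image_bytes: bytes, image_format: str, prompt: str) -> dict:
    """Build a message shaped like the base64 image example above."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "additional_content": {
            "base64_content_or_url": encoded,
            "image_format": image_format,
            "image_resolution": "auto",
        },
        "type": "image_base64",
        "user": prompt,
    }


# Stand-in bytes so the sketch runs without a file on disk; in practice
# the bytes would come from pathlib.Path("image.png").read_bytes().
png_signature = b"\x89PNG\r\n\x1a\n"
message = image_base64_message(
    png_signature,
    "png",
    "Extrae el texto que recibas en la imagen y devuelvelo en formato json.",
)
print(message["additional_content"]["base64_content_or_url"])
```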
Example for pdf conversation.
Using public published URL for pdf
{
"system": "No uses lenguaje natural para la respuesta.\nDame la información que puedas extraer de la imagen en formato JSON.\nSolo devuelve la información, no formatees con caracteres adicionales la respuesta.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "https://raw.githubusercontent.com/logyca/python-libraries/main/logyca-ai/logyca_ai/assets_for_examples/file_or_documents/pdf.pdf",
"pdf_format": "pdf_url"
},
"type": "pdf_url",
"user": "Dame los siguientes datos: Expediente, radicación, Fecha, Numero de registro, Vigencia."
}
]
}
Using pdf content in base64
{
"system": "No uses lenguaje natural para la respuesta.\nDame la información que puedas extraer de la imagen en formato JSON.\nSolo devuelve la información, no formatees con caracteres adicionales la respuesta.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "<base64 pdf content>",
"pdf_format": "pdf"
},
"type": "pdf_base64",
"user": "Dame los siguientes datos: Expediente, radicación, Fecha, Numero de registro, Vigencia."
}
]
}
Example for plain_text conversation.
Using public published URL for plain_text
{
"system": "No uses lenguaje natural para la respuesta.\n Dame la información que puedas extraer en formato JSON.\n Solo devuelve la información, no formatees con caracteres adicionales la respuesta.\n Te voy a enviar un texto que representa información en formato csv.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "https://raw.githubusercontent.com/logyca/python-libraries/main/logyca-ai/logyca_ai/assets_for_examples/file_or_documents/plain_text.csv",
"file_format": "plain_text_url"
},
"type": "plain_text_url",
"user": "Dame los siguientes datos de la primera fila del documento: Expediente, radicación, Fecha, Numero de registro, Vigencia.\n A partir de la fila 2 del documento, suma los valores de la columna Valores_A.\n A partir de la fila 2 del documento, Suma los valores de la columna Valores_B."
}
]
}
Using plain_text content in base64
{
"system": "No uses lenguaje natural para la respuesta.\n Dame la información que puedas extraer en formato JSON.\n Solo devuelve la información, no formatees con caracteres adicionales la respuesta.\n Te voy a enviar un texto que representa información en formato csv.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "<base64 csv content>",
"file_format": "csv"
},
"type": "plain_text_base64",
"user": "Dame los siguientes datos de la primera fila del documento: Expediente, radicación, Fecha, Numero de registro, Vigencia.\n A partir de la fila 2 del documento, suma los valores de la columna Valores_A.\n A partir de la fila 2 del documento, Suma los valores de la columna Valores_B."
}
]
}
Example for Microsoft files conversation (Word, Excel).
Using public published URL for Excel file
{
"system": "No uses lenguaje natural para la respuesta.\n Dame la información que puedas extraer de la imagen en formato JSON.\n Solo devuelve la información, no formatees con caracteres adicionales la respuesta.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "https://raw.githubusercontent.com/logyca/python-libraries/main/logyca-ai/logyca_ai/assets_for_examples/file_or_documents/ms_excel.xlsx",
"file_format": "ms_url"
},
"type": "ms_url",
"user": "Dame los siguientes datos: Expediente, radicación, Fecha, Numero de registro, Vigencia."
}
]
}
Using Excel file content in base64
{
"system": "No uses lenguaje natural para la respuesta.\n Dame la información que puedas extraer de la imagen en formato JSON.\n Solo devuelve la información, no formatees con caracteres adicionales la respuesta.",
"messages": [
{
"additional_content": {
"base64_content_or_url": "<base64 xlsx content>",
"file_format": "xlsx"
},
"type": "ms_base64",
"user": "Dame los siguientes datos: Expediente, radicación, Fecha, Numero de registro, Vigencia."
}
]
}
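The base64 examples in this document differ only in the `type` value and in the name and value of the format field. The sketch below collects those differences in one lookup table. It is an illustration built from the examples alone: `BASE64_RULES` and `file_message` are hypothetical names, and extensions that the examples do not show (such as txt or docx) are left out rather than guessed.

```python
import base64
from pathlib import PurePath

# extension -> (message type, name of the format key, format value),
# copied from the base64 examples in this document.
BASE64_RULES = {
    ".png": ("image_base64", "image_format", "png"),
    ".pdf": ("pdf_base64", "pdf_format", "pdf"),
    ".csv": ("plain_text_base64", "file_format", "csv"),
    ".xlsx": ("ms_base64", "file_format", "xlsx"),
}


def file_message(filename: str, content: bytes, prompt: str) -> dict:
    """Build a base64 message for a file, choosing the fields by extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in BASE64_RULES:
        raise ValueError(f"No example in this document covers {suffix!r} files")
    message_type, format_key, format_value = BASE64_RULES[suffix]
    additional = {
        "base64_content_or_url": base64.b64encode(content).decode("ascii"),
        format_key: format_value,
    }
    if message_type == "image_base64":
        # Only the image examples carry a resolution field.
        additional["image_resolution"] = "auto"
    return {"additional_content": additional, "type": message_type, "user": prompt}


message = file_message("ms_excel.xlsx", b"abc", "Dame los siguientes datos: Expediente.")
print(message["type"], message["additional_content"]["file_format"])
```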
Semantic Versioning
logyca_ai <MAJOR>.<MINOR>.<PATCH>
- MAJOR: version when you make incompatible API changes
- MINOR: version when you add functionality in a backwards compatible manner
- PATCH: version when you make backwards compatible bug fixes
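The three rules above can be expressed as a small version type. This is an illustrative sketch, not tooling shipped with the package; the `Version` class is hypothetical and handles only plain `MAJOR.MINOR.PATCH` strings.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Parse a plain 'MAJOR.MINOR.PATCH' string."""
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, part: str) -> "Version":
        """Return the next version; lower-order parts reset to zero."""
        if part == "major":  # incompatible API change
            return Version(self.major + 1, 0, 0)
        if part == "minor":  # backwards compatible feature
            return Version(self.major, self.minor + 1, 0)
        if part == "patch":  # backwards compatible bug fix
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"unknown part: {part!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


current = Version.parse("0.2.4")
print(current.bump("patch"))  # 0.2.5
print(current.bump("minor"))  # 0.3.0
print(current.bump("major"))  # 1.0.0
```

Because the type is a tuple, versions compare in the expected order: `Version.parse("0.2.4") > Version.parse("0.2.3")` is true.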
Definitions for releasing versions
- https://peps.python.org/pep-0440/
- X.YaN (Alpha release): Identify and fix early-stage bugs. Not suitable for production use.
- X.YbN (Beta release): Stabilize and refine features. Address reported bugs. Prepare for official release.
- X.YrcN (Release candidate): Final version before official release. Assumes all major features are complete and stable. Recommended for testing in non-critical environments.
- X.Y (Final release/Stable/Production): Completed, stable version ready for use in production. Full release for public use.
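The four forms listed above can be told apart with a short regular expression. The sketch below covers only those forms and is not a full PEP 440 parser (it ignores epochs, post releases and dev releases); `release_stage` is a hypothetical helper. For complete parsing, the third-party `packaging` library provides `packaging.version.Version`.

```python
import re

# Release segment (digits separated by dots), then an optional
# pre-release marker: aN, bN or rcN.
_PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")
_STAGES = {"a": "alpha", "b": "beta", "rc": "release candidate", None: "final"}


def release_stage(version: str) -> str:
    """Classify a version string as alpha, beta, release candidate or final."""
    match = _PATTERN.match(version)
    if match is None:
        raise ValueError(f"not one of the forms listed above: {version!r}")
    return _STAGES[match.group(2)]


for candidate in ("0.0.1a1", "1.0b3", "1.0rc2", "0.2.4"):
    print(candidate, "->", release_stage(candidate))
```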
Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
Types of changes
- Added for new features.
- Changed for changes in existing functionality.
- Deprecated for soon-to-be removed features.
- Removed for now removed features.
- Fixed for any bug fixes.
- Security in case of vulnerabilities.
[0.0.1aX] - 2024-08-02
Added
- First tests using pypi.org in develop environment.
[0.1.0] - 2024-08-02
Added
- Completion of testing and launch into production.
[0.1.1] - 2024-08-16
Added
- The functions that extract text from PDF files are refactored to use disk, which reduces RAM usage, and methods are added to extract text from images embedded in the pages of PDF files.
[0.2.0] - 2024-08-30
Added
- New feature: attach documents with the txt, csv, docx and xlsx extensions.
[0.2.1] - 2024-09-16
Added
- New tiktoken-based function to count tokens and check model capacity; it returns whether the request meets the maximum_request_tokens limits for both input and output.
Fixed
- Extraction of Excel files to the json, csv and list output formats.
[0.2.2] - 2024-10-22
Added
- New functions extract the images in a document as lists of base64 strings: extract_images_from_pdf_file, extract_images_from_docx_file.
- The Swagger documentation of the FastAPI example is improved: the just_extract_images parameter is added to the POST method so the new document image extraction features can be used.
[0.2.3] - 2024-10-31
Added
- New option when extracting text from Excel files: extract only the visible sheets or all sheets.
[0.2.4] - 2024-11-01
Fixed
- Minor adjustment when extracting images from an Excel file: the extension in the result is now lowercase.