# csv-pm-llm-parsing

LLM-based CSV parsing for Process Mining purposes. It is compatible with LLMs that expose an OpenAI-compatible API.

## How to install

    pip install -U csv_pm_llm_parsing

## How to set up the LLM connection

Pass openai_api_url, openai_api_key, and openai_model to each function call, as in the examples below.

Alternatively, set them through the environment variables OPENAI_API_URL, OPENAI_API_KEY, and OPENAI_MODEL.

Example settings:

* OpenAI's GPT-4o: openai_api_url='https://api.openai.com/v1', openai_api_key='sk', openai_model='gpt-4o'
* Locally run (small) LLM (https://ollama.com/library/qwen2:72b-instruct-q6_K): openai_api_url='http://127.0.0.1:11434/v1', openai_api_key='sk', openai_model='qwen2:72b-instruct-q6_K'
* DeepInfra (Qwen/Qwen2-72B-Instruct): openai_api_url='https://api.deepinfra.com/v1/openai/', openai_api_key='adssad', openai_model='Qwen/Qwen2-72B-Instruct'
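The environment-variable option can be sketched as follows. This is a minimal illustration, not part of the package: the helper name `load_llm_settings` is hypothetical, and it only assumes the three variable names and keyword names listed above.

```python
import os


def load_llm_settings(env=None):
    """Collect the three connection settings from environment variables.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    # Keyword argument name -> environment variable name.
    names = {
        "openai_api_url": "OPENAI_API_URL",
        "openai_api_key": "OPENAI_API_KEY",
        "openai_model": "OPENAI_MODEL",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[var] for kwarg, var in names.items()}
```

The returned dictionary uses the same keyword names as the examples in this README, so it could be unpacked into a call, for instance `csv_pm_llm_parsing.full_parse_csv_for_pm(csv_path, **load_llm_settings())`.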

## Modules

### Separator and Quotechar detection (using LLMs)

Example code:

    import csv_pm_llm_parsing

    csv_path = "testfiles/sep_detection/01_comma_doublequote.csv"

    # Ask the LLM for the separator and quote character of the file.
    csv_format = csv_pm_llm_parsing.detect_sep_and_quote(
        csv_path,
        input_encoding="utf-8",
        openai_api_url="https://api.openai.com/v1",
        openai_api_key="sk-",
        openai_model="gpt-4o",
        return_detected_sep=True,
    )
    print(csv_format)
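For comparison, a heuristic, non-LLM baseline for the same task can be sketched with the standard library's `csv.Sniffer`. This is a different technique from the LLM-based detection above and is not part of the package; the helper name `sniff_sep_and_quote` and the sample data are assumptions made for illustration.

```python
import csv
import io


def sniff_sep_and_quote(sample, candidates=",;\t|"):
    """Guess separator and quote character with csv.Sniffer (heuristic)."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter, dialect.quotechar


# Made-up sample: semicolon-separated, with a comma inside a quoted field.
sample = (
    'case_id;activity;timestamp\n'
    '1;"Register, online";2024-01-01 08:00:00\n'
    '2;"Approve";2024-01-01 09:30:00\n'
)

sep, quote = sniff_sep_and_quote(sample)

# Re-read the sample with the guessed settings.
rows = list(csv.reader(io.StringIO(sample), delimiter=sep, quotechar=quote))
print(sep, quote, rows[1])
```

A baseline like this can serve as a sanity check on the LLM's answer; it tends to struggle on files without quoted fields or with irregular rows.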

### Case ID, Activity, and Timestamp columns detection (using LLMs)

Example code:

    import pandas as pd
    import csv_pm_llm_parsing

    csv_path = "testfiles/cid_acti_timest/01.csv"
    dataframe = pd.read_csv(csv_path)

    # Ask the LLM which columns hold the case ID, activity, and timestamp.
    main_columns = csv_pm_llm_parsing.detect_caseid_activity_timestamp(
        dataframe,
        openai_api_url="https://api.openai.com/v1",
        openai_api_key="sk-",
        openai_model="gpt-4o",
        return_suggestions=True,
    )
    print(main_columns)
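Once the three columns are known, a common follow-up in process mining is to rename them to the XES-style names `case:concept:name`, `concept:name`, and `time:timestamp`. The sketch below builds such a rename mapping. The shape of the `detected` dictionary (keys `case_id`, `activity`, `timestamp`) is an assumption for illustration and may not match what `detect_caseid_activity_timestamp` returns; `build_rename_map` is a hypothetical helper.

```python
# XES-style column names conventionally used for event logs.
XES_NAMES = {
    "case_id": "case:concept:name",
    "activity": "concept:name",
    "timestamp": "time:timestamp",
}


def build_rename_map(detected):
    """Map detected source columns to XES-style names.

    `detected` is assumed to look like
    {"case_id": "<column>", "activity": "<column>", "timestamp": "<column>"}.
    """
    missing = [role for role in XES_NAMES if role not in detected]
    if missing:
        raise ValueError("Missing roles: " + ", ".join(missing))
    return {detected[role]: XES_NAMES[role] for role in XES_NAMES}


# Made-up column names for illustration.
detected = {"case_id": "order_no", "activity": "step", "timestamp": "when"}
print(build_rename_map(detected))
```

The resulting mapping can be passed to pandas, for example `dataframe.rename(columns=build_rename_map(detected))`.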

### Timestamp Format detection (using LLMs)

Example code:

    import pandas as pd
    import csv_pm_llm_parsing

    csv_path = "testfiles/timest_format/05_rfc1123.csv"
    dataframe = pd.read_csv(csv_path)
    timest_column = "time:timestamp"

    # Ask the LLM for the format string of the timestamp column.
    timest_format = csv_pm_llm_parsing.detect_timest_format(
        dataframe,
        timest_column=timest_column,
        openai_api_url="https://api.openai.com/v1",
        openai_api_key="sk-",
        openai_model="gpt-4o",
        return_timest_format=True,
    )
    print(timest_format)

    # Convert the column using the detected format.
    dataframe[timest_column] = pd.to_datetime(dataframe[timest_column], format=timest_format)
    dataframe.info()
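Because an LLM-suggested format string may be wrong, it can be worth checking it against a few sample values before converting the whole column. The sketch below does this with `datetime.strptime` from the standard library. The helper `format_parses_all` and the sample values are illustrative assumptions, not part of the package. Note that `%a` and `%b` are locale-dependent; the example assumes English day and month names.

```python
from datetime import datetime


def format_parses_all(values, fmt):
    """Return True if every value parses with the strptime format `fmt`."""
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return False
    return True


# Made-up RFC 1123 style samples.
samples = [
    "Tue, 15 Nov 1994 08:12:31 GMT",
    "Wed, 16 Nov 1994 10:00:00 GMT",
]

# "GMT" is matched as a literal rather than through %Z.
rfc1123_format = "%a, %d %b %Y %H:%M:%S GMT"
iso_like_format = "%Y-%m-%d %H:%M:%S"

print(format_parses_all(samples, rfc1123_format))
print(format_parses_all(samples, iso_like_format))
```

Checking a small sample first keeps a bad format from raising midway through `pd.to_datetime` on a large file.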

## Overall CSV parsing (executes all the modules)

Example code:

    import csv_pm_llm_parsing

    csv_path = "testfiles/overall/01.csv"

    # Run all detection modules and return the parsed dataframe.
    dataframe = csv_pm_llm_parsing.full_parse_csv_for_pm(
        csv_path,
        openai_api_url="https://api.openai.com/v1",
        openai_api_key="sk-",
        openai_model="gpt-4o",
    )
    dataframe.info()
    print(dataframe)

