LogicLM

LogicLM: natural language data analytics

What is LogicLM?

LogicLM is a lightweight open source natural language data analytics interface.

Vision of LogicLM: using predicate calculus as an intermediate representation between natural language and data retrieval allows artificial intelligence to be applied to data analysis reliably and efficiently.

LogicLM uses Logica, which is an open source logic programming language.

Defining measures, dimensions and filters as predicates makes the configuration easy to write and yields powerful OLAP query generation.
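To make these terms concrete, here is a minimal sketch of the kind of OLAP query that a measure, a dimension and a filter combine into. It uses Python's built-in sqlite3 module with a made-up table (impressions) and hand-written SQL. It is not the SQL that Logica generates and not a LogicLM configuration, only an illustration of the shape of such a query.

```python
import sqlite3

# Hypothetical toy data: (campaign, country, person_id), one row per ad impression.
ROWS = [
    ("campaign_a", "US", 1),
    ("campaign_a", "US", 2),
    ("campaign_a", "CA", 3),
    ("campaign_b", "US", 1),
    ("campaign_b", "US", 1),
]


def reach_by_campaign(country):
    """Dimension: campaign. Measure: distinct people. Filter: country."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE impressions (campaign TEXT, country TEXT, person_id INTEGER)"
    )
    conn.executemany("INSERT INTO impressions VALUES (?, ?, ?)", ROWS)
    result = conn.execute(
        """
        SELECT campaign, COUNT(DISTINCT person_id) AS reach  -- dimension, measure
        FROM impressions
        WHERE country = ?                                    -- filter
        GROUP BY campaign
        ORDER BY campaign
        """,
        (country,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(reach_by_campaign("US"))
```

In LogicLM the three roles are expressed as Logica predicates rather than written as SQL by hand; the sketch only shows how they fit together in the final query.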

Large language models translate a user's request, formulated in natural language, into a structured config. The user stays in control, since the config is displayed and can be edited directly.

Supported back-ends

LogicLM uses Logica to generate and execute data queries. The database back-ends supported by Logica are:

  • SQLite: a lightweight in-process database that ships with Python, among other things. No installation or configuration is necessary to use this back-end. Perfect for analyzing datasets up to 1GB.

  • PostgreSQL: one of the most popular open-source database servers. Perfect for analyzing datasets up to 10GB.

  • BigQuery: Google's distributed data warehouse capable of processing practically unlimited volumes of data. It's a paid product, but it comes with a free tier.

To understand a user's request, LogicLM needs an LLM API key. Supported LLM services are Google GenAI, OpenAI and MistralAI.

LogicLM Configuration

You configure an instance of LogicLM by describing your data cube's measures, dimensions and filters. The configuration consists of two files: a Logica program and a JSON file.

  • The logic of measures, dimensions and filters is defined in a Logica program via rules specifying the corresponding predicates.

  • The JSON file specifies which predicates correspond to measures, which to dimensions and which to filters. In this file you also give hints to the LLM, such as the meaning of the measures and dimensions, and some examples of answering questions.

Configuration Examples

The LogicLM repo comes with two example configurations:

  • reach: a synthetic dataset for measuring reach (i.e. the number of people who were exposed to a collection of online advertising campaigns).

  • baby_names: a dataset of names given to babies in the United States, broken down by gender and state. The configuration uses BigQuery as the back-end.

Installation

To run LogicLM, clone the repo and install the requirements.

git clone https://github.com/google/logiclm
cd logiclm
python3 -m pip install -r requirements.txt

You will also need to install the client library for the LLM API that you would like to use. For example, to install Google Generative AI, run

python3 -m pip install google-generativeai

If you want to use BigQuery, you will also need the Python SDK.

Starting a UI server

The reach example is good for a quick start, as it uses SQLite and runs without any external database dependencies.

To enable natural language query translation, you need an API key for the LLM service that you would like to use, set in one of the environment variables LOGICLM_GOOGLE_GENAI_API_KEY, LOGICLM_OPENAI_API_KEY or LOGICLM_MISTRALAI_API_KEY.
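Before starting the server it can help to check which of these variables is actually set. The following is a small pre-flight sketch, not part of LogicLM: the helper name configured_services is made up for illustration, and it only inspects the environment. How LogicLM itself chooses between several configured keys is not covered here.

```python
import os

# Environment variable names from the text above, mapped to the service they belong to.
KEY_VARIABLES = {
    "LOGICLM_GOOGLE_GENAI_API_KEY": "Google GenAI",
    "LOGICLM_OPENAI_API_KEY": "OpenAI",
    "LOGICLM_MISTRALAI_API_KEY": "MistralAI",
}


def configured_services(environ=None):
    """Return the LLM services whose API key variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return [service for var, service in KEY_VARIABLES.items() if environ.get(var)]


if __name__ == "__main__":
    services = configured_services()
    if services:
        print("API key found for:", ", ".join(services))
    else:
        print("No LLM API key set; export one of:", ", ".join(KEY_VARIABLES))
```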

To start a LogicLM instance powered by the reach config, go to the root folder of the repo and run

export LOGICLM_GOOGLE_GENAI_API_KEY=your_key_should_be_here
python3 logiclm.py examples/reach/reach.json start_server

Then open http://localhost:1791/ in your browser.

Programmatic usage

You can call the logiclm.py script from the command line. For example, to build SQL for a natural language question, use the understand_and_sql command. If you have Google Cloud configured, you can pipe the SQL to the bq tool to run the query.

$ python3 logiclm.py examples/baby_names/baby_names.json understand_and_sql "What are top popular names on westcoast?"  | bq query --nouse_legacy_sql
+---------+------------------+
| Name<>  | NumberOfBabies<> |
+---------+------------------+
| Michael |           545822 |
| David   |           475426 |
| Robert  |           457956 |
+---------+------------------+

See the main function in logiclm.py for examples of calling LogicLM library functions.
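If you prefer to stay on the documented command-line interface, the understand_and_sql call can also be driven from Python with the standard subprocess module. This is a sketch under stated assumptions: the helper names build_command and question_to_sql are hypothetical, the script is assumed to be run from the repo root, and the SQL is assumed to arrive on standard output (as the pipe to bq in the example suggests).

```python
import subprocess
import sys


def build_command(config_path, question, script="logiclm.py"):
    """Build the argument list for the understand_and_sql command."""
    return [sys.executable, script, config_path, "understand_and_sql", question]


def question_to_sql(config_path, question):
    """Run logiclm.py from the repo root and return whatever it prints to stdout."""
    completed = subprocess.run(
        build_command(config_path, question),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Example call (needs the repo checkout, an LLM API key in the environment,
# and the current working directory set to the repo root):
# sql = question_to_sql(
#     "examples/baby_names/baby_names.json",
#     "What are top popular names on westcoast?",
# )
```

Passing the question as a separate list element avoids shell quoting problems with questions that contain quotes or other special characters.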

Unless otherwise noted, the LogicLM source files are distributed under the Apache 2.0 license found in the LICENSE file.

LogicLM is not an officially supported Google product.
