NLQL (Natural Language Query Language) is a tool that helps you search through text using simple commands that look like SQL. Just like how SQL helps you find information in databases, NLQL helps you find information in regular text.

NLQL (Natural Language Query Language)

A SQL-like query language designed specifically for natural language processing and text retrieval.

Overview

NLQL is a query language that brings the power and simplicity of SQL to natural language processing. It provides a structured way to query and analyze unstructured text data, making it particularly useful for RAG (Retrieval-Augmented Generation) systems and large language models.

Key Features

  • SQL-like syntax for intuitive querying
  • Multiple text unit support (character, word, sentence, paragraph, document)
  • Rich set of operators for text analysis
  • Semantic search capabilities
  • Vector embedding support
  • Extensible plugin system
  • Performance optimizations with indexing and caching

Basic Syntax

SELECT <UNIT> 
[FROM <SOURCE>]
[WHERE <CONDITIONS>]
[GROUP BY <FIELD>]
[ORDER BY <FIELD>]
[LIMIT <NUMBER>]
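
The clause order above can be illustrated with a small helper that assembles a query string. This is only a sketch: `build_query` and its parameters are hypothetical names, not part of NLQL. It joins the optional clauses in the order the grammar lists them.

```python
from typing import Optional


def build_query(
    unit: str,
    source: Optional[str] = None,
    where: Optional[str] = None,
    group_by: Optional[str] = None,
    order_by: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Assemble an NLQL-style query string (hypothetical helper, not an NLQL API)."""
    parts = [f"SELECT {unit}"]
    # Optional clauses are appended in the same order as the grammar above.
    if source is not None:
        parts.append(f"FROM {source}")
    if where is not None:
        parts.append(f"WHERE {where}")
    if group_by is not None:
        parts.append(f"GROUP BY {group_by}")
    if order_by is not None:
        parts.append(f"ORDER BY {order_by}")
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    return " ".join(parts)


query = build_query("SENTENCE", where='CONTAINS("innovation")', limit=10)
print(query)
```

The resulting string could then be passed to `nlql.execute(...)` as in the Getting Started section.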

Query Units

  • CHAR: Character level
  • WORD: Word level
  • SENTENCE: Sentence level
  • PARAGRAPH: Paragraph level
  • DOCUMENT: Document level
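
To make the units concrete, the sketch below splits a string at each level using naive rules. The function `split_units` and its splitting rules (blank lines for paragraphs, sentence-ending punctuation for sentences, whitespace for words) are assumptions made for illustration; NLQL's own segmentation may differ.

```python
import re
from typing import List


def split_units(text: str, unit: str) -> List[str]:
    """Split text into the requested unit using naive, illustrative rules."""
    unit = unit.upper()
    if unit == "DOCUMENT":
        return [text.strip()]
    if unit == "PARAGRAPH":
        # Paragraphs are assumed to be separated by one or more blank lines.
        return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if unit == "SENTENCE":
        # Split on whitespace that follows '.', '!' or '?'.
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if unit == "WORD":
        return text.split()
    if unit == "CHAR":
        return list(text)
    raise ValueError(f"unknown unit: {unit}")


sample = "NLP is useful. It powers assistants.\n\nThis is a second paragraph."
print(split_units(sample, "PARAGRAPH"))
print(split_units(sample, "SENTENCE"))
```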

Basic Operators

CONTAINS("text")              -- Contains specified text
STARTS_WITH("text")          -- Starts with specified text
ENDS_WITH("text")            -- Ends with specified text
LENGTH(<|>|=|<=|>=) number   -- Length conditions
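
As a rough mental model, each basic operator behaves like a predicate applied to every unit. The plain-Python functions below are stand-ins written for this illustration, not NLQL internals; details such as case sensitivity are assumptions.

```python
import operator
from typing import Callable, Dict

# Comparison symbols accepted by the LENGTH condition, mapped to Python operators.
COMPARATORS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
    "<=": operator.le,
    ">=": operator.ge,
}


def contains(unit: str, text: str) -> bool:
    return text in unit


def starts_with(unit: str, text: str) -> bool:
    return unit.startswith(text)


def ends_with(unit: str, text: str) -> bool:
    return unit.endswith(text)


def length(unit: str, comparator: str, number: int) -> bool:
    # Compare the unit's character count against the given number.
    return COMPARATORS[comparator](len(unit), number)


sentences = ["NLP is a branch of artificial intelligence.", "It is widely used."]
matches = [s for s in sentences if contains(s, "artificial intelligence")]
short = [s for s in sentences if length(s, "<", 20)]
print(matches)
print(short)
```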

Semantic Operators

SIMILAR_TO("text", threshold)     -- Semantic similarity
TOPIC_IS("topic")                 -- Topic matching
SENTIMENT_IS("positive"|"negative"|"neutral")  -- Sentiment analysis

Vector Operators

EMBEDDING_DISTANCE("text", threshold)  -- Vector distance
VECTOR_SIMILAR("vector", threshold)    -- Vector similarity
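
Threshold-based vector matching can be sketched with cosine similarity. Whether NLQL uses cosine similarity, and how it produces embeddings, is not stated here; the toy vectors and the `cosine_similarity` helper below are assumptions for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy three-dimensional "embeddings"; real embeddings come from a model.
query_vec = [1.0, 0.0, 1.0]
candidates = {"close": [0.9, 0.1, 1.1], "far": [0.0, 1.0, 0.0]}
threshold = 0.8

hits = [
    name
    for name, vec in candidates.items()
    if cosine_similarity(query_vec, vec) >= threshold
]
print(hits)
```

A higher threshold keeps only candidates whose vectors point in nearly the same direction as the query vector.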

Usage Examples

Basic Queries

-- Find sentences containing "artificial intelligence"
SELECT SENTENCE WHERE CONTAINS("artificial intelligence")

-- Find paragraphs with fewer than 100 characters
SELECT PARAGRAPH WHERE LENGTH < 100

Advanced Queries

-- Find semantically similar sentences
SELECT SENTENCE 
WHERE SIMILAR_TO("How to improve productivity", 0.8)

-- Find positive sentences about innovation
-- Note: LENGTH is not a built-in keyword for ORDER BY. Register it first in Python:
--   nlql.register_metadata_extractor("LENGTH", lambda x: len(x))
SELECT SENTENCE 
WHERE CONTAINS("innovation") 
AND SENTIMENT_IS("positive")
ORDER BY LENGTH 
LIMIT 10

Implementation

The system is implemented with three main components:

  1. Tokenizer: Breaks down query strings into tokens
  2. Parser: Converts tokens into an abstract syntax tree (AST)
  3. Executor: Executes the query and returns results
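
The first stage can be sketched with a regular-expression tokenizer. This is a simplified illustration, not NLQL's actual tokenizer; the token names and the pattern are assumptions.

```python
import re
from typing import List, Tuple

# Each named group is one token kind (names are illustrative).
TOKEN_PATTERN = re.compile(
    r"""
    (?P<STRING>"[^"]*"|'[^']*')          # quoted string literal
  | (?P<NUMBER>\d+(?:\.\d+)?)            # integer or decimal number
  | (?P<OP><=|>=|<|>|=)                  # comparison operator
  | (?P<PUNCT>[(),])                     # parentheses and commas
  | (?P<IDENT>[A-Za-z_][A-Za-z_0-9]*)    # keyword or identifier
  | (?P<SKIP>\s+)                        # whitespace, discarded
    """,
    re.VERBOSE,
)


def tokenize(query: str) -> List[Tuple[str, str]]:
    """Break a query string into (kind, text) tokens."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(query):
        match = TOKEN_PATTERN.match(query, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {query[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind != "SKIP":
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


print(tokenize("SELECT SENTENCE WHERE LENGTH < 100"))
```

A parser would then consume this token list to build the AST, and the executor would walk that tree against the loaded text.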

Performance Optimizations

  • Inverted index for text search
  • Vector index for semantic search
  • Query result caching
  • Parallel processing for large datasets
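
The first optimization, an inverted index, maps each term to the units that contain it, so a text search does not have to scan every unit. The sketch below is a generic illustration of the idea, not NLQL's implementation; the function names and the lowercase, whitespace-based tokenization are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(units: List[str]) -> Dict[str, Set[int]]:
    """Map each lowercased word to the positions of the units containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for position, unit in enumerate(units):
        for word in unit.lower().split():
            # Strip common punctuation so "NLP." and "NLP" share one entry.
            index[word.strip(".,!?;:")].add(position)
    return index


def lookup(index: Dict[str, Set[int]], units: List[str], terms: List[str]) -> List[str]:
    """Return the units that contain every term, in their original order."""
    sets = [index.get(term.lower(), set()) for term in terms]
    if not sets:
        return []
    positions = set.intersection(*sets)
    return [units[i] for i in sorted(positions)]


sentences = [
    "NLP helps computers understand language.",
    "Virtual assistants use NLP.",
    "This technology is widely used.",
]
index = build_inverted_index(sentences)
print(lookup(index, sentences, ["nlp"]))
print(lookup(index, sentences, ["nlp", "assistants"]))
```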

Extension System

NLQL supports custom extensions through its plugin system, which lets you:

  • Register custom operators
  • Add new query units
  • Implement custom functions
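
One common way to support custom operators is a registry that maps operator names to callables. The `OperatorRegistry` class below is a hypothetical sketch of that pattern, not NLQL's plugin API; check the package itself for its real registration functions.

```python
from typing import Callable, Dict


class OperatorRegistry:
    """Hypothetical registry mapping operator names to predicate functions."""

    def __init__(self) -> None:
        self._operators: Dict[str, Callable[..., bool]] = {}

    def register(self, name: str, func: Callable[..., bool]) -> None:
        key = name.upper()
        if key in self._operators:
            raise ValueError(f"operator {key} is already registered")
        self._operators[key] = func

    def call(self, name: str, unit: str, *args) -> bool:
        # Operator names are matched case-insensitively.
        try:
            func = self._operators[name.upper()]
        except KeyError:
            raise KeyError(f"unknown operator: {name}") from None
        return func(unit, *args)


registry = OperatorRegistry()
# A made-up operator: true when the unit has at least n words.
registry.register("WORD_COUNT_AT_LEAST", lambda unit, n: len(unit.split()) >= n)
print(registry.call("word_count_at_least", "NLQL queries plain text", 3))
```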

Getting Started

  1. Install the package:
pip install nlql
  2. Basic usage:
from nlql import NLQL

# Initialize NLQL
nlql = NLQL()

# Add text for querying
raw_text = """
Natural Language Processing (NLP) is a branch of artificial intelligence 
that helps computers understand human language. This technology is used 
in many applications. For example, virtual assistants use NLP to 
understand your commands.
"""
nlql.text(raw_text)

# Execute query
results = nlql.execute("SELECT SENTENCE WHERE CONTAINS('artificial intelligence')")

# Print results
for result in results:
    print(result)

Contributing

We welcome contributions! Please see our contributing guidelines for more details.

License

This project is licensed under the MIT License - see the LICENSE file for details.
