
Hash Query Language, a universal query language

Project description

Disclaimer

Still in the primordial ooze state: only a small set of things work, and even that isn't guaranteed. I'm not accepting contributions right now. The goal is to have a solid working feature set for DEATH-Con 2025.

Hash Query Language (Hql)

Hash Query Language (Hql) is a query language designed to implement a single feature set across all database backends. This is accomplished by using a modified grammar of Kusto Query Language (KQL), a query language by Microsoft made for Azure Data Explorer and the basis for Log Analytics Workspace. Hql seeks to provide a uniform feature set on top of any backend database, enabling the use of alternative backends such as Elasticsearch or PostgreSQL without compromising on capabilities.

The inspiration for Hql came from the frustration of using Graylog in my personal homelab after setting it up at DEATHCON 2024; see the original rant/idea here. The implementation differs from Kusto in that it supports and embraces NoSQL datasets, instead of a proprietary, structured SQL-like backend. There are many other feature changes, and it is a completely different language, but it attempts to mimic Kusto's capabilities and feature set.

Additionally, Hql supports features that Kusto does not, such as joining datasets across different database types. In Kusto you can join or query across databases or clusters, but only Kusto's own. Here, the following is possible:

let ElasticZeek = database("tf11-elastic").index("so-network-2022.10")
| where event.module == "zeek"
| extend IPAddress = source.ip;
database("sentinel").SigninLogs
| where Username == "iamcompromised"
| project IPAddress
| join type=inner ElasticZeek on IPAddress
| summarize count() by destination.ip, destination.port, source.ip, source.port

In the above we found an attacker IOC in Azure (aka o365) and were able to instantly pivot to the Zeek logs we have in Elasticsearch. This is possibly the most extreme use of the features provided. A more common use case could be enhancing Elasticsearch to support anything beyond basic filtering.

Where a given backend does not support a feature, such as analytic functions, Hql implements it. Below, lines 1-3 can be collapsed into a single query to Elasticsearch. The results are returned and ingested into a polars DataFrame, on which the following operations are then performed:

  1. extend
    • A new column Hostname is created in the DataFrame with the contents of winlog.computer_name.
    • Column event.code is cast to INT64 and assigned to column EventID.
  2. project
    • The column EventID is fed into series_stats, generating a dict with keys for each stat value.
    • Since this function is provided as the single expression, with no assigned name, its output is expanded as the new output DataFrame.
1 database("tf11-elastic").index("so-beats-2022.10.*")
2 | where ['@timestamp'] between ("2022-10-21T15:45:00.000Z" .. "2022-10-21T15:55:00.000Z")
3 | where winlog.computer_name == "asarea.vxnwua.net"
4 | extend Hostname = winlog.computer_name, EventID = toint(event.code)
5 | project series_stats(EventID)

Resulting in:

[{"series_stats_EventID_min": 1, "series_stats_EventID_min_idx": 105, "series_stats_EventID_max": 16394, "series_stats_EventID_max_idx": 225, "series_stats_EventID_avg": 1709.3838936669272, "series_stats_EventID_stdev": 2257.263833183075, "series_stats_EventID_variance": 5095240.012596348}]

The use of Polars as the backend compute engine allows for very fast processing. The two main performance limiters right now, across the board, are:

  1. IO wait on databases, scrolling, etc.
  2. Parsing, since it's still done in Python.

Implemented features

See the implemented features here. I'll put these into issues at some point.

Until I get to documentation, it might be better to look at the closed issues.

Running

To run you need:

  • Python (I'm using 3.9.21)

For Elasticsearch support, I'm unsure how far back compatibility goes, but:

  • An Elasticsearch instance (I'm using 7.17)

If you don't have Elasticsearch or don't want to use it, you can use some examples I've made. They pull JSON and CSV files from my server that can be queried against. Others use the datatable operator for smaller POCs.

# copy and configure Hql
cp conf.json.example conf.json

python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

# write a query into a text file, then run it
python3 Hql.py -v -f ./my-query.hql

# Examples
python3 Hql.py -v -f ./examples/operators/join/join-on-1.hql

# Cool ip4 stuff
python3 Hql.py -v -f ./examples/operators/where/ip4_subnetting.hql

Not all examples are guaranteed to work right now! I'm kinda bad at keeping them working.

Project details


Download files

Download the file for your platform.

Source Distribution

pyhql-0.0.4.tar.gz (49.3 MB)

Uploaded Source

Built Distribution


pyhql-0.0.4-py3-none-any.whl (1.2 MB)

Uploaded Python 3

File details

Details for the file pyhql-0.0.4.tar.gz.

File metadata

  • Download URL: pyhql-0.0.4.tar.gz
  • Upload date:
  • Size: 49.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for pyhql-0.0.4.tar.gz
Algorithm Hash digest
SHA256 6659363c5bb92796138a02f57f95d2410939f7342b02c702f456c07871a4cf76
MD5 c12ffb1062e9e540f0644c4323ffa379
BLAKE2b-256 89697ffea20b4a1e7ef7ab4fb8d36d9c64fbd970ae6b4451e407c7f1f55aa5af


File details

Details for the file pyhql-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: pyhql-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for pyhql-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 8d127cbec609ad3600e6319a2642b38538fd8345a1a2de7791c4ddef688f7569
MD5 a7fb3de9e2dba682f354b717ba6f8a81
BLAKE2b-256 53575994eeb7f80bb5310f38777a15690d8c903ae2ee97b632994c25b2b25f03

