es-sql

Use sql to query from Elasticsearch

These details have not been verified by PyPI

Project links

Homepage

Project description

# Installation

pip install es-sql

# Usage

```
import es_sql
es_sql.execute_sql(
'http://127.0.0.1:9200',
'SELECT COUNT(*) FROM your_index WHERE field=%(param)s',
arguments={'param': 'value'})
```

arguments is optional if no %(param)s specified in the sql

```es-sql``` command can also be used in commandline:

```
cat << EOF | es-sql http://127.0.0.1:9200
SELECT COUNT(*) FROM your_index
EOF
```

# Syntax

The goal is to be able to express all the necessary elasticsearch DSL
(used in the context of OLAP database, not full-text search engine) using SQL.

## Query multiple index

```FROM quote``` => ```quote*```

```FROM index('quote')``` => ```quote```

```FROM index('quote-%Y-%m-%d', '2015-01-01')``` => ```quote-2015-01-01```

```FROM index('quote-%Y-%m-%d', '2015-01-01', '2015-01-03')``` => ```quote-2015-01-01,quote-2015-01-02,quote-2015-01-03```

```FROM index('quote-%Y-%m-%d', now())```

```FROM index('quote-%Y-%m-%d', now() - interval('2 DAYS'))```

```FROM (index('quote') UNION index('symbol')) AS my_table``` => ```quote,symbol```

```FROM (quote EXCEPT index('quote-2015-01-01')) AS my_table``` => ```quote*,-quote-2015-01-01```

## Drill down by sub aggregation

Elasticsearch support sub aggregations. It can be expressed by multiple sql statements

```
WITH all_symbols AS (SELECT MAX(market_cap) AS max_all_times FROM symbol);
WITH per_ipo_year AS (SELECT ipo_year, MAX(market_cap) AS max_this_year INSIDE all_symbols
GROUP BY ipo_year LIMIT 2);
```

```SELECT INSIDE``` can also be ```SELECT FROM```

## Client side join

```
SELECT symbol FROM symbol WHERE sector='Finance' LIMIT 5;
SAVE RESULT AS finance_symbols;
SELECT MAX(adj_close) FROM quote
JOIN finance_symbols ON quote.symbol = finance_symbols.symbol;
REMOVE RESULT finance_symbols;
```

## Server side join

It requires https://github.com/sirensolutions/siren-join

```
WITH finance_symbols AS (SELECT symbol FROM symbol WHERE sector='Finance' LIMIT 5);
SELECT MAX(adj_close) FROM quote
JOIN finance_symbols ON quote.symbol = finance_symbols.symbol;
```

## Pagination

TODO

# Full text queries

## Match Query

TODO

## Multi Match Query

TODO

## Common Terms Query

TODO

## Query String Query

TODO

## Simple Query String Query

TODO

# Term level queries

## Term Query

```
{
"term" : { "user" : "Kimchy" }
}
```

```
WHERE user='Kimchy'
```

If field is analyzed, term query actually means contains instead of fully equal

## Terms Query

```
{
"constant_score" : {
"filter" : {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
}
```
```
WHERE user IN ('kimchy', 'elasticsearch')
```

Terms look up will not be supported, use server side join instead.

## Range Query

```
{
"range" : {
"age" : {
"gte" : 10,
"lte" : 20
}
}
}
```

```
WHERE age >= 10 AND age <= 20
```

```
{
"range" : {
"date" : {
"gte" : "now-1d",
"lt" : "now"
}
}
}
```

```
WHERE "date" >= now() - INTERVAL '1 day' AND "date" < now()
```

```
{
"range" : {
"date" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
}
```
```
WHERE "date" >= today() - interval('1 day') AND "date" < today()
```
```
{
"range" : {
"born" : {
"gte": "01/01/2012",
"lte": "2013",
"format": "dd/MM/yyyy||yyyy"
}
}
}
```
```
WHERE born >= TIMESTAMP '2012-01-01 00:00:00' AND born <= TIMESTAMP '2013-01-01 00：00：00'
```
Suported datetime function are

- datetime: TIMESTAMP '2012-01-01 00:00:00' can also be timestamp('2012-01-01 00:00:00')
- day/hour/minute/second interval: INTERVAL '1 DAY' can also be interval('1 day')
- current datetime: now()
- current day: today()

TODO: timezone

## Exists Query

```
{
"exists" : { "field" : "user" }
}
```
```
WHERE user IS NOT NULL
```

## Prefix Query

TODO

## Wildcard Query

```
{
"wildcard" : { "user" : "ki*y" }
}
```
```
WHERE user LIKE 'ki%y'
```

```
{
"wildcard" : { "user" : "ki?y" }
}
```
```
WHERE user LIKE 'ki_y'
```

## Regexp Query

TODO

## Fuzzy Query

TODO

## Type Query

```
{
"type" : {
"value" : "my_type"
}
}
```
```
WHERE _type='my_type'
```

## Ids Query

```
{
"ids" : {
"values" : ["1", "4", "100"]
}
}
```
```
WHERE _id IN ('1','4','100')
```
```
{
"ids" : {
"type" : "my_type",
"values" : ["1", "4", "100"]
}
}
```
```
WHERE _type='my_type' AND _id IN ('1','4','100')
```

# Compound queries

## Bool Query

```
{
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"filter": {
"term" : { "tag" : "tech" }
},
"must_not" : {
"range" : {
"age" : { "from" : 10, "to" : 20 }
}
},
"should" : [
{
"term" : { "tag" : "wow" }
},
{
"term" : { "tag" : "elasticsearch" }
}
]
}
}
```
```
WHERE user='kimchy' AND tag='tech' AND NOT (age >= 10 AND age < 20) AND (tag='wow' OR tag='elasticsearch')
```

TODO: minimum_should_match

## Indicies Query

TODO

## Limit Query

TODO

# Joining queries

## Nested Query

TODO

## Has Child Query

TODO

## Has Parent Query

TODO

# Geo queries

## GeoShape Query

TODO

## Geo Bounding Box Query

TODO

## Geo Distance Query

TODO

## Geo Distance Range Query

TODO

## Geo Polygon Query

TODO

## Geohash Cell Query

TODO

# Specialized queries

## Template Query

TODO

## Script Query

TODO

# Metric Aggregations

## Avg Aggregation

```
{
"aggs" : {
"avg_grade" : { "avg" : { "field" : "grade" } }
}
}
```
```
SELECT avg(grade) AS avg_grade
```

TODO: script, missing

## Cardinality Aggregation

```
{
"aggs" : {
"author_count" : {
"cardinality" : {
"field" : "author"
}
}
}
}
```
```
SELECT COUNT(DISTINCT author) AS author_count
```
TODO: Precision control, script, missing

## Extended Stats Aggregation

```
{
"aggs" : {
"grades_stats" : { "extended_stats" : { "field" : "grade" } }
}
}
```
will return
```
{
"grade_stats": {
"count": 9,
"min": 72,
"max": 99,
"avg": 86,
"sum": 774,
"sum_of_squares": 67028,
"variance": 51.55555555555556,
"std_deviation": 7.180219742846005,
"std_deviation_bounds": {
"upper": 100.36043948569201,
"lower": 71.63956051430799
}
}
}
```
```
SELECT SUM_OF_SQUARES(grade)
SELECT VARIANCE(grade)
SELECT STD_DEVIATION(grade)
SELECT STD_DEVIATION_UPPER_BOUND(grade)
SELECT STD_DEVIATION_LOWER_BOUND(grade)
```

TODO: script, missing

## Geo Bounds Aggregation

TODO

## Geo Centroid Aggregation

TODO

## Max Aggregation

```
{
"aggs" : {
"max_price" : { "max" : { "field" : "price" } }
}
}
```
```
SELECT MAC(price) AS max_price
```

TODO: script, missing

## Min Aggregation

```
{
"aggs" : {
"min_price" : { "min" : { "field" : "price" } }
}
}
```
```
SELECT MIN(price) AS min_price
```

TODO: script, missing

## Percentiles Aggregation

TODO

## Percentile Ranks Aggregation

TODO

## Scripted Metric Aggregation

TODO

## Sum Aggregation

```
{
"aggs" : {
"intraday_return" : { "sum" : { "field" : "change" } }
}
}
```
```
SELECT SUM(change) AS intraday_return
```

TODO: script, missing

## Top hits Aggregation

TODO

## Value Count Aggregation

```
{
"aggs" : {
"grades_count" : { "value_count" : { "field" : "grade" } }
}
}
```
```
SELECT COUNT(grade) AS grades_count
```

TODO: script

# Bucket Aggregations

## Children Aggregation

TODO

## Date Historgram Aggregation

```
{
"aggs" : {
"articles_over_time" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
}
}
}
}
```
```
GROUP BY DATE_TRUNC('month', "date") AS articles_over_time
```
```
{
"aggs" : {
"articles_over_time" : {
"date_histogram" : {
"field" : "date",
"interval" : "1M",
"format" : "yyyy-MM-dd"
}
}
}
}
```
```
GROUP BY TO_CHAR(DATE_TRUNC('month', "date"),'%Y-%m-%d') AS articles_over_time
```

TODO: 1.5 hours interval, timezone, offset, script, missing

## Filter Aggregation

```
{
"aggs" : {
"red_products" : {
"filter" : { "term": { "color": "red" } },
"aggs" : {
"avg_price" : { "avg" : { "field" : "price" } }
}
}
}
}
```
```
WITH all_products AS (SELECT COUNT(*) FROM product);
SELECT AVG(price) AS avg_price FROM all_products WHERE color='red';
```

If from table is not another named sql, the where condition will be translated to query instead of filter aggregation.

## Filters Aggregation

```
{
"aggs" : {
"messages" : {
"filters" : {
"other_bucket_key": "other_messages",
"filters" : {
"errors" : { "term" : { "body" : "error" }},
"warnings" : { "term" : { "body" : "warning" }}
}
}
}
}
}
```
```
GROUP BY CASE WHEN body='error' THEN 'errors' WHEN body='warning' THEN 'warnings' ELSE 'other_messages' END AS messages
```

## Geo Distance Aggregation

TODO

## GeoHash grid Aggregation

TODO

## Histogram Aggregation

```
{
"aggs" : {
"prices" : {
"histogram" : {
"field" : "price",
"interval" : 50
}
}
}
}
```
```
GROUP BY histogram(price, 50) AS prices
```
```
{
"aggs" : {
"prices" : {
"histogram" : {
"field" : "price",
"interval" : 50,
"order" : { "_key" : "desc" }
}
}
}
}
```
```
GROUP BY histogram(price, 50) AS prices ORDER BY prices DESC
```

TODO: min_doc_count, offset, buckets_path, missing

## IPv4 Range Aggregation

TODO

## Missing Aggregation

TODO

## Nested Aggregation

TODO

## Range Aggregation

```
{
"aggs" : {
"price_ranges" : {
"range" : {
"field" : "price",
"ranges" : [
{ "to" : 50 },
{ "from" : 50, "to" : 100 },
{ "from" : 100 }
]
}
}
}
}
```
```
GROUP BY CASE
WEHN price < 50 THEN 'range1'
WHEN price >= 50 AND price < 100 THEN 'range2'
WHEN price >= 100 THEN 'range3'
END AS price_ranges
```

TODO: script

## Reverse nested Aggregation

TODO

## Sampler Aggregation

TODO

## Significant Terms Aggregation

TODO

## Terms Aggregation

```
{
"aggs" : {
"genders" : {
"terms" : { "field" : "gender" }
}
}
}
```
```
GROUOP BY gender AS genders
```
```
{
"aggs" : {
"products" : {
"terms" : {
"field" : "product",
"size" : 5
}
}
}
}
```
```
GROUP BY product AS products LIMIT 5
```
```
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "gender",
"order" : { "_count" : "asc" }
}
}
}
}
```
```
SELECT COUNT(*) AS c FROM xxx
GROUP BY gender AS genders ORDER BY c
```
```
{
"aggs" : {
"genders" : {
"terms" : {
"field" : "gender",
"order" : { "height_stats.std_deviation" : "desc" }
},
"aggs" : {
"height_stats" : { "extended_stats" : { "field" : "height" } }
}
}
}
}
```
```
SELECT STD_DEVIATION(height) AS s FROM xxx
GROUP BY gender AS genders ORDER BY s
```
```
{
"aggs" : {
"countries" : {
"terms" : {
"field" : "address.country",
"order" : { "females>height_stats.avg" : "desc" }
},
"aggs" : {
"females" : {
"filter" : { "term" : { "gender" : "female" }},
"aggs" : {
"avg_height" : { "avg" : { "field" : "height" }}
}
}
}
}
}
}
```
```
WITH all AS (SELECT * FROM xxx GROUP BY address.country AS countries ORDER BY female_avg_height);
SELECT AVG(height) AS female_avg_height FROM all WHERE gender='female'
```

TODO: document count error, min_doc_count, script, filtering, collect-to, missing

# Pipeline Aggregations

## Avg Bucket Aggregation

```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"avg_monthly_sales": {
"avg_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
```
```
WITH sales_per_month AS (SELECT month, SUM(price) AS sales FROM sale GROUP BY DATE_TRUNC('month', "date") AS month);
SELECT AVG(sales) AS avg_monthly_sales FROM sales_per_month;
```

TODO: gap_policy

## Derivative Aggregation

First Order Derivative
```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"sales_deriv": {
"derivative": {
"buckets_path": "sales"
}
}
}
}
}
}
```
```
SELECT month, SUM(price) AS sales, DERIVATIVE(sales) AS sales_deriv
FROM sale GROUP BY DATE_TRUNC('month', "date") AS month
```
Second Order Derivative
```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"sales_deriv": {
"derivative": {
"buckets_path": "sales"
}
},
"sales_2nd_deriv": {
"derivative": {
"buckets_path": "sales_deriv"
}
}
}
}
}
}
```
```
SELECT month, SUM(price) AS sales, DERIVATIVE(sales) AS sales_deriv, DERIVATIVE(sales_deriv) AS sales_2nd_deriv
FROM sale GROUP BY DATE_TRUNC('month', "date") AS month
```

TODO: unit, gap_policy

## Max Bucket Aggregation

```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"max_monthly_sales": {
"max_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
```
```
WITH sales_per_month AS (SELECT month, SUM(price) AS sales FROM sale GROUP BY DATE_TRUNC('month', "date") AS month);
SELECT MAX(sales) AS max_monthly_sales FROM sales_per_month;
```

TODO: gap_policy

## Min Bucket Aggregation

```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"min_monthly_sales": {
"min_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
```
```
WITH sales_per_month AS (SELECT month, SUM(price) AS sales FROM sale GROUP BY DATE_TRUNC('month', "date") AS month);
SELECT MIN(sales) AS min_monthly_sales FROM sales_per_month;
```

TODO: gap_policy

## Sum Bucket Aggregation

```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"sum_monthly_sales": {
"sum_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
```
```
WITH sales_per_month AS (SELECT month, SUM(price) AS sales FROM sale GROUP BY DATE_TRUNC('month', "date") AS month);
SELECT SUM(sales) AS sum_monthly_sales FROM sales_per_month;
```

TODO: gap_policy

## Stats Bucket Aggregation

TODO

## Extended Stats Bucket Aggregation

TODO

## Percentiles Bucket Aggregation

TODO

## Moving Average Aggregation

```
{
"moving_avg": {
"buckets_path": "the_sum",
"model": "holt",
"window": 5,
"gap_policy": "insert_zero",
"settings": {
"alpha": 0.8
}
}
}
```
```
SELECT moving_avg(the_sum, '{"model":"holt","window":5,"gap_policy":"insert_zero","settings":{"alpha":0.8}}')
```
Can also be
```
SELECT moving_avg(the_sum, model='holt', window=5, gap_policy='insert_zero', settings='{"alpha":0.8}')
```

## Cumulative Sum Aggregation

```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"cumulative_sales": {
"cumulative_sum": {
"buckets_path": "sales"
}
}
}
}
}
}
```
```
SELECT month, SUM(price) AS sales, CSUM(sales) AS cumulative_sales
FROM sale GROUP BY DATE_TRUNC('month', "date") AS month
```

## Bucket Script Aggregation

TODO

## Bucket Selector Aggregation

```
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"total_sales": {
"sum": {
"field": "price"
}
}
"sales_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"totalSales": "total_sales"
},
"script": "totalSales <= 50"
}
}
}
}
}
}
```
```
SELECT month, SUM(price) AS total_sales
FROM sale GROUP BY DATE_TRUNC('month', "date") AS month
HAVING total_sales <= 50
```

TODO: gap_policy

## Serial Differencing Aggregation

```
{
"aggs": {
"my_date_histo": {
"date_histogram": {
"field": "timestamp",
"interval": "day"
},
"aggs": {
"the_sum": {
"sum": {
"field": "lemmings"
}
},
"thirtieth_difference": {
"serial_diff": {
"buckets_path": "the_sum",
"lag" : 30
}
}
}
}
}
}
```
```
SELECT SUM(lemmings) AS the_sum, SERIAL_DIFF(the_sum, lag=30) AS thirtieth_difference FROM xxx
GROUP BY DATE_TRUNC('day', "timestamp") AS my_date_histo
```

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

2.0.0

Mar 1, 2016

1.0.1

Feb 26, 2016

1.0.0

Feb 25, 2016

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

es-sql-2.0.0.tar.gz (61.9 kB view details)

Uploaded Mar 1, 2016 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

es_sql-2.0.0-py2-none-any.whl (83.1 kB view details)

Uploaded Mar 1, 2016 Python 2

File details

Details for the file es-sql-2.0.0.tar.gz.

File metadata

Download URL: es-sql-2.0.0.tar.gz
Upload date: Mar 1, 2016
Size: 61.9 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for es-sql-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`ddd7117b41691b8a38e483933bd4c0cf328505a69a0a28868068369de8bf3e08`
MD5	`905fa93acb26d9e9ef39ae2336bc2193`
BLAKE2b-256	`94adfca4ec209fda933d15d40f6be4b776a334bdf0d60f8db722e73a956457f6`

See more details on using hashes here.

File details

Details for the file es_sql-2.0.0-py2-none-any.whl.

File metadata

Download URL: es_sql-2.0.0-py2-none-any.whl
Upload date: Mar 1, 2016
Size: 83.1 kB
Tags: Python 2
Uploaded using Trusted Publishing? No

File hashes

Hashes for es_sql-2.0.0-py2-none-any.whl
Algorithm	Hash digest
SHA256	`0d84ae210e969efba3bc7736e0e9f8e7b86f524f6ad3561e7ce40aefcb3119e6`
MD5	`d83149adf1a93fdd68c4ee39fe7e3406`
BLAKE2b-256	`141e130f896cf648b20ca35b734b3192c824eccbbbaede158c97a732a8eda328`

See more details on using hashes here.

es-sql 2.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes