shillelagh

Making it easy to query APIs via SQL


Project description


Shillelagh is a Python DB API and SQLAlchemy dialect for querying non-SQL resources like APIs and files. You can use it to write queries like this:

INSERT INTO "csv:///path/to/file.csv"
SELECT time, chance_of_rain
FROM "https://api.weatherapi.com/v1/history.json?key=XXX&q=London"
WHERE strftime('%Y-%m-%d', time) IN (
  SELECT day
  FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=2064361835"
)

The query above reads holidays from a Google Sheet, uses the days to get weather data from WeatherAPI, and writes the chance of rain at each hour of the holidays into a (pre-existing) CSV file.

Each of these resources is implemented via an adapter, and writing adapters is relatively straightforward.

Using Shillelagh

You can use Shillelagh much as you would use SQLite (Shillelagh is built on top of APSW, a Python wrapper for SQLite):

# currently there's just the APSW backend, but in the future
# we could implement more
from shillelagh.backends.apsw.db import connect

connection = connect(":memory:")  # or connect("database.sqlite")
cursor = connection.cursor()

query = "SELECT * FROM some_table"
for row in cursor.execute(query):
    print(row)

You can also use it with SQLAlchemy:

from sqlalchemy.engine import create_engine

engine = create_engine("shillelagh://")
connection = engine.connect()

The main advantage of Shillelagh is that it allows you to treat non-SQL resources like a table. For example, if you have a Google Spreadsheet URL you can query it directly:

SELECT country, SUM(cnt)
FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1648320094"
WHERE cnt > 0
GROUP BY country

When you run the query above, Shillelagh will automatically create a new virtual table (if it doesn’t exist yet) and associate it with an adapter.

Supported adapters

Currently, Shillelagh supports the following adapters:

CSV files

CSV (comma-separated values) files are supported via the csv:// scheme (example):

SELECT * FROM "csv:///path/to/file.csv"

The adapter supports full DML, so you can also INSERT, UPDATE, or DELETE rows in the CSV file. Deleted rows are only marked for deletion, modified and inserted rows are appended to the end of the file, and garbage collection is applied when the connection is closed.
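The append-and-mark strategy described above can be modeled in a few lines of plain Python. This is a toy sketch under stated assumptions, not Shillelagh's implementation: AppendOnlyCSV and its methods are hypothetical names, a row's ID is assumed to be its position among the data rows, and "garbage collection" is modeled as rewriting the file with only the live rows when close() is called.

```python
import csv
import os
import tempfile
from typing import Dict, List, Set


class AppendOnlyCSV:
    """Toy model of append-and-mark DML on a CSV file (not Shillelagh's code)."""

    def __init__(self, path: str, columns: List[str]) -> None:
        self.path = path
        self.columns = columns
        self.deleted: Set[int] = set()  # row IDs marked for deletion
        with open(path, newline="") as f:
            # a row's ID is its position among the data rows in the file
            self.num_rows = sum(1 for _ in csv.DictReader(f))

    def insert(self, row: Dict[str, str]) -> int:
        # new rows go at the end of the file (assumes it ends with a newline)
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.columns).writerow(row)
        self.num_rows += 1
        return self.num_rows - 1

    def delete(self, rowid: int) -> None:
        # only mark the row; the file itself is not rewritten yet
        self.deleted.add(rowid)

    def update(self, rowid: int, row: Dict[str, str]) -> int:
        # an update marks the old row and appends the modified copy
        self.delete(rowid)
        return self.insert(row)

    def close(self) -> None:
        # garbage collection: rewrite the file with only the live rows
        with open(self.path, newline="") as f:
            live = [
                row for i, row in enumerate(csv.DictReader(f)) if i not in self.deleted
            ]
        with open(self.path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=self.columns)
            writer.writeheader()
            writer.writerows(live)
        self.deleted.clear()
        self.num_rows = len(live)


# usage against a throwaway file
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as f:
    f.write("country,cnt\nBR,1\nUS,2\n")

table = AppendOnlyCSV(path, ["country", "cnt"])
table.delete(0)                                 # mark the BR row
table.update(1, {"country": "US", "cnt": "5"})  # mark old US row, append new one
table.insert({"country": "IN", "cnt": "3"})
table.close()
```

The design trade-off is that individual writes stay cheap (an append or a set insertion), while the cost of compacting the file is paid once, at close time.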

Google Spreadsheets

Google Spreadsheets can be accessed as tables. To SELECT data from a spreadsheet, simply use its URL as the table name (example):

SELECT country, SUM(cnt)
FROM "https://docs.google.com/spreadsheets/d/1_rN3lm0R_bU3NemO0s9pbFkY5LQPcuy1pscv8ZXPtg8/edit#gid=1648320094"
WHERE cnt > 0
GROUP BY country

Authentication is supported, and necessary if you want to use INSERT, UPDATE or DELETE on the spreadsheets. You need to pass credentials via the service_account_info or service_account_file arguments when creating the connection:

service_account_info = {
    "type": "service_account",
    "project_id": "XXX",
    # ... (the remaining fields of the service account key)
}

engine = create_engine(
    "shillelagh://",
    adapter_kwargs={
        "gsheetsapi": {
            "service_account_info": service_account_info,
            "subject": "user@example.com",
        },
    },
)

When present, the subject email will be used to impersonate that user; if it is not present, the connection will have full access to all spreadsheets in the project, so be careful. Also, make sure the service account has access to the following scopes:

https://www.googleapis.com/auth/drive.readonly
https://www.googleapis.com/auth/spreadsheets
https://spreadsheets.google.com/feeds

You should also confirm that the Google Drive and Google Sheets APIs are active in the project.
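If the key is stored as a JSON file, the service_account_info dictionary can be loaded from it. A minimal sketch, assuming a key file downloaded from Google Cloud (the helper name gsheets_adapter_kwargs is hypothetical, not part of Shillelagh): it checks the key's type and adds subject only when one is given, since omitting it grants the broader access described above.

```python
import json
from typing import Any, Dict, Optional


def gsheets_adapter_kwargs(key_path: str, subject: Optional[str] = None) -> Dict[str, Any]:
    """Build the adapter_kwargs dict for the gsheetsapi adapter from a key file."""
    with open(key_path) as f:
        service_account_info = json.load(f)
    # keys downloaded for a service account carry "type": "service_account"
    if service_account_info.get("type") != "service_account":
        raise ValueError(f"{key_path} does not contain a service account key")
    options: Dict[str, Any] = {"service_account_info": service_account_info}
    # only impersonate a user when a subject is given explicitly
    if subject is not None:
        options["subject"] = subject
    return {"gsheetsapi": options}
```

The result can then be passed as create_engine("shillelagh://", adapter_kwargs=gsheets_adapter_kwargs("credentials.json", "user@example.com")), where credentials.json is a placeholder path.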

Shillelagh also defines a custom dialect called gsheets:// which has only the Google Spreadsheets adapter enabled. Use this if you don’t want users connecting to other resources supported by Shillelagh.

engine = create_engine(
    "gsheets://",
    service_account_info=service_account_info,
    subject="user@example.com",
)

Socrata

The Socrata Open Data API is a simple API used by many governments, non-profits, and NGOs around the world, including the CDC. Similarly to the Google Spreadsheets adapter, with the Socrata adapter you can query any API URL directly (example):

SELECT date, administered_dose1_recip_4
FROM "https://data.cdc.gov/resource/unsk-b7fc.json"
WHERE location = 'US'
ORDER BY date DESC
LIMIT 10

The adapter is currently read-only.
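Socrata endpoints accept SoQL query parameters such as $where, $order and $limit, which is one way an adapter can push filtering, sorting and limits down to the API instead of downloading the whole dataset. The helper below is a hypothetical sketch of that translation (soql_url is our name, and this is not necessarily how the Socrata adapter builds its requests), applied to the example query above.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def soql_url(
    resource: str,
    where: Optional[str] = None,
    order: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Attach SoQL query parameters to a Socrata resource URL (hypothetical helper)."""
    params: Dict[str, str] = {}
    if where is not None:
        params["$where"] = where  # row filter, evaluated by the API
    if order is not None:
        params["$order"] = order  # server-side sorting
    if limit is not None:
        params["$limit"] = str(limit)  # maximum number of rows returned
    # urlencode percent-escapes "$", quotes and spaces
    return f"{resource}?{urlencode(params)}" if params else resource


# the query from the example above, expressed as SoQL parameters
url = soql_url(
    "https://data.cdc.gov/resource/unsk-b7fc.json",
    where="location = 'US'",
    order="date DESC",
    limit=10,
)
# decoding the query string gives back the original parameters
params = parse_qs(urlparse(url).query)
```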

WeatherAPI

The WeatherAPI adapter was the first one to be written, and provides access to historical weather data (forecasts should be easy to implement as well). You need an API key in order to use it (example):

from datetime import datetime, timedelta
from shillelagh.backends.apsw.db import connect

three_days_ago = datetime.now() - timedelta(days=3)

# sign up for an API key at https://www.weatherapi.com/my/
api_key = "XXX"

connection = connect(":memory:")
cursor = connection.cursor()

sql = f"""
SELECT *
FROM "https://api.weatherapi.com/v1/history.json?key={api_key}&q=94923" AS bodega_bay
WHERE time >= ?
"""
# DB API parameters are passed as a sequence, one value per "?" placeholder
for row in cursor.execute(sql, (three_days_ago,)):
    print(row)

Writing a new adapter

Let’s say we want to fetch data from WeatherAPI using SQL. Their API is pretty straightforward: to fetch data for a given day in a given location, all we need is an HTTP request:

https://api.weatherapi.com/v1/history.json?key=XXX&q=94158&dt=2020-01-01

This will return data for 2020-01-01 in the ZIP code 94158 as a JSON payload.
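The request URL above can be assembled from its three query parameters (key, q and dt, as in the example URL). A small standard-library sketch; the function name history_url is hypothetical:

```python
from datetime import date
from urllib.parse import urlencode


def history_url(api_key: str, location: str, day: date) -> str:
    """Build the history.json request URL for one location and one day."""
    # urlencode escapes each value, so an API key or location containing
    # characters such as "&" or spaces cannot corrupt the query string
    query = urlencode({"key": api_key, "q": location, "dt": day.isoformat()})
    return f"https://api.weatherapi.com/v1/history.json?{query}"


url = history_url("XXX", "94158", date(2020, 1, 1))
```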

The response contains many different variables, but let’s assume we’re only interested in timestamp and temperature for the sake of this example. Of those two, timestamp is special because it can be used to filter data coming from the API, reducing the amount that needs to be downloaded.

We start by defining an “adapter” class, with the columns we’re interested in:

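from shillelagh.adapters.base import Adapter
from shillelagh.fields import DateTime, Float, Order
from shillelagh.filters import Range

class WeatherAPI(Adapter):

    # ts can be pushed down to the API as a range; the API returns it in
    # ascending order, but only filters at daily granularity (exact=False)
    ts = DateTime(filters=[Range], order=Order.ASCENDING, exact=False)
    # temperature cannot be used to pre-filter the API request
    temperature = Float()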

The ts (timestamp) column has the type DateTime and can be filtered with a desired range (for example, WHERE ts >= '2020-01-01' AND ts <= '2020-01-07'). We know that the API returns values in ascending order, so we annotate that to help the SQL engine: if a query has ORDER BY ts ASC, the engine knows it doesn’t need to sort the payload.

In addition, we declare that the results from filtering ts are not exact. This is because the API returns data for every hour of a given day. To make our lives easier we’re going to filter the data down to the daily granularity, and let the SQL engine filter the rest. For example, imagine this query:

SELECT * FROM weatherapi WHERE ts > '2020-01-01T12:00:00' AND ts < '2020-01-02T12:00:00'

In this case, the adapter is going to download all data for the days 2020-01-01 and 2020-01-02 and pass it to the SQL engine, which narrows it down to the rows between noon on the first day and noon on the second. We could do that filtering ourselves in the adapter, but since we’re only discarding a small amount of data, it’s fine to leave it to the engine.
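The division of work implied by exact=False can be simulated offline. In this sketch download_days is a hypothetical stand-in for the API (one timestamp per hour of each requested day); the adapter side uses only the dates of the bounds, and the engine side re-applies the exact predicate from the query above.

```python
from datetime import date, datetime, timedelta
from typing import Iterator, List


def download_days(start: date, end: date) -> Iterator[datetime]:
    """Stand-in for the API: one timestamp per hour for every whole day requested."""
    day = start
    while day <= end:
        for hour in range(24):
            yield datetime(day.year, day.month, day.day, hour)
        day += timedelta(days=1)


# WHERE ts > '2020-01-01T12:00:00' AND ts < '2020-01-02T12:00:00'
lower = datetime(2020, 1, 1, 12)
upper = datetime(2020, 1, 2, 12)

# adapter side (exact=False): only the dates of the bounds are used,
# so whole days are returned, a superset of the requested rows
downloaded: List[datetime] = list(download_days(lower.date(), upper.date()))

# engine side: the exact predicate is re-applied to what the adapter returned
result = [ts for ts in downloaded if lower < ts < upper]
```

Here the adapter returns 48 hourly rows and the engine keeps the 23 that fall strictly between the two noons (13:00 to 23:00 on the first day, 00:00 to 11:00 on the second).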

For temperature we simply declare it as float, since we can’t use temperature values to pre-filter data in the API.

Now we define our __init__ method, which initializes the adapter with the location and API key:

def __init__(self, location: str, api_key: str):
    self.location = location
    self.api_key = api_key

Finally, we define a method to download data from the API:

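# module-level imports used by get_data
from datetime import date, timedelta
from typing import Dict, Iterator, List, Tuple

import dateutil.parser
import requests

from shillelagh.filters import Filter, Range
from shillelagh.typing import RequestedOrder, Row

def get_data(self, bounds: Dict[str, Filter], order: List[Tuple[str, RequestedOrder]]) -> Iterator[Row]:
    # default to the last 7 days, then narrow down using the ts range
    today = date.today()
    start = today - timedelta(days=7)
    end = today
    # bounds may have no "ts" entry, e.g. when the query doesn't filter on ts
    ts_range = bounds.get("ts")
    if isinstance(ts_range, Range):
        if ts_range.start:
            start = ts_range.start.date()
        if ts_range.end:
            end = ts_range.end.date()

    # one API request per day, from start to end inclusive
    while start <= end:
        url = (
            f"https://api.weatherapi.com/v1/history.json?key={self.api_key}"
            f"&q={self.location}&dt={start}"
        )
        response = requests.get(url, timeout=10)
        if response.ok:
            payload = response.json()
            hourly_data = payload["forecast"]["forecastday"][0]["hour"]
            for record in hourly_data:
                dt = dateutil.parser.parse(record["time"])
                yield {
                    "rowid": int(dt.timestamp()),
                    "ts": dt.isoformat(),
                    "temperature": record["temp_c"],
                }

        start += timedelta(days=1)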

The important thing to know here is that since we defined ts as being filtered through a Range, a corresponding range will be passed to the get_data method specifying how ts should be filtered. The range has optional start and end values, which when not present are defaulted to 7 days ago and today, respectively.

Note also that the method yields rows as dictionaries. In addition to values for ts and temperature it also returns a row ID. This should be a unique value for each row.
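The day-by-day loop, the 7-day default and the row IDs can be exercised without network access by swapping the HTTP call for a stub. This is a sketch under assumptions: fake_fetch returns a hypothetical payload with placeholder temperatures, shaped like the keys get_data reads; WeatherAPI's "time" values are assumed to look like "2020-01-01 00:00"; and strptime replaces dateutil to stay within the standard library.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Any, Callable, Dict, Iterator, Optional

Payload = Dict[str, Any]


def fake_fetch(day: date) -> Payload:
    """Hypothetical stand-in for requests.get(url).json(): 24 hourly records."""
    hours = [
        {"time": f"{day.isoformat()} {hour:02d}:00", "temp_c": 0.0}  # placeholder values
        for hour in range(24)
    ]
    return {"forecast": {"forecastday": [{"hour": hours}]}}


def get_rows(
    fetch: Callable[[date], Payload],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> Iterator[Dict[str, Any]]:
    """Same loop as get_data, with the HTTP call replaced by `fetch`."""
    today = date.today()
    # missing bounds default to 7 days ago and today, as in the adapter above
    day = start if start is not None else today - timedelta(days=7)
    end = end if end is not None else today
    while day <= end:
        payload = fetch(day)
        for record in payload["forecast"]["forecastday"][0]["hour"]:
            dt = datetime.strptime(record["time"], "%Y-%m-%d %H:%M")
            yield {
                # treating the wall-clock time as UTC keeps IDs unique per hour,
                # regardless of the machine's time zone or DST transitions
                "rowid": int(dt.replace(tzinfo=timezone.utc).timestamp()),
                "ts": dt.isoformat(),
                "temperature": record["temp_c"],
            }
        day += timedelta(days=1)


rows = list(get_rows(fake_fetch, date(2020, 1, 1), date(2020, 1, 2)))
```

Note that this sketch deliberately computes the row ID from UTC rather than local time: with naive local timestamps, the non-existent hour on a daylight-saving "spring forward" day can map to the same epoch value as the following hour.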

We also need to define some dispatching methods, so our adapter can be found:

# module-level imports used by the methods below
import urllib.parse
from typing import Tuple

@staticmethod
def supports(uri: str) -> bool:
    """Handle URIs like https://api.weatherapi.com/v1/history.json?key=XXX&q=94158"""
    parsed = urllib.parse.urlparse(uri)
    query_string = urllib.parse.parse_qs(parsed.query)
    return (
        parsed.netloc == "api.weatherapi.com"
        and parsed.path == "/v1/history.json"
        and "key" in query_string
        and "q" in query_string
    )

@staticmethod
def parse_uri(uri: str) -> Tuple[str, str]:
    parsed = urllib.parse.urlparse(uri)
    query_string = urllib.parse.parse_qs(parsed.query)
    location = query_string["q"][0]
    api_key = query_string["key"][0]

    # same order as the arguments of __init__(location, api_key)
    return (location, api_key)
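Note that parse_uri returns (location, api_key) in the same order as the parameters of __init__, which suggests the tuple is used as the adapter's constructor arguments. The sketch below illustrates that dispatch idea with a toy adapter for csv:// URIs; ToyAdapter, CSVFile and find_adapter are hypothetical names, and how Shillelagh actually registers and discovers adapters is not shown.

```python
from typing import List, Optional, Tuple, Type


class ToyAdapter:
    """Minimal hypothetical base class exposing the two dispatch hooks."""

    @staticmethod
    def supports(uri: str) -> bool:
        raise NotImplementedError

    @staticmethod
    def parse_uri(uri: str) -> Tuple[str, ...]:
        raise NotImplementedError


class CSVFile(ToyAdapter):
    """Toy adapter for csv:// URIs (illustrative only)."""

    def __init__(self, path: str) -> None:
        self.path = path

    @staticmethod
    def supports(uri: str) -> bool:
        return uri.startswith("csv://")

    @staticmethod
    def parse_uri(uri: str) -> Tuple[str]:
        return (uri[len("csv://"):],)


def find_adapter(uri: str, adapters: List[Type[ToyAdapter]]) -> Optional[ToyAdapter]:
    """Instantiate the first adapter whose supports() accepts the URI."""
    for adapter in adapters:
        if adapter.supports(uri):
            # the tuple returned by parse_uri becomes the constructor's arguments
            return adapter(*adapter.parse_uri(uri))
    return None


found = find_adapter("csv:///path/to/file.csv", [CSVFile])
```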

Now we can use our class to query the API using SQLite:

from shillelagh.backends.apsw.db import connect

connection = connect(":memory:")
cursor = connection.cursor()

api_key = "XXX"
query = f"""
    SELECT *
    FROM "https://api.weatherapi.com/v1/history.json?key={api_key}&q=94923" AS bodega_bay
    WHERE ts >= '2020-01-01T12:00:00'
"""
for row in cursor.execute(query):
    print(row)


Release history

1.2.19 (Apr 3, 2024)
1.2.18 (Mar 27, 2024)
1.2.17 (Feb 23, 2024)
1.2.16 (Feb 22, 2024)
1.2.15 (Feb 13, 2024)
1.2.14 (Jan 5, 2024)
1.2.13 (Jan 4, 2024)
1.2.12 (Dec 5, 2023)
1.2.11 (Nov 27, 2023)
1.2.10 (Nov 9, 2023)
1.2.9 (Nov 9, 2023)
1.2.8 (Oct 21, 2023)
1.2.7 (Aug 14, 2023)
1.2.6 (Jul 21, 2023)
1.2.5 (Jul 14, 2023)
1.2.4 (May 15, 2023)
1.2.3 (May 15, 2023)
1.2.2 (Apr 18, 2023)
1.2.1 (Apr 14, 2023)
1.2.0 (Feb 17, 2023)
1.1.5 (Dec 9, 2022)
1.1.4 (Dec 7, 2022)
1.1.3 (Nov 17, 2022)
1.1.2 (Nov 1, 2022)
1.1.1 (Oct 27, 2022)
1.1.0 (Jul 28, 2022)
1.0.16 (Jul 15, 2022)
1.0.15 (Jul 13, 2022)
1.0.14 (May 25, 2022)
1.0.13 (May 2, 2022)
1.0.12 (Apr 29, 2022)
1.0.11 (Apr 14, 2022)
1.0.10 (Mar 14, 2022)
1.0.9 (Mar 12, 2022)
1.0.8 (Mar 11, 2022)
1.0.7 (Mar 3, 2022)
1.0.6 (Dec 31, 2021)
1.0.5 (Dec 3, 2021)
1.0.4 (Aug 31, 2021)
1.0.3 (Aug 24, 2021)
1.0.2 (Aug 24, 2021)
1.0.1 (Aug 23, 2021)
1.0.0 (Aug 19, 2021)
1.0.0rc5, pre-release (Jul 15, 2021)
1.0.0rc4, pre-release (Jul 14, 2021)
1.0.0rc3, pre-release (Jul 14, 2021)
1.0.0rc2, pre-release (Jul 14, 2021)
1.0.0rc1, pre-release (Jul 12, 2021)
0.8.1 (Jul 11, 2021)
0.8.0 (Jul 9, 2021), this version
0.7.4 (Jul 3, 2021)
0.7.3.2 (Jul 14, 2021)
0.7.3.1 (Jul 14, 2021)
0.7.3 (Jul 2, 2021)
0.7.2 (Jul 2, 2021)
0.7.1 (Jul 2, 2021)
0.7.0 (Jul 2, 2021)
0.6.1 (Jun 22, 2021)
0.6.0 (Jun 17, 2021)
0.5.2 (Jun 3, 2021)
0.5.1 (May 24, 2021)
0.5.0 (May 22, 2021)
0.4.3 (Apr 20, 2021)
0.4.2 (Apr 18, 2021)
0.4.1 (Apr 12, 2021)
0.4 (Apr 10, 2021)
0.3.1 (Mar 19, 2021)
0.3 (Mar 18, 2021)
0.2.1 (Mar 16, 2021)
0.2 (Feb 17, 2021)
0.1 (Oct 26, 2020)

Download files

Source Distribution

shillelagh-0.8.0.tar.gz (148.7 kB), uploaded Jul 9, 2021

Built Distribution

shillelagh-0.8.0-py2.py3-none-any.whl (60.7 kB), uploaded Jul 9, 2021, for Python 2 and Python 3

Hashes for shillelagh-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`bd80e71ec8e9e63c3447d22590f587070a7c15288ffa483adf8f8c622180a7bd`
MD5	`0fea7bebb0ae3b7eb6514541fd404316`
BLAKE2b-256	`23d015ac7133536afb5c944d293ba714714f8490667bc3dc98ebcc4232513371`

Hashes for shillelagh-0.8.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`6009d72eca57cc5fc4d98b9b1f01f1029e8feb21a243ce2d393293adef40e3a8`
MD5	`8ad709abc3325906443eee3925eebb26`
BLAKE2b-256	`9aa784118551c3f05e56182a63a2e31043f09a4dfc04f00ae4afa6f16c6ddae5`