Malloy is an experimental language for describing data relationships and transformations
Project description
What is it?
Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.
Note: These APIs are still in development and are subject to change.
How do I get it?
Binary installers for the latest released version are available at the Python Package Index (PyPI).
python3 -m pip install malloy
Resources
- Malloy Language GitHub - Primary location for the malloy language source, documentation, and information
- Malloy Language - A quick introduction to the language
- eCommerce Example Analysis - A walkthrough of the basics on an ecommerce dataset (BigQuery public dataset)
- Modeling Walkthrough - An introduction to modeling via the Iowa liquor sales public data set (BigQuery public dataset)
- Malloy on YouTube - Watch demos / walkthroughs of Malloy
Join The Community
- Join our Malloy Slack Community! Use this community to ask questions, meet other Malloy users, and share ideas with one another.
- Use GitHub issues to provide feedback, suggest improvements, report bugs, and start new discussions.
Syntax Examples
Run a named query from a Malloy file
import asyncio
import malloy
from malloy.data.duckdb import DuckDbConnection
async def main():
home_dir = "/path/to/samples/duckdb/imdb"
with malloy.Runtime() as runtime:
runtime.add_connection(DuckDbConnection(home_dir=home_dir))
data = await runtime.load_file(home_dir + "/imdb.malloy").run(
named_query="genre_movie_map")
dataframe = data.to_dataframe()
print(dataframe)
if __name__ == "__main__":
asyncio.run(main())
Get SQL from an in-line query, using a Malloy file as a source
import asyncio
import malloy
from malloy.data.duckdb import DuckDbConnection
async def main():
home_dir = "/path/to/samples/duckdb/faa"
with malloy.Runtime() as runtime:
runtime.add_connection(DuckDbConnection(home_dir=home_dir))
[sql, connection
] = await runtime.load_file(home_dir + "/flights.malloy").get_sql(query="""
run: flights -> {
where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03
group_by:
flight_date is dep_time.day
carrier
aggregate:
daily_flight_count is flight_count
aircraft.aircraft_count
nest: per_plane_data is {
limit: 20
group_by: tail_num
aggregate: plane_flight_count is flight_count
nest: flight_legs is {
order_by: 2
group_by:
tail_num
dep_minute is dep_time.minute
origin_code
dest_code is destination_code
dep_delay
arr_delay
}
}
}
""")
print(sql)
if __name__ == "__main__":
asyncio.run(main())
Write an in-line Malloy model, and run a query
import asyncio
import malloy
from malloy.data.duckdb import DuckDbConnection
async def main():
home_dir = "/path/to/samples/duckdb/imdb/data"
with malloy.Runtime() as runtime:
runtime.add_connection(DuckDbConnection(home_dir=home_dir))
data = await runtime.load_source("""
source:titles is duckdb.table('titles.parquet') extend {
primary_key: tconst
dimension:
movie_url is concat('https://www.imdb.com/title/',tconst)
}
""").run(query="""
run: titles -> {
group_by: movie_url
limit: 5
}
""")
dataframe = data.to_dataframe()
print(dataframe)
if __name__ == "__main__":
asyncio.run(main())
Querying BigQuary tables
BigQuery auth via OAuth using gcloud.
gcloud auth login --update-adc
gcloud config set project {my_project_id} --installation
Actual usage is similar to DuckDB.
import asyncio
import malloy
from malloy.data.bigquery import BigQueryConnection
async def main():
with malloy.Runtime() as runtime:
runtime.add_connection(BigQueryConnection())
data = await runtime.load_source("""
source:ga_sessions is bigquery.table('bigquery-public-data.google_analytics_sample.ga_sessions_20170801') extend {
measure:
hits_count is hits.count()
}
""").run(query="""
run: ga_sessions -> {
where: trafficSource.`source` != '(direct)'
group_by: trafficSource.`source`
aggregate: hits_count
limit: 10
}
""")
dataframe = data.to_dataframe()
print(dataframe)
if __name__ == "__main__":
asyncio.run(main())
Development
Initial setup
git submodule init
git submodule update
python3 -m pip install -r requirements.dev.txt
scripts/gen-services.sh
Regenerate Protobuf files
scripts/gen-protos.sh
Tests
python3 -m pytest
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for malloy-2024.1092-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4d1ffd94a2fb1b87ad08fe83c68ef0467582a1a053861a9c433a4d8e932a1011 |
|
MD5 | b4b470a3140abeb5d68525d21103f103 |
|
BLAKE2b-256 | d7e0bbb1361eed0a3b237e7fb2af3e357c2059e797a6d4e5b39af16481c4c5ff |