Malloy is an experimental language for describing data relationships and transformations
What is it?
Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.
Note: These APIs are still in development and are subject to change.
How do I get it?
Binary installers for the latest released version are available at the Python Package Index (PyPI).
python3 -m pip install malloy
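After installing, it can be handy to confirm which version pip actually placed in the current environment. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this illustration and is not part of the Malloy package.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "malloy" is the distribution name used in the pip command above.
    version = installed_version("malloy")
    print(f"malloy {version}" if version else "malloy is not installed")
```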
Resources
- Malloy Language GitHub - Primary location for the Malloy language source, documentation, and information
- Malloy Language - A quick introduction to the language
- eCommerce Example Analysis - A walkthrough of the basics on an ecommerce dataset (BigQuery public dataset)
- Modeling Walkthrough - An introduction to modeling via the Iowa liquor sales public data set (BigQuery public dataset)
- Malloy on YouTube - Watch demos / walkthroughs of Malloy
Join The Community
- Join our Malloy Slack Community! Use this community to ask questions, meet other Malloy users, and share ideas with one another.
- Use GitHub issues to provide feedback, suggest improvements, report bugs, and start new discussions.
Syntax Examples
Run a named query from a Malloy file:
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/imdb"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        data = await runtime.load_file(home_dir + "/5_movie_complex.malloy").run(
            named_query="horror_combo")

        dataframe = data.df()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
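The example builds the model path by string concatenation. One possible refinement, sketched here with the standard library only, is to build the path with `pathlib` and fail early when the file is missing. The helper name `resolve_model` is hypothetical and not part of the Malloy API; the string it returns could be passed to `runtime.load_file` as in the example.

```python
from pathlib import Path


def resolve_model(home_dir: str, model_name: str) -> str:
    """Join home_dir and model_name, checking that a .malloy file exists there."""
    path = Path(home_dir).expanduser() / model_name
    if path.suffix != ".malloy":
        raise ValueError(f"expected a .malloy file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"Malloy model not found: {path}")
    return str(path)
```

Checking the path up front gives a plain Python error naming the missing file before any query is attempted.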
Get the SQL for an inline query that uses a Malloy file as its source:
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/faa"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        sql, connection = await runtime.load_file(home_dir + "/flights.malloy").get_sql(query="""
            query: flights -> {
              where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03
              group_by:
                flight_date is dep_time.day
                carrier
              aggregate:
                daily_flight_count is flight_count
                aircraft.aircraft_count
              nest: per_plane_data is {
                top: 20
                group_by: tail_num
                aggregate: plane_flight_count is flight_count
                nest: flight_legs is {
                  order_by: 2
                  group_by:
                    tail_num
                    dep_minute is dep_time.minute
                    origin_code
                    dest_code is destination_code
                    dep_delay
                    arr_delay
                }
              }
            }
        """)

        print(sql)


if __name__ == "__main__":
    asyncio.run(main())
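The `where:` line in the query above is compact, so a plain-Python reading of it may help. As we understand Malloy's filter syntax, `carrier ? 'WN' | 'DL'` matches either carrier code, `dep_time ? @2002-03-03` matches any departure time on that calendar day, and the comma combines the two conditions with AND. The function below is only an illustrative sketch of that logic; it is not what Malloy generates, and the sample values in the usage lines are invented.

```python
from datetime import date, datetime


def matches_filter(carrier: str, dep_time: datetime) -> bool:
    """Mimic: where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03"""
    carrier_ok = carrier in ("WN", "DL")             # alternation: either value
    day_ok = dep_time.date() == date(2002, 3, 3)     # any time within that day
    return carrier_ok and day_ok                     # comma-separated filters are ANDed


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    print(matches_filter("WN", datetime(2002, 3, 3, 14, 30)))  # same day, listed carrier
    print(matches_filter("AA", datetime(2002, 3, 3, 14, 30)))  # carrier not listed
    print(matches_filter("DL", datetime(2002, 3, 4, 0, 5)))    # next day
```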
Write an inline Malloy model source and run a query against it:
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/auto_recalls"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        data = await runtime.load_source("""
            source: auto_recalls is table('duckdb:auto_recalls.csv') {
              declare:
                recall_count is count()
                percent_of_recalls is recall_count/all(recall_count)*100
            }
        """).run(query="""
            query: auto_recalls -> {
              group_by: Manufacturer
              aggregate:
                recall_count
                percent_of_recalls
            }
        """)

        dataframe = data.df()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
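To make the `percent_of_recalls` measure concrete: within each `Manufacturer` group, `recall_count` is the number of rows in that group, while `all(recall_count)` is, as we understand it, the same count computed across every group. The pure-Python sketch below reproduces that arithmetic on a handful of invented manufacturer values; it does not read `auto_recalls.csv` and does not use Malloy.

```python
from collections import Counter


def percent_of_recalls(manufacturers):
    """Return {manufacturer: (recall_count, percent_of_recalls)} for a list of rows."""
    counts = Counter(manufacturers)   # recall_count per group
    total = sum(counts.values())      # all(recall_count): count over every row
    return {name: (n, n / total * 100) for name, n in counts.items()}


if __name__ == "__main__":
    # Invented rows, one manufacturer name per recall record.
    rows = ["Ford", "Ford", "Honda", "Toyota"]
    for name, (count, pct) in percent_of_recalls(rows).items():
        print(f"{name}: {count} recalls, {pct:.1f}% of all recalls")
```

With these four invented rows, Ford accounts for two of four records (50%), and Honda and Toyota for one each (25%).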
Development
Initial setup
git submodule init
git submodule update
python3 -m pip install -r requirements.dev.txt
scripts/gen-services.sh
Regenerate Protobuf files
scripts/gen-protos.sh
Tests
python3 -m pytest