Malloy is an experimental language for describing data relationships and transformations
What is it?
Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.
Note: These APIs are still in development and are subject to change.
How do I get it?
Binary installers for the latest released version are available at the Python Package Index (PyPI).
python3 -m pip install malloy
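To confirm that the install landed in the interpreter you intend to use, a small check like the following may help. It is a minimal sketch that relies only on the standard library; the helper name `installed_version` is made up for this example and is not part of the malloy package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "malloy") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of the malloy API.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found:
        print(f"malloy {found} is installed")
    else:
        print("malloy is not installed in this environment")
```

Running it with the same `python3` used for `pip install` avoids the common case where the package ends up in a different environment.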
Resources
- Malloy Language GitHub - Primary location for the Malloy language source, documentation, and information
- Malloy Language - A quick introduction to the language
- eCommerce Example Analysis - A walkthrough of the basics on an ecommerce dataset (BigQuery public dataset)
- Modeling Walkthrough - An introduction to modeling via the Iowa liquor sales public data set (BigQuery public dataset)
- Malloy on YouTube - Watch demos / walkthroughs of Malloy
Join The Community
- Join our Malloy Slack Community! Use this community to ask questions, meet other Malloy users, and share ideas with one another.
- Use GitHub issues to provide feedback, suggest improvements, report bugs, and start new discussions.
Syntax Examples
Run a named query from a Malloy file
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    # Directory that holds the sample data and the imdb.malloy model file
    home_dir = "/path/to/samples/duckdb/imdb"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # Load the model file and run the query named "genre_movie_map"
        data = await runtime.load_file(home_dir + "/imdb.malloy").run(
            named_query="genre_movie_map")

        dataframe = data.to_dataframe()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
Get SQL from an in-line query, using a Malloy file as a source
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/faa"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # get_sql compiles the query and returns the SQL text along with
        # the connection it targets; nothing is executed here
        [sql, connection] = await runtime.load_file(
            home_dir + "/flights.malloy").get_sql(query="""
            run: flights -> {
              where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03
              group_by:
                flight_date is dep_time.day
                carrier
              aggregate:
                daily_flight_count is flight_count
                aircraft.aircraft_count
              nest: per_plane_data is {
                limit: 20
                group_by: tail_num
                aggregate: plane_flight_count is flight_count
                nest: flight_legs is {
                  order_by: 2
                  group_by:
                    tail_num
                    dep_minute is dep_time.minute
                    origin_code
                    dest_code is destination_code
                    dep_delay
                    arr_delay
                }
              }
            }
            """)

        print(sql)


if __name__ == "__main__":
    asyncio.run(main())
Write an in-line Malloy model, and run a query
import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
    home_dir = "/path/to/samples/duckdb/imdb/data"
    with malloy.Runtime() as runtime:
        runtime.add_connection(DuckDbConnection(home_dir=home_dir))

        # Define the model in-line with load_source, then query it with run
        data = await runtime.load_source("""
            source: titles is duckdb.table('titles.parquet') extend {
              primary_key: tconst
              dimension:
                movie_url is concat('https://www.imdb.com/title/',tconst)
            }
            """).run(query="""
            run: titles -> {
              group_by: movie_url
              limit: 5
            }
            """)

        dataframe = data.to_dataframe()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
Querying BigQuery tables
Authenticate to BigQuery via OAuth using gcloud:
gcloud auth login --update-adc
gcloud config set project {my_project_id} --installation
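Before running a BigQuery example, it can be useful to check that Application Default Credentials (ADC) are actually present. The sketch below is an assumption-laden convenience, not part of malloy: `find_adc_file` is a hypothetical helper, and it only looks at the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and gcloud's default ADC location. It does not handle a relocated gcloud config directory or credentials supplied by a cloud runtime.

```python
import os
from pathlib import Path
from typing import Optional


def find_adc_file() -> Optional[Path]:
    """Return the path of an ADC file if one can be found, else None.

    Hypothetical helper for illustration; not part of the malloy API.
    """
    # An explicit GOOGLE_APPLICATION_CREDENTIALS setting takes precedence.
    explicit = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # Otherwise fall back to gcloud's default location for ADC.
    if os.name == "nt":
        base = Path(os.environ.get("APPDATA", "")) / "gcloud"
    else:
        base = Path.home() / ".config" / "gcloud"
    path = base / "application_default_credentials.json"
    return path if path.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    if found:
        print(f"Found credentials at {found}")
    else:
        print("No ADC file found; run the gcloud commands above first.")
```

A missing file here usually means `gcloud auth login --update-adc` has not been run for the current user.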
Actual usage is similar to DuckDB.
import asyncio

import malloy
from malloy.data.bigquery import BigQueryConnection


async def main():
    with malloy.Runtime() as runtime:
        # Uses the credentials set up with gcloud above
        runtime.add_connection(BigQueryConnection())

        data = await runtime.load_source("""
            source: ga_sessions is bigquery.table('bigquery-public-data.google_analytics_sample.ga_sessions_20170801') extend {
              measure:
                hits_count is hits.count()
            }
            """).run(query="""
            run: ga_sessions -> {
              where: trafficSource.`source` != '(direct)'
              group_by: trafficSource.`source`
              aggregate: hits_count
              limit: 10
            }
            """)

        dataframe = data.to_dataframe()
        print(dataframe)


if __name__ == "__main__":
    asyncio.run(main())
Development
Initial setup
git submodule init
git submodule update
python3 -m pip install -r requirements.dev.txt
scripts/gen-services.sh
Regenerate Protobuf files
scripts/gen-protos.sh
Tests
python3 -m pytest
Download files
File details
Details for the file malloy-2024.1096.tar.gz.
File metadata
- Download URL: malloy-2024.1096.tar.gz
- Upload date:
- Size: 86.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2fe3019a980c885e50d0198273130400fb95a80ad1d94ce36d1c0806252412ab
MD5 | a0c9cdfdeca8509fc1ee23c05fb44241
BLAKE2b-256 | 54e5c7c7e367be1d80e38872c0441a7ff4fb865b947d98f4bee2bff5e7110176
File details
Details for the file malloy-2024.1096-py3-none-any.whl.
File metadata
- Download URL: malloy-2024.1096-py3-none-any.whl
- Upload date:
- Size: 86.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | fb5837a92d2fd27089dd942f31887d466dd798566e0030d604192d4df67614b8
MD5 | 1ee4d752af06c10c440a07751ad29e93
BLAKE2b-256 | e43afcf324a8f0d4670032b89ddea2fe5b5bb37cc2fb61e9958aad9f9c56ebc3
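If you download either file by hand, you can compare it against the SHA256 digests listed above. The following is a minimal sketch using only the standard library; the helper names and the `EXPECTED_SHA256` mapping are made up for this example, with the digests copied from the tables above.

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "malloy-2024.1096.tar.gz":
        "2fe3019a980c885e50d0198273130400fb95a80ad1d94ce36d1c0806252412ab",
    "malloy-2024.1096-py3-none-any.whl":
        "fb5837a92d2fd27089dd942f31887d466dd798566e0030d604192d4df67614b8",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """True only if the file name is known and its digest equals the listed one."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    # Usage: python3 verify.py malloy-2024.1096.tar.gz
    for name in sys.argv[1:]:
        status = "OK" if matches_published_hash(Path(name)) else "MISMATCH"
        print(f"{name}: {status}")
```

The check is keyed on the file name, so a renamed download is reported as a mismatch rather than silently accepted.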