Skip to main content

Malloy is an experimental language for describing data relationships and transformations

Project description

Malloy Logo

What is it?

Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.

Note: These APIs are still in development and are subject to change.

How do I get it?

Binary installers for the latest released version are available at the Python Package Index (PyPI).

python3 -m pip install malloy

Resources

Join The Community

  • Join our Malloy Slack Community! Use this community to ask questions, meet other Malloy users, and share ideas with one another.
  • Use GitHub issues to provide feedback, suggest improvements, report bugs, and start new discussions.

Syntax Examples

Run a named query from a Malloy file

import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection

async def main():
  home_dir = "/path/to/samples/duckdb/imdb"
  with malloy.Runtime() as runtime:
    runtime.add_connection(DuckDbConnection(home_dir=home_dir))

    data = await runtime.load_file(home_dir + "/imdb.malloy").run(
        named_query="genre_movie_map")

    dataframe = data.to_dataframe()
    print(dataframe)

if __name__ == "__main__":
  asyncio.run(main())

Get SQL from an in-line query, using a Malloy file as a source

import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection

async def main():
  home_dir = "/path/to/samples/duckdb/faa"
  with malloy.Runtime() as runtime:
    runtime.add_connection(DuckDbConnection(home_dir=home_dir))

    [sql, connection
    ] = await runtime.load_file(home_dir + "/flights.malloy").get_sql(query="""
                  run: flights -> {
                    where: carrier ? 'WN' | 'DL', dep_time ? @2002-03-03
                    group_by:
                      flight_date is dep_time.day
                      carrier
                    aggregate:
                      daily_flight_count is flight_count
                      aircraft.aircraft_count
                    nest: per_plane_data is {
                      limit: 20
                      group_by: tail_num
                      aggregate: plane_flight_count is flight_count
                      nest: flight_legs is {
                        order_by: 2
                        group_by:
                          tail_num
                          dep_minute is dep_time.minute
                          origin_code
                          dest_code is destination_code
                          dep_delay
                          arr_delay
                      }
                    }
                }
            """)

    print(sql)

if __name__ == "__main__":
  asyncio.run(main())

Write an in-line Malloy model, and run a query

import asyncio

import malloy
from malloy.data.duckdb import DuckDbConnection


async def main():
  home_dir = "/path/to/samples/duckdb/imdb/data"
  with malloy.Runtime() as runtime:
    runtime.add_connection(DuckDbConnection(home_dir=home_dir))

    data = await runtime.load_source("""
        source:titles is duckdb.table('titles.parquet') extend {
          primary_key: tconst
          dimension:
            movie_url is concat('https://www.imdb.com/title/',tconst)
        }
        """).run(query="""
        run: titles -> {
          group_by: movie_url
          limit: 5
        }
        """)

    dataframe = data.to_dataframe()
    print(dataframe)


if __name__ == "__main__":
  asyncio.run(main())
  

Querying BigQuary tables

BigQuery auth via OAuth using gcloud.

gcloud auth login --update-adc
gcloud config set project {my_project_id} --installation

Actual usage is similar to DuckDB.

import asyncio
import malloy
from malloy.data.bigquery import BigQueryConnection

async def main():
  with malloy.Runtime() as runtime:
    runtime.add_connection(BigQueryConnection())

    data = await runtime.load_source("""
        source:ga_sessions is bigquery.table('bigquery-public-data.google_analytics_sample.ga_sessions_20170801') extend {
          measure:
            hits_count is hits.count()
        }
        """).run(query="""
        run: ga_sessions -> {
            where: trafficSource.`source` != '(direct)'
            group_by: trafficSource.`source`
            aggregate: hits_count
            limit: 10
          }
        """)

    dataframe = data.to_dataframe()
    print(dataframe)

if __name__ == "__main__":
  asyncio.run(main())

Development

Initial setup

git submodule init
git submodule update
python3 -m pip install -r requirements.dev.txt
scripts/gen-services.sh

Regenerate Protobuf files

scripts/gen-protos.sh

Tests

python3 -m pytest

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

malloy-2024.1096.tar.gz (86.8 MB view details)

Uploaded Source

Built Distribution

malloy-2024.1096-py3-none-any.whl (86.4 MB view details)

Uploaded Python 3

File details

Details for the file malloy-2024.1096.tar.gz.

File metadata

  • Download URL: malloy-2024.1096.tar.gz
  • Upload date:
  • Size: 86.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for malloy-2024.1096.tar.gz
Algorithm Hash digest
SHA256 2fe3019a980c885e50d0198273130400fb95a80ad1d94ce36d1c0806252412ab
MD5 a0c9cdfdeca8509fc1ee23c05fb44241
BLAKE2b-256 54e5c7c7e367be1d80e38872c0441a7ff4fb865b947d98f4bee2bff5e7110176

See more details on using hashes here.

File details

Details for the file malloy-2024.1096-py3-none-any.whl.

File metadata

  • Download URL: malloy-2024.1096-py3-none-any.whl
  • Upload date:
  • Size: 86.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for malloy-2024.1096-py3-none-any.whl
Algorithm Hash digest
SHA256 fb5837a92d2fd27089dd942f31887d466dd798566e0030d604192d4df67614b8
MD5 1ee4d752af06c10c440a07751ad29e93
BLAKE2b-256 e43afcf324a8f0d4670032b89ddea2fe5b5bb37cc2fb61e9958aad9f9c56ebc3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page