
Productivity-centric Python Big Data Framework


Ibis


What is Ibis?

Ibis is the portable Python dataframe library.

See the documentation on "Why Ibis?" to learn more.

Getting started

You can pip install Ibis with a backend and example data:

pip install 'ibis-framework[duckdb,examples]'

💡 Tip

See the installation guide for more installation options.

Then use Ibis:

>>> import ibis
>>> ibis.options.interactive = True
>>> t = ibis.examples.penguins.fetch()
>>> t
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ species โ”ƒ island    โ”ƒ bill_length_mm โ”ƒ bill_depth_mm โ”ƒ flipper_length_mm โ”ƒ body_mass_g โ”ƒ sex    โ”ƒ year  โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ string  โ”‚ string    โ”‚ float64        โ”‚ float64       โ”‚ int64             โ”‚ int64       โ”‚ string โ”‚ int64 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Adelie  โ”‚ Torgersen โ”‚           39.1 โ”‚          18.7 โ”‚               181 โ”‚        3750 โ”‚ male   โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           39.5 โ”‚          17.4 โ”‚               186 โ”‚        3800 โ”‚ female โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           40.3 โ”‚          18.0 โ”‚               195 โ”‚        3250 โ”‚ female โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           NULL โ”‚          NULL โ”‚              NULL โ”‚        NULL โ”‚ NULL   โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           36.7 โ”‚          19.3 โ”‚               193 โ”‚        3450 โ”‚ female โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           39.3 โ”‚          20.6 โ”‚               190 โ”‚        3650 โ”‚ male   โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           38.9 โ”‚          17.8 โ”‚               181 โ”‚        3625 โ”‚ female โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           39.2 โ”‚          19.6 โ”‚               195 โ”‚        4675 โ”‚ male   โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           34.1 โ”‚          18.1 โ”‚               193 โ”‚        3475 โ”‚ NULL   โ”‚  2007 โ”‚
โ”‚ Adelie  โ”‚ Torgersen โ”‚           42.0 โ”‚          20.2 โ”‚               190 โ”‚        4250 โ”‚ NULL   โ”‚  2007 โ”‚
โ”‚ โ€ฆ       โ”‚ โ€ฆ         โ”‚              โ€ฆ โ”‚             โ€ฆ โ”‚                 โ€ฆ โ”‚           โ€ฆ โ”‚ โ€ฆ      โ”‚     โ€ฆ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
>>> g = t.group_by("species", "island").agg(count=t.count()).order_by("count")
>>> g
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ species   โ”ƒ island    โ”ƒ count โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ string    โ”‚ string    โ”‚ int64 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Adelie    โ”‚ Biscoe    โ”‚    44 โ”‚
โ”‚ Adelie    โ”‚ Torgersen โ”‚    52 โ”‚
โ”‚ Adelie    โ”‚ Dream     โ”‚    56 โ”‚
โ”‚ Chinstrap โ”‚ Dream     โ”‚    68 โ”‚
โ”‚ Gentoo    โ”‚ Biscoe    โ”‚   124 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

💡 Tip

See the getting started tutorial for a full introduction to Ibis.

Python + SQL: better together

For most backends, Ibis works by compiling its dataframe expressions into SQL:

>>> ibis.to_sql(g)
SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" ASC
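Because the output is ordinary SQL, it can run on any engine that accepts this dialect. As a minimal sketch that does not use Ibis at all, the snippet below assumes Python's built-in sqlite3 module as a stand-in engine, loads the ten sample rows displayed earlier into an in-memory penguins table, and runs the generated query unchanged:

```python
import sqlite3

# Query copied verbatim from the ibis.to_sql(g) output above.
QUERY = """
SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" ASC
"""

# The ten sample rows displayed earlier; None stands in for NULL.
ROWS = [
    ("Adelie", "Torgersen", 39.1, 18.7, 181, 3750, "male", 2007),
    ("Adelie", "Torgersen", 39.5, 17.4, 186, 3800, "female", 2007),
    ("Adelie", "Torgersen", 40.3, 18.0, 195, 3250, "female", 2007),
    ("Adelie", "Torgersen", None, None, None, None, None, 2007),
    ("Adelie", "Torgersen", 36.7, 19.3, 193, 3450, "female", 2007),
    ("Adelie", "Torgersen", 39.3, 20.6, 190, 3650, "male", 2007),
    ("Adelie", "Torgersen", 38.9, 17.8, 181, 3625, "female", 2007),
    ("Adelie", "Torgersen", 39.2, 19.6, 195, 4675, "male", 2007),
    ("Adelie", "Torgersen", 34.1, 18.1, 193, 3475, None, 2007),
    ("Adelie", "Torgersen", 42.0, 20.2, 190, 4250, None, 2007),
]


def run_query() -> list:
    """Load ROWS into an in-memory SQLite table and run QUERY against it."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE penguins (species TEXT, island TEXT, bill_length_mm REAL, "
            "bill_depth_mm REAL, flipper_length_mm INTEGER, body_mass_g INTEGER, "
            "sex TEXT, year INTEGER)"
        )
        con.executemany("INSERT INTO penguins VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(run_query())  # [('Adelie', 'Torgersen', 10)]
```

All ten sample rows share the same species and island, so this sketch yields a single group; the full dataset produces the five groups shown earlier.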

You can mix SQL and Python code:

>>> a = t.sql("SELECT species, island, count(*) AS count FROM penguins GROUP BY 1, 2")
>>> a
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ species   โ”ƒ island    โ”ƒ count โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ string    โ”‚ string    โ”‚ int64 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Adelie    โ”‚ Torgersen โ”‚    52 โ”‚
โ”‚ Adelie    โ”‚ Biscoe    โ”‚    44 โ”‚
โ”‚ Adelie    โ”‚ Dream     โ”‚    56 โ”‚
โ”‚ Gentoo    โ”‚ Biscoe    โ”‚   124 โ”‚
โ”‚ Chinstrap โ”‚ Dream     โ”‚    68 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
>>> b = a.order_by("count")
>>> b
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ species   โ”ƒ island    โ”ƒ count โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ string    โ”‚ string    โ”‚ int64 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Adelie    โ”‚ Biscoe    โ”‚    44 โ”‚
โ”‚ Adelie    โ”‚ Torgersen โ”‚    52 โ”‚
โ”‚ Adelie    โ”‚ Dream     โ”‚    56 โ”‚
โ”‚ Chinstrap โ”‚ Dream     โ”‚    68 โ”‚
โ”‚ Gentoo    โ”‚ Biscoe    โ”‚   124 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

This allows you to combine the flexibility of Python with the scale and performance of modern SQL.
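One way to picture how Python operations can build on a raw SQL string is to treat that string as a subquery and wrap it. The sketch below is a hypothetical toy (the RawSQL class is illustrative and is not how Ibis is implemented). It uses sqlite3 as the engine and a small counts table holding the per-group results shown above, then applies an order_by step from Python:

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSQL:
    """Hypothetical toy: a lazy query that starts from a user-written SQL string."""

    sql: str

    def order_by(self, column: str) -> "RawSQL":
        # Wrap the current query as a subquery rather than editing its text,
        # so any valid SQL can be extended by later Python-side operations.
        return RawSQL(f'SELECT * FROM ({self.sql}) AS "t" ORDER BY "t"."{column}" ASC')

    def execute(self, con: sqlite3.Connection) -> list:
        # Nothing runs until execute() is called with a connection.
        return con.execute(self.sql).fetchall()


con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE counts (species TEXT, island TEXT, "count" INTEGER)')
# Per-group counts from the output above, in the order shown for a.
con.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [
        ("Adelie", "Torgersen", 52),
        ("Adelie", "Biscoe", 44),
        ("Adelie", "Dream", 56),
        ("Gentoo", "Biscoe", 124),
        ("Chinstrap", "Dream", 68),
    ],
)

a = RawSQL('SELECT species, island, "count" FROM counts')
b = a.order_by("count")
print(b.sql)
print(b.execute(con))
```

Wrapping instead of rewriting the user's SQL means the toy never has to parse the original query; a, being frozen, is left unchanged by order_by.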

Backends

Ibis supports nearly 20 backends, including DuckDB, Polars, DataFusion, and BigQuery.

How it works

Most Python dataframe libraries are tightly coupled to their execution engine, and many databases support only SQL, with no Python API. Ibis solves this by providing a common Python API for data manipulation and compiling it into each backend's native language. You learn a single API and can use it with any supported backend (execution engine).

Ibis broadly supports two types of backend:

  1. SQL-generating backends
  2. DataFrame-generating backends

Ibis backend types
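To make the distinction concrete, here is a hypothetical toy, not Ibis's actual design: one backend-neutral GroupByCount expression is either compiled to a SQL string and handed to an engine (sqlite3 stands in for a SQL-generating backend), or evaluated directly on in-memory Python rows (standing in for a DataFrame-generating backend). The data is the species and sex of the ten sample penguins shown earlier.

```python
import sqlite3
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupByCount:
    """Hypothetical backend-neutral expression: count the rows in each group."""

    table: str
    keys: tuple


def to_sql(expr: GroupByCount) -> str:
    """SQL-generating path: compile the expression into a query string."""
    cols = ", ".join(f'"{key}"' for key in expr.keys)
    return f'SELECT {cols}, COUNT(*) AS "count" FROM "{expr.table}" GROUP BY {cols}'


def evaluate(expr: GroupByCount, rows: list) -> set:
    """DataFrame-style path: evaluate the same expression directly on Python rows."""
    counts = Counter(tuple(row[key] for key in expr.keys) for row in rows)
    return {(*group, n) for group, n in counts.items()}


# Species and sex of the ten sample penguins shown earlier (None stands in for NULL).
ROWS = [
    {"species": "Adelie", "sex": sex}
    for sex in ["male", "female", "female", None, "female",
                "male", "female", "male", None, None]
]
expr = GroupByCount(table="penguins", keys=("species", "sex"))

# Path 1: hand the compiled SQL to a SQL engine (sqlite3 as a stand-in).
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE penguins ("species" TEXT, "sex" TEXT)')
con.executemany("INSERT INTO penguins VALUES (:species, :sex)", ROWS)
via_sql = set(con.execute(to_sql(expr)).fetchall())

# Path 2: evaluate the expression in memory, with no SQL involved.
via_memory = evaluate(expr, ROWS)

print(via_sql == via_memory)  # True
```

Both paths return the same three groups, including the one for missing sex values.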

Portability

To switch backends, set the default backend that Ibis uses:

>>> ibis.set_backend("duckdb")
>>> ibis.set_backend("polars")
>>> ibis.set_backend("datafusion")

Typically, you'll create a connection object:

>>> con = ibis.duckdb.connect()
>>> con = ibis.polars.connect()
>>> con = ibis.datafusion.connect()

And work with tables in that backend:

>>> con.list_tables()
['penguins']
>>> t = con.table("penguins")

You can also read from common file formats like CSV or Apache Parquet:

>>> t = con.read_csv("penguins.csv")
>>> t = con.read_parquet("penguins.parquet")

This allows you to iterate locally and deploy remotely by changing a single line of code.
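The read_csv and read_parquet calls above register a file as a table in the connected backend. As a rough standard-library analogy (read_csv_into is a hypothetical helper, not an Ibis function), the snippet below parses a two-row CSV excerpt built from the sample data shown earlier and loads it into SQLite, treating empty fields as NULL:

```python
import csv
import io
import sqlite3
from typing import TextIO


def read_csv_into(con: sqlite3.Connection, table: str, source: TextIO) -> None:
    """Hypothetical helper: load CSV text into a new SQLite table.

    Every column is stored as TEXT and empty fields become NULL.
    """
    reader = csv.reader(source)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    con.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    con.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        ([value or None for value in row] for row in reader),
    )


# A two-row, four-column excerpt of the sample data, inlined so the example
# is self-contained; a real file would be opened with open(path, newline="").
PENGUINS_CSV = """species,island,bill_length_mm,sex
Adelie,Torgersen,39.1,male
Adelie,Torgersen,,
"""

con = sqlite3.connect(":memory:")
read_csv_into(con, "penguins", io.StringIO(PENGUINS_CSV))
print(con.execute('SELECT COUNT(*), COUNT("sex") FROM "penguins"').fetchone())  # (2, 1)
```

This sketch stores every column as TEXT for simplicity; backends such as DuckDB typically infer column types from the file instead.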

💡 Tip

Check out the blog post on backend-agnostic arrays for an example that runs the same code on DuckDB and BigQuery.

Community and contributing

Ibis is an open source project and welcomes contributions from anyone in the community.

Join our community by interacting on GitHub or chatting with us on Zulip.

For more information visit https://ibis-project.org/.

Governance

The Ibis project is an independently governed open source community project to build and maintain the portable Python dataframe library. Ibis has contributors across a range of data companies and institutions.



Download files


Source Distribution

turntable_spoonbill-10.0.3.tar.gz (1.2 MB)

Uploaded Source

Built Distribution


turntable_spoonbill-10.0.3-py2.py3-none-any.whl (1.9 MB)

Uploaded Python 2, Python 3

File details

Details for the file turntable_spoonbill-10.0.3.tar.gz.

File metadata

  • Download URL: turntable_spoonbill-10.0.3.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for turntable_spoonbill-10.0.3.tar.gz
Algorithm Hash digest
SHA256 39110da7bbc7e41c92269a7782af4afd6c7b1edb03d8612ffbac001c64956ec1
MD5 a3009e7ec4956e061ba5b0c01161cf18
BLAKE2b-256 6fadf3e843964563e487e2b14ae05320308e52cc8a14e3685156e702728f964e


Provenance

The following attestation bundles were made for turntable_spoonbill-10.0.3.tar.gz:

Publisher: publish.yml on turntable-so/spoonbill

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file turntable_spoonbill-10.0.3-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for turntable_spoonbill-10.0.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 c4180575230d39ed8499aa781cb75f32aebd15e812ca017558b65377679ce67a
MD5 51960e613b13b2c8f58500fadec9b855
BLAKE2b-256 10db40aa145e81ca17b26381721387301863ea2586a11d48e1905033b589714c


Provenance

The following attestation bundles were made for turntable_spoonbill-10.0.3-py2.py3-none-any.whl:

Publisher: publish.yml on turntable-so/spoonbill

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
