Productivity-centric Python Big Data Framework

Ibis

What is Ibis?

Ibis is a Python library that provides a lightweight, universal interface for data wrangling. It helps Python users explore and transform data of any size, stored anywhere.

Ibis has three primary components:

  1. A dataframe API for Python. This means that Python users can write Ibis code to manipulate tabular data.
  2. Interfaces to 10+ query engines. This means that wherever data is stored, data scientists can use Ibis as their API of choice to communicate with any of those query engines.
  3. Deferred execution. Ibis uses deferred execution, meaning that execution of code is pushed to the query engine. This means users can execute at the speed of their backend, not their local computer.
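To make the idea of deferred execution concrete, here is a minimal sketch using only the Python standard library. It is not Ibis's implementation or API: the `Deferred` class and its methods are hypothetical names invented for illustration. The point is only that building an expression touches no data, and the query engine (here SQLite) does the work when `execute` is called.

```python
import sqlite3


class Deferred:
    """Toy deferred table expression (hypothetical, not part of Ibis).

    Each method records a step and returns a new expression.
    Nothing is sent to the engine until execute() is called.
    Table and column names are not sanitized in this sketch.
    """

    def __init__(self, table, columns=("*",), predicates=()):
        self.table = table
        self.columns = tuple(columns)
        self.predicates = tuple(predicates)

    def select(self, *columns):
        # Record the projection; no query runs here.
        return Deferred(self.table, columns, self.predicates)

    def filter(self, column, value):
        # Keep the value separate so it can be passed as a bound parameter.
        return Deferred(self.table, self.columns, self.predicates + ((column, value),))

    def compile(self):
        # Turn the recorded steps into SQL text plus parameters.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in self.predicates)
        return sql, [value for _, value in self.predicates]

    def execute(self, connection):
        # Only now does the engine do any work.
        sql, params = self.compile()
        return connection.execute(sql, params).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT, continent TEXT)")
con.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("Andorra", "EU"), ("Afghanistan", "AS"), ("Anguilla", "NA")],
)

expr = Deferred("countries").select("name").filter("continent", "AS")
sql, params = expr.compile()  # building and compiling read no rows
rows = expr.execute(con)      # the query runs inside SQLite
print(sql)
print(rows)
```

Only the filtered rows come back to Python; the scan and the filter happen in the engine, which is the property the third component describes.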

Why Use Ibis?

Ibis aims to be a future-proof solution for interacting with data using Python. It works toward this goal through its main features:

  • Familiar API: Ibis’s API design borrows from popular APIs like pandas and dplyr that most users already know and like to use.
  • Consistent syntax: Ibis aims to be a universal Python API for tabular data, big or small.
  • Deferred execution: Ibis pushes code execution to the query engine and only moves required data into memory when it has to. This leads to faster, more efficient analytics workflows.
  • Interactive mode: Ibis also provides an interactive mode, in which users can quickly diagnose problems, do exploratory data analysis, and mock up workflows locally.
  • 10+ supported backends: Ibis supports multiple query engines and DataFrame APIs. Use one interface to transform your data wherever it lives: from DataFrames in pandas to parquet files through DuckDB to tables in BigQuery.
  • Minimize rewrites: Depending on backend capabilities, teams can often keep most of their Ibis code unchanged when something changes on the backend, such as increasing or decreasing computing power, changing the number or size of their databases, or switching backend engines.

Common Use Cases

  • Speed up prototype to production. Scale code written and tested locally to the cloud or distributed systems with minimal rewrites.
  • Boost performance of existing Python or pandas code. For example, a general rule of thumb for pandas is "Have 5 to 10 times as much RAM as the size of your dataset". When a dataset exceeds what this rule allows, in-memory frameworks like pandas can be slow. Because Ibis defers execution to the backend, it can significantly speed up such workflows. Ibis also lets you switch to a faster database engine without changing much of your code.
  • Get rid of long, error-prone f-strings. Ibis provides one syntax for multiple query engines and dataframe APIs that lets you avoid learning new flavors of SQL or other framework-specific code. Learn the syntax once and use it anywhere.
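As an illustration of why hand-built SQL strings are error-prone, the sketch below uses only the standard library `sqlite3` module, not Ibis. The table contents and the value `St. John's` are illustrative assumptions. It shows one common failure: a value containing an apostrophe breaks a query assembled with an f-string, while the same value passed as a bound parameter works.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT, capital TEXT)")
con.execute(
    "INSERT INTO countries VALUES (?, ?)",
    ("Antigua and Barbuda", "St. John's"),  # illustrative row
)

capital = "St. John's"

# Hand-built SQL: the apostrophe in the value ends the string literal early,
# so the engine rejects the statement.
fstring_sql = f"SELECT name FROM countries WHERE capital = '{capital}'"
try:
    con.execute(fstring_sql)
    fstring_failed = False
except sqlite3.OperationalError:
    fstring_failed = True

# Bound parameter: the value never becomes part of the SQL text.
rows = con.execute(
    "SELECT name FROM countries WHERE capital = ?", (capital,)
).fetchall()

print(fstring_failed)
print(rows)
```

An expression API removes this class of bug in a similar way, because values are carried as data in the expression rather than pasted into query text.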

Backends

Ibis acts as a universal frontend to the following systems:

The list of supported backends is continuously growing. Anyone can get involved in adding new ones! Learn more about contributing to Ibis in our contributing docs at https://github.com/ibis-project/ibis/blob/master/docs/CONTRIBUTING.md

Installation

Install Ibis from PyPI with:

pip install ibis-framework

Or from conda-forge with:

conda install ibis-framework -c conda-forge

(It’s a common mistake to run pip install ibis. If you try to use Ibis and get errors early on, try uninstalling ibis and installing ibis-framework.)
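One way to check for this mix-up is to ask Python which distributions are installed. The following is a small sketch using the standard library `importlib.metadata` module; the helper names `installed_version` and `diagnose` are hypothetical, and the messages are only suggestions.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def diagnose():
    """Describe which of the two similarly named distributions are installed."""
    framework = installed_version("ibis-framework")
    unrelated = installed_version("ibis")
    if framework is None and unrelated is not None:
        return "Found 'ibis' but not 'ibis-framework': uninstall ibis, then install ibis-framework."
    if framework is None:
        return "ibis-framework is not installed."
    if unrelated is not None:
        return f"ibis-framework {framework} is installed, but 'ibis' is too and may conflict with it."
    return f"ibis-framework {framework} is installed."


print(diagnose())
```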

For specific backends, include the backend name in brackets for PyPI:

pip install ibis-framework[duckdb]

Or use ibis-$BACKEND, where $BACKEND is the specific backend you want to use:

conda install ibis-postgres -c conda-forge

Getting Started with Ibis

You can find a number of helpful tutorials on the Ibis website.

You can also get started analyzing any dataset, anywhere with just a few lines of Ibis code. Here’s an example of how to use Ibis with an SQLite database.

Download the SQLite database from the ibis-tutorial-data GCS (Google Cloud Storage) bucket, then connect to it using ibis.

# make a directory called geo_dir and add the geography database to that folder
mkdir -p geo_dir
curl -LsS -o geo_dir/geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'
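If `curl` is not available (for example on some Windows setups), the same download can be sketched in Python with the standard library. The `download` helper below is a hypothetical name, not part of Ibis; it creates the target directory and streams the response to disk.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, destination):
    """Fetch url into destination, creating parent directories first."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)  # like `mkdir -p`
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)  # stream instead of loading into memory
    return destination


# Usage (performs a network request, so it is left commented out here):
# download(
#     "https://storage.googleapis.com/ibis-tutorial-data/geography.db",
#     "geo_dir/geography.db",
# )
```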

Connect to the database and show the available tables

>>> import ibis
>>> ibis.options.interactive = True
>>> connection = ibis.sqlite.connect('geo_dir/geography.db')
>>> connection.list_tables()
['countries', 'gdp', 'independence']

Choose the countries table and preview its first few rows

>>> countries = connection.table('countries')
>>> countries.head()
   iso_alpha2  iso_alpha3  iso_numeric  fips                  name           capital  area_km2  population  continent
0          AD         AND           20    AN               Andorra  Andorra la Vella       468       84000         EU
1          AE         ARE          784    AE  United Arab Emirates         Abu Dhabi     82880     4975593         AS
2          AF         AFG            4    AF           Afghanistan             Kabul    647500    29121286         AS
3          AG         ATG           28    AC   Antigua and Barbuda         St. Johns       443       86754         NA
4          AI         AIA          660    AV              Anguilla        The Valley       102       13254         NA

Select the name, continent and population columns and filter them to only return countries from Asia

>>> asian_countries = countries['name', 'continent', 'population'].filter(countries['continent'] == 'AS')
>>> asian_countries.limit(6)
                   name  continent  population
0  United Arab Emirates         AS     4975593
1           Afghanistan         AS    29121286
2               Armenia         AS     2968000
3            Azerbaijan         AS     8303512
4            Bangladesh         AS   156118464
5               Bahrain         AS      738004
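For comparison, the selection and filter above correspond roughly to the hand-written SQL in the sketch below. This is an assumption about the shape of the query, not the exact SQL Ibis generates, which may differ. The in-memory table holds only the five preview rows shown earlier, so it stands in for the real geography.db file.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT, continent TEXT, population INTEGER)")
con.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        # The five rows from the countries.head() preview.
        ("Andorra", "EU", 84000),
        ("United Arab Emirates", "AS", 4975593),
        ("Afghanistan", "AS", 29121286),
        ("Antigua and Barbuda", "NA", 86754),
        ("Anguilla", "NA", 13254),
    ],
)

# Projection, filter, and limit written out by hand.
query = """
    SELECT name, continent, population
    FROM countries
    WHERE continent = 'AS'
    LIMIT 6
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

With Ibis, the same intent is expressed once in Python and compiled for whichever backend the connection points to.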

Community and Contributing

Ibis is an open source project and welcomes contributions from anyone in the community. Read more about how you can contribute here. We care about keeping our community welcoming for all to participate and have a code of conduct to ensure this. The Ibis project is open sourced under the Apache License.

Join our community here:

For more information visit our official website here.

This version

4.0.0
