Skip to main content

An SQL-focused data analysis library for Python.

Project description

Overview

faro is a wrapper for the Python SQLite API, and aims to be a SQL-driven data analysis library for Python. It is intended to complement the existing data analysis packages in the Python eco-system, such as numpy and pandas. With faro you can use pure SQL to work with a collection of table objects in memory and easily interoperate with numpy and pandas when needed. Lastly, faro plays nicely with IPython so you can easily interact and explore your query results.

Usage

Create a Database object and give it a name.

from faro import Database

db = Database('transportation')

To add tables to the in-memory database, simply specify the name of the file. Supported file types include: csv, json, and xlsx. add_table inserts the contents of a file into a new table within the database. It can automatically detect the filetype and parse the file contents accordingly. In this example we load two different tables, one in csv format, and the other in json format.

db.add_table('cars.json', name='cars')
db.add_table('airports.csv', name='airports')

We can also directly pass pandas.DataFrame or faro.Table objects to be added to the database. A helpful pattern when dealing with more complex parsing for a specific file is to read it into memory using pandas then add the DataFrame to the faro.Database.

buses = pd.DataFrame({
  'id' : [1, 2, 3, 4, 5],
  'from' : ['Houston', 'Atlanta', 'Chicago', 'Boston', 'New York'],
  'to' : ['San Antonio', 'Charlotte', 'Milwaukee', 'Cape Cod', 'Buffalo']
})

db.add_table(buses, name='buses')

We can now query against any table in the database using pure SQL, and easily interact with the results in a Jupyter Notebook.

sql = """
SELECT iata,
       name,
       city,
       state
  FROM airports
 WHERE country = 'USA'
 LIMIT 5
"""
db.query(sql)
iata name city state
0 00M Thigpen Bay Springs MS
1 00R Livingston Municipal Livingston TX
2 00V Meadow Lake Colorado Springs CO
3 01G Perry-Warsaw Perry NY
4 01J Hilliard Airpark Hilliard FL

If we want to interact with the data returned by the query, we can easily transform it into whatever data type is most convenient for the situation. Supported type conversions include: List[Tuple], Dict[List], numpy.ndarray, and pandas.DataFrame.

table = db.query(sql)
type(table)
>>> faro.table.Table

df = table.to_dataframe()
type(df)
>>> pandas.core.frame.DataFrame

matrix = table.to_numpy()
type(matrix)
>>> numpy.ndarray

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for faro, version 0.0.2
Filename, size & hash File type Python version Upload date
faro-0.0.2-py3-none-any.whl (6.4 kB) View hashes Wheel py3
faro-0.0.2.tar.gz (5.3 kB) View hashes Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page