An SQL-focused data analysis library for Python.
Project description
Overview
faro
is a wrapper for the Python SQLite API, and aims to be a SQL-driven data analysis library for Python. It is intended to complement the existing data analysis packages in the Python eco-system, such as numpy
and pandas
. With faro
you can use pure SQL to work with a collection of table objects in memory and easily interoperate with numpy
and pandas
when needed. Lastly, faro
plays nicely with IPython
so you can easily interact and explore your query results.
Usage
Create a Database
object and give it a name.
from faro import Database
db = Database('transportation')
To add tables to the in-memory database, simply specify the name of the file. Supported file types include: csv
, json
, and xlsx
. add_table
inserts the contents of a file into a new table within the database. It can automatically detect the filetype and parse the file contents accordingly. In this example we load two different tables, one in csv
format, and the other in json
format.
db.add_table('cars.json', name='cars')
db.add_table('airports.csv', name='airports')
We can also directly pass pandas.DataFrame
or faro.Table
objects to be added to the database. A helpful pattern when dealing with more complex parsing for a specific file is to read it into memory using pandas
then add the DataFrame
to the faro.Database
.
buses = pd.DataFrame({
'id' : [1, 2, 3, 4, 5],
'from' : ['Houston', 'Atlanta', 'Chicago', 'Boston', 'New York'],
'to' : ['San Antonio', 'Charlotte', 'Milwaukee', 'Cape Cod', 'Buffalo']
})
db.add_table(buses, name='buses')
We can now query against any table in the database using pure SQL, and easily interact with the results in a Jupyter Notebook.
sql = """
SELECT iata,
name,
city,
state
FROM airports
WHERE country = 'USA'
LIMIT 5
"""
db.query(sql)
iata | name | city | state | |
---|---|---|---|---|
0 | 00M | Thigpen | Bay Springs | MS |
1 | 00R | Livingston Municipal | Livingston | TX |
2 | 00V | Meadow Lake | Colorado Springs | CO |
3 | 01G | Perry-Warsaw | Perry | NY |
4 | 01J | Hilliard Airpark | Hilliard | FL |
If we want to interact with the data returned by the query, we can easily transform it into whatever data type is most convenient for the situation. Supported type conversions include: List[Tuple]
, Dict[List]
, numpy.ndarray
, and pandas.DataFrame
.
table = db.query(sql)
type(table)
>>> faro.table.Table
df = table.to_dataframe()
type(df)
>>> pandas.core.frame.DataFrame
matrix = table.to_numpy()
type(matrix)
>>> numpy.ndarray
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.