sqldf for pandas
Project description
pandasql
========
pandasql allows you to query pandas DataFrames using SQL syntax. It works similarly to sqldf in R. pandasql seeks to provide a more familiar way of manipulating and cleaning data for people new to Python or pandas.
Installation
===========
.. code:: python
$ pip install -U pandasql
Bascis
===========
The main function used in pandasql is sqldf. sqldf accepts 2 parametrs
- a sql query string
- an set of session/environment variables (locals() or globals())
Specifying locals() or globals() can get tedious. You can defined a short helper function to fix this.
.. code:: python
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
Querying
===========
pandasql uses <a href="http://www.sqlite.org/lang.html">SQLite syntax</a>. Any pandas dataframes will be automatically detected by pandasql. You can query them as you would any regular SQL table.
.. code:: python
>>> from pandasql import sqldf, load_meat, load_births
>>> pysqldf = lambda q: sqldf(q, globals())
>>> meat = load_meat()
>>> births = load_births()
>>> print pysqldf("SELECT * FROM meat LIMIT 10;").head()
date beef veal pork lamb_and_mutton broilers other_chicken turkey
0 1944-01-01 00:00:00 751 85 1280 89 None None None
1 1944-02-01 00:00:00 713 77 1169 72 None None None
2 1944-03-01 00:00:00 741 90 1128 75 None None None
3 1944-04-01 00:00:00 650 89 978 66 None None None
4 1944-05-01 00:00:00 681 106 1029 78 None None None
joins and aggregations are also supported
.. code:: python
>>> q = """SELECT
m.date, m.beef, b.births
FROM
meats m
INNER JOIN
births b
ON m.date = b.date;"""
>>> joined = pyqldf(q)
>>> print joined.head()
date beef births
403 2012-07-01 00:00:00 2200.8 368450
404 2012-08-01 00:00:00 2367.5 359554
405 2012-09-01 00:00:00 2016.0 361922
406 2012-10-01 00:00:00 2343.7 347625
407 2012-11-01 00:00:00 2206.6 320195
>>> q = "select
strftime('%Y', date) as year
, SUM(beef) as beef_total
FROM
meat
GROUP BY
year;"
>>> print pysqldf(q).head()
year beef_total
0 1944 8801
1 1945 9936
2 1946 9010
3 1947 10096
4 1948 8766
More information and code samples available in the [examples](https://github.com/yhat/pandasql/blob/master/examples/demo.py) folder or on [our blog](http://blog.yhathq.com/posts/pandasql-sql-for-pandas-dataframes.html).
========
pandasql allows you to query pandas DataFrames using SQL syntax. It works similarly to sqldf in R. pandasql seeks to provide a more familiar way of manipulating and cleaning data for people new to Python or pandas.
Installation
===========
.. code:: python
$ pip install -U pandasql
Bascis
===========
The main function used in pandasql is sqldf. sqldf accepts 2 parametrs
- a sql query string
- an set of session/environment variables (locals() or globals())
Specifying locals() or globals() can get tedious. You can defined a short helper function to fix this.
.. code:: python
from pandasql import sqldf
pysqldf = lambda q: sqldf(q, globals())
Querying
===========
pandasql uses <a href="http://www.sqlite.org/lang.html">SQLite syntax</a>. Any pandas dataframes will be automatically detected by pandasql. You can query them as you would any regular SQL table.
.. code:: python
>>> from pandasql import sqldf, load_meat, load_births
>>> pysqldf = lambda q: sqldf(q, globals())
>>> meat = load_meat()
>>> births = load_births()
>>> print pysqldf("SELECT * FROM meat LIMIT 10;").head()
date beef veal pork lamb_and_mutton broilers other_chicken turkey
0 1944-01-01 00:00:00 751 85 1280 89 None None None
1 1944-02-01 00:00:00 713 77 1169 72 None None None
2 1944-03-01 00:00:00 741 90 1128 75 None None None
3 1944-04-01 00:00:00 650 89 978 66 None None None
4 1944-05-01 00:00:00 681 106 1029 78 None None None
joins and aggregations are also supported
.. code:: python
>>> q = """SELECT
m.date, m.beef, b.births
FROM
meats m
INNER JOIN
births b
ON m.date = b.date;"""
>>> joined = pyqldf(q)
>>> print joined.head()
date beef births
403 2012-07-01 00:00:00 2200.8 368450
404 2012-08-01 00:00:00 2367.5 359554
405 2012-09-01 00:00:00 2016.0 361922
406 2012-10-01 00:00:00 2343.7 347625
407 2012-11-01 00:00:00 2206.6 320195
>>> q = "select
strftime('%Y', date) as year
, SUM(beef) as beef_total
FROM
meat
GROUP BY
year;"
>>> print pysqldf(q).head()
year beef_total
0 1944 8801
1 1945 9936
2 1946 9010
3 1947 10096
4 1948 8766
More information and code samples available in the [examples](https://github.com/yhat/pandasql/blob/master/examples/demo.py) folder or on [our blog](http://blog.yhathq.com/posts/pandasql-sql-for-pandas-dataframes.html).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pandasql-0.3.1.tar.gz
(23.3 kB
view details)
Built Distribution
pandasql-0.3.1-py2.7.egg
(26.9 kB
view details)
File details
Details for the file pandasql-0.3.1.tar.gz
.
File metadata
- Download URL: pandasql-0.3.1.tar.gz
- Upload date:
- Size: 23.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
dcda76304ba08e719186d0c33e7a02115f4cbf8f708c38687a6e209d1b2166ce
|
|
MD5 |
87a52197d59f9b95195ba3463f50cf87
|
|
BLAKE2b-256 |
ac4f0c978cc7b5d460ff8796fc69893e2c60f6f70740edaba3519b103756c492
|
File details
Details for the file pandasql-0.3.1-py2.7.egg
.
File metadata
- Download URL: pandasql-0.3.1-py2.7.egg
- Upload date:
- Size: 26.9 kB
- Tags: Egg
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
9c0d3b2d78609a2ebb08b0689c5053d116b611c1cfed6fa320fb7f30dcf6f9d7
|
|
MD5 |
711b3ab305d91d9d290edf1abcd6946e
|
|
BLAKE2b-256 |
9239ce9d62391ebb7e4842ccbeec0d8c147b2236f7202659657723899b5eae2d
|