Tools for working with pydata.pandas in your Django projects
Project description
Tools for working with pandas in your Django projects
Contributors
Dependencies
django-pandas supports Django (>=1.4.5) or later and requires django-model-utils (>= 1.4.0) and Pandas (0.12.0). Note because of problems with the requires directive of setuptools you probably need to install numpy in your virtualenv before you install this package or if you want to run the test suite
pip install numpy python setup.py test
Some pandas functionality requires parts of the Scipy stack. You may wish to consult http://www.scipy.org/install.html for more information on installing the Scipy stack.
Contributing
Please file bugs and send pull requests to the GitHub repository and issue tracker.
Installation
Start by creating a new virtualenv for your project
mkvirtualenv myproject
Next install numpy and optionally pandas
pip install numpy pip install pandas
You may want to consult the scipy documentation for more information on installing the Scipy stack.
Finally, install the development version of django-pandas from the github repository using pip:
pip install https://github.com/chrisdev/django-pandas/tarball/master
Usage
To use django-pandas in your Django project, modify the INSTALLED_APPS in your settings module to include django_pandas.
DataFrameManager
django-pandas provides a custom manager to use with models that you want to render as Pandas Dataframes. The DataFrameManager manager provides the to_dataframe method that returns your models queryset as a Pandas DataFrame. To use the DataFrameManager, first override the default manager in your model’s definition as shown in the example below
#models.py from django_pandas.managers import DataFrameManager class MyModel(models.Model): full_name = models.CharField(max_length=25) age = models.IntegerField() department = models.CharField(max_length=3) wage = models.FloatField() objects = DataFrameManager()
This will qive you access to the following QuerySet methods:
to_datafame
to_timeseries
to_pivot table
to_dataframe
Returns a DataFrame from the QuerySet
Parameters
- fields: The model fields to utilise in creating the frame.
to span a relationship, just use the field name of related fields across models, separated by double underscores,
- index: specify the field to use for the index. If the index
field is not in the field list it will be appended
- fill_na: fill in missing observations using one of the following
this is a string specifying a pandas fill method {‘backfill, ‘bill’, ‘pad’, ‘ffill’} or a scalar value
- coerce_float: Attempt to convert the numeric non-string fields
like object, decimal etc. to float if possible
Examples
Create a dataframe using all the fields in your model as follows
df = MyModel.to_dataframe()
This will include you primary key create a DataFrame only from secified fields:
df = MyData.to_dataframe('age', 'department', 'wage')
To set full_name as the index
MyData.to_dataframe('age', 'department', 'wage', index='full_name')
You can use filters and excludes
MyData.filter(age__gt=20, department='IT').to_dataframe(index='full_name')
to_timeseries
A convenience method for creating a time series i.e the DataFrame index is instance of a DateTime or PeriodIndex
Parameters
- fields: The model fields to utilise in creating the frame.
to span a relationship, just use the field name of related fields across models, separated by double underscores,
- index: specify the field to use for the index. If the index
field is not in the field list it will be appended. This is mandatory.
- storage: Specify if the queryset uses the wide or long format
for data.
- pivot_column: Required once the you specify long format
storage. This could either be a list or string identifying the field name or combination of field. If the pivot_column is a single column then the unique values in this column become a new columns in the DataFrame If the pivot column is a list the values in these columns are concatenated (using the ‘-’ as a separator) and these values are used for the new timeseries columns
- values: Also required if you utilize the long storage the
values column name is use for populating new frame values
freq: the offset string or object representing a target conversion
rs_kwargs: Arguments based on pandas.DataFrame.resample
Examples
Using a long storage format
#models.py class LongTimeSeries(models.Model): date_ix = models.DateTimeField() series_name = models.CharField(max_length=100) value = models.FloatField() objects = DataFrameManager()
Some sample data::
======== ===== ===== date mame value ======== ===== ====== 2010-01-01 gdp 204699 2010-01-01 inflation 2.0 2010-01-01 wages 100.7 2010-02-01 gdp 204704 2010-02-01 inflation 2.4 2010-03-01 wages 100.4 2010-02-01 gdp 205966 2010-02-01 inflation 2.5 2010-03-01 wages 100.5 ========== ========== ======
Create a QuerySet
qs = LongTimeSeries.objects.filter(date_ix__year__gte=2010)
Create a timeseries dataframe
df = qs.to_timeseries(index='date_ix', pivot_columns='series_name', values='value', storage='long') df.head() date gdp inflation wages 2010-01-01 204966 2.0 100.7 2010-02-01 204704 2.4 100.4 2010-03-01 205966 2.5 100.5
Using a wide storage format
class WideTimeSeries(models.Model): date_ix = models.DateTimeField() col1 = models.FloatField() col2 = models.FloatField() col3 = models.FloatField() col4 = models.FloatField() objects = DataFrameManager() qs = WideTimeSeries.objects.all() rs_kwargs = {'how': 'sum', 'kind': 'period'} df = qs.to_timeseries(index='date_ix', pivot_columns='series_name', values='value', storage='long', freq='M', rs_kwargs=rs_kwargs)
to_pivot_table
A convenience method for creating a pivot table from a QuerySet
Parameters
- fields: The model fields to utilise in creating the frame.
to span a relationship, just use the field name of related fields across models, separated by double underscores,
values : column to aggregate, optional
- rowslist of column names or arrays to group on
Keys to group on the x-axis of the pivot table
- colslist of column names or arrays to group on
Keys to group on the y-axis of the pivot table
- aggfuncfunction, default numpy.mean, or list of functions
If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
- fill_valuescalar, default None
Value to replace missing values with
- marginsboolean, default False
Add all row / columns (e.g. for subtotal / grand totals)
dropna : boolean, default True
Example
# models.py class PivotData(models.Model): row_col_a = models.CharField(max_length=15) row_col_b = models.CharField(max_length=15) row_col_c = models.CharField(max_length=15) value_col_d = models.FloatField() value_col_e = models.FloatField() value_col_f = models.FloatField() objects = DataFrameManager()
Usage
rows = ['row_col_a', 'row_col_b'] cols = ['row_col_c'] pt = qs.to_pivot_table(values='value_col_d', rows=rows, cols=cols)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.