Skip to main content

Serves up pandas dataframes via the Django REST Framework for client-side(i.e. d3.js) visualizations

Project description

Django REST Framework + pandas = A Model-driven Visualization API

Django REST Pandas (DRP) provides a simple way to generate and serve pandas DataFrames via the Django REST Framework. The resulting API can serve up CSV (and a number of other formats) for consumption by a client-side visualization tool like d3.js.

The design philosophy of DRP enforces a strict separation between data and presentation. This keeps the implementation simple, but also has the nice side effect of making it trivial to provide the source data for your visualizations. This capability can often be leveraged by sending users to the same URL that your visualization code uses internally to load the data.

DRP does not include any JavaScript code, leaving the implementation of interactive visualizations as an exercise for the implementer. That said, DRP is commonly used in conjunction with the wq.app library, which provides wq/chart.js and wq/pandas.js, a collection of chart functions and data loaders that work well with CSV served by DRP.

Latest PyPI Release Release Notes License GitHub Stars GitHub Forks GitHub Issues

Travis Build Status Python Support Django Support Django REST Framework Support

Live Demo

The climata-viewer project uses Django REST Pandas and wq/chart.js to provide interactive visualizations and spreadsheet downloads.

Supported Formats

The following output formats are provided by default. These are provided as renderer classes in order to leverage the content type negotiation built into Django REST Framework. This means clients can specify a format via:

  • an HTTP “Accept” header (Accept: text/csv),

  • a format parameter (/path?format=csv), or

  • a format extension (/path.csv)

The HTTP header and format parameter are enabled by default on every pandas view. Using the extension requires a custom URL configuration (see below).

Format

Content Type

pandas DataFrame Function

Notes

HTML

text/html

to_html()

See notes on HTML output

CSV

text/csv

to_csv()

TXT

text/plain

to_csv()

Useful for testing, as most browsers will download a CSV file instead of displaying it

JSON

application/j son

to_json()

`date_format` ` and ``orient <htt p://pandas.pyda ta.org/pandas-d ocs/stable/gene rated/pandas.Da taFrame.to_json .html>`__ can be provided in URL (e.g. /path.json?or ient=columns)

XLSX

application/v nd.openxml...sh eet

to_excel()

XLS

application/v nd.ms-excel

to_excel()

PNG

image/png

plot()

Currently not very customizable, but a simple way to view the data as an image.

SVG

image/svg

plot()

Eventually these could become a fallback for clients that can’t handle d3.js

The underlying implementation is a set of serializers that take the normal serializer result and put it into a dataframe. Then, the included renderers generate the output using the built in pandas functionality.

Usage

Getting Started

# Recommended: create virtual environment
# python3 -m venv venv
# . venv/bin/activate
pip install rest-pandas

NOTE: Django REST Pandas relies on pandas, which itself relies on NumPy and other scientific Python libraries written in C. This is usually fine, since pip can use Python Wheels to install precompiled versions. If you are having trouble installing DRP due to dependency issues, you may need to pre-install pandas using apt or conda.

Usage Examples

No Model

The example below allows you to create a simple API for an existing Pandas DataFrame, e.g. generated from an existing file.

# views.py
from rest_pandas import PandasSimpleView
import pandas as pd

class TimeSeriesView(PandasSimpleView):
    def get_data(self, request, *args, **kwargs):
        return pd.read_csv('data.csv')

Model-Backed

The example below assumes you already have a Django project set up with a single TimeSeries model.

# views.py
from rest_pandas import PandasView
from .models import TimeSeries
from .serializers import TimeSeriesSerializer

# Short version (leverages default DRP settings):
class TimeSeriesView(PandasView):
    queryset = TimeSeries.objects.all()
    serializer_class = TimeSeriesSerializer
    # That's it!  The view will be able to export the model dataset to any of
    # the included formats listed above.  No further customization is needed to
    # leverage the defaults.

# Long Version and step-by-step explanation
class TimeSeriesView(PandasView):
    # Assign a default model queryset to the view
    queryset = TimeSeries.objects.all()

    # Step 1. In response to get(), the underlying Django REST Framework view
    # will load the queryset and then pass it to the following function.
    def filter_queryset(self, qs):
        # At this point, you can filter queryset based on self.request or other
        # settings (useful for limiting memory usage).  This function can be
        # omitted if you are using a filter backend or do not need filtering.
        return qs

    # Step 2. A Django REST Framework serializer class should serialize each
    # row in the queryset into a simple dict format.  A simple ModelSerializer
    # should be sufficient for most cases.
    serializer_class = TimeSeriesSerializer  # extends ModelSerializer

    # Step 3.  The included PandasSerializer will load all of the row dicts
    # into array and convert the array into a pandas DataFrame.  The DataFrame
    # is essentially an intermediate format between Step 2 (dict) and Step 4
    # (output format).  The default DataFrame simply maps each model field to a
    # column heading, and will be sufficient in many cases.  If you do not need
    # to transform the dataframe, you can skip to step 4.

    # If you would like to transform the dataframe (e.g. to pivot or add
    # columns), you can do so in one of two ways:

    # A. Create a subclass of PandasSerializer, define a function called
    # transform_dataframe(self, dataframe) on the subclass, and assign it to
    # pandas_serializer_class on the view.  You can also use one of the three
    # provided pivoting serializers (see Advanced Usage below).
    #
    # class MyCustomPandasSerializer(PandasSerializer):
    #     def transform_dataframe(self, dataframe):
    #         dataframe.some_pivot_function(in_place=True)
    #         return dataframe
    #
    pandas_serializer_class = MyCustomPandasSerializer

    # B. Alternatively, you can create a custom transform_dataframe function
    # directly on the view.  Again, if no custom transformations are needed,
    # this function does not need to be defined.
    def transform_dataframe(self, dataframe):
        dataframe.some_pivot_function(in_place=True)
        return dataframe

    # NOTE: As the name implies, the primary purpose of transform_dataframe()
    # is to apply a transformation to an existing dataframe.  In PandasView,
    # this dataframe is created by serializing data queried from a Django
    # model.  If you would like to supply your own custom DataFrame from the
    # start (without using a Django model), you can do so with PandasSimpleView
    # as shown in the first example.

    # Step 4. Finally, the provided renderer classes will convert the DataFrame
    # to any of the supported output formats (see above).  By default, all of
    # the formats above are enabled.  To restrict output to only the formats
    # you are interested in, you can define renderer_classes on the view:
    renderer_classes = [PandasCSVRenderer, PandasExcelRenderer]
    # You can also set the default renderers for all of your pandas views by
    # defining the PANDAS_RENDERERS in your settings.py.

Django Pandas Integration

You can also let Django Pandas handle querying and generating the dataframe, and only use Django REST Pandas for the rendering:

# models.py
from django_pandas.managers import DataFrameManager

class TimeSeries(models.Model):
    # ...
    objects = DataFrameManager()
# views.py
from rest_pandas import PandasSimpleView
from .models import TimeSeries

class TimeSeriesView(PandasSimpleView):
    def get_data(self, request, *args, **kwargs):
        return TimeSeries.objects.to_timeseries(
            index='date',
        )

Registering URLs

# urls.py
from django.conf.urls import patterns, include, url

from .views import TimeSeriesView
urlpatterns = patterns('',
    url(r'^data', TimeSeriesView.as_view()),
)

# This is only required to support extension-style formats (e.g. /data.csv)
from rest_framework.urlpatterns import format_suffix_patterns
urlpatterns = format_suffix_patterns(urlpatterns)

The default PandasView will serve up all of the available data from the provided model in a simple tabular form. You can also use a PandasViewSet if you are using Django REST Framework’s ViewSets and Routers.

Customizing Renderers

You can override the default renderers by setting PANDAS_RENDERERS in your settings.py, or by overriding renderer_classes in your individual view(s). PANDAS_RENDERERS is defined separately from Django REST Framework’s own DEFAULT_RENDERER_CLASSES setting, in case you want to have DRP-enabled views intermingled with regular DRF views.

You can also include DRP renderers in DEFAULT_RENDERER_CLASSES. In that case, be sure to have all of your views extend PandasMixin, otherwise you may get an error saying the serializer output is not a DataFrame. In short, there are three paths to getting DRP renderers working with your views:

  1. Extend PandasView, PandasSimpleView, or PandasViewSet, and use the PANDAS_RENDERERS setting (which defaults to the list above).

  2. Extend PandasMixin and customize REST_FRAMEWORK['DEFAULT_RENDERER_CLASSES'] to add one or more rest_pandas renderers.

  3. Extend any of the Pandas* classes and set renderer_classes explicitly on the view.

class TimeSeriesView(PandasView):
    # renderer_classes default to PANDAS_RENDERERS
    ...

class TimeSeriesView(PandasMixin, ListAPIView):
    # renderer_classes default to REST_FRAMEWORK['DEFAULT_RENDERER_CLASSES']
    ...

Date Formatting

By default, Django REST Framework will serialize dates as strings before they are processed by the renderer classes. In many cases, you may want to preserve the dates as datetime objects and let Pandas handle the rendering. To do this, define an explicit DateTimeField or DateField on your DRF serializer and set format=None:

# serializers.py
class TimeSeriesSerializer(serializers.ModelSerializer):
    date = serializers.DateField(format=None)
    class Meta:
        model = TimeSeries
        fields = '__all__'

Alternately, you can disable date serialization globally by setting DATETIME_FORMAT and/or DATE_FORMAT to None in your settings.py:

# settings.py
DATE_FORMAT = None

HTML Output

The HTML renderer provides the ability to create an interactive view that shares the same URL as your data API. The dataframe is processed by to_html(), then passed to TemplateHTMLRenderer with the following context:

context variable

description

table

Output <table> from to_html()

name

View name

description

View description

url

Current URL Path (without parameters)

url_params

URL parameters

available_format s

Array of allowed extensions (e.g. 'csv', 'json', 'xlsx')

wq_chart_type

Recommended chart type (for use with wq/chartapp.js, see below)

As with TemplateHTMLRenderer, the template name is controlled by the view. If you are using DRP together with the wq framework, you can leverage the default mustache/rest_pandas.html template, which is designed for use with the wq/chartapp.js plugin. Otherwise, you will probably want to provide a custom template and/or set template_name on the view.

If you need to do a lot of customization, and/or you don’t really need the entire dataframe rendered in a <table>, you can always create another view for the interface and make the PandasView only handle the API.

Note: For backwards compatibility, PandasHTMLRenderer is only included in the default PANDAS_RENDERERS if rest_pandas is listed in your installed apps.

Building Interactive Charts

In addition to use as a data export tool, DRP is well-suited for creating data API backends for interactive charts. In particular, DRP can be used with d3.js, wq/pandas.js, and wq/chart.js, to create interactive time series, scatter, and box plot charts - as well as any of the infinite other charting possibilities d3.js provides.

To facilitate data API building, the CSV renderer is the default in Django REST Pandas. While the pandas JSON serializer is improving, the primary reason for making CSV the default is the compactness it provides over JSON when serializing time series data. The default CSV output from DRP will have single row of column headers, making it suitable as-is for use with e.g. d3.csv(). However, DRP is often used with the custom serializers below to produce a dataframe with nested multi-row column headers. This is harder to parse with d3.csv() but can be easily processed by wq/pandas.js, an extension to d3.js.

// mychart.js
define(['d3', 'wq/pandas', 'wq/chart'], function(d3, pandas, chart) {

// Unpivoted data (single-row header)
d3.csv("/data.csv", render);

// Pivoted data (multi-row header)
pandas.get('/data.csv', render);

function render(error, data) {
    d3.select('svg')
       .selectAll('rect')
       .data(data)
       // ...
}

});

DRP includes three custom serializers with transform_dataframe() functions that address common use cases. These serializer classes can be leveraged by assigning them to pandas_serializer_class on your view. If you are using the wq framework, these serializers can automatically leverage DRP’s default HTML template together with wq/chartapp.js to provide interactive charts. If you are not using the full wq framework, you can still use wq/pandas.js and wq/chart.js directly with the CSV output of these serializers.

For documentation purposes, the examples below assume the following dataset:

Location

Measurement

Date

Value

site1

temperature

2016-01-01

3

site1

humidity

2016-01-01

30

site2

temperature

2016-01-01

4

site2

temperature

2016-01-02

5

PandasUnstackedSerializer

PandasUnstackedSerializer unstacks the dataframe so a few key attributes are listed in a multi-row column header. This makes it easier to include metadata about e.g. a time series without repeating the same values on every data row.

To specify which attributes to use in column headers, define the attribute pandas_unstacked_header on your ModelSerializer subclass. You will generally also want to define pandas_index, which is a list of metadata fields unique to each row (e.g. the timestamp).

# serializers.py
from rest_framework import serializers
from .models import TimeSeries

class TimeSeriesSerializer(serializers.ModelSerializer):
    class Meta:
        model = MultiTimeSeries
        fields = ['date', 'location', 'measurement', 'value']
        pandas_index = ['date']
        pandas_unstacked_header = ['location', 'measurement']

# views.py
from rest_pandas import PandasView, PandasUnstackedSerializer
from .models import TimeSeries
from .serializers import TimeSeriesSerializer

class TimeSeriesView(PandasView):
    queryset = TimeSeries.objects.all()
    serializer_class = TimeSeriesSerializer
    pandas_serializer_class = PandasUnstackedSerializer

With the above example data, this configuration would output a CSV file with the following layout:

Value

Value

Value

Location

site1

site1

site2

Measurement

temperature

humidity

temperature

Date

2016-01-01

3

30

4

2016-01-02

5

This could then be processed by wq/pandas.js into the following structure:

[
    {
        "location": "site1",
        "measurement": "temperature",
        "data": [
            {"date": "2016-01-01", "value": 3}
        ]
    },
    {
        "location": "site1",
        "measurement": "humidity",
        "data": [
            {"date": "2016-01-01", "value": 30}
        ]
    },
    {
        "location": "site2",
        "measurement": "temperature",
        "data": [
            {"date": "2016-01-01", "value": 4},
            {"date": "2016-01-02", "value": 5}
        ]
    }
]

The output of PandasUnstackedSerializer can be used with the timeSeries() chart provided by wq/chart.js:

define(['d3', 'wq/pandas', 'wq/chart'], function(d3, pandas, chart) {

var svg = d3.select('svg');
var plot = chart.timeSeries();
pandas.get('/data/timeseries.csv', function(data) {
    svg.datum(data).call(plot);
});

});

PandasScatterSerializer

PandasScatterSerializer unstacks the dataframe and also combines selected attributes to make it easier to plot two measurements against each other in an x-y scatterplot.

To specify which attributes to use for the coordinate names, define the attribute pandas_scatter_coord on your ModelSerializer subclass. You can also specify additional metadata attributes to include in the header with pandas_scatter_header. You will generally also want to define pandas_index, which is a list of metadata fields unique to each row (e.g. the timestamp).

# serializers.py
from rest_framework import serializers
from .models import TimeSeries

class TimeSeriesSerializer(serializers.ModelSerializer):
    class Meta:
        model = MultiTimeSeries
        fields = ['date', 'location', 'measurement', 'value']
        pandas_index = ['date']
        pandas_scatter_coord = ['measurement']
        pandas_scatter_header = ['location']

# views.py
from rest_pandas import PandasView, PandasScatterSerializer
from .models import TimeSeries
from .serializers import TimeSeriesSerializer

class TimeSeriesView(PandasView):
    queryset = TimeSeries.objects.all()
    serializer_class = TimeSeriesSerializer
    pandas_serializer_class = PandasScatterSerializer

With the above example data, this configuration would output a CSV file with the following layout:

temperature-value

humidity-value

temperature-value

Location

site1

site1

site2

Date

2014-01-01

3

30

4

2014-01-02

5

This could then be processed by wq/pandas.js into the following structure:

[
    {
        "location": "site1",
        "data": [
            {
                "date": "2016-01-01",
                "temperature-value": 3,
                "humidity-value": 30
            }
        ]
    },
    {
        "location": "site2",
        "data": [
            {
                "date": "2016-01-01",
                "temperature-value": 4
            },
            {
                "date": "2016-01-02",
                "temperature-value": 5
            }
        ]
    }
]

The output of PandasScatterSerializer can be used with the scatter() chart provided by wq/chart.js:

define(['d3', 'wq/pandas', 'wq/chart'], function(d3, pandas, chart) {

var svg = d3.select('svg');
var plot = chart.scatter()
    .xvalue(function(d) {
        return d['temperature-value'];
    })
    .yvalue(function(d) {
        return d['humidity-value'];
    });

pandas.get('/data/scatter.csv', function(data) {
    svg.datum(data).call(plot);
});

});

PandasBoxplotSerializer

PandasBoxplotSerializer computes boxplot statistics (via matplotlib’s boxplot_stats) and pushes the results out via an unstacked dataframe. The statistics can be aggregated for a specified group column as well as by date.

To specify which attribute to use for the group column, define the attribute pandas_boxplot_group on your ModelSerializer subclass. To specify an attribute to use for date-based grouping, define pandas_boxplot_date. You will generally also want to define pandas_boxplot_header, which will unstack any metadata columns and exclude them from statistics.

# serializers.py
from rest_framework import serializers
from .models import TimeSeries

class TimeSeriesSerializer(serializers.ModelSerializer):
    class Meta:
        model = MultiTimeSeries
        fields = ['date', 'location', 'measurement', 'value']
        pandas_boxplot_group = 'site'
        pandas_boxplot_date = 'date'
        pandas_boxplot_header = ['measurement']

# views.py
from rest_pandas import PandasView, PandasBoxplotSerializer
from .models import TimeSeries
from .serializers import TimeSeriesSerializer

class TimeSeriesView(PandasView):
    queryset = TimeSeries.objects.all()
    serializer_class = TimeSeriesSerializer
    pandas_serializer_class = PandasBoxplotSerializer

With the above example data, this configuration will output a CSV file with the same general structure as PandasUnstackedSerializer, but with the value spread across multiple boxplot statistics columns (value-mean, value-q1,value-whishi, etc.). An optionalgroup` parameter can be added to the query string to switch between various groupings:

name

purpose

?group=series

Group by series (pandas_boxplot_group)

?group=series-year

Group by series, then by year

?group=series-month

Group by series, then by month

?group=year

Summarize all data by year

?group=month

Summarize all data by month

The output of PandasBoxplotSerializer can be used with the boxplot() chart provided by wq/chart.js:

define(['d3', 'wq/pandas', 'wq/chart'], function(d3, pandas, chart) {

var svg = d3.select('svg');
var plot = chart.boxplot();
pandas.get('/data/boxplot.csv?group=year', function(data) {
    svg.datum(data).call(plot);
});

});

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rest-pandas-1.0.0.tar.gz (35.1 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page