
Create a report for mobility data with differential privacy guarantees.



dp_mobility_report: A Python package to create a mobility report with differential privacy (DP) guarantees, especially for urban human mobility data.

Install

pip install dp-mobility-report

or from GitHub:

pip install git+https://github.com/FreeMoveProject/dp_mobility_report

Data preparation

df:

  • A pandas DataFrame.

  • Expected columns: user ID uid, trip ID tid, timestamp datetime, and latitude and longitude lat and lng in CRS EPSG:4326. datetime is expected to be a datetime-like string, e.g., in the format yyyy-mm-dd hh:mm:ss. If datetime contains int values instead, they are interpreted as sequence positions, which suits datasets that consist only of sequences without timestamps. (This format closely follows the scikit-mobility TrajDataFrame.)

  • Here you can find an example dataset.
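
As a minimal sketch of the expected layout, the following writes a few made-up records to CSV text with the columns uid, tid, datetime, lat and lng, formatting timestamps as yyyy-mm-dd hh:mm:ss. The records and the helper name records_to_csv are hypothetical and only illustrate the format; they are not part of the package.

```python
import csv
import io
from datetime import datetime

# Column names expected by the report (taken from the list above).
EXPECTED_COLUMNS = ["uid", "tid", "datetime", "lat", "lng"]


def records_to_csv(records):
    """Serialize record dicts to CSV text in the expected column layout.

    Hypothetical helper for illustration; not part of dp_mobility_report.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EXPECTED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)
        # Format timestamps as yyyy-mm-dd hh:mm:ss strings.
        row["datetime"] = record["datetime"].strftime("%Y-%m-%d %H:%M:%S")
        writer.writerow(row)
    return buffer.getvalue()


# Made-up example: one user, one trip with two records (lat/lng in EPSG:4326).
records = [
    {"uid": 1, "tid": 1, "datetime": datetime(2022, 1, 1, 8, 0, 0), "lat": 52.52, "lng": 13.40},
    {"uid": 1, "tid": 1, "datetime": datetime(2022, 1, 1, 8, 30, 0), "lat": 52.50, "lng": 13.45},
]
csv_text = records_to_csv(records)
print(csv_text)
```

Text in this layout could then be loaded into a DataFrame, for example with pd.read_csv(io.StringIO(csv_text)).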

tessellation:

  • A geopandas GeoDataFrame with polygons.

  • Expected columns: tile_id.

  • The tessellation is used for spatial aggregations of the data.

  • Here you can find an example tessellation.

  • If you don’t have a tessellation, you can use this code to create a tessellation.
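
If no tessellation is at hand, one simple option is a regular grid. The sketch below builds such a grid as a GeoJSON FeatureCollection with a tile_id property per polygon. The helper name make_grid_geojson, the bounding box, and the choice of string tile IDs are assumptions made for illustration; this is not the tessellation code referred to above.

```python
import json


def make_grid_geojson(min_lng, min_lat, max_lng, max_lat, n_cols, n_rows):
    """Build a regular grid of rectangular tiles as a GeoJSON FeatureCollection.

    Hypothetical helper for illustration; not part of dp_mobility_report.
    Coordinates are (lng, lat) pairs in EPSG:4326, as GeoJSON expects.
    """
    width = (max_lng - min_lng) / n_cols
    height = (max_lat - min_lat) / n_rows
    features = []
    for row in range(n_rows):
        for col in range(n_cols):
            west = min_lng + col * width
            south = min_lat + row * height
            east, north = west + width, south + height
            # A polygon ring must be closed: first and last point are equal.
            ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
            features.append({
                "type": "Feature",
                "properties": {"tile_id": str(row * n_cols + col)},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            })
    return {"type": "FeatureCollection", "features": features}


# Made-up bounding box, split into 2 x 2 tiles.
grid = make_grid_geojson(13.1, 52.3, 13.8, 52.7, n_cols=2, n_rows=2)
geojson_text = json.dumps(grid)
print(len(grid["features"]), "tiles")
```

The features can be turned into a GeoDataFrame with gpd.GeoDataFrame.from_features(grid["features"], crs="EPSG:4326"), or geojson_text can be saved to a file and read with gpd.read_file.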

Create a mobility report as HTML

import pandas as pd
import geopandas as gpd
from dp_mobility_report import DpMobilityReport

df = pd.read_csv(
    "https://raw.githubusercontent.com/FreeMoveProject/dp_mobility_report/main/tests/test_files/test_data.csv"
)
tessellation = gpd.read_file(
    "https://raw.githubusercontent.com/FreeMoveProject/dp_mobility_report/main/tests/test_files/test_tessellation.geojson"
)

report = DpMobilityReport(df, tessellation, privacy_budget=10, max_trips_per_user=5)

report.to_file("my_mobility_report.html")

The parameter privacy_budget (in terms of epsilon-DP) determines how much noise is added to the data. The budget is split between all analyses of the report. If the value is set to None, no noise (i.e., no privacy guarantee) is applied to the report.
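
To give an intuition for how the budget relates to the amount of noise, here is a minimal sketch of the Laplace mechanism, a standard way to achieve epsilon-DP for counts. It is a generic illustration, not the package's implementation: the even split across four analyses, the sensitivity of 5, and the function names are all assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, sensitivity, rng):
    """Laplace mechanism for a count: the noise scale is sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Illustrative assumptions: an even budget split across four analyses, and a
# sensitivity of 5 (e.g., a user contributing at most 5 trips to a trip count).
privacy_budget = 10
n_analyses = 4
epsilon_per_analysis = privacy_budget / n_analyses  # 2.5
sensitivity = 5
noise_scale = sensitivity / epsilon_per_analysis  # 2.0

rng = random.Random(42)
print(noisy_count(1000, epsilon_per_analysis, sensitivity, rng))
```

A smaller budget, or a budget shared among more analyses, gives a smaller epsilon per analysis and therefore a larger noise scale.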

The parameter max_trips_per_user specifies the maximum number of trips a user can contribute to the dataset. If a user is represented with more trips, a random sample of max_trips_per_user trips is drawn. If the value is set to None, the full dataset is used. Note that deriving the maximum trips per user from the data violates the differential privacy guarantee. Thus, None should only be used in combination with privacy_budget=None.
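
The sampling step described above can be sketched as follows. This is an illustration under stated assumptions, not the package's actual code: the helper limit_trips_per_user, its seed argument, and the (uid, tid) pair representation are hypothetical.

```python
import random
from collections import defaultdict


def limit_trips_per_user(trips, max_trips_per_user, seed=None):
    """Keep at most `max_trips_per_user` randomly chosen trips per user.

    `trips` is a list of (uid, tid) pairs. Hypothetical helper for
    illustration; not the package's actual implementation.
    """
    rng = random.Random(seed)
    trips_by_user = defaultdict(list)
    for uid, tid in trips:
        trips_by_user[uid].append(tid)

    kept = []
    for uid, tids in trips_by_user.items():
        if len(tids) > max_trips_per_user:
            # Draw a random sample without replacement.
            tids = rng.sample(tids, max_trips_per_user)
        kept.extend((uid, tid) for tid in tids)
    return kept


# Made-up data: user "a" has 8 trips, user "b" has 3.
trips = [("a", i) for i in range(8)] + [("b", i) for i in range(3)]
sampled = limit_trips_per_user(trips, max_trips_per_user=5, seed=0)
print(len(sampled))  # 5 trips kept for "a" plus all 3 for "b"
```

Bounding each user's contribution in this way is what keeps the sensitivity of the analyses fixed in advance rather than derived from the data.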

Please refer to the documentation for information on further parameters.

Examples

Berlin mobility data simulated using the DLR TAPAS Model: [Code used for Berlin]

Madrid CRTM survey data: [Code used for Madrid]

Beijing Geolife dataset: [Code used for Beijing]

(Here you can find the data preprocessing code used to obtain the required format.)

Citing

If you use dp-mobility-report, please cite the following paper:

@article{
        doi:10.1080/17489725.2022.2148008,
        title = {Towards Mobility Reports with User-Level Privacy},
        author = {Kapp, Alexandra and {von Voigt}, Saskia Nu{\~n}ez and Mihaljevi{\'c}, Helena and Tschorsch, Florian},
        year = {2022},
        journal = {Journal of Location Based Services},
        eprint = {https://www.tandfonline.com/doi/pdf/10.1080/17489725.2022.2148008},
        publisher = {{Taylor \& Francis}},
        doi = {10.1080/17489725.2022.2148008}
}

Credits

This package was highly inspired by the pandas-profiling/pandas-profiling and scikit-mobility packages.

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.5 (2022-12-12)

  • Remove scikit-mobility dependency and refactor od flow visualization.

0.1.4 (2022-12-07)

  • Remove Google Fonts from HTML.

0.1.3 (2022-12-05)

  • Handle FutureWarning of pandas.

0.1.2 (2022-11-24)

  • Enhanced documentation for all properties of DpMobilityReport class

0.1.1 (2022-10-27)

  • fix bug: prevent error “key trips not found” in trips_over_time if sum of trip_count is 0

0.1.0 (2022-10-21)

  • make tessellation an Optional parameter

  • allow DataFrames without timestamps but sequence numbering instead (i.e., integer for timestamp column)

  • allow to set seed for reproducible sampling of the dataset (according to max_trips_per_user)

0.0.8 (2022-10-20)

  • Fixes addressing deprecation warnings.

0.0.7 (2022-10-17)

  • parameter for a custom split of the privacy budget between different analyses

  • extend ‘analysis_selection’ to include single analyses instead of entire segments

  • parameter for ‘analysis_exclusion’ instead of selection

  • bug fix: include all possible categories for days and hour of days

  • bug fix: show correct percentage of outliers

  • show 95% confidence-interval instead of upper and lower bound

  • show privacy budget and confidence interval for each analysis

0.0.6 (2022-09-30)

  • Remove scaling of counts to match a consistent trip_count / record_count (from ds_statistics) in visits_per_tile, visits_per_tile_timewindow and od_flows. Scaling was implemented to keep the report consistent, though it is removed for now as it introduces new issues.

  • Minor bug fixes in the visualization: outliers were not correctly converted into percentage.

0.0.5 (2022-08-26)

  • Bug fix: correct scaling of timewindow counts.

0.0.4 (2022-08-22)

  • Simplify naming: from MobilityDataReport to DpMobilityReport

  • Simplify import: from ‘from dp_mobility_report import md_report.MobilityDataReport’ to ‘from dp_mobility_report import DpMobilityReport’

  • Enhance documentation: change style and correctly include API reference.

0.0.3 (2022-07-22)

  • Fix broken link.

0.0.2 (2022-07-22)

  • First release to PyPi.

  • It includes all basic functionality, though still in alpha version and under development.

0.0.1 (2021-12-16)

  • First version used for evaluation in xx.

