Geographic masking tools for spatial data anonymization

MaskMyPy

MaskMyPy is a (very alpha) Python package that performs geographic masking on GeoPandas geodataframes. It offers two main methods: street masking and donut masking.

MaskMyPy also supports k-anonymity estimation using population data and k-anonymity calculation using address data, as well as the calculation of displacement distance between sensitive and masked points.

Disclaimer: MaskMyPy is offered as-is, without warranty of any kind. Geographic masking is a hard problem that requires informed decisions and validation. MaskMyPy provides helpful tools for geographic masking, but does not replace expertise.

Installation

MaskMyPy is pip-installable, but relies on osmnx. If you do not have osmnx installed, first install it using Anaconda:

conda install -c conda-forge osmnx

Then, install MaskMyPy using pip:

pip install maskmypy

Street Masking

Street masking automatically downloads OpenStreetMap road network data and uses it to geographically mask your sensitive points. It first snaps each sensitive point to the nearest node on the network (an intersection or dead end). It then calculates the average network distance between that node and a pool of the closest nodes on the network; the size of this pool (e.g. the closest 20 nodes) is known as the search depth. This average distance is the target displacement distance. Finally, it selects the node from the pool whose network distance from the starting node is closest to the target displacement distance.
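The selection step described above can be sketched on a toy network. This is only an illustration using the standard library, not MaskMyPy's actual implementation: the function names, the adjacency-dict graph format, and the choice to exclude the starting node from the pool are all assumptions.

```python
import heapq


def network_distances(graph, source):
    """Dijkstra shortest-path lengths from source over a weighted adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbour, length in graph[node].items():
            nd = d + length
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


def pick_masked_node(graph, start, depth):
    """Return (chosen node, target displacement distance) for one snapped point."""
    dist = network_distances(graph, start)
    # Pool: the `depth` closest nodes by network distance (start excluded, an assumption).
    pool = sorted((n for n in dist if n != start), key=lambda n: dist[n])[:depth]
    # Target displacement distance: the average network distance to the pool.
    target = sum(dist[n] for n in pool) / len(pool)
    # Choose the pool node whose network distance is closest to the target.
    chosen = min(pool, key=lambda n: abs(dist[n] - target))
    return chosen, target


# Toy network: a straight street A-B-C-D with segment lengths in meters.
toy_graph = {
    "A": {"B": 100.0},
    "B": {"A": 100.0, "C": 50.0},
    "C": {"B": 50.0, "D": 200.0},
    "D": {"C": 200.0},
}
masked_node, target_distance = pick_masked_node(toy_graph, "A", depth=3)
```

With a search depth of 3, the pool is B, C and D at 100, 150 and 350 meters, so the target displacement distance is their mean and the node nearest to that mean is selected.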

Usage: To street mask a geodataframe containing sensitive points with a search-depth value of 20, the code would be as follows:

from maskmypy import Street

streetmask = Street(
    sensitive_gdf, # Name of the sensitive geodataframe
    depth=20, # The search depth value used to calculate displacement distances. 
    extent_expansion_distance=2000, # Used to download road network data surrounding the study area. Needs to be sufficiently large to reduce edge effects. Increasing reduces edge effects, but uses more memory.
    max_street_length=500) # Optional, but recommended that you read below for full explanation of what this does.


streetmask.execute() # Single threaded by default. Add `parallel=True` as parameter to run on all CPU cores, drastically increasing performance.

masked_gdf = streetmask.masked

About max_street_length: when snapping points to the street network, the algorithm checks to make sure that the nearest node is actually connected to the network and has neighbors that are no more than max_street_length away (in meters). If it does not, then the next closest viable node is selected, checked, and so on. This acts as a sanity check to prevent extremely large masking distances. Feel free to change this to whatever you feel is appropriate.
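A rough sketch of this snapping check, under assumptions: it reads "has neighbors that are no more than max_street_length away" as "at least one neighbor within the limit", and the function name, node coordinate dict, and adjacency-dict graph are hypothetical stand-ins rather than MaskMyPy internals.

```python
import math


def snap_to_viable_node(point, node_coords, graph, max_street_length):
    """Return the nearest node that has a neighbour within max_street_length."""
    # Visit candidate nodes from nearest to farthest (straight-line distance).
    by_distance = sorted(node_coords, key=lambda n: math.dist(point, node_coords[n]))
    for node in by_distance:
        neighbours = graph.get(node, {})
        # Disconnected nodes have no neighbours and are skipped automatically.
        if any(length <= max_street_length for length in neighbours.values()):
            return node
    return None  # no viable node in the downloaded network


# Hypothetical nodes in projected coordinates (meters) and street lengths (meters).
node_coords = {"X": (0, 0), "Y": (10, 0), "Z": (910, 0), "W": (1010, 0)}
graph = {
    "X": {},                      # not connected to the network
    "Y": {"Z": 900.0},            # only neighbour is 900 m away
    "Z": {"Y": 900.0, "W": 100.0},
    "W": {"Z": 100.0},
}
snapped = snap_to_viable_node((1, 0), node_coords, graph, max_street_length=500)
```

Here the two nearest nodes are rejected (one is disconnected, the other only has a 900 m street), so the point snaps to the next closest viable node.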

Donut Masking

Usage: To perform basic donut geomasking on a geodataframe containing sensitive points, with a maximum displacement distance of 500 meters and a minimum displacement distance of 20% of the maximum distance (i.e. 100 meters), the code would look like this:

from maskmypy import Donut

donutmask = Donut(
    sensitive_gdf=sensitive_gdf, # Name of the sensitive geodataframe
    max_distance=500, # The maximum possible distance (in meters) that points are displaced
    donut_ratio=0.2, # The ratio used to define the minimum distance points are displaced (here 0.2 * 500 = 100 meters)
    distribution='uniform', # The distribution to use when displacing points. Other options include 'gaussian' and 'areal'. The 'areal' distribution means points are more likely to be displaced further within the range.
    container_gdf=container_gdf) # Optional, a geodataframe used to ensure that points do not leave a particular area. 

donutmask.execute()

masked_gdf = donutmask.masked
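To illustrate how the distance range and the distribution option interact, here is a minimal sketch of donut displacement for a single point in projected coordinates. It is not MaskMyPy's code: the function name is hypothetical, the 'areal' branch assumes sampling uniformly over the ring's area (which makes larger distances more likely, as described in the parameter comment), and the 'gaussian' option and container check are left out.

```python
import math
import random


def donut_displace(x, y, max_distance, donut_ratio, distribution="uniform", rng=random):
    """Displace one point by a random distance within the donut, in a random direction."""
    min_distance = max_distance * donut_ratio
    if distribution == "uniform":
        # Every distance between the minimum and maximum is equally likely.
        distance = rng.uniform(min_distance, max_distance)
    elif distribution == "areal":
        # Uniform over the ring's area: outer distances cover more area, so are likelier.
        distance = math.sqrt(rng.uniform(min_distance ** 2, max_distance ** 2))
    else:
        raise ValueError(f"unsupported distribution in this sketch: {distribution}")
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


# Coordinates are assumed to be in a projected CRS measured in meters.
masked_x, masked_y = donut_displace(1000.0, 2000.0, max_distance=500, donut_ratio=0.2)
```

Every displaced point lands between 100 and 500 meters from its original location; only the likelihood of each distance within that range changes with the distribution.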

To perform full donut geomasking (i.e. using census data and a target k-anonymity range rather than distance range) with a maximum k-anonymity of 1000 and minimum of 200, and a census geodataframe called population_gdf, the code would appear as follows:

from maskmypy import Donut_MaxK

donutmask = Donut_MaxK(
    sensitive_gdf, # Name of the sensitive geodataframe
    population_gdf=population_gdf, # Name of the census geodataframe
    population_column='pop', # Name of the column containing the population field
    max_k_anonymity=1000, # The maximum possible k-anonymity value
    donut_ratio=0.2, # The ratio used to define the minimum possible k-anonymity value.
    distribution='uniform', # The distribution to use when displacing points. Other options include 'gaussian' and 'areal'. The 'areal' distribution means points are more likely to be displaced further within the range.
    container_gdf=container_gdf) # Optional, a geodataframe used to ensure that points do not leave a particular area. 

donutmask.execute()

masked_gdf = donutmask.masked

K-Anonymity

MaskMyPy is able to calculate the k-anonymity of each point after masking. Two methods are available for this: estimates and exact calculations. Estimates of k-anonymity are inferred from census data and assume a homogeneously distributed population within each census polygon. Address-based k-anonymity is more accurate and uses actual home address data to calculate k-anonymity.

Estimate K-Anonymity

Usage: After the data has been masked, estimating k-anonymity using census data would look like this (here `mask` stands for the mask object created earlier, e.g. streetmask or donutmask). It adds a column to the masked geodataframe:

mask.k_anonymity_estimate(
    population_gdf=population_gdf, # Name of the census geodataframe. Not necessary if you already included this parameter in the original masking steps.
    population_column='pop') # Name of the column containing the population field. Not necessary if you already included this parameter in the original masking steps.
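The homogeneous-population assumption amounts to area weighting: each census polygon contributes its population in proportion to the share of its area that falls inside the zone being evaluated. The sketch below shows that arithmetic with axis-aligned rectangles so it needs no GIS library. The function names, the rectangle representation, and the choice of zone are illustrative assumptions; which zone MaskMyPy evaluates around each point is not described here.

```python
def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def estimate_k(zone, census_blocks):
    """Sum each block's population weighted by the share of its area inside zone."""
    total = 0.0
    for bounds, population in census_blocks:
        block_area = (bounds[2] - bounds[0]) * (bounds[3] - bounds[1])
        total += population * overlap_area(zone, bounds) / block_area
    return total


# Two hypothetical 100 m x 100 m census blocks with their populations.
census_blocks = [
    ((0, 0, 100, 100), 400),
    ((100, 0, 200, 100), 200),
]
zone = (50, 0, 150, 100)  # covers half of each block
estimated_k = estimate_k(zone, census_blocks)
```

Half of each block lies inside the zone, so the estimate is half of 400 plus half of 200.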

Calculate K-Anonymity

Usage: After the data has been masked, calculating address-based k-anonymity would look like this. It adds a column to the masked geodataframe:

mask.k_anonymity_actual(address_points_gdf='') # Name of the geodataframe including address points.
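As a rough illustration of what an address-based count can look like, the sketch below uses one common definition: the number of address points that lie no farther from the masked point than the true location does. That definition, the function name, and the plain coordinate tuples are assumptions made for this example; the exact rule MaskMyPy applies is not stated in this description.

```python
import math


def address_k_anonymity(sensitive, masked, addresses):
    """Count addresses within the displacement radius around the masked point."""
    # The radius is the straight-line distance the point was moved.
    radius = math.dist(sensitive, masked)
    return sum(1 for address in addresses if math.dist(masked, address) <= radius)


# Hypothetical projected coordinates in meters.
sensitive_point = (0, 0)
masked_point = (30, 40)      # displaced by 50 m
address_points = [(0, 0), (30, 40), (40, 50), (100, 100)]
k = address_k_anonymity(sensitive_point, masked_point, address_points)
```

Three of the four addresses fall within 50 meters of the masked point, including the true address itself.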

Displacement Distance

Usage: To add a column to the masked geodataframe that includes the actual displacement distances (in meters), one can just execute:

mask.displacement_distance()

Project details

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: GIS

Release history

- 1.1.0: Oct 23, 2025
- 1.0.0: Jul 6, 2024
- 0.0.8: Aug 1, 2022
- 0.0.7: Jul 29, 2022
- 0.0.6: Jul 29, 2022
- 0.0.5 (this version): Nov 11, 2021
- 0.0.3: Jun 23, 2020
- 0.0.2 (yanked): Jun 23, 2020
- 0.0.1 (yanked): Jun 23, 2020

