Python package to detect suspicious OpenStreetMap changesets
OSM Changeset Analyser, osmcha, is a Python package to detect suspicious OSM changesets. It was designed to be used with osmcha-django, but it can also be used standalone or in other projects.
pip install osmcha
You can read a replication changeset file directly from the web:
from osmcha.changeset import Analyse, ChangesetList
c = ChangesetList('https://planet.openstreetmap.org/replication/changesets/002/236/374.osm.gz')
or from your local filesystem.
c = ChangesetList('tests/245.osm.gz')
c.changesets will return a list containing data of all the changesets listed in the file.
To filter the changesets by area of interest, pass a GeoJSON file containing a polygon to ChangesetList as the second argument.
Finally, to analyse a specific changeset, do:
ch = Analyse(changeset_id)
ch.full_analysis()
Customizing Detection Rules
You can customize the detection rules by passing your preferred values when initializing the Analyse class. The default values are shown below.
ch = Analyse(
    changeset_id,
    create_threshold=200,
    modify_threshold=200,
    delete_threshold=30,
    percentage=0.7,
    top_threshold=1000,
    suspect_words=[...],
    illegal_sources=[...],
    excluded_words=[...]
)
Command Line Interface
The command line interface can be used to verify a specific changeset directly from the terminal.
Usage: osmcha <changeset_id>
osmcha works by analysing how many map features the changeset created, modified or deleted, and by checking for suspect words in the comment, source and imagery_used fields of the changeset. We also consider whether the editor software used can import data or perform mass edits. We consider the following to be powerful editors: JOSM, Merkaartor, level0, QGIS and ArcGIS.
In the Usage section, you can see how to customize some of these detection rules.
We tag a changeset as a possible import if the number of created elements is greater than 70% of the sum of elements created, modified and deleted, and if it creates more than 1000 elements, or more than 200 elements when one of the powerful editors was used.
We consider a changeset as a mass modification if the number of modified elements is greater than 70% of the sum of elements created, modified and deleted and if it modifies more than 200 elements.
All changesets that delete more than 1000 elements are considered a mass deletion. If the changeset deletes between 200 and 1000 elements and the number of deleted elements is greater than 70% of the sum of elements created, modified and deleted it’s also tagged as a mass deletion.
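The three rules above can be sketched as a single function. This is only an illustration of the prose, not osmcha's actual implementation: the function names, the editor-matching logic and the returned labels are assumptions, and only the numeric thresholds come from the text.

```python
# Hypothetical sketch of the rules described above; not osmcha's real code.
POWERFUL_EDITORS = ("josm", "merkaartor", "level0", "qgis", "arcgis")


def uses_powerful_editor(editor):
    """Return True if the editor string names one of the powerful editors."""
    editor = (editor or "").lower()
    return any(name in editor for name in POWERFUL_EDITORS)


def classify(created, modified, deleted, editor="", percentage=0.7,
             top_threshold=1000, create_threshold=200,
             modify_threshold=200,
             # 200 follows the prose above; the defaults listed under
             # Customizing Detection Rules show delete_threshold=30.
             delete_threshold=200):
    """Return the labels that apply to a changeset's edit counts."""
    total = created + modified + deleted
    if total == 0:
        return []
    labels = []
    powerful = uses_powerful_editor(editor)
    # Possible import: mostly creations, and either very large or
    # moderately large when made with a powerful editor.
    if created / total > percentage and (
        created > top_threshold or (powerful and created > create_threshold)
    ):
        labels.append("possible import")
    # Mass modification: mostly modifications and above the threshold.
    if modified / total > percentage and modified > modify_threshold:
        labels.append("mass modification")
    # Mass deletion: very large on its own, or moderately large and
    # mostly deletions.
    if deleted > top_threshold or (
        deleted > delete_threshold and deleted / total > percentage
    ):
        labels.append("mass deletion")
    return labels
```

For example, classify(300, 10, 5, "JOSM/1.5") is labelled a possible import, while the same counts made with an editor outside the powerful list are not, because 300 created elements stay below the 1000-element limit.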
The suspect words are loaded from a YAML file. You can customize the words by setting another default file with an environment variable:
or by passing a list of words to the Analyse class; see the section Customizing Detection Rules for more information. We use a list of illegal sources to analyse the source and imagery_used fields and another, more general list to examine the comment field. We also have a list of excluded words to avoid false positives.
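To illustrate how a suspect-word list and an excluded-word list can work together, here is a minimal sketch. The function find_suspect_words and the example words are hypothetical; osmcha's own matching logic and word lists may differ.

```python
import re


def find_suspect_words(text, suspect_words, excluded_words=()):
    """Return the suspect words found in text, ignoring excluded phrases.

    Hypothetical helper for illustration; not part of the osmcha API.
    """
    cleaned = (text or "").lower()
    # Blank out excluded phrases first so they cannot trigger a match.
    for phrase in excluded_words:
        cleaned = cleaned.replace(phrase.lower(), " ")
    # Match whole words only, in the order given by suspect_words.
    return [
        word for word in suspect_words
        if re.search(r"\b" + re.escape(word.lower()) + r"\b", cleaned)
    ]
```

With the illustrative lists suspect_words=["google"] and excluded_words=["google drive"], a comment mentioning Google Maps is reported, while a comment that only mentions Google Drive is not.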
Verify if the user has fewer than 5 edits or fewer than 5 mapping days.
User has multiple blocks
Changesets created by users who have received more than one block will be flagged.
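The two user-based checks can be expressed as simple comparisons. The sketch below assumes the edit count, mapping days and block count are already known; the function name and the returned labels are hypothetical and not taken from osmcha.

```python
def user_flags(edit_count, mapping_days, block_count):
    """Return the user-based labels that apply (illustrative only)."""
    flags = []
    # Fewer than 5 edits or fewer than 5 mapping days.
    if edit_count < 5 or mapping_days < 5:
        flags.append("new mapper")
    # More than one block received.
    if block_count > 1:
        flags.append("user has multiple blocks")
    return flags
```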
To run the tests on osmcha:
git clone https://github.com/willemarcel/osmcha.git
cd osmcha
pip install -e .[test]
py.test -v
Check CHANGELOG for the version history.
|Filename, size||File type||Python version||Upload date|
|osmcha-0.4.7-py2.py3-none-any.whl (12.4 kB)||Wheel||3.6||Apr 2, 2018|
|osmcha-0.4.7.tar.gz (27.7 kB)||Source||None||Apr 2, 2018|