Python package to detect suspicious OpenStreetMap changesets
OSM Changeset Analyser, osmcha, is a Python package to detect suspicious OSM changesets. It was designed to be used with osmcha-django, but it can also be used standalone or in other projects.
You can report issues or request new features in the osmcha-frontend repository.
pip install osmcha
You can read a replication changeset file directly from the web:
from osmcha.changeset import ChangesetList
c = ChangesetList('https://planet.openstreetmap.org/replication/changesets/002/236/374.osm.gz')
or from your local filesystem.
c = ChangesetList('tests/245.osm.gz')
c.changesets will return a list containing data of all the changesets listed in the file.
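To give an idea of what such a file contains, here is a minimal standard-library sketch that reads a gzipped changeset file and collects each changeset's attributes and tags. It is not osmcha's implementation: the function name `read_changesets` and the sample data are hypothetical and serve only as illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def read_changesets(fileobj):
    """Return one dict per <changeset> element found in a gzipped OSM XML file."""
    with gzip.open(fileobj, "rb") as handle:
        root = ET.parse(handle).getroot()
    changesets = []
    for element in root.iter("changeset"):
        data = dict(element.attrib)  # id, user, uid, ...
        # Tags such as comment, source and imagery_used are child elements.
        data["tags"] = {tag.get("k"): tag.get("v") for tag in element.iter("tag")}
        changesets.append(data)
    return changesets


# Hypothetical sample data, compressed in memory to stand in for a .osm.gz file.
SAMPLE_XML = b"""<osm>
  <changeset id="1" user="example_user" uid="42" num_changes="3">
    <tag k="comment" v="Added a park"/>
    <tag k="created_by" v="iD"/>
  </changeset>
  <changeset id="2" user="another_user" uid="43" num_changes="10"/>
</osm>"""

changesets = read_changesets(io.BytesIO(gzip.compress(SAMPLE_XML)))
for changeset in changesets:
    print(changeset["id"], changeset["tags"].get("comment"))
```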
You can filter the changesets by passing a GeoJSON file containing a polygon of your area of interest to ChangesetList as the second argument.
Finally, to analyse a specific changeset, do:
from osmcha.changeset import Analyse
ch = Analyse(changeset_id)
ch.full_analysis()
Customizing Detection Rules
You can customize the detection rules by defining your preferred values when initializing the Analyse class. The default values are shown below.
ch = Analyse(changeset_id, create_threshold=200, modify_threshold=200, delete_threshold=30, percentage=0.7, top_threshold=1000, suspect_words=[...], illegal_sources=[...], excluded_words=[...])
Command Line Interface
The command line interface can be used to verify a specific changeset directly from the terminal.
Usage: osmcha <changeset_id>
osmcha works by analysing how many map features the changeset created, modified or deleted, and by verifying the presence of some suspect words in the comment, source and imagery_used fields of the changeset. Furthermore, we also consider whether the editing software allows importing data or making mass edits. We consider the following to be powerful editors: JOSM, Merkaartor, level0, QGIS and ArcGIS.
In the Usage section, you can see how to customize some of these detection rules.
We tag a changeset as a possible import if the number of created elements is greater than 70% of the sum of elements created, modified and deleted, and if it creates more than 1000 elements, or more than 200 elements when one of the powerful editors was used.
We consider a changeset as a mass modification if the number of modified elements is greater than 70% of the sum of elements created, modified and deleted and if it modifies more than 200 elements.
All changesets that delete more than 1000 elements are considered a mass deletion. If the changeset deletes between 200 and 1000 elements and the number of deleted elements is greater than 70% of the sum of elements created, modified and deleted it’s also tagged as a mass deletion.
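The three rules above can be sketched as a single function. This is a minimal illustration of the thresholds described in the text, not osmcha's actual code: the function name `classify`, the `powerful_editor` flag and the returned labels are hypothetical. The lower deletion bound follows the prose (200), which differs from the `delete_threshold=30` shown in the list of default values, so it is left as a parameter.

```python
def classify(create, modify, delete, powerful_editor=False,
             create_threshold=200, modify_threshold=200, delete_threshold=200,
             percentage=0.7, top_threshold=1000):
    """Return the list of labels that apply to a changeset's edit counts."""
    total = create + modify + delete
    labels = []
    if total == 0:
        return labels

    # Possible import: mostly creations, and either a very large changeset
    # or a moderately large one made with a powerful editor.
    if create / total > percentage and (
        create > top_threshold
        or (powerful_editor and create > create_threshold)
    ):
        labels.append("possible import")

    # Mass modification: mostly modifications and above the threshold.
    if modify / total > percentage and modify > modify_threshold:
        labels.append("mass modification")

    # Mass deletion: always above the top threshold, otherwise only when
    # deletions dominate the changeset.
    if delete > top_threshold or (
        delete > delete_threshold and delete / total > percentage
    ):
        labels.append("mass deletion")

    return labels


print(classify(1200, 10, 5))
print(classify(300, 10, 5, powerful_editor=True))
print(classify(10, 10, 300))
```

Keeping the thresholds as keyword arguments mirrors how the Analyse class accepts custom values, so the same sketch can be reused to try out stricter or looser rules.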
The suspect words are loaded from a YAML file. You can customize the words by setting another default file with an environment variable:
or pass a list of words to the Analyse class; see the section Customizing Detection Rules for more information. We use a list of illegal sources to analyse the source and imagery_used fields and another, more general list to examine the comment field. We also have a list of excluded words to avoid false positives.
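As an illustration of how excluded words can prevent false positives, the sketch below removes excluded phrases from a text before searching it for suspect words. This is one plausible approach, not necessarily how osmcha implements it; the function name `find_suspect_words` and the word lists are hypothetical.

```python
def find_suspect_words(text, suspect_words, excluded_words=()):
    """Return the suspect words found in text, ignoring excluded phrases."""
    lowered = text.lower()
    # Blank out excluded phrases first so they cannot trigger a match.
    for phrase in excluded_words:
        lowered = lowered.replace(phrase.lower(), " ")
    return [word for word in suspect_words if word.lower() in lowered]


# Hypothetical word lists, for illustration only.
suspect = ["import", "google"]
excluded = ["important"]

print(find_suspect_words("Fixed an important road", suspect, excluded))
print(find_suspect_words("Import of buildings from Google", suspect, excluded))
```

Without the excluded list, the first comment would match "import" simply because it contains the word "important".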
Verify if the user has fewer than 5 edits or fewer than 5 mapping days.
User has multiple blocks
Changesets created by users who have received more than one block will be flagged.
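The two user checks can be expressed in a few lines. The sketch below assumes the edit count, mapping days and block count are already known; the function name `user_flags` and the label strings are hypothetical and not taken from osmcha's code.

```python
def user_flags(edits, mapping_days, blocks, min_edits=5, min_days=5):
    """Return the user-related flags that apply to a changeset's author."""
    flags = []
    # Fewer than 5 edits or fewer than 5 mapping days marks a new mapper.
    if edits < min_edits or mapping_days < min_days:
        flags.append("new mapper")
    # More than one block received by the user.
    if blocks > 1:
        flags.append("user has multiple blocks")
    return flags


print(user_flags(edits=3, mapping_days=10, blocks=0))
print(user_flags(edits=500, mapping_days=120, blocks=2))
```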
To run the tests on osmcha:
git clone https://github.com/willemarcel/osmcha.git
cd osmcha
pip install -e .[test]
py.test -v
Check CHANGELOG for the version history.