Open Source Package for Mobile Phone Metadata Preprocessing

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

======
gnuper
======

gnuper is an open source Python package for preprocessing mobile phone metadata with privacy in mind. It is being developed by `knuper <https://www.knuper.com>`_, a spin-off project of the Freie Universität Berlin. The package creates antenna-level features from mobile phone metadata, such as simple incoming-to-outgoing ratios of calls or SMS as well as entropy and isolation indicators of antennas. Additionally, one can include antenna averages of the bandicoot individual indicators developed at MIT. Further information on this toolkit can be found `here <http://bandicoot.mit.edu/>`_.
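To make the feature types mentioned above more concrete, here is a minimal plain-Python sketch of an incoming-to-outgoing ratio and an entropy indicator for a single antenna. It is not the package's implementation: gnuper computes its features with Spark queries, and the function names, the input shapes, and the choice of Shannon entropy with the natural logarithm are assumptions made for illustration.

```python
import math


def in_out_ratio(n_incoming, n_outgoing):
    """Ratio of incoming to outgoing events; None if nothing was outgoing."""
    if n_outgoing == 0:
        return None
    return n_incoming / n_outgoing


def shannon_entropy(partner_counts):
    """Shannon entropy (in nats) of how an antenna's interactions are
    spread over its partner antennas.

    partner_counts maps a (hypothetical) partner antenna id to the number
    of interactions with that partner.
    """
    total = sum(partner_counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log(count / total)
        for count in partner_counts.values()
        if count > 0
    )


if __name__ == "__main__":
    print(in_out_ratio(30, 20))
    print(shannon_entropy({"B": 5, "C": 5, "D": 5, "E": 5}))
```

An antenna that talks to many partners evenly gets a high entropy, while one that interacts with a single partner gets an entropy of zero.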

Prerequisites

- Python 3
- Apache Spark

Structural Overview

The following tree diagram shows the entire process from raw mobile phone metadata to final features on antenna level. To keep the process fast and the intermediate tables as small as possible, several intermediate aggregations take place. We call these intermediate steps levels. Each level represents another aggregation or transformation step. The color coding signifies the privacy risk: orange is user level, blue is antenna level (a single tower GPS coordinate).

.. figure:: docs/Raw_Data_Levels.png
   :alt: Data Levels
   :scale: 60 %

Level 0
    Unifies and preprocesses the raw format, dropping unnecessary information. Obvious machines are filtered out and the files are restructured into chunks of users.

Level 1
    A home antenna is estimated for each user, and interactions are aggregated on user level per day and hour. bandicoot variables are also calculated if the corresponding flag is set.

Level 2
    Antenna level. Users are allocated to their home antenna and then aggregated in three different categories:

    - for each week, split into working days and weekends or holidays
    - for every hour of the day, irrespective of the date
    - for all interactions between each pair of antennas

Level 3
    Features are created for several categories:

    - alltime (ratios & scaled): over the whole time period
    - variance (variances over weeks): to catch regional variation
    - daily (outgoing vs. incoming, week parts): differences between working days and weekends/holidays
    - hourly (workday, peaks): differences between the usual working day and several peaks
    - interactions (distance, isolation, entropy): network and geolocation related
    - active users (per home antenna): number of users allocated to an antenna as their home antenna

    For the exact formulas, have a look at the specific queries.

Level 4
    Unites the feature categories with the bandicoot features (aggregated on antenna level) into one final dataset, ready for analysis.
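The step from user level (Level 1) to antenna level (Level 2) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the package's Spark code: the record layout is hypothetical, and estimating the home antenna as the user's most frequently used antenna is an assumed heuristic, since the description above does not say how the estimate is made.

```python
from collections import Counter, defaultdict

# Hypothetical simplified records: (user_id, antenna_id, direction)
records = [
    ("u1", "A", "out"),
    ("u1", "A", "in"),
    ("u1", "B", "out"),
    ("u2", "B", "in"),
    ("u2", "B", "out"),
    ("u3", "A", "out"),
]


def estimate_home_antennas(records):
    """Assumed heuristic: a user's home antenna is the one they use most.
    Ties are broken by antenna id so the result is deterministic."""
    per_user = defaultdict(Counter)
    for user, antenna, _direction in records:
        per_user[user][antenna] += 1
    return {
        user: min(counts, key=lambda a: (-counts[a], a))
        for user, counts in per_user.items()
    }


def aggregate_by_home_antenna(records, home):
    """Sum each user's interactions under their home antenna and count
    the users allocated to every antenna."""
    agg = defaultdict(lambda: {"in": 0, "out": 0, "users": set()})
    for user, _antenna, direction in records:
        row = agg[home[user]]
        row[direction] += 1
        row["users"].add(user)
    return {
        antenna: {
            "in": row["in"],
            "out": row["out"],
            "active_users": len(row["users"]),
        }
        for antenna, row in agg.items()
    }


if __name__ == "__main__":
    home = estimate_home_antennas(records)
    print(home)
    print(aggregate_by_home_antenna(records, home))
```

After this aggregation the output no longer contains user identifiers, which is the point of moving from the user level to the antenna level.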

Sample Data

To create a small synthetic CDR sample data set (<20 MB), run the IPython notebook `cdr_mockup.ipynb <cdr_mockup.ipynb>`_, e.g. with Jupyter.
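For readers who only want a feel for what synthetic CDR rows might look like, the following sketch generates a few random records. It is not the content of the notebook: the column names, value ranges, and the function ``make_mock_cdr`` are hypothetical and chosen only for illustration.

```python
import random
from datetime import datetime, timedelta


def make_mock_cdr(n_rows, n_users=50, n_antennas=10, seed=0):
    """Generate n_rows synthetic call/SMS records with hypothetical columns."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    start = datetime(2018, 1, 1)
    rows = []
    for _ in range(n_rows):
        # Two distinct users, so nobody contacts themselves.
        caller, callee = rng.sample(range(n_users), 2)
        kind = rng.choice(["call", "sms"])
        offset = timedelta(minutes=rng.randrange(60 * 24 * 30))
        rows.append({
            "caller_id": f"user_{caller}",
            "callee_id": f"user_{callee}",
            "antenna_id": f"antenna_{rng.randrange(n_antennas)}",
            "timestamp": (start + offset).isoformat(),
            "type": kind,
            # Only calls have a duration; SMS rows get 0 seconds.
            "duration_s": rng.randint(1, 600) if kind == "call" else 0,
        })
    return rows


if __name__ == "__main__":
    for row in make_mock_cdr(3):
        print(row)
```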

Spark

The package has been built using Spark version 2.2.1.

Instructions on how to install Spark on Ubuntu (tested with 18.04):

1. Install a Java environment: ``sudo apt-get install default-jre``
2. Install Scala: ``sudo apt-get install scala``
3. Download Spark: ``wget http://archive.apache.org/dist/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz``
4. Unpack the archive: ``sudo tar -zxvf spark-2.2.1-bin-hadoop2.7.tgz``
5. Remove the archive: ``rm spark-2.2.1-bin-hadoop2.7.tgz``
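After the installation, a quick check from Python can confirm that the tools are reachable. The sketch below only looks for the ``java`` and ``scala`` executables on the ``PATH`` and checks whether a ``SPARK_HOME`` environment variable is set. Setting ``SPARK_HOME`` is not part of the steps above and is an assumption here (it is a common convention for pointing tools at the unpacked Spark directory); the function name is likewise hypothetical.

```python
import os
import shutil


def check_prerequisites():
    """Report which prerequisites are visible from this Python process."""
    return {
        "java": shutil.which("java") is not None,
        "scala": shutil.which("scala") is not None,
        # Assumption: SPARK_HOME points at the unpacked Spark directory.
        "spark_home_set": bool(os.environ.get("SPARK_HOME")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```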

Release history

Version 0.0.3, released Dec 22, 2018

Download files

Source Distribution

gnuper-0.0.3.tar.gz (22.2 kB)

Uploaded Dec 22, 2018 Source

Built Distribution

gnuper-0.0.3-py3-none-any.whl (46.7 kB)

Uploaded Dec 22, 2018 Python 3

File details

Details for the file gnuper-0.0.3.tar.gz.

File metadata

Download URL: gnuper-0.0.3.tar.gz
Upload date: Dec 22, 2018
Size: 22.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.21.0 CPython/3.6.5

File hashes

.. list-table:: Hashes for gnuper-0.0.3.tar.gz
   :header-rows: 1

   * - Algorithm
     - Hash digest
   * - SHA256
     - ``03762142cb120cb0a9a82bf012983a15101db312f5fc76ce7d7fb06b65102ad4``
   * - MD5
     - ``8f0b803426ec6dc18551e40a43ea1eae``
   * - BLAKE2b-256
     - ``d968996c067687d25b3f5f09111a7823cebb014477e429055d2ce458ff1a9b9e``

File details

Details for the file gnuper-0.0.3-py3-none-any.whl.

File metadata

Download URL: gnuper-0.0.3-py3-none-any.whl
Upload date: Dec 22, 2018
Size: 46.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.2.0 requests-toolbelt/0.8.0 tqdm/4.21.0 CPython/3.6.5

File hashes

.. list-table:: Hashes for gnuper-0.0.3-py3-none-any.whl
   :header-rows: 1

   * - Algorithm
     - Hash digest
   * - SHA256
     - ``1f0e68c770cd9e80d38cdafad4513e596b85af0e36d036eb76141710a9841ac7``
   * - MD5
     - ``f8ebb7a19c4553e219291f1108955571``
   * - BLAKE2b-256
     - ``c41cc7bd32f0ac56b24482a652dd7ef12424293a204e517d10e7bbe799acb7cf``
