
Performs user classification into labels using a set of seed Twitter users with known labels and the structure of the interaction network between them.
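The README does not describe the classification algorithm itself. To illustrate the general idea of spreading known seed labels along an interaction network, here is a minimal sketch of plain iterative label propagation with numpy. This is a generic, swapped-in technique, not necessarily the method this package implements. The function `propagate_labels` and the toy graph are hypothetical.

```python
import numpy as np


def propagate_labels(adjacency, seed_labels, n_labels, n_iter=100):
    """Assign a label to every user by iterative label propagation.

    adjacency   -- (n x n) symmetric matrix of interaction weights between users
    seed_labels -- dict mapping a seed user's index to its known label index
    n_labels    -- number of distinct labels
    """
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated users: avoid division by zero
    P = A / degree[:, None]  # row-normalized transition matrix

    # One-hot label scores for seed users; unlabeled users start at zero.
    F = np.zeros((A.shape[0], n_labels))
    seeds = np.array(list(seed_labels.keys()))
    F[seeds, list(seed_labels.values())] = 1.0
    clamped = F[seeds].copy()

    for _ in range(n_iter):
        F = P @ F  # each user takes the average label scores of its neighbors
        F[seeds] = clamped  # seed users keep their known labels
    return F.argmax(axis=1)


# Toy network: two triangles of users joined by a single interaction (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

print(propagate_labels(A, {0: 0, 5: 1}, n_labels=2))  # prints [0 0 0 1 1 1]
```

Each non-seed user ends up with the label of the seed it is better connected to, so the two triangles split along the single bridging edge.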



### Required packages

- numpy
- scipy
- scikit-learn
- networkx
- reveal-user-annotation
- reveal-graph-embedding
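A quick way to check that these dependencies are importable before running the module is sketched below. `sklearn` is the standard import name of scikit-learn; `reveal_user_annotation` and `reveal_graph_embedding` are assumed from the distribution names and may differ. The helper `missing_modules` is hypothetical.

```python
import importlib.util

# Import names; the two reveal-* names are assumptions based on the
# usual "-" to "_" convention for distribution names.
REQUIRED_MODULES = [
    "numpy", "scipy", "sklearn", "networkx",
    "reveal_user_annotation", "reveal_graph_embedding",
]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```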

### Installation

To install for all users on Unix/Linux:

    python3.4 setup.py build
    sudo python3.4 setup.py install


Alternatively, install the package from PyPI with pip:

    pip install reveal-user-classification

Reveal-FP7 Integration

The name of the entry point script is user_network_profile_classifier.


The following two arguments are for establishing a connection to a Mongo database and accessing the documents in a collection.

  • $MONGO_DB_URI example: “mongodb://admin:123456@”
  • $MONGO_ASSESSMENT_ID example: “new_tweets_database_name.new_tweets_collection_name”, separated by a “.” as shown.
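Because MongoDB database names cannot contain a “.” while collection names may, the assessment ID can be split safely on its first dot. A minimal sketch; the helper `split_assessment_id` is hypothetical and not part of the package.

```python
def split_assessment_id(assessment_id):
    """Split "database_name.collection_name" into (database, collection)."""
    # Split on the first "." only: database names cannot contain dots,
    # but collection names can.
    database, sep, collection = assessment_id.partition(".")
    if not sep or not database or not collection:
        raise ValueError(
            "expected 'database_name.collection_name', got %r" % assessment_id)
    return database, collection


print(split_assessment_id("new_tweets_database_name.new_tweets_collection_name"))
```

With a driver such as pymongo, the two names would typically be used as `MongoClient(mongo_db_uri)[database][collection]`.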

The following two arguments are the credentials of a Twitter app, used to fetch data from Twitter.

  • $TWITTER_APP_KEY and $TWITTER_APP_SECRET: Both are obtained from the app one has created on the Twitter developer site.
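The README does not say which Twitter authentication flow the module uses. For reference, Twitter's application-only authentication builds a Basic credential from exactly these two values: each is URL-encoded, the two are joined with “:”, and the result is base64-encoded. A sketch with the hypothetical helper `bearer_token_credentials`:

```python
import base64
from urllib.parse import quote


def bearer_token_credentials(app_key, app_secret):
    """Build the Basic auth header value used to request an app-only bearer token."""
    # URL-encode key and secret, join them with ":", then base64-encode.
    raw = "%s:%s" % (quote(app_key, safe=""), quote(app_secret, safe=""))
    return "Basic " + base64.b64encode(raw.encode("ascii")).decode("ascii")


print(bearer_token_credentials("abc", "def"))  # prints Basic YWJjOmRlZg==
```

The returned value would be sent as the `Authorization` header of a POST to `https://api.twitter.com/oauth2/token` with the body `grant_type=client_credentials`; the response contains the bearer token.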

The following four arguments are for publishing messages to a RabbitMQ queue. The queue is used both for publishing a “SUCCESS” message on completion and for publishing the results of the module.

  • $AMQP_URI example: amqp://guest:guest@localhost:5672//
  • One must also supply: $AMQP_QUEUE_NAME, $AMQP_EXCHANGE and $AMQP_ROUTING_KEY
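The AMQP URI bundles the broker connection parameters, while the queue name, exchange and routing key are supplied separately. The sketch below (hypothetical helper `parse_amqp_uri`, standard library only) shows how the example URI breaks down. Following a common convention, the trailing “//” is read as the default virtual host “/”.

```python
from urllib.parse import unquote, urlparse


def parse_amqp_uri(uri):
    """Split an AMQP URI into its connection parameters."""
    parts = urlparse(uri)
    if parts.scheme not in ("amqp", "amqps"):
        raise ValueError("not an AMQP URI: %r" % uri)
    # The vhost is the path after the first "/"; "//" therefore means "/".
    # With no vhost given, fall back to "/" (a simplification).
    vhost = unquote(parts.path[1:]) if len(parts.path) > 1 else "/"
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or (5671 if parts.scheme == "amqps" else 5672),
        # Fall back to RabbitMQ's default credentials when none are given.
        "username": unquote(parts.username or "guest"),
        "password": unquote(parts.password or "guest"),
        "virtual_host": vhost,
    }


print(parse_amqp_uri("amqp://guest:guest@localhost:5672//"))
```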

Several optional arguments are also available. The following three can be used together or individually; if none of them is given, all of the tweets in the collection are read.

  • $LATEST_N: The N chronologically latest documents will be read from the defined collection. For this to work properly, the “created_at” field of the tweets must be stored as a MongoDB date (BSON Date) value.
  • $LOWER_TIMESTAMP: A UNIX timestamp, compared against the tweets' “created_at” field. Only tweets posted after this timestamp will be used for the analysis.
  • $UPPER_TIMESTAMP: The corresponding upper limit; only tweets posted before this timestamp will be used.
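These three optional arguments map naturally onto a MongoDB query on `created_at`. A minimal sketch, assuming `created_at` is stored as a BSON date and that both bounds are exclusive (the README only says tweets “after” the lower timestamp are used). `build_tweet_query` is a hypothetical helper, not the package's code.

```python
from datetime import datetime, timezone


def build_tweet_query(latest_n=None, lower_timestamp=None, upper_timestamp=None):
    """Translate the optional arguments into a MongoDB filter, sort and limit."""
    created_at = {}
    if lower_timestamp is not None:
        created_at["$gt"] = datetime.fromtimestamp(lower_timestamp, tz=timezone.utc)
    if upper_timestamp is not None:
        created_at["$lt"] = datetime.fromtimestamp(upper_timestamp, tz=timezone.utc)
    query = {"created_at": created_at} if created_at else {}
    # Newest first, so that a limit of N keeps the N latest tweets.
    sort = [("created_at", -1)] if latest_n is not None else None
    limit = latest_n if latest_n is not None else 0  # 0 means "no limit"
    return query, sort, limit


print(build_tweet_query(latest_n=100, lower_timestamp=1420070400))
```

With pymongo this would correspond to `collection.find(query)`, followed by `.sort(sort)` when a sort is returned and `.limit(limit)`; a limit of 0 means no limit.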

The following arguments set various parameters for the execution of the module.

  • $NUMBER_OF_PARALLEL_TASKS: Number of parallel tasks initiated for each assessment analysis launch. If not specified, the module tries to set it to the number of available CPU cores.
  • $NUMBER_OF_USERS_TO_ANNOTATE: Number of users to annotate automatically, using Twitter data. Each user adds approximately a minute or more to the execution time. The default value is 90; for faster testing, try a smaller number.
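The defaults described above can be expressed as follows. This is only a sketch: the helper names are hypothetical, and the run-time figure simply applies the README's “about a minute or more per user” rule of thumb, assuming annotation is not sped up by the parallel tasks.

```python
import os

DEFAULT_USERS_TO_ANNOTATE = 90  # default value stated in the README


def resolve_parallel_tasks(value=None):
    """Return the requested number of parallel tasks, defaulting to the CPU core count."""
    if value is not None:
        return int(value)
    return os.cpu_count() or 1  # os.cpu_count() can return None


def min_annotation_minutes(users=DEFAULT_USERS_TO_ANNOTATE):
    """Rough lower bound on the extra run time: about one minute per annotated user."""
    return int(users)


print(resolve_parallel_tasks(), "parallel tasks")
print("annotation adds at least ~%d minutes" % min_annotation_minutes())
```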

Some intermediate data and the resulting user-to-topic associations are written to a MongoDB database, using the same Mongo client as the input.

  • $USER_NETWORK_PROFILE_CLASSIFIER_MONGO_DB: The name of this output database. Choose a distinctive name so that it does not interfere with the databases that hold the input data. The results are written to the collection “user_topics_collection”.
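A simple guard against the interference mentioned above is to reject an output database name that matches the input database taken from $MONGO_ASSESSMENT_ID. A minimal sketch with a hypothetical helper `results_location`; only the collection name comes from the README.

```python
RESULTS_COLLECTION = "user_topics_collection"  # collection name given in the README


def results_location(output_db, assessment_id):
    """Return (database, collection) for the results, refusing to reuse the input database."""
    input_db = assessment_id.split(".", 1)[0]
    if output_db == input_db:
        raise ValueError(
            "output database %r is the same as the input database" % output_db)
    return output_db, RESULTS_COLLECTION


print(results_location("ucl_results", "tweets_db.tweets"))
```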

The entry point script can be found in /reveal_user_classification/entry_points/, where its argument usage is described in greater detail.

Source Distribution

reveal-user-classification-0.2.8.tar.gz (23.2 kB)
