purpletag
=========

A tool to track polarized hashtags used by members of the U.S. Congress.

Install
-------

``pip install purpletag``

or, from source:

::

    git clone https://github.com/casmlab/purpletag.git
    cd purpletag
    python setup.py install

Configuration
-------------

purpletag depends on `twutil <https://github.com/tapilab/twutil>`__ for
collecting data from Twitter. Put your credentials in the following
environment variables:

- ``TW_CONSUMER_KEY``
- ``TW_CONSUMER_SECRET``
- ``TW_ACCESS_TOKEN``
- ``TW_ACCESS_TOKEN_SECRET``
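
Before starting a long collection run, it can help to confirm that all four
variables are set. The following is a minimal sketch, not part of purpletag or
twutil; the helper name ``missing_credentials`` is hypothetical.

```python
import os

# The four variable names listed above.
REQUIRED_VARS = (
    "TW_CONSUMER_KEY",
    "TW_CONSUMER_SECRET",
    "TW_ACCESS_TOKEN",
    "TW_ACCESS_TOKEN_SECRET",
)


def missing_credentials(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; purpletag does not ship it.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Missing Twitter credentials: " + ", ".join(missing))
else:
    print("All Twitter credentials are set.")
```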

purpletag also depends on a configuration file (see
`sample.cfg <sample.cfg>`__ for an example). By default, it is
assumed to be in ``~/.purpletag``, but you can specify a custom location
by setting the ``PURPLE_CFG`` environment variable.
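
The lookup order described above can be sketched as follows. This mirrors the
documented behavior only; it is an assumption about how the path is resolved,
not purpletag's actual code, and ``config_path`` is a hypothetical name.

```python
import os


def config_path(environ=None):
    """Return the config file location: PURPLE_CFG if set, else ~/.purpletag.

    Hypothetical helper that follows the rule described in this README.
    """
    if environ is None:
        environ = os.environ
    custom = environ.get("PURPLE_CFG")
    if custom:
        return custom
    return os.path.expanduser("~/.purpletag")


print(config_path())
```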

By default, all data will be written to ``/data/purpletag``, but you can
change this in the config file.

purpletag fetches the list of legislators and their Twitter handles from
http://www.govtrack.us/; these URLs are also specified in the config.

Getting started
---------------

purpletag consists of a number of command-line tools to collect, parse,
and analyze tweets sent by members of Congress.

To see the list of commands:

::

    $ purpletag -h
    usage: purpletag [--help] <command> [<args>...]

    The most commonly used purpletag commands are:
         collect    Collect tweets from members of congress, stored in json
         parse      Parse tweet json
         score      Create score files containing polarization scores for hashtags and MOCs.
         serve      Launch a web service to visualize results.
    See 'purpletag help <command>' for more information on a specific command.

The expected use case is that ``collect`` runs continuously, while
``parse``, ``score``, and ``serve`` each run once daily. There is also
support for using historical data (see the ``-s`` option of ``collect``
and the ``-d`` option of ``parse``).

``collect``
~~~~~~~~~~~

This command fetches tweets from all members of Congress listed in
``twitter.yaml``.

::

    purpletag collect -h
    usage:
        purpletag collect [options]
        purpletag collect (-t | --track | -s | --search) [options]

    Options
        -h, --help
        -o, --output <file>        output path
        -r, --refresh-handles      fetch latest twitter handles for politicians
        -t, --track                collect tweets in real time using streaming API
        -s, --search               search historical tweets using search API

There are two modes of operation:

- ``track``: Use the Twitter Streaming API to collect tweets in
  real-time.
- ``search``: Use the Twitter REST API to collect the most recent 3,200
  tweets from each legislator.

Output is stored in ``/data/purpletag/jsons``.

You probably want to use ``search`` to first collect all historical
tweets, then run ``track`` to collect all tweets going forward.
**Note:** ``search`` will take a long time to run (hours), since the
script sleeps to wait out the rate limits imposed by the REST API.

``parse``
~~~~~~~~~

This command will parse all the collected tweets in
``/data/purpletag/jsons`` and extract the hashtags used by each
legislator.

::

    purpletag parse -h
    usage: purpletag parse [options]

    Parse .json files into .tags files.

    Options
        -h, --help             help
        -t <timespans>         sliding window timespans [default: 1,7,30]
        -d <days>              number of historical days to simulate [default: 1]

The output looks like this:

::

    markwarner whistleblowers:1 studentdebt:1 nova:1 f22:1
    repwestmoreland jobs:1 nationaldayofprayer:2 benghazi:3

For example, this indicates that Lynn Westmoreland used the hashtag
#jobs once, #nationaldayofprayer twice, and #benghazi three times.
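
To read this format from your own scripts, a line can be split into a handle
and a hashtag-to-count mapping. This is a minimal sketch that assumes fields
are separated by whitespace and each entry has the form ``tag:count``, as in
the sample above; ``parse_tags_line`` is a hypothetical helper, not a
purpletag function.

```python
def parse_tags_line(line):
    """Split one non-empty .tags line into (handle, {hashtag: count}).

    Assumes whitespace-separated fields of the form tag:count.
    """
    fields = line.split()
    handle, pairs = fields[0], fields[1:]
    counts = {}
    for pair in pairs:
        # Split on the last colon so the count is always the final piece.
        tag, _, count = pair.rpartition(":")
        counts[tag] = int(count)
    return handle, counts


handle, counts = parse_tags_line(
    "repwestmoreland jobs:1 nationaldayofprayer:2 benghazi:3"
)
print(handle, counts)
```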

The ``-t`` parameter takes a list of timespans (in days) to use when
aggregating these statistics. For example, ``purpletag parse -t 30`` will
parse all tweets posted in the past 30 days and compute output like the
example above. The file name records both the run date and the timespan.
For example, ``2014-05-02.30.tags`` is a tags file created when running
this command on May 2, 2014, collecting statistics for the past 30 days.
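
The naming convention can be decoded in a few lines. This sketch assumes the
pattern ``<YYYY-MM-DD>.<days>.tags`` shown above; the exact window boundaries
purpletag uses (for example, whether the run date itself is included) are not
stated here, so the computed start date is an approximation.
``parse_tags_filename`` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def parse_tags_filename(name):
    """Return (run_date, days, approximate_window_start) for a .tags file name."""
    date_part, span_part, extension = name.split(".")
    if extension != "tags":
        raise ValueError("not a .tags file: " + name)
    run_date = datetime.strptime(date_part, "%Y-%m-%d").date()
    days = int(span_part)
    # Assumption: the window starts `days` days before the run date.
    return run_date, days, run_date - timedelta(days=days)


run_date, days, window_start = parse_tags_filename("2014-05-02.30.tags")
print(run_date, days, window_start)
```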

The ``-d`` parameter simulates running this command on each of a number
of past days. This is useful after running ``purpletag collect -s``
to collect all historical data (up to 3,200 tweets per legislator): you
can then generate tags files as if you had been running ``parse`` daily.

Output is stored in ``/data/purpletag/tags``.

``score``
~~~~~~~~~

This command scores hashtags according to their polarity.

::

    purpletag score -h
    usage: purpletag score [options]

    Compute polarity scores for all .tags files that we haven't yet processed.

    Options
        -h, --help             help
        -r, --refresh-mocs     fetch latest legislator information from GovTrack
        -c, --counts           use hashtag count features instead of binary features
        -o, --overwrite        overwrite existing .scores files

This command produces one ``.scores`` file per ``.tags`` file. For example,
``2014-05-02.365.scores`` contains the scores for the hashtags used in
the 365 days prior to May 2, 2014. The scores range from -1 (liberal) to
+1 (conservative).

::

    demandavote -0.004258
    getcovered -0.003548
    raisethewage -0.003548
    .
    .
    .
    senatemustact 0.001499
    fairnessforall 0.001799
    tcot 0.002249

Output is stored in ``/data/purpletag/scores``.
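
A scores file can be loaded and ranked with a short script. This sketch
assumes each line holds a hashtag and a score separated by whitespace, as in
the excerpt above, and skips anything else. ``read_scores`` and
``most_polarized`` are hypothetical helpers, not part of purpletag.

```python
def read_scores(lines):
    """Map hashtag -> score, skipping lines that are not 'tag score' pairs."""
    scores = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank line or elision marker
        tag, value = fields
        try:
            scores[tag] = float(value)
        except ValueError:
            continue  # second field is not a number
    return scores


def most_polarized(scores, n=1):
    """Return the n most negative and n most positive hashtags."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    liberal = [tag for tag, value in ranked[:n] if value < 0]
    conservative = [tag for tag, value in ranked[::-1][:n] if value > 0]
    return liberal, conservative


# Lines copied from the excerpt above, including one elision marker.
sample = [
    "demandavote -0.004258",
    "getcovered -0.003548",
    ".",
    "fairnessforall 0.001799",
    "tcot 0.002249",
]
liberal, conservative = most_polarized(read_scores(sample))
print(liberal, conservative)
```

To use it on real output, pass an open file object for one of the ``.scores``
files to ``read_scores``.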

``serve``
~~~~~~~~~

This command will launch a simple web server to visualize tag polarity
over time, using `dygraphs <http://dygraphs.com/>`__.

::

    purpletag serve -h
    usage: purpletag serve [options]

    Launch a web service to visualize results.

    Options
        -h, --help             help
        -n <tags>              number of tags to show from each party [default: 100]

The web data is stored in ``/data/purpletag/web``. The port is set in
the config file. With the server on port 8000, for example,
http://0.0.0.0:8000/1.html might look something like this:

.. figure:: https://raw.githubusercontent.com/casmlab/purpletag/master/docs/sample-graph.png
   :alt: sample

   sample





History
-------

0.1.3 (2014-05-22)
++++++++++++++++++
* Bug fixes in parse
* Install web data
* Enable time series selection
* Graph ranks instead of scores
* UI tweaks

0.1.1 (2014-05-15)
++++++++++++++++++

* Add UI support
* Bug fixes
* Standardize Chi-square scores
* Support historical parsing


0.1.0 (2014-05-02)
++++++++++++++++++

* First release on PyPI.
