Python Library and Command Line Utilities for Gnip Historical PowerTrack API
============================================================================


The process for launching a historical job and retrieving its data
requires only a few steps:
1) create job
2) retrieve and review job quote
3) accept or reject job
4) download data files list
5) download data

Utilities are included to assist with each step.

SETUP UTILITY
=============
First, set up your Gnip credentials. There is a simple utility to create the local credential
file named ".gnip".

$ ./setup_gnip_creds.py
Username: shendrickson@gnip.com
Password:
Password again:
Endpoint URL. Enter your Account Name (eg https://historical.gnip.com:443/accounts/<account_name>/): shendrickson
Done creating file ./.gnip
Be sure to run:
chmod og-w .gnip

$ chmod og-w .gnip
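Assuming the credential file holds simple key=value lines (a guess at the format -- see setup_gnip_creds.py for what it actually writes), the setup and chmod steps look roughly like this in Python:

```python
import os
import tempfile

def write_gnip_creds(path, username, password, url):
    """Write a local credential file and restrict its permissions.

    The key=value layout here is a hypothetical sketch; see
    setup_gnip_creds.py for the actual file format.
    """
    with open(path, "w") as f:
        f.write("username=%s\n" % username)
        f.write("password=%s\n" % password)
        f.write("url=%s\n" % url)
    # Equivalent of `chmod og-w .gnip` (also dropping group/other read here)
    os.chmod(path, 0o600)
    return path

# Example: write a .gnip file into a temporary directory
creds = write_gnip_creds(
    os.path.join(tempfile.mkdtemp(), ".gnip"),
    "shendrickson@gnip.com",
    "secret",
    "https://historical.gnip.com:443/accounts/shendrickson/",
)
```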

If you use the example JSON job description, be sure to change the "serviceUserNameField"
to your own, i.e., for Twitter, use your Twitter handle.

You will likely wish to run these utilities from other directories, so be sure to export an
updated PYTHONPATH,

$ export PYTHONPATH=${PYTHONPATH}:path-to-gnip-python-historical-utilities

CREATE JOB
==========
Create a job description by editing the example JSON file provided ("bieber_job1.json").

You will end up with a single JSON record like this (see the Gnip documentation for option
details). The fromDate and toDate are in the format YYYYmmddHHMM:

{
    "dataFormat" : "activity-streams",
    "fromDate" : "201201010000",
    "publisher" : "twitter",
    "rules" :
        [
            {
                "tag" : "bestRuleEver",
                "value" : "bieber"
            }
        ],
    "serviceUsername" : "PUT_YOUR_TWITTER_HANDLE_HERE",
    "streamType" : "track",
    "title" : "BieberJob1",
    "toDate" : "201201010001"
}
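Before submitting, the record can be sanity-checked by loading it and validating the date fields against that format. The validate_job helper below is a hypothetical sketch, not part of the utilities:

```python
import json
from datetime import datetime

def validate_job(job_json):
    """Parse a job description and check required fields and date format."""
    job = json.loads(job_json)
    for field in ("fromDate", "toDate", "publisher", "title", "rules"):
        if field not in job:
            raise ValueError("missing field: %s" % field)
    for field in ("fromDate", "toDate"):
        # Raises ValueError if the value is not YYYYmmddHHMM
        datetime.strptime(job[field], "%Y%m%d%H%M")
    return job

job = validate_job("""{
  "dataFormat": "activity-streams",
  "fromDate": "201201010000",
  "publisher": "twitter",
  "rules": [{"tag": "bestRuleEver", "value": "bieber"}],
  "serviceUsername": "PUT_YOUR_TWITTER_HANDLE_HERE",
  "streamType": "track",
  "title": "BieberJob1",
  "toDate": "201201010001"
}""")
```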

To create the job,

$ ./create_job.py -f./bieber_job1.json -t "Social Data Phenoms - Bieber"

The response is the JSON record returned by the server. It describes the job (including
the JobID and the Job URL) or gives any error messages.
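Under the hood, creating a job amounts to POSTing the JSON record to the account's jobs endpoint with basic authentication. The sketch below only builds the request without sending it; the endpoint path shown is a hypothetical example, not a documented URL:

```python
import base64
import json
import urllib.request

def build_create_job_request(jobs_url, username, password, job):
    """Build (but do not send) the POST request that submits a job."""
    body = json.dumps(job).encode("utf-8")
    req = urllib.request.Request(jobs_url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    token = base64.b64encode(("%s:%s" % (username, password)).encode()).decode()
    req.add_header("Authorization", "Basic %s" % token)
    return req

# Hypothetical endpoint path, for illustration only
req = build_create_job_request(
    "https://historical.gnip.com:443/accounts/shendrickson/jobs.json",
    "shendrickson@gnip.com",
    "secret",
    {"title": "BieberJob1", "publisher": "twitter"},
)
```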

To get help,

$ ./create_job.py -h
Usage: create_job.py [options]

Options:
-h, --help show this help message and exit
-u URL, --url=URL Job url.
-l, --prev-url Use previous Job URL (only from this configuration
file.).
-v, --verbose Detailed output.
-f FILENAME, --filename=FILENAME
File defining job (JSON)
-t TITLE, --title=TITLE
Title of project, this title supercedes title in file.


LIST JOBS, GET JOB QUOTES, AND GET JOB STATUS
=============================================
$ ./list_jobs.py -h
Usage: list_jobs.py [options]

Options:
-h, --help show this help message and exit
-u URL, --url=URL Job url.
-l, --prev-url Use previous Job URL (only from this configuration
file.).
-v, --verbose Detailed output.
-d SINCEDATESTRING, --since-date=SINCEDATESTRING
Only list jobs after date, (default
2012-01-01T00:00:00)

For example, I have three completed jobs, a Gnip job, a Bieber job and a SXSW
job for which data is available.

$ ./list_jobs.py
#########################
TITLE: GNIP2012
STATUS: finished
PROGRESS: 100.0 %
JOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/eeh2vte64.json
#########################
TITLE: Justin Bieber 2009
STATUS: finished
PROGRESS: 100.0 %
JOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/j5epx4e5c3.json
#########################
TITLE: SXSW2010-2012
STATUS: finished
PROGRESS: 100.0 %
JOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json


To see detailed information or to download the data file list,
specify the job URL with -u or add the -v flag (data_files.txt contains
only the URLs from the last job in the list).

DOWNLOAD URLS OF FILES CONTAINING DATA
======================================
To retrieve the locations of the data files this job created on S3, pass
the job URL with the -u flag (or, if you used -u for this job previously, just use -l; see help),

$ ./list_jobs.py -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json
#########################
TITLE: SXSW2010-2012
STATUS: finished
PROGRESS: 100.0 %
JOB URL: https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d.json

RESULT:
Job completed at ........ 2012-09-01 04:35:23
No. of Activities ....... -1
No. of Files ............ -1
Files size (MB) ......... -1
Data URL ................ https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historical/track/jobs/sbxff05b8d/results.json
DATA SET:
No. of URLs ............. 131,211
File size (bytes)........ 2,151,308,466
Files (URLs) ............ https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/00_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=hDSc0a%2BRQeG%2BknaSAWpzSUoM1F0%3D
https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/10_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=DOZlXKuMByv5uKgmw4QrCOpmEVw%3D
https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/20_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=X4SFTxwM2X9Y7qwgKCwG6fH8h7w%3D
https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/30_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=WVubKurX%2BAzYeZLX9UnBamSCrHg%3D
https://archive.replay.historicals.review.s3.amazonaws.com/historicals/twitter/track/activity-streams/shendrickson/2012/08/28/20100101-20120815_sbxff05b8d/2010/01/01/00/40_activities.json.gz?AWSAccessKeyId=AKIAJ7O2S22DN2NDN7UQ&Expires=1349066046&Signature=OG9ygKlXNxFvJLlAEWi3hes5yyw%3D
...

Writing files to data_files.txt...

URLs for the 131K files created on S3 by the job have been written to a file in
the local directory, ./data_files.txt.

DOWNLOAD DATA
=============

To retrieve this data, use the utility,

$ ./get_data_files.bash
...

This will launch up to 8 simultaneous cURL connections to S3 to download the files
into a local ./data/year/month/day/hour... directory tree (see name_mangle.py for details).
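The URL-to-path mapping can be sketched as follows; this is a guess at what name_mangle.py does, based on the URLs shown above, not its actual implementation:

```python
from urllib.parse import urlparse

def local_path(url, root="./data"):
    """Map an S3 data-file URL onto a local year/month/day/hour path.

    Assumes the last five path components are year/month/day/hour/file,
    as in the URLs listed by list_jobs.py.
    """
    parts = urlparse(url).path.split("/")
    year, month, day, hour, name = parts[-5:]
    return "/".join([root, year, month, day, hour, name])

# One of the S3 URLs from data_files.txt (credentials elided)
url = ("https://archive.replay.historicals.review.s3.amazonaws.com/"
       "historicals/twitter/track/activity-streams/shendrickson/2012/08/28/"
       "20100101-20120815_sbxff05b8d/2010/01/01/00/00_activities.json.gz"
       "?AWSAccessKeyId=XXX&Expires=1349066046&Signature=YYY")
```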

ACCEPT/REJECT JOB
=================
After a job is quoted, you can accept or reject the job. The job will not start until it is accepted.

$ ./accept_job -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historicals/track/jobs/c9pe0day6h.json

or

$ ./reject_job -u https://historical.gnip.com:443/accounts/shendrickson/publishers/twitter/historicals/track/jobs/c9pe0day6h.json
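Either decision is a single authenticated request to the job URL. The sketch below only builds that request without sending it; the PUT method and the {"status": ...} payload are assumptions about the API, not confirmed details -- see accept_job/reject_job for the calls the utilities actually make:

```python
import json
import urllib.request

def build_decision_request(job_url, decision):
    """Build (but do not send) the request that accepts or rejects a quoted job.

    The PUT method and {"status": ...} body are assumptions about the
    Gnip Historical API, used here for illustration only.
    """
    assert decision in ("accept", "reject")
    body = json.dumps({"status": decision}).encode("utf-8")
    req = urllib.request.Request(job_url, data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    return req

req = build_decision_request(
    "https://historical.gnip.com:443/accounts/shendrickson/"
    "publishers/twitter/historicals/track/jobs/c9pe0day6h.json",
    "accept",
)
```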

The module gnip_historical.py provides additional functionality you can access programmatically.

==
Gnip-Python-Historical-Utilities by Scott Hendrickson is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/