
Media Cloud API Client Library

Project description

MediaCloud Python API Client

This is a Python client for accessing the MediaCloud API v2. It supports Python 2.7 and 3.6.


Related work:

Usage

First sign up for an API key. Then install the package:

pip install mediacloud

Check CHANGELOG.md for a detailed history of changes.

Examples

Find out how many stories from the top US online news sites mentioned both "Zimbabwe" and "president" in the last year:

import mediacloud.api
mc = mediacloud.api.MediaCloud('MY_API_KEY')
res = mc.storyCount('zimbabwe AND president AND tags_id_media:58722749', 'publish_date:[NOW-1YEAR TO NOW]')
print(res['count']) # prints the number of stories found

Get up to 2000 stories from the NYT about a topic in 2018 and dump the output as JSON:

import mediacloud.api, json, datetime
mc = mediacloud.api.MediaCloud('MY_API_KEY')

fetch_size = 500
stories = []
last_processed_stories_id = 0
while len(stories) < 2000:
    # fetch the next page, starting after the last story already seen
    fetched_stories = mc.storyList('trump AND "north korea" AND media_id:1',
                                   solr_filter=mc.dates_as_query_clause(datetime.date(2018, 1, 1), datetime.date(2019, 1, 1)),
                                   last_processed_stories_id=last_processed_stories_id, rows=fetch_size)
    stories.extend(fetched_stories)
    if len(fetched_stories) < fetch_size:
        break  # a short page means there are no more matching stories
    last_processed_stories_id = stories[-1]['processed_stories_id']
print(json.dumps(stories))
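
The paging loop above can be factored into a small reusable helper. The following is a minimal sketch, not part of the library: `fetch_stories`, `fetch_page`, and the fake data are hypothetical names introduced for illustration, and the stand-in fetcher replaces the real API call so the logic can be run offline.

```python
def fetch_stories(fetch_page, limit, page_size=500):
    """Collect up to `limit` stories by paging on processed_stories_id.

    `fetch_page(last_id, rows)` is a hypothetical callable that returns a
    list of story dicts, each carrying a 'processed_stories_id' key.
    """
    stories = []
    last_id = 0
    while len(stories) < limit:
        # never ask for more rows than are still needed
        rows = min(page_size, limit - len(stories))
        page = fetch_page(last_id, rows)
        stories.extend(page)
        if len(page) < rows:
            break  # a short page means there is nothing left to fetch
        last_id = page[-1]['processed_stories_id']
    return stories


# Stand-in for the API: 1200 fake stories with ids 1..1200.
FAKE_STORIES = [{'processed_stories_id': i} for i in range(1, 1201)]


def fake_fetch_page(last_id, rows):
    """Return at most `rows` fake stories with an id greater than last_id."""
    newer = [s for s in FAKE_STORIES if s['processed_stories_id'] > last_id]
    return newer[:rows]


result = fetch_stories(fake_fetch_page, limit=2000)
print(len(result))  # 1200: the fake source runs out before the limit
```

With the real client, `fetch_page` could presumably be a small function that wraps `mc.storyList` and forwards `last_id` and `rows` as `last_processed_stories_id` and `rows`, as in the example above.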

Find the most commonly used words in stories from the US top online news sites that mentioned "Zimbabwe" and "president" in 2013:

import mediacloud.api, datetime
mc = mediacloud.api.MediaCloud('MY_API_KEY')
words = mc.wordCount('zimbabwe AND president AND tags_id_media:58722749',
                     mc.dates_as_query_clause(datetime.date(2013, 1, 1), datetime.date(2014, 1, 1)))
print(words[0])  # prints the most common word
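
The date filters in these examples are Solr range clauses on a date field, like the `publish_date:[NOW-1YEAR TO NOW]` string in the first example. As a rough sketch of how such a clause could be assembled by hand, the hypothetical helper below (`publish_date_clause` is not a library function) formats two dates into an inclusive range. The exact string returned by the library's own `dates_as_query_clause` may differ, so treat this as an illustration only.

```python
import datetime


def publish_date_clause(start, end, field='publish_date'):
    """Build a Solr range clause covering start..end, both ends inclusive.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if end < start:
        raise ValueError('end date must not be before start date')
    stamp = '{}T00:00:00Z'  # Solr expects ISO 8601 timestamps in UTC
    return '{}:[{} TO {}]'.format(field,
                                  stamp.format(start.isoformat()),
                                  stamp.format(end.isoformat()))


clause = publish_date_clause(datetime.date(2013, 1, 1), datetime.date(2014, 1, 1))
print(clause)  # publish_date:[2013-01-01T00:00:00Z TO 2014-01-01T00:00:00Z]
```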

To find out all the details about one particular story by id:

import mediacloud.api
mc = mediacloud.api.MediaCloud('MY_API_KEY')
story = mc.story(169440976)
print(story['url'])  # prints the url the story came from

To save the first 100 stories from one day to a database:

import mediacloud.api, mediacloud.storage, datetime
mc = mediacloud.api.MediaCloud('MY_API_KEY')
db = mediacloud.storage.MongoStoryDatabase('one_day')
stories = mc.storyList('*', mc.dates_as_query_clause(datetime.date(2014, 1, 1), datetime.date(2014, 1, 2)),
                       last_processed_stories_id=0, rows=100)
for s in stories:
    db.addStory(s)  # save each story to the database
print(db.storyCount())

Take a look at the tests in the mediacloud/test/ module for more detailed examples.

Development

If you are interested in adding code to this module, first clone the GitHub repository.

Testing

Create an MC_API_KEY environment variable and set it to your API key (we use python-dotenv). Then run make test. Continuous integration (via Travis) runs the whole test suite on every push, on every PR, and nightly.
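
As an illustration of reading the key from the environment, here is a minimal sketch using only the standard library. The `get_api_key` helper is a hypothetical name, not something the package provides; if you keep the key in a `.env` file, python-dotenv's `load_dotenv()` can be called first to populate the environment.

```python
import os


def get_api_key(environ=None, name='MC_API_KEY'):
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration. `environ` defaults to os.environ
    and can be replaced with a plain dict when testing.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(name, '').strip()
    if not key:
        raise RuntimeError(
            'Set the {} environment variable to your API key'.format(name))
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(get_api_key({'MC_API_KEY': 'MY_API_KEY'}))  # MY_API_KEY
```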

Distributing a New Version

If you want to, set up twine's keyring integration to avoid typing your PyPI password over and over.

  1. Run make test to make sure all the tests pass
  2. Update the version number in mediacloud/__init__.py
  3. Make a brief note in CHANGELOG.md about what changed
  4. Run make build-release to create an install package
  5. Run make release-test to upload it to PyPI's test platform
  6. Run make release to upload it to PyPI


Download files


Source Distribution

mediacloud-3.12.2.tar.gz (16.0 kB view details)

Uploaded Source

File details

Details for the file mediacloud-3.12.2.tar.gz.

File metadata

  • Download URL: mediacloud-3.12.2.tar.gz
  • Upload date:
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.20.0 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.6.5

File hashes

Hashes for mediacloud-3.12.2.tar.gz
Algorithm    Hash digest
SHA256       ecbee3d713521db64ff5d4f6d5d63131cdd5088b852d90671e5ae9e64e3886d6
MD5          dbff64560a21f45c0c1a00d012e0483b
BLAKE2b-256  53531f40631e7fac13af81071b2d09fab86356ce3a599d57c5be5316a81ccc8e
