Python based framework to retreive Global Database of
Project description
GDELT in Python with ``gdeltPyR``
======================
``gdeltPyR`` is a Python-based framework to retrieve `[Global Database of Events, Language, and Tone (GDELT) 1.0 and 2.0 data <http://gdeltproject.org/data.html>`_ for analysis in Python Pandas or R dataframes (R dataframe output feature coming soon). A user can enter a single date, date range (two strings), or several individual dates and return a `tidy data set ready for scientific or data-driven exploration <http://vita.had.co.nz/papers/tidy-data.pdf>`_.
`gdeltPyR` retrieves `[Global Database of Events, Language, and Tone (GDELT) 1.0 and 2.0 data <http://gdeltproject.org/data.html>`_ via `[parallel HTTP GET requests <http://docs.python-requests.org/en/v0.10.6/user/advanced/#asynchronous-requests>`_ and is an alternative to `accessing GDELT data via Google BigQuery <http://gdeltproject.org/data.html#googlebigquery>`_. Therefore, the more CPUs or cores you have, the less time it takes to pull more data. Moreover, the more RAM you have, the more data you can pull. And finally, for RAM-limited workflows, create a pipeline that pulls data, writes to disc, and flushes. The only limitation with data pulls ``gdeltPyR`` is you hardware.
The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. It monitors print, broadcast, and web news media in over 100 languages from across every country in the world to keep continually updated on breaking developments anywhere on the planet. Its historical archives stretch back to January 1, 1979 and accesses the world’s breaking events and reaction in near-realtime as both the GDELT Event and Global Knowledge Graph update every 15 minutes. Visit the `GDELT website to learn more about the project <(http://gdeltproject.org/#intro)>`_.
Installation
--------------
Latest release installs from PyPi::
pip install gdelt
Latest dev version of ``gdeltPyR`` can be installed from GitHub.com::
pip install git+https://github.com/linwoodc3/gdeltPyR
.. image:: https://twistedsifter.files.wordpress.com/2015/06/people-tweeting-about-sunrises-over-a-24-hour-period.gif?w=700&h=453
:alt: GDELT can help you visualize the world's news!!! Analyze GDELT data with gdeltPyR!!
Basic Usage
--------------
.. code-block:: python
#############################
# Import gdeltPyR; instantiate
#############################
import gdelt
gd = gdelt.gdelt(version=2)
results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)
Output Type
--------------
At the moment, ``gdeltPyR`` outputs pandas dataframes with the headers for which ever table/version your set as your input parameters. Performance and speed of query return depends on your computer. The more cores you have, the quicker your queries return. As an example, the query above takes ~36 seconds on a 4 core, MacOS Sierra 10.12 with 16GB of RAM laptop. It returns 900,000 rows of data with 61 columns (events database table).
Full-on open source project with the following contributors:
* Linwood Creekmore
2016-09-25
Released 0.1
Initial check-in of gdeltPyR
2016-10-23
Working on 0.1 release, basic functionality all there
2016-10-25
Changed MANIFEST.in file; cleaned up egg.info
2016-10-30
Edited the warning strings and aligned some code with PEP8
2016-10-31
Added human readable CAMEO Codes column
2016-10-31
Bug fix; file not loading or downloading on install
2016-11-03
Added ability to pull GKG 1.0 data
2016-11-06
GDELT changed url structure for 2.0 events database.
2016-11-07
Typo on line 290 of base.py; removed and fixed bug.
2016-11-12
Major update to code; fixed Windows date range/interval freezing bug.
Added more documentation and included ability to output GDELT 1.0 or 2.0
data as R dataframe, json, or csv (csv is joined with all other dates and
headers).
======================
``gdeltPyR`` is a Python-based framework to retrieve `[Global Database of Events, Language, and Tone (GDELT) 1.0 and 2.0 data <http://gdeltproject.org/data.html>`_ for analysis in Python Pandas or R dataframes (R dataframe output feature coming soon). A user can enter a single date, date range (two strings), or several individual dates and return a `tidy data set ready for scientific or data-driven exploration <http://vita.had.co.nz/papers/tidy-data.pdf>`_.
`gdeltPyR` retrieves `[Global Database of Events, Language, and Tone (GDELT) 1.0 and 2.0 data <http://gdeltproject.org/data.html>`_ via `[parallel HTTP GET requests <http://docs.python-requests.org/en/v0.10.6/user/advanced/#asynchronous-requests>`_ and is an alternative to `accessing GDELT data via Google BigQuery <http://gdeltproject.org/data.html#googlebigquery>`_. Therefore, the more CPUs or cores you have, the less time it takes to pull more data. Moreover, the more RAM you have, the more data you can pull. And finally, for RAM-limited workflows, create a pipeline that pulls data, writes to disc, and flushes. The only limitation with data pulls ``gdeltPyR`` is you hardware.
The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. It monitors print, broadcast, and web news media in over 100 languages from across every country in the world to keep continually updated on breaking developments anywhere on the planet. Its historical archives stretch back to January 1, 1979 and accesses the world’s breaking events and reaction in near-realtime as both the GDELT Event and Global Knowledge Graph update every 15 minutes. Visit the `GDELT website to learn more about the project <(http://gdeltproject.org/#intro)>`_.
Installation
--------------
Latest release installs from PyPi::
pip install gdelt
Latest dev version of ``gdeltPyR`` can be installed from GitHub.com::
pip install git+https://github.com/linwoodc3/gdeltPyR
.. image:: https://twistedsifter.files.wordpress.com/2015/06/people-tweeting-about-sunrises-over-a-24-hour-period.gif?w=700&h=453
:alt: GDELT can help you visualize the world's news!!! Analyze GDELT data with gdeltPyR!!
Basic Usage
--------------
.. code-block:: python
#############################
# Import gdeltPyR; instantiate
#############################
import gdelt
gd = gdelt.gdelt(version=2)
results = gd.Search(['2016 10 19','2016 10 22'],table='events',coverage=True)
Output Type
--------------
At the moment, ``gdeltPyR`` outputs pandas dataframes with the headers for which ever table/version your set as your input parameters. Performance and speed of query return depends on your computer. The more cores you have, the quicker your queries return. As an example, the query above takes ~36 seconds on a 4 core, MacOS Sierra 10.12 with 16GB of RAM laptop. It returns 900,000 rows of data with 61 columns (events database table).
Full-on open source project with the following contributors:
* Linwood Creekmore
2016-09-25
Released 0.1
Initial check-in of gdeltPyR
2016-10-23
Working on 0.1 release, basic functionality all there
2016-10-25
Changed MANIFEST.in file; cleaned up egg.info
2016-10-30
Edited the warning strings and aligned some code with PEP8
2016-10-31
Added human readable CAMEO Codes column
2016-10-31
Bug fix; file not loading or downloading on install
2016-11-03
Added ability to pull GKG 1.0 data
2016-11-06
GDELT changed url structure for 2.0 events database.
2016-11-07
Typo on line 290 of base.py; removed and fixed bug.
2016-11-12
Major update to code; fixed Windows date range/interval freezing bug.
Added more documentation and included ability to output GDELT 1.0 or 2.0
data as R dataframe, json, or csv (csv is joined with all other dates and
headers).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
gdelt-0.1.9.tar.gz
(231.1 kB
view hashes)
Built Distribution
gdelt-0.1.9-py2.py3-none-any.whl
(28.2 kB
view hashes)
Close
Hashes for gdelt-0.1.9-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 42ef2eb63e8a2f42bba52e36a2f2464bf0b9caa3b6e824137177eaf8a40993f9 |
|
MD5 | eed26ebafc80ddbdbfa05e01de08874f |
|
BLAKE2b-256 | a5c3122fd73070088fee5e15575c1b65cc6b087c3727b4a31690155177a05008 |