unesco_reader

Pythonic access to UNESCO data

unesco_reader is a Python package that provides a simple interface to UNESCO Institute for Statistics (UIS) data. UIS does not currently offer API access to its data: users must download zipped files and extract the data themselves, a process with several manual steps explained in the UIS Python tutorial. This package handles those steps and returns the data as structured, formatted pandas DataFrames that are ready to explore and analyze. It also lets users view dataset documentation and other information, such as the date of the last update, and retrieve information about all datasets available from UIS.

Note:

UIS data is expected to become accessible through the DataCommons API in the future, at which point the API should be the preferred way to access the data. Future versions of this package may add support for the API, or the package may be deprecated and remain available as a legacy package.

This package is designed to scrape data from the UIS website. As a result, it may break if the website structure or data file formats change without notice. Please report any unexpected errors or issues you encounter. All feedback, suggestions, and contributions are welcome!

Installation

$ pip install unesco-reader
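
To confirm that the installation succeeded, the installed version can be looked up with the Python standard library. This is a minimal sketch and not part of unesco_reader; the helper name installed_version is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it only wraps the standard
    library lookup and does not import unesco_reader itself.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("unesco-reader"))
```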

Usage

Importing the package

import unesco_reader as uis

Retrieve information about all the available datasets from UIS.

uis.info()

This function will display all available datasets and relevant information about them.

>>>
name                                                               latest_update    theme
-----------------------------------------------------------------  ---------------  ---------
SDG Global and Thematic Indicators                                 February 2024    Education
Other Policy Relevant Indicators (OPRI)                            February 2024    Education
Research and Development (R&D) SDG 9.5                             February 2024    Science
Research and Development (R&D) – Other Policy Relevant Indicators  February 2024    Science
...

Retrieve a list of all available datasets from UIS.

uis.available_datasets()
>>> ['SDG Global and Thematic Indicators',
     'Other Policy Relevant Indicators (OPRI)',
     'Research and Development (R&D) SDG 9.5',
     ...]

Optionally, you can specify a theme to filter the datasets.

uis.available_datasets(theme='Education')

To access data for a particular dataset, use the UIS class, passing the name of the dataset. A UIS object lets you access, explore, and analyze the data. On instantiation, the data is extracted from the UIS website or, if it has already been extracted, read from the cache (see Caching below).

from unesco_reader import UIS

sdg = UIS("SDG Global and Thematic Indicators")

Basic information about the dataset can be accessed using the info method.

sdg.info()

This will display information about the dataset, such as its name, latest update, and theme.

>>>
-------------  ----------------------------------
name           SDG Global and Thematic Indicators
latest update  February 2024
theme          Education
-------------  ----------------------------------

Information is also accessible through the attributes of the object.

name = sdg.name
update = sdg.latest_update
theme = sdg.theme
documentation = sdg.readme

The readme attribute contains the dataset documentation. To display the documentation, use the display_readme method.

sdg.display_readme()

Various methods exist to access the data. To access country data:

df = sdg.get_country_data()

This returns a pandas DataFrame with the country data in a structured, predictable format. By default, the DataFrame does not contain metadata; to include it, set the include_metadata parameter to True. Countries can also be filtered to a specific region by passing the region's ID in the region parameter. To see the available regions, use the get_regions method.

df = sdg.get_country_data(include_metadata=True, region='WB: World')

To access regional data:

df = sdg.get_region_data()

This returns a pandas DataFrame with the regional data in a structured, predictable format. By default, the DataFrame does not contain metadata; to include it, set the include_metadata parameter to True. Note that not all datasets contain regional data. If the dataset does not contain regional data, an error is raised. The same applies to any other data that is not available for a particular dataset.
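
Because a missing table raises an error, scripts that loop over several datasets may want to handle that case explicitly. The sketch below is one possible approach and not part of unesco_reader: fetch_or_none is a hypothetical helper, and since the exact exception type is not stated here, it catches Exception broadly. The stand-in function only simulates a dataset without regional data.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_or_none(fetch: Callable[[], T]) -> Optional[T]:
    """Call a data-access method and return None if it raises.

    Hypothetical helper: the exception type raised by the package is an
    unknown here, so Exception is caught broadly and the message printed.
    """
    try:
        return fetch()
    except Exception as exc:
        print(f"Data not available: {exc}")
        return None


def _no_regional_data():
    # Stand-in that simulates a dataset without regional data.
    raise ValueError("no regional data in this dataset")


missing = fetch_or_none(_no_regional_data)  # prints the message, returns None
present = fetch_or_none(lambda: ["row 1", "row 2"])  # returns the list unchanged
```

With a real dataset object, the call would look like fetch_or_none(sdg.get_region_data), passing the method itself rather than its result.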

Metadata, available countries, available regions, and variables are also accessible through methods of the UIS object.

metadata_df = sdg.get_metadata()
countries_df = sdg.get_countries()
regions_df = sdg.get_regions()
variables_df = sdg.get_variables()

To refresh the data and extract the latest data from the UIS website, use the refresh method.

sdg.refresh()

Caching

Caching is used to prevent unnecessary requests to the UIS website and to improve performance. The cache uses an LRU (Least Recently Used) strategy and stores data in RAM, so it is cleared when the program terminates. To refresh the data returned by functions, use the refresh parameter.

uis.info(refresh=True)
uis.available_datasets(refresh=True)

Setting refresh=True clears the cache and forces the data and information to be extracted again from the UIS website.

For the UIS class, the refresh method will clear the cache and extract the latest data from the UIS website.

sdg.refresh()

To clear all cached data, use the clear_all_caches method.

uis.clear_all_caches()
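
The behavior described in this section can be illustrated with the standard library's functools.lru_cache. This is only a sketch of the general technique, not the package's actual implementation: the names _fetch, get_data, and request_count are assumptions, and the simulated request stands in for a real download from the UIS website.

```python
from functools import lru_cache

request_count = 0  # counts simulated requests to the website


@lru_cache(maxsize=32)
def _fetch(name: str) -> str:
    """Simulate an expensive download; results are kept in an in-memory LRU cache."""
    global request_count
    request_count += 1
    return f"data for {name} (request #{request_count})"


def get_data(name: str, refresh: bool = False) -> str:
    """Return cached data, or clear the cache first when refresh is True."""
    if refresh:
        _fetch.cache_clear()
    return _fetch(name)


first = get_data("SDG Global and Thematic Indicators")
second = get_data("SDG Global and Thematic Indicators")  # served from the cache
third = get_data("SDG Global and Thematic Indicators", refresh=True)  # fetched again

print(request_count)  # 2: only the first and the refreshed call hit the "website"
```

Since the cache lives in memory, it starts empty each time the program runs, which matches the note above that cached data is cleared on termination.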

Contributing

All contributions are welcome! If you find a bug, or have a suggestion for a new feature or an improvement to the documentation, please open an issue. Since this project is under active development, please check the open issues first to make sure the issue has not already been raised.

A detailed overview of the contribution process can be found here. By contributing to this project, you agree to abide by its terms.

License

unesco_reader was created by Luca Picci. It is licensed under the terms of the MIT license.

Credits

unesco_reader was created with cookiecutter and the py-pkgs-cookiecutter template.
