Socio4health is a Python package for gathering and consolidating socio-demographic data.

Project description

socio4health

Lifecycle: maturing. License: MIT.

Overview

socio4health is an extraction, transformation, loading (ETL), and classification tool that simplifies collecting and merging data from multiple sources into a harmonized dataset. It focuses on sociodemographic and census datasets from Colombia, Brazil, and Peru.

  • Retrieve data from online sources through web scraping, as well as from local files.
  • Read a variety of data formats, including .csv, .xlsx, .xls, .txt, .sav, fixed-width files, and geospatial files.
  • Consolidate the extracted data into a pandas (or Dask) DataFrame.
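As a rough illustration of the consolidation step, the sketch below stacks two small tables into a single pandas DataFrame and records which source each row came from. It uses plain pandas rather than socio4health itself; the `consolidate` helper, the file names, and the toy values are hypothetical and only stand in for real extracted files.

```python
import io

import pandas as pd


def consolidate(frames, source_names):
    """Stack DataFrames row-wise, tagging each row with its source name."""
    tagged = [df.assign(source=name) for df, name in zip(frames, source_names)]
    # Columns missing from one source are filled with NaN in the result.
    return pd.concat(tagged, ignore_index=True, sort=False)


# Toy inputs standing in for two extracted files (hypothetical content).
csv_a = io.StringIO("region,population\nA,100\nB,50\n")
csv_b = io.StringIO("region,households\nC,30\n")

combined = consolidate(
    [pd.read_csv(csv_a), pd.read_csv(csv_b)],
    ["a.csv", "b.csv"],
)
print(combined)
```

Keeping a `source` column makes it possible to trace each harmonized row back to the file it came from.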

Dependencies

  • Dask: a flexible parallel computing library for analytics.
  • Pandas: a well-known open source data analysis and manipulation tool.
  • Geopandas: Python tools for geographic data.
  • Numpy: the fundamental package for scientific computing with Python.
  • Scrapy: a framework for extracting the data you need from websites.
  • Matplotlib: a library for creating static, animated, and interactive visualizations in Python.
  • Torch: a Python package for tensor computation and deep neural networks.

Installation

socio4health can be installed via pip from PyPI.

# Install using pip
pip install socio4health
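To confirm that the installation succeeded, one option is to query the installed version from Python's standard library. This is a minimal sketch; the `installed_version` helper is a hypothetical name, not part of socio4health.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string when socio4health is installed, otherwise None.
print(installed_version("socio4health"))
```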

How to Use it

To use the socio4health package, follow these steps:

  1. Import the package in your Python script:

    from socio4health import Extractor
    from socio4health import Harmonizer
    
  2. Create an instance of the Extractor class:

    extractor = Extractor()
    
  3. Extract data from an online source into a list of data information, then create a Harmonizer instance:

    url = 'https://www.example.com'
    depth = 0
    ext = 'csv'
    list_datainfo = extractor.s4h_extract(url=url, depth=depth, ext=ext)
    harmonizer = Harmonizer()
    

For more detailed examples and use cases, please refer to the socio4health documentation.
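To give a feel for what scraping a page for files of one type can involve, here is a small standard-library sketch that collects the links on a page whose path ends with a given extension. It is not the package's implementation, and reading `ext` as a link filter is an assumption; the `links_with_extension` helper and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_with_extension(html, base_url, ext):
    """Return absolute links found in html whose path ends with .ext."""
    collector = _LinkCollector()
    collector.feed(html)
    suffix = "." + ext.lower().lstrip(".")
    absolute = (urljoin(base_url, href) for href in collector.hrefs)
    return [url for url in absolute if urlparse(url).path.lower().endswith(suffix)]


# Hypothetical page content; a real run would download the page first.
page = '<a href="data/2020.csv">2020</a> <a href="about.html">About</a>'
print(links_with_extension(page, "https://www.example.com/", "csv"))
# prints ['https://www.example.com/data/2020.csv']
```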

Resources

Package Website

The socio4health package website includes an API reference, a user guide, and examples. It mainly covers the release version, but documentation for the latest development version is also available.

Organisation Website

Harmonize is an international project that develops cost-effective and reproducible digital tools for stakeholders in Latin America and the Caribbean (LAC) affected by a changing climate. These stakeholders include cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru, and Spain.

Organizations

BSC, Uniandes

Authors / Contact information

The following authors and contributors can be contacted with questions or feedback.

Diego Irreño (developer)
Erick Lozano (developer)
Juan Montenegro (developer)
Ingrid Mora (documentation)


Changelog

All notable changes to this project will be documented in this file.

The format is based on "Keep a Changelog" (https://keepachangelog.com/en/1.0.0/).

[Unreleased]

  • Prepare improvements and documentation updates.

[1.0.0] - 2025-10-22

Added

  • Project now includes changelog linked from packaging metadata.
  • Minor documentation updates.

Fixed

  • Packaging metadata clarified in setup.py.

[0.1.7] - 2024-06-01

Added

  • Initial public release notes placeholder.

[Unreleased]: https://github.com/harmonize-tools/socio4health/compare/v1.0.0...HEAD

[1.0.0]: https://github.com/harmonize-tools/socio4health/compare/v0.1.7...v1.0.0

[0.1.7]: https://github.com/harmonize-tools/socio4health/releases/tag/v0.1.7

Download files

Download the file for your platform.

Source Distribution

socio4health-1.0.0.tar.gz (37.4 kB)

Uploaded Source

Built Distribution

socio4health-1.0.0-py3-none-any.whl (31.3 kB)

Uploaded Python 3

File details

Details for the file socio4health-1.0.0.tar.gz.

File metadata

  • Download URL: socio4health-1.0.0.tar.gz
  • Upload date:
  • Size: 37.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for socio4health-1.0.0.tar.gz
Algorithm Hash digest
SHA256 12293bac12e0c9e17f317fd91242e7855199d86063241180a2e68032f6483471
MD5 322039e9b5bbb2e249519e93179eb153
BLAKE2b-256 5fe13d43716b297d2d9d567c2f1c2bbd8830fc00ca724463f62a0bc3142e07ef

File details

Details for the file socio4health-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: socio4health-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 31.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for socio4health-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 241942ac8aaac8e11844737ef3fc7e37c82d43e3a7ffd5e7e1dd2cdef62a59b0
MD5 0b94d2fa6e82faf8a2edbf69269fd13b
BLAKE2b-256 44e71b648998588c39b0e8e815a073932042de49e222d0efe152206b8911e06d
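A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only the standard library; the helper names are hypothetical, and the local path assumes the source distribution was saved to the current directory.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Digest published for socio4health-1.0.0.tar.gz; the path is an assumption.
archive = "socio4health-1.0.0.tar.gz"
published = "12293bac12e0c9e17f317fd91242e7855199d86063241180a2e68032f6483471"
if os.path.exists(archive):
    print(matches_published_hash(archive, published))
```

Reading in chunks keeps memory use flat regardless of file size.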
