Socio4health is a Python package for gathering and harmonizing socio-demographic data.

Project description

socio4health

Lifecycle: maturing | License: MIT

Overview

socio4health is an extraction, transformation, and loading (ETL) and AI-assisted query and visualization (AI QV) tool. It is designed to simplify the intricate process of collecting and merging data from multiple sources, with a focus on sociodemographic and census datasets from Colombia, Brazil, and Peru, into a unified relational database structure, and of visualizing or querying that data using natural language.

  • Seamlessly retrieve data from online data sources through web scraping, as well as from local files.
  • Support various data formats, including .csv, .xlsx, .xls, .txt, .sav, and compressed files.
  • Consolidate extracted data into pandas DataFrames.
  • Consolidate transformed data into a cohesive relational database.
  • Conduct precise queries and apply transformations to meet specific criteria.
  • Query data using natural language input (answers range from single values to subsets).
  • Create simple visualizations of data using natural language input.
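
To make the consolidation idea above concrete, here is a minimal sketch that uses only pandas and the standard-library sqlite3 module. It is not the package's own implementation: the table names, column names, and sample rows are hypothetical, and an in-memory SQLite database stands in for the relational output.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical stand-ins for two extracted CSV files (not real survey data).
households_csv = io.StringIO("household_id,region\n1,Andina\n2,Caribe\n")
persons_csv = io.StringIO("person_id,household_id,age\n10,1,34\n11,1,8\n12,2,51\n")

# Step 1: consolidate each extracted file into a pandas DataFrame.
households = pd.read_csv(households_csv)
persons = pd.read_csv(persons_csv)

# Step 2: load both DataFrames into one relational database (in memory here).
conn = sqlite3.connect(":memory:")
households.to_sql("households", conn, index=False)
persons.to_sql("persons", conn, index=False)

# Step 3: query across the tables by joining on the shared key.
merged = pd.read_sql_query(
    "SELECT p.person_id, p.age, h.region "
    "FROM persons AS p JOIN households AS h USING (household_id) "
    "ORDER BY p.person_id",
    conn,
)
conn.close()
print(merged)
```

Keeping each source as its own table and joining on a shared key is one common way to keep harmonized data relational; the package may organize its tables differently.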

Dependencies

  • Pandas: a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.
  • NumPy: the fundamental package for scientific computing with Python.
  • Scrapy: a framework for extracting the data you need from websites.
  • PandasAI: integrates generative artificial intelligence capabilities into pandas, making dataframes conversational.

Installation

You can install the latest released version of the package from PyPI using pip:

# Install using pip
pip install socio4health
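
After installing, it can be useful to confirm which version ended up in the active environment. The sketch below uses the standard-library importlib.metadata module; the helper name installed_version is hypothetical, and the distribution name socio4health is assumed from the file names listed on this page.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("socio4health"))
```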

How to Use it

To use the socio4health package, follow these steps:

  1. Import the package in your Python script:

    from socio4health import Harmonizer
    
  2. Create an instance of the Harmonizer class:

    harmonizer = Harmonizer()
    
  3. Extract data from online sources and create a list of data information:

    url = 'https://www.example.com'
    depth = 0
    ext = 'csv'
    list_datainfo = harmonizer.extract(url=url, depth=depth, ext=ext)
    harmonizer = Harmonizer(list_datainfo)
    
  4. Load the data from the list of data information and merge it into a relational database:

    results = harmonizer.load()
    
  5. Import the modifier module and create an instance of the Modifier class:

    from socio4health.db.modifier import Modifier
    modifier = Modifier(db_path='../../data/output/nyctibius.db')
    
  6. Inspect or modify the database, for example by listing its tables:

    tables = modifier.get_tables()
    print(tables)
    
  7. Import the querier module and create an instance of the Querier class:

    from socio4health.db.querier import Querier
    querier = Querier(db_path='data/output/socio4health.db')
    
  8. Perform queries:

    df = querier.select(table="Estructura CHC_2017").execute()
    print(df)
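
If you want to inspect the resulting database without going through Modifier or Querier, the steps above can be approximated with sqlite3 and pandas directly. This is a sketch under assumptions, not the package's implementation: it assumes the .db output is a SQLite file, the helper names list_tables and read_table are hypothetical, and the demo table's columns are made up. Only the table name is taken from the example above.

```python
import sqlite3

import pandas as pd


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all user tables in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def read_table(conn: sqlite3.Connection, table: str) -> pd.DataFrame:
    """Load a whole table into a DataFrame, quoting names that contain spaces."""
    # Double quotes delimit identifiers in SQLite; embedded quotes are doubled.
    quoted = '"' + table.replace('"', '""') + '"'
    return pd.read_sql_query(f"SELECT * FROM {quoted}", conn)


# Demo on an in-memory database with hypothetical columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Estructura CHC_2017" (id INTEGER, valor TEXT)')
conn.executemany(
    'INSERT INTO "Estructura CHC_2017" VALUES (?, ?)', [(1, "a"), (2, "b")]
)
tables = list_tables(conn)
df = read_table(conn, tables[0])
conn.close()
print(tables)
print(df)
```

To run it against a real output file, replace ":memory:" with the path to the database and skip the CREATE and INSERT statements.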
    

Resources

Package Website

The socio4health package website includes a function reference, a model outline, and case studies using the package. The site mainly concerns the release version, but you can also find documentation for the latest development version.

Organisation Website

Harmonize is an international project that develops cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America & the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru and Spain.

Organizations

BSC, Uniandes

Authors / Contact information

Diego Irreño (developer)
Erick Lozano (developer)

Download files

Source Distribution

socio4health-0.1.0.tar.gz (29.9 kB)

Uploaded Source

Built Distribution

socio4health-0.1.0-py3-none-any.whl (31.2 kB)

Uploaded Python 3

File details

Details for the file socio4health-0.1.0.tar.gz.

File metadata

  • Download URL: socio4health-0.1.0.tar.gz
  • Upload date:
  • Size: 29.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for socio4health-0.1.0.tar.gz
Algorithm Hash digest
SHA256 970b7fc385f08b310179e9047ea90e31384001eab9a4e10f50db29d994066ef7
MD5 865e19a329dfafb59e514da36cf11251
BLAKE2b-256 8dab41070877e8edd6eda517002ba88315820b8a68ca1958664df0acd800e142
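
A downloaded file can be checked against the SHA256 digest listed above with the standard-library hashlib module. The following is a minimal sketch; the helper names sha256_of and matches are hypothetical, and the demo hashes a small temporary file rather than the real archive.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Demo on a temporary file with known content.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_digest = sha256_of(tmp.name)
demo_ok = matches(tmp.name, hashlib.sha256(b"abc").hexdigest())
os.remove(tmp.name)
print(demo_digest, demo_ok)
```

For the real archive, call matches with the path to the downloaded socio4health-0.1.0.tar.gz and the SHA256 value from the table. The same check applies to the wheel with its own digest.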

File details

Details for the file socio4health-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: socio4health-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 31.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for socio4health-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0e9a174fb9dd4096661f59ed7b8d07c2d1196e3de9afc2f5ef4d383d82bcb48e
MD5 edbd380e72265e07cbfcac195dbdfa99
BLAKE2b-256 56f8983dbb892fa885c11a737fd15e83df17dd9ec50266fe5a986f6738e8f517
