
Socio4health is a Python package for gathering and harmonizing socio-demographic data.


socio4health

Lifecycle: maturing · License: MIT

Overview

The socio4health package is an extraction, transformation and loading (ETL) and AI-assisted query and visualization (AI QV) tool. It is designed to simplify the complex process of collecting data from multiple sources, with a focus on socio-demographic and census datasets from Colombia, Brazil and Peru, and merging it into a unified relational database structure that can then be queried and visualized using natural language.

  • Seamlessly retrieve data from online sources through web scraping, as well as from local files.
  • Support various data formats, including .csv, .xlsx, .xls, .txt, .sav, and compressed files, ensuring versatility in sourcing information.
  • Consolidate extracted data into pandas DataFrames.
  • Consolidate transformed data into a cohesive relational database.
  • Run precise queries and apply transformations to meet specific criteria.
  • Query data using natural language input (answers range from single values to subsets).
  • Create simple visualizations of data from natural language input.
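As a rough illustration of the harmonize-then-load idea described above, the sketch below uses plain pandas and SQLite rather than the socio4health API. The two tiny survey extracts, their column names (`edad`, `idade`, `sexo`) and the table name `persons` are hypothetical, chosen only to show how differently named variables from two countries can be mapped onto one schema, stored as a relational table and queried.

```python
import sqlite3

import pandas as pd

# Hypothetical extracts from two national surveys that name the same variables differently.
colombia = pd.DataFrame({"edad": [34, 27], "sexo": ["F", "M"]})
brazil = pd.DataFrame({"idade": [41, 19], "sexo": ["M", "F"]})

# Map each source's column names onto one shared (hypothetical) schema and tag the origin.
harmonized = pd.concat(
    [
        colombia.rename(columns={"edad": "age", "sexo": "sex"}).assign(country="COL"),
        brazil.rename(columns={"idade": "age", "sexo": "sex"}).assign(country="BRA"),
    ],
    ignore_index=True,
)

# Load the unified DataFrame into an in-memory SQLite database as a single table.
conn = sqlite3.connect(":memory:")
harmonized.to_sql("persons", conn, index=False)

# Run a precise query against the relational table.
adults_over_30 = pd.read_sql_query(
    "SELECT country, age FROM persons WHERE age > 30 ORDER BY age", conn
)
print(adults_over_30)
```

In socio4health itself these steps are wrapped by the `Harmonizer`, `Modifier` and `Querier` classes shown in How to Use it; the sketch only makes the underlying data flow visible.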

Dependencies

Pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.
NumPy
The fundamental package for scientific computing with Python.
Scrapy
Framework for extracting the data you need from websites.
PandasAI
Integrates generative artificial intelligence capabilities into pandas, making dataframes conversational.

Installation

You can install the latest released version of the package from PyPI using pip:

# Install using pip
pip install socio4health
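To confirm which version pip actually installed, you can use the standard library's `importlib.metadata`. This is a generic Python check, not part of socio4health, and it assumes the distribution name `socio4health` used in the download files listed on this page; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "socio4health") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version())
```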

How to use it

To use the socio4health package, follow these steps:

  1. Import the package in your Python script:

    from socio4health import Harmonizer
    
  2. Create an instance of the Harmonizer class:

    harmonizer = Harmonizer()
    
  3. Extract data from online sources and create a list of data information:

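    url = 'https://www.example.com'  # page to search for data files
    depth = 0                        # crawl depth
    ext = 'csv'                      # extension of the files to collect
    list_datainfo = harmonizer.extract(url=url, depth=depth, ext=ext)
    # Create a new Harmonizer that holds the extracted data information
    harmonizer = Harmonizer(list_datainfo)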
    
  4. Load the data from the list of data information and merge it into a relational database:

    results = harmonizer.load()
    
  5. Import the modifier module and create an instance of the Modifier class:

    from socio4health.db.modifier import Modifier
    # db_path must point to the database produced by harmonizer.load()
    modifier = Modifier(db_path='data/output/socio4health.db')
    
  6. Perform modifications (for example, start by listing the tables in the database):

    tables = modifier.get_tables()
    print(tables)
    
  7. Import the querier module and create an instance of the Querier class:

    from socio4health.db.querier import Querier
    querier = Querier(db_path='data/output/socio4health.db')
    
  8. Perform queries:

    df = querier.select(table="Estructura CHC_2017").execute()
    print(df)
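Step 3 does not spell out what `depth` and `ext` do. One plausible reading, which is an assumption not confirmed by this page, is that `depth=0` inspects only the page at `url` and `ext='csv'` keeps links that point to `.csv` files. The sketch below illustrates that single-page link filtering with the standard library only; `LinkCollector` and `files_with_extension` are hypothetical helpers, not part of socio4health, and the real `extract` may behave differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def files_with_extension(html: str, base_url: str, ext: str) -> list[str]:
    """Return absolute URLs of links on one page whose path ends in .<ext>."""
    parser = LinkCollector()
    parser.feed(html)
    suffix = "." + ext.lower().lstrip(".")
    return [
        urljoin(base_url, link)
        for link in parser.links
        # Ignore any query string and compare the extension case-insensitively.
        if link.lower().split("?")[0].endswith(suffix)
    ]


page = '<a href="data/survey_2017.csv">CSV</a> <a href="about.html">About</a>'
print(files_with_extension(page, "https://www.example.com/", "csv"))
```

A depth greater than zero would, under the same assumption, repeat this on the HTML pages linked from the starting page; the sketch deliberately stops at one level.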
    
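Steps 5 to 8 work on a database file passed as `db_path`. Assuming that file is an SQLite database (suggested by the `.db` paths, but not stated on this page), it can also be inspected directly with Python's built-in `sqlite3` module, which is handy when debugging. The helpers `list_tables` and `select_all` and the `region` column are hypothetical. Note that a table name containing a space, such as `Estructura CHC_2017`, must be double-quoted in SQL.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def select_all(conn: sqlite3.Connection, table: str) -> list[tuple]:
    """Select every row from a table, quoting its name so spaces are allowed."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT * FROM {quoted}").fetchall()


# In-memory stand-in for the real database; replace ":memory:" with your db_path.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Estructura CHC_2017" (id INTEGER, region TEXT)')
conn.execute('INSERT INTO "Estructura CHC_2017" VALUES (?, ?)', (1, "Andina"))

print(list_tables(conn))
print(select_all(conn, "Estructura CHC_2017"))
```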

Resources

Package Website

The socio4health package website includes a function reference, a model outline, and case studies using the package. The site mainly documents the release version, but you can also find documentation for the latest development version.

Organisation Website

Harmonize is an international project that develops cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America and the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru and Spain.

Organizations

BSC (Barcelona Supercomputing Center), Uniandes (Universidad de los Andes)

Authors / Contact information


Diego Irreño (developer)
Erick Lozano (developer)


Download files


Source Distribution

socio4health-0.1.1.tar.gz (30.7 kB)

Built Distribution

socio4health-0.1.1-py3-none-any.whl (32.4 kB)

File details

Details for the file socio4health-0.1.1.tar.gz.

File metadata

  • Download URL: socio4health-0.1.1.tar.gz
  • Size: 30.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for socio4health-0.1.1.tar.gz
Algorithm Hash digest
SHA256 ab8853d324bdd6885a2c48676f6159be6c400ab0478c8de4a1263d891e5fa09d
MD5 0618d332e44bc7f3a28245762af83132
BLAKE2b-256 864e89e33d2b394e56bf59ac935796786464cedebe1d6db5cc28388ec3b42ec8


File details

Details for the file socio4health-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: socio4health-0.1.1-py3-none-any.whl
  • Size: 32.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for socio4health-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 369ce991ef70b3aa91f09820172cbcf040141ec435775ae7411f3dab9b2336fd
MD5 cd2557785c4192769262bf435489d689
BLAKE2b-256 9df6f9f9f68e95107824b515dca5575d6541bda33bda05ab187c6e9027f30f47
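The digests above let you check that a downloaded file was not corrupted or altered. A minimal sketch using the standard library's `hashlib` follows; the local filename assumes you saved the wheel under its published name in the current directory, and `sha256_of` and `matches_published_hash` are hypothetical helper names. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_sha256.strip().lower())


# Published SHA256 digest of the wheel, copied from the hash table for that file.
WHEEL_SHA256 = "369ce991ef70b3aa91f09820172cbcf040141ec435775ae7411f3dab9b2336fd"

wheel = Path("socio4health-0.1.1-py3-none-any.whl")
if wheel.exists():
    print("hash OK" if matches_published_hash(wheel, WHEEL_SHA256) else "hash MISMATCH")
```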
