
Socio4health is a Python package for gathering and consolidating socio-demographic data.

Project description

socio4health

Lifecycle: maturing | License: MIT

Overview

The socio4health package is an extraction, transformation and loading (ETL) and AI-assisted query and visualization (AI QV) tool. It simplifies the intricate process of collecting data from multiple sources, with a focus on socio-demographic and census datasets from Colombia, Brazil and Peru, merging that data into a unified relational database structure, and then querying or visualizing it using natural language.

  • Seamlessly retrieve data from online sources through web scraping, as well as from local files.
  • Support various data formats, including .csv, .xlsx, .xls, .txt, .sav, and compressed files, for versatility in sourcing information.
  • Consolidate extracted data into pandas DataFrames.
  • Consolidate transformed data into a cohesive relational database.
  • Conduct precise queries and apply transformations to meet specific criteria.
  • Query data using natural language input (answers range from single values to data subsets).
  • Create simple visualizations of data using natural language input.
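
Retrieving data files through web scraping, as in the first bullet, essentially means finding links to files of the wanted type on a page. The sketch below illustrates that idea with the standard library's `html.parser`, roughly what `depth = 0` with `ext = 'csv'` in the usage steps suggests. It is an illustration under assumptions, not socio4health's implementation (which presumably builds on Scrapy); `LinkCollector` and `find_file_links` are hypothetical names, and the page HTML is passed in rather than downloaded.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> tags whose path ends with one extension."""

    def __init__(self, base_url: str, ext: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.suffix = "." + ext.lower().lstrip(".")  # accepts "csv" or ".csv"
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL
                absolute = urljoin(self.base_url, value)
                # Match on the path only, ignoring query strings and fragments
                path = urlsplit(absolute).path.lower()
                if path.endswith(self.suffix) and absolute not in self.links:
                    self.links.append(absolute)


def find_file_links(html: str, base_url: str, ext: str) -> list[str]:
    """Links to `ext` files found on a single page (no link following, i.e. depth 0)."""
    collector = LinkCollector(base_url, ext)
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, `find_file_links(html, 'https://www.example.com/', 'csv')` returns absolute URLs for every linked `.csv` file, in page order and without duplicates. Going deeper than depth 0 would mean fetching and parsing each linked page in turn.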

Dependencies

Pandas
Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool.
NumPy
The fundamental package for scientific computing with Python.
Scrapy
Framework for extracting the data you need from websites.
PandasAI
Integrates generative artificial intelligence capabilities into pandas, making DataFrames conversational.

Installation

You can install the latest release of the package from PyPI using pip:

# Install using pip
pip install socio4health

How to Use it

To use the socio4health package, follow these steps:

  1. Import the package in your Python script:

    from socio4health import Harmonizer
    
  2. Create an instance of the Harmonizer class:

    harmonizer = Harmonizer()
    
  3. Extract data from online sources and create a list of data information:

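    # Page to scrape, how deep to follow links from it, and the file extension to collect
    url = 'https://www.example.com'
    depth = 0
    ext = 'csv'
    list_datainfo = harmonizer.extract(url=url, depth=depth, ext=ext)
    # Create a new Harmonizer that holds the extracted data information
    harmonizer = Harmonizer(list_datainfo)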
    
  4. Load the data from the list of data information and merge it into a relational database:

    results = harmonizer.load()
    
  5. Import the modifier module and create an instance of the Modifier class:

    from socio4health.db.modifier import Modifier
    # Database file written by the load step; this path is an example, adjust it to your setup
    modifier = Modifier(db_path='data/output/socio4health.db')
    
  6. Perform modifications (the example below starts by listing the tables in the database):

    tables = modifier.get_tables()
    print(tables)
    
  7. Import the querier module and create an instance of the Querier class:

    from socio4health.db.querier import Querier
    querier = Querier(db_path='data/output/socio4health.db')
    
  8. Perform queries:

    df = querier.select(table="Estructura CHC_2017").execute()
    print(df)
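
Steps 4 to 8 load the extracted data into a relational database, inspect its tables and query one of them. The package internals are not described here, so the following is only a standard-library sketch of that load, list and select flow, assuming a SQLite database (suggested by the `.db` extension but not stated above). `quote_ident`, `load_tables`, `list_tables` and `select_all` are hypothetical helpers, not part of socio4health, and every column is simplified to TEXT.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so names with spaces (e.g. "Estructura CHC_2017") are valid."""
    return '"' + name.replace('"', '""') + '"'


def load_tables(conn: sqlite3.Connection, tables: dict[str, list[dict]]) -> None:
    """Create one table per source and insert its rows; every column is stored as TEXT."""
    for name, rows in tables.items():
        if not rows:
            continue  # nothing to infer columns from
        # Union of keys across rows, so records with differing columns share one table
        columns: list[str] = []
        for row in rows:
            for key in row:
                if key not in columns:
                    columns.append(key)
        col_defs = ", ".join(f"{quote_ident(c)} TEXT" for c in columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_ident(name)} ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO {quote_ident(name)} VALUES ({placeholders})",
            # Keys missing from a row become NULL
            [tuple(row.get(c) for c in columns) for row in rows],
        )
    conn.commit()


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Names of all user tables, alphabetically."""
    cur = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in cur.fetchall()]


def select_all(conn: sqlite3.Connection, table: str) -> list[tuple]:
    """Every row of `table`, in insertion order."""
    return conn.execute(f"SELECT * FROM {quote_ident(table)} ORDER BY rowid").fetchall()
```

Quoting the identifier is what makes a table name containing a space, such as the one queried in step 8, usable in plain SQL; `select_all` only loosely mirrors what the `querier.select(...)` call appears to do.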
    

Resources

Package Website

The socio4health package website includes a function reference, a model outline, and case studies using the package. The site mainly covers the release version, but documentation for the latest development version is also available.

Organisation Website

Harmonize is an international project that develops cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America and the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru and Spain.

Organizations

Barcelona Supercomputing Center (BSC), Universidad de los Andes (Uniandes)

Authors / Contact information

Diego Irreño (developer)
Erick Lozano (developer)


Download files

Source Distribution

socio4health-0.1.5.tar.gz (35.1 kB)

Uploaded Source

Built Distribution

socio4health-0.1.5-py3-none-any.whl (37.5 kB)

Uploaded Python 3

File details

Details for the file socio4health-0.1.5.tar.gz.

File metadata

  • Download URL: socio4health-0.1.5.tar.gz
  • Upload date:
  • Size: 35.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for socio4health-0.1.5.tar.gz
Algorithm Hash digest
SHA256 1519a7c983234092fb310c81b9cbccdb72fff086b3b2441650aa8d23e960ee4f
MD5 8a44f0f5b6571c9a2a8fa8edd766b2ee
BLAKE2b-256 3a4d2de44c692084c39b144d4ff61bc4732b6652e2c00f9bfdc12abdb4bcdbf5
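
The published digests let you check that a downloaded file is intact. A minimal sketch using Python's standard `hashlib` module follows; `sha256_of_file` and `verify_sha256` are illustrative names, and the downloaded file is assumed to be in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """True if the file's SHA-256 digest matches the published one (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

For an untampered download, `verify_sha256("socio4health-0.1.5.tar.gz", "1519a7c983234092fb310c81b9cbccdb72fff086b3b2441650aa8d23e960ee4f")` should return True. pip can also enforce this itself through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).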

File details

Details for the file socio4health-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: socio4health-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 37.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for socio4health-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 e7c1f984b4afd2b346930e7639afeb64b1adcdaeb85bff3260795e8c9c0d52ac
MD5 46a1f0f124c9a0751ce67344a622e3dc
BLAKE2b-256 1282ecd2648db897db2e55ede4a8b6f416a0eb64f14e0de2cd20145e3a3abaf4
