Socio4health is a Python package for gathering and consolidating socio-demographic data.

Project description

socio4health

Lifecycle: maturing. License: MIT.

Overview

socio4health is an extraction, transformation, and loading (ETL) and AI-assisted query and visualization (AI QV) tool. It simplifies the process of collecting and merging data from multiple sources, focusing on sociodemographic and census datasets from Colombia, Brazil, and Peru, into a unified relational database structure that can then be visualized or queried using natural language.

  • Seamlessly retrieve data from online data sources through web scraping, as well as from local files.
  • Support for various data formats, including .csv, .xlsx, .xls, .txt, .sav, and compressed files, ensuring versatility in sourcing information.
  • Consolidate extracted data into pandas DataFrames.
  • Consolidate transformed data into a cohesive relational database.
  • Conduct precise queries and apply transformations to meet specific criteria.
  • Query data using natural language input (answers range from single values to data subsets).
  • Create simple visualizations of data using natural language input.
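
As a rough illustration of the format handling and DataFrame consolidation listed above, the sketch below uses pandas directly rather than socio4health's own API. The helper names `read_any` and `consolidate` are hypothetical, the `source_file` column is an illustrative addition, and treating `.txt` files as tab-separated is an assumption.

```python
from pathlib import Path

import pandas as pd


def read_any(path):
    """Read one tabular file into a DataFrame, choosing a reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".txt":
        # Assumption: .txt files are tab-separated.
        return pd.read_csv(path, sep="\t")
    if suffix in (".xlsx", ".xls"):
        # Requires an Excel engine such as openpyxl (.xlsx) or xlrd (.xls).
        return pd.read_excel(path)
    if suffix == ".sav":
        # Requires the optional pyreadstat dependency.
        return pd.read_spss(path)
    raise ValueError(f"Unsupported file extension: {suffix}")


def consolidate(paths):
    """Stack several files into one DataFrame, recording each row's source file."""
    frames = []
    for path in paths:
        frame = read_any(path)
        frame["source_file"] = Path(path).name
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

Recording the source file name keeps each row traceable after stacking, which tends to matter when survey files from different years or countries share column names.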

Dependencies

Pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool.
NumPy
The fundamental package for scientific computing with Python.
Scrapy
Framework for extracting the data you need from websites.
PandasAI
Integrates generative artificial intelligence capabilities into pandas, making dataframes conversational.

Installation

You can install the latest release of the package from PyPI using pip:

# Install using pip
pip install socio4health
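
To confirm that the installation succeeded, one option is to query the installed distribution metadata with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version("socio4health"))
```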

How to Use it

To use the socio4health package, follow these steps:

  1. Import the package in your Python script:

    from socio4health import Harmonizer
    
  2. Create an instance of the Harmonizer class:

    harmonizer = Harmonizer()
    
  3. Extract data from online sources and create a list of data information:

    url = 'https://www.example.com'
    depth = 0
    ext = 'csv'
    list_datainfo = harmonizer.extract(url=url, depth=depth, ext=ext)
    harmonizer = Harmonizer(list_datainfo)
    
  4. Load the data from the list of data information and merge it into a relational database:

    results = harmonizer.load()
    
  5. Import the modifier module and create an instance of the Modifier class:

    from socio4health.db.modifier import Modifier
    modifier = Modifier(db_path='../../data/output/nyctibius.db')
    
  6. Perform modifications:

    tables = modifier.get_tables()
    print(tables)
    
  7. Import the querier module and create an instance of the Querier class:

    from socio4health.db.querier import Querier
    querier = Querier(db_path='data/output/socio4health.db')
    
  8. Perform queries:

    df = querier.select(table="Estructura CHC_2017").execute()
    print(df)
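
The loaded database can also be inspected independently of the package. The sketch below assumes the `.db` output file is a SQLite database, which the steps above do not confirm; it uses only the standard `sqlite3` module and pandas, and the helper names `list_tables` and `read_table` are hypothetical.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def list_tables(db_path):
    """Return the names of all tables in a SQLite database file, sorted by name."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def read_table(db_path, table):
    """Load a whole table into a DataFrame, quoting the table name safely."""
    # Double quotes allow names containing spaces; embedded quotes are doubled.
    quoted = '"' + table.replace('"', '""') + '"'
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query(f"SELECT * FROM {quoted}", conn)
```

Under that assumption, `read_table('data/output/socio4health.db', 'Estructura CHC_2017')` would return the same table queried in step 8; the identifier quoting matters here because the table name contains a space.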
    

Resources

Package Website

The socio4health package website includes a function reference, a model outline, and case studies using the package. The site mainly covers the release version, but you can also find documentation for the latest development version.

Organisation Website

Harmonize is an international project that develops cost-effective and reproducible digital tools for stakeholders in hotspots affected by a changing climate in Latin America & the Caribbean (LAC), including cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru and Spain.

Organizations

BSC, Uniandes

Authors / Contact information

Diego Irreño (developer)
Erick Lozano (developer)
