A tool for extracting data and reading CSV files into DataFrames

Project description

Data_Engineering Challenge

This task retrieves publicly available data and analyzes it using SQL. The data were obtained by different means and in different formats. The first dataset is World Bank data, extracted from the World Bank API in JSON format. The second is a GDP dataset, downloaded from the source as a .csv file.
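As a rough illustration of the extraction step, the sketch below flattens a World Bank-style JSON payload into rows and writes them out as CSV text. It is not the code in data_grab.py. It assumes the payload is a two-element list of paging metadata followed by the records, and the sample payload, the field names, and the helper names flatten_records and rows_to_csv are all hypothetical.

```python
import csv
import io
import json

# Hypothetical sample, assumed to be shaped like a World Bank API JSON
# response: a two-element list of [paging metadata, list of records].
SAMPLE_PAYLOAD = json.dumps([
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [
        {"id": "BRA", "name": "Brazil",
         "region": {"id": "LCN", "value": "Latin America & Caribbean"}},
        {"id": "KEN", "name": "Kenya",
         "region": {"id": "SSF", "value": "Sub-Saharan Africa"}},
    ],
])


def flatten_records(payload_text):
    """Drop the paging metadata and flatten each nested record into a flat dict."""
    _meta, records = json.loads(payload_text)
    return [
        {"id": rec["id"], "name": rec["name"], "region": rec["region"]["value"]}
        for rec in records
    ]


def rows_to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "name", "region"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_records(SAMPLE_PAYLOAD)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Keeping the flattening separate from the CSV writing means the same rows could also be dumped to a JSON file for reference, as the project description mentions.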

This GitHub repo therefore contains three .py files (modules) that handle extracting, processing, and staging the data in a PostgreSQL database, plus a .sql file containing all the analysis performed on the data.

Getting Started

  1. data_grab.py - This module extracts the World Bank data, cleans it, and saves it as both a JSON and a CSV file for reference.
  2. process_csv.py - This module reads the CSV files and processes them (joins and conversion to DataFrames) before they are staged.
  3. stage_data.py - This module handles staging. It creates the tables and saves the data to the database.
  4. To begin:
    1. Make sure all CSV files are downloaded and located in the same directory as the .py files.
    2. Run stage_data.py directly as a script.
    3. A prompt appears asking for the database password and host.
  5. Once this information is entered, the process begins and the data is extracted and staged.
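The read, stage, and join flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's stage_data.py: it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server or credentials, and the table names, column names, and CSV values are made-up placeholders.

```python
import csv
import io
import sqlite3

# Placeholder CSV contents with made-up values, standing in for the
# downloaded files. Column names are assumptions.
COUNTRY_CSV = "id,name\nBRA,Brazil\nKEN,Kenya\n"
GDP_CSV = "country_code,year,gdp_usd\nBRA,2020,100.0\nKEN,2020,50.0\n"


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def stage(conn, countries, gdp):
    """Create the staging tables and load the parsed rows into them."""
    conn.execute("CREATE TABLE country (id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE gdp (country_code TEXT, year INTEGER, gdp_usd REAL)")
    conn.executemany(
        "INSERT INTO country VALUES (?, ?)",
        [(row["id"], row["name"]) for row in countries],
    )
    conn.executemany(
        "INSERT INTO gdp VALUES (?, ?, ?)",
        # CSV values arrive as strings, so convert them explicitly.
        [(row["country_code"], int(row["year"]), float(row["gdp_usd"])) for row in gdp],
    )
    conn.commit()


# SQLite in memory is a stand-in here; the project itself targets PostgreSQL.
conn = sqlite3.connect(":memory:")
stage(conn, read_rows(COUNTRY_CSV), read_rows(GDP_CSV))

# Join the two staged tables, as an analysis query in the .sql file might.
joined = conn.execute(
    "SELECT c.name, g.year, g.gdp_usd "
    "FROM country c JOIN gdp g ON g.country_code = c.id "
    "ORDER BY c.name"
).fetchall()
print(joined)
```

Against a real PostgreSQL instance, the connection would instead be built from the password and host entered at the prompt, and the placeholder syntax would depend on the database driver in use.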

Download files

Source Distribution

M_challenge-0.0.2.tar.gz (4.3 kB view details)

Uploaded Source

File details

Details for the file M_challenge-0.0.2.tar.gz.

File metadata

  • Download URL: M_challenge-0.0.2.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9

File hashes

Hashes for M_challenge-0.0.2.tar.gz
Algorithm Hash digest
SHA256 5f5aaa2a6d6651cf8a7fce52b9b69dbf66c0fe8f571735336add304fe6c03343
MD5 d4410df79c4e2857436915d90faac258
BLAKE2b-256 7cf9f70de8b513669db844b9138d60a332d66675f948b31ddf6ecf7d0920dff5
