A tool for extracting data and reading CSV files into dataframes

Project description

Data_Engineering Challenge

This is a task to retrieve publicly available data and analyze that data using SQL. The data were obtained by different means and in different formats. The first dataset is World Bank data, extracted from the World Bank API in JSON format. The second dataset is GDP data, downloaded from the source as a .csv file.
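As a rough illustration of the extraction step, the sketch below flattens a World Bank API style JSON response (a two-element list: paging metadata, then the observations) into rows and writes them as CSV. It is not the code from data_grab.py. The sample payload, the country "Exampleland", the function names `flatten_records` and `rows_to_csv`, the output column names, and the choice to drop observations without a value are all assumptions made for the example.

```python
import csv
import io
import json

# Made-up sample shaped like a World Bank API JSON response:
# element 0 is paging metadata, element 1 is the list of observations.
SAMPLE_RESPONSE = json.dumps([
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [
        {"indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
         "country": {"id": "XX", "value": "Exampleland"},
         "countryiso3code": "XXX", "date": "2020", "value": 100.0},
        {"indicator": {"id": "NY.GDP.MKTP.CD", "value": "GDP (current US$)"},
         "country": {"id": "XX", "value": "Exampleland"},
         "countryiso3code": "XXX", "date": "2019", "value": None},
    ],
])

FIELDS = ["country_code", "country", "indicator", "year", "value"]


def flatten_records(payload):
    """Turn the nested JSON payload into a list of flat dicts."""
    _meta, records = json.loads(payload)
    rows = []
    for rec in records or []:
        # Assumed cleaning rule: skip observations with no value.
        if rec.get("value") is None:
            continue
        rows.append({
            "country_code": rec["countryiso3code"],
            "country": rec["country"]["value"],
            "indicator": rec["indicator"]["id"],
            "year": int(rec["date"]),
            "value": rec["value"],
        })
    return rows


def rows_to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = flatten_records(SAMPLE_RESPONSE)
csv_text = rows_to_csv(rows)
```

In a real run the payload would come from an HTTP request to the API rather than from an inline string, and the CSV text would be written to a file next to the JSON copy.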

This GitHub repo contains three .py files (modules) that handle extracting, processing, and staging the data in a PostgreSQL database, plus a .sql file that contains all the analysis done on the data.

Getting Started

  1. data_grab.py - This module extracts the World Bank data, cleans it, and saves it as both a JSON file and a CSV file for reference.
  2. process_csv.py - This module reads the CSV files and processes them (joins them and converts them to a dataframe) before they are staged.
  3. stage_data.py - This module handles the staging of the data. It creates the tables and saves the data to the database.
  4. To begin:
    1. Make sure all CSV files are downloaded and located in the same directory as the .py files.
    2. Run stage_data.py directly as a script.
    3. A prompt appears asking for the database password and host.
  5. Once the information is entered, the process begins and the data is extracted and staged.
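The join-then-stage flow described in the steps above can be sketched roughly as follows. This is a simplified stand-in, not the code from process_csv.py or stage_data.py: it uses the standard library `csv` module instead of a dataframe, and an in-memory SQLite database instead of PostgreSQL so that it runs without a server or a password prompt. The CSV contents, the column names, the table name `staged`, and the helper names `read_csv`, `inner_join`, and `stage` are all hypothetical.

```python
import csv
import io
import sqlite3

# Made-up inputs standing in for the two CSV files.
WORLD_BANK_CSV = "country_code,year,population\nXXX,2020,1000\nYYY,2020,2000\n"
GDP_CSV = "country_code,year,gdp\nXXX,2020,50.0\n"


def read_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, keys):
    """Keep only left rows that have a match in right on the given keys."""
    index = {tuple(r[k] for k in keys): r for r in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**row, **match})
    return joined


def stage(conn, rows):
    """Create the staging table if needed and insert the joined rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged ("
        "country_code TEXT, year INTEGER, population INTEGER, gdp REAL)"
    )
    typed = [
        {
            "country_code": r["country_code"],
            "year": int(r["year"]),
            "population": int(r["population"]),
            "gdp": float(r["gdp"]),
        }
        for r in rows
    ]
    conn.executemany(
        "INSERT INTO staged VALUES (:country_code, :year, :population, :gdp)",
        typed,
    )
    conn.commit()


# SQLite in memory is an assumption made so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
joined = inner_join(read_csv(WORLD_BANK_CSV), read_csv(GDP_CSV),
                    ["country_code", "year"])
stage(conn, joined)
staged_rows = conn.execute(
    "SELECT country_code, year, population, gdp FROM staged"
).fetchall()
```

Against PostgreSQL, the connection would instead be opened with a PostgreSQL driver using the host and password entered at the prompt, and the parameter placeholder syntax would follow that driver's conventions rather than SQLite's.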

Download files

Source Distribution

M_challenge-0.0.1.tar.gz (4.3 kB)

Uploaded Source

File details

Details for the file M_challenge-0.0.1.tar.gz.

File metadata

  • Download URL: M_challenge-0.0.1.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9

File hashes

Hashes for M_challenge-0.0.1.tar.gz
Algorithm Hash digest
SHA256 d5496142b84874872780d89da01dc4bad96329aebff5d0eb26e29a8119452214
MD5 f0c288d1c193cc1159d89d7e994e8338
BLAKE2b-256 f16d1b405c48baeaf629441f5f026fe49950ec3093863eb4883889e666a5b7c9
