A tool for extracting data and reading CSV files into DataFrames

Project description

Data_Engineering Challenge

This task retrieves publicly available data and analyzes it using SQL. The data were obtained from different sources and in different formats. The first dataset is World Bank data, extracted from the World Bank API in JSON format. The second dataset is GDP data, downloaded from the source as a .csv file.
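As a rough illustration of the JSON-to-CSV step, the sketch below flattens a response shaped like the World Bank API's JSON output (a two-element list of paging metadata followed by the records) into CSV text. The sample payload, the indicator id, the field names, and the helpers `flatten_records` and `rows_to_csv` are all assumptions made for illustration; they are not taken from this package.

```python
import csv
import io
import json

# Hypothetical sample shaped like a World Bank API JSON response:
# [paging metadata, list of records]. All values are made up.
SAMPLE_RESPONSE = json.loads("""
[
  {"page": 1, "pages": 1, "per_page": 50, "total": 2},
  [
    {"indicator": {"id": "EXAMPLE.INDICATOR", "value": "Example indicator"},
     "country": {"id": "AA", "value": "Country A"},
     "countryiso3code": "AAA", "date": "2020", "value": 100.0},
    {"indicator": {"id": "EXAMPLE.INDICATOR", "value": "Example indicator"},
     "country": {"id": "BB", "value": "Country B"},
     "countryiso3code": "BBB", "date": "2020", "value": null}
  ]
]
""")

# Column names chosen for this sketch only.
FIELDS = ["country", "iso3", "year", "value"]


def flatten_records(payload):
    """Turn nested records into flat rows, skipping records with no value."""
    _metadata, records = payload
    rows = []
    for record in records or []:
        if record.get("value") is None:
            continue  # cleaning step: drop observations without a value
        rows.append({
            "country": record["country"]["value"],
            "iso3": record["countryiso3code"],
            "year": int(record["date"]),
            "value": record["value"],
        })
    return rows


def rows_to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_records(SAMPLE_RESPONSE)
csv_text = rows_to_csv(rows)
```

Keeping the flattening separate from the download makes the cleaning logic easy to check against a saved JSON file without calling the API again.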

This GitHub repo therefore contains three .py files (modules) that handle extracting, processing, and staging the data in a PostgreSQL database, plus a .sql file that contains all the analysis performed on the data.
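The stage-then-query flow can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL. The table names, column names, sample rows, and the query are hypothetical and do not reflect the package's actual schema or its .sql file.

```python
import sqlite3

# Hypothetical rows standing in for the two processed datasets.
country_rows = [
    ("AAA", "Country A", "Region 1"),
    ("BBB", "Country B", "Region 2"),
]
gdp_rows = [
    ("AAA", 2020, 100.0),
    ("BBB", 2020, 250.0),
    ("BBB", 2021, 300.0),
]


def stage(connection):
    """Create the tables (if missing) and load the rows into them."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS country "
        "(iso3 TEXT PRIMARY KEY, name TEXT, region TEXT)"
    )
    connection.execute(
        "CREATE TABLE IF NOT EXISTS gdp (iso3 TEXT, year INTEGER, value REAL)"
    )
    connection.executemany("INSERT INTO country VALUES (?, ?, ?)", country_rows)
    connection.executemany("INSERT INTO gdp VALUES (?, ?, ?)", gdp_rows)
    connection.commit()


# Example analysis: average GDP value per country, highest first.
ANALYSIS_SQL = """
SELECT c.name, AVG(g.value) AS avg_gdp
FROM gdp AS g
JOIN country AS c ON c.iso3 = g.iso3
GROUP BY c.name
ORDER BY avg_gdp DESC
"""

connection = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL
stage(connection)
result = connection.execute(ANALYSIS_SQL).fetchall()
connection.close()
```

Against a real PostgreSQL database the same pattern applies, but the connection would come from a PostgreSQL driver, and the parameter placeholder style depends on that driver.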

Getting Started

  1. data_grab.py - This module extracts the World Bank data, cleans it, and saves it as both a JSON file and a CSV file for reference.
  2. process_csv.py - This module reads the CSV files and processes them (joining them and converting them to a DataFrame) before the data is staged.
  3. stage_data.py - This module handles staging the data. It creates the tables and saves the data to the database.
  4. To begin:
    1. Make sure all CSV files are downloaded and located in the same directory as the .py files.
    2. Run stage_data.py directly as a script.
    3. A prompt asks for the database password and host.
  5. Once the information is entered, the process begins and the data is extracted and staged.
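
The start-up steps above (checking that the CSV files sit next to the scripts, then prompting for the database password and host) might look roughly like the sketch below. The function names, the `gdp.csv` file name, and the prompt wording are hypothetical; the actual prompts in stage_data.py may differ.

```python
import getpass
from pathlib import Path


def missing_csv_files(directory, expected_names):
    """Return the expected CSV file names that are not present in directory."""
    folder = Path(directory)
    return [name for name in expected_names if not (folder / name).is_file()]


def collect_db_settings(ask=input, ask_secret=getpass.getpass):
    """Prompt for connection details; getpass keeps the password off-screen.

    The prompt functions are parameters so they can be replaced in tests.
    """
    host = ask("Database host: ").strip()
    password = ask_secret("Database password: ")
    return {"host": host, "password": password}


def main():
    """Check prerequisites, then ask for the database settings."""
    script_dir = Path(__file__).resolve().parent
    # "gdp.csv" is a placeholder name, not the file the package expects.
    missing = missing_csv_files(script_dir, ["gdp.csv"])
    if missing:
        raise SystemExit(f"Missing CSV files: {', '.join(missing)}")
    settings = collect_db_settings()
    print(f"Ready to stage data on host {settings['host']}")
```

Calling `main()` from a script placed next to the CSV files runs the check and then the prompts. Failing early on a missing file avoids asking for credentials when staging cannot succeed anyway.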

Download files

Source Distribution

M_challenge-0.0.3.tar.gz (4.3 kB)

Uploaded Source

File details

Details for the file M_challenge-0.0.3.tar.gz.

File metadata

  • Download URL: M_challenge-0.0.3.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9

File hashes

Hashes for M_challenge-0.0.3.tar.gz
Algorithm Hash digest
SHA256 20164f965e1d0177627494f1b1605372f5c51d9b430d0f1bac38854b869814cb
MD5 c864ffa2dacb109435582a9dd9c9d4ea
BLAKE2b-256 1e9a3bc13af74ef76360849595e167d35d0cd5925537c0a5ff30095074591af5
