A tool for extracting data and reading CSV files into dataframes
Project description
Data_Engineering Challenge
This task retrieves publicly available data and analyzes it using SQL. The data came from two sources in two formats. The first dataset was extracted from the World Bank API in JSON format. The second is a GDP dataset downloaded from its source as a .csv file.
This GitHub repo therefore contains three .py modules that handle extracting, processing and staging the data in a PostgreSQL database, plus a .sql file containing all of the analysis performed on the data.
Getting Started
- data_grab.py - This module extracts the World Bank data, cleans it and saves it as both a JSON file and a CSV file for reference.
- process_csv.py - This module reads the CSV files and processes them (joining them and converting them to a dataframe) before the data is staged.
- stage_data.py - This module handles staging: it creates the tables and saves the data to the database.
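The extract and join steps described for data_grab.py and process_csv.py can be illustrated with a minimal sketch. It is not the code in this repo: it uses only the Python standard library (plain dicts rather than dataframes), and the sample payload, the column names `country_code` and `gdp`, and the helper names `flatten_records` and `join_on_country` are all hypothetical. The payload is shaped as a two-element list of paging metadata plus records, which is assumed here to match what the World Bank API returns.

```python
import csv
import io
import json

# Hypothetical sample shaped like a World Bank API JSON response:
# [paging metadata, list of records]. All values are placeholders.
SAMPLE_RESPONSE = json.dumps([
    {"page": 1, "pages": 1, "per_page": 50, "total": 2},
    [
        {"country": {"id": "AA", "value": "Country A"},
         "countryiso3code": "AAA", "date": "2020", "value": 1.5},
        {"country": {"id": "BB", "value": "Country B"},
         "countryiso3code": "BBB", "date": "2020", "value": None},
    ],
])

# Hypothetical GDP file contents; the real CSV's columns may differ.
SAMPLE_GDP_CSV = "country_code,gdp\nAAA,100\nCCC,300\n"


def flatten_records(raw_json):
    """Turn the nested payload into flat rows, dropping records with no value."""
    _meta, records = json.loads(raw_json)
    return [
        {
            "country_code": r["countryiso3code"],
            "country": r["country"]["value"],
            "year": r["date"],
            "value": r["value"],
        }
        for r in records
        if r["value"] is not None
    ]


def join_on_country(rows, gdp_csv_text):
    """Inner-join the flattened rows with the GDP CSV on country_code."""
    reader = csv.DictReader(io.StringIO(gdp_csv_text))
    gdp_by_code = {row["country_code"]: row["gdp"] for row in reader}
    return [
        dict(r, gdp=gdp_by_code[r["country_code"]])
        for r in rows
        if r["country_code"] in gdp_by_code
    ]


rows = flatten_records(SAMPLE_RESPONSE)
joined = join_on_country(rows, SAMPLE_GDP_CSV)
```

Only countries present in both inputs survive the join, so a mismatch in country codes between the two sources would show up as missing rows rather than as an error.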
To begin:
- Make sure all CSV files are downloaded and located in the same directory as the .py files.
- Run stage_data.py directly as a script.
- A prompt asks for the database password and host.
- Once that information is entered, the data is extracted and staged.
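The password and host prompt can be pictured with the following minimal sketch. It is an assumption about how such a prompt might work, not the actual contents of stage_data.py: the function names `build_dsn` and `prompt_for_dsn`, the default user and database name, and the port are all hypothetical.

```python
from getpass import getpass
from urllib.parse import quote


def build_dsn(user, password, host, dbname, port=5432):
    """Build a PostgreSQL connection URL, percent-encoding the password."""
    # safe="" also encodes "/" so the password cannot break the URL structure.
    encoded = quote(password, safe="")
    return f"postgresql://{user}:{encoded}@{host}:{port}/{dbname}"


def prompt_for_dsn(user="postgres", dbname="postgres"):
    """Ask for the password (hidden) and the host, then return the URL."""
    password = getpass("Database password: ")
    host = input("Database host: ").strip() or "localhost"
    return build_dsn(user, password, host, dbname)
```

Calling `prompt_for_dsn()` from a terminal reads the password without echoing it. Percent-encoding matters because characters such as `@` or `/` in a password would otherwise be misread as URL delimiters.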
Download files
Source Distribution
File details
Details for the file M_challenge-0.0.3.tar.gz.
File metadata
- Download URL: M_challenge-0.0.3.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.7.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 20164f965e1d0177627494f1b1605372f5c51d9b430d0f1bac38854b869814cb |
| MD5 | c864ffa2dacb109435582a9dd9c9d4ea |
| BLAKE2b-256 | 1e9a3bc13af74ef76360849595e167d35d0cd5925537c0a5ff30095074591af5 |