Skip to main content

Library to easily access and read data from data.gov.sg

Project description

govtech-data: An easy-to-use, unofficial library to access data.gov.sg

Features

  • govtech-data
    • An easy search to find the data you are looking for!
    • Smart caching to reduce the number of API calls!
  • govtech-data[openai]
    • An easy

Optional requirements

Installation

To install the client library

pip install govtech-data

To install the client library with OpenAI functionality

pip install govtech-data[openai]

Examples

To search for a dataset

In [1]: from govtech_data import GovTechClient

In [2]: GovTechClient.search_package("resale prices", limit=5)
2023-04-17 16:09:27.818 | DEBUG    | govtech_data.client:get_model_from_json_response:109 - endpoint: https://data.gov.sg/api/action/package_list
Out[2]:
[SearchPackage(package_id='resale-flat-prices', score=95),
 SearchPackage(package_id='median-resale-prices-for-registered-applications-by-town-and-flat-type', score=90),
 SearchPackage(package_id='average-retail-prices-of-selected-consumer-items-monthly', score=86),
 SearchPackage(package_id='average-retail-prices-of-selected-items-annual', score=86),
 SearchPackage(package_id='changes-in-value-added-per-worker-at-current-market-prices-by-industry-ssic-2015-quarterly', score=86)]

To read from a dataset

In [1]: from govtech_data import GovTechClient

In [2]: df = GovTechClient.fetch_dataframe_from_package("resale-flat-prices")
2023-04-17 16:10:35.600 | DEBUG    | govtech_data.client:get_model_from_json_response:109 - endpoint: https://data.gov.sg/api/action/package_show
2023-04-17 16:10:36.529 | DEBUG    | govtech_data.utils.content:fetch_url:11 - Fetching url - https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards-2023-04-17T03-28-08Z.csv

In [3]: df
Out[3]:
shape: (151_157, 11)
┌─────────┬────────────┬───────────┬───────┬───┬────────────────┬─────────────────────┬────────────────────┬──────────────┐
 month    town        flat_type  block    flat_model      lease_commence_date  remaining_lease     resale_price 
 ---      ---         ---        ---       ---             ---                  ---                 ---          
 str      str         str        str       str             i64                  str                 f64          
╞═════════╪════════════╪═══════════╪═══════╪═══╪════════════════╪═════════════════════╪════════════════════╪══════════════╡
 2017-01  ANG MO KIO  2 ROOM     406      Improved        1979                 61 years 04 months  232000.0     
 2017-01  ANG MO KIO  3 ROOM     108      New Generation  1978                 60 years 07 months  250000.0     
 2017-01  ANG MO KIO  3 ROOM     602      New Generation  1980                 62 years 05 months  262000.0     
 2017-01  ANG MO KIO  3 ROOM     465      New Generation  1980                 62 years 01 month   265000.0     
                                                                                                        
 2023-04  YISHUN      EXECUTIVE  258      Maisonette      1985                 61 years 02 months  808000.0     
 2023-04  YISHUN      EXECUTIVE  293      Apartment       1992                 68 years 04 months  892000.0     
 2023-04  YISHUN      EXECUTIVE  723      Apartment       1986                 62 years 02 months  780000.0     
 2023-04  YISHUN      EXECUTIVE  828      Apartment       1988                 63 years 10 months  865000.0     
└─────────┴────────────┴───────────┴───────┴───┴────────────────┴─────────────────────┴────────────────────┴──────────────┘

Ask OpenAI to generate code for a question

In [1]: from govtech_data.utils.openai import OpenAIClient

In [2]: govtech_openai_client = OpenAIClient()

In [3]: govtech_openai_client.query("get average resale flat prices in bedok for different flat-types in a dataframe")
2023-04-18 01:25:17.135 | DEBUG    | govtech_data.utils.openai:query:56 - Request:
{'role': 'user', 'content': 'get average resale flat prices in bedok for different flat-types in a dataframe'}
2023-04-18 01:25:17.270 | WARNING  | govtech_data.utils.openai:num_tokens_from_messages:192 - Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.
2023-04-18 01:25:17.271 | DEBUG    | govtech_data.utils.openai:simple_query_openai:233 - Total number of tokens in messages: 603
2023-04-18 01:25:33.830 | DEBUG    | govtech_data.utils.openai:query:70 - ChatGPT content response:
{'thoughts': {'text': 'I need to get the dataset id that contains resale flat prices in Bedok and then extract the required information using pandas.', 'reasoning': "I will need to query the dataset using 'dataset_search' and 'get_dataset' commands to get the dataset id and then use 'get_all_distinct_values_in_a_dataset_field' to get the different flat types available in the dataset. I can then filter the dataset using pandas to get the average resale flat prices for each flat type in Bedok.", 'plan': ["Search for dataset containing resale flat prices in Bedok using 'dataset_search' command", "Extract dataset id using 'get_dataset' command", "Get all distinct flat types in the dataset using 'get_all_distinct_values_in_a_dataset_field' command", 'Filter dataset using pandas to get average resale flat prices for each flat type in Bedok'], 'criticism': 'I need to ensure that I am using the correct dataset and field names to get the required information.', 'speak': "I will use 'dataset_search', 'get_dataset' and 'get_all_distinct_values_in_a_dataset_field' commands to extract the required information."}, 'command': {'name': 'dataset_search', 'args': {'input': 'resale flat prices bedok'}}}
2023-04-18 01:25:33.833 | DEBUG    | govtech_data.client:get_model_from_json_response:109 - endpoint: https://data.gov.sg/api/action/package_list
2023-04-18 01:25:35.197 | DEBUG    | govtech_data.utils.openai:query:56 - Request:
{'role': 'user', 'content': 'Datasets found for resale flat prices bedok: [{"id":"resale-flat-prices","score":95},{"id":"age-of-hdb-population-by-ethnic-group-and-flat-type-in-age-groups","score":86},{"id":"average-and-median-size-of-hdb-households-by-ethnic-group-and-flat-type","score":86},{"id":"average-number-of-income-earners-of-hdb-households-by-ethnic-group-and-flat-type","score":86},{"id":"average-retail-prices-of-selected-consumer-items-monthly","score":86},{"id":"average-retail-prices-of-selected-items-annual","score":86},{"id":"changes-in-value-added-per-worker-at-current-market-prices-by-industry-ssic-2015-quarterly","score":86},{"id":"compensation-of-employees-by-industry-at-current-prices-annual","score":86},{"id":"construction-material-market-prices-monthly","score":86},{"id":"dependency-ratio-of-hdb-resident-population-by-ethnic-group-and-flat-type","score":86}]'}
2023-04-18 01:25:35.197 | WARNING  | govtech_data.utils.openai:num_tokens_from_messages:192 - Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.
2023-04-18 01:25:35.198 | DEBUG    | govtech_data.utils.openai:simple_query_openai:233 - Total number of tokens in messages: 1082
2023-04-18 01:25:53.001 | DEBUG    | govtech_data.utils.openai:query:70 - ChatGPT content response:
{'thoughts': {'text': "From the search results, we can see that the first dataset 'resale-flat-prices' is the dataset we need. I will now extract the dataset id using 'get_dataset' command and then use 'get_dataset_schema' command to get the schema of the dataset and extract the required field names.", 'reasoning': "I will need the schema of the dataset to get the field names for flat type and town. Once I have the field names, I can then use 'get_all_distinct_values_in_a_dataset_field' command to get the different flat types available in the dataset and then filter the dataset using pandas to get the average resale flat prices for each flat type in Bedok.", 'plan': ["Extract dataset id for 'resale-flat-prices' using 'get_dataset' command", "Get the schema of the dataset using 'get_dataset_schema' command to get the required field names", "Use 'get_all_distinct_values_in_a_dataset_field' command to get the different flat types available in the dataset", 'Filter dataset using pandas to get average resale flat prices for each flat type in Bedok'], 'criticism': 'I need to ensure that I am using the correct field names and values to extract the required information.', 'speak': "I will use 'get_dataset' and 'get_dataset_schema' commands to extract the required field names and 'get_all_distinct_values_in_a_dataset_field' command to get the different flat types available in the dataset."}, 'command': {'name': 'get_dataset', 'args': {'id': 'resale-flat-prices'}}}
2023-04-18 01:25:53.003 | DEBUG    | govtech_data.client:get_model_from_json_response:109 - endpoint: https://data.gov.sg/api/action/package_show
2023-04-18 01:25:53.972 | DEBUG    | govtech_data.utils.content:fetch_url:11 - Fetching url - https://storage.data.gov.sg/resale-flat-prices/resources/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards-2023-04-17T03-28-08Z.csv
2023-04-18 01:25:56.195 | DEBUG    | govtech_data.utils.openai:query:56 - Request:
{'role': 'user', 'content': "Schema for resale-flat-prices: {'month': Utf8, 'town': Utf8, 'flat_type': Utf8, 'block': Utf8, 'street_name': Utf8, 'storey_range': Utf8, 'floor_area_sqm': Float64, 'flat_model': Utf8, 'lease_commence_date': Int64, 'remaining_lease': Utf8, 'resale_price': Float64}"}
2023-04-18 01:25:56.195 | WARNING  | govtech_data.utils.openai:num_tokens_from_messages:192 - Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.
2023-04-18 01:25:56.197 | DEBUG    | govtech_data.utils.openai:simple_query_openai:233 - Total number of tokens in messages: 1495
2023-04-18 01:26:10.913 | DEBUG    | govtech_data.utils.openai:query:70 - ChatGPT content response:
{'thoughts': {'text': "The schema shows that the required field names are 'flat_type', 'town' and 'resale_price'. I can now use 'get_all_distinct_values_in_a_dataset_field' command to get the different flat types available in the dataset and then filter the dataset using pandas to get the average resale flat prices for each flat type in Bedok.", 'reasoning': "I will extract the different flat types available in the dataset using 'get_all_distinct_values_in_a_dataset_field' command and then filter the dataset using pandas to get the average resale flat prices for each flat type in Bedok.", 'plan': ["Use 'get_all_distinct_values_in_a_dataset_field' command to get the different flat types available in the dataset", 'Filter dataset using pandas to get average resale flat prices for each flat type in Bedok'], 'criticism': 'I need to ensure that I am filtering the data based on the correct town name, i.e. Bedok.', 'speak': "I will now use 'get_all_distinct_values_in_a_dataset_field' command to get the different flat types and then filter the dataset using pandas."}, 'command': {'name': 'get_all_distinct_values_in_a_dataset_field', 'args': {'id': 'resale-flat-prices', 'field': 'flat_type'}}}
2023-04-18 01:26:10.963 | DEBUG    | govtech_data.utils.openai:query:56 - Request:
{'role': 'user', 'content': 'All distinct values in flat_type: ["3 ROOM","4 ROOM","5 ROOM","2 ROOM","EXECUTIVE","1 ROOM","MULTI-GENERATION"]'}
2023-04-18 01:26:10.963 | WARNING  | govtech_data.utils.openai:num_tokens_from_messages:192 - Warning: gpt-3.5-turbo may change over time. Returning num tokens assuming gpt-3.5-turbo-0301.
2023-04-18 01:26:10.965 | DEBUG    | govtech_data.utils.openai:simple_query_openai:233 - Total number of tokens in messages: 1797
2023-04-18 01:26:26.786 | DEBUG    | govtech_data.utils.openai:query:70 - ChatGPT content response:
{'thoughts': {'text': 'Now that we have the different flat types available in the dataset, I will filter the dataset using pandas to get the average resale flat prices for each flat type in Bedok.', 'reasoning': 'I will first filter the dataset to only include transactions in Bedok using pandas. Then, I will group the data by flat type and calculate the average resale price for each flat type.', 'plan': ['Filter dataset to only include transactions in Bedok using pandas', 'Group the data by flat type and calculate the average resale price for each flat type'], 'criticism': 'I need to ensure that I am correctly calculating the average resale flat prices for each flat type.', 'speak': 'I will now filter the dataset using pandas to get the average resale flat prices for each flat type in Bedok.'}, 'command': {'name': 'evaluate_full_code', 'args': {'code': "import matplotlib.pyplot as plt\nfrom govtech_data import GovTechClient\nimport pandas as pd\n\ndataset_id = 'resale-flat-prices'\ndataset_df = GovTechClient.fetch_dataframe_from_package(dataset_id).to_pandas()\n\nbedok_df = dataset_df[dataset_df['town'] == 'BEDOK']\nflat_type_grouped = bedok_df.groupby(['flat_type'])\naverage_prices = flat_type_grouped['resale_price'].mean().reset_index()\n\nprint(average_prices)"}}}
Out[3]: True

In [4]: print(govtech_openai_client.get_generated_code_from_history())
import matplotlib.pyplot as plt
from govtech_data import GovTechClient
import pandas as pd

dataset_id = 'resale-flat-prices'
dataset_df = GovTechClient.fetch_dataframe_from_package(dataset_id).to_pandas()

bedok_df = dataset_df[dataset_df['town'] == 'BEDOK']
flat_type_grouped = bedok_df.groupby(['flat_type'])
average_prices = flat_type_grouped['resale_price'].mean().reset_index()

print(average_prices)

In [5]: exec(govtech_openai_client.get_generated_code_from_history())
   flat_type   resale_price
0     2 ROOM  234414.740385
1     3 ROOM  315348.718331
2     4 ROOM  465674.532707
3     5 ROOM  606324.855910
4  EXECUTIVE  759893.882775

Ask OpenAI to generate a trend graph for average resale flat prices

Test123

Credits

This library adopts some ideas from the Auto-GPT project to perform Chain-of-Thought reasoning.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

govtech_data-0.1.10.tar.gz (15.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

govtech_data-0.1.10-py3-none-any.whl (16.3 kB view details)

Uploaded Python 3

File details

Details for the file govtech_data-0.1.10.tar.gz.

File metadata

  • Download URL: govtech_data-0.1.10.tar.gz
  • Upload date:
  • Size: 15.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.9 Darwin/22.4.0

File hashes

Hashes for govtech_data-0.1.10.tar.gz
Algorithm Hash digest
SHA256 abdb2c257b218cbc7e7d5dd7dedbd657e0ac1685ae70c55d4281a82036bca83d
MD5 f4a1d895477d9fba1cd6f1f431c83787
BLAKE2b-256 cf8539e6edfbab29ae80fd947247c686317dc772ce4b61d1d169faca13c6d4f8

See more details on using hashes here.

File details

Details for the file govtech_data-0.1.10-py3-none-any.whl.

File metadata

  • Download URL: govtech_data-0.1.10-py3-none-any.whl
  • Upload date:
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.2 CPython/3.10.9 Darwin/22.4.0

File hashes

Hashes for govtech_data-0.1.10-py3-none-any.whl
Algorithm Hash digest
SHA256 df16288f43c9059cc067a76c48555a1b0ec0a5dea6fad48b22b44fb745a8d22a
MD5 c5decfaca47cc7a666dd1e7116757ec7
BLAKE2b-256 a51b5668a2f19c8cff2c49cd778df1617087a8496c205c8c5421e4f7f5a65492

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page