climate-search-cli

A Python program that can be run from the command line and used to search climate policy documents.

Task Overview

Created as a technical challenge for an interview. The task was to create a CLI tool for searching summaries of climate documents.

The CLI needed the following functionality:

Load & validate documents into a database at the command line
Query the documents returning a sequence of document objects
Display the documents and some statistics about them as output
Order by relevance using a relevancy score
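To make the "sequence of document objects" and "statistics" requirements concrete, here is a minimal sketch of a document object and a statistics helper. The `Document` fields and the `summarise` function are hypothetical, not the project's actual schema, and the statistics chosen (result count, mean description length) are illustrative assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Document:
    """One climate policy document (hypothetical fields, not the real schema)."""

    doc_id: int
    name: str
    description: str


def summarise(results: Sequence[Document]) -> dict[str, float]:
    """Return simple statistics about a sequence of query results."""
    if not results:
        return {"count": 0, "mean_description_words": 0.0}
    return {
        "count": len(results),
        # Average number of whitespace-separated words per description.
        "mean_description_words": float(
            mean(len(d.description.split()) for d in results)
        ),
    }
```

A frozen dataclass keeps query results immutable, so display code cannot accidentally modify what came back from the database.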

Evaluation Criteria

The following items are being evaluated:

Readability
Maintainability
Functionality
Efficiency
Modularity
Commenting and documentation
Testing Strategy

Run

Install the dependencies and activate the virtual environment with Poetry (see below for other ways of running):

poetry install
poetry shell

Data can be loaded via:

cs load

This loads the data into a database and writes any validation errors to the same directory. A custom file can also be loaded using the --localpath argument.
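A minimal sketch of a load-and-validate step, using the standard library's csv and sqlite3 modules in place of pandas. The input columns (`name`, `description`), the validation rule (both non-blank) and the `<stem>_errors.csv` file name are hypothetical assumptions, not the project's actual schema or output.

```python
import csv
import sqlite3
from pathlib import Path

# Hypothetical required columns; the real schema may differ.
REQUIRED = ("name", "description")


def load(csv_path: Path, db_path: Path) -> tuple[int, int]:
    """Validate rows in csv_path, load the valid ones into SQLite, log the rest.

    Invalid rows are written to <stem>_errors.csv next to the input file.
    Returns (rows_loaded, rows_rejected).
    """
    valid: list[tuple[str, ...]] = []
    invalid: list[dict] = []
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            # Valid only if every required column is present and non-blank.
            if all((row.get(col) or "").strip() for col in REQUIRED):
                valid.append(tuple(row[col].strip() for col in REQUIRED))
            else:
                invalid.append(row)
        fieldnames = reader.fieldnames or list(REQUIRED)

    if invalid:
        error_path = csv_path.with_name(f"{csv_path.stem}_errors.csv")
        with error_path.open("w", newline="", encoding="utf-8") as f:
            # extrasaction="ignore" drops surplus fields from over-long rows.
            writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(invalid)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS documents ("
                "id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT INTO documents (name, description) VALUES (?, ?)", valid
            )
    finally:
        conn.close()
    return len(valid), len(invalid)
```

With pandas, the final insert would typically be a single `DataFrame.to_sql` call on a sqlite3 connection instead of the explicit `CREATE TABLE` and `executemany`.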

Data can then be queried by passing one or more keywords, each with -k, to the retrieve command:

cs retrieve -k green -k energy
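A minimal sketch of how keyword matching might work against a single SQLite table. It assumes a hypothetical `documents` table with `name` and `description` columns and treats a document as a match if any keyword appears in either column; the README does not say whether keywords are combined with AND or OR, so OR is an assumption.

```python
import sqlite3
from typing import Sequence


def _like_pattern(keyword: str) -> str:
    """Wrap a keyword in % wildcards, escaping LIKE metacharacters in it."""
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"


def retrieve(conn: sqlite3.Connection, keywords: Sequence[str]) -> list[tuple[int, str, str]]:
    """Return documents whose name or description contains any keyword.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    if not keywords:
        return []
    one = "(name LIKE ? ESCAPE '\\' OR description LIKE ? ESCAPE '\\')"
    where = " OR ".join([one] * len(keywords))
    params: list[str] = []
    for kw in keywords:
        pattern = _like_pattern(kw)
        params.extend([pattern, pattern])  # one for name, one for description
    sql = f"SELECT id, name, description FROM documents WHERE {where} ORDER BY id"
    # Only the fixed clause text is formatted in; keywords are bound parameters.
    return conn.execute(sql, params).fetchall()
```

Binding keywords as parameters rather than formatting them into the SQL keeps user input from altering the query.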

The retrieve command displays the policies that match. Results can also be sorted by relevance with:

cs retrieve -k forests --sort
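In click, a repeatable option such as -k is declared with `multiple=True` and a boolean switch such as --sort with `is_flag=True`. To keep this sketch free of third-party dependencies, the same `cs` command shape is shown with the standard library's argparse as a stand-in; it is not the project's actual click code, and the `keywords` destination name is a hypothetical choice.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring `cs load` and `cs retrieve -k ... --sort`."""
    parser = argparse.ArgumentParser(prog="cs")
    sub = parser.add_subparsers(dest="command", required=True)

    load = sub.add_parser("load", help="validate and load documents")
    load.add_argument("--localpath", help="load a custom file")

    retrieve = sub.add_parser("retrieve", help="search loaded documents")
    # action="append" lets -k repeat: -k green -k energy -> ["green", "energy"]
    retrieve.add_argument("-k", dest="keywords", action="append", required=True)
    # store_true makes --sort a flag that defaults to False.
    retrieve.add_argument("--sort", action="store_true", help="order by relevance")
    return parser


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when None) into a namespace."""
    return build_parser().parse_args(argv)
```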

The test directory contains both unit and integration tests, which can be run with pytest:

pytest
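pytest discovers functions named `test_*` in files named `test_*.py` or `*_test.py` and reports failing plain `assert` statements along with the values involved. A minimal sketch of the unit-versus-integration split, built around a hypothetical `tokenise` helper rather than the project's real tests:

```python
import re
import sqlite3


def tokenise(text: str) -> list[str]:
    """Lower-case text and split it into alphanumeric tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def test_tokenise_lowercases_and_strips_punctuation():
    # Unit test: a pure function, no I/O.
    assert tokenise("Green-Energy, 2030!") == ["green", "energy", "2030"]


def test_documents_round_trip_through_sqlite():
    # Integration-style test: exercises a real (in-memory) SQLite database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, description TEXT)")
    conn.execute("INSERT INTO documents (description) VALUES (?)", ("Protects forests",))
    (description,) = conn.execute("SELECT description FROM documents").fetchone()
    assert tokenise(description) == ["protects", "forests"]
    conn.close()
```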

Solution Overview

I chose click for the CLI and SQLite as the backend; both are simple and portable, although if I started again I'd prefer a database with support for arrays. Transformations and schema definitions were done in pandas for convenience. I originally started down the path of having multiple tables in the database, but decided this was over-optimising for what was needed in the given timeframe. Having just one table made pandas a straightforward option for defining it. Search relevancy is a quick TF-IDF calculation over the results.
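The relevancy step is only described as a quick TF-IDF over the results. Below is a minimal sketch of one common variant: term frequency normalised by document length, a smoothed IDF of log(N / df) + 1 computed over the retrieved results only, summed across the query keywords. The exact formula and tokenisation the project uses are not stated, so both are assumptions here.

```python
import math
import re
from collections import Counter
from typing import Sequence


def tokenise(text: str) -> list[str]:
    """Lower-case text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_tfidf(docs: Sequence[str], keywords: Sequence[str]) -> list[tuple[int, float]]:
    """Score each document by the summed TF-IDF weight of the query keywords.

    IDF is computed over the retrieved results only. Returns (index, score)
    pairs sorted by descending score; ties keep their original order.
    """
    tokenised = [tokenise(d) for d in docs]
    token_sets = [set(toks) for toks in tokenised]
    n_docs = len(tokenised)
    # De-duplicate keywords so a repeated -k does not double-count a term.
    terms = list(dict.fromkeys(k.lower() for k in keywords))
    # Document frequency: how many results contain each term.
    df = {t: sum(t in s for s in token_sets) for t in terms}

    scored: list[tuple[int, float]] = []
    for i, toks in enumerate(tokenised):
        counts = Counter(toks)
        score = 0.0
        for t in terms:
            if counts[t] == 0:
                continue  # absent term contributes nothing (also skips empty docs)
            tf = counts[t] / len(toks)
            # Smoothed IDF: the +1 stops a term found in every result scoring 0.
            idf = math.log(n_docs / df[t]) + 1.0
            score += tf * idf
        scored.append((i, score))
    # sorted() stays stable with reverse=True, so ties keep input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because IDF here is computed over the retrieved results, a single keyword that appears in every result gets the same IDF everywhere, so the ranking for that query reduces to term frequency.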

Time taken

I worked intermittently on this over a couple of days. I estimate the total time spent actively working on the solution was about 6 hours, not including time spent reading the brief, researching and planning. I could keep going, but I went over the suggested timeframe, so I'm leaving it here. Key items I'd like to improve include error handling and the relevancy algorithm.

Alternate ways of running

Docker

The tool can also be run via Docker:

docker build -t climate-search-cli:latest .

docker run climate-search-cli:latest load
docker run climate-search-cli:latest retrieve -k cycling -k health --sort

PyPI

Also available on PyPI:

pip install climate-search-cli

Release history

0.1.4 (this version): Aug 13, 2023
0.1.3: Aug 12, 2023
0.1.2: Aug 12, 2023
0.1.1: Aug 12, 2023
0.1.0: Aug 12, 2023

Download files

Source distribution: climate_search_cli-0.1.4.tar.gz

Upload date: Aug 13, 2023
Size: 624.0 kB
Tags: Source
Uploaded using Trusted Publishing: No
Uploaded via: poetry/1.3.2 CPython/3.10.10 Darwin/22.3.0
SHA256: `1aee3405e260c7553cc829e821105aa4ac02ca1cc8633df5f8d360967c58c78a`
MD5: `fdb541e68bc4ec6addef5f608ae5983c`
BLAKE2b-256: `ac225801f0ba4ccf61d597de079932f2cfaca48b0eb8f66ab7dacce5db3ab6d2`

Built distribution: climate_search_cli-0.1.4-py3-none-any.whl

Upload date: Aug 13, 2023
Size: 628.8 kB
Tags: Python 3
Uploaded using Trusted Publishing: No
Uploaded via: poetry/1.3.2 CPython/3.10.10 Darwin/22.3.0
SHA256: `95002eee0a3c2d06f2759b4feced2cb2f57e61bd9855c5bb22d1ec4ae2091a3d`
MD5: `292e2bae38e842ddef685beedc73f986`
BLAKE2b-256: `3922f65f0a9b030a668bd8b0a479d079fa0edce6de676b4224eef084456ac013`
