CLI tool for making good data
agrc-sweeper
The data cleaning service.
Available Sweepers
Addresses
Checks that addresses have minimum required parts and optionally normalizes them.
Duplicates
Checks for duplicate features.
Empties
Checks for empty geometries.
Metadata
Checks to make sure that the metadata meets the Basic SGID Metadata Requirements.
Tags
Checks to make sure that existing tags are cased appropriately. This means that they are title-cased, except for known abbreviations (e.g. UGRC, BLM) and articles (e.g. a, the, of).
This check also verifies that the data set contains a tag that matches the database name (e.g. SGID) and the schema (e.g. Cadastre).
--try-fix adds missing required tags and title-cases any existing tags.
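The tag rules above can be sketched roughly as follows. This is an illustration only, not sweeper's actual implementation: the function names, the contents of the abbreviation and article sets, and the case-insensitive matching of required tags are all assumptions.

```python
# Hypothetical word lists; the lists sweeper really uses may differ.
KNOWN_ABBREVIATIONS = {"UGRC", "BLM"}
ARTICLES = {"a", "the", "of"}


def title_case_tag(tag: str) -> str:
    """Title-case a tag, keeping abbreviations upper-case and articles lower-case."""
    words = []
    for word in tag.split():
        if word.upper() in KNOWN_ABBREVIATIONS:
            words.append(word.upper())
        elif word.lower() in ARTICLES:
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)


def missing_required_tags(tags, database_name, schema):
    """Return the required tags (database name, schema) that are not present.

    Assumption: the comparison ignores case.
    """
    present = {tag.lower() for tag in tags}
    return [name for name in (database_name, schema) if name.lower() not in present]


print(title_case_tag("bureau of land management"))
print(missing_required_tags(["SGID", "Parcels"], "SGID", "Cadastre"))
```

A fix step in this spirit would append whatever `missing_required_tags` returns and run every existing tag through `title_case_tag`.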
Summary
Checks to make sure that the summary is less than 2048 characters (a limitation of AGOL) and that it is shorter than the description.
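As a minimal sketch of the two summary rules, assuming the summary and description are available as plain strings (the function name and messages below are hypothetical, not part of sweeper):

```python
MAX_SUMMARY_LENGTH = 2048  # AGOL limit mentioned above


def summary_issues(summary: str, description: str) -> list:
    """Return a list of human-readable problems found with the summary."""
    issues = []
    # The summary must be strictly less than 2048 characters.
    if len(summary) >= MAX_SUMMARY_LENGTH:
        issues.append("summary must be less than 2048 characters")
    # The summary must be strictly shorter than the description.
    if len(summary) >= len(description):
        issues.append("summary must be shorter than the description")
    return issues


print(summary_issues("Utah parcels", "Parcel boundaries for the state of Utah."))
```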
Description
Checks to make sure that the description contains a link to a data page on gis.utah.gov.
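One way such a check might look is a simple pattern match. The sketch below assumes that any http(s) URL under gis.utah.gov with a path counts as a data page link; sweeper's real rule may be stricter, and the function name and the example URL are hypothetical.

```python
import re

# Assumption: any URL on gis.utah.gov with a non-empty path qualifies.
DATA_PAGE_LINK = re.compile(r"https?://gis\.utah\.gov/\S+", re.IGNORECASE)


def has_data_page_link(description: str) -> bool:
    """Return True if the description contains a link to gis.utah.gov."""
    return DATA_PAGE_LINK.search(description) is not None


print(has_data_page_link("See https://gis.utah.gov/data/cadastre/parcels/ for details."))
```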
Use Limitations
Checks to make sure that the text in this section matches the official text for UGRC.
--try-fix updates the text to match the official text.
Parsing Addresses
This project contains a module that can be used as a standalone address parser, sweeper.address_parser. This allows developers to take advantage of sweeper's advanced address parsing and normalization without having to run the entire sweeper process.
Usage Example
from sweeper.address_parser import Address
address = Address('123 South Main Street')
print(address)
'''
--> Parsed Address:
{'address_number': '123',
'normalized': '123 S MAIN ST',
'prefix_direction': 'S',
'street_name': 'MAIN',
'street_type': 'ST'}
'''
Available Address class properties
All properties default to None if there is no parsed value.
address_number
address_number_suffix
prefix_direction
street_name
street_direction
street_type
unit_type
unit_id
If no unit_type is found, this property is prefixed with # (e.g. # 3). If unit_type is found, # is stripped from this property.
city
zip_code
po_box
The PO Box if a po-box-type address was entered (e.g. po_box would be 1 for p.o. box 1).
normalized
A normalized string representing the entire address that was passed into the constructor. PO Boxes are normalized in this format: PO BOX <number>.
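The unit_id and PO Box formatting rules described in the property list can be illustrated with a short standalone sketch. It does not use sweeper itself; the helper names are hypothetical and the behavior is inferred only from the descriptions above.

```python
from typing import Optional


def format_unit_id(unit_id: str, unit_type: Optional[str]) -> str:
    """Apply the '#' rule: prefix when there is no unit type, strip otherwise."""
    bare = unit_id.lstrip("#").strip()
    if unit_type:
        # A unit type (e.g. APT) was found, so '#' is stripped.
        return bare
    # No unit type was found, so the id is prefixed with '# '.
    return f"# {bare}"


def normalize_po_box(po_box) -> str:
    """Format a PO Box number as 'PO BOX <number>'."""
    return f"PO BOX {po_box}"


print(format_unit_id("3", None))
print(format_unit_id("#3", "APT"))
print(normalize_po_box(1))
```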
Installation (requires Pro 2.7+)
- clone arcgis conda environment
conda create --name sweeper --clone arcgispro-py3
- activate environment
activate sweeper
- install sweeper
pip install agrc-sweeper
Development
- clone arcgis conda environment
conda create --name sweeper --clone arcgispro-py3
- activate environment
activate sweeper
test_metadata.py uses a SQL database that must be restored from src/sweeper/tests/data/Sweeper.bak to your local SQL Server.
Installing dependencies
- clone arcgis conda environment
conda create --name sweeper --clone arcgispro-py3
- install only required dependencies to run sweeper
pip install -e .
- install required dependencies to work on sweeper
pip install -e ".[develop]"
- install required dependencies to run sweeper tests
pip install -e ".[tests]"
- run tests:
pytest