
CLI tool for making good data

Project description


The data cleaning service.


Available Sweepers

Addresses

Checks that addresses have minimum required parts and optionally normalizes them.
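The README does not list which parts count as the "minimum required parts". The sketch below assumes they are address_number and street_name, two of the properties documented for the Address class under Parsing Addresses. REQUIRED_PARTS, missing_parts, and has_minimum_parts are hypothetical names, not sweeper's API, and a SimpleNamespace stands in for a parsed Address so the example runs without sweeper installed.

```python
from types import SimpleNamespace

# Assumption: the minimum required parts are a house number and a street name.
# Sweeper's real rules may differ.
REQUIRED_PARTS = ("address_number", "street_name")


def missing_parts(address):
    """Return the names of required parts that are None or empty on the parsed address."""
    return [part for part in REQUIRED_PARTS if not getattr(address, part, None)]


def has_minimum_parts(address):
    return not missing_parts(address)


# Stand-ins for parsed addresses; real code would pass sweeper.address_parser.Address objects.
complete = SimpleNamespace(address_number="123", street_name="MAIN")
partial = SimpleNamespace(address_number=None, street_name="MAIN")

print(has_minimum_parts(complete))  # True
print(missing_parts(partial))       # ['address_number']
```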

Duplicates

Checks for duplicate features.
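As an illustration of what "duplicate features" can mean, here is a minimal sketch that groups features whose geometry and selected attributes are identical. It is not sweeper's implementation: features are plain dicts, and the oid and shape_wkt fields (geometry stored as WKT text) are hypothetical. Exact WKT comparison is strict, so a real tool may need to normalize or tolerance-match geometries first.

```python
from collections import defaultdict


def find_duplicates(features, key_fields):
    """Return lists of feature ids whose geometry (WKT) and key_fields values are identical.

    Each feature is a dict with hypothetical "oid" and "shape_wkt" keys plus attributes.
    Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for feature in features:
        key = (feature["shape_wkt"],) + tuple(feature[field] for field in key_fields)
        groups[key].append(feature["oid"])
    return [oids for oids in groups.values() if len(oids) > 1]


features = [
    {"oid": 1, "shape_wkt": "POINT (1 2)", "name": "A"},
    {"oid": 2, "shape_wkt": "POINT (1 2)", "name": "A"},
    {"oid": 3, "shape_wkt": "POINT (1 2)", "name": "B"},
]
print(find_duplicates(features, ["name"]))  # [[1, 2]]
```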

Empties

Checks for empty geometries.
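A minimal sketch of an empty-geometry test, assuming geometries are available either as None (a null shape) or as WKT text such as "POLYGON EMPTY". The is_empty_geometry helper and the WKT representation are assumptions for illustration; sweeper itself works with geodatabase geometries and may detect emptiness differently.

```python
def is_empty_geometry(wkt):
    """Treat a missing shape (None) or a WKT string ending in EMPTY as an empty geometry."""
    return wkt is None or wkt.strip().upper().endswith("EMPTY")


# Hypothetical mapping of feature id to geometry WKT.
shapes = {1: "POINT (1 2)", 2: None, 3: "POLYGON EMPTY"}
empty_ids = [oid for oid, wkt in shapes.items() if is_empty_geometry(wkt)]
print(empty_ids)  # [2, 3]
```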

Metadata

Checks to make sure that the metadata meets the Basic SGID Metadata Requirements.

Tags

Checks to make sure that existing tags are cased appropriately. This means that they are title-cased, other than known abbreviations (e.g. UGRC, BLM) and articles (e.g. a, the, of).
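The casing rule can be sketched as follows. The word lists contain only the examples given in this section, and keeping an article capitalized when it is the first word of a tag is an assumption; title_case_tag and is_cased_correctly are hypothetical helpers, not sweeper functions.

```python
# Example word lists taken from the README; sweeper's actual lists are likely longer.
ABBREVIATIONS = {"UGRC", "BLM"}
ARTICLES = {"a", "the", "of"}


def title_case_tag(tag):
    """Title-case each word, keeping known abbreviations upper-case and articles
    lower-case (assumption: an article at the start of the tag is capitalized)."""
    words = []
    for index, word in enumerate(tag.split()):
        if word.upper() in ABBREVIATIONS:
            words.append(word.upper())
        elif index > 0 and word.lower() in ARTICLES:
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)


def is_cased_correctly(tag):
    """A tag passes when title-casing it leaves it unchanged."""
    return title_case_tag(tag) == tag


print(title_case_tag("blm land OF the state"))  # BLM Land of the State
```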

This check also verifies that the data set contains a tag that matches the database name (e.g. SGID) and the schema (e.g. Cadastre).

--try-fix adds missing required tags and title-cases any existing tags.
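The required-tag check and the tag-adding half of --try-fix might look like the sketch below. The database name and schema are passed in explicitly, and the case-insensitive comparison is an assumption; missing_required_tags and add_missing_tags are hypothetical names. Title-casing the existing tags, the other half of --try-fix, is not shown here.

```python
def missing_required_tags(tags, database, schema):
    """Return the required tags (database name and schema) not present in tags.

    Assumption: presence is checked case-insensitively; casing is a separate check.
    """
    present = {tag.lower() for tag in tags}
    return [required for required in (database, schema) if required.lower() not in present]


def add_missing_tags(tags, database, schema):
    """Append any missing required tags, leaving existing tags untouched."""
    return list(tags) + missing_required_tags(tags, database, schema)


tags = ["Parcels", "sgid"]
print(missing_required_tags(tags, "SGID", "Cadastre"))  # ['Cadastre']
print(add_missing_tags(tags, "SGID", "Cadastre"))       # ['Parcels', 'sgid', 'Cadastre']
```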

Summary

Checks to make sure that the summary is less than 2048 characters (a limitation of AGOL) and that it is shorter than the description.
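Both summary rules reduce to length comparisons. This sketch assumes plain-text summary and description strings (metadata stored as HTML might need its markup stripped first); summary_problems is a hypothetical helper that returns a list of readable problems, empty when the summary passes.

```python
MAX_SUMMARY_LENGTH = 2048  # AGOL summary limit


def summary_problems(summary, description):
    """Return human-readable problems with the summary; an empty list means it passes."""
    problems = []
    if len(summary) >= MAX_SUMMARY_LENGTH:
        problems.append(
            f"summary is {len(summary)} characters; it must be less than {MAX_SUMMARY_LENGTH}"
        )
    if len(summary) >= len(description):
        problems.append("summary is not shorter than the description")
    return problems


print(summary_problems("Parcel boundaries.", "Parcel boundaries for every county in Utah."))  # []
print(len(summary_problems("x" * 2048, "short")))  # 2
```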

Description

Checks to make sure that the description contains a link to a data page on gis.utah.gov.
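A regular expression is one way to sketch this check. The pattern below accepts any http(s) link with a path on gis.utah.gov; that is an assumption, and sweeper's real check may be stricter about which paths count as a data page. The example URL is hypothetical.

```python
import re

# Assumed pattern: an http(s) link to gis.utah.gov with a non-empty path.
DATA_PAGE_LINK = re.compile(r"https?://(?:www\.)?gis\.utah\.gov/\S+", re.IGNORECASE)


def has_data_page_link(description):
    return DATA_PAGE_LINK.search(description) is not None


print(has_data_page_link('See <a href="https://gis.utah.gov/data/example/">the data page</a>.'))  # True
print(has_data_page_link("No link here."))  # False
```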

Use Limitations

Checks to make sure that the text in this section matches the official text for UGRC.

--try-fix updates the text to match the official text.
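A minimal sketch of the comparison and the --try-fix replacement. OFFICIAL_USE_LIMITATIONS is a placeholder, not UGRC's official text, and ignoring whitespace differences is an assumption; check_use_limitations is a hypothetical helper.

```python
# Placeholder only: the official UGRC use limitations text is not reproduced here.
OFFICIAL_USE_LIMITATIONS = "Official use limitations text goes here."


def normalize_whitespace(text):
    return " ".join(text.split())


def check_use_limitations(text, try_fix=False):
    """Return (matched, text): whether the original text matched the official text,
    and the text to keep (the official text when try_fix is set and it did not match)."""
    if normalize_whitespace(text or "") == normalize_whitespace(OFFICIAL_USE_LIMITATIONS):
        return True, text
    if try_fix:
        return False, OFFICIAL_USE_LIMITATIONS
    return False, text


print(check_use_limitations("Outdated text", try_fix=True))
# (False, 'Official use limitations text goes here.')
```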

Parsing Addresses

This project contains a module, sweeper.address_parser, that can be used as a standalone address parser. This allows developers to take advantage of sweeper's advanced address parsing and normalization without having to run the entire sweeper process.

Usage Example

from sweeper.address_parser import Address

address = Address('123 South Main Street')
print(address)

'''
--> Parsed Address:
{'address_number': '123',
 'normalized': '123 S MAIN ST',
 'prefix_direction': 'S',
 'street_name': 'MAIN',
 'street_type': 'ST'}
'''

Available Address class properties

All properties default to None if there is no parsed value.

  • address_number
  • address_number_suffix
  • prefix_direction
  • street_name
  • street_direction
  • street_type
  • unit_type
  • unit_id: If no unit_type is found, this property is prefixed with # (e.g. # 3). If a unit_type is found, the # is stripped from this property.
  • city
  • zip_code
  • po_box: The PO box number if a PO box address was entered (e.g. po_box is 1 for p.o. box 1).
  • normalized: A normalized string representing the entire address that was passed into the constructor. PO boxes are normalized in the format PO BOX <number>.
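Because every property defaults to None, a caller often wants only the parts that actually parsed. The sketch below collects them into a dict, in the order of the property list for the Address class. parsed_parts is a hypothetical helper, and a SimpleNamespace stands in for an Address instance so the example runs without sweeper installed; with sweeper available you would pass Address('123 South Main Street') instead.

```python
from types import SimpleNamespace

# Property names documented for sweeper.address_parser.Address.
ADDRESS_PROPERTIES = (
    "address_number", "address_number_suffix", "prefix_direction", "street_name",
    "street_direction", "street_type", "unit_type", "unit_id", "city", "zip_code",
    "po_box", "normalized",
)


def parsed_parts(address):
    """Return the properties that parsed, skipping those left at their None default."""
    parts = {}
    for name in ADDRESS_PROPERTIES:
        value = getattr(address, name, None)
        if value is not None:
            parts[name] = value
    return parts


stand_in = SimpleNamespace(
    address_number="123", prefix_direction="S", street_name="MAIN",
    street_type="ST", normalized="123 S MAIN ST",
)
print(parsed_parts(stand_in))
# {'address_number': '123', 'prefix_direction': 'S', 'street_name': 'MAIN', 'street_type': 'ST', 'normalized': '123 S MAIN ST'}
```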

Installation (requires ArcGIS Pro 2.7+)

  1. Clone the ArcGIS Pro conda environment
    • conda create --name sweeper --clone arcgispro-py3
  2. Activate the environment
    • activate sweeper
  3. Install sweeper
    • pip install ugrc-sweeper
  4. Optionally, copy config.sample.json to config.json in the folder where you will run sweeper.

Caution: config.json is required for the following functions:

  • --scheduled argument (required for sending emails)
  • --change-detect argument
  • using user-specific connection files via the CONNECTIONS_FOLDER config value

Exclusions

Tables can be skipped by adding values to the EXCLUSIONS.<sweeper_key> config array. These values are matched against table names using fnmatch. Note that these do not apply when using the --table-name argument.
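The matching rule can be sketched with Python's fnmatch. The config fragment below is hypothetical: the sweeper keys and table patterns are illustrative only, and is_excluded is not a sweeper function. Note that fnmatch.fnmatch normalizes case with os.path.normcase, so matching is case-insensitive on Windows and case-sensitive elsewhere.

```python
from fnmatch import fnmatch

# Hypothetical config fragment; real sweeper keys and table names may differ.
config = {
    "EXCLUSIONS": {
        "duplicates": ["SGID.Cadastre.*", "*.Temp*"],
        "empties": [],
    }
}


def is_excluded(table_name, sweeper_key, config):
    """True if table_name matches any fnmatch pattern listed for sweeper_key."""
    patterns = config.get("EXCLUSIONS", {}).get(sweeper_key, [])
    return any(fnmatch(table_name, pattern) for pattern in patterns)


tables = ["SGID.Cadastre.Parcels", "SGID.Water.Lakes", "SGID.Water.TempLakes"]
print([t for t in tables if not is_excluded(t, "duplicates", config)])  # ['SGID.Water.Lakes']
```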

Development

  1. Clone the ArcGIS Pro conda environment
    • conda create --name sweeper --clone arcgispro-py3
  2. Activate the environment
    • activate sweeper
  3. Install the dependencies required to work on sweeper
    • pip install -e ".[tests]"
  4. test_metadata.py uses a SQL Server database that must be restored from src/sweeper/tests/data/Sweeper.bak to your local SQL Server instance.
  5. Run sweeper: sweeper
  6. Test: pytest
  7. Lint: ruff check .
  8. Format: ruff format .



Download files

Download the file for your platform.

Source Distribution

ugrc_sweeper-2.0.8.tar.gz (30.3 kB)

Uploaded Source

Built Distribution


ugrc_sweeper-2.0.8-py3-none-any.whl (25.2 kB)

Uploaded Python 3

File details

Details for the file ugrc_sweeper-2.0.8.tar.gz.

File metadata

  • Download URL: ugrc_sweeper-2.0.8.tar.gz
  • Upload date:
  • Size: 30.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ugrc_sweeper-2.0.8.tar.gz
Algorithm Hash digest
SHA256 9a318802609ba2717ec636e82252d292a62578ee385421e53e81e4e075e20226
MD5 fc964dbc7400b68a0d53c03fd15ea84e
BLAKE2b-256 77caa9a9b9bf81f569d5a478cd3cd1f3550b4ce2207d3c8887f7e87e5e811eaa


Provenance

The following attestation bundles were made for ugrc_sweeper-2.0.8.tar.gz:

Publisher: release.yml on agrc/sweeper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ugrc_sweeper-2.0.8-py3-none-any.whl.

File metadata

  • Download URL: ugrc_sweeper-2.0.8-py3-none-any.whl
  • Upload date:
  • Size: 25.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ugrc_sweeper-2.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 ab147be9d0ce7d8c15c9ca4ad297e69ae40e00afd5e30f196410a8a7e17dc3c3
MD5 622c3398511c914b0b0004193dbd5905
BLAKE2b-256 49af880cb28aa03ed4dd47d12d4e48fc8f56e956d3a678b5d746a994bde25654


Provenance

The following attestation bundles were made for ugrc_sweeper-2.0.8-py3-none-any.whl:

Publisher: release.yml on agrc/sweeper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
