CLI tool for making good data

agrc-sweeper

The data cleaning service.

Available Sweepers

Addresses

Checks that addresses have minimum required parts and optionally normalizes them.

Duplicates

Checks for duplicate features.

Empties

Checks for empty geometries.

Metadata

Checks to make sure that the metadata meets the Basic SGID Metadata Requirements.

Tags

Checks to make sure that existing tags are cased appropriately. This means that they are title-cased, except for known abbreviations (e.g. UGRC, BLM) and articles (e.g. a, the, of).

This check also verifies that the data set contains a tag that matches the database name (e.g. SGID) and the schema (e.g. Cadastre).

--try-fix adds missing required tags and title-cases any existing tags.
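The tag rules above can be sketched roughly as follows. This is an illustration only, not sweeper's actual implementation: the function names and the two word lists are hypothetical, and lowercasing an article even when it is the first word is an assumption.

```python
# Hypothetical word lists for illustration; sweeper's real lists may differ.
ABBREVIATIONS = {"UGRC", "BLM", "SGID"}
ARTICLES = {"a", "the", "of"}


def title_case_tag(tag):
    """Title-case a tag, keeping abbreviations upper-case and articles lower-case."""
    words = []
    for word in tag.split():
        if word.upper() in ABBREVIATIONS:
            words.append(word.upper())
        elif word.lower() in ARTICLES:
            words.append(word.lower())
        else:
            words.append(word.capitalize())
    return " ".join(words)


def missing_required_tags(tags, database, schema):
    """Return the required tags (database name, schema) that are not present."""
    present = {tag.lower() for tag in tags}
    return [name for name in (database, schema) if name.lower() not in present]


print(title_case_tag("bureau of land management"))
print(missing_required_tags(["SGID", "Roads"], "SGID", "Cadastre"))
```

A fix step in this style would append whatever missing_required_tags returns and replace each existing tag with its title-cased form.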

Summary

Checks to make sure that the summary is less than 2048 characters (a limitation of AGOL) and that it is shorter than the description.
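As a minimal sketch of the two summary rules, assuming the summary and description are already available as plain strings (check_summary and MAX_SUMMARY_LENGTH are hypothetical names, not part of sweeper's API):

```python
# The 2048-character limit comes from the rule above; the names are illustrative.
MAX_SUMMARY_LENGTH = 2048


def check_summary(summary, description):
    """Return a list of human-readable issues; an empty list means the summary passes."""
    issues = []
    if len(summary) >= MAX_SUMMARY_LENGTH:
        issues.append(f"Summary must be less than {MAX_SUMMARY_LENGTH} characters.")
    if len(summary) >= len(description):
        issues.append("Summary must be shorter than the description.")
    return issues


print(check_summary("Statewide parcels.", "A longer description of the parcel data."))
```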

Description

Checks to make sure that the description contains a link to a data page on gis.utah.gov.
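One way such a check could work is a simple pattern search. The sketch below assumes that any URL on gis.utah.gov with a path counts as a data page; the real check may be stricter, and has_data_page_link is a hypothetical name.

```python
import re

# Assumption: any http(s) URL on gis.utah.gov followed by a path is a data page link.
DATA_PAGE_PATTERN = re.compile(r"https?://gis\.utah\.gov/\S+", re.IGNORECASE)


def has_data_page_link(description):
    """Return True if the description text contains a gis.utah.gov link with a path."""
    return DATA_PAGE_PATTERN.search(description) is not None


print(has_data_page_link("This description has no link at all."))
```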

Use Limitations

Checks to make sure that the text in this section matches the official text for UGRC.

--try-fix updates the text to match the official text.

Parsing Addresses

This project contains a module, sweeper.address_parser, that can be used as a standalone address parser. This allows developers to take advantage of sweeper's advanced address parsing and normalization without having to run the entire sweeper process.

Usage Example

from sweeper.address_parser import Address

address = Address('123 South Main Street')
print(address)

'''
--> Parsed Address:
{'address_number': '123',
 'normalized': '123 S MAIN ST',
 'prefix_direction': 'S',
 'street_name': 'MAIN',
 'street_type': 'ST'}
'''

Available Address class properties

All properties default to None if there is no parsed value.

address_number

address_number_suffix

prefix_direction

street_name

street_direction

street_type

unit_type

unit_id: If no unit_type is found, this property is prefixed with # (e.g. # 3). If unit_type is found, # is stripped from this property.

city

zip_code

po_box: The PO Box number if a PO-box-type address was entered (e.g. po_box would be 1 for p.o. box 1).

normalized: A normalized string representing the entire address that was passed into the constructor. PO Boxes are normalized in this format: PO BOX <number>.
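The unit_id and PO Box conventions described above can be illustrated with a small standalone sketch. It does not use or reproduce sweeper's parser; format_unit_id, normalize_po_box, and the regular expression are hypothetical and only mirror the documented behavior.

```python
import re


def format_unit_id(unit_id, unit_type=None):
    """Prefix the unit id with '# ' when no unit type is known; otherwise strip '#'."""
    bare = unit_id.lstrip("#").strip()
    if unit_type is None:
        return f"# {bare}"
    return bare


# Assumption: PO Box input looks like "po box 1", "p.o. box 1", or "P O Box 1".
PO_BOX_PATTERN = re.compile(r"p\.?\s*o\.?\s*box\s*(\d+)", re.IGNORECASE)


def normalize_po_box(text):
    """Return 'PO BOX <number>' for PO Box input, or None if no PO Box is found."""
    match = PO_BOX_PATTERN.search(text)
    if match is None:
        return None
    return f"PO BOX {match.group(1)}"


print(format_unit_id("#3"))
print(format_unit_id("# 3", unit_type="APT"))
print(normalize_po_box("p.o. box 1"))
```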

Installation (requires Pro 2.7+)

  1. clone arcgis conda environment
    • conda create --name sweeper --clone arcgispro-py3
  2. activate environment
    • activate sweeper
  3. install sweeper
    • pip install agrc-sweeper

Development

  1. clone arcgis conda environment
    • conda create --name sweeper --clone arcgispro-py3
  2. activate environment
    • activate sweeper
  3. test_metadata.py uses a SQL database that must be restored to your local SQL Server from src/sweeper/tests/data/Sweeper.bak.

Installing dependencies

  1. clone arcgis conda environment
    • conda create --name sweeper --clone arcgispro-py3
  2. install only required dependencies to run sweeper
    • pip install -e .
  3. install required dependencies to work on sweeper
    • pip install -e ".[develop]"
  4. install required dependencies to run sweeper tests
    • pip install -e ".[tests]"
  5. run tests: pytest

Download files

Source Distribution

agrc-sweeper-1.4.3.tar.gz (21.2 kB)

Uploaded Source

Built Distribution

agrc_sweeper-1.4.3-py3-none-any.whl (26.2 kB)

Uploaded Python 3

File details

Details for the file agrc-sweeper-1.4.3.tar.gz.

File metadata

  • Download URL: agrc-sweeper-1.4.3.tar.gz
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for agrc-sweeper-1.4.3.tar.gz
Algorithm Hash digest
SHA256 4a4b2ed4594e85a54b572d6f777668ec1d587ec3eea056aeb9cd43bb6e7b8411
MD5 480d594cc1d61335fb6f6fd02c3ce93b
BLAKE2b-256 5fae8bef5cde01d589ab89a66b86e411c611b59a885e726060b907392b091074

File details

Details for the file agrc_sweeper-1.4.3-py3-none-any.whl.

File metadata

  • Download URL: agrc_sweeper-1.4.3-py3-none-any.whl
  • Size: 26.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.1 CPython/3.11.4

File hashes

Hashes for agrc_sweeper-1.4.3-py3-none-any.whl
Algorithm Hash digest
SHA256 ad8ca8d7acdbb0a89af406a4df0a3d4c7e4435bdccf1c1303c986b13025a6c88
MD5 11f9e060eae119018d95a18fe4a7b49d
BLAKE2b-256 356bc9d6272f71dd2593e422ff02cb9dab4ffed15e6600df6a9c42b0226a7870
