Overview

This project attempts to map the organization of the US Federal Government by gathering and consolidating information from various directories.

Current sources:

Each source is scraped into raw JSON format (see the out directory). Each record includes the organizational unit's name and its parent's name (if any), unique ID and parent-ID fields (where the names are not unique), and any other attribute data for that organization available from that source.
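
As a rough illustration of that record shape, the sketch below loads a few made-up records and checks that every parent ID resolves to a known record. The field names (id, parent_id, name, attributes) and the sample values are assumptions for illustration only, not the project's actual schema.

```python
import json

# Hypothetical records for illustration; the field names are assumptions,
# not the project's actual schema.
RAW = """
[
  {"id": "100", "parent_id": null, "name": "DEPARTMENT OF EXAMPLES",
   "attributes": {"website": "https://example.gov"}},
  {"id": "110", "parent_id": "100", "name": "OFFICE OF SAMPLES",
   "attributes": {}},
  {"id": "120", "parent_id": "999", "name": "BUREAU OF ORPHANS",
   "attributes": {}}
]
"""


def dangling_parents(records):
    """Return records whose parent_id does not match any record's id."""
    known_ids = {r["id"] for r in records}
    return [
        r for r in records
        if r["parent_id"] is not None and r["parent_id"] not in known_ids
    ]


records = json.loads(RAW)
orphans = dangling_parents(records)
print([r["name"] for r in orphans])  # ['BUREAU OF ORPHANS']
```

A check like this is one way to catch records that would otherwise be left out of the tree when a source is imported.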

A normalized name (still a work in progress) is then added, which corrects letter case and spacing and expands acronyms. Acronyms are selected and verified manually using data from UCSD GovSpeak and the DOD Dictionary of Military and Associated Terms, with manual entry where needed.
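
A minimal sketch of this kind of normalization is shown below. It is not the project's implementation: the acronym table, the list of minor words and the function name normalize_name are all hypothetical, and a simple title-casing rule like this one will mishandle names such as "U.S.".

```python
import re

# Hypothetical acronym table; the project curates its own list by hand.
ACRONYMS = {"DOD": "Department of Defense", "OFC": "Office"}

# Small words kept lowercase unless they start the name (an assumption).
MINOR_WORDS = {"of", "the", "and", "for", "in"}


def normalize_name(raw, acronyms=ACRONYMS):
    """Collapse whitespace, expand known acronyms, and apply title case."""
    words = re.sub(r"\s+", " ", raw).strip().split(" ")
    expanded = []
    for word in words:
        # An expansion may contain several words, so split it again.
        expanded.extend(acronyms.get(word.upper(), word).split(" "))
    result = []
    for i, word in enumerate(expanded):
        lower = word.lower()
        if i > 0 and lower in MINOR_WORDS:
            result.append(lower)
        else:
            result.append(lower.capitalize())
    return " ".join(result)


print(normalize_name("  OFC   of the  SECRETARY "))
print(normalize_name("dod  EDUCATION activity"))
```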

Each source is then imported into a tree and exported into the following formats for easy consumption:

  • Plain text tree
  • JSON flat format (with path to each element)
  • JSON nested tree format
  • CSV format (with embedded JSON attributes)
  • Wide CSV format (with flattened attributes)
  • DOT file (does not include attributes)
  • GEXF graph file (includes flattened attributes)
  • GraphML graph file (includes flattened attributes)
  • Cytoscape.js JSON format (includes flattened attributes)
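
To make the tree import and a few of the simpler exports concrete, here is a small sketch that builds a tree from flat records and renders it as a nested structure, a flat list with paths, and plain text. The record fields and the function names are assumptions, and the project's real output layouts may differ.

```python
import json

# Hypothetical flat records; field names are assumptions for illustration.
records = [
    {"id": "1", "parent_id": None, "name": "Department of Examples"},
    {"id": "2", "parent_id": "1", "name": "Office of Samples"},
    {"id": "3", "parent_id": "2", "name": "Division of Tests"},
]


def build_tree(records):
    """Index records by ID and attach each one to its parent's children."""
    nodes = {r["id"]: {"name": r["name"], "children": []} for r in records}
    roots = []
    for r in records:
        node = nodes[r["id"]]
        parent = nodes.get(r["parent_id"])
        if parent is None:
            roots.append(node)  # no known parent: treat as a root
        else:
            parent["children"].append(node)
    return roots


def to_flat(nodes, path=()):
    """Yield one dict per node, each carrying the path from the root."""
    for node in nodes:
        node_path = path + (node["name"],)
        yield {"name": node["name"], "path": list(node_path)}
        yield from to_flat(node["children"], node_path)


def to_text(nodes, depth=0):
    """Render the tree as indented plain text, two spaces per level."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node["name"])
        lines.extend(to_text(node["children"], depth + 1))
    return lines


tree = build_tree(records)
print(json.dumps(tree, indent=2))        # nested tree
print(json.dumps(list(to_flat(tree))))   # flat list with paths
print("\n".join(to_text(tree)))          # plain text tree
```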

To merge the lists, each tree is merged into a selected base tree. The normalized name of each node in the tree is compared with the names of the nodes in the base tree using a fuzzy matching algorithm. The similarity score between each pair of parents is incorporated into the overall score, which helps to identify cases where the same or a similar office or program name is used by different organizations.

Note that the fuzzy matching is imperfect: some mappings may be inaccurate (although most appear OK), and there will certainly be some entries that should have been merged but were not.
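
The idea of folding parent similarity into the match score can be sketched as follows. This is an illustration under stated assumptions rather than the project's algorithm: it uses difflib.SequenceMatcher from the Python standard library as a stand-in similarity measure, and the parent weight, the threshold and the node fields (name, parent) are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_score(node, candidate, parent_weight=0.3):
    """Blend name similarity with the similarity of the parents' names."""
    name_score = similarity(node["name"], candidate["name"])
    parent_score = similarity(node["parent"], candidate["parent"])
    return (1 - parent_weight) * name_score + parent_weight * parent_score


def best_match(node, base_nodes, threshold=0.85):
    """Return the best-scoring base node, or None if none clears the threshold."""
    best = max(base_nodes, key=lambda c: combined_score(node, c), default=None)
    if best is not None and combined_score(node, best) >= threshold:
        return best
    return None


# Two base nodes share a name but sit under different parents.
base = [
    {"name": "Office of the Secretary", "parent": "Department of Energy"},
    {"name": "Office of the Secretary", "parent": "Department of Labor"},
]
node = {"name": "Office of the Secretary", "parent": "Department of Labor"}
print(best_match(node, base)["parent"])  # Department of Labor
```

Without the parent term both base nodes would tie on name alone. A node that clears the threshold for no candidate returns None, which corresponds to an entry left unmerged.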

The final merged dataset is written in the above formats to the data/merged directory.

Setup

Requirements

Installation

Check out this repository, then from the repository root, install dependencies:

poetry install

See command line usage:

poetry run allusgov --help

Run a complete scrape and merge:

poetry run allusgov
