Pydantic validators for mySociety democracy types

mysoc-validator

A set of pydantic-based validators and classes for common mySociety democracy formats.

Currently supports:

  • Popolo database
  • Transcript format (old-style XML and new JSON format)
  • Interests format

XML-based formats are tested to round-trip with themselves, but the output is not guaranteed to be string-identical to the original source.
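To illustrate the difference between round-tripping and string identity, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It does not use mysoc-validator itself, and the sample document and its attribute values are made up for the example.

```python
import xml.etree.ElementTree as ET


def canonical(xml_text: str) -> str:
    """Return a C14N form that ignores quote style and attribute spacing."""
    return ET.canonicalize(xml_text, strip_text=True)


# Illustrative input: single quotes and extra spaces inside the tag.
original = """<publicwhip>
  <speech id='a1'   speakername="Example Speaker">Hello</speech>
</publicwhip>"""

# First pass: parse and serialise. The serialiser picks its own quoting and
# spacing, so the result need not match the source string.
first_pass = ET.tostring(ET.fromstring(original), encoding="unicode")

# Second pass: parsing and serialising our own output should be stable.
second_pass = ET.tostring(ET.fromstring(first_pass), encoding="unicode")

print("identical to source:", first_pass == original)
print("round-trips with itself:", first_pass == second_pass)
print("same content as source:", canonical(first_pass) == canonical(original))
```

The useful property is the second one: once a file has been loaded and written out, loading and writing it again produces the same output.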

Can be installed with pip install mysoc-validator

To use as a CLI validator:

python -m mysoc_validator popolo validate path-to-people.json
python -m mysoc_validator transcript validate path-to-transcript.xml
python -m mysoc_validator transcript validate transcripts/
python -m mysoc_validator transcript validate path-to-*.xml --glob
python -m mysoc_validator interests validate path-to-interests.xml

To see all options use python -m mysoc_validator --help or python -m mysoc_validator popolo tui.

Or, if using uvx (no need to install first):

uvx mysoc-validator popolo validate path-to-people.json

To validate and consistently format:

uvx mysoc-validator format people.json
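If the validator needs to be called from a Python script (for example in a pre-commit or CI step), the validate commands above can be wrapped with subprocess. This is a sketch only: build_validate_command and run_validation are hypothetical helper names, and treating a non-zero exit code as "validation failed" is an assumption that this README does not state.

```python
import subprocess
import sys
from typing import List

KINDS = {"popolo", "transcript", "interests"}


def build_validate_command(kind: str, target: str, glob: bool = False) -> List[str]:
    """Build the argv list for one of the validate commands listed above."""
    if kind not in KINDS:
        raise ValueError(f"unknown validator kind: {kind!r}")
    command = [sys.executable, "-m", "mysoc_validator", kind, "validate", target]
    if glob:
        # The README only shows --glob together with the transcript validator.
        command.append("--glob")
    return command


def run_validation(kind: str, target: str, glob: bool = False) -> bool:
    """Run the validator; assumes (unverified) that failure gives a non-zero exit."""
    completed = subprocess.run(build_validate_command(kind, target, glob), check=False)
    return completed.returncode == 0


# Show the command without running it.
print(build_validate_command("transcript", "path-to-*.xml", glob=True)[1:])
```

Passing the arguments as a list avoids shell expansion, so the glob pattern reaches the validator unchanged.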

Modification functions

See python -m mysoc_validator popolo --help for functions to change parties/whip and add alt names.

Popolo

A pydantic-based validator for the main mySociety people.json file (which mostly follows the Popolo standard, with a few extra bits).

Validates:

  • Basic structure
  • Unique IDs and ID Patterns
  • Foreign key relationships between objects.
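As a rough idea of what these checks involve, the sketch below runs them over a plain dict shaped like a cut-down people.json. It is not the library's implementation: check_popolo is a hypothetical function, the sample data is invented, and the membership ID format is assumed by analogy with the person ID format used elsewhere in this README.

```python
import re
from typing import Dict, List

# Person IDs in this README look like "uk.org.publicwhip/person/10001".
PERSON_ID = re.compile(r"^uk\.org\.publicwhip/person/\d+$")


def check_popolo(data: Dict[str, List[dict]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    person_ids = set()
    for person in data.get("persons", []):
        pid = person["id"]
        if not PERSON_ID.match(pid):
            errors.append(f"bad person id pattern: {pid}")
        if pid in person_ids:
            errors.append(f"duplicate person id: {pid}")
        person_ids.add(pid)
    # Foreign key check: every membership must point at a known person.
    for membership in data.get("memberships", []):
        if membership["person_id"] not in person_ids:
            errors.append(
                f"membership {membership['id']} refers to unknown person "
                f"{membership['person_id']}"
            )
    return errors


sample = {
    "persons": [{"id": "uk.org.publicwhip/person/10001"}],
    "memberships": [
        {"id": "uk.org.publicwhip/member/1", "person_id": "uk.org.publicwhip/person/10001"},
        {"id": "uk.org.publicwhip/member/2", "person_id": "uk.org.publicwhip/person/10002"},
    ],
}

errors = check_popolo(sample)
for error in errors:
    print(error)
```

Collecting all problems into a list, rather than stopping at the first, makes it easier to fix a file in one pass.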

It also supports looking up a person from a name or an identifier, and generating new IDs for memberships.

Using name or ID lookup

After first use, there is some caching behind the scenes to speed this up.

from mysoc_validator import Popolo
from mysoc_validator.models.popolo import Chamber, IdentifierScheme
from datetime import date

popolo = Popolo.from_parlparse()

keir_starmer_parl_id = popolo.persons.from_identifier(
    "4514", scheme=IdentifierScheme.MNIS
)
keir_starmer_name = popolo.persons.from_name(
    "keir starmer", chamber_id=Chamber.COMMONS, date=date.fromisoformat("2022-07-31")
)

# Both lookups should resolve to the same person.
assert keir_starmer_parl_id.id == keir_starmer_name.id
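The caching mentioned above can be pictured as an index that is built on the first lookup and reused afterwards. The following is a standard-library sketch of that idea, not the library's own code: NameIndex and NameRecord are hypothetical classes, and the record shown is invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class NameRecord:
    person_id: str
    name: str
    start: date
    end: date


class NameIndex:
    """Date-aware name lookup that builds its index lazily, once."""

    def __init__(self, records: Iterable[NameRecord]):
        self._records = list(records)
        self._by_name: Optional[Dict[str, List[NameRecord]]] = None
        self.builds = 0  # counts index builds, to show the cache is reused

    def _index(self) -> Dict[str, List[NameRecord]]:
        if self._by_name is None:
            self.builds += 1
            index: Dict[str, List[NameRecord]] = {}
            for record in self._records:
                index.setdefault(record.name.lower(), []).append(record)
            self._by_name = index
        return self._by_name

    def from_name(self, name: str, on: date) -> Optional[str]:
        # Case-insensitive match, restricted to records valid on the given date.
        for record in self._index().get(name.strip().lower(), []):
            if record.start <= on <= record.end:
                return record.person_id
        return None


index = NameIndex(
    [NameRecord("uk.org.publicwhip/person/10001", "Example Person", date(2019, 12, 12), date(2024, 5, 30))]
)
first = index.from_name("example person", date.fromisoformat("2022-07-31"))
second = index.from_name("Example Person", date.fromisoformat("2030-01-01"))
print(first, second, index.builds)
```

The date argument matters because the same name can map to different people (or to nobody) at different times.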

Transcripts

Python validator and handler for 'publicwhip' style transcript format.

from mysoc_validator import Transcript
from pathlib import Path

transcript_file = Path("data", "debates2023-03-28d.xml")

transcript = Transcript.from_xml_path(transcript_file)

Register of Interests

Python validator and handler for the register of interests, in both the new-style generic JSON format and the 'publicwhip' style XML format.

For the new-style generic JSON format:

from mysoc_validator import RegmemRegister
from pathlib import Path

register_file = Path("data", "commons-regmem-2025-01-20.json")
interests = RegmemRegister.from_path(register_file)

For the 'publicwhip' style XML format:

from mysoc_validator import XMLRegister
from pathlib import Path

register_file = Path("data", "regmem2024-05-28.xml")
interests = XMLRegister.from_xml_path(register_file)

Info fields

We have various XML files in parlparse that are loaded into TheyWorkForYou (TWFY) as extra info for people or constituencies.

This library has two approaches for this: a general permissive model that can load any file, and tools to create models that add validation for particular files if needed.

Load any file

from mysoc_validator.models.info import InfoCollection, PersonInfo, ConsInfo

social_media_links = InfoCollection[PersonInfo].from_parlparse("social-media-commons")
constituency_links = InfoCollection[ConsInfo].from_parlparse("constituency-links")

And this is an example of creating a more bespoke model for a particular file. Subclassing PersonInfo switches the 'extras' setting from 'allow' to 'forbid'.

from typing import Optional

from mysoc_validator.models.info import InfoCollection, PersonInfo

class SocialInfo(PersonInfo):
    facebook_page: Optional[str] = None
    twitter_username: Optional[str] = None
    bluesky_handle: Optional[str] = None

social_media_links = InfoCollection[SocialInfo].from_parlparse("social-media-commons")

To pass dicts across the XML boundary (although this implies a change to how things are imported), do the following:

from mysoc_validator.models.info import InfoCollection, PersonInfo
from mysoc_validator.models.xml_base import XMLDict, AsAttrStr

class DemoDataModel(PersonInfo):
    regmem_info: XMLDict
    random_string: AsAttrStr


item = DemoDataModel(
    person_id="uk.org.publicwhip/person/10001",
    regmem_info={"hello": ["yes", "no"]},
    random_string="banana",
)

items = InfoCollection[DemoDataModel](items=[item])

xml_data = items.model_dump_xml()

# Which looks like
"""
<twfy>
  <personinfo id="uk.org.publicwhip/person/10001">
    <regmem_info>{"hello": ["yes", "no"]}</regmem_info>
    <random_string>banana</random_string>
  </personinfo>
</twfy>
"""

# This can either be round-tripped in the same model, or read by the generic model like this:

generic_read = (
    InfoCollection[PersonInfo].model_validate_xml(xml_data).promote_children()
)
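The XML output shown above stores the dict as JSON text inside an element. The sketch below reproduces that layout with the standard library only, to show why the reader needs to know which fields hold JSON. It is not how XMLDict is implemented; dump_personinfo and load_personinfo are hypothetical helpers written for this illustration.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, Set


def dump_personinfo(person_id: str, fields: Dict[str, Any]) -> str:
    """Write one personinfo entry; non-string values are stored as JSON text."""
    root = ET.Element("twfy")
    info = ET.SubElement(root, "personinfo", {"id": person_id})
    for key, value in fields.items():
        child = ET.SubElement(info, key)
        child.text = value if isinstance(value, str) else json.dumps(value)
    return ET.tostring(root, encoding="unicode")


def load_personinfo(xml_text: str, json_fields: Set[str]) -> Dict[str, Any]:
    """Read the entry back, decoding only the fields declared as JSON."""
    info = ET.fromstring(xml_text).find("personinfo")
    if info is None:
        raise ValueError("no personinfo element found")
    result: Dict[str, Any] = {"person_id": info.get("id")}
    for child in info:
        text = child.text or ""
        result[child.tag] = json.loads(text) if child.tag in json_fields else text
    return result


fields = {"regmem_info": {"hello": ["yes", "no"]}, "random_string": "banana"}
xml_text = dump_personinfo("uk.org.publicwhip/person/10001", fields)

typed = load_personinfo(xml_text, json_fields={"regmem_info"})
untyped = load_personinfo(xml_text, json_fields=set())
print(typed["regmem_info"])
print(untyped["regmem_info"])
```

Without the field declaration the value comes back as a plain string, which is the trade-off between a typed model and the permissive generic one.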
