Canonical company-name normalisation, shared byte-for-byte across EasyBiz services (L-MDS-CR-08).

Project description

easybiz-companyname-normalisation

Canonical company-name normalisation, shared byte-for-byte by the MDS company-resolver and accounting-service (locked decision L-MDS-CR-08).

It is a library, not a service: normalise_name() is a pure, deterministic function on the resolution hot path. A service would add a network hop and an availability dependency for zero benefit.

from easybiz_companyname_normalisation import normalise_name, NORMALISER_VERSION

normalise_name("ACME S.à r.l.")   # -> "acme sarl"
NORMALISER_VERSION                  # -> "1.0.0"

Why this is a shared package and not copy-pasted logic

Both services store normalised values (EntitySynonym.normalised_value, embedding source text). Those stored values are a derived cache; the raw value is the source of truth. If the two services normalised differently, a name stored by one would silently fail to match a query from the other. So the logic lives in exactly one place and both import it.
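To make that failure mode concrete, here is a minimal sketch of lookup by normalised key. The two normalisers are hypothetical stand-ins, not the package's actual rules; they differ only in whether punctuation is dropped. `build_index` and `lookup` are likewise illustrative names, not part of either service.

```python
import re
from typing import Callable, Dict, List, Optional

Normaliser = Callable[[str], str]

def normaliser_a(name: str) -> str:
    # Hypothetical stand-in: lowercase, drop punctuation, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())

def normaliser_b(name: str) -> str:
    # Hypothetical stand-in that differs from normaliser_a: it keeps punctuation.
    return " ".join(name.lower().split())

def build_index(raw_names: List[str], normalise: Normaliser) -> Dict[str, str]:
    # Derived cache: normalised value -> raw value (the raw value stays the source of truth).
    return {normalise(raw): raw for raw in raw_names}

def lookup(index: Dict[str, str], query: str, normalise: Normaliser) -> Optional[str]:
    # Normalise the query and look it up; None means a miss.
    return index.get(normalise(query))

# One service stores the name using normaliser_b...
index = build_index(["ACME S.A."], normaliser_b)
print(lookup(index, "ACME S.A.", normaliser_b))  # ACME S.A.
# ...and a consumer using normaliser_a misses the identical raw name.
print(lookup(index, "ACME S.A.", normaliser_a))  # None
```

Nothing errors in the second lookup; the miss is silent, which is why the logic has to live in one importable place.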

Versioning policy (the important part)

| Bump | When | Consequence |
|---|---|---|
| MAJOR | normalise_name() output changes for any input (e.g. adding a legal form) | Requires a coordinated backfill in MDS: re-normalise stored values from raw_value and re-embed them. Both services upgrade in lockstep. |
| MINOR / PATCH | Tests, perf, docs, adding fixture rows | Must never change output. |
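The rule in the table can be checked mechanically: run the old and new implementations over the same inputs and treat any differing output as MAJOR. A minimal sketch, assuming both implementations can be loaded side by side as plain callables; `output_changes` and `required_bump` are hypothetical helpers, not part of the package.

```python
from typing import Callable, Iterable, List, Tuple

Normaliser = Callable[[str], str]

def output_changes(inputs: Iterable[str], old: Normaliser, new: Normaliser) -> List[Tuple[str, str, str]]:
    # Collect (input, old_output, new_output) for every input whose output differs.
    changes = []
    for raw in inputs:
        before, after = old(raw), new(raw)
        if before != after:
            changes.append((raw, before, after))
    return changes

def required_bump(inputs: Iterable[str], old: Normaliser, new: Normaliser) -> str:
    # Any output change on any sampled input means MAJOR; otherwise MINOR or PATCH is allowed.
    return "MAJOR" if output_changes(inputs, old, new) else "MINOR/PATCH"

if __name__ == "__main__":
    samples = ["ACME S.A.", "Foo SARL"]
    old = str.lower
    # Hypothetical rule change: the new version also strips dots.
    new = lambda s: s.lower().replace(".", "")
    print(required_bump(samples, old, new))  # MAJOR
    print(required_bump(samples, old, old))  # MINOR/PATCH
```

An empty diff only covers the inputs sampled, so it is evidence rather than proof that output is unchanged for every input; the fixture rows are a natural sample to feed it.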

NORMALISER_VERSION equals the package version. MDS persists it alongside each stored value (EntitySynonym.normaliser_version, mirroring EntityEmbedding.embedding_model) so that stale rows are detectable; a renormalise_stale management command re-derives, from raw_value, every row whose stored version differs from the current one. Version skew only ever degrades a match to a miss (→ human review), never to a wrong match, which is the safe failure direction.
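The stale-row rule can be sketched in plain Python. This is not the actual management command: `SynonymRow` and its fields are hypothetical stand-ins for EntitySynonym, and `normalise` is injected in place of normalise_name() so the logic runs without MDS. In the real command the rows would presumably come from a query such as `EntitySynonym.objects.exclude(normaliser_version=NORMALISER_VERSION)`, which is an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class SynonymRow:
    # Hypothetical stand-in for an EntitySynonym row.
    raw_value: str
    normalised_value: str
    normaliser_version: str

def renormalise_stale(rows: Iterable[SynonymRow], normalise: Callable[[str], str], current_version: str) -> int:
    """Re-derive every row whose normaliser_version != current_version; return how many changed."""
    updated = 0
    for row in rows:
        if row.normaliser_version == current_version:
            continue  # already derived by the current normaliser
        # Always recompute from raw_value (the source of truth), never from the cached value.
        row.normalised_value = normalise(row.raw_value)
        row.normaliser_version = current_version
        updated += 1
    return updated
```

Re-deriving from raw_value rather than from the old normalised value matters: the old value was produced by different rules and may already have lost information.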

The parity contract

tests/test_normalisation.py runs against fixtures/lu_names.csv, which ships inside the package. Running this test in each consumer's CI proves that consumer uses the canonical behaviour for the version it pinned.
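A consumer-side version of that check might look like the sketch below. It assumes the fixture is packaged at `<package>/fixtures/lu_names.csv` with `raw` and `expected` columns; the layout, the column names, and both helper names are hypothetical, so the package's own tests/test_normalisation.py remains the reference.

```python
import csv
import io
from importlib.resources import files
from typing import Callable, List, Tuple

def load_fixture_text(package: str = "easybiz_companyname_normalisation") -> str:
    # Assumed layout: the CSV is shipped as package data under fixtures/.
    return (files(package) / "fixtures" / "lu_names.csv").read_text(encoding="utf-8")

def parity_mismatches(csv_text: str, normalise: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    # Assumed columns: "raw" and "expected". Returns (raw, expected, actual) per mismatch.
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        actual = normalise(row["raw"])
        if actual != row["expected"]:
            mismatches.append((row["raw"], row["expected"], actual))
    return mismatches

# Example consumer CI test (assumes the package is installed with the layout above):
#
# def test_canonical_parity():
#     from easybiz_companyname_normalisation import normalise_name
#     assert parity_mismatches(load_fixture_text(), normalise_name) == []
```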

Never edit an existing expected value without a MAJOR bump — that is, by definition, an output change. Adding rows is fine (MINOR) and is the encouraged way to harden coverage as real LU supplier-name shapes surface. The 20 starter rows are illustrative, not a target; there is no requirement to reach any count.
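The edit-versus-add rule can be enforced by diffing the old and new fixture files. A minimal sketch, again assuming `raw` and `expected` columns; `fixture_bump` is a hypothetical helper, and refusing deleted rows is this sketch's own assumption, since the policy above only covers edits and additions.

```python
import csv
import io
from typing import Dict

def read_expected(csv_text: str) -> Dict[str, str]:
    # Assumed fixture columns: "raw" and "expected", one row per raw input.
    return {row["raw"]: row["expected"] for row in csv.DictReader(io.StringIO(csv_text))}

def fixture_bump(old_csv: str, new_csv: str) -> str:
    """Minimum version bump implied by a fixture diff: "MAJOR", "MINOR" or "none"."""
    old, new = read_expected(old_csv), read_expected(new_csv)
    removed = [raw for raw in old if raw not in new]
    if removed:
        # Assumption of this sketch: deletions are not covered by the policy, so refuse them.
        raise ValueError(f"fixture rows removed: {removed}")
    if any(new[raw] != expected for raw, expected in old.items()):
        return "MAJOR"  # an existing expected value changed: by definition an output change
    if len(new) > len(old):
        return "MINOR"  # only new rows were added
    return "none"
```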

Installation

Distribution is deliberately deferred while in solo development. The import path is identical across all options below — only the install source changes.

  • Now (solo inner loop): clone next to the service repos and install editable. Both services point at one working copy; edits are picked up instantly.
    pip install -e ../easybiz-companyname-normalisation
    
  • When CI or a teammate appears — switch to a pinned git tag:
    pip install git+https://github.com/<org>/easybiz-companyname-normalisation@v1.0.0
    
    Pin a tag, not a branch — a branch ref lets two installs drift, which breaks the version-sync contract.
  • Target later (optional): an AWS CodeArtifact private PyPI. Publish with:
    python -m build && twine upload --repository codeartifact dist/*
    
    then consume via aws codeartifact login --tool pip ... and pin ==1.0.0.

Scope

Company-name normalisation only. Identifier checksum validators (IBAN/VAT/RCS) stay accounting-side (the resolver trusts pre-validated input); they are not part of this package.

TODO (deferred, not yet built)

  • Cross-service CI parity guard — once both services have CI, add a check that fails the build if MDS and accounting pin different easybiz-companyname-normalisation versions. Not needed while both use one editable install.
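A minimal sketch of that guard, assuming each service declares the dependency on one line of a requirements file in one of the two pinned forms shown under Installation (`==1.0.0`, or a git URL ending in a `@v1.0.0` tag). The function names and parsing rules are hypothetical; a real guard might read a lock file instead. Branch refs and editable installs are rejected, in line with the tag-not-branch rule.

```python
import re
from typing import Optional

DIST = "easybiz-companyname-normalisation"

# Matches "easybiz-companyname-normalisation==1.0.0".
_PYPI_PIN = re.compile(rf"^{re.escape(DIST)}\s*==\s*(\d+\.\d+\.\d+)\s*$")
# Matches a git URL ending ".../easybiz-companyname-normalisation@v1.0.0" (a tag, not a branch).
_GIT_TAG_PIN = re.compile(rf"/{re.escape(DIST)}(?:\.git)?@v(\d+\.\d+\.\d+)\s*$")

def pinned_version(requirements_text: str) -> Optional[str]:
    """Return the exact version pinned for DIST, or None if the dependency is absent."""
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (assumes no '#' inside URLs)
        if not line:
            continue
        match = _PYPI_PIN.match(line) or _GIT_TAG_PIN.search(line)
        if match:
            return match.group(1)
        if DIST in line:
            # Branch refs, ranges and editable installs are not exact pins.
            raise ValueError(f"{DIST} is not pinned to an exact version or tag: {line!r}")
    return None

def assert_same_pin(mds_requirements: str, accounting_requirements: str) -> str:
    """Fail if MDS and accounting pin different versions; return the shared version."""
    mds = pinned_version(mds_requirements)
    accounting = pinned_version(accounting_requirements)
    if mds is None or accounting is None:
        raise ValueError(f"both services must pin {DIST}")
    if mds != accounting:
        raise ValueError(f"version skew: MDS pins {mds}, accounting pins {accounting}")
    return mds
```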


Download files

Source Distribution

easybiz_companyname_normalisation-1.0.0.tar.gz (10.8 kB)

Built Distribution

easybiz_companyname_normalisation-1.0.0-py3-none-any.whl

File details

Details for the file easybiz_companyname_normalisation-1.0.0.tar.gz.

File hashes

Hashes for easybiz_companyname_normalisation-1.0.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7d3dab288f582a373452c76fb7d771e1dc5a093bb028ec23e423c5daec2fbafc |
| MD5 | 755e0e36e4cef6e2381666bcc526a9a6 |
| BLAKE2b-256 | e2bde0e8b27a26a68ac3cbf6d29ce484704664e1d3efaa70e478f51965597951 |

File details

Details for the file easybiz_companyname_normalisation-1.0.0-py3-none-any.whl.

File hashes

Hashes for easybiz_companyname_normalisation-1.0.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | f499991eeaa3c62301db5dc386afaa3062058548ccb32028cd3dcb01e21bc19a |
| MD5 | e30d187cc8ec616324d0b598aff8b9d0 |
| BLAKE2b-256 | 7c97a533fa40c95ab7ff15baf3346d3950d0df0c7dee3439ea4299c6aded3784 |
