Toolkit to normalize text to UMLS / ontologies

Project description

ClickHouse backend

The DuckDB builder remains the source of truth. Build a DuckDB file with build_merged_duckdb, then upload its canonical tables into ClickHouse:

uv run python scripts/upload_clickhouse.py data/dbs_final/SmallMolecule.duckdb --database normalization

The upload shows a progress bar for each copied table; pass --no-progress to silence it.
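If the upload needs to run from a Python script (for example, in a build pipeline), one option is to assemble the same command line and hand it to subprocess. The sketch below is illustrative only: build_upload_command and run_upload are hypothetical helpers, not part of norm_toolkit, and they use only the script path and the flags shown above.

```python
import subprocess


def build_upload_command(duckdb_path, database, show_progress=True):
    """Assemble the upload command shown above as an argument list.

    Hypothetical helper for illustration; not part of norm_toolkit.
    """
    cmd = [
        "uv", "run", "python", "scripts/upload_clickhouse.py",
        str(duckdb_path),
        "--database", database,
    ]
    if not show_progress:
        # --no-progress silences the per-table progress bars.
        cmd.append("--no-progress")
    return cmd


def run_upload(duckdb_path, database, show_progress=True):
    """Run the upload and raise CalledProcessError if the script fails."""
    cmd = build_upload_command(duckdb_path, database, show_progress)
    subprocess.run(cmd, check=True)


cmd = build_upload_command(
    "data/dbs_final/SmallMolecule.duckdb",
    "normalization",
    show_progress=False,
)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the DuckDB path contains spaces.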

Connection settings are read from .env via python-dotenv, and connections are made with the official clickhouse-connect client. Set CH_HTTP to the server URL, for example http://host:8123/normalization. CH_USER and CH_PASSWORD may be supplied separately; when set, they override any credentials embedded in the URL.
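To make the override rule concrete, here is a minimal sketch of how a CH_HTTP-style URL could be split into connection settings, with CH_USER and CH_PASSWORD taking precedence over credentials in the URL. This is not the toolkit's actual code: resolve_clickhouse_settings is a hypothetical name, and treating the URL path as the database name is an assumption based on the example URL above. In real use the values would come from os.environ after python-dotenv has loaded .env; a plain dict is passed here so the example runs on its own.

```python
import os
from urllib.parse import urlsplit


def resolve_clickhouse_settings(env=None):
    """Split a CH_HTTP-style URL into connection settings.

    Hypothetical helper for illustration; not part of norm_toolkit.
    """
    env = os.environ if env is None else env
    parts = urlsplit(env["CH_HTTP"])
    secure = parts.scheme == "https"
    return {
        "host": parts.hostname,
        # Fall back to ClickHouse's default HTTP(S) ports if none is given.
        "port": parts.port or (8443 if secure else 8123),
        "secure": secure,
        # Separate variables win over credentials embedded in the URL.
        "username": env.get("CH_USER") or parts.username,
        "password": env.get("CH_PASSWORD") or parts.password,
        # Assumption: the URL path, if present, names the database.
        "database": parts.path.strip("/") or None,
    }


settings = resolve_clickhouse_settings({
    "CH_HTTP": "http://url_user:url_pass@host:8123/normalization",
    "CH_USER": "env_user",
    "CH_PASSWORD": "env_pass",
})
print(settings)
```

With only CH_HTTP set, the same function falls back to whatever credentials the URL carries, or to None when it carries none.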

Use the ClickHouse backend from Python:

from norm_toolkit import ClickHouseNormalizer

normalizer = ClickHouseNormalizer(database="normalization")
result = normalizer.normalize(["aspirin"], top_k=5)

You can also pass a DSN in code:

normalizer = ClickHouseNormalizer(
    dsn="http://host:8123/normalization",
    database="normalization",
)


Source Distribution

norm_toolkit-1.9.2.tar.gz (57.6 kB)

Built Distribution

norm_toolkit-1.9.2-py3-none-any.whl (72.1 kB)

File details

Details for the file norm_toolkit-1.9.2.tar.gz.

File metadata

  • Download URL: norm_toolkit-1.9.2.tar.gz
  • Upload date:
  • Size: 57.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.23 {"installer":{"name":"uv","version":"0.11.23","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for norm_toolkit-1.9.2.tar.gz
Algorithm Hash digest
SHA256 690870acd9e567cad3cf93eaaff0b7269bf827edd60e763682d1460badb636ec
MD5 db7e592efe23ecdfd7ce64d90feac10d
BLAKE2b-256 6f955ec2109eaad408870eb32876e2ef20378fb5854ffa0445a1d0fa4e2e46ec

File details

Details for the file norm_toolkit-1.9.2-py3-none-any.whl.

File metadata

  • Download URL: norm_toolkit-1.9.2-py3-none-any.whl
  • Upload date:
  • Size: 72.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.23 {"installer":{"name":"uv","version":"0.11.23","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for norm_toolkit-1.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 824558f3607c7179c0cffa6f67131a1dcaaf6e91dbc7b7c886aa65f650e42cc9
MD5 55bf2016541ee3154e16e671c46e2f94
BLAKE2b-256 25e9a3c8f220eaaec42329c37d485eb2e0bb2d84f768210b6866f771a2f54267
