Oncology extraction schema for MESA

Oncoschema

Schema package for oncology extraction from cancer clinical documents.

Structure

📁 oncoschema
├── examples/            # Training examples showing document input and structured output
├── schema.py            # Pydantic model for specifying expected output structure
├── prompt_builder.py    # Prompt builder for data generation and inference
├── prompt_datagen.txt   # Prompt template with example (for training data generation)
├── prompt_main.txt      # Prompt template without example (for inference/deployment)
└── py.typed             # Type checking marker

Usage

from oncoschema.prompt_builder import PromptBuilder

# Initialize builder
builder = PromptBuilder()

# Build data generation prompt (with example)
datagen_prompt = builder.build_datagen_prompt()

# Build main/inference prompt (without example)
main_prompt = builder.build_main_prompt()

Schema

Schema overview

Type | Values
TopographyType | unknown_primary, other, haematological, lung, pleura, other_respiratory, oesophagus, stomach, small_intestine, colon, rectum, pancreas, liver, gallbladder, bile_duct, other_gi, kidney, bladder, prostate, testis, other_gu, breast, cervix, uterus, ovary, other_gynae, brain, spinal_cord, other_cns, oral, hypo_oro_naso_pharynx, larynx, salivary_gland, nasal_cavity, paranasal_sinus, thyroid, adrenal_gland, other_endocrine, skin, soft_tissue, bone
MorphologyType | unknown_morphology, other, adenocarcinoma, squamous_cell_carcinoma, urothelial_carcinoma, renal_cell_carcinoma, hepatocellular_carcinoma, small_cell_carcinoma, non_small_cell_carcinoma, carcinoma_other, mesothelioma, melanoma, neuroendocrine, carcinoid, sarcoma_nos, gastrointestinal_stromal_tumour, osteosarcoma, chondrosarcoma, ewing_sarcoma, rhabdomyosarcoma, kaposi_sarcoma, soft_tissue_sarcoma, acute_lymphoblastic_leukaemia, acute_myeloid_leukaemia, chronic_lymphocytic_leukaemia, chronic_myeloid_leukaemia, hodgkin_lymphoma, non_hodgkin_lymphoma, multiple_myeloma, leukaemia_other, myelodysplastic, glioblastoma, astrocytoma, oligodendroglioma, meningioma, seminoma, teratoma, choriocarcinoma, wilms_tumour, retinoblastoma, hepatoblastoma
MSIStatus | msi_high, ms_stable
TMBStatus | tmb_high, tmb_low, tmb_intermediate
MolecularBiomarkerType | other, braf, ntrk1, ntrk2, ntrk3, ret, erbb2_her2, tp53, brca1, brca2, mlh1, msh2, msh6, pms2, palb2, rad51, egfr, alk, ros1, met, kras, nras, pik3ca, esr1_er, pgr_pr, ki_67, kit, pdgfra, fgfr1, fgfr2, fgfr3, idh1, idh2, pdl1, nf1, nf2, mgmt, npm1, flt3, bcr_abl1, jak2, bcl2, myc
BiomarkerStatus | altered, negative, equivocal, hypothetical
CancerScoreName | pathological_grade, gleason, figo, dukes, breslow, clark, binet, child_pugh, other
SpreadType | other, lymph_node, liver, lung, spine, other_bone, brain, other_cns, adrenal, kidney, pleura, peritoneum, skin, pancreas, spleen, ovary, testis, thyroid, stomach, bowel, bladder, prostate, breast, head_and_neck
TimelineEventType | had_systemic_or_radiotherapy_treatment, had_surgical_treatment_performed, experienced_toxicity_or_complication_related_to_treatment, experienced_treatment_reduction_or_stop, considered_for_clinical_trial, enrolled_to_clinical_trial, positive_treatment_response_on_assessment, evidence_of_metastatic_progression, radiology_evidence_of_disease_progression, experienced_disease_remission, experienced_disease_recurrence, patient_died
PatientFindingType | comorbidity_finding, social_or_family_finding, symptom_finding, physical_examination_finding, functional_finding, mental_state_finding
PatientFindingStatus | is_present, is_not_present, uncertain
FuturePlanType | planned_systemic_or_radiotherapy_treatment, planned_surgery_treatment, planned_investigation, planned_clinical_trial_involvement
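
Each type in the table is a closed vocabulary, so extracted strings can be checked against it before downstream use. The sketch below illustrates that idea using only the standard library. It is not the package's own schema.py: the enum class names and values are copied from the table (MolecularBiomarkerType is truncated to four values for brevity), while the BiomarkerFinding record, its field names `biomarker` and `status`, and the parse_biomarker helper are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MolecularBiomarkerType(str, Enum):
    # Subset of the values listed in the table, shortened for brevity.
    OTHER = "other"
    BRAF = "braf"
    EGFR = "egfr"
    KRAS = "kras"


class BiomarkerStatus(str, Enum):
    # All four values listed in the table.
    ALTERED = "altered"
    NEGATIVE = "negative"
    EQUIVOCAL = "equivocal"
    HYPOTHETICAL = "hypothetical"


@dataclass(frozen=True)
class BiomarkerFinding:
    """Hypothetical record for illustration; not a class from oncoschema."""

    biomarker: MolecularBiomarkerType
    status: BiomarkerStatus


def parse_biomarker(raw: dict) -> BiomarkerFinding:
    """Convert raw strings to enum members.

    Raises ValueError if a string is not one of the allowed values,
    and KeyError if a field is missing.
    """
    return BiomarkerFinding(
        biomarker=MolecularBiomarkerType(raw["biomarker"]),
        status=BiomarkerStatus(raw["status"]),
    )


# A well-formed record parses into enum members.
finding = parse_biomarker({"biomarker": "egfr", "status": "altered"})
```

Passing a status outside the vocabulary, such as "positive", raises ValueError, so malformed extractions fail at parse time rather than reaching later analysis. The package itself specifies its output structure with a Pydantic model in schema.py, whose actual class and field names may differ from this sketch.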

License

This project uses a proprietary license issued by Guy's and St Thomas' NHS Foundation Trust, enabling free (non-commercial) use by NHS organisations. See LICENSE files for details.
