
Oncology extraction schema for MESA

Oncoschema

Schema package for oncology extraction from cancer clinical documents.

Structure

📁 oncoschema
├── examples/            # Training examples showing document input and structured output
├── schema.py            # Pydantic model for specifying expected output structure
├── prompt_builder.py    # Prompt builder for data generation and inference
├── prompt_datagen.txt   # Prompt template with example (for training data generation)
├── prompt_main.txt      # Prompt template without example (for inference/deployment)
└── py.typed             # Type checking marker

Usage

from oncoschema.prompt_builder import PromptBuilder

# Initialize builder
builder = PromptBuilder()

# Build data generation prompt (with example)
datagen_prompt = builder.build_datagen_prompt()

# Build main/inference prompt (without example)
main_prompt = builder.build_main_prompt()
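
The difference between the two prompt variants can be pictured with a small standalone sketch. It is not the package's implementation: the template text, the `build_prompt` helper, and the `${example_block}` placeholder are all hypothetical, and the real `prompt_datagen.txt` and `prompt_main.txt` files are not reproduced here. It only assumes that both variants share the same instructions and differ by the presence of a worked example.

```python
from string import Template
from typing import Optional

# Hypothetical template text, standing in for the packaged prompt files.
BASE_TEMPLATE = Template(
    "Extract oncology findings from the document below as JSON.\n"
    "${example_block}"
    "Document:\n${document}\n"
)


def build_prompt(document: str, example: Optional[str] = None) -> str:
    """Fill the template, adding a worked example only when one is supplied."""
    example_block = f"Example:\n{example}\n" if example else ""
    return BASE_TEMPLATE.substitute(example_block=example_block, document=document)


# Data-generation style: the prompt carries an example of the expected output.
datagen_style = build_prompt("Clinic letter text", example='{"findings": []}')

# Inference style: same instructions, no example.
main_style = build_prompt("Clinic letter text")
```

Keeping the example out of the inference prompt shortens it; the example is only included when generating training data.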

Schema

Schema overview

Type | Values
TopographyType | unknown_primary, other, haematological, lung, pleura, other_respiratory, oesophagus, stomach, small_intestine, colon, rectum, pancreas, liver, gallbladder, bile_duct, other_gi, kidney, bladder, prostate, testis, other_gu, breast, cervix, uterus, ovary, other_gynae, brain, spinal_cord, other_cns, oral, hypo_oro_naso_pharynx, larynx, salivary_gland, nasal_cavity, paranasal_sinus, thyroid, adrenal_gland, other_endocrine, skin, soft_tissue, bone
MorphologyType | unknown_morphology, other, adenocarcinoma, squamous_cell_carcinoma, urothelial_carcinoma, renal_cell_carcinoma, hepatocellular_carcinoma, small_cell_carcinoma, non_small_cell_carcinoma, carcinoma_other, mesothelioma, melanoma, neuroendocrine, carcinoid, sarcoma_nos, gastrointestinal_stromal_tumour, osteosarcoma, chondrosarcoma, ewing_sarcoma, rhabdomyosarcoma, kaposi_sarcoma, soft_tissue_sarcoma, acute_lymphoblastic_leukaemia, acute_myeloid_leukaemia, chronic_lymphocytic_leukaemia, chronic_myeloid_leukaemia, hodgkin_lymphoma, non_hodgkin_lymphoma, multiple_myeloma, leukaemia_other, myelodysplastic, glioblastoma, astrocytoma, oligodendroglioma, meningioma, seminoma, teratoma, choriocarcinoma, wilms_tumour, retinoblastoma, hepatoblastoma
MSIStatus | msi_high, ms_stable
TMBStatus | tmb_high, tmb_low, tmb_intermediate
MolecularBiomarkerType | other, braf, ntrk1, ntrk2, ntrk3, ret, erbb2_her2, tp53, brca1, brca2, mlh1, msh2, msh6, pms2, palb2, rad51, egfr, alk, ros1, met, kras, nras, pik3ca, esr1_er, pgr_pr, ki_67, kit, pdgfra, fgfr1, fgfr2, fgfr3, idh1, idh2, pdl1, nf1, nf2, mgmt, npm1, flt3, bcr_abl1, jak2, bcl2, myc
BiomarkerStatus | altered, negative, equivocal, hypothetical
CancerScoreName | pathological_grade, gleason, figo, dukes, breslow, clark, binet, child_pugh, other
SpreadType | other, lymph_node, liver, lung, spine, other_bone, brain, other_cns, adrenal, kidney, pleura, peritoneum, skin, pancreas, spleen, ovary, testis, thyroid, stomach, bowel, bladder, prostate, breast, head_and_neck
TimelineEventType | had_systemic_or_radiotherapy_treatment, had_surgical_treatment_performed, experienced_toxicity_or_complication_related_to_treatment, experienced_treatment_reduction_or_stop, considered_for_clinical_trial, enrolled_to_clinical_trial, positive_treatment_response_on_assessment, evidence_of_metastatic_progression, radiology_evidence_of_disease_progression, experienced_disease_remission, experienced_disease_recurrence, patient_died
PatientFindingType | comorbidity_finding, social_or_family_finding, symptom_finding, physical_examination_finding, functional_finding, mental_state_finding
PatientFindingStatus | is_present, is_not_present, uncertain
FuturePlanType | planned_systemic_or_radiotherapy_treatment, planned_surgery_treatment, planned_investigation, planned_clinical_trial_involvement
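
To illustrate how closed value sets like these can be enforced, here is a minimal sketch using only the Python standard library. It is not the package's `schema.py`, which is described above as a Pydantic model; the class layout and the `parse_value` helper are assumptions made for illustration. Only the type names and string values are taken from the overview.

```python
from enum import Enum
from typing import Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


class MSIStatus(str, Enum):
    MSI_HIGH = "msi_high"
    MS_STABLE = "ms_stable"


class TMBStatus(str, Enum):
    TMB_HIGH = "tmb_high"
    TMB_LOW = "tmb_low"
    TMB_INTERMEDIATE = "tmb_intermediate"


class BiomarkerStatus(str, Enum):
    ALTERED = "altered"
    NEGATIVE = "negative"
    EQUIVOCAL = "equivocal"
    HYPOTHETICAL = "hypothetical"


def parse_value(enum_cls: Type[E], raw: str) -> Optional[E]:
    """Hypothetical helper: return the matching member, or None if raw is not allowed."""
    try:
        return enum_cls(raw)
    except ValueError:
        return None


accepted = parse_value(MSIStatus, "msi_high")      # a listed value
rejected = parse_value(TMBStatus, "tmb_medium")    # not in the overview
```

Subclassing `str` keeps members directly comparable with the raw strings in extracted output, so `MSIStatus.MSI_HIGH == "msi_high"` holds.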

License

This project uses a proprietary license issued by Guy's and St Thomas' NHS Foundation Trust, enabling free (non-commercial) use by NHS organisations. See LICENSE files for details.


Download files

Source Distribution

londonaicentre_oncoschema-2.0.1.tar.gz (24.4 kB, source)

Built Distribution

londonaicentre_oncoschema-2.0.1-py3-none-any.whl (26.4 kB, Python 3)

File details

Details for the file londonaicentre_oncoschema-2.0.1.tar.gz.

File metadata

  • Download URL: londonaicentre_oncoschema-2.0.1.tar.gz
  • Size: 24.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: uv/0.9.26

File hashes

Hashes for londonaicentre_oncoschema-2.0.1.tar.gz
Algorithm | Hash digest
SHA256 | 3a549ef5cf740f4f36e2152e2306a69bafd4c5c154f938e89ea27fb71dc7f4e0
MD5 | a6ee36f06cf1715f772b42ce1b803d0b
BLAKE2b-256 | fe6ed0696717d5a1b245033778a84e5e45049f3ddf1cd6b84a8b9fa252f90547


File details

Details for the file londonaicentre_oncoschema-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: londonaicentre_oncoschema-2.0.1-py3-none-any.whl
  • Size: 26.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: uv/0.9.26

File hashes

Hashes for londonaicentre_oncoschema-2.0.1-py3-none-any.whl
Algorithm | Hash digest
SHA256 | c4649fd4c2813362e31629b76015ce28471534c497fe2f75f15098215eb799b5
MD5 | fbf99de592d1f043bb3d88c49d98807b
BLAKE2b-256 | 6ce49f76022714265c6cac7d0cc5a1f1623b3b7a6bf3f2ea57a005e3e15c7bdc
