Skip to main content

Convert forms to SDC4-compliant templates using Gemini AI

Project description

Form2SDCTemplate

Version License SDC Open In Colab

Convert PDF, DOCX, and image forms into SDC4-compliant templates — powered by Gemini AI.


Overview

Form2SDCTemplate provides two ways to generate SDC4 templates:

  1. Google Colab Notebook (new) — Upload a form, get a validated template automatically
  2. Manual LLM Usage — Upload Form2SDCTemplate.md to any LLM as instructions

Both approaches produce standards-compliant SDC4 templates ready for SDCStudio upload.

Quick Start with Google Colab

The fastest way to convert a form to an SDC4 template:

  1. Open the Form2SDCTemplate Colab notebook
  2. Enter your Google AI API key
  3. Upload your form (PDF, DOCX, PNG, JPG)
  4. Download the generated SDC4 markdown template
  5. Upload to SDCStudio for processing

Quick Start with Python

pip install "form2sdc[gemini]"
from form2sdc.analyzer import GeminiAnalyzer
from form2sdc.core import FormToTemplatePipeline
from pathlib import Path

analyzer = GeminiAnalyzer(api_key="YOUR_KEY")
pipeline = FormToTemplatePipeline(analyzer)
result = pipeline.process(Path("your_form.pdf"))

print(result.template)       # SDC4 markdown
print(result.validation.valid)  # True if valid

Validate an existing template

from form2sdc.validator import Form2SDCValidator

validator = Form2SDCValidator()
result = validator.validate(open("template.md").read())

if result.valid:
    print("Template is valid!")
else:
    for error in result.errors:
        print(f"[{error.code}] {error.message}")

Manual LLM Usage

Option 1: Direct Download (Recommended)

  1. Click on Form2SDCTemplate.md in this repository
  2. Look for the Download button (down arrow ⬇) in the upper right of the file view
  3. Save the file to your computer
  4. Upload Form2SDCTemplate.md to your preferred LLM (Claude, ChatGPT, etc.)
  5. Provide your form description, PDF, or requirements to the LLM
  6. Review the generated template and upload it to SDCStudio for processing

Option 2: Clone Repository (For Contributors)

  1. Clone this repository:

    git clone https://github.com/SemanticDataCharter/Form2SDCTemplate.git
    cd Form2SDCTemplate
    
  2. Upload Form2SDCTemplate.md to your preferred LLM (e.g., Claude, ChatGPT, etc.)

  3. Provide your form description, PDF, or requirements to the LLM

  4. The LLM will generate a properly formatted SDCStudio template

  5. Review the generated template and upload it to SDCStudio for processing

Features

LLM-Optimized Instructions

  • Comprehensive step-by-step guide for AI assistants
  • Complete keyword glossary with usage examples
  • Clear structure and formatting requirements
  • Multi-language support (keywords in English, content in source language)

SDC4 Compliance

  • Generates templates conforming to SDC 4.0 specifications
  • Supports all SDC4 data types (XdString, XdCount, XdQuantity, etc.)
  • User-friendly type system (text, integer, decimal, date, etc.)
  • Intelligent type mapping based on context clues

Complete Template Generation

  • YAML front matter with metadata
  • Dataset overview and business context
  • Root and sub-cluster organization
  • Column definitions with constraints and enumerations
  • Component reuse support (NIEM, FHIR, HL7v3)
  • Example templates in English, French, and Brazilian Portuguese

Rapid Development

  • Eliminates manual template creation
  • Reduces development time from hours to minutes
  • Enables iterative refinement through conversational AI
  • Supports forms in any language

Use Cases

This tool is designed for:

  • Healthcare Organizations developing clinical data collection forms
  • Research Institutions creating standardized research data templates
  • Data Architects prototyping SDC4 template structures
  • Developers integrating SDC4 into existing systems
  • Data Governance Teams standardizing data collection processes

Documentation

How It Works

  1. Upload Instructions: Upload Form2SDCTemplate.md to an LLM (Claude, ChatGPT, etc.)
  2. Provide Form: Share your form description, PDF, or requirements
  3. LLM Generates: The LLM creates a properly formatted template following SDC4 specifications
  4. Review & Upload: Review the generated template and upload to SDCStudio
  5. Automatic Processing: SDCStudio's MD2PD system parses and validates the template

Usage Examples

Below are example prompts showing how to request template generation from an LLM. Upload Form2SDCTemplate.md first, then use one of these prompts along with your form/PDF.

English (en)

Please use the instructions in Form2SDCTemplate.md along with the attached PDF form
to create an SDCStudio template in markdown format.

Key requirements:
- Use English keywords (Type, Description, Enumeration, etc.)
- Keep all field names, descriptions, and values in the same language as the form
- Include all fields from the PDF with appropriate data types
- Add constraints for required fields and validation rules
- Use enumerations for dropdown lists and radio buttons
- Provide realistic examples for each field

French (fr)

Veuillez utiliser les instructions dans Form2SDCTemplate.md avec le formulaire PDF
ci-joint pour créer un template SDCStudio au format markdown.

Exigences clés :
- Utiliser les mots-clés en anglais (Type, Description, Enumeration, etc.)
- Conserver tous les noms de champs, descriptions et valeurs dans la langue du formulaire
- Inclure tous les champs du PDF avec les types de données appropriés
- Ajouter des contraintes pour les champs obligatoires et les règles de validation
- Utiliser des énumérations pour les listes déroulantes et boutons radio
- Fournir des exemples réalistes pour chaque champ

Brazilian Portuguese (pt-BR)

Por favor, use as instruções no Form2SDCTemplate.md junto com o formulário PDF
anexado para criar um template SDCStudio em formato markdown.

Requisitos principais:
- Usar palavras-chave em inglês (Type, Description, Enumeration, etc.)
- Manter todos os nomes de campos, descrições e valores no idioma do formulário
- Incluir todos os campos do PDF com os tipos de dados apropriados
- Adicionar restrições para campos obrigatórios e regras de validação
- Usar enumerações para listas suspensas e botões de opção
- Fornecer exemplos realistas para cada campo

Spanish (es)

Por favor, utiliza las instrucciones en Form2SDCTemplate.md junto con el formulario PDF
adjunto para crear una plantilla SDCStudio en formato markdown.

Requisitos clave:
- Usar palabras clave en inglés (Type, Description, Enumeration, etc.)
- Mantener todos los nombres de campos, descripciones y valores en el idioma del formulario
- Incluir todos los campos del PDF con los tipos de datos apropiados
- Agregar restricciones para campos obligatorios y reglas de validación
- Usar enumeraciones para listas desplegables y botones de opción
- Proporcionar ejemplos realistas para cada campo

Advanced Usage Examples

With specific domain context:

I'm uploading a healthcare patient intake form (PDF attached). Please use
Form2SDCTemplate.md to create a template.

Additional context:
- Domain: Healthcare
- This form will be used in a clinical setting
- Fields like patient_id, date_of_birth, and medical_record_number should use
  identifier type
- Include HIPAA-relevant field classifications where applicable
- Enable LLM enrichment (set enable_llm: true)

Multiple forms/sections:

I have three related forms (PDFs attached):
1. Patient Demographics
2. Medical History
3. Insurance Information

Please use Form2SDCTemplate.md to create a single template with three sub-clusters,
one for each form. Use appropriate data types and maintain the relationships between
sections.

Form in specific language:

Attached is a Brazilian government form (Cadastro de Contribuinte) in Portuguese.
Please use Form2SDCTemplate.md to generate the template.

Important:
- Keep all keywords in English (Type, Description, etc.)
- Keep all content in Portuguese (field names, descriptions, examples)
- Include Brazilian-specific fields (CPF, CNPJ, CEP, UF)
- Use proper Brazilian address format
- Include all 27 Brazilian states in UF enumeration

Related Projects

Part of the Semantic Data Charter ecosystem:

System Requirements

For Colab/Python usage:

For manual LLM usage:

  • LLM with markdown file upload capability (Claude, ChatGPT, etc.)
  • Basic understanding of form structure and data collection

Optional: SDCStudio for template testing and refinement

Standards Compliance

Form2SDCTemplate supports generation of templates compliant with:

  • W3C XML Schema (XSD)
  • W3C RDF/OWL for semantic modeling
  • ISO 11179 metadata standards
  • ISO 20022 data component specifications
  • HL7 standards for healthcare data

Version Information

Current Version: 4.2.8

The major version (4.x.x) aligns with SDC Generation 4, ensuring compatibility across the SDC4 ecosystem. See CHANGELOG.md for detailed version history.

Contributing

We welcome contributions! Please see CONTRIBUTING.md for:

  • How to submit issues and feature requests
  • Guidelines for pull requests
  • Development workflow and testing procedures
  • Community standards and code of conduct

Security

For security concerns or vulnerability reports, please refer to our SECURITY.md policy or contact security@axius-sdc.com.

License

Copyright 2025 Axius-SDC, Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Acknowledgments

This project builds upon:

  • The Semantic Data Charter (SDC) framework
  • International standards from W3C, ISO, and HL7
  • Open source contributions from the data modeling community
  • Academic research in semantic data representation (12+ peer-reviewed papers, 165+ citations)

Support


Semantic Data Charter™ and SDC™ are trademarks of Axius-SDC, Inc.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

form2sdc-4.2.8.tar.gz (55.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

form2sdc-4.2.8-py3-none-any.whl (47.9 kB view details)

Uploaded Python 3

File details

Details for the file form2sdc-4.2.8.tar.gz.

File metadata

  • Download URL: form2sdc-4.2.8.tar.gz
  • Upload date:
  • Size: 55.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for form2sdc-4.2.8.tar.gz
Algorithm Hash digest
SHA256 b42b96dcca284d0c8c2b6f98036279a5c72f87ab8cde99996000fd905716e2d8
MD5 dade7542a4b674b43a400191d4af421f
BLAKE2b-256 f4a7b328318ba318eba8384fd9ec84f5db2d28638f9067d22e26d507fb5bf74e

See more details on using hashes here.

File details

Details for the file form2sdc-4.2.8-py3-none-any.whl.

File metadata

  • Download URL: form2sdc-4.2.8-py3-none-any.whl
  • Upload date:
  • Size: 47.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for form2sdc-4.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 9b9a9d0d83423878f554e031b49238a7cd7c4fc8249cb739d75a3094dccbf397
MD5 024b3697d20e8a2f89f4afba02f8da2a
BLAKE2b-256 0353eb8b19876eb6e0a0b3696932214dddbb7cfe3123350158f97374a72f1637

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page