Convert forms to SDC4-compliant templates using Gemini AI
Form2SDCTemplate
Convert PDF, DOCX, and image forms into SDC4-compliant templates — powered by Gemini AI.
Overview
Form2SDCTemplate provides two ways to generate SDC4 templates:
- Google Colab Notebook (new): Upload a form, get a validated template automatically
- Manual LLM Usage: Upload Form2SDCTemplate.md to any LLM as instructions
Both approaches produce standards-compliant SDC4 templates ready for SDCStudio upload.
Quick Start with Google Colab
The fastest way to convert a form to an SDC4 template:
- Open the Form2SDCTemplate Colab notebook
- Enter your Google AI API key
- Upload your form (PDF, DOCX, PNG, JPG)
- Download the generated SDC4 markdown template
- Upload to SDCStudio for processing
Quick Start with Python
pip install "form2sdc[gemini]"
from form2sdc.analyzer import GeminiAnalyzer
from form2sdc.core import FormToTemplatePipeline
from pathlib import Path
analyzer = GeminiAnalyzer(api_key="YOUR_KEY")
pipeline = FormToTemplatePipeline(analyzer)
result = pipeline.process(Path("your_form.pdf"))
print(result.template) # SDC4 markdown
print(result.validation.valid) # True if valid
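For more than one form, the same pipeline can be run in a loop. The following is a minimal sketch, not part of the package: `convert_folder` is a hypothetical helper that accepts any object with a `process(path)` method whose result exposes `template` and `validation.valid` as in the snippet above, and it writes only valid templates to disk. The suffix set is an assumption based on the formats listed in the Colab quick start.

```python
from pathlib import Path

# Suffixes taken from the formats this README lists (PDF, DOCX, PNG, JPG).
FORM_SUFFIXES = {".pdf", ".docx", ".png", ".jpg"}


def convert_folder(pipeline, in_dir: Path, out_dir: Path) -> dict[str, bool]:
    """Run pipeline.process() on every form found directly in in_dir.

    Hypothetical helper: `pipeline` is assumed to behave like the
    FormToTemplatePipeline shown above (process() returns an object with
    .template and .validation.valid).
    Returns a mapping of file name -> whether the template validated.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    outcome = {}
    for form in sorted(in_dir.iterdir()):
        if form.suffix.lower() not in FORM_SUFFIXES:
            continue  # skip files that are not forms
        result = pipeline.process(form)
        valid = bool(result.validation.valid)
        if valid:
            # Save only templates that passed validation.
            (out_dir / f"{form.stem}.md").write_text(result.template, encoding="utf-8")
        outcome[form.name] = valid
    return outcome
```

With the objects from the quick start, `convert_folder(pipeline, Path("forms"), Path("templates"))` would process every supported file in a `forms/` directory.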
Validate an existing template
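from pathlib import Path

from form2sdc.validator import Form2SDCValidator

validator = Form2SDCValidator()
# Read the template as text; read_text() opens and closes the file.
result = validator.validate(Path("template.md").read_text(encoding="utf-8"))
if result.valid:
    print("Template is valid!")
else:
    # Each error carries a code and a human-readable message.
    for error in result.errors:
        print(f"[{error.code}] {error.message}")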
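When a template fails validation, grouping the errors by code can make them easier to fix in batches. This is a small sketch under the assumption that each error exposes `code` and `message` as in the loop above; `summarize_errors` is a hypothetical helper, not part of form2sdc.

```python
from collections import defaultdict


def summarize_errors(errors) -> dict[str, list[str]]:
    """Group validation messages by error code.

    Assumes each item has .code and .message attributes, as used in the
    validation example above. Hypothetical helper for illustration.
    """
    grouped = defaultdict(list)
    for error in errors:
        grouped[error.code].append(error.message)
    return dict(grouped)
```

A possible use is `for code, messages in summarize_errors(result.errors).items(): print(code, len(messages))`, which prints one line per distinct error code.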
Manual LLM Usage
Option 1: Direct Download (Recommended)
- Click on Form2SDCTemplate.md in this repository
- Look for the Download button (down arrow ⬇) in the upper right of the file view
- Save the file to your computer
- Upload Form2SDCTemplate.md to your preferred LLM (Claude, ChatGPT, etc.)
- Provide your form description, PDF, or requirements to the LLM
- Review the generated template and upload it to SDCStudio for processing
Option 2: Clone Repository (For Contributors)
- Clone this repository:
git clone https://github.com/SemanticDataCharter/Form2SDCTemplate.git
cd Form2SDCTemplate
- Upload Form2SDCTemplate.md to your preferred LLM (e.g., Claude, ChatGPT)
- Provide your form description, PDF, or requirements to the LLM
- The LLM will generate a properly formatted SDCStudio template
- Review the generated template and upload it to SDCStudio for processing
Features
LLM-Optimized Instructions
- Comprehensive step-by-step guide for AI assistants
- Complete keyword glossary with usage examples
- Clear structure and formatting requirements
- Multi-language support (keywords in English, content in source language)
SDC4 Compliance
- Generates templates conforming to SDC 4.0 specifications
- Supports all SDC4 data types (XdString, XdCount, XdQuantity, etc.)
- User-friendly type system (text, integer, decimal, date, etc.)
- Intelligent type mapping based on context clues
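To make "type mapping based on context clues" concrete, here is a rough sketch of how a field label might be mapped to one of the user-friendly types named above (text, integer, decimal, date, plus the identifier type mentioned in the usage examples). The keyword rules are illustrative assumptions only; they are not the rules the LLM or the package applies.

```python
import re

# Ordered (pattern, friendly type) rules; the first match wins.
# These patterns are illustrative assumptions, not the rules an LLM applies.
_RULES = [
    (re.compile(r"\b(date|dob|birth)\b"), "date"),
    (re.compile(r"\bid\b|\brecord number\b"), "identifier"),
    (re.compile(r"\b(age|count|quantity)\b|\bnumber of\b"), "integer"),
    (re.compile(r"\b(amount|price|cost|weight|height)\b"), "decimal"),
]


def guess_friendly_type(label: str) -> str:
    """Guess a user-friendly type from a field label; default to 'text'."""
    # Treat underscores and hyphens as spaces so word boundaries work.
    normalized = re.sub(r"[_\-]+", " ", label).lower()
    for pattern, friendly_type in _RULES:
        if pattern.search(normalized):
            return friendly_type
    return "text"
```

A keyword table like this is far cruder than an LLM reading the whole form, but it shows why field names such as `date_of_birth` or `patient_id` carry enough context to pick a type.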
Complete Template Generation
- YAML front matter with metadata
- Dataset overview and business context
- Root and sub-cluster organization
- Column definitions with constraints and enumerations
- Component reuse support (NIEM, FHIR, HL7v3)
- Example templates in English, French, and Brazilian Portuguese
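Because a generated template starts with YAML front matter, it can be useful to separate the metadata from the body before inspecting either. The sketch below assumes the front matter is delimited by `---` lines and contains flat `key: value` pairs; the `title` key in the usage note is hypothetical, and nested YAML would need a real YAML parser.

```python
def split_front_matter(template: str) -> tuple[dict[str, str], str]:
    """Split a markdown template into (front matter dict, body text).

    Assumes front matter is delimited by lines containing only '---' and
    holds flat 'key: value' pairs. Returns ({}, template) when no complete
    front matter block is found.
    """
    lines = template.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, template
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        return {}, template  # no closing delimiter: treat as no front matter
    meta = {}
    for line in lines[1:end]:
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return meta, body
```

For a template whose front matter contains `title: Patient Intake` and `enable_llm: true`, the first returned value would hold both keys as strings, and the second would hold everything after the closing `---`.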
Rapid Development
- Eliminates manual template creation
- Reduces development time from hours to minutes
- Enables iterative refinement through conversational AI
- Supports forms in any language
Use Cases
This tool is designed for:
- Healthcare Organizations developing clinical data collection forms
- Research Institutions creating standardized research data templates
- Data Architects prototyping SDC4 template structures
- Developers integrating SDC4 into existing systems
- Data Governance Teams standardizing data collection processes
Documentation
- Form2SDCTemplate.md - Complete LLM instructions and reference guide
- CLAUDE.md - Detailed guidance for AI-assisted development
- CONTRIBUTING.md - How to contribute to this project
- SECURITY.md - Security policy and vulnerability reporting
- CHANGELOG.md - Version history and release notes
How It Works
- Upload Instructions: Upload Form2SDCTemplate.md to an LLM (Claude, ChatGPT, etc.)
- Provide Form: Share your form description, PDF, or requirements
- LLM Generates: The LLM creates a properly formatted template following SDC4 specifications
- Review & Upload: Review the generated template and upload to SDCStudio
- Automatic Processing: SDCStudio's MD2PD system parses and validates the template
Usage Examples
Below are example prompts showing how to request template generation from an LLM. Upload Form2SDCTemplate.md first, then use one of these prompts along with your form/PDF.
English (en)
Please use the instructions in Form2SDCTemplate.md along with the attached PDF form
to create an SDCStudio template in markdown format.
Key requirements:
- Use English keywords (Type, Description, Enumeration, etc.)
- Keep all field names, descriptions, and values in the same language as the form
- Include all fields from the PDF with appropriate data types
- Add constraints for required fields and validation rules
- Use enumerations for dropdown lists and radio buttons
- Provide realistic examples for each field
French (fr)
Veuillez utiliser les instructions dans Form2SDCTemplate.md avec le formulaire PDF
ci-joint pour créer un template SDCStudio au format markdown.
Exigences clés :
- Utiliser les mots-clés en anglais (Type, Description, Enumeration, etc.)
- Conserver tous les noms de champs, descriptions et valeurs dans la langue du formulaire
- Inclure tous les champs du PDF avec les types de données appropriés
- Ajouter des contraintes pour les champs obligatoires et les règles de validation
- Utiliser des énumérations pour les listes déroulantes et boutons radio
- Fournir des exemples réalistes pour chaque champ
Brazilian Portuguese (pt-BR)
Por favor, use as instruções no Form2SDCTemplate.md junto com o formulário PDF
anexado para criar um template SDCStudio em formato markdown.
Requisitos principais:
- Usar palavras-chave em inglês (Type, Description, Enumeration, etc.)
- Manter todos os nomes de campos, descrições e valores no idioma do formulário
- Incluir todos os campos do PDF com os tipos de dados apropriados
- Adicionar restrições para campos obrigatórios e regras de validação
- Usar enumerações para listas suspensas e botões de opção
- Fornecer exemplos realistas para cada campo
Spanish (es)
Por favor, utiliza las instrucciones en Form2SDCTemplate.md junto con el formulario PDF
adjunto para crear una plantilla SDCStudio en formato markdown.
Requisitos clave:
- Usar palabras clave en inglés (Type, Description, Enumeration, etc.)
- Mantener todos los nombres de campos, descripciones y valores en el idioma del formulario
- Incluir todos los campos del PDF con los tipos de datos apropiados
- Agregar restricciones para campos obligatorios y reglas de validación
- Usar enumeraciones para listas desplegables y botones de opción
- Proporcionar ejemplos realistas para cada campo
Advanced Usage Examples
With specific domain context:
I'm uploading a healthcare patient intake form (PDF attached). Please use
Form2SDCTemplate.md to create a template.
Additional context:
- Domain: Healthcare
- This form will be used in a clinical setting
- Fields like patient_id, date_of_birth, and medical_record_number should use
identifier type
- Include HIPAA-relevant field classifications where applicable
- Enable LLM enrichment (set enable_llm: true)
Multiple forms/sections:
I have three related forms (PDFs attached):
1. Patient Demographics
2. Medical History
3. Insurance Information
Please use Form2SDCTemplate.md to create a single template with three sub-clusters,
one for each form. Use appropriate data types and maintain the relationships between
sections.
Form in specific language:
Attached is a Brazilian government form (Cadastro de Contribuinte) in Portuguese.
Please use Form2SDCTemplate.md to generate the template.
Important:
- Keep all keywords in English (Type, Description, etc.)
- Keep all content in Portuguese (field names, descriptions, examples)
- Include Brazilian-specific fields (CPF, CNPJ, CEP, UF)
- Use proper Brazilian address format
- Include all 27 Brazilian states in UF enumeration
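A request like the last one can be spot-checked after generation. The sketch below lists the 27 UF codes (26 states plus the Federal District) and reports which ones do not appear in a template's text. `missing_ufs` is a hypothetical helper, and the match is deliberately naive: any standalone two-letter uppercase token counts, so it is a sanity check and not a validator.

```python
import re

# The 26 Brazilian states plus the Federal District (DF): 27 UF codes.
BRAZIL_UF = frozenset(
    "AC AL AP AM BA CE DF ES GO MA MT MS MG PA "
    "PB PR PE PI RJ RN RS RO RR SC SP SE TO".split()
)


def missing_ufs(template_text: str) -> set[str]:
    """Return the UF codes that never appear as standalone tokens."""
    found = set(re.findall(r"\b[A-Z]{2}\b", template_text))
    return set(BRAZIL_UF) - found
```

An empty result from `missing_ufs(template_text)` suggests the enumeration is complete; a non-empty one names the codes to ask the LLM to add.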
Related Projects
Part of the Semantic Data Charter ecosystem:
- SDCRM - Reference model and schemas
- SDCObsidianTemplate - Obsidian vault template
- sdcvalidator - Python validation library
- sdcvalidatorJS - JavaScript/npm validator
System Requirements
For Colab/Python usage:
- Python 3.10+
- Google AI API key (free tier available at aistudio.google.com)
For manual LLM usage:
- LLM with markdown file upload capability (Claude, ChatGPT, etc.)
- Basic understanding of form structure and data collection
Optional: SDCStudio for template testing and refinement
Standards Compliance
Form2SDCTemplate supports generation of templates compliant with:
- W3C XML Schema (XSD)
- W3C RDF/OWL for semantic modeling
- ISO 11179 metadata standards
- ISO 20022 data component specifications
- HL7 standards for healthcare data
Version Information
Current Version: 4.2.5
The major version (4.x.x) aligns with SDC Generation 4, ensuring compatibility across the SDC4 ecosystem. See CHANGELOG.md for detailed version history.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for:
- How to submit issues and feature requests
- Guidelines for pull requests
- Development workflow and testing procedures
- Community standards and code of conduct
Security
For security concerns or vulnerability reports, please refer to our SECURITY.md policy or contact security@axius-sdc.com.
License
Copyright 2025 Axius-SDC, Inc.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Acknowledgments
This project builds upon:
- The Semantic Data Charter (SDC) framework
- International standards from W3C, ISO, and HL7
- Open source contributions from the data modeling community
- Academic research in semantic data representation (12+ peer-reviewed papers, 165+ citations)
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: security@axius-sdc.com
- Website: Coming soon
Semantic Data Charter™ and SDC™ are trademarks of Axius-SDC, Inc.
File details
Details for the file form2sdc-4.2.5.tar.gz.
File metadata
- Download URL: form2sdc-4.2.5.tar.gz
- Upload date:
- Size: 55.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dcf86ff1806cc05eab4e404098a24bd5750c3b6f8c1b4bd5022364c9c7360d14 |
| MD5 | 6bd26209f97c8c44966d874dc306c9e9 |
| BLAKE2b-256 | cd5a2d8a495e969a36a683e2858c7338f9f31bd210693bacdc6f042e044c992c |
File details
Details for the file form2sdc-4.2.5-py3-none-any.whl.
File metadata
- Download URL: form2sdc-4.2.5-py3-none-any.whl
- Upload date:
- Size: 47.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a254f20ac65167b3a71275c02dabc477616450c96a042ed50ef173a5796fd73b |
| MD5 | 7c2d395dc2faa5a69b005e0df7c228c1 |
| BLAKE2b-256 | af9c7b0e69f41c11b8ef01b961f7b35ac2efe67c7f257d18d582f8410e3f016b |
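A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch; `sha256_matches` is a hypothetical helper name, and the file path in the usage note assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_matches(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, `sha256_matches(Path("form2sdc-4.2.5.tar.gz"), "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.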