Ottoman Turkish Named Entity Recognition toolkit
Ottoman NER
A focused toolkit for Ottoman Turkish Named Entity Recognition
About
Ottoman NER is a specialized Python package for Named Entity Recognition (NER) in Ottoman Turkish texts. This package provides a clean, modern interface for training, evaluating, and using NER models specifically designed for historical Ottoman Turkish documents.
Key Features
- 🎯 Focused NER Solution: Dedicated solely to Ottoman Turkish named entity recognition
- 🚀 Simple API: Single class interface for all NER operations
- ⚙️ Easy Training: Train custom models with JSON configuration
- 📊 Built-in Evaluation: Comprehensive evaluation metrics with seqeval
- 🔮 Fast Prediction: Real-time entity recognition
- 🛠️ CLI Interface: Command-line tools for all operations
- 📦 PyPI Ready: Easy installation via pip
Supported Entity Types
- PER: Person names (Sultan Abdülhamid, Ahmet Paşa)
- LOC: Locations (İstanbul, Rumeli, Anadolu)
- ORG: Organizations (Divan-ı Hümayun, Meclis-i Mebusan)
- MISC: Miscellaneous entities (dates, events, titles)
Installation
From PyPI (Recommended)
pip install ottoman-ner
From Source
git clone https://github.com/fatihburakkarag/ottoman-ner.git
cd ottoman-ner
pip install -e .
# Install with development dependencies (quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"
# Install with full features (visualization, experiment tracking)
pip install -e ".[full]"
Quick Start
1. Using Pre-trained Models
from ottoman_ner import OttomanNER
# Initialize the NER system
ner = OttomanNER()
# Load a pre-trained model
ner.load_model("models_hub/ner/ottoman-ner-standard")
# Make predictions
text = "Sultan Abdülhamid İstanbul'da yaşıyordu."
entities = ner.predict(text)
for entity in entities:
    print(f"{entity['text']} -> {entity['label']} ({entity['confidence']:.2f})")
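Because each prediction is a plain dictionary, post-processing needs no extra tooling. The following is a minimal sketch, assuming every entity carries the `text`, `label`, and `confidence` keys used in the loop above. The sample list and the `group_entities` helper are hypothetical and not part of the package.

```python
from collections import defaultdict


def group_entities(entities, min_confidence=0.5):
    """Group entity texts by label, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["confidence"] >= min_confidence:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hypothetical data shaped like the list returned by ner.predict(text)
sample = [
    {"text": "Sultan Abdülhamid", "label": "PER", "confidence": 0.98},
    {"text": "İstanbul", "label": "LOC", "confidence": 0.95},
    {"text": "yaşıyordu", "label": "MISC", "confidence": 0.31},
]

print(group_entities(sample))
```

The threshold of 0.5 is an arbitrary illustration; a suitable value depends on the model and the downstream task.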
2. Training Custom Models
from ottoman_ner import OttomanNER
# Initialize
ner = OttomanNER()
# Train from configuration file
results = ner.train_from_config("configs/training.json")
print(f"Training completed! F1 Score: {results['eval_f1']:.4f}")
3. Model Evaluation
from ottoman_ner import OttomanNER
# Initialize and evaluate
ner = OttomanNER()
results = ner.evaluate(
    model_path="models_hub/ner/ottoman-ner-standard",
    test_file="data/test.txt"
)
print(f"F1 Score: {results['overall_f1']:.4f}")
print(f"Precision: {results['overall_precision']:.4f}")
print(f"Recall: {results['overall_recall']:.4f}")
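To make the reported numbers easier to interpret, here is a rough sketch of entity-level scoring over BIO tag sequences, the style of metric that seqeval computes. It is a simplified pure-Python illustration, not the package's evaluation code; the `extract_spans` and `entity_scores` helpers and the result keys are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in one BIO tag sequence."""
    spans, start, current = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        # A span starts at any B- tag, or at an I- tag that does not continue the open span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current)
        if current is not None and (tag == "O" or starts_new):
            spans.add((start, i, current))
            start, current = None, None
        if starts_new:
            start, current = i, tag[2:]
    return spans


def entity_scores(gold_sentences, pred_sentences):
    """Compute micro-averaged precision, recall, and F1 over exact span matches."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = [["B-PER", "I-PER", "B-LOC", "O", "O", "O"]]
pred = [["B-PER", "I-PER", "O", "O", "O", "O"]]  # the location is missed
print(entity_scores(gold, pred))
```

An entity counts as correct only when both its boundaries and its type match, so a partially tagged name contributes nothing to the true-positive count.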
Command Line Interface
Ottoman NER provides a comprehensive CLI for all operations:
Training
# Train a new model
ottoman-ner train --config configs/training.json
# Train with verbose output
ottoman-ner --verbose train --config configs/training.json
Evaluation
# Evaluate a trained model
ottoman-ner eval --model-path models_hub/ner/ottoman-ner-standard --test-file data/test.txt
# Save evaluation results
ottoman-ner eval --model-path models_hub/ner/ottoman-ner-standard --test-file data/test.txt --output-dir results/
Prediction
# Predict on single text
ottoman-ner predict --model-path models_hub/ner/ottoman-ner-standard --text "Sultan Abdülhamid İstanbul'da yaşıyordu"
# Predict on file
ottoman-ner predict --model-path models_hub/ner/ottoman-ner-standard --input-file input.txt --output-file predictions.json
Configuration
Create a training configuration file in JSON format:
{
  "experiment": {
    "experiment_name": "my-ottoman-ner"
  },
  "model": {
    "model_name_or_path": "dbmdz/bert-base-turkish-cased",
    "num_labels": 9
  },
  "data": {
    "train_file": "data/train.txt",
    "dev_file": "data/dev.txt",
    "test_file": "data/test.txt",
    "max_length": 512
  },
  "training": {
    "output_dir": "models/my-model",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-5,
    "eval_strategy": "steps",
    "eval_steps": 100,
    "save_steps": 100,
    "load_best_model_at_end": true,
    "metric_for_best_model": "eval_f1"
  }
}
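The value `"num_labels": 9` is consistent with BIO tagging over the four supported entity types: one `B-` and one `I-` tag per type, plus `O`. The sketch below checks that arithmetic against a parsed configuration. It assumes the label set is built this way; the `bio_labels` helper and the label order are illustrative, not taken from the package.

```python
import json

ENTITY_TYPES = ["PER", "LOC", "ORG", "MISC"]


def bio_labels(entity_types):
    """Build a BIO label list: "O" plus a B- and an I- tag for each entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels


# Only the "model" section of the configuration is needed for this check
config = json.loads("""
{
  "model": {
    "model_name_or_path": "dbmdz/bert-base-turkish-cased",
    "num_labels": 9
  }
}
""")

labels = bio_labels(ENTITY_TYPES)
if config["model"]["num_labels"] != len(labels):
    raise ValueError(
        f"num_labels is {config['model']['num_labels']}, expected {len(labels)}"
    )
print(labels)
```

If you train with a different set of entity types, `num_labels` would need to change accordingly under the same assumption (2 × number of types + 1).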
Data Format
Ottoman NER expects CoNLL-format data with BIO tagging, with one token and its tag per line:
Sultan B-PER
Abdülhamid I-PER
İstanbul B-LOC
'da O
yaşıyordu O
. O
Osmanlı B-ORG
Devleti I-ORG
'nin O
başkenti O
İstanbul B-LOC
'dur O
. O
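For checking a data file before training, a small reader along these lines may help. It is a sketch that assumes whitespace-separated token and tag columns and, following the usual CoNLL convention, a blank line between sentences. The `read_conll` helper is hypothetical and not the package's own loader.

```python
def read_conll(lines):
    """Parse CoNLL-style lines into a list of (tokens, tags) pairs, one per sentence."""
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:  # a blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.rsplit(maxsplit=1)  # the last column is the tag
        tokens.append(token)
        tags.append(tag)
    if tokens:  # flush the final sentence if the input has no trailing blank line
        sentences.append((tokens, tags))
    return sentences


sample = """Sultan B-PER
Abdülhamid I-PER
İstanbul B-LOC
'da O
yaşıyordu O
. O

Osmanlı B-ORG
Devleti I-ORG
'nin O
başkenti O
İstanbul B-LOC
'dur O
. O
"""

for tokens, tags in read_conll(sample.splitlines()):
    print(list(zip(tokens, tags)))
```

To read from disk, pass an open file handle instead, for example `read_conll(open("data/train.txt", encoding="utf-8"))`.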
Project Background & Acknowledgments
This project builds upon foundational work in Ottoman Turkish NLP and represents a focused effort to provide a clean, maintainable NER solution for historical Turkish texts.
References
- Karagöz et al. (2024) — "Towards a Clean Text Corpus for Ottoman Turkish" ACL Anthology
- Özateş et al. (2025) — "Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models" arXiv:2501.04828
Special Thanks
Sincere gratitude to Assoc. Prof. Şaziye Betül Özateş and the Boğaziçi University Computational Linguistics Lab (BUColin) for their foundational contributions to historical Turkish NLP.
Requirements
- Python 3.8+
- PyTorch 1.9+
- Transformers 4.20+
- See requirements.txt for complete dependencies
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use Ottoman NER in your research, please cite:
@software{ottoman_ner_2024,
  title={Ottoman NER: A Toolkit for Ottoman Turkish Named Entity Recognition},
  author={Karagöz, Fatih Burak},
  year={2024},
  url={https://github.com/fatihburakkarag/ottoman-ner},
  version={2.0.0}
}
Related Projects
For broader Ottoman Turkish NLP research and experimental tools, see the ottominer repository (coming soon).