Ottoman Turkish Named Entity Recognition toolkit

Ottoman NER

A focused toolkit for Ottoman Turkish Named Entity Recognition


About

Ottoman NER is a Python package for Named Entity Recognition (NER) in Ottoman Turkish texts. It provides a single interface for training, evaluating, and running NER models on historical Ottoman Turkish documents.

Key Features

  • 🎯 Focused NER Solution: Dedicated solely to Ottoman Turkish named entity recognition
  • 🚀 Simple API: Single class interface for all NER operations
  • ⚙️ Easy Training: Train custom models with JSON configuration
  • 📊 Built-in Evaluation: Comprehensive evaluation metrics with seqeval
  • 🔮 Fast Prediction: Real-time entity recognition
  • 🛠️ CLI Interface: Command-line tools for all operations
  • 📦 PyPI Ready: Easy installation via pip

Supported Entity Types

  • PER: Person names (Sultan Abdülhamid, Ahmet Paşa)
  • LOC: Locations (İstanbul, Rumeli, Anadolu)
  • ORG: Organizations (Divan-ı Hümayun, Meclis-i Mebusan)
  • MISC: Miscellaneous entities (dates, events, titles)
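
With BIO tagging (see Data Format), these four entity types expand into nine token-level labels, which matches the num_labels value of 9 in the Configuration example. The sketch below derives that label set. The function name and the label ordering are assumptions made for illustration; the package's internal id mapping may differ.

```python
# Entity types as listed in this README.
ENTITY_TYPES = ["PER", "LOC", "ORG", "MISC"]


def build_bio_labels(entity_types):
    """Return the BIO tag set: "O" plus a B- and an I- tag per entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")  # first token of an entity
        labels.append(f"I-{entity_type}")  # continuation token of an entity
    return labels


labels = build_bio_labels(ENTITY_TYPES)

# Hypothetical id mappings; the real model may order its labels differently.
label2id = {label: index for index, label in enumerate(labels)}
id2label = {index: label for label, index in label2id.items()}

print(labels)
print(len(labels))
```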

Installation

From PyPI (Recommended)

pip install ottoman-ner

From Source

git clone https://github.com/fatihburakkarag/ottoman-ner.git
cd ottoman-ner
pip install -e .

# Install with development dependencies (the quotes stop shells such as zsh from expanding the brackets)
pip install -e ".[dev]"

# Install with full features (visualization, experiment tracking)
pip install -e ".[full]"

Quick Start

1. Using Pre-trained Models

from ottoman_ner import OttomanNER

# Initialize the NER system
ner = OttomanNER()

# Load a pre-trained model
ner.load_model("models_hub/ner/ottoman-ner-standard")

# Make predictions
text = "Sultan Abdülhamid İstanbul'da yaşıyordu."
entities = ner.predict(text)

for entity in entities:
    print(f"{entity['text']} -> {entity['label']} ({entity['confidence']:.2f})")
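
The loop above implies that predict returns a list of dictionaries with text, label, and confidence keys. Assuming that shape, the following sketch groups entities by label and drops low-confidence ones. The helper name, the threshold, and the sample values are all made up for illustration and are not real model output.

```python
from collections import defaultdict


def group_entities(entities, min_confidence=0.5):
    """Group entity texts by label, keeping only confident predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["confidence"] >= min_confidence:
            grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


# Hand-written sample in the assumed output shape (not produced by a model).
sample_entities = [
    {"text": "Sultan Abdülhamid", "label": "PER", "confidence": 0.98},
    {"text": "İstanbul", "label": "LOC", "confidence": 0.95},
    {"text": "yaşıyordu", "label": "MISC", "confidence": 0.31},
]

grouped = group_entities(sample_entities, min_confidence=0.5)
print(grouped)
```

In real use, sample_entities would be replaced by the return value of ner.predict(text).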

2. Training Custom Models

from ottoman_ner import OttomanNER

# Initialize
ner = OttomanNER()

# Train from configuration file
results = ner.train_from_config("configs/training.json")
print(f"Training completed! F1 Score: {results['eval_f1']:.4f}")

3. Model Evaluation

from ottoman_ner import OttomanNER

# Initialize and evaluate
ner = OttomanNER()
results = ner.evaluate(
    model_path="models_hub/ner/ottoman-ner-standard",
    test_file="data/test.txt"
)

print(f"F1 Score: {results['overall_f1']:.4f}")
print(f"Precision: {results['overall_precision']:.4f}")
print(f"Recall: {results['overall_recall']:.4f}")
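
For readers unfamiliar with entity-level scoring, here is a minimal standard-library sketch of how precision, recall, and F1 are computed over BIO sequences: an entity counts as correct only if its type and both boundaries match. This is not seqeval's implementation or the package's code. The function names are hypothetical, and treating a stray I- tag as the start of a new entity is a lenient choice made here.

```python
def extract_spans(tags):
    """Return the set of (entity_type, start, end) spans in one BIO tag list."""
    spans = set()
    start, current = None, None
    for index, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and current == tag[2:]
        if current is not None and not continues:
            spans.add((current, start, index))
            start, current = None, None
        # A B- tag always opens a span; a stray I- tag opens one leniently.
        if tag.startswith("B-") or (tag.startswith("I-") and current is None):
            start, current = index, tag[2:]
    return spans


def entity_scores(gold_sequences, predicted_sequences):
    """Compute micro-averaged entity-level precision, recall, and F1."""
    true_positives = gold_total = predicted_total = 0
    for gold_tags, predicted_tags in zip(gold_sequences, predicted_sequences):
        gold_spans = extract_spans(gold_tags)
        predicted_spans = extract_spans(predicted_tags)
        true_positives += len(gold_spans & predicted_spans)
        gold_total += len(gold_spans)
        predicted_total += len(predicted_spans)
    precision = true_positives / predicted_total if predicted_total else 0.0
    recall = true_positives / gold_total if gold_total else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative tags only: the prediction misses the LOC entity.
gold = [["B-PER", "I-PER", "B-LOC", "O", "O", "O"]]
predicted = [["B-PER", "I-PER", "O", "O", "O", "O"]]
scores = entity_scores(gold, predicted)
print(scores)
```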

Command Line Interface

The ottoman-ner command covers training, evaluation, and prediction:

Training

# Train a new model
ottoman-ner train --config configs/training.json

# Train with verbose output
ottoman-ner --verbose train --config configs/training.json

Evaluation

# Evaluate a trained model
ottoman-ner eval --model-path models_hub/ner/ottoman-ner-standard --test-file data/test.txt

# Save evaluation results
ottoman-ner eval --model-path models_hub/ner/ottoman-ner-standard --test-file data/test.txt --output-dir results/

Prediction

# Predict on single text
ottoman-ner predict --model-path models_hub/ner/ottoman-ner-standard --text "Sultan Abdülhamid İstanbul'da yaşıyordu"

# Predict on file
ottoman-ner predict --model-path models_hub/ner/ottoman-ner-standard --input-file input.txt --output-file predictions.json

Configuration

Create a training configuration file in JSON format:

{
  "experiment": {
    "experiment_name": "my-ottoman-ner"
  },
  "model": {
    "model_name_or_path": "dbmdz/bert-base-turkish-cased",
    "num_labels": 9
  },
  "data": {
    "train_file": "data/train.txt",
    "dev_file": "data/dev.txt",
    "test_file": "data/test.txt",
    "max_length": 512
  },
  "training": {
    "output_dir": "models/my-model",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-5,
    "eval_strategy": "steps",
    "eval_steps": 100,
    "save_steps": 100,
    "load_best_model_at_end": true,
    "metric_for_best_model": "eval_f1"
  }
}
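
Before starting a long training run, it can help to sanity-check the configuration. The sketch below parses a shortened copy of the example and checks three things: the four top-level sections exist, num_labels equals two tags per entity type plus O, and save_steps is a multiple of eval_steps. The last check assumes the training keys are forwarded to Hugging Face Transformers, which requires this when load_best_model_at_end is set. check_config is a hypothetical helper, not part of the package.

```python
import json

# Shortened copy of the example configuration above.
CONFIG_TEXT = """
{
  "experiment": {"experiment_name": "my-ottoman-ner"},
  "model": {"model_name_or_path": "dbmdz/bert-base-turkish-cased", "num_labels": 9},
  "data": {"train_file": "data/train.txt", "dev_file": "data/dev.txt",
           "test_file": "data/test.txt", "max_length": 512},
  "training": {"output_dir": "models/my-model", "eval_steps": 100,
               "save_steps": 100, "load_best_model_at_end": true}
}
"""

REQUIRED_SECTIONS = ("experiment", "model", "data", "training")


def check_config(config, entity_types=("PER", "LOC", "ORG", "MISC")):
    """Return a list of problems found in a config dict (empty if none)."""
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in config:
            problems.append(f"missing section: {section}")

    expected_labels = 2 * len(entity_types) + 1  # B- and I- per type, plus O
    num_labels = config.get("model", {}).get("num_labels")
    if num_labels != expected_labels:
        problems.append(f"num_labels is {num_labels}, expected {expected_labels}")

    training = config.get("training", {})
    eval_steps = training.get("eval_steps")
    save_steps = training.get("save_steps")
    if training.get("load_best_model_at_end") and eval_steps and save_steps:
        if save_steps % eval_steps != 0:
            problems.append("save_steps should be a multiple of eval_steps")
    return problems


config = json.loads(CONFIG_TEXT)
problems = check_config(config)
print(problems or "config looks consistent")
```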

Data Format

Ottoman NER expects CoNLL-format data with BIO tagging: one token and its tag per line, with a blank line between sentences:

Sultan B-PER
Abdülhamid I-PER
İstanbul B-LOC
'da O
yaşıyordu O
. O

Osmanlı B-ORG
Devleti I-ORG
'nin O
başkenti O
İstanbul B-LOC
'dur O
. O
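
To inspect or validate a data file before training, the format above can be read with a few lines of standard-library Python. This is a sketch, not the package's own loader. It assumes exactly what the example shows: a whitespace-separated token and tag on each line, and blank lines between sentences. read_conll is a hypothetical name.

```python
def read_conll(lines):
    """Parse CoNLL-style lines into a list of (tokens, tags) sentence pairs."""
    sentences = []
    tokens, tags = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line ends the current sentence, if one is open.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        # The tag is the last field; raises ValueError if a line has one field.
        token, tag = line.rsplit(None, 1)
        tokens.append(token)
        tags.append(tag)
    if tokens:  # file may not end with a blank line
        sentences.append((tokens, tags))
    return sentences


# The two example sentences from this section.
SAMPLE = """Sultan B-PER
Abdülhamid I-PER
İstanbul B-LOC
'da O
yaşıyordu O
. O

Osmanlı B-ORG
Devleti I-ORG
'nin O
başkenti O
İstanbul B-LOC
'dur O
. O
"""

sentences = read_conll(SAMPLE.splitlines())
for tokens, tags in sentences:
    print(list(zip(tokens, tags)))
```

For a file on disk, the same function can be called as read_conll(open("data/train.txt", encoding="utf-8")).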

Project Background & Acknowledgments

This project builds upon foundational work in Ottoman Turkish NLP and represents a focused effort to provide a clean, maintainable NER solution for historical Turkish texts.

References

  • Karagöz et al. (2024). "Towards a Clean Text Corpus for Ottoman Turkish." ACL Anthology.
  • Özateş et al. (2025). "Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models." arXiv:2501.04828.

Special Thanks

Sincere gratitude to Assoc. Prof. Şaziye Betül Özateş and the Boğaziçi University Computational Linguistics Lab (BUColin) for their foundational contributions to historical Turkish NLP.


Requirements

  • Python 3.8+
  • PyTorch 1.9+
  • Transformers 4.20+
  • See requirements.txt for complete dependencies

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Citation

If you use Ottoman NER in your research, please cite:

@software{ottoman_ner_2024,
  title={Ottoman NER: A Toolkit for Ottoman Turkish Named Entity Recognition},
  author={Karagöz, Fatih Burak},
  year={2024},
  url={https://github.com/fatihburakkarag/ottoman-ner},
  version={2.0.0}
}

Related Projects

For broader Ottoman Turkish NLP research and experimental tools, see the upcoming ottominer repository.
