
Extraction-based Turkish news summarizer.

Project description

SadedeGel: An extraction-based Turkish news summarizer

SadedeGel is a library for unsupervised extraction-based news summarization using several old and new NLP techniques.

Development of the library takes place as part of Açık Kaynak Hackathon Programı 2020 (Open Source Hackathon Program 2020).

💫 Version 0.14 out now! Check out the release notes here.


📖 Documentation

  • Contribute: How to contribute to the sadedeGel project and code base.

💬 Where to ask questions

The SadedeGel project is maintained by @globalmaksimum AI team members @dafajon, @askarbozcan, @mccakir and @husnusensoy.

  • 🚨 Bug Reports: GitHub Issue Tracker
  • 🎁 Feature Requests: GitHub Issue Tracker
  • Questions: Slack Workspace

Features

  • Several news datasets

    • Basic corpus
      • Raw corpus (sadedegel.dataset.load_raw_corpus)
      • Sentence-tokenized corpus (sadedegel.dataset.load_sentences_corpus)
      • Human annotated summary corpus (sadedegel.dataset.load_annotated_corpus)
    • Extended corpus
      • Raw corpus (sadedegel.dataset.extended.load_extended_raw_corpus)
      • Sentence-tokenized corpus (sadedegel.dataset.extended.load_extended_sents_corpus)
  • ML-based sentence boundary detector (SBD) trained for the Turkish language (sadedegel.dataset)

  • Various baseline summarizers

    • Position Summarizer
      • First Important Summarizer
      • Last Important Summarizer
    • Length Summarizer
    • Band Summarizer
    • Random Summarizer
  • Various unsupervised/supervised summarizers

    • ROUGE1 Summarizer
    • Cluster Summarizer
    • Supervised Summarizer
  • Various word tokenizers

    • BERT Tokenizer - Trained tokenizer
    • Simple Tokenizer - Regex-based (Experimental)
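To make the baseline strategies concrete, here is a minimal pure-Python sketch of the position (first/last), length and random summarizers. It is an illustrative re-implementation under stated assumptions, not sadedeGel's code: it works on a plain list of sentence strings, measures length in whitespace-separated tokens, and the function names are hypothetical.

```python
import random
from typing import List, Optional


def position_summarizer(sents: List[str], k: int, mode: str = "first") -> List[str]:
    """Keep the first (or last) k sentences, preserving document order."""
    if mode == "first":
        return sents[:k]
    if mode == "last":
        # Guard k == 0: sents[-0:] would return the whole list.
        return sents[-k:] if k > 0 else []
    raise ValueError(f"unknown mode: {mode!r}")


def length_summarizer(sents: List[str], k: int) -> List[str]:
    """Keep the k longest sentences (by whitespace token count), in document order."""
    # sorted() is stable, so ties keep the earlier sentence first.
    ranked = sorted(range(len(sents)), key=lambda i: len(sents[i].split()), reverse=True)
    return [sents[i] for i in sorted(ranked[:k])]


def random_summarizer(sents: List[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Keep k randomly chosen sentences, in document order."""
    rng = random.Random(seed)  # local RNG so a seed gives reproducible picks
    picked = rng.sample(range(len(sents)), min(k, len(sents)))
    return [sents[i] for i in sorted(picked)]
```

All three return sentences in their original order, which keeps the extracted summary readable; baselines like these are mainly useful as reference points when evaluating the smarter summarizers.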

📖 For more details, refer to sadedegel.ai

Install sadedeGel

  • Operating system: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
  • Python version: 3.6+ (64-bit only)
  • Package managers: pip

pip

Using pip, sadedeGel releases are available as source packages and binary wheels.

pip install sadedegel

When using pip it is generally recommended to install packages in a virtual environment to avoid modifying system state:

python -m venv .env
source .env/bin/activate
pip install sadedegel

Quickstart with SadedeGel

To summarize a document, wrap its raw text in a sadedegel Doc and pass it to a summarizer:

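from sadedegel import Doc
from sadedegel.dataset import load_raw_corpus
from sadedegel.summarize import Rouge1Summarizer

# Iterate over the raw news corpus shipped with sadedeGel
raw = load_raw_corpus()

# Wrap the first raw document; Doc splits it into sentences
d = Doc(next(raw))

# Score sentences with ROUGE-1 and keep the top 5 as the summary
summarizer = Rouge1Summarizer()
summarizer(d, k=5)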
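As its name suggests, Rouge1Summarizer ranks sentences by ROUGE-1, i.e. unigram overlap. The sketch below shows one way such a scorer can work, assuming each sentence is scored by its ROUGE-1 F1 against the remaining sentences of the document and that tokens are lowercased whitespace-separated words. sadedeGel's own tokenization and scoring details may differ, and `rouge1_f1` / `rouge1_summarize` are hypothetical names.

```python
from collections import Counter
from typing import List


def rouge1_f1(candidate: List[str], reference: List[str]) -> float:
    """ROUGE-1 F1 between two token lists, using clipped unigram counts."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # min count per shared unigram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge1_summarize(sents: List[str], k: int) -> List[str]:
    """Return the k sentences with the highest ROUGE-1 F1 against the rest
    of the document, in their original order."""
    # Note: str.lower() is not Turkish-aware ("I" becomes "i", not "ı").
    tokens = [s.lower().split() for s in sents]
    scores = []
    for i, toks in enumerate(tokens):
        # Reference = all other sentences of the document, concatenated.
        rest = [t for j, other in enumerate(tokens) if j != i for t in other]
        scores.append(rouge1_f1(toks, rest))
    top = sorted(range(len(sents)), key=lambda i: scores[i], reverse=True)[:k]
    return [sents[i] for i in sorted(top)]
```

The intuition is that a sentence sharing many words with the rest of the article is likely to be representative of it, so no labeled data is needed.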

To split text into sentences with our ML-based sentence boundary detector, access the sents attribute of a Doc:

from sadedegel import Doc

doc = ("Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı. Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok "
"daha büyük ölçekte. Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu. IBM 650 Model I adını taşıyan bilgisayarın "
"satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı. Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı. "
"Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.")

Doc(doc).sents
['Bilişim sektörü, günlük devrimlerin yaşandığı ve hızına yetişilemeyen dev bir alan haline geleli uzun bir zaman olmadı.',
 'Günümüz bilgisayarlarının tarihi, yarım asırı yeni tamamlarken; yaşanan gelişmeler çok daha büyük ölçekte.',
 'Türkiye de bu gelişmelere 1960 yılında Karayolları Umum Müdürlüğü (şimdiki Karayolları Genel Müdürlüğü) için IBM’den satın aldığı ilk bilgisayarıyla dahil oldu.',
 'IBM 650 Model I adını taşıyan bilgisayarın satın alınma amacı ise yol yapımında gereken hesaplamaların daha hızlı yapılmasıydı.',
 'Türkiye’nin ilk bilgisayar destekli karayolu olan 63 km uzunluğundaki Polatlı - Sivrihisar yolu için yapılan hesaplamalar IBM 650 ile 1 saatte yapıldı.',
 'Daha öncesinde 3 - 4 ayı bulan hesaplamaların 1 saate inmesi; teknolojinin, ekonomik ve toplumsal dönüşüme büyük etkide bulunacağının habercisiydi.']
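For contrast, a purely rule-based splitter is easy to write but brittle. The sketch below (a hypothetical `naive_split`, not part of sadedeGel) breaks after `.`, `!` or `?` whenever the next word starts with an uppercase letter, so it wrongly splits after abbreviations such as "Dr.". That is the kind of ambiguity a trained boundary detector can learn to resolve from data.

```python
import re
from typing import List

# Naive rule: a boundary is sentence-final punctuation followed by whitespace
# and an uppercase letter (including the Turkish capitals Ç, Ğ, İ, Ö, Ş, Ü).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-ZÇĞİÖŞÜ])")


def naive_split(text: str) -> List[str]:
    """Split text on the naive boundary rule, dropping empty pieces."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]
```

For example, `naive_split("Dr. Ahmet geldi. Toplantı başladı.")` yields three pieces, with "Dr." wrongly treated as a sentence of its own.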

SadedeGel Server

To integrate sadedeGel with your applications, we provide a lightweight summarizer server, which you can start with:

python3 -m sadedegel.server 

SadedeGel Server on Heroku

SadedeGel Server is hosted on the free tier of Heroku cloud services.

PyLint, Flake8 and Bandit

sadedeGel uses pylint for static code analysis, flake8 for code style checks, and bandit for security checks.

To run all of these checks:

make lint

Run tests

sadedeGel comes with an extensive test suite. To run it, you'll usually want to clone the repository and build sadedeGel from source; this also installs the development dependencies and test utilities listed in requirements.txt.

Alternatively, you can find out where sadedeGel is installed and run pytest on that directory, after installing the test utilities from sadedeGel's requirements.txt. From a source checkout, run the whole suite with:

make test

References

Our Community Contributors

We would like to thank our community contributors for their bug reports, enhancement requests and questions, which make sadedeGel better every day.

Software Engineering

  • Special thanks to the spaCy project for showing us how to implement a proper Python module rather than merely explaining it.

    • We have borrowed many documentation and style conventions from their code base 😄
  • There are a few free-tier service providers we need to thank:

Machine Learning (ML), Deep Learning (DL) and Natural Language Processing (NLP)



Download files


Source Distribution

sadedegel-0.14.1.tar.gz (1.1 MB)

Uploaded Source

Built Distribution


sadedegel-0.14.1-py3-none-any.whl (2.3 MB)

Uploaded Python 3

File details

Details for the file sadedegel-0.14.1.tar.gz.

File metadata

  • Download URL: sadedegel-0.14.1.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for sadedegel-0.14.1.tar.gz
Algorithm Hash digest
SHA256 a8740cd0a5263dafcec3f69e87f5f542553fbfb9737214d0590929a8dde47e4a
MD5 ed999c2e605947f25d5dd4370811d6f6
BLAKE2b-256 407cb8a9392979106ce32df0ff5146d583607e4327572e2555b25b9998167035
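To check a downloaded file against the SHA256 digest listed above, a short standard-library sketch is enough. `sha256_of` and `verify` are illustrative helper names, not part of sadedeGel; the file is read in chunks so large archives need not fit in memory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify("sadedegel-0.14.1.tar.gz", "a8740cd0a5263dafcec3f69e87f5f542553fbfb9737214d0590929a8dde47e4a")` should return True for an intact download. pip can also enforce such checks at install time through its hash-checking mode (`--require-hashes`).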


File details

Details for the file sadedegel-0.14.1-py3-none-any.whl.

File metadata

  • Download URL: sadedegel-0.14.1-py3-none-any.whl
  • Upload date:
  • Size: 2.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for sadedegel-0.14.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2dff6bfb1fb3696d9715534db797483fdd4d79f80dc608e18c549dce2ff3f0b6
MD5 4f0f011706d718877749aeeae1b1eacf
BLAKE2b-256 3cd752a1fb1be0ff56200bed8cb151d724f4cba0b94e9350a72ca8b6ad0bcfe9

