CNN/DailyMail Dataset as SQLite
Creates a SQLite database of the CNN and DailyMail summarization dataset.
Documentation
See the full documentation. The API reference is also available.
Obtaining
The easiest way to install the command line program is via the pip
installer:
pip3 install zensols.cnndmdb
Binaries are also available on PyPI.
Usage
First create the SQLite database file:
cnndmdb load
Then check that the file data/cnn.sqlite3 was created. The load takes a while
because the entire corpus is first downloaded and then inserted into the
SQLite file.
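One way to confirm that the load step worked is to open the file with Python's standard sqlite3 module and list its tables. This is a minimal sketch: the helper name list_tables is hypothetical, and no table names are assumed because the schema is not described here.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: Path) -> List[str]:
    """Return the table names found in a SQLite file (hypothetical helper)."""
    if not db_path.is_file():
        raise FileNotFoundError(f"database not found: {db_path}")
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in schema catalog
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # data/cnn.sqlite3 is the path named in the usage text
    db = Path("data/cnn.sqlite3")
    if db.is_file():
        print(list_tables(db))
    else:
        print(f"{db} not found; run `cnndmdb load` first")
```

An empty list would suggest the file exists but the insert step did not complete.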
Command Line
The SQLite database keys can be listed:
cnndmdb keys
The command line can also be used to print articles:
cnndmdb show -t org 3b07f5102c69e3e609d73b2ccb0dc5549d4fbaf6
The -t org option tells the program to use the original corpus keys. The -t
option also allows selecting articles by SQLite rowid keys or as the Kth
smallest article.
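The same commands can be driven from a script. The sketch below only wraps the two invocations shown above (cnndmdb keys and cnndmdb show -t org) with the standard subprocess module. The function names are hypothetical, and it assumes the cnndmdb program is on the PATH. Other values for -t are not spelled out here, so only org is used as the default.

```python
import subprocess
from typing import List


def keys_command() -> List[str]:
    """Build the argument list for `cnndmdb keys`."""
    return ["cnndmdb", "keys"]


def show_command(key: str, key_type: str = "org") -> List[str]:
    """Build the argument list for `cnndmdb show -t <key_type> <key>`."""
    return ["cnndmdb", "show", "-t", key_type, key]


def run(command: List[str]) -> str:
    """Run a command and return its standard output as text.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run(show_command("3b07f5102c69e3e609d73b2ccb0dc5549d4fbaf6")) would return whatever the show command prints for that article.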
API
The corpus objects are accessible as mapped Python objects. For example:
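# assumed import path: these names are taken to be exported by the top-level package
from zensols.cnndmdb import ApplicationFactory, Article, Corpus

corpus: Corpus = ApplicationFactory.get_corpus()
art: Article = next(iter(corpus.stash.values()))
print(art.text)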
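To look at more than one article, the same stash values can be consumed a few at a time. This is a minimal sketch that relies only on corpus.stash.values() and the text attribute used above. The helper preview_texts is hypothetical and works on any iterable of objects that have a text attribute.

```python
from itertools import islice
from typing import Any, Iterable, List


def preview_texts(articles: Iterable[Any], limit: int = 3,
                  width: int = 80) -> List[str]:
    """Return the first ``width`` characters of the first ``limit`` articles."""
    # islice ends the iteration after `limit` items
    return [art.text[:width] for art in islice(articles, limit)]
```

With a loaded corpus, this would be called as preview_texts(corpus.stash.values()).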
Data Source
The data is sourced from a TensorFlow dataset, which in turn uses Abigail See's GitHub repository.
@article{DBLP:journals/corr/SeeLM17,
author = {Abigail See and
Peter J. Liu and
Christopher D. Manning},
title = {Get To The Point: Summarization with Pointer-Generator Networks},
journal = {CoRR},
volume = {abs/1704.04368},
year = {2017},
url = {http://arxiv.org/abs/1704.04368},
archivePrefix = {arXiv},
eprint = {1704.04368},
timestamp = {Mon, 13 Aug 2018 16:46:08 +0200},
biburl = {https://dblp.org/rec/bib/journals/corr/SeeLM17},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
@inproceedings{hermann2015teaching,
title={Teaching machines to read and comprehend},
author={Hermann, Karl Moritz and Kocisky, Tomas and Grefenstette, Edward and Espeholt, Lasse and Kay, Will and Suleyman, Mustafa and Blunsom, Phil},
booktitle={Advances in neural information processing systems},
pages={1693--1701},
year={2015}
}
Changelog
An extensive changelog is available here.
License
Copyright (c) 2023 Paul Landes