textcrafts: Summary, keyphrase and relation extraction with dependency graphs
TextGraphCrafts
Python-based summary, keyphrase and relation extractor from text documents using dependency graphs.
HOME: https://github.com/ptarau/TextGraphCrafts
Project Description
**The system uses dependency links to build Text Graphs from which, with the help of a centrality algorithm like PageRank, it extracts relevant keyphrases, summaries and relations from text documents. Developed with Python 3 on OS X, but portable to Linux.**
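To illustrate the general idea only (not this package's actual implementation), here is a minimal sketch that builds a toy word graph from hand-written dependency edges and ranks its nodes with networkx's PageRank; the edges and resulting scores are made-up assumptions.

```python
# Illustrative sketch: a toy "text graph" built from made-up
# dependency edges, ranked with PageRank to pick salient words.
import networkx as nx

# (head, dependent) pairs, as a dependency parser might produce them
dep_edges = [
    ("extracts", "system"), ("extracts", "keyphrases"),
    ("extracts", "summaries"), ("keyphrases", "relevant"),
    ("graphs", "dependency"), ("builds", "graphs"),
    ("builds", "system"),
]

g = nx.DiGraph()
g.add_edges_from(dep_edges)

# PageRank centrality over the dependency graph
ranks = nx.pagerank(g)
for word, score in sorted(ranks.items(), key=lambda x: -x[1])[:5]:
    print(f"{word}: {score:.3f}")
```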
Dependencies:
- python 3.7 or newer, pip3, java 9.x or newer. Also, having git installed is recommended for easy updates
- `pip3 install nltk`
- also, run in python3 something like

      import nltk
      nltk.download('wordnet')
      nltk.download('words')
      nltk.download('stopwords')

- or, if that fails on a Mac, run `python3 down.py` to collect the desired nltk resource files (a minimal sketch of such a script follows this list)
- `pip3 install networkx`
- `pip3 install requests`
- `pip3 install graphviz`, also ensure `.gv` files can be viewed
- `pip3 install stanfordnlp` to get the stanfordnlp parser
- Note that `stanfordnlp` requires torch binaries, which are easier to install with `anaconda`.
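A minimal sketch of what such a down.py script could contain, assuming it only needs the three NLTK corpora listed above (the script in the repository may fetch more):

```python
# down.py (sketch): fetch the NLTK resource files used by textcrafts.
# Only the corpora mentioned above are assumed here; the real script
# in the repository may download additional resources.
import nltk

for resource in ("wordnet", "words", "stopwords"):
    nltk.download(resource)
```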
Tested with the above on a Mac, with macOS Mojave and Catalina and on Ubuntu Linux 18.x.
Running it:
In a shell window, run `start_server.sh`. In another shell window, start with `python3 -i tests.py` and then interactively, at the ">>>" prompt, try
>>> test1()
>>> test2()
>>> ...
>>> test9()
>>> test12()
>>> test0()
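The same demos can also be driven from a small script instead of the interactive prompt; a minimal sketch, assuming tests.py is importable from the current directory and defines the functions shown above:

```python
# Sketch: run a few of the demos non-interactively, assuming tests.py
# (in the current directory) defines test0() ... test12() as above.
from tests import test0, test1, test2

test1()
test2()
test0()
```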
See how to activate other outputs in file `deepRank.py`.
Text file inputs (including the US Constitution, `const.txt`) are in the folder `examples/`.
Handling PDF documents
The easiest way to do this is to install pdftotext, which is part of Poppler tools.
If pdftotext is installed, you can take a file like `textrank.pdf`, already in the subdirectory `pdfs/`, and try something similar to the sketch below:
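A minimal sketch of the conversion step, assuming only the pdftotext command-line tool; the final print is a placeholder for whatever extractor call you then run on the resulting text file:

```python
# Sketch: convert a PDF with the pdftotext command-line tool (part of
# the Poppler tools), then hand the resulting .txt file to the
# text-processing entry point you use.
import subprocess
from pathlib import Path

pdf = Path("pdfs/textrank.pdf")
txt = pdf.with_suffix(".txt")

# Writes pdfs/textrank.txt next to the PDF
subprocess.run(["pdftotext", str(pdf), str(txt)], check=True)

print(f"Converted {pdf} -> {txt}; now run the extractor on {txt}")
```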
Change settings in file `params.py` to use the system with other global parameter settings.
Alternative NLP toolkit
Optionally, you can activate the alternative Stanford CoreNLP toolkit as follows:
- install Stanford CoreNLP and unzip it in a directory of your choice (e.g., the local directory)
- edit, if needed, `start_parser.sh` with the location of the parser directory
- override the `params` class and set `corenlp=True` (see the sketch below)

Note, however, that Stanford CoreNLP is GPL-licensed, which can place restrictions on proprietary software activating this option.
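A minimal sketch of that override; the import path is an assumption (the class name `params` and the `corenlp=True` flag come from the description above, so check `params.py` in the repository for the actual layout):

```python
# Sketch only: the module path below is assumed; the instructions say
# to override the params class (defined in params.py) and set corenlp=True.
from textcrafts.params import params  # assumed location of the params class

class my_params(params):
    corenlp = True  # activate the Stanford CoreNLP toolkit

# Use my_params wherever the system takes its global parameter settings;
# see params.py for the other available options.
```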
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution: textcrafts-0.1.2.tar.gz
Built Distribution: textcrafts-0.1.2-py3-none-any.whl
File details
Details for the file textcrafts-0.1.2.tar.gz.
File metadata
- Download URL: textcrafts-0.1.2.tar.gz
- Upload date:
- Size: 13.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/42.0.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 82b51a2d3a462a75339401991ce64207914c6d93f01ec4d31c40c6c0b40a25ac
MD5 | 42dadc981955d6d79631bec151254aeb
BLAKE2b-256 | 288c35ef0a663fe48ecf3fe57cd03d5e792466a79cea5a8a313798c569ac7caa
File details
Details for the file textcrafts-0.1.2-py3-none-any.whl.
File metadata
- Download URL: textcrafts-0.1.2-py3-none-any.whl
- Upload date:
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/42.0.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8d1cb15aba45280852a51da0e3ec46efb8e3aca9e43a63995a10b119d8a16ad0
MD5 | f4b33fd5ca93df15784de7412fc67981
BLAKE2b-256 | 50322f4e76eb7cd4e13bd6f74db1b29badcd4ac8689c913d073866afbbf654f3