Extracts Keyphrases from Documents

# C-Rank

C-Rank is an unsupervised keyphrase extraction algorithm that uses concept linking to improve its results.

The only input it needs from the user is the document from which keyphrases are to be extracted.

C-Rank relies on Babelfy's services, so you must create a Babelfy account (http://babelfy.org/login) and supply your Babelfy key for C-Rank to work properly.
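To keep the Babelfy key out of source files, it can be read from the environment before it is passed to C-Rank. The sketch below is only one way to do this; the helper `load_babelfy_key` and the variable name `BABELFY_KEY` are assumptions made for illustration and are not part of C-Rank.

```python
import os


def load_babelfy_key(env_var="BABELFY_KEY"):
    """Return the Babelfy key stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    C-Rank itself only expects the key as a plain string.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Babelfy key."
        )
    return key
```

The returned string can then be used wherever the Babelfy key is expected.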

## Installation
The following packages must be installed to use C-Rank:

networkx (https://networkx.github.io/):
```
pip install networkx
```

nltk (https://www.nltk.org/index.html):
```
sudo pip install -U nltk
```

pybabelfy (https://github.com/aghie/pybabelfy):
```
sudo pip install pybabelfy
```

Then install C-Rank itself:
```
sudo pip install C-Rank
```
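After installing, a quick check can confirm that the packages are importable. This is a minimal sketch: the helper `missing_modules` is hypothetical, and the import names are assumed to match the package names listed above (`CRank` is taken from the import used in this README).

```python
import importlib.util


def missing_modules(names=("networkx", "nltk", "pybabelfy", "CRank")):
    """Return the top-level module names that cannot be found.

    The default names are assumptions based on the packages listed
    in this README; adjust them if an import name differs.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```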


## Getting started
```
import CRank as cr

# Replace the three placeholders with your own values
crank = cr.CRank(BABELFY_KEY, LIST_OF_INPUT_DOCUMENTS, OUTPUT_DIRECTORY)
# Example:
# crank = cr.CRank("3ejklasd-a456-41ae-647f-0a1234546dd3", ['./document1.txt', './document2.txt'], './')
crank.keyphrasesExtraction()

crank.printKeyphrases()
```
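The list of input documents is a plain list of file paths, as in the example above. When the documents live in one folder, the list can be built rather than typed by hand. The helper below is a hypothetical convenience, not part of C-Rank, and assumes the documents are `.txt` files.

```python
from pathlib import Path


def collect_documents(directory, pattern="*.txt"):
    """Return a sorted list of file paths in `directory` matching `pattern`.

    Hypothetical helper: it only builds the list of paths and does not
    read or validate the documents.
    """
    return sorted(str(p) for p in Path(directory).glob(pattern) if p.is_file())
```

The result can be passed as the second argument when creating the `CRank` object.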
## Functionalities
```
# all printing options
printKeyphrases(self, nKeyphrases = 10, documentIndex=-1, showRanking = True, stem = False)

# save options to persist keyphrases in a single file (as in SemEval)
saveKeyphrasesSingleFile(self, fileName, nKeyphrases = 10, documentIndex=-1, showRanking = True, stem = False)

# save options to persist keyphrases in different files
saveKeyphrasesDiferentFiles(self, nKeyphrases = 10, documentIndex=-1, showRanking = True, stem = False)

# parameters used by the functions above
# nKeyphrases: number of keyphrases to print (0 for all keyphrases)
# documentIndex: index of the document to print (-1 for all documents)
# showRanking: whether to show the weight of each keyphrase
# stem: whether to stem the keyphrases
# fileName: name of the output file
```
### Intermediate results and available variables
```
self.key = BabelfyKey
self.inputFiles = inputFiles
self.outputDirectory = outputDirectory
self.lang = language
self.distance = dist
self.graphName = []
self.splitted_text = []
self.dictionary = []
self.dictionaryCode = []
self.weight = []
self.paragraphs_annotations = []
self.paragraphs_text = []
self.paragraphs_code = []
self.graphs = []
self.graphs2 = []
self.keyPhrases = []
```
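Several of the attributes above start out as empty lists. A small helper can report how many entries each one holds, which is handy when inspecting intermediate results. This is a sketch only: `summarize_state` is hypothetical, and it makes no assumption about what each list contains or when it gets filled.

```python
def summarize_state(obj, names=("splitted_text", "dictionary", "weight",
                                "graphs", "keyPhrases")):
    """Map each attribute name to the length of its list.

    Attributes that are missing or are not lists map to None.
    The default names are taken from the attribute list in this README.
    """
    summary = {}
    for name in names:
        value = getattr(obj, name, None)
        summary[name] = len(value) if isinstance(value, list) else None
    return summary
```

Calling `summarize_state(crank)` on a `CRank` object would return a dictionary with one entry per attribute name.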
## Citation
Available soon
