Skip to main content

Community analysis in bibliographical references

Project description

RefCliq

This is a Python program that uses a network clustering algorthym to anlayze Web of Knowledge records in BibTeX format. The results are helpful for constructing literature reviews.

Requires Python 3 with the following dependencies:

You can install NLTK, Pybtex and NetworkX through easy_install or pip. To use community, pip install python-louvain (the 'community' package is something else).

Sample Usage:

This program works exclusively with articles downloaded from Web of Knowledge that include the "Full Record" including "Cited References". Using a community detection algorithm, this program identifies clusters of citations and reports the most central works in each clusters. For each cluster, it all list the most commonly occurring words in abstracts (or titles when abstracts are not available) of articles that heavily cite works from the cluster and the journals where that frequently cite these works.

For each of the most central references in the cluster, the report includes the within cluster betweeness centrality, total count of the times the work was cited and the words used most frequently in articles/titles that cite that work. Selecting on a reference opens a new page that shows the contemporary works that cite that reference.

Here is a sample output based on articles published in US sociological journals in 2011 and 2012.

If you had all the articles from the journal Mobilization stored as moby.bib and you wanted the analysis to be stored in a folder called moby:

$ python refcliq.py moby.bib -d moby

The contents of the output directory include:

  • index.html which is the cluster report.
  • refs folder which contains information about listed references.
  • cites.json which stores the resulting reference network for mapping in a format that d3js reads.
  • cites.gexf ... Gephi format.

To run a standard analysis of the files, us_2012_a.bib, us_2012_b.bib, and , us_2012_c.bib and save the results in a folder called us_sociology.

$ python refcliq.py us_2012.bib -d us_sociology

Using the same files and destination, but include all references with at least two citations.

$ python refcliq.py us_2012*.bib -d us_sociology -n 2

Using the same files and destination, but include all edges with at least two occurrences.

$ python refcliq.py us_2012*.bib -d us_sociology -e 2

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

refcliq-0.0.8.tar.gz (15.5 kB view hashes)

Uploaded Source

Built Distribution

refcliq-0.0.8-py3-none-any.whl (22.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page