
Analyzing the evolution of ideas using citation analysis


This Python package, knowknow, aims to make powerful, modern tools for analyzing the structure of knowledge open to anyone. Parallel efforts exist along these lines, including CADRE, but this package is still the only openly available resource for analyzing Web of Science datasets, and its methods can be incorporated into CADRE by anyone.

Projects built on knowknow

  • amcgail/citation-death applies the concept of 'death' to attributes of citations, and analyzes the lifecourse of cited works, cited authors, and the authors writing the citations, using the sociology-wos-74b dataset.
  • amcgail/lost-forgotten digs deeper into this analysis. An online appendix is available here, and the paper, published in The American Sociologist, can be found here.

Datasets built with knowknow

  • Sociology
    • sociology-wos (Harvard Dataverse): every paper indexed in Web of Science in early 2020 whose journal is in the 'Sociology' category and which has complete data.
    • sociology-jstor (in progress): in-text citations and their contexts, extracted from more than 90,000 full-text Sociology articles indexed in JSTOR.

Installation (from PyPI)

  1. Install Python 3.7+
  2. On Windows, install Build Tools for Visual Studio (needed to build some dependencies)
  3. Run pip install knowknow-amcgail
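
Taken together, these steps amount to a short shell session. A minimal sketch (the pip install line is from the step above; the final check assumes the module name knowknow, consistent with the python -m knowknow commands later on this page):

    # install the package from PyPI
    pip install knowknow-amcgail

    # verify that the module imports (module name taken from `python -m knowknow`)
    python -c "import knowknow; print('ok')"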

Installation (from GitHub)

  1. Install Python 3.7+
  2. Clone this repository to your computer
  3. Create a virtualenv for knowknow
  4. In the virtualenv, execute pip install -r requirements.txt
    • On Windows, I needed to install the latest versions of numpy, scikit-learn, and scipy from prebuilt .whl files: download them from this site, then install each with pip install <fn.whl>
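
Put together, a typical GitHub install looks roughly like the following sketch (the repository URL is an assumption based on the project's GitHub account; venv is shown, but any virtualenv tool works):

    # clone the repository (URL assumed; substitute the one you cloned)
    git clone https://github.com/amcgail/knowknow
    cd knowknow

    # create and activate a virtualenv
    python -m venv .venv
    source .venv/bin/activate      # on Windows: .venv\Scripts\activate

    # install the dependencies
    pip install -r requirements.txt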

Getting Started

To get started with knowknow, you need to 1) specify where knowknow should store data and code ("init"), 2) either create a new project or copy an existing one, and 3) start a JupyterLab environment.

The following commands will help you perform these actions, getting you started conducting or reproducing analyses using knowknow.

python -m knowknow init. Run this command first. It will prompt you for the directory where data files will be stored and the directory where code will be stored.

python -m knowknow start <PROJ-NAME>. For instance, python -m knowknow start citation-death. This starts a JupyterLab notebook in a knowknow code directory; if the directory doesn't exist, knowknow creates it.
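
A first session therefore looks like this (both commands are from above; citation-death is the example project name):

    # one-time setup: choose the data and code directories
    python -m knowknow init

    # create (or open) a project and launch JupyterLab in it
    python -m knowknow start citation-death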

[Recommended] Interfacing with GitHub

In order to use the following commands you must install Git. This allows you to use others' code, and to publish your own code for others to use.

python -m knowknow clone <URL> For instance, python -m knowknow clone https://github.com/amcgail/lost-forgotten. Clone someone else's repository.

In order to make your own changes to others' code, or to share your code with the world, do the following:

  1. Create a GitHub account and log in.
  2. Install GitHub Desktop, which is a simple connector between Git on your computer and GitHub, in the cloud.
  3. Then do one (or both) of the following:
    • [Share your code] In GitHub Desktop, choose File -> Create Repository and navigate to the folder containing your knowknow code (this folder was created by knowknow's start command). Then press "Publish Repository" in the upper right to add this code to your GitHub account.
    • [Contribute to others' code] On GitHub, fork the repository you would like to contribute to. This creates a personal copy of that repository in your GitHub account. Then clone this copy into knowknow's code directory, using the clone command or GitHub Desktop. Once you are satisfied with your updates, and they are pushed back to GitHub, submit a "pull request" to the original repository to ask them to review and merge your changes.
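
If you prefer the command line to GitHub Desktop, the contribution path looks roughly like this sketch (YOUR-USERNAME is a placeholder; lost-forgotten is the example repository from above):

    # clone your fork into knowknow's code directory
    git clone https://github.com/YOUR-USERNAME/lost-forgotten
    cd lost-forgotten

    # ...edit notebooks and code...

    # commit and push your changes back to your fork
    git add -A
    git commit -m "describe your changes"
    git push

    # then open a pull request against the original repository on github.com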

Auto-downloading Data and Code

Data files will be automatically downloaded during code execution if they are not already in the data directory you specified with the init command. This may use significant bandwidth -- the data files for the Sociology dataset are ~750MB.

Code specified by the knowknow.require function will be automatically downloaded by knowknow into the code directory you specified with the init command. Be sure you trust whoever wrote the code you download; running arbitrary code from strangers on your computer is a security risk.
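
In a notebook, this mechanism looks something like the following sketch. This is an assumption about usage, not documented API: the exact signature of knowknow.require is not shown on this page, and the repository-style argument below is guessed from the clone command.

    import knowknow

    # Assumption: require() fetches the named code repository into the code
    # directory chosen during `python -m knowknow init`, downloading it if
    # it is not already present. Check the knowknow source for the real
    # signature before relying on this.
    knowknow.require("amcgail/citation-death")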

Developing

If you want to contribute edits of your own, fork this repository into your own GitHub account, make the changes, and submit a request for me to incorporate the code (a "pull request"). This process is really easy with GitHub Desktop (tutorial here).

There is a lot to do! If you find this useful to your work, and would like to contribute (even to the following list of possible next steps) but can't figure out how, please don't hesitate to reach out. My website is here, Twitter here.

Possible projects

  • The documentation for this project can always be improved, typically through people reaching out to me when they run into issues. Please feel free to do so.
  • (complete) An object-oriented model for handling context would remove the need for so much variable-passing between functions, reduce total code volume, and improve readability.
  • (ongoing) Additional datasets and sources, beyond JSTOR and WoS, could be incorporated if you have the need.
  • (complete: you can now upload data files to Harvard's Dataverse) If you produce precomputed binaries and have an idea of how we could incorporate the sharing of these binaries within this library, please DM me. That would be great.
  • (ongoing, future work) All analyses could be generalized to any counted variable of the citations. This wouldn't be tough, and would have a huge payoff.
  • (huge project, uncertain payoff) It would be amazing if we could make a graphical interface for this.
    • the user simply imports data, chooses the analyses they want to run, fills in configuration parameters, and presses "go"
    • the output is a PDF with the code, visualizations, and explanations for a given analysis
    • behind the scenes, all this GUI does is run nbconvert
    • also could allow users to regenerate any/all analyses for each dataset with the click of a button
    • could provide immediate access to online archives, either to download or upload similar count datasets

