Analyzing the evolution of ideas using citation analysis
Project description
This Python package, `knowknow`, is an attempt to make powerful, modern tools for analyzing the structure of knowledge open to anyone.
I recognize that parallel efforts exist along these lines, including CADRE, but this package is still the only resource that lets anyone analyze Web of Science datasets, and its methods can be incorporated into CADRE by anyone.
Projects built on knowknow
- amcgail/citation-death applies the concept of 'death' to attributes of citations, and analyzes the lifecourse of cited works, cited authors, and the authors writing the citations, using the `sociology-wos-74b` dataset.
- amcgail/lost-forgotten digs deeper into . An online appendix is available here, and the paper published in The American Sociologist can be found here.
Datasets built with knowknow
- Sociology
  - `sociology-wos` (Harvard Dataverse): every paper in WoS in early 2020 whose journal is in the 'Sociology' category, and which has full data.
  - `sociology-jstor` (in progress): in-text citations and their contexts, extracted from >90k full-text Sociology articles indexed in JSTOR.
Installation (from PyPI)
- Install Python 3.7+
- Install Build Tools for Visual Studio
- Run `pip install knowknow-amcgail`
Installation (from GitHub)
- Install Python 3.7+
- Clone this repository to your computer
- Create a virtualenv for `knowknow`
- In the virtualenv, execute `pip install -r requirements.txt`
- On Windows, I needed to install the latest versions of `numpy`, `scikit-learn`, and `scipy` via .whl files: download them from this site, then install each with `pip install <fn.whl>`
Getting Started
To get started with knowknow, you need to (1) specify where knowknow should store data and code ("init"), (2) either create a new project or copy an existing one, and (3) start a JupyterLab environment.
The following commands will help you perform these actions and get you started conducting or reproducing analyses with `knowknow`.
`python -m knowknow init`

Run this command first. It will prompt you for the directory where data files will be stored and the directory where code will be stored.
`python -m knowknow start <PROJ-NAME>`

For instance, `python -m knowknow start citation-death`. This starts a JupyterLab notebook in a knowknow code directory. If the directory doesn't exist, knowknow creates it.
[Recommended] Interfacing with GitHub
To use the following commands you must install Git. Git allows you to use others' code, and to publish your own code for others to use.
`python -m knowknow clone <URL>`

For instance, `python -m knowknow clone https://github.com/amcgail/lost-forgotten`. This clones someone else's repository.
In order to make your own changes to others' code, or to share your code with the world, do the following:
- Create a GitHub account and log in.
- Install GitHub Desktop, which is a simple connector between Git on your computer and GitHub, in the cloud.
3a) [Share your code] In GitHub Desktop, choose `File -> Create Repository` and navigate to the folder containing your knowknow code. This folder was created by knowknow using the `start` command. Then press "Publish Repository" in the upper right to add this code to your GitHub account.

3b) [Contribute to others' code] In GitHub, fork the repository you would like to contribute to. This creates a personal copy of that repository in your GitHub account. Then clone this copy into knowknow's code directory using the `clone` command, or using GitHub Desktop. Once you are satisfied with your updates, and they are pushed back to GitHub, submit a "pull request" to the original repository to ask the maintainers to review and merge your changes.
Auto-downloading Data and Code
Data files will be automatically downloaded during code execution if they are not already in the data directory you specified with the `init` command. This may take up significant bandwidth -- the data files for the Sociology dataset are ~750MB.
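The download-on-demand behavior amounts to a check-then-fetch pattern. Here is a minimal sketch of that idea; this is an illustration, not knowknow's actual implementation, and the function name, arguments, and URL scheme are hypothetical:

```python
from pathlib import Path
from urllib.request import urlretrieve

def ensure_dataset(name, data_dir, base_url):
    """Return the local path to dataset `name`, downloading it only if absent."""
    path = Path(data_dir) / name
    if not path.exists():
        Path(data_dir).mkdir(parents=True, exist_ok=True)
        # First run only: may transfer hundreds of MB (e.g. ~750MB for Sociology)
        urlretrieve(f"{base_url}/{name}", path)
    return path
```

On subsequent runs the file is found locally and no network transfer occurs.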
Code specified by the `knowknow.require` function will be automatically downloaded by knowknow into the code directory you specified with the `init` command. Be sure you trust whoever wrote the code you download: running arbitrary code from strangers on your computer is a security risk.
Developing
If you want to contribute edits of your own, fork this repository into your own GitHub account, make the changes, and submit a request for me to incorporate the code (a "pull request"). This process is really easy with GitHub Desktop (tutorial here).
There is a lot to do! If you find this useful to your work, and would like to contribute (even to the following list of possible next steps) but can't figure out how, please don't hesitate to reach out. My website is here, Twitter here.
Possible projects
- The documentation for this project can always be improved, typically when people reach out to me with issues. Please feel free.
- [complete] An object-oriented model for handling context would prevent the need for so much variable-passing between functions, reduce total code volume, and improve readability.
- [ongoing] Different datasets and sources could be incorporated, in addition to JSTOR and WoS, if you have the need.
- [complete -- you can now upload data files to Harvard's Dataverse] If you produce precomputed binaries and have an idea of how we could incorporate sharing these binaries within this library, please DM me. That would be great.
- [ongoing, future work] All analyses can be generalized to any counted variable of the citations. This wouldn't be tough, and would have a huge payout.
- [huge project, uncertain payout] It would be amazing if we could make a graphical interface for this.
  - the user simply imports data, chooses the analyses they want to run, fills in configuration parameters, and presses "go"
  - the output is a PDF with the code, visualizations, and explanations for a given analysis
  - behind the scenes, all this GUI does is run `nbconvert`
  - it could also allow users to regenerate any/all analyses for each dataset with the click of a button
  - it could provide immediate access to online archives, either to download or upload similar count datasets