A tool to extract and format academic data from Web of Science and Crossref
COSC425-DATA
A repository that collects a university's academic research articles published within a given time period, classifies them into categories defined by the NSF PhD research focus areas taxonomy, and then provides:
- Data on an article level
- Data on individual authors
- Data on category level
Currently, the data is output in JSON format. A script exists for converting the JSON to an Excel file, but it is somewhat finicky.
More thorough offline file formatting will be implemented in the future.
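Until the Excel conversion script is more reliable, CSV is a stdlib-only, spreadsheet-friendly alternative, since Excel opens CSV files directly. This is a minimal sketch, not the project's converter: it assumes (hypothetically) that an output JSON file holds a list of objects, and the helper name `json_records_to_csv` is ours.

```python
import csv
import json
from pathlib import Path


def json_records_to_csv(json_path: Path, csv_path: Path) -> int:
    """Flatten a JSON list of records into a CSV file; return rows written.

    Assumption: the JSON file holds a list of objects. Nested values such
    as author lists are re-serialized as JSON strings so each cell is flat.
    """
    records = json.loads(json_path.read_text(encoding="utf-8"))
    # Collect every key seen across all records so no column is dropped.
    fieldnames: list = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        # Records missing a column get an empty cell (DictWriter's restval).
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            writer.writerow({
                k: json.dumps(v) if isinstance(v, (list, dict)) else v
                for k, v in record.items()
            })
    return len(records)
```

Records with different keys are merged into one header row, so a missing field shows up as an empty cell rather than an error.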
How to install
For non-development
- Install the package:
    pip install academic-metrics
- Create a .env file in the root directory and add your OpenAI API key:
    OPENAI_API_KEY=<your_openai_api_key>
- Create a script run_pipeline.py in the root directory and add the following:
    import os
    from academic_metrics.runners.pipeline import PipelineRunner
    # The OpenAI key is read from the OPENAI_API_KEY environment variable.
    runner = PipelineRunner(ai_api_key=os.getenv("OPENAI_API_KEY"))
    runner.run_pipeline()
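Note that `os.getenv` only reads the process environment; it does not read a .env file by itself. Unless academic_metrics loads .env for you (not confirmed here), either export the variable in your shell or load the file first. The python-dotenv package's `load_dotenv()` is the usual choice; below is a minimal stdlib-only sketch, where the helper `load_env_file` is hypothetical and handles only simple `KEY=VALUE` lines.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Minimal .env reader: KEY=VALUE lines, '#' comments, optional quotes.

    Variables already set in os.environ are not overwritten, so real
    environment variables take precedence over the file.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

Call `load_env_file()` at the top of run_pipeline.py, before `os.getenv("OPENAI_API_KEY")` is evaluated.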
For development
- Clone the repository:
  - HTTPS:
    git clone https://github.com/SpencerPresley/COSC425-DATA.git
  - SSH:
    git clone git@github.com:SpencerPresley/COSC425-DATA.git
- Navigate into the project root directory and run the setup script:
    cd COSC425-DATA
    python setup_environment.py
  - This installs the academic_metrics package in editable mode and configures the pre-commit hook in .git/hooks.
  - The git hook formats the code on commit using black.
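To confirm that the setup script produced an editable install, you can inspect the package's install metadata. This sketch relies on the PEP 610 `direct_url.json` record that pip writes for installs from a local directory; the helper name is hypothetical, and the distribution name `academic-metrics` is taken from the pip command above.

```python
from __future__ import annotations

import json
from importlib import metadata


def editable_install_status(dist_name: str = "academic-metrics") -> bool | None:
    """Return True for an editable install, False otherwise, None if absent.

    A regular install from PyPI writes no direct_url.json, so it is
    reported as not editable.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None when the metadata file does not exist.
    raw = dist.read_text("direct_url.json")
    if raw is None:
        return False
    info = json.loads(raw)
    return bool(info.get("dir_info", {}).get("editable", False))
```

After running `python setup_environment.py`, `editable_install_status()` should return True if the package was installed in editable mode as described.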
Note
As of 11/9/2024, the pipeline runs off input files in src/academic_metrics/data/core/input_files.
The Crossref API code will soon be integrated into academic_metrics/runners/pipeline.py
so that you can pass in your school name, date range, etc. and get your own data as output.
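Until that integration lands, the kind of request involved can be sketched against the public Crossref REST API, which supports a `query.affiliation` field, `from-pub-date`/`until-pub-date` filters, and a `rows` page size. This is not the package's API; `build_crossref_works_url` and its parameters are hypothetical, and the function only builds the URL without making a network call.

```python
from __future__ import annotations

from datetime import date
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_works_url(school: str, start: date, end: date,
                             rows: int = 100, mailto: str | None = None) -> str:
    """Build a Crossref /works query for one institution and date range."""
    params = {
        # Free-text match against affiliation strings in Crossref metadata.
        "query.affiliation": school,
        # Inclusive publication-date window, ISO formatted.
        "filter": f"from-pub-date:{start.isoformat()},until-pub-date:{end.isoformat()}",
        "rows": rows,
    }
    if mailto:
        # Crossref asks clients to identify themselves with a contact email.
        params["mailto"] = mailto
    return f"{CROSSREF_WORKS}?{urlencode(params)}"
```

Crossref pages its results, so retrieving a full date range would also need cursor-based paging, and affiliation metadata may be missing from many records.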
Writing to a MongoDB database is currently implemented only for our own use case; future integration will allow two modes:
- Offline mode: output files are written to src/academic_metrics/data/core/output_files
  - In this mode the Crossref API will still work, but the output files will be saved locally rather than to a database.
- Database mode. For this, create a .env file in the root directory and add the following:
    MONGO_URI=<your_mongo_uri>
    MONGO_DB_NAME=<your_mongo_db_name>
    MONGO_COLLECTION_NAME=<your_mongo_collection_name>
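Since the two modes are not yet exposed, here is a hedged sketch of how the choice could be made from the three variables above: all set selects database mode, none set selects offline mode, and a partial set is treated as a configuration error. `OutputConfig` and `resolve_output_config` are hypothetical names, not part of academic_metrics.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_MONGO_VARS = ("MONGO_URI", "MONGO_DB_NAME", "MONGO_COLLECTION_NAME")


@dataclass(frozen=True)
class OutputConfig:
    mode: str  # "database" or "offline"
    mongo: Optional[dict] = None


def resolve_output_config(env: Mapping[str, str] = os.environ) -> OutputConfig:
    """Pick database mode only when every MongoDB variable is set."""
    present = {name: env.get(name) for name in REQUIRED_MONGO_VARS}
    if all(present.values()):
        return OutputConfig(mode="database", mongo=present)
    if any(present.values()):
        # A half-configured database is more likely a mistake than a choice.
        missing = [k for k, v in present.items() if not v]
        raise ValueError(f"Incomplete MongoDB settings; missing: {', '.join(missing)}")
    # No MongoDB settings at all: write files locally.
    return OutputConfig(mode="offline")
```

In database mode, a pymongo client would then open the target with `MongoClient(uri)[db_name][collection_name]`; in offline mode, results would go to src/academic_metrics/data/core/output_files.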
Hashes for academic_metrics-0.1.1a0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 44423633215b5bfa3154097fcf727b23e62ae1122196610b42bfe622cf4b3901
MD5 | 919e34ccec664f6f0ca16cdd72aec2cb
BLAKE2b-256 | f5e60a764320027b0aacdfe30d998cdcf772c483c1b51a3c4458b1b52c1c54a7
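To check a downloaded wheel against the SHA256 digest listed above, hash the file in chunks with the standard library and compare. The helper names are ours; pip's hash-checking mode (`--require-hashes`) can enforce the same kind of check at install time.

```python
import hashlib
from pathlib import Path

# SHA256 published above for academic_metrics-0.1.1a0-py3-none-any.whl.
EXPECTED_SHA256 = "44423633215b5bfa3154097fcf727b23e62ae1122196610b42bfe622cf4b3901"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify_wheel(Path("academic_metrics-0.1.1a0-py3-none-any.whl"))` should return True for an untampered download.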