A tool to extract and format academic data from Web of Science and Crossref
Project description
COSC425-DATA
A repository that collects a university's academic research articles published within a given time period, classifies them into categories defined by the NSF PhD research focus areas taxonomy, and then provides:
- Data on an article level
- Data on individual authors
- Data on category level
Currently the data is output in JSON format. A script for converting the JSON to an Excel file exists, but it is currently somewhat finicky.
A more thorough offline file formatting will be implemented in the future.
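Until that formatting exists, JSON output can be flattened into a spreadsheet-friendly form with the standard library alone. The following is a minimal sketch, not the repository's conversion script: the record fields (`title`, `authors`, `categories`) and the helper name `records_to_csv` are hypothetical, since the actual output schema is not described here.

```python
import csv
import io
import json

# Hypothetical article-level records; the real field names may differ.
SAMPLE_JSON = """
[
  {"title": "Paper A", "authors": ["Ada", "Grace"], "categories": ["Computer Science"]},
  {"title": "Paper B", "authors": ["Alan"], "categories": ["Mathematics", "Statistics"]}
]
"""


def records_to_csv(records, list_sep="; "):
    """Flatten a list of dicts into CSV text, joining list values into one cell."""
    # Collect column names in first-seen order so the header is stable.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {
            key: list_sep.join(map(str, value)) if isinstance(value, list) else value
            for key, value in record.items()
        }
        writer.writerow(row)  # keys missing from a record are written as empty cells
    return buffer.getvalue()


csv_text = records_to_csv(json.loads(SAMPLE_JSON))
print(csv_text)
```

The resulting CSV text opens directly in Excel; nested objects would need their own flattening rule before this approach applies.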
How to install
For non-development
- Install the package:

      pip install academic-metrics

- Create a `.env` file in the root directory and add your OpenAI API key:

      OPENAI_API_KEY=<your_openai_api_key>

- Create a script `run_pipeline.py` in the root directory and add the following:

      import os

      from academic_metrics.runners.pipeline import PipelineRunner

      runner = PipelineRunner(ai_api_key=os.getenv("OPENAI_API_KEY"))
      runner.run_pipeline()
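Note that `os.getenv` only reads variables already present in the process environment; it does not read a `.env` file on its own. Whether the package loads `.env` for you is not stated here, so the sketch below assumes it does not and shows one standard-library way to load the file and fail early when the key is missing. The helper names `load_env_file` and `require_api_key` are hypothetical, and the parser handles only simple `KEY=VALUE` lines; the `python-dotenv` package is a common, more complete alternative.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Variables that are already set in the environment are left unchanged.
    Returns a dict of the pairs found in the file.
    """
    loaded = {}
    if not os.path.exists(path):
        return loaded
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")  # drop surrounding quotes, if any
            loaded[key] = value
            os.environ.setdefault(key, value)
    return loaded


def require_api_key(name="OPENAI_API_KEY"):
    """Return the named environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return value
```

In `run_pipeline.py`, calling `load_env_file()` first and then passing `require_api_key()` as `ai_api_key` would surface a missing key before the pipeline starts rather than partway through a run.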
For development
- Clone the repository:
  - HTTPS: `git clone https://github.com/SpencerPresley/COSC425-DATA.git`
  - SSH: `git clone git@github.com:SpencerPresley/COSC425-DATA.git`
- Navigate into the project root directory with `cd COSC425-DATA` and run the setup script with `python setup_environment.py`:
  - This will install the academic_metrics package in editable mode and configure the pre-commit hook in `.git/hooks`
  - The git hook will format the code on commit using black
Note
As of 11/9/2024, the pipeline runs off input files in `src/academic_metrics/data/core/input_files`.
The Crossref API code will soon be integrated into `academic_metrics/runners/pipeline.py` so that you can pass in your school name, date range, etc. to get your own data output.
Writing to a MongoDB database is currently implemented only for our use case. Future integration will allow two modes:
- Offline output files written to `src/academic_metrics/data/core/output_files`
  - In this mode the Crossref API will still work, but the output files will be saved locally rather than to a database.
- Database support. For this you will have to create a `.env` file in the root directory and add the following:

      MONGO_URI=<your_mongo_uri>
      MONGO_DB_NAME=<your_mongo_db_name>
      MONGO_COLLECTION_NAME=<your_mongo_collection_name>
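The choice between the two planned modes can be pictured as a small configuration check. This is only a sketch of one possible design, not the package's implementation: `OutputConfig` and `resolve_output_config` are hypothetical names, and the rule that offline mode applies when no MongoDB variables are set is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names taken from the .env example above.
MONGO_KEYS = ("MONGO_URI", "MONGO_DB_NAME", "MONGO_COLLECTION_NAME")


@dataclass(frozen=True)
class OutputConfig:
    mode: str  # "offline" or "database"
    mongo_uri: Optional[str] = None
    db_name: Optional[str] = None
    collection_name: Optional[str] = None


def resolve_output_config(env: Mapping[str, str] = os.environ) -> OutputConfig:
    """Pick an output mode from environment-style settings.

    Assumed rule: no MongoDB variables means offline files, all three mean
    database output, and a partial set is treated as a configuration error.
    """
    values = {key: env.get(key, "").strip() for key in MONGO_KEYS}
    if not any(values.values()):
        return OutputConfig(mode="offline")
    missing = [key for key in MONGO_KEYS if not values[key]]
    if missing:
        raise ValueError("Incomplete MongoDB settings, missing: " + ", ".join(missing))
    return OutputConfig(
        mode="database",
        mongo_uri=values["MONGO_URI"],
        db_name=values["MONGO_DB_NAME"],
        collection_name=values["MONGO_COLLECTION_NAME"],
    )
```

Rejecting a partial set of variables, instead of silently falling back to offline files, makes a misspelled variable name visible immediately.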
File details
Details for the file academic_metrics-1.0.0.tar.gz.
File metadata
- Download URL: academic_metrics-1.0.0.tar.gz
- Upload date:
- Size: 126.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 523aa32f7cb8366e7c0e97a7a9bc7e46b702e8c150043b89d7542cfccead20ff |
| MD5 | f479f47a009817acac74975536570331 |
| BLAKE2b-256 | 2d5903fc4a469a8e9c8d7fbee0e2bf3a5334b198f38198b01963cce5f510bc22 |
File details
Details for the file academic_metrics-1.0.0-py3-none-any.whl.
File metadata
- Download URL: academic_metrics-1.0.0-py3-none-any.whl
- Upload date:
- Size: 145.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42443c0af1737771a5d2bd4729fd0a3f04d313b050aa1bfd19cc69c45cc23151 |
| MD5 | 4f5ca063071c9a6d2dab9a9c8a4f8294 |
| BLAKE2b-256 | 32e0d081bfd278ad6604e500185c193871bc415cdbf079323c38e8a423089bac |
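A downloaded file can be checked against the published SHA256 digests with the standard library. This is a minimal sketch; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, `matches_published_digest("academic_metrics-1.0.0.tar.gz", "523aa32f7cb8366e7c0e97a7a9bc7e46b702e8c150043b89d7542cfccead20ff")` should return `True` for an intact copy of the source distribution.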