A tool to extract and format academic data from Web of Science and Crossref

Project description

COSC425-DATA

A repository that collects a university's academic research articles published within a given time period, classifies them into categories defined by the NSF PhD research focus areas taxonomy, and then provides:

  • Data on an article level
  • Data on individual authors
  • Data on category level

Currently the data is output in JSON format. A script for converting the JSON to an Excel file exists, but it is currently somewhat finicky.
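Until the offline formatting is more robust, one way to get the JSON into a spreadsheet is to flatten it to CSV, which Excel can open. The following is only a sketch using the standard library: the record fields (`title`, `authors`, `categories`) and the helper name `records_to_csv` are hypothetical, and the real output schema may differ.

```python
import csv
import io
import json

# Hypothetical article-level records; the package's real schema may differ.
SAMPLE_JSON = """
[
  {"title": "Example article A", "authors": ["A. Author", "B. Author"], "categories": ["Computer science"]},
  {"title": "Example article B", "authors": ["C. Author"], "categories": ["Biology", "Chemistry"]}
]
"""


def records_to_csv(records, fieldnames):
    """Return CSV text for the records, joining list-valued fields with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {}
        for name in fieldnames:
            value = record.get(name, "")
            if isinstance(value, list):
                # Spreadsheet cells hold scalars, so lists are joined into one string.
                value = "; ".join(str(item) for item in value)
            row[name] = value
        writer.writerow(row)
    return buffer.getvalue()


csv_text = records_to_csv(json.loads(SAMPLE_JSON), ["title", "authors", "categories"])
print(csv_text)
```

Writing `csv_text` to a file with a `.csv` extension gives a table with one row per article.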

A more thorough offline file formatting will be implemented in the future.

How to install

For non-development

  1. Install the package: pip install academic-metrics

  2. Create a .env file in the root directory and add your OpenAI API key: OPENAI_API_KEY=<your_openai_api_key>

  3. Create a script run_pipeline.py in the root directory and add the following:

    import os
    
    from academic_metrics.runners.pipeline import PipelineRunner
    
    # Read the key from the environment; os.getenv returns None if it is not set
    runner = PipelineRunner(ai_api_key=os.getenv("OPENAI_API_KEY"))
    runner.run_pipeline()
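
Note that `os.getenv` only sees variables already present in the process environment, so the .env file from step 2 has to be loaded by something. Whether academic_metrics does this internally is not stated here. As a minimal sketch, assuming it does not, a small standard-library loader could look like this (`load_env_file` is a hypothetical helper, not part of the package; the third-party python-dotenv package offers `load_dotenv` for the same purpose):

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Variables that are already set in the environment are not overwritten.
    Returns a dict of the pairs found in the file.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    loaded = {}
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` at the top of run_pipeline.py, before `os.getenv("OPENAI_API_KEY")`, would make the key from .env available to the script.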
    

For development

  1. Clone the repository:
    • HTTPS: git clone https://github.com/SpencerPresley/COSC425-DATA.git
    • SSH: git clone git@github.com:SpencerPresley/COSC425-DATA.git
  2. Navigate into the project root directory (cd COSC425-DATA) and run the setup script (python setup_environment.py):
    • This will install the academic_metrics package in editable mode and configure the pre-commit hook in .git/hooks
    • The git hook will format the code on commit using black

Note

As of 11/9/2024 the pipeline runs off input files in src/academic_metrics/data/core/input_files

Integration of the Crossref API code into academic_metrics/runners/pipeline.py is planned soon, so that you can pass in your school name, date range, etc. to get your own data output.
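
To give a rough idea of what a school name and date range translate to, the sketch below builds a request URL for the public Crossref REST API using its `query.affiliation` field query and its `from-pub-date` / `until-pub-date` filters. This is an illustration only, not necessarily how the pipeline will query Crossref; the function name `build_crossref_query` and the "Example University" value are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_crossref_query(affiliation, from_date, until_date, rows=100):
    """Build a Crossref /works URL for an affiliation and a publication date range.

    Dates are ISO strings such as "2024-01-01".
    """
    filters = f"from-pub-date:{from_date},until-pub-date:{until_date}"
    params = {
        "query.affiliation": affiliation,  # free-text match on author affiliation
        "filter": filters,                 # restrict to the publication date range
        "rows": rows,                      # number of results per page
    }
    return f"{CROSSREF_WORKS_URL}?{urlencode(params)}"


url = build_crossref_query("Example University", "2024-01-01", "2024-12-31")
print(url)
```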

Integration for writing to a MongoDB database is currently implemented only for our use case. Future integration will allow two modes:

  1. Offline output files to src/academic_metrics/data/core/output_files
    • In this mode the Crossref API will still work, but the output files will be saved locally rather than to a database.
  2. Database support. For this you will have to create a .env file in the root directory and add the following:
    • MONGO_URI=<your_mongo_uri>
    • MONGO_DB_NAME=<your_mongo_db_name>
    • MONGO_COLLECTION_NAME=<your_mongo_collection_name>
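
As a sketch of how the two modes could be told apart, the helper below checks whether all three MongoDB variables are set and falls back to offline output otherwise. The function name `select_output_mode` and the fallback rule are assumptions for illustration; the package may decide this differently once the feature lands.

```python
import os

REQUIRED_MONGO_VARS = ("MONGO_URI", "MONGO_DB_NAME", "MONGO_COLLECTION_NAME")


def select_output_mode(env=None):
    """Return ("database", settings) if all Mongo variables are set,
    otherwise ("offline", {"missing": [...]}).

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    settings = {name: env.get(name, "").strip() for name in REQUIRED_MONGO_VARS}
    missing = [name for name, value in settings.items() if not value]
    if missing:
        return "offline", {"missing": missing}
    return "database", settings


mode, details = select_output_mode({"MONGO_URI": "mongodb://localhost:27017"})
print(mode, details)
```

With only `MONGO_URI` set, as in the call above, the helper reports offline mode and lists the two variables that are still missing.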

Download files

Source Distribution

academic_metrics-0.3.0a0.tar.gz (97.1 kB)

Uploaded Source

Built Distribution

academic_metrics-0.3.0a0-py3-none-any.whl (112.9 kB)

Uploaded Python 3

File details

Details for the file academic_metrics-0.3.0a0.tar.gz.

File metadata

  • Download URL: academic_metrics-0.3.0a0.tar.gz
  • Size: 97.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for academic_metrics-0.3.0a0.tar.gz
Algorithm Hash digest
SHA256 b8487e23eb7b193aa54b10603e044350c7bc72c44adf5b9b8fce291f94d59f7b
MD5 152807220fec3d9f52871365a13c4586
BLAKE2b-256 fe08a6a2b92ef609df706ffe6e420205cffa53e74de11a6b6f5172d71681d753
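
A downloaded archive can be checked against the SHA256 digest listed above with Python's `hashlib`. This is a minimal sketch; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the file path is whatever location you saved the download to.

```python
import hashlib

# SHA256 digest listed above for academic_metrics-0.3.0a0.tar.gz
EXPECTED_SHA256 = "b8487e23eb7b193aa54b10603e044350c7bc72c44adf5b9b8fce291f94d59f7b"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("academic_metrics-0.3.0a0.tar.gz")` returns True only if the local file is byte-for-byte identical to the published one.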

File details

Details for the file academic_metrics-0.3.0a0-py3-none-any.whl.

File hashes

Hashes for academic_metrics-0.3.0a0-py3-none-any.whl
Algorithm Hash digest
SHA256 fdeabeb4d249eb5a6511cc27a3feeb97724c3b1e3af9c2fb8815710d0ca7413a
MD5 12a11a02d7551e7a4293819ba37542cc
BLAKE2b-256 6f0acb37ad9d5323a7109e1952804dfe575b3b41cf2ff545476f1249190f2df9
