An automatic text mining tool

GatorMiner

GatorMiner is an automated text-mining tool written in Python that measures the technical responsibility of students in computer science courses. It is used in the Department of Computer Science at Allegheny College to analyze students' markdown reflection documents and five-question surveys with Natural Language Processing.

Installation

You can clone the repository by running the following command:

git clone git@github.com:Allegheny-Ethical-CS/GatorMiner.git

cd into the project root folder:

cd GatorMiner

This program uses Pipenv for dependency management.

  • If needed, install or upgrade Pipenv with pip:

    pip install pipenv -U
    
  • To create a default virtual environment and install the dependencies:

    pipenv install
    

GatorMiner relies on en_core_web_sm and/or en_core_web_md, English models trained on written web text (blogs, news, comments) that include vocabulary, vectors, syntax, and entities.

To install a pre-trained model, run one or both of the following commands:

pipenv run python -m spacy download en_core_web_sm
pipenv run python -m spacy download en_core_web_md
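
Because spaCy models are installed as regular Python packages, you can check whether a download succeeded before launching the tool. The following is a minimal sketch, not part of GatorMiner itself; the helper name `installed_packages` is hypothetical. Run it inside the Pipenv environment, for example with `pipenv run python`.

```python
import importlib.util

# Model package names taken from the download commands above.
MODEL_NAMES = ("en_core_web_sm", "en_core_web_md")


def installed_packages(names):
    """Return the subset of `names` that can be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    found = installed_packages(MODEL_NAMES)
    if found:
        print("Installed spaCy models:", ", ".join(found))
    else:
        print("No spaCy model found; run one of the download commands above.")
```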

Web Interface

GatorMiner is primarily developed around its Streamlit web interface, which provides fast text analysis and visualizations.

To run the Streamlit interface, execute the following command in your terminal:

pipenv run streamlit run streamlit_web.py

You will then see something like this in your terminal window:

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://xxx.xxx.x.x:8501

The web interface will then open automatically in your browser.

Data Retrieval

There are currently two ways to import text data for analysis: through the local file system or through AWS DynamoDB.

Local File System

You can type in the path(s) to the directory(ies) that hold the reflection markdown documents. You are welcome to try the tool with the sample documents provided in resources, for example:

resources/sample_md_reflections/lab1, resources/sample_md_reflections/lab2, resources/sample_md_reflections/lab3
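
To preview which markdown files a comma-separated path string like the one above would pick up, a small script can help. This is only an illustrative sketch: the helper names `split_directories` and `find_markdown_files` are hypothetical, and GatorMiner's own directory handling may differ (for example, in whether it searches subdirectories).

```python
from pathlib import Path


def split_directories(raw):
    """Split a comma-separated string of directory paths, dropping blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def find_markdown_files(raw):
    """Map each directory to the sorted markdown file names directly inside it."""
    result = {}
    for directory in split_directories(raw):
        path = Path(directory)
        if path.is_dir():
            result[directory] = sorted(p.name for p in path.glob("*.md"))
        else:
            # A missing directory yields an empty list rather than an error.
            result[directory] = []
    return result


if __name__ == "__main__":
    paths = "resources/sample_md_reflections/lab1, resources/sample_md_reflections/lab2"
    for directory, files in find_markdown_files(paths).items():
        print(directory, "->", len(files), "markdown file(s)")
```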

AWS

Retrieving reflection documents from AWS is a feature integrated with GatorGrader, which collects students' markdown reflection documents and stores them in a pre-configured DynamoDB database. To use this feature, you will need the credential tokens listed below stored as environment variables:

export GATOR_ENDPOINT=<Your Endpoint>
export GATOR_API_KEY=<Your API Key>
export AWS_ACCESS_KEY_ID=<Your Access Key ID>
export AWS_SECRET_ACCESS_KEY=<Your Secret Access Key>

It is likely that you already have these prepared when using GatorMiner in conjunction with GatorGrader, since these would already be exported when setting up the AWS services. You can read more about setting up an AWS service with GatorGrader here.
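
If you are unsure whether the variables are exported in your current shell, a quick check like the one below can confirm it before you try the AWS import. This is a sketch under the assumption that all four variables listed above are required; the helper name `missing_variables` is hypothetical and not part of GatorMiner.

```python
import os

# Variable names copied from the export commands above.
REQUIRED_VARS = (
    "GATOR_ENDPOINT",
    "GATOR_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All AWS-related environment variables are set.")
```

The check only reports names, never values, so it is safe to run in a shared terminal.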

Once the documents are successfully imported, you can navigate through the select box in the sidebar to view the text analysis.

Reflection Documents

We use the markdown format for student reflection documents because its organized structure makes them easy to parse and analyze. That said, a reflection document must meet a few requirements before GatorMiner can process and analyze it seamlessly. A template is provided within the repo. Note that the headers with the assignment's and student's ID/name are required. By default, GatorMiner takes the first header as the assignment name and the second header as the student name.
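
To illustrate the header convention, the sketch below pulls the first two headers out of a markdown string and treats them as the assignment name and student name. It is a simplified stand-in, not GatorMiner's actual parser: the function name `first_two_headers` is hypothetical, it recognizes only `#`-style headers, and it does not skip `#` lines inside code blocks.

```python
import re

# A '#'-style markdown header: one to six '#' characters, whitespace, then the title.
HEADER_PATTERN = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def first_two_headers(markdown_text):
    """Return (assignment_name, student_name) from the first two headers."""
    headers = []
    for line in markdown_text.splitlines():
        match = HEADER_PATTERN.match(line)
        if match:
            headers.append(match.group(1))
            if len(headers) == 2:
                break
    if len(headers) < 2:
        raise ValueError("expected at least two headers in the reflection document")
    return headers[0], headers[1]


if __name__ == "__main__":
    sample = "# Lab 1\n\n## Jane Doe\n\nMy reflection text.\n"
    assignment, student = first_two_headers(sample)
    print("Assignment:", assignment)
    print("Student:", student)
```

A document with fewer than two headers raises `ValueError`, which mirrors the requirement that both headers be present.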

You can also check out the sample JSON report to see the format of the JSON reports GatorMiner gathers from AWS.

Analysis

The analysis views cover frequency, sentiment, similarity, and topic.

Contribution

We are excited that you would take the time to contribute to GatorMiner! We have provided a contributing guideline that will help you effectively get started and make contributions to the project.
