A deep learning model for the automatic classification of online educational materials.
BiGBERT
BiGBERT is a pre-trained deep learning model that uses website URLs and their respective descriptions to identify educational resources.
Installation
To begin using BiGBERT, install the package from PyPI:
pip install bigbert
Important Note:
The package itself is small, but the first time you instantiate BiGBERT, two large files must be downloaded. The table below lists each file, its size, and its purpose.
| File | Size | Purpose |
|---|---|---|
| edu2vec.txt | 5.2 GB | Word embeddings infused with educational standards domain knowledge. Used by the URL vectorizer component internally. |
| bertedu_1e-6lr.p | 438.0 MB | A BERT model fine-tuned with educational domain knowledge. Used for the snippet vectorizer internally. |
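Because the first instantiation downloads roughly 5.6 GB in total, it can help to confirm that enough disk space is free beforehand. The following is a minimal sketch using only the standard library. The sizes come from the table above (decimal units are assumed), and the `has_room_for_downloads` helper and its `path` argument are hypothetical: where BiGBERT stores the files is not stated here, so point `path` at whichever location applies on your system.

```python
import shutil

# Sizes taken from the table above, converted to bytes (decimal units assumed).
REQUIRED_BYTES = {
    "edu2vec.txt": int(5.2 * 1000**3),
    "bertedu_1e-6lr.p": int(438.0 * 1000**2),
}


def has_room_for_downloads(path=".", required=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free space.

    `path` is a placeholder; the real download location is an assumption.
    """
    needed = sum(required.values())
    free = shutil.disk_usage(path).free
    return free >= needed


total_gb = sum(REQUIRED_BYTES.values()) / 1000**3
print(f"Need about {total_gb:.2f} GB free; enough space: {has_room_for_downloads()}")
```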
Data Prep
BiGBERT expects a pandas.DataFrame as input with two columns: "url" and "description".
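As an illustration of that input format, the sketch below builds a two-column DataFrame from a list of records. The `to_bigbert_frame` helper, the example URLs, and the choice to replace missing descriptions with empty strings are all assumptions made for this example; they are not part of the BiGBERT API.

```python
import pandas as pd

REQUIRED_COLUMNS = ["url", "description"]


def to_bigbert_frame(records):
    """Build a DataFrame with exactly the "url" and "description" columns."""
    frame = pd.DataFrame(records)
    missing = [c for c in REQUIRED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Keep only the two expected columns and replace missing text with ""
    # (an assumption; BiGBERT's handling of missing values is not documented here).
    return frame[REQUIRED_COLUMNS].fillna("").astype(str)


# Hypothetical records, used only to show the expected shape of the input.
records = [
    {"url": "https://example.org/fractions-lesson",
     "description": "An introductory lesson on adding fractions."},
    {"url": "https://example.com/shop", "description": None},
]
X = to_bigbert_frame(records)
print(X)
```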
Usage
import numpy as np
import pandas as pd
from bigbert.bigbert import BiGBERT
from sklearn.metrics import accuracy_score
# The CSV file must contain "url", "description", and "target" columns.
data = pd.read_csv("some/data/file.csv")
y = data["target"]
# Keep only the feature columns; drop() returns a new DataFrame.
X = data.drop(columns=["target"])
model = BiGBERT()
y_pred = model.predict(X)
print(accuracy_score(y, np.argmax(y_pred, axis=1)))
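The final line of the usage example converts model output to class labels with `np.argmax` before scoring. The sketch below shows that step in isolation, assuming `predict` returns one row of per-class scores for each input, which is what the `axis=1` call suggests. The scores and targets here are made up for illustration and are not real BiGBERT output.

```python
import numpy as np

# Made-up scores for three inputs; each row holds one score per class.
y_pred = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.4, 0.6],
])
y_true = np.array([0, 1, 0])

# argmax along axis=1 picks the index of the highest score in each row.
labels = np.argmax(y_pred, axis=1)

# Accuracy is the fraction of rows where the predicted label matches the target.
accuracy = float(np.mean(labels == y_true))
print(labels, accuracy)
```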
Citation
If you use BiGBERT in a research publication, please include the following citation (provided in BibTeX format):
@inproceedings{allen2021bigbert,
title={BiGBERT: Classifying Educational Web Resources for Kindergarten-12$^{th}$ Grades},
author={Allen, Garrett and Downs, Brody and Shukla, Aprajita and Kennington, Casey and Fails, Jerry Alan and Wright, Katherine Landau and Pera, Maria Soledad},
booktitle={European Conference on Information Retrieval},
pages={176--184},
year={2021},
organization={Springer}
}
License
BiGBERT is available under the MIT License.