A deep learning model for the automatic classification of online educational materials.

BiGBERT

BiGBERT is a pre-trained deep learning model that uses website URLs and their respective descriptions to identify educational resources.

Installation

To begin using BiGBERT, install the package from PyPI:

pip install bigbert

Important Note:

The package itself is relatively small, but the first time you create a BiGBERT instance, two large files are downloaded. The table below lists each file, its size, and its purpose.

File              Size      Purpose
edu2vec.txt       5.2 GB    Word embeddings infused with educational standards domain knowledge. Used internally by the URL vectorizer component.
bertedu_1e-6lr.p  438.0 MB  A BERT model fine-tuned with educational domain knowledge. Used internally by the snippet vectorizer.
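
Because the two downloads add up to several gigabytes, it can help to check free disk space before creating the model for the first time. The following is a minimal sketch using only the standard library. The helper name has_room_for_downloads is hypothetical, the sizes are read as decimal units, and the directory where BiGBERT stores the files is not stated here, so the path argument is an assumption you would need to adjust.

```python
import shutil

# Sizes taken from the table above, interpreted as decimal units (an assumption).
EDU2VEC_BYTES = 5_200 * 10**6   # edu2vec.txt, 5.2 GB
BERTEDU_BYTES = 438 * 10**6     # bertedu_1e-6lr.p, 438.0 MB
REQUIRED_BYTES = EDU2VEC_BYTES + BERTEDU_BYTES


def has_room_for_downloads(path="."):
    """Hypothetical helper: True if the filesystem holding `path`
    has at least REQUIRED_BYTES of free space."""
    return shutil.disk_usage(path).free >= REQUIRED_BYTES


if __name__ == "__main__":
    # "." is a placeholder; point this at wherever the files will be stored.
    print("Enough free space:", has_room_for_downloads("."))
```

The check covers only the downloaded files and leaves no margin for temporary files, so treat a positive result as a lower bound.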

Data Prep

BiGBERT expects a pandas.DataFrame as input with two columns: "url" and "description".
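
As an illustration of the expected input shape, the sketch below builds a small DataFrame with the two required columns and checks that they are present. The rows are made up for this example, and check_bigbert_input is a hypothetical helper, not part of the BiGBERT package.

```python
import pandas as pd

# Illustrative rows only; these URLs and descriptions are invented for the sketch.
records = [
    {
        "url": "https://example.org/fractions-for-kids",
        "description": "Interactive lessons that introduce fractions to third graders.",
    },
    {
        "url": "https://example.com/cheap-flights",
        "description": "Compare airfare deals across hundreds of airlines.",
    },
]

X = pd.DataFrame(records, columns=["url", "description"])


def check_bigbert_input(frame):
    """Hypothetical helper: verify the required columns exist and
    return a frame containing only those two columns."""
    required = ["url", "description"]
    missing = [name for name in required if name not in frame.columns]
    if missing:
        raise ValueError(f"Missing required column(s): {missing}")
    return frame[required]


X = check_bigbert_input(X)
```

Running the check before prediction surfaces a misnamed column early, before the large model files are loaded.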

Usage

import numpy as np
import pandas as pd
from bigbert.bigbert import BiGBERT
from sklearn.metrics import accuracy_score

# The CSV file should contain "url", "description", and "target" columns
data = pd.read_csv("some/data/file.csv")
y = data["target"]
# drop() returns a new DataFrame; with inplace=True it would return None
X = data.drop(columns=["target"])

model = BiGBERT()
y_pred = model.predict(X)
# Take the highest-scoring class for each row before computing accuracy
print(accuracy_score(y, np.argmax(y_pred, axis=1)))
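
The call to np.argmax in the example above suggests that predict returns one row of per-class scores for each input row. Under that assumption, the following sketch shows how the scores become labels and how accuracy is computed from them. The array here is a made-up stand-in for real model output, and which column corresponds to the educational class is not stated in this description.

```python
import numpy as np

# Stand-in for model.predict(X): invented scores with shape (n_rows, n_classes).
y_pred = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.4, 0.6],
])
# Invented ground-truth labels for the same three rows.
y_true = np.array([0, 1, 0])

# The index of the largest score in each row is the predicted class.
labels = np.argmax(y_pred, axis=1)

# Accuracy is the fraction of rows where the prediction matches the target.
accuracy = float(np.mean(labels == y_true))
print(labels, accuracy)
```

Here the third row is misclassified, so two of the three predictions match the targets.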

Citation

If you use BiGBERT in a research publication, please include the following citation (provided in BibTeX format):

@inproceedings{allen2021bigbert,
  title={BiGBERT: Classifying Educational Web Resources for Kindergarten-12$^{th}$ Grades},
  author={Allen, Garrett and Downs, Brody and Shukla, Aprajita and Kennington, Casey and Fails, Jerry Alan and Wright, Katherine Landau and Pera, Maria Soledad},
  booktitle={European Conference on Information Retrieval},
  pages={176-184},
  year={2021},
  organization={Springer}
}

License

BiGBERT is available under the MIT License.

