Predicts classes of environmental due diligence
Project description
EnvBert is an easy-to-use Python library built on top of BERT models. It identifies essential environmental data as part of due diligence in environmental site assessments.
Feature | Output |
---|---|
EDD Prediction | Categorizes environmental data into classes |
Relevancy | Classifies whether the text is relevant to the environmental domain |
Ranking | Returns the relevance probability for the predicted classes |
Fine-tuning | Train on your own environmental data, then save and reuse your model |
Installation
Use the package manager pip to install EnvBert:
pip install EnvBert
Usage
Predict with EnvBert
# load all the functions
from EnvBert.due_diligence import *
# returns the predicted class along with its probability, using the original (not fine-tuned) EnvBert model
doc = """
weathered shale was encountered below the surface area with fluvial deposits.
Sediments in the coastal plain region are found above and below the bedrock
with sandstones and shales that form the basement rock
"""
envbert_predict(doc)
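How EnvBert computes the probability it returns is not documented here. As a rough illustration only, classifiers built on BERT commonly turn raw scores (logits) into probabilities with a softmax and then rank the classes by probability. The sketch below assumes that approach; the function name `rank_classes`, the class names, and the logit values are hypothetical and are not part of the EnvBert API.

```python
import math


def rank_classes(logits, class_names):
    """Convert raw classifier scores to probabilities and rank the classes."""
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pair each class with its probability, highest probability first.
    return sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical class names and logits, for illustration only.
ranking = rank_classes([2.0, 0.5, -1.0], ["Geology", "Contaminants", "Remediation"])
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

The first entry of `ranking` is the predicted class, and the probabilities across all classes sum to 1.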
Fine-tune EnvBert on your own environmental data and labels
# load all the functions
from EnvBert.due_diligence import *
# define training config
training_config = {
    'learning_rate': 5e-5,
    'epochs': 10,
    'batch_size': 16,
    'sentence column name': 'Sentence',  # training sentences column name
    'label column name': 'label',  # encoded labels column name
    'save_dir': r'XX\XX\XXX'  # model save path
}
"""
Make sure your labels are encoded before training.
Provide the save_dir path to save the model automatically after training.
'sentence column name' and 'label column name' are mandatory fields in the training config.
The other parameters are optional; defaults are used for any you omit.
"""
# Train the model with just 1 line
new_model, new_tokenizer = finetune(df, training_config) #df is the dataframe with your sentences and labels
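The training notes ask for encoded labels but do not show how to produce them. The following is a minimal sketch of one way to do it, assuming "encoded" means integer ids starting at 0 (the source does not state the exact scheme). The helper `encode_labels` and the sample rows are hypothetical; only the column names `Sentence` and `label` come from the training config above.

```python
def encode_labels(rows, label_key="label"):
    """Replace string labels with integer ids; return the new rows and the mapping."""
    # Sort the distinct labels so the ids are stable between runs.
    distinct = sorted({row[label_key] for row in rows})
    label_to_id = {name: index for index, name in enumerate(distinct)}
    # Copy each row, swapping the string label for its integer id.
    encoded = [{**row, label_key: label_to_id[row[label_key]]} for row in rows]
    return encoded, label_to_id


# Hypothetical training rows; the keys match the column names in training_config.
rows = [
    {"Sentence": "weathered shale was encountered below the surface", "label": "geology"},
    {"Sentence": "contamination has been reported on site", "label": "contamination"},
    {"Sentence": "sandstones and shales form the basement rock", "label": "geology"},
]

encoded_rows, label_to_id = encode_labels(rows)
# Keep the inverse mapping to turn predicted ids back into readable labels.
id_to_label = {index: name for name, index in label_to_id.items()}
```

Passing `encoded_rows` to `pandas.DataFrame(encoded_rows)` yields a dataframe with `Sentence` and `label` columns that can serve as `df` in the `finetune` call.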
Load your fine-tuned model and predict
load_dir = r'XX\XX\XXX' #model save path
finetuned_model = finetune_predict(load_dir)
# single sentence prediction
doc = "contamination has been reported and remediation has not been carried out"
finetuned_model.sent(doc)
# predict over a dataframe column
df['prediction'] = finetuned_model.df(df, 'Sentence') #df is the dataframe and 'Sentence' is the column name
About
This package is part of the research topic "AI for Environment Due-Diligence" conducted by Afreen Aman and Deepak John Reji. If you use this work (code, model, or dataset), please cite us and star the repository:
AI for Environment Due-Diligence, (2022), GitHub repository, https://github.com/dreji18/environmental-due-diligence
License
MIT License
Download files
File details
Details for the file EnvBert-1.0.6.tar.gz.
File metadata
- Download URL: EnvBert-1.0.6.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | cd9e0337a9e2e657058ad5c75f533f8f2c51370b6b9f7f1e4b611b3e74290728 |
MD5 | 614324b9f20b2c8b50c013c4f2a0f7e5 |
BLAKE2b-256 | 3820e09832b5ac66aa60d65b8368654786aee5b7bfb2f0d4bb962e74c3c14daf |
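A published digest is only useful if it is compared against the file actually downloaded. The sketch below shows one way to do that with Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_digest` are illustrative, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the check would be `matches_published_digest("EnvBert-1.0.6.tar.gz", "cd9e0337a9e2e657058ad5c75f533f8f2c51370b6b9f7f1e4b611b3e74290728")`, assuming the archive sits in the current directory.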
File details
Details for the file EnvBert-1.0.6-py3-none-any.whl.
File metadata
- Download URL: EnvBert-1.0.6-py3-none-any.whl
- Upload date:
- Size: 48.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | f9f008903b0dd8e689daf07df6fd9470af7110d428206ece94fda800ca8dc7e8 |
MD5 | b5c16b9774b5457d0650139deb083f56 |
BLAKE2b-256 | 10edf64761fdf1f795f799a009ed5e6e55ce4b3ea50aeb01e2b9e555f3335932 |