
Predicts classes of environmental due diligence

Project description

EnvBert is an easy-to-use Python library built on top of BERT models. It identifies essential environmental data as part of due diligence in environmental site assessments.

Feature          Output
EDD Prediction   Categorizes the environmental data under different classes
Relevancy        Classifies whether the text is relevant to the environment domain
Ranking          Returns the relevance probability alongside the predicted classes
Fine-tuning      Train on your custom environmental data, then save and use your own model

Installation

Use the package manager pip to install EnvBert:

pip install EnvBert
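
To confirm that the install succeeded, one option is to look up the installed version through the standard library. This is a minimal sketch, assuming Python 3.8 or later and that the distribution is registered under the name EnvBert (the name used in the pip command above). installed_version is a hypothetical helper and not part of EnvBert.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "EnvBert" is the distribution name taken from the pip command above.
    print(installed_version("EnvBert"))
```

A result of None means pip did not install the package into the interpreter that is running the script, which usually points to a mismatched virtual environment.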

Usage

Predict with EnvBert

# load all the functions
from EnvBert.due_diligence import *

# sample text to classify
doc = """
	weathered shale was encountered below the surface area with fluvial deposits. 
	Sediments in the coastal plain region are found above and below the bedrock 
	with sandstones and shales that form the basement rock
      """

# returns the predicted class along with its probability from the EnvBert model
envbert_predict(doc)
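
Text pulled from site assessment reports often carries tabs and line breaks, as the sample above does. Whether EnvBert normalizes whitespace internally is not stated here, so the following is an optional sketch of cleaning the text before prediction. normalize_text is a hypothetical helper, not part of EnvBert.

```python
def normalize_text(text):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    return " ".join(text.split())


raw = """
	weathered shale was encountered below the surface area
	with fluvial deposits.
"""

doc = normalize_text(raw)
print(doc)
```

The cleaned string could then be passed to envbert_predict in place of the raw one.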

Fine-tune EnvBert on your custom environmental data and labels

# load all the functions
from EnvBert.due_diligence import *

# define training config
training_config = {
    'learning_rate': 5e-5,
    'epochs': 10,
    'batch_size': 16,
    'sentence column name': 'Sentence',  # training sentences column name
    'label column name': 'label',  # encoded labels column name
    'save_dir': r'XX\XX\XXX'  # model save path
}

# Notes:
# - make sure your labels are encoded
# - provide the save_dir path to save the model automatically after training
# - 'sentence column name' and 'label column name' are mandatory fields in the training config
# - the other parameters are optional; defaults are used for any you omit

# train the model with a single line
# df is the dataframe with your sentences and labels
new_model, new_tokenizer = finetune(df, training_config)
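
The notes above ask for encoded labels and name two mandatory config keys. The sketch below shows one way to prepare both before calling finetune. It assumes that "encoded" means zero-based integer ids, which is not confirmed here. missing_config_keys and encode_labels are hypothetical helpers, and the sample labels are made up for illustration.

```python
# The two keys that the training config notes describe as mandatory.
REQUIRED_KEYS = ('sentence column name', 'label column name')


def missing_config_keys(config):
    """Return the mandatory keys that are absent from a training config."""
    return [key for key in REQUIRED_KEYS if key not in config]


def encode_labels(labels):
    """Map each distinct label to an integer id, assigned in sorted label order."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


# Made-up example labels; real labels come from your own dataset.
raw_labels = ["geology", "contamination", "geology"]
encoded, mapping = encode_labels(raw_labels)
print(encoded, mapping)

# A config that lacks the sentence column entry is flagged before training.
print(missing_config_keys({'label column name': 'label'}))
```

With pandas, the encoded list could be assigned to the label column of the dataframe. Keeping the mapping around makes it possible to translate predicted ids back to label names later.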

Load your fine-tuned model and predict

load_dir = r'XX\XX\XXX' #model save path

finetuned_model = finetune_predict(load_dir)

# single sentence prediction
doc = "contamination has been reported and remediation hasn't been carried out"
finetuned_model.sent(doc)

# predict over a dataframe column
df['prediction'] = finetuned_model.df(df, 'Sentence') #df is the dataframe and 'Sentence' is the column name
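
The Ranking feature returns a probability alongside each predicted class. The exact return type of the prediction functions is not documented here, so the sketch below assumes the predictions have already been collected as (label, probability) pairs. rank_predictions is a hypothetical helper, and the sample values are made up for illustration.

```python
def rank_predictions(predictions, top_k=None):
    """Sort (label, probability) pairs from most to least probable."""
    ranked = sorted(predictions, key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up predictions, not real model output.
sample = [("geology", 0.2), ("contamination", 0.9), ("remediation", 0.5)]
print(rank_predictions(sample, top_k=2))
```

Sorting on the probability keeps the most confident classes first, which is convenient when only the top few results need review.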

About

This package is part of the research topic "AI for Environment Due-Diligence" conducted by Afreen Aman and Deepak John Reji. If you use this work (code, model or dataset), please cite us and star the repository: AI for Environment Due-Diligence, (2022), GitHub repository, https://github.com/dreji18/environmental-due-diligence

License

MIT License

Download files


Source Distribution

EnvBert-1.0.6.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

EnvBert-1.0.6-py3-none-any.whl (48.0 kB)

Uploaded Python 3
