AiDetector provides a simple interface for training and running models that classify whether a text was generated by AI.
AI Detector: Detecting AI Generated Text
Overview
AI Detector is a Python module, based on PyTorch, that simplifies the process of training and deploying a classification model to detect whether a given text has been generated by AI. It is designed to be platform-agnostic, making AI detection capabilities accessible to users across different work environments.
Installation
There are two methods available for installing the AI Detector module:
- Using pip: You can install AI Detector directly from PyPI by running the following command:
  pip3 install aidetector
- From this repository: Alternatively, you can clone this repository and install it locally:
  git clone https://github.com/baileytec-labs/aidetector.git
  cd aidetector
  pip3 install .
Usage
AI Detector can be operated in two modes: training and inference.
Training
To train a new model, you need a CSV dataset with a classification column (labels: 0 for human-written and 1 for AI-generated text) and a text column (the text data). The script takes the following command-line arguments:
aidetector train \
  --datafile [path_to_data] \
  --modeloutputfile [path_to_model] \
  --vocaboutputfile [path_to_vocab] \
  --tokenmodel [SpaCy model] \
  --percentsplit [percentage_for_test_split] \
  --classificationlabel [classification_label_in_data] \
  --textlabel [text_label_in_data] \
  --download \
  --lowerbound [lower_bound_for_early_stopping] \
  --upperbound [upper_bound_for_early_stopping] \
  --epochs [number_of_epochs]
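As an illustration of the dataset layout described above, the following sketch writes a tiny CSV file with one label column and one text column using only the Python standard library. The column names `classification` and `text`, the file name, and the two sample rows are assumptions made for this example; use whatever names you like and pass them via --classificationlabel and --textlabel.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column names chosen for this example; pass the names you
# actually use via --classificationlabel and --textlabel.
LABEL_COLUMN = "classification"
TEXT_COLUMN = "text"

# Label 0 marks human-written text, label 1 marks AI-generated text.
rows = [
    {LABEL_COLUMN: 0, TEXT_COLUMN: "I walked to the shop, forgot my wallet, and walked home."},
    {LABEL_COLUMN: 1, TEXT_COLUMN: "Certainly! Here is a concise summary of the topic."},
]


def write_dataset(path, records):
    """Write records to a CSV file with a header row.

    The csv module quotes fields that contain commas or newlines,
    so free-form text stays in a single column.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=[LABEL_COLUMN, TEXT_COLUMN])
        writer.writeheader()
        writer.writerows(records)


def read_dataset(path):
    """Read the CSV back as a list of dicts (all values come back as strings)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


dataset_path = Path(tempfile.mkdtemp()) / "training_data.csv"
write_dataset(dataset_path, rows)
loaded = read_dataset(dataset_path)
print(f"Wrote {len(loaded)} rows to {dataset_path}")
```

A file like this would then be passed with --datafile, together with --classificationlabel classification and --textlabel text. A real training set needs far more than two rows.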
Inference
To make predictions with a trained model, you need to provide the text you want to classify. The script takes the following command-line arguments:
aidetector infer \
  --modelfile [path_to_trained_model] \
  --vocabfile [path_to_vocab] \
  --text [text_to_classify] \
  --tokenmodel [SpaCy_model] \
  --threshold [probability_threshold_for_classification] \
  --download [flag_to_download_SpaCy_model]
The prediction will be printed to the console: "This was written by AI" or "This was written by a human."
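To make the role of --threshold concrete, here is a minimal sketch of how a predicted probability could be turned into one of the two messages above. It is not taken from the aidetector source: the function name `describe_prediction`, the default of 0.5, and the choice of `>=` for the comparison are all assumptions for illustration.

```python
def describe_prediction(probability_ai, threshold=0.5):
    """Map a probability that the text is AI-generated to a console message.

    Assumptions for this sketch: the default threshold of 0.5 and the use
    of ">=" are illustrative, not confirmed aidetector behavior.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if probability_ai >= threshold:
        return "This was written by AI"
    return "This was written by a human."


# The same probability can yield different messages under different thresholds.
print(describe_prediction(0.6))
print(describe_prediction(0.6, threshold=0.8))
```

Raising the threshold makes the classifier more conservative about labeling text as AI-generated, at the cost of missing more AI-generated text.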
Python API
All of AiDetector's functionality is also available from your own Python programs. Start by importing the modules you need:
from aidetector.aidetectorclass import *
from aidetector.inference import *
from aidetector.training import *
from aidetector.tokenization import *
#or
import aidetector as ad
From there, you have access to all of the training, inference, and tokenization capabilities.
For example:
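# Run inference with a trained model from Python
import torch  # needed for torch.load below
from aidetector.tokenization import *
from aidetector.inference import *
from aidetector.aidetectorclass import *

# Load the tokenizer and the vocabulary saved during training
tokenizer = get_tokenizer()
vocab = load_vocab("./myvocabfile.vocab")

# Rebuild the model with the same vocabulary size, then load the trained weights
model = AiDetector(len(vocab))
model.load_state_dict(torch.load("./mymodelfile.model"))

testtext = "Is this written by AI?"
isai = check_input(
    model,
    vocab,
    testtext,
    tokenizer=tokenizer,
)
# isai is 0 if the text is classified as human-written, 1 if AI-generated.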
Dependencies
The main dependencies for this project include:
- PyTorch
- SpaCy
- Torchtext
- scikit-learn
- pandas
- argparse
- Halo
Note: For tokenization, the project uses SpaCy models. By default, it uses the multi-language model xx_ent_wiki_sm, but other models can be specified using the --tokenmodel argument. If the model is not already downloaded, you can use the --download flag to download the model.
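If you want to check for the model yourself before running the tool, the sketch below shows one way to do it. It relies on the fact that spaCy pipeline packages are installed as ordinary Python packages, and on spaCy's own `python -m spacy download` command. The helper names are hypothetical, and this is not necessarily how the --download flag works internally.

```python
import importlib.util
import subprocess
import sys

# Default tokenization model named in this README.
DEFAULT_MODEL = "xx_ent_wiki_sm"


def is_model_installed(name):
    """Return True if a Python package with this name can be found.

    spaCy pipeline packages are regular installed packages, so this
    works for model names such as "xx_ent_wiki_sm".
    """
    return importlib.util.find_spec(name) is not None


def download_command(name):
    """Build the spaCy CLI command that downloads the given model."""
    return [sys.executable, "-m", "spacy", "download", name]


def ensure_model(name=DEFAULT_MODEL):
    """Download the model only if it is missing.

    Returns True if a download was run, False if the model was already there.
    Requires spaCy to be installed and network access when a download is needed.
    """
    if is_model_installed(name):
        return False
    subprocess.run(download_command(name), check=True)
    return True


# Only report the status here; call ensure_model() explicitly to download.
print(f"{DEFAULT_MODEL} installed: {is_model_installed(DEFAULT_MODEL)}")
```

Keeping the check separate from the download means that importing or running this snippet never triggers network access on its own.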
Contributing
Contributions to the AI Detector project are welcome. Please review CONTRIBUTION.md for further instructions.