AiDetector provides a simple interface for training and running models that classify whether text was generated by AI.
AI Detector: Detecting AI Generated Text
Overview
AI Detector is a Python module, based on PyTorch, that simplifies the process of training and deploying a classification model to detect whether a given text has been generated by AI. It is designed to be platform-agnostic, making AI detection capabilities accessible to users across different work environments.
Installation
There are two methods available for installing the AI Detector module:
- Using pip: You can install AI Detector directly from PyPI using pip by running the following command:
pip3 install aidetector
- From this repository: Alternatively, you can clone this repository and install it locally:
git clone https://github.com/baileytec-labs/aidetector.git
cd aidetector
pip3 install .
Usage
AI Detector can be operated in two modes: training and inference.
Training
To train a new model, you need a CSV dataset with a classification column (labels: 0 for human-written and 1 for AI-generated text) and a text column (the text data). The script takes the following command-line arguments:
aidetector train --datafile [path_to_data] --modeloutputfile [path_to_model] --vocaboutputfile [path_to_vocab] --tokenmodel [SpaCy model] --percentsplit [percentage_for_test_split] --classificationlabel [classification_label_in_data] --textlabel [text_label_in_data] --download --lowerbound [lower_bound_for_early_stopping] --upperbound [upper_bound_for_early_stopping] --epochs [number_of_epochs]
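As an illustration of the dataset layout described above, the following sketch writes a small CSV with one label column and one text column using only the standard library. The column names `classification` and `text`, the helper functions, and the sample rows are assumptions made for this example; whatever names you choose would be passed to --classificationlabel and --textlabel.

```python
import csv
import os
import tempfile

# Hypothetical column names (not taken from the project); pass the same
# names to --classificationlabel and --textlabel when training.
CLASSIFICATION_COLUMN = "classification"
TEXT_COLUMN = "text"


def write_dataset(path, rows):
    """Write (label, text) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([CLASSIFICATION_COLUMN, TEXT_COLUMN])
        for label, text in rows:
            # Labels follow the convention above: 0 = human, 1 = AI.
            if label not in (0, 1):
                raise ValueError(f"label must be 0 or 1, got {label!r}")
            writer.writerow([label, text])


def read_dataset(path):
    """Read the CSV back as a list of (label, text) pairs."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        return [(int(row[CLASSIFICATION_COLUMN]), row[TEXT_COLUMN]) for row in reader]


# Made-up sample rows, for illustration only.
sample_rows = [
    (0, "I walked to the shop, and it rained the whole way."),
    (1, "As an AI language model, I can summarize the text."),
]
dataset_path = os.path.join(tempfile.mkdtemp(), "training_data.csv")
write_dataset(dataset_path, sample_rows)
loaded_rows = read_dataset(dataset_path)
```

The resulting file path is what you would supply to --datafile. A real training set would need far more rows than this.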
Inference
To make predictions with a trained model, you need to provide the text you want to classify. The script takes the following command-line arguments:
aidetector infer --modelfile [path_to_trained_model] --vocabfile [path_to_vocab] --text [text_to_classify] --tokenmodel [SpaCy_model] --threshold [probability_threshold_for_classification] --download [flag_to_download_SpaCy_model]
The prediction will be printed to the console: "This was written by AI" or "This was written by a human."
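To make the role of --threshold concrete, here is a minimal sketch of how a predicted probability could be mapped to one of the two messages. The function names, the default of 0.5, and the use of a greater-than-or-equal comparison are assumptions for illustration, not the project's confirmed behavior.

```python
# Messages as quoted above, keyed by label (0 = human, 1 = AI).
MESSAGES = {
    0: "This was written by a human.",
    1: "This was written by AI",
}


def label_from_probability(probability, threshold=0.5):
    """Return 1 (AI) if the probability reaches the threshold, else 0 (human).

    The default threshold and the >= comparison are assumptions.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability!r}")
    return 1 if probability >= threshold else 0


def describe(label):
    """Map a 0/1 label to its console message."""
    return MESSAGES[label]


# A stricter threshold turns a borderline score into a "human" verdict.
default_verdict = describe(label_from_probability(0.6))
strict_verdict = describe(label_from_probability(0.6, threshold=0.8))
```

Raising the threshold trades missed AI text for fewer false accusations against human authors, so the right value depends on which error is more costly in your setting.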
Python API
You can use all of AiDetector's functionality in your Python programs. Start by importing the modules you need:
from aidetector.aidetectorclass import *
from aidetector.inference import *
from aidetector.training import *
from aidetector.tokenization import *
#or
import aidetector as ad
From there, you have access to all of the training, inference, and tokenization capabilities.
For example:
# Run inference with a trained model from Python
import torch

from aidetector.aidetectorclass import *
from aidetector.inference import *
from aidetector.tokenization import *

# Load the tokenizer and the vocabulary saved during training
tokenizer = get_tokenizer()
vocab = load_vocab("./myvocabfile.vocab")

# Build the model for the vocabulary size, then load the trained weights
model = AiDetector(len(vocab))
model.load_state_dict(torch.load("./mymodelfile.model"))

testtext = "Is this written by AI?"
isai = check_input(
    model,
    vocab,
    testtext,
    tokenizer=tokenizer,
)
# check_input returns 0 if human, 1 if AI.
Dependencies
The main dependencies for this project include:
- PyTorch
- SpaCy
- Torchtext
- scikit-learn
- pandas
- argparse
- Halo
Note: For tokenization, the project uses SpaCy models. By default, it uses the multi-language model xx_ent_wiki_sm, but other models can be specified using the --tokenmodel argument. If the model is not already downloaded, you can use the --download flag to download the model.
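If you prefer to drive the command-line tool from a script, the flags mentioned in this note can be assembled programmatically. The sketch below builds an argument list for `aidetector infer` using only flags shown in this document. The helper name `build_infer_command` is hypothetical, and treating --download as a bare switch with no value is an assumption based on how it appears in the training command.

```python
import shlex


def build_infer_command(model_file, vocab_file, text,
                        token_model=None, threshold=None, download=False):
    """Assemble an argv list for `aidetector infer` (hypothetical helper)."""
    command = [
        "aidetector", "infer",
        "--modelfile", model_file,
        "--vocabfile", vocab_file,
        "--text", text,
    ]
    # Optional flags are appended only when the caller sets them.
    if token_model is not None:
        command += ["--tokenmodel", token_model]
    if threshold is not None:
        command += ["--threshold", str(threshold)]
    if download:
        # Assumed to be a bare switch that takes no value.
        command.append("--download")
    return command


command = build_infer_command(
    "./mymodelfile.model",
    "./myvocabfile.vocab",
    "Is this written by AI?",
    token_model="xx_ent_wiki_sm",
    download=True,
)
# shlex.join quotes the text argument so the line can be pasted into a shell.
printable = shlex.join(command)
```

With the package installed, the list could be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids shell quoting problems with the text argument.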
Contributing
Contributions to the AI Detector project are welcome. Please review CONTRIBUTION.md for further instructions.