A custom model for sentiment analysis
Oracle-n Model
Oracle-n is an AI model based on the BERT architecture, designed for text and sentiment analysis. Building on the foundational strengths of BERT, Oracle-n is tuned to optimize performance for specific needs. This repository includes the model code, tokenizer, and training scripts.
Features
- Customized BERT Configuration: Tailored configurations to fit text and sentiment analysis tasks.
- Oracle-n Tokenizer: Custom tokenizer designed for preprocessing text data efficiently.
- Sentiment Analysis: Trained on the IMDb dataset to perform sentiment analysis tasks.
Directory Structure
- aclImdb/: Directory containing the IMDb dataset files in Parquet format.
- dataset.py: Script for handling the dataset loading and preprocessing.
- logs/: Directory for TensorBoard logs during training.
- oracle-n-model/: Directory containing the saved model.
- oracle-n-tokenizer/: Directory containing the saved tokenizer.
- scripts/: Directory containing additional scripts for training and evaluation.
- .gitignore: Git ignore file to exclude unnecessary files from the repository.
- requirements.txt: File listing the dependencies required for the project.
Setup and Installation
1. Clone the repository:
   git clone https://github.com/hilarl/oracle-n.git
   cd oracle-n
2. Install dependencies:
   pip install -r requirements.txt
3. Download the dataset: Ensure you have the IMDb dataset files in the aclImdb/ directory. If needed, you can download them from the IMDb dataset page and convert them to Parquet format.
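The README does not say how the raw IMDb files are converted to Parquet, so here is a minimal sketch of that step. The function names and the `train.parquet` output path are assumptions, not part of the repository; the sketch only relies on the standard aclImdb layout (pos/ and neg/ folders of .txt files) and pandas.

```python
import os
import pandas as pd

def label_for_dir(dirname):
    # Map the IMDb folder name to an integer label: pos -> 1, neg -> 0.
    return 1 if dirname == "pos" else 0

def imdb_split_to_parquet(split_dir, out_path):
    """Read aclImdb/<split>/{pos,neg}/*.txt into a DataFrame and write Parquet."""
    rows = []
    for sentiment in ("pos", "neg"):
        folder = os.path.join(split_dir, sentiment)
        for fname in os.listdir(folder):
            with open(os.path.join(folder, fname), encoding="utf-8") as f:
                rows.append({"text": f.read(), "label": label_for_dir(sentiment)})
    pd.DataFrame(rows).to_parquet(out_path, index=False)

# Example (assumes the raw dataset has already been extracted):
# imdb_split_to_parquet("aclImdb/train", "aclImdb/train.parquet")
```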
Usage
Training the Model
1. Prepare the Dataset: Ensure the dataset files are in the aclImdb/ directory in Parquet format.
2. Run the Training Script:
   python scripts/train_model.py
3. Monitor Training with TensorBoard:
   tensorboard --logdir logs
Evaluating the Model
- Run the Evaluation Script:
python scripts/evaluate_model.py
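At its core, evaluating a sentiment classifier reduces to mapping the model's two output logits to a label. As a hedged sketch of that post-processing step (the 0 = negative, 1 = positive ordering is an assumption following the usual IMDb convention; the actual mapping lives in scripts/evaluate_model.py):

```python
import math

ID2LABEL = {0: "negative", 1: "positive"}  # assumed label order

def predict_label(logits):
    """Apply a numerically stable softmax to raw logits; return (label, confidence)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return ID2LABEL[idx], probs[idx]

# e.g. predict_label([-1.2, 2.3]) -> ("positive", ~0.97)
```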
Customizing the Model
1. Modify the Configuration: Edit the dataset.py script to change the model configuration parameters such as hidden size, number of layers, and attention heads.
2. Add Your Own Tokenizer: Customize the tokenizer by editing the oracle-n-tokenizer directory.
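Since the project builds on Hugging Face Transformers, the configuration edit likely amounts to constructing a BertConfig with different values. A minimal sketch of a slimmed-down configuration (the parameter names are the standard BertConfig arguments; the specific values and the variable names used in dataset.py are assumptions):

```python
from transformers import BertConfig

# A smaller-than-BERT-base configuration for binary sentiment classification.
config = BertConfig(
    hidden_size=512,         # down from BERT-base's 768
    num_hidden_layers=8,     # down from 12
    num_attention_heads=8,   # must divide hidden_size evenly
    num_labels=2,            # binary sentiment: negative / positive
)
```

Note that hidden_size must be divisible by num_attention_heads, or model construction will fail.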
Contribution
Contributions are welcome! Please fork the repository and submit a pull request.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Acknowledgments
- BERT: This model is based on the BERT architecture developed by Google.
- Hugging Face: Leveraging the Hugging Face Transformers library for model development.
Contact
For any questions or suggestions, please open an issue on GitHub or contact us at hilal@tenzro.com.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file oracle-n-0.1.0.tar.gz.
File metadata
- Download URL: oracle-n-0.1.0.tar.gz
- Upload date:
- Size: 2.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1547de9db4b4c7af78b505cfb460574007de07d04bc386281ed9dd0cea627096
MD5 | 5a0c00b225f1425151a41a7403b6058e
BLAKE2b-256 | 720383b79ae9fcac29bcbd972ef356bdb56a01f159c8292211817f4871802319
File details
Details for the file oracle_n-0.1.0-py3-none-any.whl.
File metadata
- Download URL: oracle_n-0.1.0-py3-none-any.whl
- Upload date:
- Size: 2.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0f84a0718fc667a0d0ba5d1d0b8dcbebf895e48fdf7aaa7e2dc4588ffdcf8ede
MD5 | f1fa52067dd58539bd40823d9dc6ed1e
BLAKE2b-256 | 1291e0e93983c1d899330d4284fb6923a1f074096e54ee3e5ab057fc9f214a02