A custom model for sentiment analysis

Project description

Oracle-n Model

Oracle-n is an AI model based on the BERT architecture, designed for text and sentiment analysis. It builds on BERT with a customized configuration and a custom tokenizer for these tasks. This repository includes the model code, tokenizer, and training scripts.

Features

  • Customized BERT Configuration: Tailored configurations to fit text and sentiment analysis tasks.
  • Oracle-n Tokenizer: Custom tokenizer designed for preprocessing text data efficiently.
  • Sentiment Analysis: Trained on the IMDb dataset to perform sentiment analysis tasks.

Directory Structure

  • aclImdb/: Directory containing the IMDb dataset files in Parquet format.
  • dataset.py: Script for handling the dataset loading and preprocessing.
  • logs/: Directory for TensorBoard logs during training.
  • oracle-n-model/: Directory containing the saved model.
  • oracle-n-tokenizer/: Directory containing the saved tokenizer.
  • scripts/: Directory containing additional scripts for training and evaluation.
  • .gitignore: Git ignore file to exclude unnecessary files from the repository.
  • requirements.txt: File listing the dependencies required for the project.
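
As an illustration of how the saved oracle-n-model/ and oracle-n-tokenizer/ directories might be used for inference, here is a minimal sketch. It assumes both directories were written with the Transformers save_pretrained method and that the model has a two-class sequence classification head. The label order in LABELS and the predict helper are hypothetical and not part of this repository.

```python
import math

# Assumed label order; the repository does not document it.
LABELS = ("negative", "positive")


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(text, model_dir="oracle-n-model", tokenizer_dir="oracle-n-tokenizer"):
    """Return (label, probability) for a single piece of text."""
    # Imported lazily: this part requires torch and transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Calling predict("A wonderful film.") from the repository root would load both directories and return the most likely label with its probability.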

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/hilarl/oracle-n.git
    cd oracle-n
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the dataset: Ensure you have the IMDb dataset files in the aclImdb/ directory. If needed, you can download them from the IMDb dataset page and convert them to Parquet format.
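
The conversion step could be sketched as follows. The raw IMDb archive unpacks into train/ and test/ folders, each with pos/ and neg/ subfolders of .txt reviews. The column names (text, label), the 0/1 label convention, and the output file names are assumptions; check dataset.py for what it actually expects.

```python
from pathlib import Path


def collect_reviews(split_dir):
    """Read raw reviews from <split_dir>/neg and <split_dir>/pos into records."""
    records = []
    # Assumed label convention: 0 = negative, 1 = positive.
    for label, name in enumerate(("neg", "pos")):
        for path in sorted((Path(split_dir) / name).glob("*.txt")):
            records.append({"text": path.read_text(encoding="utf-8"), "label": label})
    return records


def write_parquet(records, out_path):
    """Write records to Parquet; needs pandas plus a Parquet engine such as pyarrow."""
    import pandas as pd  # imported lazily so collect_reviews works without it

    pd.DataFrame(records).to_parquet(out_path, index=False)


# Example usage (hypothetical paths: raw archive unpacked to raw_aclImdb/):
#   for split in ("train", "test"):
#       write_parquet(collect_reviews(f"raw_aclImdb/{split}"), f"aclImdb/{split}.parquet")
```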

Usage

Training the Model

  1. Prepare the Dataset: Ensure the dataset files are in the aclImdb/ directory in Parquet format.

  2. Run the Training Script:

    python scripts/train_model.py
    
  3. Monitor Training with TensorBoard:

    tensorboard --logdir logs
    

Evaluating the Model

  1. Run the Evaluation Script:

    python scripts/evaluate_model.py
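
This README does not say which metrics scripts/evaluate_model.py reports. As a rough sketch, the metrics commonly reported for binary sentiment classification can be computed from true and predicted labels like this. The binary_metrics function is hypothetical and assumes 1 marks the positive class.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))

    # Count true positives, false positives, false negatives, and exact matches.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Fall back to 0.0 whenever a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```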
    

Customizing the Model

  1. Modify the Configuration: Edit the dataset.py script to change the model configuration parameters such as hidden size, number of layers, and attention heads.

  2. Add Your Own Tokenizer: Customize the tokenizer by editing the files in the oracle-n-tokenizer/ directory.
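
To illustrate the configuration parameters named above, here is a minimal sketch using BertConfig from Hugging Face Transformers. The helper names and default sizes are hypothetical and are not the values this project uses. One constraint worth checking before training is that the hidden size must be divisible by the number of attention heads.

```python
def check_bert_shape(hidden_size, num_hidden_layers, num_attention_heads):
    """Validate the three parameters and return the per-head dimension."""
    if min(hidden_size, num_hidden_layers, num_attention_heads) < 1:
        raise ValueError("all sizes must be positive integers")
    # BERT splits the hidden vector evenly across attention heads.
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size={hidden_size} is not divisible by "
            f"num_attention_heads={num_attention_heads}"
        )
    return hidden_size // num_attention_heads


def build_model(hidden_size=256, num_hidden_layers=4, num_attention_heads=4):
    """Build an untrained BERT classifier; the default sizes are illustrative only."""
    # Imported lazily: this part requires transformers and torch.
    from transformers import BertConfig, BertForSequenceClassification

    check_bert_shape(hidden_size, num_hidden_layers, num_attention_heads)
    config = BertConfig(
        hidden_size=hidden_size,
        num_hidden_layers=num_hidden_layers,
        num_attention_heads=num_attention_heads,
        intermediate_size=4 * hidden_size,  # common convention, not a requirement
        num_labels=2,  # assumed: binary positive/negative sentiment
    )
    return BertForSequenceClassification(config)
```

Smaller values train faster but usually cost accuracy, so changes are best compared with the evaluation script after each run.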

Contribution

Contributions are welcome! Please fork the repository and submit a pull request.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Acknowledgments

  • BERT: This model is based on the BERT architecture developed by Google.
  • Hugging Face: Model development uses the Hugging Face Transformers library.

Contact

For any questions or suggestions, please open an issue on GitHub or contact us at hilal@tenzro.com.


Download files

Source Distribution

oracle-n-0.1.0.tar.gz (2.6 kB)

Uploaded Source

Built Distribution

oracle_n-0.1.0-py3-none-any.whl (2.4 kB)

Uploaded Python 3

File details

Details for the file oracle-n-0.1.0.tar.gz.

File metadata

  • Download URL: oracle-n-0.1.0.tar.gz
  • Upload date:
  • Size: 2.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.4

File hashes

Hashes for oracle-n-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       1547de9db4b4c7af78b505cfb460574007de07d04bc386281ed9dd0cea627096
MD5          5a0c00b225f1425151a41a7403b6058e
BLAKE2b-256  720383b79ae9fcac29bcbd972ef356bdb56a01f159c8292211817f4871802319

File details

Details for the file oracle_n-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: oracle_n-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 2.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.4

File hashes

Hashes for oracle_n-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       0f84a0718fc667a0d0ba5d1d0b8dcbebf895e48fdf7aaa7e2dc4588ffdcf8ede
MD5          f1fa52067dd58539bd40823d9dc6ed1e
BLAKE2b-256  1291e0e93983c1d899330d4284fb6923a1f074096e54ee3e5ab057fc9f214a02
