Domain Adapted Language Modeling Toolkit

This repository primarily contains code for fine-tuning a fully differentiable Retrieval Augmented Generation (RAG-end2end) architecture. For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models such as Llama, Falcon, or GPT. We also combined in-batch negatives with RAG's marginalization to make training more efficient.
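To make the marginalization step concrete, here is a minimal sketch in plain Python. It is an illustration of the general RAG objective, not this toolkit's code: the function names `log_softmax` and `marginal_log_likelihood` and the example numbers are hypothetical. It assumes the retriever yields one similarity score per retrieved passage and the generator yields log p(y|x,z) for each passage z.

```python
import math


def log_softmax(scores):
    """Convert raw scores to log-probabilities in a numerically stable way."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def marginal_log_likelihood(retrieval_scores, answer_log_probs):
    """Compute log p(y|x) = log sum_z p(z|x) * p(y|x,z) over retrieved passages.

    retrieval_scores: raw query-passage similarity scores, one per passage.
    answer_log_probs: generator log p(y|x,z), one per passage (assumed given).
    """
    log_p_z = log_softmax(retrieval_scores)
    joint = [lz + ly for lz, ly in zip(log_p_z, answer_log_probs)]
    m = max(joint)  # log-sum-exp trick to avoid underflow
    return m + math.log(sum(math.exp(j - m) for j in joint))


# Hypothetical example: two retrieved passages, the first one more useful.
loss = -marginal_log_likelihood([2.0, 0.0], [math.log(0.9), math.log(0.1)])
```

Because the loss depends on both the retrieval scores and the generator's likelihoods, gradients can reach the retriever and the generator together, which is what makes joint end-to-end training possible.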

  • Inside the Training folder, you'll find two scripts: one to train RAG-end2end and one to train the Retriever with contrastive learning.

  • All evaluations related to the Retriever and the Generator are located in the Evaluation folder.

  • Additionally, the data processing and synthetic data generation code is inside the Datasets folder.

Project Setup

Create a virtual environment and install the dependencies. We suggest pyenv:

python -m venv .venv && source .venv/bin/activate
pip install invoke && pyenv rehash
inv install

Train Retriever Only
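
Retriever-only training uses contrastive learning. As a conceptual sketch (not this toolkit's training script), an in-batch negative loss can be written in plain Python as below. The names `dot` and `in_batch_negative_loss` are hypothetical, and the sketch assumes query i is paired with passage i, so every other passage in the batch acts as a negative.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(query_vecs, passage_vecs):
    """Mean cross-entropy where passage i is the positive for query i and
    all other passages in the batch serve as negatives."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) for p in passage_vecs]
        m = max(scores)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]  # negative log-softmax of the positive
    return total / len(query_vecs)


# Hypothetical toy embeddings: each query matches the passage at its own index.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = in_batch_negative_loss(queries, passages)
```

Reusing the other passages in the batch as negatives avoids a separate negative-sampling step, so a batch of N pairs gives each query N - 1 negatives at no extra encoding cost.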

Train Retriever and Generator Jointly

Arcee Domain Pretrained Models - DPT (Coming Soon)

  • Arcee-DPT-PubMed-7b
  • Arcee-DPT-Patent-7b
  • Arcee-DPT-SEC-7b
