Implementation of Reinforcement Learning from Human Feedback (RLHF)

InstructGoose

Paper: InstructGPT - Training language models to follow instructions with human feedback

Questions

  • In the context of RLHF, how is the value-function loss $L_t^{VF}(\theta)$ calculated?
    • Is it the loss of the value function that the PPO agent uses to predict how much reward it will get if it generates the sequence?
  • Do the RL model and the SFT model use the same tokenizer? Yes.
  • I don’t know how to return the logits of the generation model.
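
On the first question: in the PPO paper the value-function term is a squared error, $L_t^{VF}(\theta) = (V_\theta(s_t) - V_t^{targ})^2$, where $V_\theta(s_t)$ is the predicted return at step $t$. The sketch below is only an illustration of that formula, not the code of this package. It assumes the target $V_t^{targ}$ is the discounted return-to-go (GAE-based targets are also common), and the function names `discounted_returns` and `value_loss` are hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go per step: R_t = r_t + gamma * R_{t+1} (assumed target)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def value_loss(values: List[float], targets: List[float]) -> float:
    """Mean over steps of (V_theta(s_t) - V_t^targ)^2."""
    if not values or len(values) != len(targets):
        raise ValueError("values and targets must be non-empty and equal length")
    return sum((v - g) ** 2 for v, g in zip(values, targets)) / len(values)


# Hypothetical episode: the reward arrives only at the last generated token.
rewards = [0.0, 0.0, 1.0]
predicted_values = [0.5, 0.5, 1.0]  # stand-in for the value head's outputs
targets = discounted_returns(rewards, gamma=1.0)
loss = value_loss(predicted_values, targets)
```

In an RLHF setup the state $s_t$ would be the prompt plus the tokens generated so far, so the value head predicts how much reward the rest of the sequence is expected to collect.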

Install

pip install instruct_goose

Resources

I used these resources to implement this
