Fine-tuning GPT-2 from human preferences

We’ve fine-tuned the 774M parameter GPT-2 language model using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences didn’t always match our own. Specifically, for summarization tasks the labelers preferred sentences copied wholesale from the input (we’d only asked them to ensure accuracy), so our models learned to copy. Summarization required 60k human labels; simpler tasks which continue text in various styles required only 5k. Our motivation is to move safety techniques closer to the general task of “machines talking to humans,” which we believe is key to extracting information about human values.