Hacker News

Reinforcement Learning from Human Feedback

https://arxiv.org/abs/2504.12501

95 points by onurkanbkrc - 5 comments

dang 5 mins ago

Related. Others?

verdverm 5 mins ago

Last time I saw Nathan say something about the book, he was actively working on the next version and looking for feedback. Check his socials.

leggerss 5 mins ago

You could say he's also learning from human feedback

klelatti 5 mins ago

Web version with links, etc:

dang 5 mins ago

Thanks! We've switched to that above from https://arxiv.org/abs/2504.12501, and put the latter in the toptext.