
Hacker News

Reinforcement Learning from Human Feedback

41 points | 2 comments | 3 hours ago

klelatti

Web version with links, etc:

https://rlhfbook.com/
