CHAT NO LIMIT
ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there's currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model, because the ideal answer depends on what the model knows, rather than on what the human demonstrator knows.