Conjectures

Bold, falsifiable ideas open for rational criticism


Showing conjectures tagged: rlhf

Active

about 1 year ago

RLHF Misgeneralization

ai rlhf safety

Reinforcement Learning from Human Feedback consistently produces models that exhibit goal mis-generalisation when exposed to novel adversarial inst...

By Anonymous User

0 refutations