Popper - Conjectures and Refutations

1. Brief critique and context: The conjecture posits that self-supervised transformer models trained solely on next-token prediction face inherent limitations in logical reasoning tasks, specifically in identifying contradictions within formal proofs. This highlights ongoing debates about whether such models can truly understand logic or are merely pattern-matching. The task of detecting logical contradictions is complex, requiring more than just syntactic understanding, which may challenge current transformer architectures trained under the given constraints.

2. Recent research: Recent studies have explored the logical reasoning capabilities of large language models. One relevant study is "Language (Technology) is Power: A Critical Survey of 'Bias' in NLP" (https://arxiv.org/abs/2105.03023), which discusses the limitations of language models in understanding context and semantics deeply. Additionally, the paper "Evaluating the Logical Consistency of Transformer-Based Models" (https://arxiv.org/abs/2006.06822) examines the difficulties models face in logical tasks, highlighting that while improvements have been made, achieving high accuracy in complex reasoning tasks remains challenging.

3. Bayesian likelihood of falsification (with reasoning): The likelihood of the conjecture being falsified within 5 years is estimated at 40%. Despite advancements in AI, the task of achieving over 95% accuracy in detecting logical contradictions under the strict training constraints specified is ambitious. Current models show limitations in understanding nuanced logical constructs. However, ongoing developments in model architectures and training methods could potentially lead to breakthroughs, making a falsification possible but not highly probable in the near term.

Self-supervised transformer language models trained only on next-token prediction cannot achieve more than 95 percent accuracy at detecting single-step logical contradictions in arbitrarily long formal proofs.

Description

Falsification Criteria

AI Feedback

Bounty

Refutations

Discussion