Enhancing fact-checking in large language models: Cost-effective claim verification through first-order logic reformulation

Date

2024

Authors

Asghari, Sara

Abstract

In the realm of Large Language Models (LLMs), the ability to accurately perform Fact Checking (FC) tasks, which involve verifying complex claims against challenging evidence from multiple sources, remains a crucial yet under-explored area. Our study presents a comprehensive benchmark of various LLMs, including GPT-4, on this critical task. We use HOVER, a modern, challenging dataset designed explicitly for fact-checking, which comprises thousands of evidence-claim pairs covering diverse aspects of life, history, and entertainment. This dataset differs from the common benchmarks that evaluate the reading-comprehension capabilities of LLMs, which are primarily composed of question-and-answer pairs. Our findings demonstrate not only that GPT-4 decisively surpasses the current state-of-the-art (SOTA) models on FC tasks, but also that other, open-source LLMs (e.g., Mixtral and Llama-3) exhibit close-to-SOTA performance out of the box. This implies that simply presenting these models with the evidence text and the claim allows them to infer the claim's veracity effectively. We contrast this with existing SOTA methods, which involve complex, multi-step pipelines, including the use of multiple LLMs to verify claims, a process that requires continuous updates and local execution and is therefore less accessible to regular users. Furthermore, we explore the impact of claim formulation on FC effectiveness. By converting complex claims into first-order logic (FOL) and then back into natural language, we observe improved performance in some LLMs, particularly on the more challenging dataset subsets. Although this method uses GPT-4 for the FOL breakdown, it serves as a practical guideline for users: more formally structured claims yield more reliable responses.
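The claim-reformulation pipeline described in the abstract can be sketched as below. This is a minimal illustration only, not the thesis's exact implementation: the prompt wordings, the `llm` callable abstraction, and the `SUPPORTED`/`NOT_SUPPORTED` label convention are assumptions made for the sketch; any chat-completion backend (such as GPT-4) could be plugged in as `llm`.

```python
def reformulate_claim(claim: str, llm) -> str:
    """Two-step reformulation: translate the claim into first-order logic,
    then render the FOL formula back into plain natural language.
    `llm` is any callable mapping a prompt string to a completion string.
    Prompts here are illustrative, not the thesis's exact wording."""
    fol = llm(f"Translate this claim into first-order logic:\n{claim}")
    return llm(
        "Rewrite this first-order-logic formula as a clear "
        f"natural-language claim:\n{fol}"
    )


def verify_claim(evidence: str, claim: str, llm) -> bool:
    """Single-prompt fact check: present the evidence and the claim,
    and ask for a SUPPORTED / NOT_SUPPORTED verdict (assumed labels)."""
    answer = llm(
        "Given the evidence below, answer SUPPORTED or NOT_SUPPORTED.\n"
        f"Evidence: {evidence}\n"
        f"Claim: {claim}"
    ).strip().upper()
    # startswith avoids matching the SUPPORTED substring of NOT_SUPPORTED
    return answer.startswith("SUPPORTED")
```

In this sketch, a reformulated claim would be checked by calling `verify_claim(evidence, reformulate_claim(claim, llm), llm)`; keeping the model behind a plain callable makes it easy to swap GPT-4 for an open-source model such as Mixtral or Llama-3.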

Keywords

Fact checking, Claim, First order logic, Large language models, Prompt engineering
