Accurate Table Question Answering with Accessible LLMs

Yangfan Jiang¹, Fei Wei², Ergute Bao³, Yaliang Li², Bolin Ding², Yin Yang¹, and Xiaokui Xiao¹
¹National University of Singapore; ²Alibaba Group; ³Mohamed bin Zayed University of Artificial Intelligence; ⁴Hamad Bin Khalifa University

Introduction

Table Question Answering (TQA) is a task that aims to provide accurate answers to natural language questions based on the content of a given (database) table. While state-of-the-art solutions often leverage powerful, proprietary Large Language Models (LLMs), their high cost and the need to send potentially sensitive data to external APIs can be significant barriers. This project focuses on making high-quality TQA accessible to a broader audience by using smaller, open-weight LLMs that can run on local machines.

Existing methods struggle when paired with these smaller LLMs because they often rely on long, complex prompts that are beyond the capabilities of these models. To address this, we introduce Orchestra, a novel multi-agent approach.

The Orchestra Framework

The core idea of Orchestra is to break down the complex TQA process into simpler, manageable sub-tasks, each handled by a specialized LLM agent. This is analogous to how an orchestra coordinates different instruments to create a harmonious piece of music. By decomposing the workflow, we significantly reduce the complexity of the prompts each agent receives, thereby improving the reliability of their outputs.

Our framework consists of three main agents:

Orchestra Two-Agent Interaction

Figure 1: An illustration of the interaction between the Logic and Query agents in the Orchestra framework. The Logic Agent determines the reasoning step, the Query Agent executes it by generating code, and the processed data is fed back into the system.

This structured, layered workflow ensures that each agent focuses on a task that is well within its capabilities. The agents interact strategically to maintain consistency and mitigate errors, leading to a more robust and accurate TQA process.

Results and Performance

We conducted extensive experiments on several TQA benchmarks. The results demonstrate that Orchestra significantly enhances the performance of open-weight LLMs. For instance, when using the Qwen-14B model, Orchestra achieved a test accuracy of 72.1% on the WikiTQ benchmark, a substantial improvement over other frameworks and approaching the performance of much larger proprietary models like GPT-4.

Performance comparison on WikiTQ and TabFact benchmarks

Figure 2: Performance comparison showing Orchestra (with decision agent) consistently outperforming the baseline (ReAcTable) and a simpler two-agent system across various LLMs on the WikiTQ and TabFact benchmarks.

Furthermore, when paired with larger open-weight models, Orchestra not only matches but often surpasses the best prior results, establishing new state-of-the-art performance on all tested benchmarks.