Project Website

Trustworthy LLMs and LLM Agents

Reliable, fair, and accountable language systems at scale

This project studies how to make large language models and multi-agent LLM systems more transparent, robust, and dependable in real-world deployments.

Why this project matters

Our research on Trustworthy Large Language Models and LLM Agents tackles some of the most pressing challenges in modern AI: ensuring that increasingly powerful, collaborative language systems remain reliable, fair, and accountable at scale.

While LLMs have transformed domains far beyond natural language processing, their inherent randomness, susceptibility to bias, and reliance on imperfect data introduce risks that cannot be ignored. These challenges are amplified in emerging agentic settings, where multiple LLMs interact, potentially propagating errors, reinforcing biases, and obscuring accountability.

Our project develops principled, scalable, and deployment-ready solutions grounded in algorithmic innovation, spanning sampling, ranking and top-k optimization, relevance learning, reinforcement learning, computational geometry, and query rewriting.

Through a combination of theoretical rigor, system design, and extensive empirical validation, including user studies and open-source prototypes, we aim to enable transparent, robust, and ethically grounded LLM ecosystems that can be confidently deployed in real-world applications.

Publications

Preprint 2025

A Survey on Reliability, Transparency, Accountability, and Fairness in LLM-based Multi-Agent Systems through the Responsibility Lens

Sana Ebrahimi and Abolfazl Asudeh · Preprint, 2025

Illustration for the Responsibility Lens survey on LLM-based multi-agent systems

As large language models evolve from standalone assistants into teams of interacting agents, the main challenge is no longer just capability. The harder question is how to design these systems so they remain dependable, understandable, and ethically grounded even when multiple agents critique, debate, route tasks, and aggregate partial results.

This survey makes a foundational contribution by introducing what is, to our knowledge, the first unified taxonomy of responsibility in LLM-based multi-agent systems. Rather than treating concerns such as reliability, transparency, accountability, and fairness as separate conversations, the paper brings them together into one coherent framework for evaluating how multi-agent LLM systems are actually built.

Under this responsibility lens, the survey organizes the literature into four pillars: reliability, covering accuracy, robustness, recoverability, and resilience; transparency, including explainability, interpretability, uncertainty reporting, limitations, and reproducibility; accountability, through traceability, attribution, auditability, and influence analysis; and fairness and ethics, including equity, diversity, and value alignment. Reviewing more than sixty recent systems, the paper shows where current approaches fit this framework, where they only partially address it, and where the field still lacks real foundations.

Beyond synthesizing prior work, the paper offers a cleaner conceptual lens for researchers working on agentic LLMs, safety, evaluation, and coordination. It clarifies which design choices genuinely strengthen responsible behavior, surfaces the trade-offs that remain unresolved, and points to research directions that could materially move the field forward.

Citation: Sana Ebrahimi and Abolfazl Asudeh. 2025. A Survey on Reliability, Transparency, Accountability, and Fairness in LLM-based Multi-Agent Systems through the Responsibility Lens. Preprint.

AACL 2025

An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring

Sana Ebrahimi, Mohsen Dehghankar, and Abolfazl Asudeh · Proceedings of IJCNLP-AACL, 2025

Illustration for the credibility scoring paper on adversary-resistant multi-agent LLM systems

Multi-agent LLM systems are attractive because they let multiple models collaborate, critique one another, and combine partial strengths into a stronger final answer. But that same collaboration creates a serious vulnerability: weak or adversarial agents can distort the group outcome, and the damage can become even more severe when harmful agents are numerous or strategically disruptive.

This paper addresses that problem by introducing a general adversary-resistant framework built around credibility scoring. The collaborative query-answering process is modeled as an iterative game in which agents interact over time, and each agent accumulates a credibility score based on its past contributions. Those learned scores then guide aggregation, allowing the system to weigh trustworthy agents more heavily while reducing the influence of unreliable or malicious ones.
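The round-based mechanism described above can be sketched in a few lines. This is a minimal illustrative simplification, not the paper's actual scoring rule: the agent names, the multiplicative reward/penalty update, and the learning rate are all assumptions made for the sketch.

```python
from collections import defaultdict

def aggregate(answers, credibility):
    """Return the answer with the highest credibility-weighted support."""
    votes = defaultdict(float)
    for agent, ans in answers.items():
        votes[ans] += credibility[agent]
    return max(votes, key=votes.get)

def update_credibility(credibility, answers, final_answer, lr=0.5):
    """Reward agents that agreed with the aggregated answer; penalize the rest."""
    for agent, ans in answers.items():
        factor = (1 + lr) if ans == final_answer else (1 - lr)
        credibility[agent] *= factor
    # Renormalize so scores remain comparable across rounds.
    total = sum(credibility.values())
    return {agent: score / total for agent, score in credibility.items()}

# One round with three agents; "a3" behaves adversarially.
cred = {"a1": 1 / 3, "a2": 1 / 3, "a3": 1 / 3}
answers = {"a1": "Paris", "a2": "Paris", "a3": "Lyon"}
final = aggregate(answers, cred)  # "Paris" wins on weighted votes
cred = update_credibility(cred, answers, final)
```

Over repeated rounds, agents that consistently disagree with the weighted consensus lose influence, which is the intuition behind down-weighting unreliable or malicious participants without excluding anyone outright.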

The key contribution is not just a new scoring mechanism, but a practical way to make multi-agent cooperation more resilient. Instead of assuming that every participant should be treated equally, the system adapts to observed behavior and uses historical performance to stabilize future decisions. This creates a natural defense layer against adversarial manipulation without giving up the benefits of collaborative reasoning.

Experiments across multiple tasks and settings show that the framework can substantially reduce adversarial influence and improve robustness, even in challenging adversary-majority scenarios. More broadly, the work offers an important step toward trustworthy multi-agent LLM systems by showing that resilience can be built directly into the aggregation layer of agent interaction.

Citation: Sana Ebrahimi, Mohsen Dehghankar, and Abolfazl Asudeh. 2025. An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics.

KDD 2025

Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks

Mohsen Dehghankar and Abolfazl Asudeh · Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2025

Illustration for the input reranking paper on symmetric LLM tasks

Large language models often receive information as ordered sequences, even when the underlying task is fundamentally unordered. This mismatch becomes especially important in symmetric tasks, where the query is asked over a bag of elements and the answer should not depend on any particular ordering. In practice, however, LLMs do depend on that ordering, and when the input is large, they may overlook elements that are crucial for producing an accurate answer.

This paper turns that limitation into an opportunity. Rather than treating the input order as arbitrary, it asks how the same set of elements can be rearranged to maximize LLM performance without changing the semantics of the task. The resulting problem, LLM input reranking, is built on two key ideas: estimating how relevant each element is for answering the query, and estimating how important each input position is in shaping the model’s attention.

To solve this efficiently, the paper develops algorithms that use a helper LLM to estimate both element relevance and positional importance, avoiding strong assumptions about the query itself. This makes the method broadly applicable to tasks such as aggregate reasoning over tables and other settings where the data is naturally unordered but the model still consumes it sequentially.
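Once both estimates are available, they can be combined with a simple greedy assignment: place the most relevant elements at the positions the model is assumed to attend to most. This sketch uses made-up relevance and position-weight numbers as placeholders; in the paper, both quantities are estimated with a helper LLM rather than given.

```python
def rerank(elements, relevance, position_weight):
    """
    Place the most relevant elements at the most influential positions.
    relevance: element -> estimated usefulness for answering the query
    position_weight: index -> estimated attention the model pays to that slot
    """
    # Positions, from most to least influential.
    slots = sorted(range(len(elements)), key=lambda i: -position_weight[i])
    # Elements, from most to least relevant.
    ranked = sorted(elements, key=lambda e: -relevance[e])
    # Greedy assignment: best element into best slot.
    out = [None] * len(elements)
    for slot, elem in zip(slots, ranked):
        out[slot] = elem
    return out

# Toy example: a model assumed to attend most to the start and end of the input.
elems = ["row_a", "row_b", "row_c", "row_d"]
rel = {"row_a": 0.1, "row_b": 0.9, "row_c": 0.4, "row_d": 0.8}
pos_w = {0: 1.0, 1: 0.2, 2: 0.3, 3: 0.7}
print(rerank(elems, rel, pos_w))  # ['row_b', 'row_a', 'row_c', 'row_d']
```

Because the task is symmetric, the rearrangement changes nothing about the question being asked; it only changes which elements land in the slots the model is most likely to weigh heavily.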

Experiments on synthetic and real datasets show that the reranking approach can dramatically improve performance, raising model accuracy to as much as 99% of the optimal upper bound. More broadly, the work shows that prompt quality is not just about wording: for many LLM tasks, the structure and order of the input itself can be a powerful optimization lever.

Citation: Mohsen Dehghankar and Abolfazl Asudeh. 2025. Rank It, Then Ask It: Input Reranking for Maximizing the Performance of LLMs on Symmetric Tasks. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2.

ICKG 2024

AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs

Sana Ebrahimi, Kaiwen Chen, Abolfazl Asudeh, Gautam Das, and Nick Koudas · 2024 IEEE International Conference on Knowledge Graph (ICKG), 2024

Illustration for the AXOLOTL paper on assisted self-debiasing of LLM outputs

Large language models have become remarkably capable across a wide range of tasks, but they also inherit biases from the data on which they are trained. In practice, this means that even strong models can produce outputs that are unfair, stereotyped, or systematically harmful, especially in sensitive applications. Many debiasing strategies exist, but they often rely on retraining, internal model access, or computationally expensive interventions that are difficult to apply in real deployment settings.

AXOLOTL approaches this problem from a different angle. Inspired by query rewriting, the paper introduces prompt rewriting as a lightweight, practical mechanism for reducing unfairness in LLM outputs after generation. Rather than modifying model parameters, the framework works through public APIs and treats the LLM as a black box, making it broadly applicable across tasks and model families.

The framework follows a three-step assisted self-debiasing process: it first identifies potential bias in the model’s output, then proposes ways to resolve that bias, and finally guides the model toward producing a fairer revised response. This is a compelling design because it preserves model utility while avoiding the cost and complexity of heavyweight fine-tuning pipelines.
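Assuming the LLM is exposed as a plain text-in, text-out callable, the three steps above might be wired together as follows. The prompts and function names here are illustrative assumptions, not AXOLOTL's actual templates.

```python
def self_debias(llm, prompt):
    """Assisted self-debiasing: detect bias, propose fixes, guide a fair rewrite."""
    # Get the model's initial answer.
    answer = llm(prompt)
    # Step 1: identify potential bias in that output.
    critique = llm(f"Identify any unfair or stereotyped content in:\n{answer}")
    # Step 2: propose concrete ways to resolve the identified bias.
    fixes = llm(f"Suggest revisions that would resolve these issues:\n{critique}")
    # Step 3: guide the model toward a fairer revised response.
    return llm(f"Rewrite the text below, applying the fixes.\n"
               f"Text: {answer}\nFixes: {fixes}")

# Any callable works as the black-box model; here, a trivial echo stub.
echo = lambda p: f"[model reply to: {p.splitlines()[0]}]"
print(self_debias(echo, "Write a one-line bio of a nurse."))
```

Because the wrapper only issues additional prompts, it needs no gradient access or retraining, which is what makes the black-box, API-only deployment story plausible.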

More broadly, AXOLOTL shows that fairness interventions do not always need to be invasive to be effective. By combining bias detection, resolution, and guided rewriting in a post-processing wrapper, the system offers a practical path toward fairer LLM behavior with low computational overhead and strong deployment potential.

Citation: Sana Ebrahimi, Kaiwen Chen, Abolfazl Asudeh, Gautam Das, and Nick Koudas. 2024. AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs. In 2024 IEEE International Conference on Knowledge Graph (ICKG), pages 75-84.

Abolfazl Asudeh

Faculty

Sana Ebrahimi

Lead PhD Student

Nima Shahbazi

PhD Student

Mohsen Dehghankar

PhD Student

Nick Koudas

Collaborator · Professor, University of Toronto

Gautam Das

Collaborator · Distinguished University Chair Professor, University of Texas at Arlington