Day 2 Program (May 27, Tue, Executive Theatre)

9:00 - 9:30 am Safety and Robustness of Large Models
Karthik Nandakumar (Michigan State University / MBZUAI)
Abstract: Foundation models are a valuable tool for solving tough real-world problems. However, several safety issues need to be addressed before widespread deployment of such models. In this talk, we briefly review these threats and identify key unsolved challenges. First, we will focus on adversarial attacks and defense mechanisms. Next, we consider the challenges in aligning large generative models with human values, which is critical to mitigate the risk of unintended consequences. At the same time, care must be taken to ensure that prevalent human biases do not creep into these models. Finally, we will discuss how foundation models can be collaboratively adapted for sensitive downstream tasks in a double-blind manner, which preserves both model and data confidentiality.
Bio: Karthik Nandakumar is an Associate Professor in the Department of Computer Science and Engineering at Michigan State University (MSU) and an Affiliated Associate Professor in the Computer Vision department at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). Earlier, he was a Research Staff Member at IBM Research – Singapore and a Scientist at Institute for Infocomm Research, A*STAR, Singapore. His primary research interests include trustworthy machine learning, computer vision, biometric recognition, and applied cryptography. He was a Senior Area Editor of IEEE Transactions on Information Forensics and Security (T-IFS) (2019-23) and a Distinguished Industry Speaker for the IEEE Signal Processing Society (2020-21).
9:30 - 10:00 am How Can I Publish My LLM Benchmark Without Giving the True Answers Away?
Takashi Ishida (The University of Tokyo)
Abstract: Publishing a large language model (LLM) benchmark on the Internet risks contaminating future LLMs: the benchmark may be unintentionally (or intentionally) used to train or select a model. A common mitigation is to keep the benchmark private and let participants submit their models or predictions to the organizers. However, this strategy requires trust in a single organization and still permits test-set overfitting through repeated queries. To overcome this issue, we propose a way to publish benchmarks without completely disclosing the ground-truth answers to the questions, while still maintaining the ability to openly evaluate LLMs. Our main idea is to inject randomness into the answers by preparing several logically correct answers to each question and including only one of them as the solution in the benchmark. This reduces the best possible accuracy, i.e., the Bayes accuracy, of the benchmark. Not only does this keep the full ground truth from being disclosed, but it also provides a test for detecting data contamination. In principle, even fully capable models should not surpass the Bayes accuracy, so a model that exceeds this ceiling gives a strong signal of data contamination. We present experimental evidence that our method can detect data contamination accurately on a wide range of benchmarks, models, and training methodologies.
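As a rough illustration of this ceiling argument (a minimal toy sketch in Python, not the speaker's actual benchmark construction or contamination test), the snippet below builds questions with several equally valid answers, publishes one of them at random as the key, and compares model accuracy against the resulting Bayes ceiling:

import random

random.seed(0)

# Toy benchmark: each question has several logically valid answers, but only
# one randomly chosen answer is published as the official key.
questions = []
for i in range(1000):
    n_valid = random.choice([2, 3, 4])                      # equally correct answers per question
    valid = [f"q{i}_ans{k}" for k in range(n_valid)]
    questions.append({"valid": valid, "key": random.choice(valid)})

# Bayes ceiling: a perfect but uncontaminated model always gives *a* correct
# answer, yet matches the published key only 1/n_valid of the time on average.
bayes_ceiling = sum(1 / len(q["valid"]) for q in questions) / len(questions)

def accuracy(model, questions):
    # Fraction of questions where the model reproduces the published key.
    return sum(model(q) == q["key"] for q in questions) / len(questions)

clean_model = lambda q: random.choice(q["valid"])  # capable, but has never seen the keys
leaky_model = lambda q: q["key"]                   # has memorized the published keys

print(f"Bayes ceiling: {bayes_ceiling:.3f}")
print(f"Clean model:   {accuracy(clean_model, questions):.3f}")  # hovers around the ceiling
print(f"Leaky model:   {accuracy(leaky_model, questions):.3f}")  # far above -> contamination signal

An observed accuracy well above the ceiling cannot be explained by capability alone, which is the contamination signal the abstract describes.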
Bio: Takashi Ishida is a Research Scientist at RIKEN AIP, an Associate Professor at The University of Tokyo, and a part-time Research Scientist at Sakana AI. At UTokyo, he is co-running the Machine Learning and Statistical Data Analysis Lab. He earned his PhD from The University of Tokyo in 2021, advised by Prof. Masashi Sugiyama. He is interested in data-centric approaches (such as Bayes error estimation, LLM benchmarking, and test-set overfitting) and weakly supervised learning (such as learning from complementary labels).
10:00 - 10:30 am Search-Based Correction for Reasoning Chains of Language Models
Minsu Kim (Mila - Quebec AI Institute)
Abstract: In this seminar, I present my recent work on enhancing the reliability of Chain-of-Thought (CoT) reasoning in language models (LMs). We interpret CoT reasoning as hierarchical latent variable inference, decomposing reasoning into high-level statements and low-level boolean variables indicating their veracity. We introduce the Search Corrector, a discrete search algorithm leveraging the LM’s joint likelihood as a proxy reward to efficiently infer veracity. The efficiency of Search Corrector arises from our hierarchical structure, where statements remain fixed, restricting the search space to a combinatorial boolean space rather than the entire natural language space. Search Corrector provides supervised signals to train an Amortized Corrector, which directly converts incorrect statements into corrected ones, significantly boosting reasoning accuracy via zero-shot correction. Our approach consistently detects reasoning errors and substantially improves final answer accuracy across logical (ProntoQA) and mathematical (GSM8K) benchmarks.
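The hierarchical search idea can be illustrated with a short schematic Python sketch (a simplified toy, not the actual Search Corrector implementation; joint_log_likelihood below is a hypothetical placeholder for the LM's joint likelihood used as the proxy reward). The statements stay fixed, so the search runs only over the 2^n cube of veracity labels:

from itertools import product

def search_corrector(statements, joint_log_likelihood):
    # Exhaustive illustration of the discrete search: score every boolean
    # veracity assignment with the proxy reward and keep the best one.
    # The statements themselves remain fixed, so the search space is the
    # 2^n boolean cube rather than the space of natural-language rewrites.
    best_mask, best_score = None, float("-inf")
    for mask in product([True, False], repeat=len(statements)):
        score = joint_log_likelihood(statements, mask)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask

# Toy usage with a stand-in scorer; a real system would query the LM instead.
statements = ["All birds can fly.", "A penguin is a bird.", "So penguins can fly."]
truth = [False, True, False]
toy_scorer = lambda stmts, mask: sum(m == t for m, t in zip(mask, truth))
print(search_corrector(statements, toy_scorer))   # -> (False, True, False)

The inferred veracity mask flags which statements need revision, which is the kind of supervision signal the abstract describes using to train the Amortized Corrector.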
Bio: Minsu Kim is a postdoctoral fellow at Mila - Quebec AI Institute and KAIST, advised by Prof. Yoshua Bengio, Prof. Sungjin Ahn, and Prof. Sungsoo Ahn. He received his Ph.D. from KAIST for his research at the intersection of combinatorial optimization and deep learning, earning KAIST's Presidential Best Dissertation Award. His current research focuses on integrating search-based methods inspired by metaheuristics and off-policy reinforcement learning (e.g., GFlowNets) to efficiently control large foundation models.
10:30 - 11:00 am Coffee Break
11:00 - 11:30 am Universal In-context Approximation
Aleksandar Petrov (University of Oxford)
Abstract: Where once new tasks required new datasets and model training, today, increasingly, we just write a prompt. This shift raises a fundamental question: how far can prompting alone take us? In this talk, I explore the formal limits and capabilities of prompting and in-context learning. Classical universal approximation considers whether a model class can approximate arbitrary functions by selecting appropriate parameters. We instead ask: can a fixed model approximate arbitrary functions by varying only the prompt? This leads us to define and study the universal in-context approximation abilities of language models. I'll show that transformers with a single attention head can approximate any smooth function on the hypersphere via prompting alone, and how this extends to general sequence-to-sequence tasks. Surprisingly, similar results hold for recurrent models, including linear RNNs and modern state space models. These findings reveal that prompting is not just a user interface: it is a computational mechanism with representational power on par with training. This challenges earlier intuitions about the limitations of prompting, opens new questions about what models can learn in context, and raises new concerns about their safety and security.
Bio: I am a final-year PhD student at the University of Oxford, supervised by Prof. Philip Torr and Dr. Adel Bibi. My research focuses on the fundamental properties of deep learning models and how they can be harnessed to build more reliable, safe and performant machine learning systems. My recent work has focused on long-context compression, watermark coexistence, universal in-context approximation, multilingual tokenization fairness, and robustness. Prior to Oxford, I completed my MSc at ETH Zürich, where I worked with Emilio Frazzoli's group on robotics and applied category theory. I also previously did research internships at Motional, Adobe and Google DeepMind.
11:30 - 12:15 pm Panel Discussion
Panelists: Tomas Mikolov; Hakim Hacid; Nouha Dziri; Tongliang Liu
Tomas Mikolov Bio: Tomas Mikolov is a co-founder at BottleCap AI and a world-renowned AI researcher. He created word2vec, which showed how machines could represent human language and laid the foundation for many modern AI applications. He has worked on neural language models since 2007, and his RNNLM toolkit was the first to demonstrate that language models could be trained on large corpora, resulting in large improvements over the state of the art. He has previously held research positions at Microsoft, Google, and Facebook.
Hakim Hacid Bio: Dr. Hakim Hacid is the Chief Researcher of the Artificial Intelligence and Digital Science Research Center at the Technology Innovation Institute (TII), a cutting-edge UAE-based scientific research center, where he leads diverse efforts around LLMs and machine learning. Prior to joining TII, he was an Associate Professor at Zayed University, a customer analytics head at Zain telecom, and a research department head at Bell Labs Research. He has published many research articles in top journals and conferences and holds several industrial patents. His research specializations include machine learning, databases, natural language processing, and security. He obtained his PhD in Data Mining/Databases and a double master's in Computer Science (Master by Research and Professional Master) from the University of Lyon, France.
Nouha Dziri Bio: Nouha Dziri is an AI research scientist at the Allen Institute for AI (Ai2). Her research investigates a wide variety of problems across NLP and AI including building state-of-the-art language models and understanding their limits and inner workings. She also works on AI safety to ensure the responsible deployment of LLMs while enhancing their reasoning capabilities. Prior to Ai2, she worked at Google DeepMind, Microsoft Research and Mila. She earned her PhD from the University of Alberta and the Alberta Machine Intelligence Institute. Her work has been published in top-tier AI venues including NeurIPS, ICML, ICLR, TACL, ACL, NAACL and EMNLP. She was recently awarded the runner-up Best Paper Award at NAACL 2025.
Tongliang Liu Bio: Tongliang Liu is an Affiliated Associate Professor with the Machine Learning department at MBZUAI and the Director of the Sydney AI Centre at the University of Sydney. He is broadly interested in trustworthy machine learning and its interdisciplinary applications, with a particular emphasis on learning with noisy labels, adversarial learning, causal representation learning, transfer learning, unsupervised learning, and statistical deep learning theory. He has authored and co-authored more than 200 research articles in venues including ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, AAAI, IJCAI, TPAMI, and JMLR. He is or has been a senior meta-reviewer for many conferences, including NeurIPS, ICLR, AAAI, and IJCAI. He is a co-Editor-in-Chief of Neural Networks, an Associate Editor of IEEE TPAMI, IEEE TIP, TMLR, and ACM Computing Surveys, and serves on the Editorial Boards of JMLR and MLJ. He is a recipient of the CORE Award for Outstanding Research Contribution in 2024, the IEEE AI's 10 to Watch Award in 2022, the Future Fellowship Award from the Australian Research Council (ARC) in 2022, the Top-40 Early Achievers by The Australian in 2020, and the Discovery Early Career Researcher Award (DECRA) from the ARC in 2018.
12:15 - 12:20 pm Closing remarks
Tongliang Liu (MBZUAI / USYD)
12:20 - 2:20 pm Lunch break