Keynote Speakers

We are happy to announce the keynote speakers of ICDAR 2026!


Prof. Dr. C. V. Jawahar

IIIT Hyderabad, India
Co-Founder, Centre for Visual Information Technology (CVIT)

The Changing Landscape of Document Understanding

Abstract

Document understanding has changed dramatically over the past two decades. Early systems were built as tightly connected pipelines combining optical character recognition, layout analysis, and rule-based post-processing. Over time, the field has moved toward richer representations that aim to capture not just text, but also structure, semantics, and intent. This shift is now accelerating, driven by large-scale data, self-supervised learning, and foundation models that increasingly integrate vision, language, and reasoning.

This keynote reflects on how document understanding research has evolved and what has driven these changes. It revisits key shifts in perspective, from handcrafted features to learned representations, from narrowly defined tasks to more unified models, and from explicit structural modeling to implicit multimodal reasoning. Despite this progress, documents continue to pose distinctive challenges. Complex layouts, long-range dependencies, multilingual content, and domain-specific conventions still make document understanding resistant to simple end-to-end solutions.

The talk highlights emerging directions that are shaping both research and practice in the field. These include the growing influence of foundation models, new forms of supervision and data creation, and evolving evaluation practices that move beyond recognition accuracy alone. We also discuss how these shifts are affecting ongoing research and researchers in the document community, as well as their impact on underrepresented languages and geographies. The keynote also touches on open questions and opportunities for the community, emphasizing the importance of benchmarks that enable deeper and more meaningful document intelligence.

Bio

Dr. C. V. Jawahar is a Professor at IIIT Hyderabad and the co-founder of the Centre for Visual Information Technology (CVIT). His research spans document image analysis, computer vision, and multimodal learning, with a long-standing focus on building intelligent systems that go beyond recognition to enable higher-level understanding and reasoning over visual and textual data.

Prof. Jawahar has made significant contributions to document understanding, multilingual technologies, and real-world vision systems, particularly in the context of large-scale document processing for societal applications. He has been an active contributor to the ICDAR community through research, tutorials, and leadership roles.

His work has influenced both academic research and systems operating at a population scale. He is a recipient of several national and international recognitions, including the ACM’s Outstanding Contribution to Computing Education (OCCE) and Outstanding Applied Innovation in Computing (OAIC). His current research interests include vision-language models, document intelligence, and next-generation AI systems that integrate perception with reasoning. He is passionate about applications in road safety, assistive technology, education, entertainment, and cultural heritage.


Prof. Dr. Christoph Rass

Osnabrück University

From Archives to Algorithms and Back: What Historians Need from Document Understanding

Abstract

Historians may be among the most demanding users of document analysis technologies – a point often overlooked on both sides of the aisle. The nature of the challenge, however, differs by period. Modern history is drowning in sources: typed reports, standardized forms, administrative registries running into millions of pages – often legible enough, yet far exceeding any human capacity to process. Earlier periods present the inverse problem: fewer documents, but harder to read, written in historical hands and damaged by centuries of use. What unites both is that understanding a historical document means more than recognizing characters. It requires reconstructing meaning, context, and intent; it often demands that we first recover the very categories – administrative, legal, racial – through which contemporaries made sense of their world.

This keynote offers a historian’s perspective on document understanding, drawing on nearly three decades of data-driven research. When I began building databases from Wehrmacht personnel files in 1996, “electronic data processing” meant relational databases and statistical analysis of manually transcribed records – laborious, yet transformative. Since then, the field has passed through several technological generations: geographical information systems for modeling spatial patterns; automated extraction from archival card files; and now Large Vision Language Models capable of processing document images directly. Each transition opened new possibilities. Each also raised difficult questions about what we gain and what we lose when machines mediate our access to the past.

Three themes structure this reflection. First, scale: digital methods allow us to process millions of documents, changing what questions become tractable – yet scale requires new forms of judgment about what patterns mean and what they obscure. Second, representation: how do we model historical knowledge in graphs and ontologies without flattening ambiguity, or worse, reproducing the categories of those who created the documents? How do we encode uncertainty in ways that remain legible to future researchers? Third, validation: when extraction volumes exceed any possibility of comprehensive review, what role can domain expertise still play – and where does it reach its limits?

Document analysis and historical research need each other – not as a polite formula for interdisciplinary funding applications, but as a methodological necessity. Large Vision Language Models can enable “deep indexing” of archival holdings at unprecedented scale. Whether this proves meaningful depends on sustained exchange across disciplinary boundaries. Historians bring questions, contextual knowledge, and a trained suspicion toward confident claims; computer scientists bring methods and the capacity to implement solutions that work on millions of pages. The challenge is neither simply technical nor simply interpretive. It lies in learning to ask, together, what “understanding” a document actually requires.

Bio

Christoph Rass (Dr. rer. pol.) is Professor of Modern History and Historical Migration Research at Osnabrück University, Germany. His path toward data-driven historical research began at Saarland University in the early 1990s, where he studied history alongside information science. His dissertation (2001), a database-driven collective biography based on Wehrmacht personnel records, established an approach he has pursued through several technological generations since: from relational databases and statistical social profile analysis, through geographical information systems for modeling migration and organized violence, to automated extraction from historical card files. Projects on the Osnabrück Gestapo registry and the city’s foreign resident registry developed pipelines for extracting structured data from administrative documents – typed, handwritten, and hybrid – at scale. Current work aims to extend these approaches to Large Vision Language Models. Funding has come from the German Research Foundation (DFG), the Volkswagen Foundation, and the Foundation Remembrance, Responsibility and Future. Collaborative work with the Pattern Recognition Group at TU Dortmund on few-shot information extraction was presented at ICDAR 2025.

Rass is Principal Investigator in the DFG Collaborative Research Center 1604 “Production of Migration,” where his subproject applies digital corpus analysis to trace the historical semantics of categories such as ‘Gastarbeiterkinder’ (children of guest workers) in educational discourse. He contributes to the Lower Saxony research consortium FuturMig (2025–2029) and serves on the board of the Institute for Migration Research and Intercultural Studies.


Dr. Tong Sun

Senior Director, Adobe

The Next Frontier: From Document Understanding to Document Agency

Abstract

For decades, the ICDAR community has advanced the state of the art in document analysis—transforming static pages into machine-readable and structured representations. Today, however, rapid progress in large language models, multimodal foundation models, and agentic AI systems is reshaping the landscape of knowledge work.

Understanding alone is no longer the end goal.

This keynote introduces the concept of document agency—a new paradigm in which documents evolve from passive information containers into dynamic, interactive, and executable knowledge environments. Rather than serving merely as inputs to AI systems, documents become structured state spaces that support reasoning, planning, personalization, and tool use.

I will explore the technical foundations required for this shift, including multimodal structural representations, long-context reasoning architectures, generative interface layers, and new evaluation frameworks that measure collaborative intelligence rather than extraction accuracy.

As AI transforms the knowledge workplace, ICDAR has an opportunity to lead the next frontier: building document systems that not only understand content, but actively participate in cognition and action.

Bio

Dr. Tong Sun is Senior Director of the Document Intelligence Lab at Adobe, where she leads research at the intersection of multimodal AI, document foundation models, and agentic systems. Her work focuses on transforming documents from static files into interactive, intelligent media that power the next generation of knowledge work. Bridging cutting-edge research with large-scale product impact, she and her team at Adobe Research have advanced structured multimodal reasoning, efficient on-device models, and generative interfaces—shaping a future where document systems actively collaborate with humans in thinking, decision-making, and action.