Keynote Speakers

Dr. C. V. Jawahar

Professor, IIIT Hyderabad, India
Co-Founder, Centre for Visual Information Technology (CVIT)

The Changing Landscape of Document Understanding

Abstract

Document understanding has changed dramatically over the past two decades. Early systems were built as tightly coupled pipelines combining optical character recognition, layout analysis, and rule-based post-processing. Over time, the field has moved toward richer representations that aim to capture not just text, but also structure, semantics, and intent. This shift is now accelerating, driven by large-scale data, self-supervised learning, and foundation models that increasingly integrate vision, language, and reasoning.

This keynote reflects on how document understanding research has evolved and what has driven these changes. It revisits key shifts in perspective: from handcrafted features to learned representations, from narrowly defined tasks to more unified models, and from explicit structural modeling to implicit multimodal reasoning. Despite this progress, documents continue to pose distinctive challenges. Complex layouts, long-range dependencies, multilingual content, and domain-specific conventions still make document understanding resistant to simple end-to-end solutions.

The talk highlights emerging directions that are shaping both research and practice in the field. These include the growing influence of foundation models, new forms of supervision and data creation, and evolving evaluation practices that move beyond recognition accuracy alone. We also discuss how these changes are affecting ongoing research and the researchers in the document community, as well as underrepresented languages and geographies. Finally, the keynote touches on open questions and opportunities for the community, emphasizing the importance of benchmarks that enable deeper and more meaningful document intelligence.

Bio

Dr. C. V. Jawahar is a Professor at IIIT Hyderabad and the co-founder of the Centre for Visual Information Technology (CVIT). His research spans document image analysis, computer vision, and multimodal learning, with a long-standing focus on building intelligent systems that go beyond recognition to enable higher-level understanding and reasoning over visual and textual data.

Prof. Jawahar has made significant contributions to document understanding, multilingual technologies, and real-world vision systems, particularly in the context of large-scale document processing for societal applications. He has been an active contributor to the ICDAR community through research, tutorials, and leadership roles.

His work has influenced both academic research and systems operating at a population scale. He is a recipient of several national and international recognitions, including the ACM’s Outstanding Contribution to Computing Education (OCCE) and Outstanding Applied Innovation in Computing (OAIC). His current research interests include vision-language models, document intelligence, and next-generation AI systems that integrate perception with reasoning. He is passionate about applications in road safety, assistive technology, education, entertainment, and cultural heritage.