The open lecture "Modeling, Exploring and Analyzing Change: The Janus Project" by Dr. Divesh Srivastava will take place on May 9, 2025 at 10:00 (UTC+3, Eastern European Summer Time (EEST)) in lecture hall U05-103 (Educational Building of TalTech, Ehitajate tee 5, Tallinn).
Abstract: Data change, all the time. The Janus project seeks to model, explore, and analyze such change, providing valuable insights into the evolving real world and the ways in which data about it are collected and used. We start by identifying technical challenges that need to be addressed to realize the Janus vision. Based on an analysis of the history of 3.5M tables on the English Wikipedia for a total of 53.8M table versions, we then illustrate the rich history of structured Wikipedia data: their creation, evolution, and deletion; indeed, each table has a life of its own. To help automatically interpret the useful knowledge harbored in the history of Wikipedia tables, we present recent results on two technical problems that help infer identity of entities and tables across changes over time: (i) matching tables, infoboxes and lists within a Wikipedia page across page revisions, and (ii) identifying Natural Keys, which serve as primary keys in tables over time and consist of attributes inherent to an entity. Finally, we show how to accurately recommend schema changes to Wikipedia tables, based on rules derived from the history of past schema changes. We solve these problems at scale and make the resulting curated datasets available to the community to facilitate future research.
The Janus project is joint work with Tobias Bleifuß, Leon Bornemann, Dmitri Kalashnikov, and Felix Naumann.

Speaker Bio: Divesh Srivastava is the Head of Database Research at AT&T. He is an AT&T Fellow, a Fellow of the ACM, the President of the VLDB Endowment, co-chair of the ACM Publications Board, and on the Board of Directors of the Computing Research Association. He has served as PC co-chair of many international conferences including VLDB 2024 (Industrial), SIGMOD 2021, VLDB 2020 (Industrial), SIGMOD 2020 (Industrial), and ICDE 2019. He has presented keynote talks at several international conferences, and his research interests and publications span a variety of topics in data management. He received his Ph.D. from the University of Wisconsin, Madison, USA, and his Bachelor of Technology from the Indian Institute of Technology, Bombay, India.
N.B.: Dr. Divesh Srivastava is also an opponent of Sijo Arakkal Peious's PhD thesis “Measures of Impact and Confounding – an Analysis and Experimental Comparison of Novel and Established Measures” and will participate in the thesis defence as a member of the defence committee. The defence will take place on May 9, 2025, starting at 15:00 (UTC+3, Eastern European Summer Time (EEST)), in room U05-105 (TalTech Study Building 5, Ehitajate tee 5, Tallinn) and can also be followed via Zoom.