Abstract: The requirements for integration over massive, heterogeneous table repositories (aka data lakes) are fundamentally different from those for federated data integration (where the data owned by an enterprise is integrated into a cohesive whole) or data exchange (where data is exchanged and shared among a small set of autonomous peers). In this talk, I will outline a vision for data alignment and integration in data lakes. Data lakes afford opportunities to apply new methods, from network science and other areas, to discover emergent semantics in large, heterogeneous collections of data sets. I will illustrate these ideas by discussing the problem of data lake disambiguation, work that received the Best Paper Award at EDBT 2021.
Renée J. Miller is a University Distinguished Professor of Computer Science at Northeastern University. She is a Fellow of the Royal Society of Canada and received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Ontario Premier’s Research Excellence Award, and an IBM Faculty Award. She formerly held the Bell Canada Chair of Information Systems at the University of Toronto and is an ACM Fellow. Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She and her colleagues received the ICDT Test-of-Time Award and the 2020 Alonzo Church Award for Outstanding Contributions to Logic and Computation for their influential work establishing the foundations of data exchange. In 2020, she received the CS-Can/Info-Can Lifetime Achievement Award in Computer Science. Professor Miller is an Editor-in-Chief of the VLDB Journal and a former president of the Very Large Data Base (VLDB) Foundation. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor’s degrees in Mathematics and Cognitive Science from MIT.
Abstract: There is a data-driven revolution underway in science and society, disrupting every form of enterprise. We are collecting and storing data more rapidly than ever before. There is increasing recognition that data science can help turn this data, and the insights obtained from it, into products, systems, and policies. This has led to the formation of data science research centres, institutes, and even academic units within academia, and to the establishment of major initiatives within every major industrial organization. However, our understanding of data science is vague and highly varied, and in many cases the field is squeezed to fit the available openings within an institution. There is a need to approach this field systematically, to define its scope and its boundaries. The objective of this talk is to provide such a consistent and systematic study of the scope of data science.
M. Tamer Özsu is a University Professor at the Cheriton School of Computer Science, University of Waterloo. Previously, he was the Director of the Cheriton School and Associate Dean (Research) of the Faculty of Mathematics. His research is on data engineering aspects of data science, focusing on distributed data management and the management of non-conventional data. He is a Fellow of the Royal Society of Canada, the American Association for the Advancement of Science, the Association for Computing Machinery, and the Institute of Electrical and Electronics Engineers, an elected member of the Science Academy, Turkey, and a member of Sigma Xi. Dr. Özsu is the recipient of the IEEE Innovation in Societal Infrastructure Award (2022), the CS-Can/Info-Can Lifetime Achievement Award (2018), the ACM SIGMOD Test-of-Time Award (2015), the ACM SIGMOD Contributions Award (2006), and The Ohio State University College of Engineering Distinguished Alumnus Award (2008). He is the Founding Editor-in-Chief of ACM Books (2014-2020) and the Founding Series Editor of Synthesis Lectures on Data Management (2009-2014). He serves on the editorial boards of three journals and one book series. He co-edited the Encyclopedia of Database Systems with Ling Liu.
Abstract: The healthcare industry continuously generates large amounts of data, including electronic medical records (EMRs), medical imaging, lab tests, and data streams from wearable medical monitoring devices. Applying big data analysis techniques in healthcare can enable a smarter healthcare system and lead to many positive, even life-saving, outcomes. However, managing and processing healthcare data is challenging because of factors inherent in the data itself, such as high complexity, irregularity, sparsity, and privacy concerns. In this talk, I shall discuss the problems and challenges we face in designing algorithms and systems for healthcare data analytics. I shall then discuss several detailed solutions for cleaning, integrating, and analyzing (multi-modal) healthcare data as parts of an end-to-end engine designed to provide a holistic view of medical data and thereby facilitate more effective healthcare.
Meihui is currently a professor at the Beijing Institute of Technology (BIT). Before joining BIT, she was an Assistant Professor at the Singapore University of Technology and Design. She obtained her PhD from the National University of Singapore. Her main research interests include Big Data Management and Analytics, Modern Database Systems, Blockchain Systems, and AI. She won the 2020 VLDB Early Career Research Contribution Award and the 2019 CCF-IEEE CS Young Scientist Award. She is also a co-author of the VLDB 2019 Best Paper, the ICDE 2018 Best Paper Runner-Up, and a 2019 ACM SIGMOD Research Highlight Award paper. Meihui has served as a Research Track Associate Editor of VLDB 2018, 2019, 2020, and 2023, ACM SIGMOD 2021 and 2023, and IEEE ICDE 2018, 2022, and 2023. She is a PC co-chair of VLDB 2024. She serves as an Associate Editor for IEEE Transactions on Knowledge and Data Engineering (TKDE) and as Survey Track Editor of Distributed and Parallel Databases. She is a trustee of the VLDB Endowment.