Abstract: Data science is an emerging discipline that offers both promise and peril. Responsible data science refers to efforts that address both the technical and societal issues in emerging data-driven technologies. How can data-driven systems reason effectively about complex dependencies and uncertainty? Furthermore, how do we understand the ethical and societal issues involved in data-driven decision-making? There is a pressing need to integrate algorithmic and statistical principles, social science theories, and basic humanist concepts so that we can think critically and constructively about the socio-technical systems we are building. In this talk, I will give an overview of this emerging area.
Prof. Lise Getoor is a professor in the Computer Science Department at UC Santa Cruz and founding director of the Data, Discovery and Decisions (D3) Data Science Research Center there. Her research areas include machine learning, data integration, and reasoning under uncertainty, with an emphasis on graph and network data. She has over 250 publications, including 13 best paper awards. She is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI) and has served as an elected board member of the International Machine Learning Society and the Computing Research Association (CRA). She received her PhD from Stanford University in 2001, her MS from UC Berkeley, and her BS from UC Santa Barbara, and was a professor at the University of Maryland, College Park from 2001 to 2013.
Abstract: Publicly available data from open sources are a vital resource for students and researchers in a variety of disciplines. Unfortunately, processing these datasets to make them useful --- scraping, cleaning, normalizing, joining --- is tedious, error-prone, and has to be repeated by every group. DataCommons attempts to alleviate some of this pain by synthesizing a single Knowledge Graph from many different data sources. It links references to the same entities (such as cities, counties, organizations, etc.) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources. Like the Web, the DataCommons graph is open: any user can contribute data or build applications powered by the graph. In the Google DataCommons, we are jump-starting the graph with data from publicly available sources such as the CDC, Census, BLS, and FBI, and are looking to engage with the academic community to take it further.
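The entity-linking step the abstract describes can be sketched in a few lines: rows from two sources refer to the same place under different names and are merged into a single graph node. The alias table, source rows, and node identifier below are all invented for illustration and are not the DataCommons API.

```python
# Toy sketch of entity linking: two hypothetical sources name the same city
# differently; both rows are resolved to one node and their statistics merged.
# All data and identifiers here are made up for illustration.
aliases = {"SF": "geo/SanFrancisco", "San Francisco": "geo/SanFrancisco"}

census_rows = [{"place": "San Francisco", "population": 883_305}]
cdc_rows = [{"place": "SF", "obesity_rate": 0.19}]

graph = {}
for row in census_rows + cdc_rows:
    node_id = aliases[row["place"]]            # resolve the reference to one node
    node = graph.setdefault(node_id, {})
    node.update({k: v for k, v in row.items() if k != "place"})

# graph["geo/SanFrancisco"] now aggregates statistics from both sources.
```

In practice the hard part is building the alias resolution itself; the merge step, once references are resolved, is exactly this kind of keyed aggregation.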
Dr. Ramanathan Guha is the founder and lead for DataCommons.org, a platform which synthesizes a wide range of data sets into a single knowledge graph, for use by students and researchers. He is the creator of widely used web standards such as RSS, RDF and Schema.org, and products such as Google Custom Search, and co-founder of Epinions.com and Alpiri. He is currently a Google Fellow and Vice President at Google. He has a Ph.D. in Computer Science from Stanford University, a Master of Science from University of California, Berkeley and a Bachelor of Technology in Mechanical Engineering from IIT Chennai.
Abstract: We are entering an exciting era where human intelligence is being enhanced by machine intelligence through big-data-fueled artificial intelligence (AI) and machine learning (ML). However, recent work shows that DNN models, even those trained privately, are vulnerable to adversarial inputs. Such adversarial inputs inject a small amount of perturbation into the input data to fool machine learning models into misbehaving, turning a deep neural network against itself. As new defense methods are proposed, more sophisticated attack algorithms surface. This arms race has been ongoing since the rise of adversarial machine learning. This keynote provides a comprehensive analysis and characterization of the most representative attacks and their defenses. As more and more mission-critical systems incorporate machine learning and AI as an essential component in their real-world big data applications and their big data service provisioning platforms or products, understanding and ensuring the verifiable robustness of deep learning becomes a pressing challenge in the presence of adversarial attacks. This includes (1) the development of formal metrics to quantitatively evaluate and measure the robustness of a DNN prediction with respect to intentional and unintentional artifacts and deceptions, (2) the comprehensive understanding of the blind spots and the invariants in the DNN trained models and the DNN training process, and (3) the statistical measurement of trust and distrust that we can place on a deep learning algorithm to perform reliably and truthfully. In this keynote talk, I will use empirical analysis and evaluation of our cross-layer strategic teaming defense framework and techniques to illustrate the feasibility of ensuring robust deep learning.
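The perturbation attack the abstract describes can be illustrated with the fast gradient sign method (FGSM), one representative attack of the kind such surveys cover. The sketch below uses a logistic model as a stand-in for a DNN so it stays self-contained; the weights, input, and epsilon are invented for illustration, not taken from the talk.

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps=0.1):
    """FGSM on a logistic model (an illustrative stand-in for a DNN):
    step the input in the direction that most increases the loss."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # sigmoid prediction
    grad_x = (p - y) * w                      # gradient of cross-entropy w.r.t. x
    return x + eps * np.sign(grad_x)          # small, bounded perturbation

def loss(x, y, w, b):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# A confidently classified point becomes harder to classify after the attack.
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, -1.0]), 1.0
x_adv = fgsm_perturb(x, y, w, b, eps=0.5)
```

The same one-step gradient-sign idea, applied to a deep network's input gradient, is what "injecting a small amount of perturbation" means concretely; the arms race the talk analyzes consists of stronger variants of this step and defenses against them.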
Prof. Ling Liu is a Professor in the School of Computer Science at Georgia Institute of Technology. She directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), examining various aspects of large-scale data-intensive systems. Prof. Liu is an internationally recognized expert in the areas of big data systems and analytics, distributed systems, database and storage systems, Internet computing, privacy, security, and trust. Prof. Liu has published over 300 international journal and conference articles and is a recipient of best paper awards from a number of top venues, including ICDCS 2003, WWW 2004, the 2005 Pat Goldberg Memorial Best Paper Award, IEEE CLOUD 2012, IEEE ICWS 2013, ACM/IEEE CCGrid 2015, and IEEE Edge 2017. Prof. Liu is an elected IEEE Fellow and a recipient of the IEEE Computer Society Technical Achievement Award. Prof. Liu has served as general chair and PC chair of numerous IEEE and ACM conferences in the fields of big data, cloud computing, data engineering, distributed computing, very large databases, and the World Wide Web, and served as the Editor-in-Chief of IEEE Transactions on Services Computing from 2013 to 2016. Currently Prof. Liu is co-PC chair of The Web Conference 2019 (WWW 2019) and the Editor-in-Chief of ACM Transactions on Internet Technology (TOIT). Prof. Liu's research is primarily sponsored by NSF, IBM, and Intel.
Abstract: The past three decades have seen the development of powerful tools for modeling and computing causal relationships which may have major impact on data science. My talk will illustrate how these tools work in seven tasks:
1. Encoding causal assumptions in a transparent and testable way
2. Predicting the effects of actions and policies
3. Computing counterfactuals and finding causes of effects
4. Computing direct and indirect effects (Mediation)
5. Integrating data from diverse sources
6. Recovering from missing data
7. Discovering causal relations from data
A friendly, non-technical account of these ideas is available in "The Book of Why: The New Science of Cause and Effect," Judea Pearl and Dana Mackenzie (Basic Books, 2018). http://bayes.cs.ucla.edu/WHY/
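Task 2 above (predicting the effects of actions) can be made concrete with the back-door adjustment formula, P(Y | do(X=x)) = Σ_z P(Y | x, z) P(z), one of the basic tools of the calculus the talk refers to. The probability tables below are invented purely to show the computation.

```python
# Back-door adjustment on a toy binary model with confounder Z:
#   Z -> X, Z -> Y, X -> Y.  All probability tables are made up.
P_z = {0: 0.6, 1: 0.4}                  # P(Z)
P_y1_given_xz = {                        # P(Y=1 | X=x, Z=z)
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.50, (1, 1): 0.80,
}

def p_y1_do_x(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)  (back-door formula)."""
    return sum(P_y1_given_xz[(x, z)] * P_z[z] for z in P_z)

effect = p_y1_do_x(1) - p_y1_do_x(0)    # average causal effect of X on Y
```

The key point is that the interventional quantity P(Y | do(X)) is computed from purely observational tables once the causal assumptions (here, that Z closes all back-door paths) are encoded, which is exactly the interplay between Tasks 1 and 2.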
Prof. Judea Pearl is Chancellor's Professor of Computer Science and Statistics at UCLA, where he directs the Cognitive Systems Laboratory and conducts research in artificial intelligence, human reasoning, and philosophy of science. He has authored hundreds of research papers and three books: Heuristics (1983), Probabilistic Reasoning (1988), and Causality (2000, 2009), which won the London School of Economics Lakatos Award in 2002. More recently, he co-authored Causal Inference in Statistics (2016, with M. Glymour and N. Jewell) and "The Book of Why" (2018, with Dana Mackenzie), which introduces causal analysis to a general audience. Pearl is a member of the National Academy of Sciences and the National Academy of Engineering, and a fellow of the IEEE, the Cognitive Science Society, and the Association for the Advancement of Artificial Intelligence. In 2012, he won the Technion's Harvey Prize and the ACM Alan Turing Award "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning."
Abstract: Despite its great progress so far, artificial intelligence (AI) is facing a serious challenge in the availability of high-quality big data. In many practical applications, data are in the form of isolated islands. Efforts to integrate the data are increasingly difficult, partly due to serious concerns over user privacy and data security. The problem is exacerbated by strict government regulations such as Europe's General Data Protection Regulation (GDPR). In this talk, I will review these challenges and describe efforts to address them in the area of recommendation systems. In particular, I will give an overview of recent advances in federated learning and then focus on developments in "federated recommendation systems," which aim to build high-performance recommendation systems by bridging data repositories without compromising data security and privacy.
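The core mechanism of federated learning can be sketched with federated averaging: each party trains on its own private data, and only model parameters, never the raw data, are shared and averaged by a coordinator. The data, model (a linear regression), and hyperparameters below are invented for illustration and are not from the talk.

```python
import numpy as np

# Minimal federated averaging (FedAvg) sketch on synthetic data: four
# "data islands" each hold private samples of the same linear relationship.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(20, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=20)
    clients.append((X, y))                    # stays on the client

def local_step(w, X, y, lr=0.05, epochs=10):
    """Plain gradient descent on one client's private data."""
    for _ in range(epochs):
        w = w - lr * (2 / len(y)) * X.T @ (X @ w - y)
    return w

w = np.zeros(3)
for _ in range(20):                           # federated rounds
    local_ws = [local_step(w, X, y) for X, y in clients]
    w = np.mean(local_ws, axis=0)             # server averages parameters only
```

Only the parameter vectors cross organizational boundaries, which is what makes the approach compatible with the data-isolation and GDPR constraints the abstract describes; real systems add secure aggregation and encryption on top of this skeleton.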
Prof. Qiang Yang is the Chief AI Officer of WeBank, China's first internet-only bank, with more than 100 million customers. He is also a chair professor in the Computer Science and Engineering Department at the Hong Kong University of Science and Technology (HKUST). His research interests include artificial intelligence and machine learning, especially transfer learning and federated learning. He is a fellow of AAAI, ACM, IEEE, and AAAS, and the founding Editor-in-Chief of ACM Transactions on Intelligent Systems and Technology (ACM TIST) and of IEEE Transactions on Big Data (IEEE TBD). He received his PhD from the University of Maryland, College Park in 1989 and has taught at the University of Waterloo and Simon Fraser University. He received the ACM SIGKDD Distinguished Service Award in 2017, the AAAI Distinguished Applications Award in 2018, the Best Paper Award of ACM TiiS in 2017, and the championship of ACM KDD Cup in 2004 and 2005. He is the past President of IJCAI (2017-2019) and an executive council member of AAAI.