2013 IEEE International Conference on Big Data (IEEE BigData 2013)

Keynote Speeches

(1) Apache Hadoop in the Enterprise
Amr Awadallah, CTO
Cloudera, USA

Slides Download: Apache Hadoop in the Enterprise.pdf

Abstract

Cloudera provides enterprises with *the* big data platform for next generation data management and analytics. This new platform allows companies to perform more flexible analysis on more types of data and in greater volumes. Amr Awadallah, CTO/Founder at Cloudera, will cover the key underlying patterns for how Hadoop is transforming the way organizations manage and derive value from data.

Biography

Before co-founding Cloudera in 2008, Amr (@awadallah) was an Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel he served as Vice President of Product Intelligence Engineering at Yahoo!, and ran one of the very first organizations to use Hadoop for data analysis and business intelligence. Amr joined Yahoo! after it acquired his first startup, VivaSmart, in July of 2000. Amr holds Bachelor’s and Master’s degrees in Electrical Engineering from Cairo University, Egypt, and a Doctorate in Electrical Engineering from Stanford University.

(2) The Berkeley Data Analytics Stack: Present and Future
Mike Franklin
UC Berkeley, USA

Slides Download: The Berkeley Data Analytics Stack: Present and Future.pptx

Abstract

The Berkeley AMPLab was founded on the idea that the challenges of emerging Big Data applications require a new approach to analytics systems. Launching in early 2011, the project set out to rethink the traditional analytics stack, breaking down technical and intellectual barriers that had arisen during decades of evolutionary development. The vision of the lab is to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (such as machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and en masse, as with crowdsourced human computation). To pursue this goal, we assembled a research team with diverse interests across computer science, forged relationships with domain experts on campus and elsewhere, and obtained the support of leading industry partners and major government sponsors. The lab is realizing its ideas through the development of a freely available Open Source software stack called BDAS: the Berkeley Data Analytics Stack. In the nearly three years the lab has been in operation, we've released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS shows up prominently in many industry discussions of the future of the Big Data analytics ecosystem, a rare degree of impact for an ongoing academic project. Given this initial success, the lab is continuing on its research path, moving "up the stack" to better integrate and support deep machine learning and to make people a full-fledged resource for making sense of Big Data.

In this talk, I'll first outline the motivation and insights behind our research approach and describe how we have organized to address the cross-disciplinary nature of Big Data challenges. I will then describe the current state of BDAS with an emphasis on the key components listed above and will address our current efforts on machine learning scalability and ease of use, and hybrid human/computer processing. Finally, I will present our current views of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost and quality requirements throughout the analytics lifecycle.

Biography

Michael Franklin is the Thomas M. Siebel Professor of Computer Science at UC Berkeley, where he also serves as Director of the Algorithms, Machines and People Lab (AMPLab). The Berkeley AMPLab is a collaboration of over 60 researchers supported by Founding Sponsors Amazon Web Services, Google, and SAP, along with 17 other leading companies, the DARPA XData program, and an NSF Expeditions in Computing award. The latter was announced as part of the Obama Administration's Big Data research initiative in 2012. Prof. Franklin's research interests include large-scale data management and analytics, data integration, and hybrid human/computer data processing systems. He was founder and CTO of Truviso, a real-time data analytics company acquired by Cisco Systems in 2012. He is an ACM Fellow and two-time winner of the ACM SIGMOD Test of Time Award (2013 and 2004). He also recently received the Best Paper awards at ICDE 2013 and NSDI 2012, a "Best of VLDB 2012" selection, Best Demo awards at SIGMOD 2012 and VLDB 2011, and the Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. He is a committee member on the U.S. National Academy of Sciences study on Analysis of Massive Data and a Transportation Research Board committee on long-term data stewardship. Prof. Franklin received his Ph.D. in Computer Science from the University of Wisconsin-Madison in 1993.

(3) Using Crowdsourcing for Data Analytics
Hector Garcia-Molina
Stanford University, USA

Slides Download: Using CrowdSourcing for Data Analytics.pdf

Abstract

It may sound contradictory to use humans to analyze big data, since humans cannot process huge amounts of data, may be error prone and are relatively slow. However, humans can do certain tasks much better than machines, e.g., tasks that involve image analysis or natural language.

In this talk I will discuss how humans can be judiciously used to improve data analytics by cleansing, clustering and filtering critical data. I will also briefly describe ongoing work at our Stanford InfoLab in this area.

Biography

Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in the Departments of Computer Science and Electrical Engineering at Stanford University, Stanford, California. He was the chairman of the Computer Science Department from January 2001 to December 2004. From 1997 to 2001 he was a member of the President's Information Technology Advisory Committee (PITAC). From 1979 to 1991 he was on the faculty of the Computer Science Department at Princeton University, Princeton, New Jersey. He received a BS in electrical engineering from the Instituto Tecnologico de Monterrey, Mexico, in 1974. From Stanford University, Stanford, California, he received an MS in electrical engineering in 1975 and a PhD in computer science in 1979. He holds an honorary PhD from ETH Zurich (2007). Garcia-Molina is a Fellow of the Association for Computing Machinery and of the American Academy of Arts and Sciences; is a member of the National Academy of Engineering; received the 1999 ACM SIGMOD Innovations Award; is a Venture Advisor for Onset Ventures; and is a member of the Board of Directors of Oracle.

(4) Security – A Big Question for Big Data
Roger Schell
University of Southern California, USA

Slides Download: Security – A Big Question for Big Data.pdf

Abstract

Big data implies performing computation and database operations for massive amounts of data, remotely from the data owner’s enterprise. Since a key value proposition of big data is access to data from multiple and diverse domains, security and privacy will play a very important role in big data research and technology. The limitations of standard IT security practices are well-known, making the ability of attackers to use software subversion to insert malicious software into applications and operating systems a serious and growing threat whose adverse impact is intensified by big data. So, a big question is what security and privacy technology is adequate for controlled assured sharing for efficient direct access to big data. Making effective use of big data requires access from any domain to data in that domain, or any other domain it is authorized to access. Several decades of trusted systems developments have produced a rich set of proven concepts for verifiable protection to substantially cope with determined adversaries, but this technology has largely been marginalized as “overkill” and vendors do not widely offer it. This talk will discuss pivotal choices for big data to leverage this mature security and privacy technology, while identifying remaining research challenges.

Biography

Dr. Roger R. Schell recently joined USC/ISI supporting their Masters of Cyber Security degree program. He is internationally recognized for originating several key modern security design and evaluation techniques, and he holds patents in cryptography, authentication and trusted workstations. For more than a decade he has been co-founder and President of Aesec Corporation, a start-up company providing verifiably secure platforms. Previously Dr. Schell was co-founder and vice president for Gemini Computers, Inc., where he directed development of their highly secure (what NSA called “Class A1”) commercial product, the Gemini Multiprocessing Secure Operating System (GEMSOS). He was also the founding Deputy Director of NSA’s National Computer Security Center. He has been referred to as the "father" of the Trusted Computer System Evaluation Criteria (the "Orange Book"). Dr. Schell is a retired USAF Colonel. He received a Ph.D. in Computer Science from MIT, an M.S.E.E. from Washington State, and a B.S.E.E. from Montana State. NIST and NSA have recognized Dr. Schell with the National Computer System Security Award. In 2012 he was inducted into the inaugural class of the National Cyber Security Hall of Fame.