Keynote Speeches
(1) Apache Hadoop in the Enterprise
Amr Awadallah, CTO
Cloudera, USA
Slides Download:
Apache Hadoop in the Enterprise.pdf
Abstract
Cloudera provides enterprises with *the* big data platform
for next generation data management and analytics. This new platform allows
companies to perform more flexible analysis on more types of data and in
greater volumes. Amr Awadallah, CTO/Founder at Cloudera, will cover the key
underlying patterns for how Hadoop is transforming the way organizations
manage and derive value from data.
Biography
Before co-founding Cloudera in 2008, Amr (@awadallah) was an
Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel he
served as Vice President of Product Intelligence Engineering at Yahoo!, and
ran one of the very first organizations to use Hadoop for data analysis and
business intelligence. Amr joined Yahoo! after it acquired his first
startup, VivaSmart, in July 2000. Amr holds Bachelor’s and Master’s
degrees in Electrical Engineering from Cairo University, Egypt, and a
Doctorate in Electrical Engineering from Stanford University.
(2) The Berkeley Data Analytics Stack: Present and Future
Mike Franklin
UC Berkeley, USA
Slides Download:
The Berkeley Data Analytics Stack: Present and Future.pptx
Abstract
The Berkeley AMPLab was founded on the idea that the challenges of emerging
Big Data applications require a new approach to analytics systems.
Launched in early 2011, the project set out to rethink the traditional
analytics stack, breaking down technical and intellectual barriers that had
arisen during decades of evolutionary development. The vision of the lab is
to seamlessly integrate the three main resources available for making sense
of data at scale: Algorithms (such as machine learning and statistical
techniques), Machines (in the form of scalable clusters and elastic cloud
computing), and People (both individually as analysts and en masse, as with
crowdsourced human computation). To pursue this goal, we assembled a
research team with diverse interests across computer science, forged
relationships with domain experts on campus and elsewhere, and obtained the
support of leading industry partners and major government sponsors. The lab
is realizing its ideas through the development of a freely available open
source software stack called BDAS: the Berkeley Data Analytics Stack. In the
nearly three years the lab has been in operation, we've released major
components of BDAS. Several of these components have gained significant
traction in industry and elsewhere: the Mesos cluster resource manager, the
Spark in-memory computation framework, and the Shark query processing
system. BDAS shows up prominently in many industry discussions of the future
of the Big Data analytics ecosystem - a rare degree of impact for an ongoing
academic project. Given this initial success, the lab is continuing on its
research path, moving "up the stack" to better integrate and support deep
machine learning and to make people a full-fledged resource for making sense
of Big Data.
In this talk, I'll first outline the motivation and insights behind our
research approach and describe how we have organized to address the
cross-disciplinary nature of Big Data challenges. I will then describe the
current state of BDAS with an emphasis on the key components listed above
and will address our current efforts on machine learning scalability and
ease of use, and hybrid human/computer processing. Finally I will present
our current views of how all the pieces will fit together to form a system
that can adaptively bring the right resources to bear on a given data-driven
question to meet time, cost and quality requirements throughout the
analytics lifecycle.
Biography
Michael Franklin is the Thomas M. Siebel Professor of Computer Science at UC
Berkeley, where he also serves as Director of the Algorithms, Machines and
People Lab (AMPLab). The Berkeley AMPLab is a collaboration of over 60
researchers supported by Founding Sponsors Amazon Web Services, Google, and
SAP, along with 17 other leading companies, the DARPA XData program, and an
NSF Expeditions in Computing award. The latter was announced as part of the
Obama Administration's Big Data research initiative in 2012. His research
interests include large-scale data management and analytics, data
integration, and hybrid human/computer data processing systems. He was
founder and CTO of Truviso, a real-time data analytics company acquired by
Cisco Systems in 2012. He is an ACM Fellow and two-time winner of the ACM
SIGMOD Test of Time Award (2013 and 2004). He also recently received the
Best Paper awards at ICDE 2013 and NSDI 2012, a "Best of VLDB 2012"
selection, Best Demo awards at SIGMOD 2012 and VLDB 2011 and the Outstanding
Advisor Award from the Computer Science Graduate Student Association at
Berkeley. He is a committee member on the U.S. National Academy of Sciences
study on Analysis of Massive Data and a Transportation Research Board
committee on long-term data stewardship. Prof. Franklin received his Ph.D.
in Computer Science from the University of Wisconsin-Madison in 1993.
(3) Using Crowdsourcing for Data Analytics
Hector Garcia-Molina
Stanford University, USA
Slides Download:
Using CrowdSourcing for Data Analytics.pdf
Abstract
It may sound contradictory to use humans to analyze big data, since humans
cannot process huge amounts of data, may be error prone and are relatively
slow. However, humans can do certain tasks much better than machines, e.g.,
tasks that involve image analysis or natural language.
In this talk I will discuss how humans can be judiciously used to improve
data analytics by cleansing, clustering and filtering critical data. I will
also briefly describe ongoing work at our Stanford InfoLab in this area.
Biography
Hector Garcia-Molina is the Leonard Bosack and Sandra Lerner Professor in
the Departments of Computer Science and Electrical Engineering at Stanford
University, Stanford, California. He was the chairman of the Computer
Science Department from January 2001 to December 2004. From 1997 to 2001 he
was a member of the President's Information Technology Advisory Committee
(PITAC). From 1979 to 1991 he was on the faculty of the Computer Science
Department at Princeton University, Princeton, New Jersey. He received a BS
in electrical engineering from the Instituto Tecnologico de Monterrey,
Mexico, in 1974. From Stanford University, Stanford, California, he received
an MS in electrical engineering in 1975 and a PhD in computer science in
1979. He holds an honorary PhD from ETH Zurich (2007). Garcia-Molina is a
Fellow of the Association for Computing Machinery and of the American
Academy of Arts and Sciences; is a member of the National Academy of
Engineering; received the 1999 ACM SIGMOD Innovations Award; is a Venture
Advisor for Onset Ventures, and is a member of the Board of Directors of
Oracle.
(4) Security – A Big Question for Big Data
Roger Schell
University of Southern California, USA
Slides Download:
Security – A Big Question for Big Data.pdf
Abstract
Big data implies performing computation and database operations for massive
amounts of data, remotely from the data owner’s enterprise. Since a key
value proposition of big data is access to data from multiple and diverse
domains, security and privacy will play a very important role in big data
research and technology. The limitations of standard IT security practices
are well-known, making the ability of attackers to use software subversion
to insert malicious software into applications and operating systems a
serious and growing threat whose adverse impact is intensified by big data.
So, a big question is what security and privacy technology is adequate for
controlled, assured sharing for efficient direct access to big data.
Making effective use of big data requires access from any
domain to data in that domain, or any other domain it is authorized to
access. Several decades of trusted systems developments have produced a rich
set of proven concepts for verifiable protection to substantially cope with
determined adversaries, but this technology has largely been marginalized as
“overkill” and vendors do not widely offer it. This talk will discuss
pivotal choices for big data to leverage this mature security and privacy
technology, while identifying remaining research challenges.
Biography
Dr. Roger R. Schell recently joined USC/ISI supporting their Masters of
Cyber Security degree program. He is internationally recognized for
originating several key modern security design and evaluation techniques,
and he holds patents in cryptography, authentication and trusted
workstation. For more than a decade he has been co-founder and President of
Aesec Corporation, a start-up company providing verifiably secure platforms.
Previously Dr. Schell was co-founder and vice president for Gemini
Computers, Inc., where he directed development of their highly secure (what
NSA called “Class A1”) commercial product, the Gemini Multiprocessing Secure
Operating System (GEMSOS). He was also the founding Deputy Director of NSA’s
National Computer Security Center. He has been referred to as the "father"
of the Trusted Computer System Evaluation Criteria (the "Orange Book"). Dr.
Schell is a retired USAF Colonel. He received a Ph.D. in Computer Science
from MIT, an M.S.E.E. from Washington State, and a B.S.E.E. from Montana
State. NIST and NSA have recognized Dr. Schell with the National
Computer System Security Award. In 2012 he was inducted into the inaugural
class of the National Cyber Security Hall of Fame.