Tutorials
TUTORIAL 1: Online Learning for Big Data Analytics
TUTORIAL 2: Large-Scale Click-stream and transaction log mining in practice
TUTORIAL 1
Title: Online Learning for Big Data Analytics
Summary:
Big data has ushered in a new era: science, engineering, and technology now
produce increasingly large data streams every day, reaching petabyte and
exabyte scales. Moreover, massive data recording human activity are online
and available for analysis and for building business models that provide
personalized services in commerce. Learning from big data is a new topic
that expands the area of machine learning. Many new learning techniques need
to be developed to increase the effectiveness and efficiency of learning
from the data. Among them, online learning, which we have investigated in
depth for several years, is one of the most promising techniques for
learning from big data.
The tutorial will investigate several important components of online
learning techniques for big data. First, a brief introduction to the basic
concepts of big data and big data analytics will be given. The basic
concepts of different learning paradigms and online learning will be
provided to give an overall map of the techniques developed in this area.
Second, the connection between online learning techniques and big data will
be addressed. Third, some motivating examples will be presented to
illustrate the promise of online learning techniques. Fourth, we will
present different online learning techniques for non-sparse learning models,
sparse learning models, unsupervised learning models, etc. Some hands-on
demos may be given in the tutorial.
The tutorial will conclude by summarizing and reflecting on the trends of
online learning techniques for big data, which may reshape this exciting and
dynamic research area and which merit more detailed investigation for many
years to come.
Content:
1. Introduction
1.1 Basic concept of big data and big data analytics
1.2 Basic concept of online learning and its applications
2. Online Learning Algorithms
2.1 Perceptron
2.2 Online non-sparse learning
2.3 Online sparse learning
2.4 Online unsupervised learning
3. Discussion and Q & A
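As an illustration of item 2.1, the following is a minimal sketch of the
classic Perceptron run in an online fashion: each example is seen once, and
the model is updated only when it makes a mistake. It is not taken from the
tutorial slides; the function names (predict, perceptron_online), the bias
term, and the toy data stream are assumptions made for this example.

```python
from typing import Iterable, List, Sequence, Tuple

# One training example: (feature vector, label in {-1, +1}).
Example = Tuple[Sequence[float], int]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 from the sign of the linear score w.x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


def perceptron_online(stream: Iterable[Example], dim: int) -> Tuple[List[float], float, int]:
    """Process the stream once, updating the weights only on mistakes."""
    w = [0.0] * dim
    b = 0.0
    mistakes = 0
    for x, y in stream:
        if predict(w, b, x) != y:
            # Mistake-driven update: move the weights toward the true label.
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y
            mistakes += 1
    return w, b, mistakes


if __name__ == "__main__":
    # Hypothetical toy stream of linearly separable points.
    toy_stream = [
        ([2.0, 1.0], 1),
        ([-1.0, -2.0], -1),
        ([1.5, 0.5], 1),
        ([-2.0, -1.0], -1),
    ]
    print(perceptron_online(toy_stream, dim=2))
```

Because the model is updated one example at a time and no example has to be
stored, memory use is independent of the length of the stream, which is the
property that makes this family of methods attractive for large data.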
Short Bio.
Prof. King's Profile
Prof. King's research interests include machine learning, social computing,
web intelligence, data mining, and multimedia information processing. In
these research areas, he has over 210 technical publications in journals and
conferences. In addition, he has contributed over 20 book chapters and
edited volumes. Moreover, Prof. King has over 30 research and applied
grants. One notable patented system he has developed is the VeriGuide
System, previously known as the CUPIDE (Chinese University Plagiarism
IDentification Engine) system, which detects similar sentences and performs
readability analysis of text-based documents in both English and Chinese to
promote academic integrity and honesty.
Prof. King is the Book Series Editor for "Social Media and Social
Computing" with Taylor and Francis (CRC Press). He is also an Associate
Editor of the ACM Transactions on Knowledge Discovery from Data (ACM TKDD)
and a former Associate Editor of the IEEE Transactions on Neural Networks
(TNN) and IEEE Computational Intelligence Magazine (CIM). He is a member of
the Editorial Board of the Open Information Systems Journal, Journal of
Nonlinear Analysis and Applied Mathematics, and Neural Information
Processing Letters and Reviews Journal (NIP-LR). He has also served as
Special Issue Guest Editor for Neurocomputing, International Journal of
Intelligent Computing and Cybernetics (IJICC), Journal of Intelligent
Information Systems (JIIS), and International Journal of Computational
Intelligent Research (IJCIR). He is a senior member of IEEE and a member of
ACM, International Neural Network Society (INNS), and Asian Pacific Neural
Network Assembly (APNNA). Currently, he is serving on the Neural Network
Technical Committee (NNTC) and the Data Mining Technical Committee under the
IEEE Computational Intelligence Society (formerly the IEEE Neural Network
Society). He is also a member of the Board of Governors of INNS and a
Vice-President and Governing Board Member of APNNA. He also serves INNS as
the Vice-President for Membership in the Board of Governors.
Prof. King is an associate dean of the engineering faculty and a professor
in the Department of Computer Science and Engineering, The Chinese
University of Hong Kong. He received his B.Sc. degree in Engineering and
Applied Science from the California Institute of Technology, Pasadena, and
his M.Sc. and Ph.D. degrees in Computer Science from the University of
Southern California, Los Angeles.
Prof. Lyu's Profile
Prof. Lyu's research interests include software reliability engineering,
distributed systems, fault-tolerant computing, web technologies, mobile
networks, digital video library, multimedia processing, and video searching
and delivery. He has participated in more than 30 industrial projects in
these areas, and helped to develop many commercial systems and software
tools. He has frequently been invited as a keynote or tutorial speaker at
conferences and workshops in the U.S., Europe, and Asia.
Prof. Lyu has published over 400 refereed journal and conference papers in
his research areas. He initiated the first International Symposium on
Software Reliability Engineering (ISSRE) in 1990. He was the Program Chair
for ISSRE'96, Program co-Chair for WWW10, General Chair for ISSRE'2001,
General co-Chair for PRDC'2005, and has served on program committees for
many conferences. He is the editor of two book volumes: Software Fault
Tolerance, published by Wiley in 1995, and the Handbook of Software
Reliability Engineering, published by IEEE and McGraw-Hill in 1996. These
books have received an overwhelming response from both academia and
industry. He was an Associate Editor of IEEE Transactions on Reliability,
IEEE Transactions on Knowledge and Data Engineering, and Journal of
Information Science and Engineering. He is currently on the editorial board
of Wiley Software Testing, Verification and Reliability Journal. He was
elected IEEE Fellow (2004) and AAAS Fellow (2007) for his contributions
to software reliability engineering and software fault tolerance. He was
also named Croucher Senior Research Fellow in 2008 and IEEE Reliability
Society Engineer of the Year in 2010.
Prof. Lyu is currently a Professor in the Computer Science and Engineering
department of The Chinese University of Hong Kong. He received his B.S. in
Electrical Engineering from National Taiwan University, his M.S. in Computer
Science from the University of California, Santa Barbara, and his Ph.D. in
Computer Science from the University of California, Los Angeles.
Dr. Yang's Profile
Dr. Haiqin Yang's research interests include machine learning, data mining,
and financial engineering. In these areas, he has over 30 technical
publications in journals (JMLR, IEEE TNN, Neurocomputing, IEEE BME, IEEE
SMC) and conferences (ICML, CIKM, IJCNN, ICONIP, etc.). In addition, he has
written two books and four book chapters and has been granted seven patents.
He has served as a reviewer for many journals and on program committees for
many conferences, e.g., CIKM, ACML, IEEE BigData 2013, and IEEE BDSE 2013.
He has also received many awards, including the "First Prize" postgraduate
paper award in the IEEE Hong Kong Section 2010, the PCCW Foundation
Scholarship, and The Global Scholarship Programme for Research Excellence.
Dr. Yang is currently a Postdoctoral Fellow at The Chinese University of
Hong Kong. He received his B.S. degree in Computer Science and Technology
from Nanjing University and his M.Phil. and Ph.D. degrees in Computer
Science and Engineering from The Chinese University of Hong Kong.
TUTORIAL 2
Title: Large-Scale Click-stream and transaction log mining in practice
Summary:
This tutorial will summarize state-of-the-art approaches in the growing area
of large scale click-stream mining. It will give an opportunity to data
scientists, researchers and engineers with diverse backgrounds to
familiarize themselves with practical platforms, approaches and tools for
extracting actionable insights and building products from big and diverse
data sources. The organizers will accomplish this goal using three real-life
stories from the field (large scale data initiatives at eBay – one of the
world’s largest e-commerce platforms). The tutorial will feature transaction
mining, behavior log mining and time-series mining. We will talk about
building robust recommendation systems over map-reduce clusters (query
suggestions, shipping fee recommendations). The talk will also cover topics
such as removing user bias from data, using heuristics to make intractable
algorithms practical, and appropriate de-noising and normalization of
diverse data sets. The audience is expected to be familiar with map-reduce
(preferably Hadoop) and to be working or grappling with data problems. Some
basic background in algorithms and statistics would be beneficial.
Content:
We will present the tutorial through real applications built at eBay. We
will present three case studies.
• Shipping Recommendation System
• Mining large-scale temporal dynamics with Hadoop
• Query Suggestions at scale with Hadoop
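To give a feel for the map-reduce style of log mining mentioned in the case
studies, here is a minimal single-machine sketch that counts which query
tends to follow which within a session and then ranks follow-up queries as
suggestions. It is not eBay's system and does not use Hadoop; the function
names (map_session, reduce_counts, suggest), the session format, and the
sample queries are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

QueryPair = Tuple[str, str]


def map_session(queries: List[str]) -> Iterator[Tuple[QueryPair, int]]:
    """Map phase: emit ((query, next_query), 1) for consecutive distinct queries."""
    for query, next_query in zip(queries, queries[1:]):
        if query != next_query:
            yield (query, next_query), 1


def reduce_counts(pairs: Iterable[Tuple[QueryPair, int]]) -> Dict[QueryPair, int]:
    """Shuffle and reduce phases combined: sum the counts emitted for each key."""
    totals: Dict[QueryPair, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def suggest(totals: Dict[QueryPair, int], query: str, k: int = 3) -> List[str]:
    """Return up to k follow-up queries, most frequent first (ties by name)."""
    candidates = [(nxt, c) for (q, nxt), c in totals.items() if q == query]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [nxt for nxt, _ in candidates[:k]]


if __name__ == "__main__":
    # Hypothetical search sessions, one list of queries per session.
    sessions = [
        ["ipod", "ipod nano", "ipod case"],
        ["ipod", "ipod nano"],
        ["ipod", "ipod touch"],
    ]
    emitted = (pair for session in sessions for pair in map_session(session))
    totals = reduce_counts(emitted)
    print(suggest(totals, "ipod"))
```

On a real cluster the map and reduce functions would run in parallel over
partitions of the log, and steps such as de-noising and bias removal would
be needed before raw counts could be trusted.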
Short Bio.
Uwe Mayer (http://labs.ebay.com/people/uwe-mayer/)
Prior to joining eBay, Uwe Mayer was a senior research scientist at Yahoo,
and was a director of Analytic Science at FICO. He has been a professor of
mathematics at universities in both the U.S. and Germany.
Uwe received his MA and PhD in mathematics from the University of Utah,
where he was a Fulbright scholar, with an extended research stay at the
Institute for Advanced Study at Princeton. He carried out his undergraduate
studies with a double major in Mathematics and Computer Sciences in Germany.
Bringing his academic career full circle from computer sciences to
mathematics back to computers, Uwe has also co-advised a PhD student in data
mining at the University of California, San Diego, and has published in
several data mining/machine learning conferences including KDD.
Nish Parikh (http://labs.ebay.com/people/nish-parikh/)
Nish Parikh joined eBay Research Labs in February 2008 and currently is the
Head of Data Sciences Research. At eBay Research Labs, he leads efforts on
query analysis, recommender systems and large-scale data processing from a
data science perspective. Prior to joining eBay Research Labs, he was part
of the team that launched eBay's next-generation search engine, Voyager,
which supported near real-time indexing of products and served billions of
search queries every week. Prior to joining eBay, Nish received an M.S. in
Computer Science from the University of Southern California and a B.S. in
Electrical Engineering from Gujarat University, where he was awarded a gold
medal for academic excellence. Nish has published in premier conferences
such as
SIGIR, KDD, WWW, CIKM and WSDM. In addition to the research community
engagement, Nish is a frequent speaker in industry and big data forums such
as the Hadoop Summit and XLDB.
Gyanit Singh (http://labs.ebay.com/people/gyanit-singh/)
Gyanit Singh is a Research Scientist at eBay Research Labs. His research
interests are in large scale data mining, query log mining and large scale
data platforms. At eBay he has worked on problems like query suggestion and
recovery from null search. He has also worked on Mobius, an in-house
Map-Reduce data platform. Prior to joining eBay, Gyanit completed his
master's degree in Computer Science at the University of Washington,
Seattle. Before that, he was at the Indian Institute of Technology, Delhi,
pursuing his bachelor's degree in Computer Science. Gyanit has published in
premier conferences such as SIGIR, WWW, APPROX-RANDOM and WSDM. In addition
to the research community engagement, Gyanit is a frequent speaker in
industry and big data forums such as the Hadoop Summit, Hadoop World, ACM
Data Mining Camp, and Bay Area Search Forum.