2013 IEEE International Conference on Big Data (IEEE BigData 2013)

Tutorials


TUTORIAL 1: Online Learning for Big Data Analytics
TUTORIAL 2: Large-Scale Click-stream and transaction log mining in practice

TUTORIAL 1

Title: Online Learning for Big Data Analytics

Summary:

Big data marks a new era: science, engineering, and technology now produce increasingly large data streams every day, at petabyte and exabyte scales. Moreover, massive data capturing human activity are online and available for analysis and for building business models that provide personalized services in commerce. Learning from big data is a new topic that expands the area of machine learning. Many new learning techniques need to be developed to increase the effectiveness and efficiency of learning from such data. Among them, online learning, which we have investigated in depth for several years, is one of the most promising techniques for learning from big data.

The tutorial will investigate several important components of online learning techniques for big data. First, a brief introduction to the basic concepts of big data and big data analytics will be given, together with the basic concepts of different learning paradigms and of online learning, to provide a complete map of the techniques developed in this area. Second, the connection between online learning techniques and big data will be addressed. Third, some motivating examples will be presented to illustrate the promise of online learning techniques. Fourth, we will present different online learning techniques for non-sparse learning models, sparse learning models, unsupervised learning models, etc. Some hands-on demos may be given in the tutorial.

The tutorial will conclude by summarizing and reflecting on the trends in online learning techniques for big data, which may reshape this exciting and dynamic area of research and which are worthy of more detailed investigation for many years to come.

Content:

1. Introduction
1.1 Basic concept of big data and big data analytics
1.2 Basic concept of online learning and its applications
2. Online Learning Algorithms
2.1 Perceptron
2.2 Online non-sparse learning
2.3 Online sparse learning
2.4 Online unsupervised learning
3. Discussion and Q & A

Short Bio.

Prof. King's Profile

Prof. King's research interests include machine learning, social computing, web intelligence, data mining, and multimedia information processing. In these research areas, he has over 210 technical publications in journals and conferences. In addition, he has contributed over 20 book chapters and edited volumes. Moreover, Prof. King has over 30 research and applied grants. One notable patented system he has developed is the VeriGuide System, previously known as the CUPIDE (Chinese University Plagiarism IDentification Engine) system, which detects similar sentences and performs readability analysis of text-based documents in both English and Chinese to promote academic integrity and honesty.

Prof. King is the Book Series Editor for "Social Media and Social Computing" with Taylor and Francis (CRC Press). He is also an Associate Editor of the ACM Transactions on Knowledge Discovery from Data (ACM TKDD) and a former Associate Editor of the IEEE Transactions on Neural Networks (TNN) and IEEE Computational Intelligence Magazine (CIM). He is a member of the Editorial Board of the Open Information Systems Journal, Journal of Nonlinear Analysis and Applied Mathematics, and Neural Information Processing Letters and Reviews Journal (NIP-LR). He has also served as Special Issue Guest Editor for Neurocomputing, International Journal of Intelligent Computing and Cybernetics (IJICC), Journal of Intelligent Information Systems (JIIS), and International Journal of Computational Intelligent Research (IJCIR). He is a senior member of IEEE and a member of ACM, International Neural Network Society (INNS), and Asian Pacific Neural Network Assembly (APNNA). Currently, he is serving the Neural Network Technical Committee (NNTC) and the Data Mining Technical Committee under the IEEE Computational Intelligence Society (formerly the IEEE Neural Network Society). He is also a member of the Board of Governors of INNS and a Vice-President and Governing Board Member of APNNA. He also serves INNS as the Vice-President for Membership in the Board of Governors.

Prof. King is an associate dean of the engineering faculty and a professor at the Department of Computer Science and Engineering, The Chinese University of Hong Kong. He received his B.Sc. degree in Engineering and Applied Science from the California Institute of Technology, Pasadena, and his M.Sc. and Ph.D. degrees in Computer Science from the University of Southern California, Los Angeles.

Prof. Lyu's Profile
Prof. Lyu's research interests include software reliability engineering, distributed systems, fault-tolerant computing, web technologies, mobile networks, digital video libraries, multimedia processing, and video searching and delivery. He has participated in more than 30 industrial projects in these areas, and helped to develop many commercial systems and software tools. He has frequently been invited as a keynote or tutorial speaker to conferences and workshops in the U.S., Europe, and Asia.

Prof. Lyu has published over 400 refereed journal and conference papers in his research areas. He initiated the first International Symposium on Software Reliability Engineering (ISSRE) in 1990. He was the Program Chair for ISSRE'96, Program co-Chair for WWW10, General Chair for ISSRE'2001, General co-Chair for PRDC'2005, and has served on program committees for many conferences. He is the editor of two book volumes: Software Fault Tolerance, published by Wiley in 1995, and the Handbook of Software Reliability Engineering, published by IEEE and McGraw-Hill in 1996. These books have received an overwhelming response from both academia and industry. He was an Associate Editor of IEEE Transactions on Reliability, IEEE Transactions on Knowledge and Data Engineering, and Journal of Information Science and Engineering. He is currently on the editorial board of the Wiley Software Testing, Verification and Reliability Journal. He was elected IEEE Fellow (2004) and AAAS Fellow (2007) for his contributions to software reliability engineering and software fault tolerance. He was also named Croucher Senior Research Fellow in 2008 and IEEE Reliability Society Engineer of the Year in 2010.

Prof. Lyu is currently a Professor in the Computer Science and Engineering department of The Chinese University of Hong Kong. He received his B.S. in Electrical Engineering from National Taiwan University, his M.S. in Computer Science from the University of California, Santa Barbara, and his Ph.D. in Computer Science from the University of California, Los Angeles.

Dr. Yang's Profile
Dr. Haiqin Yang's research interests include machine learning, data mining, and financial engineering. In these areas, he has over 30 technical publications in journals (JMLR, IEEE TNN, Neurocomputing, IEEE BME, IEEE SMC) and conferences (ICML, CIKM, IJCNN, ICONIP, etc.). In addition, he has written two books and four book chapters, and has been granted seven patents. He has served as a reviewer for many journals and on program committees for many conferences, e.g., CIKM, ACML, IEEE BigData 2013, and IEEE BDSE 2013. He has also received many awards, including the "First Prize" postgraduate paper award in the IEEE Hong Kong Section 2010, the PCCW Foundation Scholarship, and The Global Scholarship Programme for Research Excellence. Dr. Yang is currently a Postdoctoral Fellow at The Chinese University of Hong Kong. He received his B.S. degree in Computer Science and Technology from Nanjing University, and his M.Phil. and Ph.D. degrees in Computer Science and Engineering from The Chinese University of Hong Kong.

TUTORIAL 2

Title: Large-Scale Click-stream and transaction log mining in practice

Summary:

This tutorial will summarize state-of-the-art approaches in the growing area of large-scale click-stream mining. It will give data scientists, researchers and engineers with diverse backgrounds an opportunity to familiarize themselves with practical platforms, approaches and tools for extracting actionable insights and building products from big and diverse data sources. The organizers will accomplish this goal using three real-life stories from the field (large-scale data initiatives at eBay – one of the world’s largest e-commerce platforms). The tutorial will feature transaction mining, behavior log mining and time-series mining. We will talk about building robust recommendation systems over map-reduce clusters (query suggestions, shipping fee recommendations). The talk will also cover topics such as removing user bias from data, using heuristics to make intractable algorithms practical, and appropriate de-noising and normalization of diverse data sets. The audience is expected to be familiar with map-reduce (preferably Hadoop) and to be working or grappling with data problems. Some basic background in algorithms and statistics would be beneficial.

Content:

We will present the tutorial through real applications built at eBay. We will present three case studies.

• Shipping Recommendation System
• Mining large-scale temporal dynamics with Hadoop
• Query Suggestions at scale with Hadoop

Short Bio.

Uwe Mayer (http://labs.ebay.com/people/uwe-mayer/)
Prior to joining eBay, Uwe Mayer was a senior research scientist at Yahoo, and was a director of Analytic Science at FICO. He has been a professor of mathematics at universities in both the U.S. and in Germany.
Uwe received his MA and PhD in mathematics from the University of Utah, where he was a Fulbright scholar, with an extended research stay at the Institute for Advanced Study at Princeton. He carried out his undergraduate studies with a double major in Mathematics and Computer Sciences in Germany. Bringing his academic career full circle from computer sciences to mathematics back to computers, Uwe has also co-advised a PhD student in data mining at the University of California, San Diego, and has published in several data mining/machine learning conferences including KDD.

Nish Parikh (http://labs.ebay.com/people/nish-parikh/)
Nish Parikh joined eBay Research Labs in February 2008 and is currently the Head of Data Sciences Research. At eBay Research Labs, he leads efforts on query analysis, recommender systems and large-scale data processing from a data science perspective. Prior to joining eBay Research Labs, he was part of the team that launched eBay's Next Generation Search Engine, Voyager, which supported near real-time indexing of products and served billions of search queries every week. Prior to joining eBay, Nish received an M.S. in Computer Science from the University of Southern California and a B.S. in Electrical Engineering from Gujarat University, where he was awarded a gold medal for academic excellence. Nish has published in premier conferences such as SIGIR, KDD, WWW, CIKM and WSDM. In addition to his research community engagement, Nish is a frequent speaker at industry and big data forums such as the Hadoop Summit and XLDB.

Gyanit Singh (http://labs.ebay.com/people/gyanit-singh/)
Gyanit Singh is a Research Scientist at eBay Research Labs. His research interests are in large-scale data mining, query log mining and large-scale data platforms. At eBay he has worked on problems such as query suggestion and recovery from null search. He has also worked on an in-house Map-Reduce data platform called Mobius. Prior to joining eBay, Gyanit completed his master's in Computer Science at the University of Washington, Seattle. Before that he was at the Indian Institute of Technology, Delhi, pursuing his bachelor's in Computer Science. Gyanit has published in premier conferences such as SIGIR, WWW, APPROX-RANDOM and WSDM. In addition to his research community engagement, Gyanit is a frequent speaker at industry and big data forums such as the Hadoop Summit, Hadoop World, ACM Data Mining Camp, and Bay Area Search Forum.