2016 IEEE International Conference on Big Data

IEEE Big Data 2016 Tutorials

Tutorial 1: Large Scale Text Mining – Techniques and Applications

Ronen Feldman, Professor

Information Systems Department, School of Business Administration,
Hebrew University Mount Scopus, Jerusalem, ISRAEL 91905
Tel: 972-(0)2-588-3084
Fax: 972-(0)-2-588-1341
Email: Ronen.Feldman@huji.ac.il

Ron Bekkerman, Assistant Professor

Department of Information and Knowledge Management, Faculty of Management
University of Haifa, Mount Carmel, Haifa, ISRAEL 34988
Tel: 972-(0)-4-664-7921
Fax: 972-(0)-4-824-9194
Email: ronb@univ.haifa.ac.il

Abstract: The proliferation of documents available on the Web and on corporate intranets is driving a new wave of text mining research and applications. Earlier research addressed extraction of information from relatively small collections of well-structured documents such as newswire or scientific publications. Text mining from other corpora, such as the Web, requires new techniques drawn from data mining, machine learning, NLP, and information retrieval. Text mining requires preprocessing document collections (text categorization, information extraction, term extraction), storage of the intermediate representations, analysis of these intermediate representations (distributional analysis such as word2vec, clustering, trend analysis, association rules, etc.), and visualization of the results. In this tutorial we will present the algorithms and methods used to build text mining systems. The tutorial will cover the state of the art in this rapidly growing area of research, including recent advances in unsupervised methods for extracting facts from text and methods used for web-scale mining. We will also present several real-world applications of text mining. Special emphasis will be given to lessons learned from years of experience in developing real-world text mining systems, including recent advances in sentiment analysis and information extraction, and how to handle user-generated text such as blogs and user reviews.

Tutorial 2: Trajectory Data Mining

Tutorial PPT

Zhenhui (Jessie) Li, Assistant Professor

Penn State University
Email: jessieli@ist.psu.edu

Fei Wu, PhD student

Penn State University
Email: fxw133@psu.edu

Jiawei Han, Professor

Univ. of Illinois at Urbana-Champaign
Email: hanj@cs.uiuc.edu

Abstract: The advances in location-acquisition technologies and the prevalence of location-based services have generated massive spatial trajectory data, which represent the mobility of a diversity of moving objects, such as people, vehicles, and animals. Such trajectories offer us unprecedented information to understand moving objects and locations that could benefit a broad range of applications in business, transportation, ecology, and many more. These important applications in turn call for novel computing technologies for discovering knowledge from trajectory data.

In this tutorial, we present a comprehensive, organized, and systematic survey of methodologies and algorithms for trajectory data mining. The tutorial will first give an overview of basic definitions, applications, data collection, data pre-processing, and patterns in the field of trajectory data mining. Then we will focus on three fundamental categories of trajectory patterns: (1) periodic pattern mining; (2) moving object relationship detection based on spatial-temporal interactions, which include friend relationships, follower/leader relationships, attraction/avoidance relationships, moving-together patterns, and clusters; and (3) semantic trajectory mining using external contexts. We will explore the connections, differences, and limitations of these existing techniques. Finally, we will discuss the use of trajectories in real-world applications such as recommendation, urban computing, and crime inference. We will conclude by discussing the exciting open topics in trajectory data mining.

Tutorial 3: Large Scale Matrix Factorization

Tutorial PPT (Part I)

Tutorial PPT (Part II)

Fei Wang

Cornell University
Email: few2001@med.cornell.edu

Wei Tan

IBM T. J. Watson Research Center
Email: wtan@us.ibm.com

Abstract: Matrix factorization is a computational tool that has attracted considerable interest in recent years for various analytics problems, such as clustering, collaborative filtering, and topic modeling. With the arrival of the big data era, the volume and dimensionality of data samples have grown substantially, which makes traditional batch-mode, single-core, memory-based matrix factorization methods inapplicable, and many large-scale matrix factorization technologies have emerged. This tutorial will review various kinds of matrix factorization algorithms and their large-scale implementation methodologies. We will also discuss current challenges and future directions.

Tutorial 4: Dynamic Big Data Processing in the Web of Things: Challenges, Opportunities and Success Stories

Ljiljana Stojanovic

Fraunhofer IOSB, Germany

Nenad Stojanovic

Nissatech, Serbia
Email: nenad.stojanovic@nissatech.com

Abstract: The Web of Things (WoT) is about involving real-world objects in complex, Web-wide communication. WoT reuses and leverages readily available and widely popular Web protocols, standards and blueprints to make data and services offered by objects more accessible. However, WoT generates an enormous amount of data (big data). For example, 1 million connected devices each sending a sensor reading (e.g., temperature) every second to an IoT cloud produce 86.4 billion messages per day (roughly 170 times more than all tweets posted globally that same day). The most crucial issue is how to ensure efficient (real-time) processing of this data, given that real-world objects generate very dynamic data streams. Indeed, the next wave of Big Data is Dynamic Big Data, arising from new opportunities for ubiquitous sensing and control of the smallest details in engineered and natural systems through multitudes of heterogeneous sensors and controllers instrumenting these systems. These systems inherently contain dynamics in their daily operation, which must be managed properly in order to increase operational effectiveness and competitiveness. This tutorial tackles the intersection of these two emerging areas, i.e. efficient dynamic big data processing and management in the context of the Web of Things.

More particularly, processing data from real-world objects requires (big) data processing a) close to Things (local reaction: "moving" services to local data), b) close to Services (global reaction: moving data to global services) and c) two-way interaction between these two levels. In other words, the challenge is to ensure that the local processing reflects the relevant part of the global context (services should be decomposed) and that the global processing can react to the dynamicity of the data collected locally (services have to be dynamically changed). This processing and communication pattern can be found in many big data use cases, from wearables-driven well-being/fitness scenarios to sensor-based proactive maintenance in complex manufacturing scenarios.

Based on the authors' ongoing work, this tutorial explains the most important challenges for realizing dynamic data processing in WoT, outlines the business opportunities derived from such a processing architecture, and presents several success stories.
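
The message-volume figure quoted in this abstract can be checked with a short back-of-the-envelope calculation. The sketch below is only an illustration: the device count and reporting rate come from the abstract, while the daily tweet volume (about 500 million) is an assumption introduced here to reproduce the "roughly 170 times" comparison; the abstract does not state which figure was used.

```python
# Back-of-the-envelope check of the message volume quoted in the abstract.
DEVICES = 1_000_000              # connected devices (from the abstract)
READINGS_PER_SECOND = 1          # one sensor reading per device per second
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds in a day

messages_per_day = DEVICES * READINGS_PER_SECOND * SECONDS_PER_DAY

# ASSUMPTION (not stated in the abstract): about 500 million tweets per day.
ASSUMED_TWEETS_PER_DAY = 500_000_000
ratio = messages_per_day / ASSUMED_TWEETS_PER_DAY

print(f"{messages_per_day:,} messages per day")
print(f"about {ratio:.0f} times the assumed daily tweet volume")
```

Under that assumed tweet volume, the ratio comes out slightly above 170, which is consistent with the rounded figure in the abstract.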

Tutorial 5: Anomalous and Significant Subgraph Detection in Attributed Networks

Tutorial PPT (Part I)

Tutorial PPT (Part II)

Feng Chen

University at Albany – SUNY
Email: fchen5@albany.edu

Petko Bogdanov

University at Albany – SUNY
Email: pbogdanov@albany.edu

Daniel B. Neill

Carnegie Mellon University
Email: neill@cs.cmu.edu

Ambuj K. Singh

University of California, Santa Barbara
Email: ambuj@cs.ucsb.edu

Abstract: Detection of anomalous and significant subgraphs in attributed networks has applications in social networks, bioinformatics, disease surveillance, and other domains. Unlike vector-space, single-vertex, or whole-graph versions, subgraph detection is often framed as the maximization of a score function over included node/edge attributes, where all connected or compact subgraphs are considered. Connectivity and compactness constraints ensure that subgraphs reflect changes due to localized in-network processes. The resulting problems are combinatorial in nature and, hence, require the design of efficient algorithms that scale to large real-world networks.

In this tutorial, we will present a comprehensive review of the state-of-the-art methods for anomalous and significant subgraph detection. First, we will classify popular score functions and structure constraints commonly used in the literature. Then we will review methods for static (planar, complex, and heterogeneous) and dynamic networks. We will illustrate the basic theoretical and algorithmic ideas and discuss specific applications in all the above settings.