IEEE Big Data 2018 Accepted Papers
1. Big Data Foundations
Regular Papers | | |
Paper ID | Title | Authors |
BigD357 | Linear Models with Many Cores and CPUs: A Stochastic Atomic Update Scheme | Edward Raff and Jared Sylvester |
BigD409 | Best-Choice Edge Grafting for Efficient Structure Learning of Markov Random Fields | Walid Chaabene and Bert Huang |
BigD504 | Semi-supervised Deep Representation Learning for Multi-View Problems | Vahid Noroozi, Lei Zheng, Sara Bahaadini, Sihong Xie, Weixiang Shao, and Philip S. Yu |
BigD545 | Projection-SVM: Distributed Kernel Support Vector Machine for Big Data using Subspace Partitioning | Dinesh Singh and Krishna Mohan C |
BigD564 | Detecting Latent Structure Uncertainty with Structural Entropy | So Hirai and Kenji Yamanishi |
BigD580 | Time Series Classification Using a Neural Network Ensemble | Soukaina Filali Boubrahimi and Rafal Angryk |
BigD602 | Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets | Mona Nashaat, Aindrila Ghosh, Shaikh Quader, Chad Marston, Jean-Francois Puget, and James Miller |
BigD717 | DANN: Incorporating Prior Domain Knowledge into Model Training | Nikhil Muralidhar, Mohammad Raihanul Islam, Manish Marwah, Anuj Karpatne, and Naren Ramakrishnan |
Short Papers | | |
BigD366 | Efficient Dimensionality Reduction for Sparse Binary Data | Rameshwar Pratap, Raghav Kulkarni, and Ishan Sohony |
BigD369 | Effective Outlier Detection based on Bayesian Network and Proximity | Sha Lu, Lin Liu, Jiuyong Li, and Thuc Duy Le |
BigD405 | Hash-Grams On Many-Cores and Skewed Distributions | Edward Raff and Mark McLean |
BigD442 | Securing Behavior-based Opinion Spam Detection | Shuaijun Ge, Guixiang Ma, Sihong Xie, and Philip S. Yu |
BigD451 | AdaDIF: Adaptive Diffusions for Efficient Semi-supervised Learning over Graphs | Dimitris Berberidis, Athanasios Nikolakopoulos, and Georgios B. Giannakis |
BigD482 | Source Free Domain Adaptation Using an Off-the-Shelf Classifier | Arun Reddy Nelakurthi, Ross Maciejewski, and Jingrui He |
BigD501 | Modeling Road Traffic Dynamics Using Big Data | Fan Yang, Alina Vereshchaka, and Wen Dong |
BigD536 | Scalable Bottom-up Subspace Clustering using FP-Trees for High Dimensional Data | Tuan Doan, Jianzhong Qi, Sutharshan Rajasegarar, and Christopher Leckie |
BigD565 | Biomedical Data Classification using Random Projection Ensembles | Sotiris Tasoulis, Aristidis Vrahatis, Spiros Georgakopoulos, and Vassilis Plagianakos |
BigD581 | Representation Learning for Question Classification via Topic Sparse Autoencoder and Entity Embedding | Dingcheng Li, Jingyuan Zhang, and Ping Li |
BigD699 | Scaling up Inference in MLNs with Spark | Maminur Islam, Khan Mohammad Al Farabi, Somdeb Sarkhel, and Deepak Venugopal |
BigD701 | Queryable Compression on Time-Evolving Social Networks with Streaming | Michael Nelson, Sridhar Radhakrishnan, and Chandra Sekharan |
BigD708 | Topological approaches to skin disease image analysis | Yu-Min Chung, Chuan-Shen Hu, Austin Lawson, and Clifford Smyth |
BigD730 | DeepFP: A Deep Learning Framework For User Fingerprinting via Mobile Motion Sensors | Sara Amini, Vahid Noroozi, Sara Bahaadini, Philip S. Yu, and Chris Kanich |
2. Big Data Infrastructure
Regular Papers | | |
BigD234 | An Empirical Analysis on Expressibility of Vertex Centric Graph Processing Paradigm | Siyuan Liu and Arijit Khan |
BigD239 | ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage | Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, and Suren Byna |
BigD294 | Column Cache: Buffer Cache for Columnar Storage on HDFS | Takeshi Yoshimura, Tatsuhiro Chiba, and Hiroshi Horii |
BigD336 | Online Density Estimation over Streaming Data: A Local Adaptive Solution | Zhong Chen, Zhide Fang, Jiabin Zhao, Wei Fan, Andrea Edwards, and Kun Zhang |
BigD350 | Practical Cross Program Memoization with KeyChain | Craig Mustard and Alexandra Fedorova |
BigD397 | Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks | Liang Bao, Xin Liu, and Weizhao Chen |
BigD403 | Dynamic and Transparent Memory Sharing for Accelerating Big Data Analytics Workloads in Virtualized Cloud | Wenqi Cao and Ling Liu |
BigD431 | Scalable Manifold Learning for Big Data with Apache Spark | Frank Schoeneman and Jaroslaw Zola |
BigD598 | Mira: Sharing Resources for Distributed Analytics at Small Timescales | Michael Kaufmann, Kornilios Kourtis, Adrian Schuepbach, and Martina Zitterbart |
BigD671 | A Method-Level Test Generation Framework for Debugging Big Data Applications | Huadong Feng, Jaganmohan Chandrasekaran, Yu Lei, Raghu Kacker, and D. Richard Kuhn |
BigD738 | A Reinforcement Learning Based Resource Management Approach for Time-critical Workloads in Distributed Computing Environment | Zixia Liu, Hong Zhang, Bingbing Rao, and Liqiang Wang |
Short Papers | | |
BigD233 | Dilemma between Naive or Costly: Technique of Resembling Data Processing Workloads for Datacenter Flash Storage | Janki Bhimani, Rajinikanth Pandurangan, Ningfang Mi, and Vijay Balakrishnan |
BigD252 | Serverless Big Data Processing using Matrix Multiplication as Example | Sebastian Werner, Jörn Kuhlenkamp, Markus Klems, Johannes Müller, and Stefan Tai |
BigD262 | Versatile Communication Optimization for Deep Learning by Modularized Parameter Server | Po-Yen Wu, Pangfeng Liu, and Jan-Jan Wu |
BigD291 | Analyzing Alibaba's Co-located Datacenter Workloads | Yue Cheng, Ali Anwar, and Xuejing Duan |
BigD317 | Communication Model for Parallel Iterative Stream Processing | Sachini Jayasekara, Xunyun Liu, Shanika Karunasekera, and Aaron Harwood |
BigD394 | OverSketch: Approximate Matrix Multiplication for the Cloud | Vipul Gupta, Shusen Wang, Thomas Courtade, and Kannan Ramchandran |
BigD459 | POSUM: A Portfolio Scheduler for MapReduce Workloads | Maria Voinea, Alexandru Uta, and Alexandru Iosup |
BigD470 | Experimental Characterizations and Analysis of Deep Learning Frameworks | Yanzhao Wu, Wenqi Cao, Semih Sahin, and Ling Liu |
BigD563 | Scalable Distributed Top-k Join Queries in Topic-Based Pub/Sub Systems | Nikos Zacheilas, Dimitris Dedousis, and Vana Kalogeraki |
BigD604 | XOS: An Application-Defined Operating System for Data Center Servers | Chen Zheng, Lei Wang, Sally A. McKee, Jianfeng Zhan, and Lixin Zhang |
BigD617 | Cluster-based Data Reduction for Persistent Homology | Anindya Moitra, Nick Malott, and Philip Wilsey |
BigD645 | GeoMatch: Efficient Large-Scale Map Matching on Apache Spark | Ayman Zeidan, Eemil Lagerspetz, Kai Zhao, Petteri Nurmi, Sasu Tarkoma, and Huy Vo |
BigD682 | Parallel DBSCAN Algorithm Using a Data Partitioning Strategy with Spark Implementation | Dianwei Han |
BigD689 | Sync-on-the-fly: A Parallel Framework for Gradient Descent Algorithms on Transient Resources | Guoyi Zhao, Lixin Gao, and David Irwin |
BigD711 | GreenDataFlow: Minimizing the Energy Footprint of Global Data Movement | Zulkar Nine, Luigi Di Tacchio, Asif Imran, Tevfik Kosar, Fatih Bulut, and Jinho Hwang |
BigD719 | ThousandSunny: A Large-Scale Neural Network Training System For Online Advertising | Quanchang Qi, Guangming Lu, Jun Zhang, Lichun Yang, and Haishan Liu |
BigD735 | Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks | Xiaoyi Lu, Dipti Shankar, Haiyang Shi, and Dhabaleswar K. (DK) Panda |
3. Big Data Management
Regular Papers | | |
BigD309 | Concept-Driven Load Shedding: Reducing Size and Error of Voluminous and Variable Data Streams | Nikos R. Katsipoulakis, Alexandros Labrinidis, and Panos K. Chrysanthis |
BigD318 | HYPE: Massive Hypergraph Partitioning with Neighborhood Expansion | Christian Mayer, Ruben Mayer, Sukanya Bhowmik, Lukas Epple, and Kurt Rothermel |
BigD358 | Accelerating a Distributed CPD Algorithm for Large Dense, Skewed Tensors | Kareem Aggour, Alex Gittens, and Bülent Yener |
BigD379 | Explaining Aggregates for Exploratory Analytics | Fotis Savva, Christos Anagnostopoulos, and Peter Triantafillou |
BigD396 | Optimizing Lossy Compression with Adjacent Snapshots for N-body Simulation Data | Sihuan Li, Sheng Di, Xin Liang, Zizhong Chen, and Franck Cappello |
BigD475 | Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets | Xin Liang, Sheng Di, Dingwen Tao, Sihuan Li, Shaomeng Li, Hanqi Guo, Zizhong Chen, and Franck Cappello |
BigD481 | Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy | Haipei Sun, Boxiang Dong, Wendy Hui Wang, Ting Yu, and Zhan Qin |
BigD529 | Cloud based Real-Time and Low Latency Scientific Event Analysis | Chen Yang, Zhihui Du, and Xiaofeng Meng |
BigD573 | Influence Maximization in Evolving Multi-Campaign Environments | Iouliana Litou and Vana Kalogeraki |
BigD611 | Alleviating I/O Inefficiencies to Enable Effective Model Training Over Voluminous, High-Dimensional Datasets | Daniel Rammer, Walid Budgaga, Thilina Buddhika, Shrideep Pallickara, and Sangmi Lee Pallickara |
Short Papers | | |
BigD287 | Steering Top-k Influencers in Dynamic Graphs via Local Updates | Vijaya Krishna Yalavarthi and Arijit Khan |
BigD290 | Distributed Execution of Spatial SQL Queries | Konstantinos Giannousis, Konstantina Bereta, Nikolaos Karalis, and Manolis Koubarakis |
BigD325 | Efficient Processing of Probabilistic Single and Batch Reachability Queries in Large and Evolving Spatiotemporal Contact Networks | Zohreh Raghebi and Farnoush Banaei-Kashani |
BigD351 | FairGAN: Fairness-aware Generative Adversarial Networks | Depeng Xu, Shuhan Yuan, Lu Zhang, and Xintao Wu |
BigD365 | Aggregation of Linked Data: a case study in the cultural heritage domain | Nuno Freire, Enno Meijers, René Voorburg, Roland Cornelissen, Antoine Isaac, and Sjors de Valk |
BigD381 | Integrated Real-Time Data Stream Analysis and Sketch-Based Video Retrieval in Team Sports | Lukas Probst, Fabian Rauschenbach, Heiko Schuldt, Philipp Seidenschwarz, and Martin Rumo |
BigD384 | A Universal Namespace Approach to Support Metadata Management and Efficient Data Convergence of HPC and Cloud Scientific Workflows | Hsing-bung Chen |
BigD473 | FastTopK: A Fast Top-K Trajectory Similarity Query Processing Algorithm for GPUs | Hamza Mustafa, Eleazar Leal, and Le Gruenwald |
BigD621 | Optimized Storing of Workflow Outputs through Mining Association Rules | Debasish Chakroborti, Manishankar Mondal, Banani Roy, Chanchal K. Roy, and Kevin A. Schneider |
BigD627 | A Survey on Trajectory Data Management for Hybrid Transactional and Analytical Workloads | Keven Richly |
BigD680 | Aion: It's Never too Late in Event-Time Streams | Sérgio Esteves, Gianmarco De Francisci Morales, Rodrigo Rodrigues, Marco Serafini, and Luís Veiga |
BigD741 | Dynamic Online Performance Optimization in Streaming Data Compression | Kade Gibson, Dongeun Lee, Jaesik Choi, and Alexander Sim |
4. Big Data Search and Mining
Regular Papers | | |
BigD236 | Benchmarking API Costs of Network Sampling Strategies | Michele Coscia and Luca Rossi |
BigD240 | Using Smart Card Data to Model Commuters' Response Upon Unexpected Train Delays | Xiancai Tian and Baihua Zheng |
BigD302 | Optimal k-Nearest-Neighbor Query Processing via Multiple Lower Bound Approximations | Christian Beecks and Max Berrendorf |
BigD305 | Differentially Private Semi-Supervised Learning With Known Class Priors | Anh Pham and Jing Xi |
BigD307 | Revisiting Exact kNN Query Processing with Probabilistic Data Space Transformations | Atoshum Samuel Cahsai, Christos Anagnostopoulos, Nikos Ntarmos, and Peter Triantafillou |
BigD328 | Scalable Construction of Text Indexes with Thrill | Timo Bingmann, Simon Gog, and Florian Kurpicz |
BigD334 | AURORA: Auditing PageRank on Large Graphs | Jian Kang, Meijia Wang, Nan Cao, Yinglong Xia, Wei Fan, and Hanghang Tong |
BigD356 | Adaptive Data Pruning for Support Vector Machines | Yasuhiro Fujiwara, Junya Arai, Sekitoshi Kanai, Yasutoshi Ida, and Naonori Ueda |
BigD364 | An Efficient System for Subgraph Discovery | Aparna Joshi, Yu Zhang, Petko Bogdanov, and Jeong-Hyon Hwang |
BigD414 | ImVerde: Vertex-Diminished Random Walk for Learning Imbalanced Network Representation | Jun Wu, Jingrui He, and Yongming Liu |
BigD423 | Efficient Discovery of Weighted Frequent Itemsets in Very Large Transactional Databases: A Re-visit | Rage Uday Kiran |
BigD425 | On Learning Psycholinguistics Tools for English-based Creole Languages using Social Media Data | Pei-Chi Lo and Ee-Peng Lim |
BigD437 | Automated Extraction of Personal Knowledge from Smartphone Push Notifications | Yuanchun Li, Ziyue Yang, Yao Guo, Xiangqun Chen, Yuvraj Agarwal, and Jason Hong |
BigD440 | Semi-supervised Multi-instance Learning for Flu Shot Adverse Event Detection | Junxiang Wang, Liang Zhao, and Yanfang Ye |
BigD457 | Candidate List Maintenance in High Utility Sequential Pattern Mining | Scott Buffett |
BigD469 | ParIS: The Next Destination for Fast Data Series Indexing and Query Answering | Botao Peng, Themis Palpanas, and Panagiota Fatourou |
BigD477 | One-Shot Learning on Attributed Sequences | Zhongfang Zhuang, Xiangnan Kong, Elke Rundensteiner, Aditya Arora, and Jihane Zouaoui |
BigD483 | A Data-Centric Approach for Image Scene Localization | Abdullah Alfarrarjeh, Seon Ho Kim, Shivnesh Rajan, Akshay Deshmukh, and Cyrus Shahabi |
BigD498 | Learning Multiclassifiers with Predictive Features Varied along with Data Distribution | Xuan-Hong Dang, Omid Askarisichani, and Ambuj K. Singh |
BigD508 | FauxBuster: A Content-free Fauxtography Detector Using Social Media Comments | Daniel Zhang, Lanyu Shang, Biao Geng, Shuyue Lai, Ke Li, Hongmin Zhu, Md Tanvir Amin, and Dong Wang |
BigD509 | Lifelong Memory Networks with Knowledge Learning from Big Data for Aspect Sentiment Classification | Shuai Wang, Guangyi Lv, Sahisnu Mazumder, Geli Fei, and Bing Liu |
BigD542 | Hot Spot Analysis for Big Trajectory Data | Panagiotis Nikitopoulos, Aris-Iakovos Paraskevopoulos, Christos Doulkeridis, Nikos Pelekis, and Yannis Theodoridis |
BigD566 | PER: A Probabilistic Attentional Model for Personalized Text Recommendations | Lei Zheng, Yixue Wang, Lifang He, Sihong Xie, Fengjiao Wang, and Philip S. Yu |
BigD569 | Fast and Accurate Mining of Node Importance in Trajectory Networks | Tilemachos Pechlivanoglou and Manos Papagelis |
BigD571 | Fast Bag-Of-Words Candidate Selection in Content-Based Instance Retrieval Systems | Michal Siedlaczek, Qi Wang, Yen-Yu Chen, and Torsten Suel |
BigD600 | BigSR: real-time expressive RDF stream reasoning on modern Big Data platforms | Xiangnan Ren, Olivier Curé, Hubert Naacke, and Guohui Xiao |
BigD624 | Constructing Influence Trees from Temporal Sequence of Retweets: An Analytical Approach | Ayan Kumar Bhowmick, G. Sai Bharath Chandra, Yogesh Singh, and Bivas Mitra |
BigD626 | StreamGuard: A Bayesian Network Approach to Copyright Infringement Detection Problem in Large-scale Live Video Sharing Systems | Daniel Zhang, Lixing Song, Qi Li, Yang Zhang, and Dong Wang |
BigD658 | Influence Maximization in Social Networks With Non-Target Constraints | Madhavan Padmanabhan, Naresh Somisetty, Samik Basu, and A Pavan |
BigD662 | A Multi-Criteria Experimental Ranking of Distributed SPARQL Evaluators | Damien Graux, Louis Jachiet, Pierre Genevès, and Nabil Layaïda |
BigD665 | Mining top-k Popular Datasets via a Deep Generative Model | Uchenna Akujuobi, Ke Sun, and Xiangliang Zhang |
BigD722 | Fusion of Terrain Information and Mobile Phone Location Data for Flood Area Detection in Rural Areas | Takahiro Yabe, Kota Tsubouchi, and Yoshihide Sekimoto |
BigD740 | Fast Clustering with Flexible Balance Constraints | Hongfu Liu, Ziming Huang, Yun Fu, Qi Chen, Mingqin Li, and Lintao Zhang |
BigD753 | Improved Dynamic Memory Network for Dialogue Act Classification with Adversarial Training | Yao Wan, Wenqiang Yan, Jianwei Gao, Zhou Zhao, Jian Wu, and Philip S. Yu |
BigD758 | A Sketch-Based Naive Bayes Algorithms for Evolving Data Streams | Maroua Bahri, Silviu Maniu, and Albert Bifet |
Short Papers | | |
BigD276 | Efficient Principal Subspace Projection of Streaming Data Through Fast Similarity Matching | Andrea Giovannucci, Victor Minden, Cengiz Pehlevan, and Dmitri Chklovskii |
BigD293 | Identifying Pros and Cons of Product Aspects Based on Customer Reviews | Ebad Ahmadzadeh and Philip Chan |
BigD301 | Dynamic Network Embeddings: From Random Walks to Temporal Random Walks | Giang Nguyen, John Boaz Lee, Ryan Rossi, Nesreen Ahmed, Eunyee Koh, and Sungchul Kim |
BigD323 | Efficient Triangles Estimation in Network Streams Based on Edge Sampling | Roohollah Etemadi and Jianguo Lu |
BigD330 | StageMap: Extracting and Summarizing Progression Stages in Event Sequences | Yuanzhe Chen, Abishek Puri, Linping Yuan, and Huamin Qu |
BigD339 | Speed Accuracy Trade-off for Pedestrian and Vehicle Detection using Localized Big Data | Yeongro Yun, Youngseok Park, Chanhee Woo, and Sejoon Lim |
BigD345 | The content correlation of multiple streaming edges | Michel de Rougemont and Guillaume Vimont |
BigD383 | Learning Fast and Slow - A Unified Batch/Stream Framework | Jacob Montiel, Albert Bifet, Viktor Losing, Jesse Read, and Talel Abdessalem |
BigD385 | Top-N-Rank: A Truncated List-wise Ranking Approach for Large-scale Top-N Recommendation | Junjie Liang, Jinlong Hu, Shoubin Dong, and Vasant Honavar |
BigD386 | Dynamic Partition Forest: An Efficient and Distributed Indexing Scheme for Similarity Search based on Hashing | Yangdi Lu, Yang Bo, Wenbo He, and Amir Nabatchian |
BigD390 | Monitoring the shape of weather, soundscapes, and dynamical systems: a new statistic for dimension-driven data analysis on large data sets | Henry Kvinge, Elin Farnell, Michael Kirby, and Chris Peterson |
BigD393 | Clustering-Driven and Dynamically Diversified Ensemble for Drifting Data Streams | Lukasz Korycki and Bartosz Krawczyk |
BigD395 | Predicting computational reproducibility of data analysis pipelines in large population studies using collaborative filtering | Soudabeh Barghi, Lalet Scaria, Ali Salari, and Tristan Glatard |
BigD436 | Detecting Highly Overlapping Community Structure by Model-based Maximal Clique Expansion | Said Jabbour, Nizar Mhadhbi, Badran Raddaoui, and Lakhdar Sais |
BigD444 | Improving Query Execution Performance in Big Data using Cuckoo Filter | Sharafat Ibn Mollah Mosharraf and Muhammad Abdullah Adnan |
BigD461 | CAM: A Combined Attention Model for Natural Language Inference | Amit Gajbhiye, Sardar Jaf, Noura Al Moubayed, Steven Bradley, and A. Stephen McGough |
BigD480 | Local Partition in Rich Graphs | Scott Freitas, Nan Cao, Yinglong Xia, Duen Horng Chau, and Hanghang Tong |
BigD484 | All-in-One Urban Mobility Mapping Application with Optional Routing Capabilities | Rebekah Thompson, Jose Stovall, Daniel Velasquez, Viswa Sri Rupa Anne, Alex Samoylov, and Mina Sartipi |
BigD497 | Multi-Attribute Topic Feature Construction for Social Media-based Prediction | Alex Morales, Nupoor Gandhi, Man-pui Sally Chan, Sophie Lohmann, Travis Sanchez, Lyle Ungar, Dolores Albarracin, and Chengxiang Zhai |
BigD499 | Context-Aware Deep Sequence Learning with Multi-View Factor Pooling for Time Series Classification | Sreyasee Das Bhattacharjee, William J. Tolone, Ashish Mahabal, Mohammed Elshambakey, Isaac Cho, and S. G. Djorgovski |
BigD530 | DLA: a Distributed, Location-based and Apriori-based Algorithm for Biological Sequence Pattern Mining | Eirini Stamoulakatou, Andrea Gulino, and Pietro Pinoli |
BigD533 | Motif-Preserving Dynamic Local Graph Cut | Dawei Zhou, Jingrui He, Hasan Davulcu, and Ross Maciejewski |
BigD537 | AnySC: Anytime Set-wise Classification of Variable Speed Data Streams | Jagat Sesh Challa, Poonam Goyal, Vijay M Giri, Dhananjay Mantri, and Navneet Goyal |
BigD576 | Pseudo-Inverse Linear Discriminants for Highly Imbalanced Big Datasets | Daqi Gao, Jingguang Zhang, and Jiamin Song |
BigD583 | Correlated Anomaly Detection from Large Streaming Data | Zheng Chen, Xinli Yu, Yuan Lin, Xiaohua Hu, and Erjia Yan |
BigD605 | TSM2: Optimizing Tall-and-Skinny Matrix-Matrix Multiplication on GPUs | Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, and Zizhong Chen |
BigD614 | SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes | Edmon Begoli, Kris Brown, Sudarshan Srinivas, and Suzanne Tamang |
BigD633 | Spatio-Temporal Attention based recurrent neural network for next POI prediction | Basmah Altaf, Lu Yu, and Xiangliang Zhang |
BigD644 | An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation | Utsav Sheth, Sanghamitra Dutta, Malhar Chaudhari, Haewon Jeong, Yaoqing Yang, Jukka Kohonen, Teemu Roos, and Pulkit Grover |
BigD666 | PACAS: Privacy-Aware, Data Cleaning-as-a-Service | Yu Huang, Mostafa Milani, and Fei Chiang |
BigD669 | Scalable K-Core Decomposition for Static Graphs Using a Dynamic Graph Data Structure | Alok Tripathy, Fred Hohman, Duen Horng Chau, and Oded Green |
BigD697 | Deep Learning for Predicting Dynamic Uncertain Opinions in Network Data | Xujiang Zhao, Feng Chen, and Jin-Hee Cho |
BigD698 | Density-aware Local Siamese Autoencoder Network Embedding with Autoencoder Graph Clustering | Yang Zhou, Amnay Amimeur, Chao Jiang, Dejing Dou, Ruoming Jin, and Pengwei Wang |
BigD703 | Exploring Size-Speed Trade-Offs In Static Index Pruning | Juan Rodriguez and Torsten Suel |
BigD731 | Enumerating Top-k Quasi-Cliques | Seyed-Vahid Sanei-Mehri, Apurba Das, and Srikanta Tirthapura |
BigD751 | Context Aware Recommender System for Large Scaled Flash Sale Sites | Wanying Ding, Ran Xu, Ying Ding, Yue Zhang, and Chuanjiang Luo |
5. Big Data Security & Privacy
Regular Papers | | |
BigD238 | Do Bitcoin Users Really Care About Anonymity? A Graph Analysis on Bitcoin Transaction Graphs | Anil Gaihre, Yan Luo, and Hang Liu |
BigD247 | Distributed Machine Learning Meets Blockchain: A Decentralized, Secure, and Privacy-preserving Realization | Xuhui Chen, Jinlong Ji, Changqing Luo, Weixian Liao, and Pan Li |
BigD296 | Benchmarking Anomaly Detection Algorithms in an Industrial Context: Dealing with Scarce Labels and Multiple Positive Types | David Renaudie, Maria A. Zuluaga, and Rodrigo Acuna-Agost |
BigD314 | A Unified Unsupervised Gaussian Mixture Variational Autoencoder for High Dimensional Outlier Detection | Weixian Liao, Yifan Guo, Xuhui Chen, and Pan Li |
BigD388 | PrivacyZone: A novel approach to protecting location privacy of mobile users | Emre Yigitoglu, Mehmet Emre Gursoy, Ling Liu, Margaret Loper, Bhuvan Bamba, and Kisung Lee |
BigD531 | There goes Wally: Anonymously sharing your location gives you away | Apostolos Pyrgelis, Nicolas Kourtellis, Ilias Leontiadis, Joan Serra, and Claudio Soriente |
BigD534 | Actionable Objective Optimization for Suspicious Behavior Detection on Large Bipartite Graphs | Tong Zhao, Matthew Malir, and Meng Jiang |
BigD538 | Phishing URL Detection with Oversampling based on Text Generative Adversarial Networks | Ankesh Anand, Kshitij Gorde, Joel Moniz, Noseong Park, Tanmoy Chakraborty, and Bei-Tseng Chu |
BigD677 | GCI: A Transfer Learning Approach for Detecting Cheats of Computer Game | Bo Dong, Md Shihabul Islam, Swarup Chandra, Latifur Khan, and Bhavani Thuraisingham |
Short Papers | | |
BigD217 | Novel anomaly detection and classification schemes for Machine-to-Machine uplink | Akshay Kumar, Ahmed Abdelhadi, and Charles Clancy |
BigD255 | Algorithmic Reputation | Michael Katell |
BigD346 | Toward End-to-End Deception Detection in Videos | Hamid Karimi, Jiliang Tang, and Yanen Li |
BigD443 | Learning Light-Weight Edge-Deployable Privacy Models | Yeon-sup Lim, Mudhakar Srivatsa, Supriyo Chakraborty, and Ian Taylor |
BigD446 | Automated Generation and Selection of Interpretable Features for Enterprise Security | Jiayi Duan, Ziheng Zeng, Alina Oprea, and Shobha Vasudevan |
BigD523 | Graph Mining-based Trust Evaluation Mechanism with Multidimensional Features for Large-scale Heterogeneous Threat Intelligence | Yali Gao, Xiaoyong Li, Jirui Li, Yunquan Gao, and Ning Guo |
BigD589 | CVExplorer: Multidimensional Visualization for Common Vulnerabilities and Exposures | Vung Pham and Tommy Dang |
BigD630 | dynamicMF: A Matrix Factorization Approach to Monitor Resource Usage in High Performance Computing | Niyazi Sorkunlu, Duc Thanh Anh Luong, and Varun Chandola |
BigD667 | An Integrated Knowledge Graph to Automate GDPR and PCI DSS Compliance | Lavanya Elluri, Ankur Nagar, and Karuna Pande Joshi |
6. Big Data Applications
Regular Papers | | |
BigD278 | A Bayesian Approach to Residential Property Valuation Based on Built Environment and House Characteristics | Zhicheng Liu, Shuai Yan, Jun Cao, Tanhua Jin, Jiabo Tang, Junyan Yang, and Qiao Wang |
BigD285 | Realtime Robustification of Interdependent Networks under Cascading Attacks | Zhen Chen, Hanghang Tong, and Lei Ying |
BigD292 | Market Abnormality Period Detection via Co-movement Attention Model | Yue Wang, Chenwei Zhang, Shen Wang, Philip S. Yu, Lu Bai, and Lixin Cui |
BigD298 | Optimizing Taxi Carpool Policies via Reinforcement Learning and Spatio-Temporal Mining | Ishan Jindal, Zhiwei (Tony) Qin, Xuewen Chen, Matthew Nokleby, and Jieping Ye |
BigD353 | Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking | Justin Sybrandt, Micheal Shtutman, and Ilya Safro |
BigD354 | Are Abstracts Enough for Hypothesis Generation? | Justin Sybrandt, Angelo Carrabba, Alexander Herzog, and Ilya Safro |
BigD371 | Enabling of Predictive Maintenance in the Brownfield through Low-Cost Sensors, an IIoT-Architecture and Machine Learning | Patrick Strauß, René Wöstmann, Markus Schmitz, and Jochen Deuse |
BigD387 | Integrating the University of São Paulo Security Mobile App to the Electronic Monitoring System | João Eduardo Ferreira, José Antônio Visintin, Jun Okamoto Jr., Mauro Cesar Bernardes, Adriano Paterlini, Alexander Csóka Roque, and Moisés Ramalho Miguel |
BigD401 | IL-Net: Using Expert Knowledge to Guide the Design of Furcated Neural Networks | Khushmeen Sakloth, Wesley Beckner, Jim Pfaendtner, and Garrett Goh |
BigD402 | Dynamic Prediction of ICU Mortality Risk Using Domain Adaptation | Tiago Alves, Alberto Laender, Adriano Veloso, and Nivio Ziviani |
BigD406 | Two Birds with One Network: Unifying Event Prediction and Time-to-failure Modeling | Karan Aggarwal, Onur Atan, Ahmed Farahat, Chi Zhang, Kosta Ristovski, and Chetan Gupta |
BigD445 | Transfer learning for time series classification | Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller |
BigD456 | A Structured Learning Approach with Neural Conditional Random Fields for Sleep Staging | Karan Aggarwal, Swaraj Khadanga, Shafiq Joty, Louis Kazaglis, and Jaideep Srivastava |
BigD476 | RiskSens: A Multi-view Learning Approach to Identifying Risky Traffic Locations in Intelligent Transportation Systems Using Social and Remote Sensing | Yang Zhang, Yiwen Lu, Daniel Zhang, Lanyu Shang, and Dong Wang |
BigD495 | Exploiting Knowledge Graph to Improve Text-based Prediction | Shan Jiang, Chengxiang Zhai, and Qiaozhu Mei |
BigD526 | A Minimax Approach for Classification with Big-data | Krishnan Raghavan, Jagannathan Sarangapani, and VA Samaranayake |
BigD528 | Mining Illegal Insider Trading of Stocks: A Proactive Approach | Sheikh Rabiul Islam, Sheikh Khaled Ghafoor, and William Eberle |
BigD547 | Profiling Driver Behavior for Personalized Insurance Pricing and Maximal Profit | Bing He, Dian Zhang, Siyuan Liu, Hao Liu, Dawei Han, and Lionel M. Ni |
BigD590 | Inferring Housing Demand based on Express Delivery Data | Qingyang Li, Zhiwen Yu, Bin Guo, and Xinjiang Lu |
BigD593 | Knowledge-guided Bayesian Support Vector Machine for High-Dimensional Data with Application to Analysis of Genomics Data | Wenli Sun, Changgee Chang, Yize Zhao, and Qi Long |
BigD643 | Situation-Based Interaction Learning for Personality Prediction on Facebook | Lei Zhang, Liang Zhao, Xuchao Zhang, Wenmo Kong, Zitong Sheng, and Chang-Tien Lu |
BigD653 | Technology Enablers for Big Data, Multi-Stage Analysis in Medical Image Processing | Shunxing Bao, Prasanna Parvathaneni, Yuankai Huo, Yogesh Barve, Andrew Plassard, Yuang Yao, Hongyang Sun, Ilwoo Lyu, David Zald, Bennett Landman, and Aniruddha Gokhale |
BigD655 | The unbanked and poverty: Predicting area-level socio-economic status from M-Money transactions | Gregor Engelmann, James Goulding, and Gavin Smith |
BigD685 | An Unsupervised Learning Based Approach for Mining Attribute Based Access Control Policies | Leila Karimi and James Joshi |
BigD721 | Time-Aware Subgroup Matrix Decomposition: Imputing Missing Data Using Forecasting Events | Xi Yang and Min Chi |
BigD760 | Learning Informative and Private Representations via Generative Adversarial Networks | Tsung-Yen Yang, Christopher Brinton, Prateek Mittal, Mung Chiang, and Andrew Lan |
Short Papers | | |
BigD245 | Predicting Perceived Cycling Safety Levels Using Open and Crowdsourced Data | Jiahui Wu, Lingzi Hong, and Vanessa Frias-Martinez |
BigD248 | A Longitudinal Social Network Clustering Method Based on Tie Strength | Zhiyong Zhang, Mao Ye, Yijie Huang, and Nan Sun |
BigD263 | Personalized heart failure severity estimates using passive smartphone data | Ayse Cakmak, Erik Reinertsen, Herman Taylor, Amit Shah, and Gari Clifford |
BigD280 | Data-driven Blockbuster Planning on Online Movie Knowledge Library | Ye Liu, Jiawei Zhang, Chenwei Zhang, and Philip S. Yu |
BigD310 | Deep Convolutional Neural Networks for Log Event Classification on Distributed Cluster Systems | Rui Ren, Jiechao Cheng, Yan Yin, Jianfeng Zhan, and Lei Wang |
BigD321 | Social-Media aided Hyperlocal Help-Network Matching & Routing during Emergencies | Dheeraj Kumar, Takahiro Yabe, and Satish Ukkusuri |
BigD337 | Session Expert: a Lightweight Conference Session Recommender System | Jinfeng Yi, Qi Lei, Junchi Yan, and Wei Sun |
BigD362 | Visual Reasoning of Feature Attribution with Deep Recurrent Neural Networks | Chuan Wang, Takeshi Onishi, Keiichi Nemoto, and Kwan-Liu Ma |
BigD380 | Entropy-Isomap: Manifold Learning for High-dimensional Dynamic Processes | Frank Schoeneman, Varun Chandola, Nils Napp, Olga Wodo, and Jaroslaw Zola |
BigD389 | A Hierarchical Framework for Timely Freeway Accident Detection and Localization | Yasitha Warahena Liyanage, Charalampos Chelmis, and Daphney-Stavroula Zois |
BigD465 | Predicting Individual-Level Call Arrival from Online Account Customer Activity | Somayeh Moazeni |
BigD494 | A Subspace Pre-learning Approach to Fast High-Accuracy Machine Learning of Large XOR PUFs with Component-Differential Challenges | Ahmad O. Aseeri, Yu Zhuang, and Mohammed Saeed Alkatheiri |
BigD506 | Scalable Classification of Univariate and Multivariate Time Series | Saeed Karimi-Bidhendi, Faramarz Munshi, and Ashfaq Munshi |
BigD535 | Short-term local weather forecast using dense weather station by deep neural network | Kazuo Yonekura, Hitoshi Hattori, and Taiji Suzuki |
BigD568 | NetClips: A Framework for Video Analytics in Sports Broadcast | Masoumeh Izadi, Shangjing Wu, Aiden Chia, and Bernard Cheng |
BigD594 | Defining an Alert Mechanism for Detecting likely threats to National Security | Pedro Cardenas Canto, Georgios Theodoropoulos, and Boguslaw Obara |
BigD654 | Distributed Learning of Deep Sparse Neural Networks for High-dimensional Classification | Shweta Garg, Krishnan Raghavan, Jagannathan Sarangapani, and V.A. Samaranayake |
BigD659 | Twitter Health Surveillance (THS) System | Manuel Rodriguez-Martinez and Cristian Garzon-Alfonso |
BigD746 | Land Cover Classification at the Wildland Urban Interface using High-Resolution Satellite Imagery and Deep Learning | Mai H. Nguyen, Jessica Block, Daniel Crawl, Vincent Siu, Akshit Bhatnagar, Federico Rodriguez, Alison Kwan, Namrita Baru, and Ilkay Altintas |
BigD755 | Distributed Reverse DNS Geolocation | Ovidiu Dan, Vaibhav Parikh, and Brian D. Davison |
Industry and Government Program
Regular Papers | | |
Paper ID | Title | Authors |
N205 | Relational Similarity Machines (RSM): A Similarity-based Learning Framework for Graphs | Ryan Rossi, Nesreen Ahmed, Rong Zhou, and Hoda Eldardiry |
N209 | Bridging the Gap between Big Data System Software Stack and Applications: A Case Study of Semiconductor Wafer Fabrication Foundries | Hung-Chang Hsiao |
N210 | CUImage: A Neverending Learning Platform on a Convolutional Knowledge Graph of Billion Web Images | Ping Luo, Zhanglin Peng, Lingyun Wu, and Jiamin Ren |
N211 | Learning to Simplify Distributed Systems Management | Christopher Streiffer, Ramya Raghavendra, Theophilus Benson, and Mudhakar Srivatsa |
N213 | Learning Effective Embeddings for Machine Generated Emails with Applications to Email Category Prediction | Yu Sun, Lluis Garcia-Pueyo, James Wendt, Marc Najork, and Andrei Broder |
N217 | Scheduling Large-scale Distributed Training via Reinforcement Learning | Zhanglin Peng, Jiamin Ren, Ruimao Zhang, Lingyun Wu, Xinjiang Wang, and Ping Luo |
N220 | Parallel Polyglot Query Processing on Heterogeneous Cloud Data Stores with LeanXcale | Boyan Kolev, Oleksandra Levchenko, Esther Pacitti, Patrick Valduriez, Ricardo Vilaca, Rui Goncalves, Ricardo Jimenez-Peris, and Pavlos Kranas |
N221 | STIPA: A Memory Efficient Technique for Interval Pattern Discovery | Amit Kumar and Dhaval Patel |
N226 | AISTAR: An Intelligent System for Online IT Ticket Automation Recommendation | Qing Wang, Chunqiu Zeng, S. S. Iyengar, Larisa Shwartz, Genady Ya Grabarnik, and Tao Li |
N228 | Character Recognition by Deep Learning: An Enterprise Solution | Khaled Bouaziz, Thiagarajan Ramakrishnan, Srinivasan Raghavan, Kyle Grove, Awny Al-Omari, and Choudur Lakshminarayan |
N229 | Build and Execution Environment (BEE): an Encapsulated Environment Enabling HPC Applications Running Everywhere | Jieyang Chen, Qiang Guan, Xin Liang, Paul Bryant, Patricia Grubel, Allen McPherson, Li-Ta Lo, Timothy Randles, Zizhong Chen, and James Ahrens |
N237 | Predicting Age & Gender of Mobile Users at Scale - A Distributed Machine Learning Approach | Kajanan Sangaralingam, Nisha Verma, Aravind Ravi, Anindya Datta, and Varun Chugh |
N241 | High-Throughput Adaptive Data Virtualization via Context-Aware Query Routing | Amirhossein Aleyasen, Mohamed Soliman, Lyublena Antova, Florian Mike Waas, and Marianne Winslett |
N243 | Efficient Super Resolution for Large-Scale Images using Attentional GAN | Brooke Cowan, Xinxin Li, Shervin Minaee, and Harsh Nilesh Pathak |
N244 | ANNOTATE: orgANizing uNstructured cOntenTs viA Topic labEls | Deepak Ajwani, Bilyana Taneva, Sourav Dutta, Pat Nicholson, Ghasem Nobari, and Alessandra Sala |
N256 | Reacting to Variations in Product Demand: An Application for Conversion Rate (CR) Prediction in Sponsored Search | Marcello Tallis and Pranjul Yadav |
N257 | A Smart System for Selection of Optimal Product Images in E-Commerce | Abon Chaudhuri, Paolo Messina, Samrat Kokkula, Aditya Subramanian, Abhinandan Krishnan, Shreyansh Gandhi, Alessandro Magnani, and Venkatesh Kandaswamy |
N259 | Finding Data Should be Easier than Finding Oil | Evgeny Kharlamov, Martin Skjaeveland, Theofilos Mailis, Ernesto Jimenez-Ruiz, Guohui Xiao, Ahmet Soylu, Hallstein Lie, and Arild Waaler |
N265 | Data models for service failure prediction in supply-chain networks | Monika Sharma, Tristan Glatard, Eric Gelinas, Mariam Tagmouti, and Brigitte Jaumard |
Short Papers | | |
N208 | Focusing on the Big Picture: Insights into an End-to-End Systems Approach to Deep Learning for Satellite Imagery | Ritwik Gupta, Carson Sestili, Javier Vazquez-Trejo, and Matthew Gaston |
N212 | A Generic and Scalable Pipeline for Large-Scale Analytics of Continuous Operational Aircraft Engine Data | Florent Forest, Jérôme Lacaille, Mustapha Lebbah, and Hanene Azzag |
N214 | Large Scale Open Source Video Recommender Tool Using Metadata Surrogates | George Mathew, Steven Smith, and John Passarelli |
N218 | Distributed NoSQL Data Stores: Performance Analysis and a Case Study | Abdeltawab Hendawi, Jayant Gupta, Liu Jiayi, Ankur Teredesai, Ramakrishnan Naveen, Shah Mohak, and Mohamed Ali |
N222 | Using Real-World Store Data for Foot Traffic Forecasting | Soheila Abrishami and Piyush Kumar |
N225 | Root Cause Detection using Dynamic Dependency Graphs from Time Series Data | Syed Yousaf Shah, Xuan-Hong Dang, and Petros Zerfos |
N227 | A Complete Data Science Work-flow For Insurance Field | Mohammed Ghesmoune, Hanene Azzag, Mustapha Lebbah, Salima Benbernou, Mourad Ouziri, and Tarn Duong |
N233 | In situ TensorView: In situ Visualization of Convolutional Neural Networks | Xinyu Chen, Qiang Guan, Li-Ta Lo, Simon Su, Zhengyong Ren, James Ahrens, and Trilce Estrada |
N234 | Performance Prediction using Neural Network and Confidence Intervals: a Gas Turbine application | Silvia Cisotto and Randa Herzallah |
N235 | Spatio-temporal prediction of crimes using network analytic approach | Saroj Dash, Ilya Safro, and Ravisutha Srinivasamurthy |
N236 | Predicting Individual Level Consumer Brand Preferences Using Persistent Mobility Patterns | Aravind Ravi and Kajanan Sangaralingam |
N240 | Big Data Streaming Analytics for QoE Monitoring in Mobile Networks: A Practical Approach | Diego F. Rueda, Dahyr Vergara, and David Reniz |
N242 | A Deterministic Self-Organizing Map Approach and its Application on Satellite Data based Cloud Type Classification | Wenbin Zhang, Jianwu Wang, Daeho Jin, Lazaros Oreopoulos, and Zhibo Zhang |
N245 | E-commerce Product Query Classification Using Implicit User's Feedback from Clicks | Yiu-Chang Lin and Ankur Datta |
N246 | Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding | Rishi Chhatwal, Peter Gronvall, Nathaniel Huber-Fliflet, Robert Keeling, Jianping Zhang, and Haozhen Zhao |
N247 | Augmenting Software Project Managers with Predictions from Machine Learning | Kalyan Veeramachaneni and Benjamin Schreck |
N248 | Acquire, adapt, and anticipate: continuous learning to block malicious domains | Ignacio Arnaldo and Kalyan Veeramachaneni |
N249 | A Batched Multi-Armed Bandit Approach to Dynamic News Headline Testing | Yizhi Mao, Miao Chen, Abhinav Wagle, Junwei Pan, Michael Natkovich, and Don Matheson |
N250 | Identifying Distracted and Drowsy Drivers | Sujay Yadawadkar, Brian Mayer, Sanket Lokegaonkar, Mohammad Raihanul Islam, Miao Song, Mike Mollenhauer, and Naren Ramakrishnan |
N253 | Performance Implications of Big Data in Scalable Deep Learning: On the Importance of Bandwidth and Caching | Miro Hodak, David Ellison, Peter Seidel, and Ajay Dholakia |
N254 | ChieF: A Change Pattern based Interpretable Failure Analyzer | Dhaval Patel, Lam Nguyen, Akshay Rangamani, Shrey Shrivastava, and Jayant Kalagnanam |
N255 | NetDP: An Industrial-Scale Distributed Network Representation Framework for Default Prediction in Ant Credit Pay | Jianbin Lin, Zhiqiang Zhang, Jun Zhou, Xiaolong Li, Jingli Fang, Yanming Fang, Quan Yu, and Yuan Qi |
N261 | Towards Semantic Simplification of Analytical Workflows at Siemens (Extended Abstract) | Evgeny Kharlamov, Gulnar Mehdi, Ognjen Savkovic, Guohui Xiao, Steffen Lamparter, Arild Waaler, and Ian Horrocks |
N264 | I4TSPS: a Visual-Interactive Web System for Industrial Time Series Pre-processing | Kevin Villalobos, Jon Vadillo, Borja Diez, Borja Calvo, and Arantza Illarramendi |