Exploring Innovative Research in Outlying Aspect Mining
Chapter 1: Introduction to Outlying Aspect Mining
This week's edition (August 17 to August 23, 2020) covers three research papers on outlying aspect mining: two that define and solve the problem, and one that proposes a faster density estimator for it.
Mining Outlying Aspects of Numeric Data
Authors: Lei Duan, Guanting Tang, Jian Pei, James Bailey, Akiko Campbell, Changjie Tang
Venue: Data Mining and Knowledge Discovery Journal
Paper: [Link](#)
Abstract:
This paper tackles the problem of identifying the unusual aspects of an object in a dataset, whether or not that object is itself an outlier. Given a query object (o) in a multidimensional numeric dataset (O), the core question is: in which subspace does (o) appear most outlying? The authors measure outlyingness by the rank of the object's probability density in a subspace: the lower the density of (o) relative to the other objects, the better its rank. A minimal subspace in which the query object ranks best is an outlying aspect. Computing outlying aspects is expensive, especially in high dimensions, because densities must be estimated and ranked in many subspaces. The authors develop a heuristic method that scales to datasets with many dimensions. Their empirical study on both real and synthetic data supports the effectiveness and efficiency of the approach.
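To make the density-rank idea concrete, here is a minimal brute-force sketch. It is not the authors' heuristic method: the Gaussian product kernel, the fixed `bandwidth`, the `max_dim` cap, and all function names are assumptions chosen for illustration. It ranks the query by density in every small subspace and keeps the minimal subspaces where the rank is best.

```python
import math
from itertools import combinations

def kde_density(point, data, dims, bandwidth=0.5):
    """Gaussian product-kernel density of `point`, using only features in `dims`."""
    norm = len(data) * (bandwidth * math.sqrt(2 * math.pi)) ** len(dims)
    total = 0.0
    for row in data:
        sq_dist = sum((point[j] - row[j]) ** 2 for j in dims)
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / norm

def density_rank(query_idx, data, dims, bandwidth=0.5):
    """Rank of the query by density in `dims`; rank 1 means lowest density."""
    dens = [kde_density(row, data, dims, bandwidth) for row in data]
    q = dens[query_idx]
    return 1 + sum(1 for v in dens if v < q)

def outlying_aspects(query_idx, data, max_dim=2, bandwidth=0.5):
    """Exhaustive search over subspaces of up to `max_dim` features.

    Returns the best rank and the minimal subspaces that achieve it.
    Subspaces are visited smallest first, so a subspace is skipped when
    one of its proper subsets already reaches the same rank.
    """
    n_features = len(data[0])
    best_rank, best = None, []
    for size in range(1, max_dim + 1):
        for dims in combinations(range(n_features), size):
            r = density_rank(query_idx, data, dims, bandwidth)
            if best_rank is None or r < best_rank:
                best_rank, best = r, [dims]
            elif r == best_rank and not any(set(b) < set(dims) for b in best):
                best.append(dims)
    return best_rank, best

# Toy data (made up): the last object is ordinary in feature 0
# but far from everything else in feature 1.
data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2], [0.2, 5.0]]
result = outlying_aspects(5, data)
print(result)
```

On this toy data the search reports feature 1 alone as the outlying aspect; the two-feature subspace reaches the same rank but is not minimal. The exhaustive loop grows exponentially with the number of features, which is the cost the paper's heuristic is meant to avoid.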
Discovering Outlying Aspects in Large Datasets
Authors: Nguyen Xuan Vinh, Jeffrey Chan, Simone Romano, James Bailey, Christopher Leckie, Kotagiri Ramamohanarao, Jian Pei
Venue: Data Mining and Knowledge Discovery Journal
Paper: [Link](#)
Abstract:
This paper explores the outlying aspects mining problem: given a query object and a reference multidimensional dataset, which aspects (subsets of features, or subspaces) make the query object most outlying? The techniques can be used to explain any data point of interest, whether it is an inlier or an outlier. The authors address two open challenges in outlying aspects mining: (a) designing scoring functions that are unbiased with respect to dimensionality yet computationally efficient, and (b) searching the very large space of candidate subspaces efficiently. They formalize dimensionality unbiasedness as a desirable property of outlyingness measures, evaluate several methods for discovering outlying aspects, and demonstrate their proposed solutions on large real and synthetic datasets.
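Why dimensionality bias matters can be seen in a few lines. Raw density values generally shrink as features are added, so comparing them across subspaces of different sizes favours larger subspaces. The sketch below standardizes the density within each subspace (a Z-score), which is one simple way to make scores comparable; the kernel, the `bandwidth`, and the function names are assumptions for illustration, and the summary above does not specify the exact scoring functions the authors use.

```python
import math
import statistics

def kde_density(point, data, dims, bandwidth=0.5):
    """Gaussian product-kernel density of `point`, using only features in `dims`."""
    norm = len(data) * (bandwidth * math.sqrt(2 * math.pi)) ** len(dims)
    total = 0.0
    for row in data:
        sq_dist = sum((point[j] - row[j]) ** 2 for j in dims)
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / norm

def density_z_score(query_idx, data, dims, bandwidth=0.5):
    """Density of the query, standardized over all objects in the same subspace.

    More negative means more outlying. Because every subspace is rescaled
    to mean 0 and standard deviation 1, scores from subspaces of different
    dimensionality sit on a common scale.
    """
    dens = [kde_density(row, data, dims, bandwidth) for row in data]
    mean = statistics.fmean(dens)
    std = statistics.pstdev(dens)
    if std == 0.0:
        return 0.0  # all objects equally dense: nothing stands out
    return (dens[query_idx] - mean) / std

# Toy data (made up): the last object is ordinary in feature 0
# but isolated in feature 1.
data = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.2], [0.2, 5.0]]
z_feature0 = density_z_score(5, data, (0,))
z_feature1 = density_z_score(5, data, (1,))
z_both = density_z_score(5, data, (0, 1))
print(z_feature0, z_feature1, z_both)
```

The query gets a positive score in feature 0, where it sits in the densest region, and a clearly negative score in feature 1 and in the two-feature subspace.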
Chapter 2: Advancements in Density Estimation
A New Simple and Efficient Density Estimator
Authors: Jonathan R. Wells, Kai Ming Ting
Venue: Pattern Recognition Letters
Paper: [Link](#)
Abstract:
This paper presents a simple and efficient density estimator that enables fast systematic search. To show its advantage over the commonly used kernel density estimator, the authors apply it to outlying aspects mining: the task of finding feature subsets (subspaces) that describe how a query object stands out from the rest of the dataset. The task requires a systematic search of subspaces. The authors point out that existing outlying aspects miners are restricted to small datasets because they rely on kernel density estimators, which are computationally expensive to evaluate across many subspaces. Replacing the kernel density estimator with the proposed one lets a recent outlying aspects miner run significantly faster, so that it can handle large datasets with thousands of dimensions that would otherwise be out of reach.
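The cost argument can be illustrated with a generic equal-width grid estimator. This is a sketch of the general idea of replacing kernel sums with precomputed bins, not the estimator proposed in the paper; `n_bins`, the data layout, and the function names are assumptions. Each feature is binned once, and the density in any subspace is then a set intersection of the per-feature bins that contain the query.

```python
def build_grid(data, n_bins=4):
    """Bin every feature once: for each feature, map bin index -> set of row indices."""
    grid = {"n_bins": n_bins, "n": len(data), "features": []}
    for j in range(len(data[0])):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins
        if width == 0.0:
            width = 1.0  # constant feature: put everything in one bin
        members = {}
        for i, v in enumerate(col):
            b = min(int((v - lo) / width), n_bins - 1)
            members.setdefault(b, set()).add(i)
        grid["features"].append({"lo": lo, "width": width, "members": members})
    return grid

def grid_density(query, dims, grid):
    """Density of `query` in the non-empty subspace `dims`.

    Counts the objects that share the query's grid cell by intersecting
    the precomputed per-feature bins, then divides by n * cell volume.
    No pairwise distances are computed.
    """
    cell, volume = None, 1.0
    for j in dims:
        f = grid["features"][j]
        b = min(int((query[j] - f["lo"]) / f["width"]), grid["n_bins"] - 1)
        in_bin = f["members"].get(b, set())
        cell = in_bin if cell is None else cell & in_bin
        volume *= f["width"]
    return len(cell) / (grid["n"] * volume)

# Toy data (made up): the last object is isolated in feature 1.
data = [[1.0, 1.0], [3.0, 1.5], [5.0, 0.5], [7.0, 1.0],
        [0.0, 0.0], [8.0, 1.8], [4.0, 8.0]]
grid = build_grid(data, n_bins=4)
query = data[6]
for dims in [(0,), (1,), (0, 1)]:
    print(dims, grid_density(query, dims, grid))
```

The binning pass runs once per dataset, so evaluating a new subspace only costs a few set intersections. A kernel density estimator, by contrast, revisits every object for every subspace it scores.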
Previous Weeks' Reading Lists:
- Weekly Reading List #1
- Weekly Reading List #2
About Me:
I am Durgesh Samariya, a third-year Ph.D. student specializing in Machine Learning at FedUni, Australia. Online, I am recognized as TheMLPhDStudent.
Thank you for reading!