Absent Data Generating Classifier for Imbalanced Class Sizes

Arash Pourhabib, Bani K. Mallick, Yu Ding

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

We propose an algorithm for two-class classification problems when the training data are imbalanced. This means the number of training instances in one of the classes is so low that the conventional classification algorithms become ineffective in detecting the minority class. We present a modification of the kernel Fisher discriminant analysis such that the imbalanced nature of the problem is explicitly addressed in the new algorithm formulation. The new algorithm exploits the properties of the existing minority points to learn the effects of other minority data points, had they actually existed. The algorithm proceeds iteratively by employing the learned properties and conditional sampling in such a way that it generates sufficient artificial data points for the minority set, thus enhancing the detection probability of the minority class. Implementing the proposed method on a number of simulated and real data sets, we show that our proposed method performs competitively compared to a set of alternative state-of-the-art imbalanced classification algorithms.
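
The general idea in the abstract (learn from the few minority points, generate artificial minority points, then fit a Fisher-type discriminant) can be illustrated with a much simpler stand-in. The sketch below is not the authors' algorithm: it uses a linear rather than kernel Fisher discriminant, and a single round of sampling from a Gaussian fitted to the minority points rather than the iterative conditional sampling the abstract describes. The function names `fisher_direction` and `augment_minority` and the regularization constant are assumptions made for illustration only.

```python
import numpy as np


def fisher_direction(X_maj, X_min, reg=1e-6):
    """Linear Fisher discriminant: return direction w and offset b.

    A point x is assigned to the minority class when w @ x + b > 0.
    (Illustrative stand-in; the paper modifies *kernel* Fisher
    discriminant analysis.)
    """
    mu_maj, mu_min = X_maj.mean(axis=0), X_min.mean(axis=0)
    # Pooled within-class scatter, regularized so it stays invertible.
    S_w = (np.cov(X_maj, rowvar=False) * (len(X_maj) - 1)
           + np.cov(X_min, rowvar=False) * (len(X_min) - 1))
    S_w = S_w + reg * np.eye(S_w.shape[0])
    w = np.linalg.solve(S_w, mu_min - mu_maj)
    # Threshold halfway between the two projected class means.
    b = -0.5 * w @ (mu_maj + mu_min)
    return w, b


def augment_minority(X_min, n_new, rng, reg=1e-6):
    """Append n_new artificial minority points.

    Assumption: points are drawn from a Gaussian fitted to the observed
    minority points. This replaces the paper's conditional sampling
    scheme, which the abstract does not specify in detail.
    """
    mu = X_min.mean(axis=0)
    cov = np.cov(X_min, rowvar=False) + reg * np.eye(X_min.shape[1])
    X_new = rng.multivariate_normal(mu, cov, size=n_new)
    return np.vstack([X_min, X_new])
```

A typical use would call `augment_minority` to bring the minority set up to the size of the majority set and then pass both sets to `fisher_direction`. Whether this simplified balancing improves minority detection on a given data set is an empirical question that the sketch does not answer.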
Original language: English (US)
Pages (from-to): 2695-2724
Number of pages: 30
Journal: Journal of Machine Learning Research
Volume: 16
State: Published - Dec 2015
Externally published: Yes

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Statistics and Probability
  • Control and Systems Engineering
