Wednesday, May 30, 2018

Consensus Clustering Via Apache Spark

Introduction

In this article, we discuss consensus clustering, a technique for assessing how stable the clusters produced by a clustering algorithm are under small perturbations of the data set. We then review a sample application, built with the Apache Spark machine learning library, that applies consensus clustering to three distinct clustering algorithms: K-means, Bisecting K-means, and Gaussian Mixture.
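
To make the idea of stability under perturbation concrete, here is a minimal sketch in plain Python. It is not the Spark application reviewed in this article. It assumes the common formulation of consensus clustering, in which the perturbation is random subsampling and stability is summarized in a consensus matrix: for every pair of points, the fraction of runs that put them in the same cluster, among the runs in which both were sampled. The helper names (`kmeans`, `consensus_matrix`), the parameters (`runs`, `fraction`, `seed`), and the toy data are all hypothetical, and the hand-rolled K-means stands in for whichever clustering algorithm is being assessed.

```python
import random
from itertools import combinations


def dist2(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, rng, iterations=20):
    """Plain Lloyd's K-means on 2-D tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centers (illustrative choice)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Move each center to the mean of its members; leave it alone if empty.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


def consensus_matrix(points, k, runs=50, fraction=0.8, seed=0):
    """Cluster random subsamples and record how often each pair co-clusters."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # runs where i and j share a cluster
    sampled = [[0] * n for _ in range(n)]   # runs where i and j were both drawn
    for _ in range(runs):
        idx = rng.sample(range(n), int(fraction * n))  # the "small perturbation"
        labels = kmeans([points[i] for i in idx], k, rng)
        for (a, label_a), (b, label_b) in combinations(zip(idx, labels), 2):
            sampled[a][b] += 1
            sampled[b][a] += 1
            if label_a == label_b:
                together[a][b] += 1
                together[b][a] += 1
    # Pairs never drawn together (including the diagonal) are reported as 0.0.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data (made up for illustration): two tight, well-separated groups.
blob_a = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5), (0.2, 0.3)]
blob_b = [(10.0, 10.0), (10.4, 9.8), (9.9, 10.5), (10.3, 10.2), (9.7, 9.9)]
points = blob_a + blob_b

matrix = consensus_matrix(points, k=2)
print("same group:", round(matrix[0][1], 2))
print("different groups:", round(matrix[0][5], 2))
```

For well-separated data such as this, entries for pairs in the same group should come out close to 1 and entries for pairs in different groups close to 0. A matrix with many entries in between would suggest that the clusters change under small perturbations, which is the instability consensus clustering is meant to reveal.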

Cluster analysis [1] in machine learning aims to partition data into separate, nonoverlapping sets based on a similarity measure between the data points. Points in the same cluster should be as close (similar) to each other as possible, and points in different clusters should be as distant (dissimilar) as possible. Cluster analysis has many applications in scientific disciplines including biology, bioinformatics, medicine, business, computer science, and the social sciences [1].



from DZone.com Feed https://ift.tt/2LDDkma
