Clustering
Wikipedia explains:
"Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects
in the same group (called a cluster) are more similar (in some sense or another) to each other than
to those in other groups (clusters). It is a main task of exploratory data mining, and a common
technique for statistical data analysis, used in many fields, including machine learning, pattern
recognition, image analysis, information retrieval, and bioinformatics.
Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be
achieved by various algorithms that differ significantly in their notion of what constitutes a
cluster and how to efficiently find them. Popular notions of clusters include groups with small
distances among the cluster members, dense areas of the data space, intervals or particular statistical
distributions. Clustering can therefore be formulated as a multi-objective optimization problem. The
appropriate clustering algorithm and parameter settings (including values such as the distance function
to use, a density threshold or the number of expected clusters) depend on the individual data set and
intended use of the results. Cluster analysis as such is not an automatic task, but an iterative process
of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It
will often be necessary to modify data preprocessing and model parameters until the result achieves the
desired properties."
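To make the quoted notions concrete, here is a minimal sketch of one of them: clusters as groups with small distances among their members. It uses k-means (Lloyd's algorithm) in plain Python and is not SUMO's own API. The function name `kmeans`, the sample points, and the evenly spaced initialisation are all assumptions made for illustration. The number of expected clusters `k` and the Euclidean distance function are exactly the kind of parameter settings the quote says depend on the data set.

```python
import math

def kmeans(points, k, iterations=20):
    """Illustrative Lloyd's algorithm on (x, y) tuples; returns (labels, centroids)."""
    # Assumption: deterministic seeding from evenly spaced input points, chosen
    # for reproducibility. Real implementations usually seed randomly.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids

# Hypothetical sample data: two visually obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Changing `k` or swapping `math.dist` for another distance function changes the result, which is why the quote describes clustering as an iterative process rather than an automatic one.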
SUMO offers a few fundamental clustering methods: