Hierarchical Clustering

Let **E** be a set of *k* *N*-dimensional objects:

$$\mathbf{E} = \{e_1, e_2, \ldots, e_k\},$$

where $e_i$ is the $i$-th *N*-dimensional object of **E**.

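An *m*-partition **P** of **E** breaks **E** into *m* subsets

$$\mathbf{P} = \{P_1, P_2, \ldots, P_m\}$$

satisfying the following:

$$P_i \neq \emptyset \ (1 \le i \le m), \qquad P_i \cap P_j = \emptyset \ (i \neq j), \qquad \bigcup_{i=1}^{m} P_i = \mathbf{E}.$$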
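The partition conditions can be checked mechanically. The following is a minimal sketch, assuming each object is hashable (for example a tuple of *N* coordinates) and adopting the usual convention that no subset is empty; the function name `is_m_partition` is hypothetical and not part of the system described here.

```python
from itertools import combinations


def is_m_partition(E, parts, m):
    """Return True if `parts` is an m-partition of the object set E.

    Assumes objects are hashable (e.g. tuples of N coordinates) and that
    every subset must be non-empty (a standard convention, assumed here).
    """
    E = set(E)
    parts = [set(p) for p in parts]
    if len(parts) != m:
        return False
    # Non-emptiness: every subset contains at least one object.
    if any(len(p) == 0 for p in parts):
        return False
    # Pairwise disjointness: no object lies in two different subsets.
    for a, b in combinations(parts, 2):
        if a & b:
            return False
    # Coverage: the union of the subsets is exactly E.
    return set().union(*parts) == E
```

For example, splitting four 2-dimensional points into the two groups `{(0, 0), (1, 0)}` and `{(5, 5), (6, 5)}` passes the check, whereas letting `(1, 0)` appear in both groups, or leaving it out entirely, does not.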

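A hierarchical clustering may be organized as a tree structure. Let $P_i$ be a component of an *m*-partition **P** of **E**, represented by a node of the tree; the children of that node form a partition of $P_i$. The root of the tree represents **E** itself.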

There is a large body of literature on hierarchical cluster construction. The particular method of tree construction is, however, not relevant to this paper: in principle, any method that builds a tree satisfying the above definitions could serve as the tree construction scheme in our system.
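Because any tree that satisfies the definitions is acceptable, the output of an arbitrary construction method can be validated after the fact. The sketch below assumes the reading that each node stands for a cluster $P_i$ and that the children of every internal node partition that node's objects; `ClusterNode` and `is_valid_hierarchy` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    """Hypothetical tree node: `objects` is the cluster P_i it represents."""
    objects: frozenset
    children: list = field(default_factory=list)


def is_valid_hierarchy(node):
    """Check recursively that every internal node's children partition it.

    Assumed rule: children are non-empty, pairwise disjoint, and together
    cover exactly the parent's objects. Leaves only need to be non-empty.
    """
    if not node.children:
        return len(node.objects) > 0
    seen = set()
    for child in node.children:
        # Reject an empty child or one that overlaps an earlier sibling.
        if not child.objects or seen & child.objects:
            return False
        seen |= child.objects
    # The children together must cover the parent exactly.
    if seen != node.objects:
        return False
    return all(is_valid_hierarchy(c) for c in node.children)
```

A check like this treats the construction method as a black box, which matches the point that the choice of method is interchangeable.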

However, most clustering algorithms are not appropriate for large datasets because they do not account for datasets too large to fit into memory. In such cases, clustering must be performed as accurately as possible with limited resources while keeping I/O costs low. In recent years, a number of algorithms for clustering large datasets have been proposed [2,10,27]. We have adopted the BIRCH clustering algorithm [27] as our primary clustering technique, although our visualization would work equally well with other methods.
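Part of what lets BIRCH [27] work within limited memory is that it summarizes each subcluster by a clustering feature (CF): the point count *n*, the linear sum *LS* of the points, and the sum of squared norms *SS*. CFs are additive, so two subclusters can be merged without revisiting their raw points. The sketch below shows only this summary and its additivity; it is a simplified illustration, not BIRCH itself, and omits the CF tree, the insertion threshold, and node splitting. The class name `CF` and its methods are illustrative choices.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering-feature summary of a subcluster (simplified sketch)."""
    n: int      # number of points in the subcluster
    ls: list    # linear sum of the points, one entry per dimension
    ss: float   # sum of squared Euclidean norms of the points

    @classmethod
    def of_point(cls, x):
        # A single point x is summarized as (1, x, ||x||^2).
        return cls(1, list(x), sum(v * v for v in x))

    def __add__(self, other):
        # Additivity: merging two subclusters adds their summaries.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # Root-mean-square distance of the points from the centroid,
        # computed from (n, LS, SS) alone; clamped against rounding error.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

For the four corners of a 2 × 2 square, `(0, 0)`, `(2, 0)`, `(0, 2)` and `(2, 2)`, the merged summary has `n = 4`, `ls = [4.0, 4.0]` and `ss = 16.0`, giving centroid `[1.0, 1.0]` and radius √2, all computed without keeping the points themselves.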