scipy.cluster.vq.kmeans2
scipy.cluster.vq.kmeans2(data, k, iter=10, thresh=1e-05, minit='random', missing='warn', check_finite=True)
Classify a set of observations into k clusters using the k-means algorithm.

The algorithm attempts to minimize the Euclidean distance between observations and centroids. Several initialization methods are included.

Parameters:
- data : ndarray
  An ‘M’ by ‘N’ array of ‘M’ observations in ‘N’ dimensions, or a length ‘M’ array of ‘M’ one-dimensional observations.
- k : int or ndarray
  The number of clusters to form as well as the number of centroids to generate. If minit is ‘matrix’, or if an ndarray is given instead, it is interpreted as the initial centroids to use.
- iter : int, optional
  Number of iterations of the k-means algorithm to run. Note that this differs in meaning from the iter parameter to the kmeans function.
- thresh : float, optional
  (not used yet)
- minit : str, optional
  Method for initialization. Available methods are ‘random’, ‘points’, and ‘matrix’:
  - ‘random’: generate k centroids from a Gaussian with mean and variance estimated from the data.
  - ‘points’: choose k observations (rows) at random from data for the initial centroids.
  - ‘matrix’: interpret the k parameter as a k by N (or length k array for one-dimensional data) array of initial centroids.
- missing : str, optional
  Method to deal with empty clusters. Available methods are ‘warn’ and ‘raise’:
  - ‘warn’: give a warning and continue.
  - ‘raise’: raise a ClusterError and terminate the algorithm.
- check_finite : bool, optional
  Whether to check that the input matrices contain only finite numbers. Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs. Default: True

Returns:
- centroid : ndarray
  A ‘k’ by ‘N’ array of centroids found at the last iteration of k-means.
- label : ndarray
  label[i] is the code or index of the centroid the i’th observation is closest to.
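The parameters and return values described above can be illustrated with a minimal sketch. The synthetic data, the variable names, and the choice of starting centroids are assumptions made for illustration only. Passing explicit starting centroids with minit='matrix' keeps the run deterministic.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic data (an assumption for illustration): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])  # shape (M, N) = (100, 2)

# With minit='matrix', the k argument holds the initial centroids, one row per cluster.
initial_centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

centroid, label = kmeans2(data, initial_centroids, iter=10, minit='matrix')

print(centroid.shape)  # (2, 2): k centroids in N dimensions
print(label.shape)     # (100,): one centroid index per observation
```

Here label[i] indexes the row of centroid assigned to data[i]. Passing an integer k with minit='points' instead would pick the starting centroids at random from the rows of data, so results can then vary between runs.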
