What is Clustering?
Given a set of points, we define a notion of distance between them and then group the points into some number of clusters (a cluster is called “簇” in Chinese), so that points in the same cluster are close to one another.
Distance: mainly Euclidean or Jaccard
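To make the two distance measures concrete, here is a minimal sketch. The function names `euclidean` and `jaccard_distance` are illustrative choices, not taken from the slides, and treating two empty sets as distance 0 is an assumption.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two sets.

    Assumption: two empty sets are treated as identical (distance 0).
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Euclidean distance suits points given as coordinate vectors; Jaccard distance suits points given as sets (for example, the set of words in a document).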
Why is Clustering hard?
Too many dimensions: in a high-dimensional space almost all pairs of points are about equally far apart, so every point looks isolated
Two main methods:
Hierarchical: bottom-up (agglomerative) or top-down (divisive)
Point assignment: assign each point to an existing cluster
Hierarchical (operations between clusters):
Key questions:
1. How to represent a cluster.
2. How to determine the nearness of clusters.
3. When to stop merging clusters.
Euclidean case:
represent each cluster by its centroid (the average of its points)
(1) distance between centroids
(2) shortest distance between two points, one from each cluster
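The Euclidean case can be sketched as a bottom-up merge loop. This is only an illustration: the names `agglomerate`, `centroid_distance` and `single_link_distance` are hypothetical, and stopping once `k` clusters remain is one assumed stopping rule among several possible ones.

```python
import math


def centroid(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distance(c1, c2):
    """Option (1): distance between the two cluster centroids."""
    return math.dist(centroid(c1), centroid(c2))


def single_link_distance(c1, c2):
    """Option (2): shortest distance between a point of c1 and a point of c2."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerate(points, k, cluster_distance=centroid_distance):
    """Merge the two nearest clusters until only k remain (assumed stopping rule)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]  # start with every point as its own cluster
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

The naive pair search is quadratic in the number of clusters at every step, which is fine for a small example but not for large data sets.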
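Non-Euclidean case:
Approach 1
choose a clustroid (an existing point) to represent the cluster
pick the point that minimizes the maximum / average / sum of squares of its distances to the other points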
various distance and cohesion measures
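Choosing a clustroid under Approach 1 might look like the sketch below. The function name `clustroid` and the criterion labels `"max"`, `"avg"` and `"sum_sq"` are made up for illustration; `distance` is any user-supplied distance function, such as a Jaccard distance.

```python
def clustroid(cluster, distance, criterion="max"):
    """Return the existing point of `cluster` with the smallest aggregate distance.

    criterion: "max" (maximum), "avg" (average) or "sum_sq" (sum of squares)
    of the distances from the candidate to the points of the cluster.
    The candidate's zero distance to itself is included; it does not change
    which point wins, because it affects every candidate in the same way.
    """
    aggregators = {
        "max": max,
        "avg": lambda ds: sum(ds) / len(ds),
        "sum_sq": lambda ds: sum(d * d for d in ds),
    }
    aggregate = aggregators[criterion]
    return min(
        cluster,
        key=lambda cand: aggregate([distance(cand, x) for x in cluster]),
    )
```

Unlike a centroid, the result is always one of the original points, so it works even when an "average point" has no meaning. Ties are broken in favor of the point that appears first in the cluster.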
Approach 2
represent the cluster by the collection of its points.
define an inter-cluster distance.
minimum distance between two points (one from each cluster), or average over all such pairs.
Approach 3
represent the cluster by the collection of its points.
define a notion of cohesion; merge the clusters whose union is most cohesive.
diameter, average distance, density
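As a rough illustration of Approach 3, the sketch below scores a candidate merge by the cohesion of the union and picks the best pair. The names `diameter`, `average_distance` and `best_merge` are hypothetical, a lower score is assumed to mean a more cohesive cluster, and the density measure is left out because it needs a notion of volume that the notes do not specify.

```python
from itertools import combinations


def diameter(cluster, distance):
    """Largest distance between two points of the cluster (0 for a single point)."""
    return max((distance(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def average_distance(cluster, distance):
    """Average distance over all pairs of points (0 for a single point)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(distance(p, q) for p, q in pairs) / len(pairs)


def best_merge(clusters, distance, cohesion=diameter):
    """Indices (i, j) of the two clusters whose union has the lowest cohesion score."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: cohesion(clusters[ij[0]] + clusters[ij[1]], distance),
    )
```

A merge loop would call `best_merge` repeatedly and could stop once the score of the best union exceeds some threshold.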
3.
design 1: convex clusters
design 2: concentric clusters.
K-Means (point assignment)
definition: a method, not an algorithm.
method: until convergence (no point changes cluster), assign each point to its nearest centroid and update the centroids
select k: try different values of k and pick the one at which the average distance to the centroid stops dropping sharply.
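The assign-and-update loop and the selection of k can be sketched as follows. This is a plain-Python illustration under stated assumptions: the names `kmeans` and `average_distance_to_centroid`, the optional `init` parameter, the random choice of initial centroids and the `max_iter` cap are all illustrative choices, not part of the original notes.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Return (centroids, assignment) for Euclidean points given as tuples.

    init: optional list of k starting centroids; by default k distinct
    input points are sampled at random (an assumed initialization).
    """
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


def average_distance_to_centroid(points, centroids, assignment):
    """Average distance from each point to the centroid of its cluster."""
    total = sum(math.dist(p, centroids[a]) for p, a in zip(points, assignment))
    return total / len(points)
```

To select k, one could run `kmeans` for k = 1, 2, 3, ... and record `average_distance_to_centroid` for each run; the value of k after which the average stops falling sharply is the candidate to keep.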
https://ssk015.github.io/dashuju3/
Click here Slide(pdf)