Big Data Analysis Notes 3: Clustering #7

Open
SSK015 opened this issue May 29, 2023 · 0 comments

SSK015 commented May 29, 2023

https://ssk015.github.io/dashuju3/

Slides (PDF): click here

What is Clustering?
Given a set of points and a notion of distance between them, clustering groups the points into some number of clusters (“簇” in Chinese), so that points in the same cluster are close together and points in different clusters are far apart.

distance: mainly Euclidean (for vectors) or Jaccard (for sets)
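
To make the two distance measures concrete, here is a minimal sketch using only the Python standard library. The function names `euclidean` and `jaccard_distance` are illustrative choices and do not come from the slides.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets (taken as 0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumption: two empty sets count as identical
    return 1.0 - len(a & b) / len(a | b)

print(euclidean((0, 0), (3, 4)))               # 5.0
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```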

Why is Clustering hard?
Too many dimensions: in a high-dimensional space almost every pair of points is about equally far apart, so every point looks isolated
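
The effect of many dimensions can be checked with a small experiment. The sketch below is an illustration only: the uniform unit-cube data, the point count, and the helper name `min_max_distance_ratio` are all assumptions. It compares the smallest and largest pairwise distances; a ratio near 1 means all points are roughly equally far from each other.

```python
import math
import random
from itertools import combinations

def min_max_distance_ratio(dim, n_points=30, seed=0):
    """Ratio of the smallest to the largest pairwise Euclidean distance
    among n_points drawn uniformly from the unit cube in `dim` dimensions."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return min(dists) / max(dists)

low = min_max_distance_ratio(2)      # 2 dimensions
high = min_max_distance_ratio(1000)  # 1000 dimensions
print(low, high)
```

In low dimensions the ratio is small (some pairs are much closer than others); in high dimensions it moves toward 1, which is why "nearest" carries less information there.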

Two main methods:
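Hierarchical: bottom up (agglomerative: repeatedly merge the nearest clusters) or top down (divisive: repeatedly split a cluster)
Point assignment: assign each point to an existing cluster, the one it is nearest to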

Hierarchical (between clusters):
Three questions to settle:

how to represent a cluster.
how to determine the nearness of clusters.
when to stop merging clusters.
Euclidean case:

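represent each cluster by its centroid (the average of its points)
(1) nearness as the distance between centroids
(2) nearness as the shortest distance between two points, one from each cluster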
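
A minimal sketch of bottom-up clustering in the Euclidean case, using option (1), the distance between centroids. Stopping once `k` clusters remain is an assumption made for this example; the notes leave the stopping rule open. The names `centroid` and `agglomerate` are illustrative.

```python
import math
from itertools import combinations

def centroid(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def agglomerate(points, k):
    """Start with one cluster per point and repeatedly merge the two
    clusters whose centroids are closest, until k clusters remain."""
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(centroid(clusters[ij[0]]),
                                     centroid(clusters[ij[1]])),
        )
        clusters[i].extend(clusters[j])  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

result = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(result)  # [[(0, 0), (0, 1)], [(10, 10), (10, 11)]]
```

This naive version recomputes every centroid on every pass, which is fine for a handful of points but slow for large inputs.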

Non-Euclidean case:

Approach 1

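choose a clustroid (an existing point) to represent the cluster: the point “closest” to the other points

“closest” can mean the smallest maximum distance, average distance, or sum of squared distances to the other points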

various distance and cohesion measures
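
The clustroid choice in Approach 1 can be sketched as follows. The function name `clustroid`, the `criterion` strings, and the use of Jaccard distance on small sets are assumptions made for illustration; any distance function can be passed in.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two non-empty sets."""
    return 1.0 - len(a & b) / len(a | b)

def clustroid(points, dist, criterion="sumsq"):
    """Return the existing point that minimises the chosen criterion
    over its distances to all points in the cluster."""
    score = {
        "max": lambda ds: max(ds),
        "avg": lambda ds: sum(ds) / len(ds),
        "sumsq": lambda ds: sum(d * d for d in ds),
    }[criterion]
    # dist(p, p) = 0 is included for every p, which does not change the ranking
    return min(points, key=lambda p: score([dist(p, q) for q in points]))

cluster = [frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({1, 2, 3, 4})]
print(clustroid(cluster, jaccard_distance, "max"))
```

Here the middle set `{1, 2, 3}` is selected under all three criteria, since it is the closest to both of the others.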
Approach 2

treat each cluster as a collection of points.
define an inter-cluster distance.
e.g. the minimum distance between two points, one from each cluster, or the average over all such pairs.
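
The two inter-cluster distances in Approach 2 might look like this. The names `min_link` and `avg_link` are illustrative, and Hamming distance on equal-length strings stands in for an arbitrary non-Euclidean distance.

```python
from itertools import product

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

def min_link(c1, c2, dist):
    """Smallest distance between any two points, one from each cluster."""
    return min(dist(p, q) for p, q in product(c1, c2))

def avg_link(c1, c2, dist):
    """Average distance over all pairs of points, one from each cluster."""
    return sum(dist(p, q) for p, q in product(c1, c2)) / (len(c1) * len(c2))

c1 = ["aaaa", "aaab"]
c2 = ["abbb", "bbbb"]
print(min_link(c1, c2, hamming))  # 2
print(avg_link(c1, c2, hamming))  # 3.0
```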
Approach 3

treat each cluster as a collection of points.
define a notion of cohesion, and merge the clusters whose union is most cohesive.
cohesion measures: diameter, average distance between points, density
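
A sketch of the cohesion idea in Approach 3, covering diameter and average pairwise distance only (density is left out because the notes do not define it). The names `diameter`, `avg_pairwise`, and `best_merge` are assumptions; clusters are plain lists and `dist` is any distance function.

```python
from itertools import combinations

def diameter(cluster, dist):
    """Largest distance between any two points of the cluster."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)

def avg_pairwise(cluster, dist):
    """Average distance over all pairs of points in the cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0

def best_merge(clusters, dist, cohesion=diameter):
    """Indices (i, j) of the two clusters whose union is most cohesive,
    i.e. has the smallest value of the cohesion measure."""
    return min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: cohesion(clusters[ij[0]] + clusters[ij[1]], dist),
    )

def abs_diff(a, b):
    return abs(a - b)

clusters = [[0, 1], [2], [10]]
print(best_merge(clusters, abs_diff))  # (0, 1)
```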
3.
design 1: convex clusters
design 2: concentric clusters.

K-Means (Point assignment)

definition: a method, not an algorithm.

method: until convergence (no point changes cluster any more), repeat two steps: assign each point to its nearest centroid, then update each centroid to the average of its assigned points
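
The assign-and-update loop can be sketched in plain Python. Passing the initial centroids in explicitly is an assumption made to keep the example deterministic; the notes do not say how to initialise them. The function name `kmeans` and the `max_iter` safety cap are also illustrative.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until no point moves."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the average of its members
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids, assignment = kmeans(points, [(0, 0), (10, 10)])
print(centroids)   # [(0.0, 1.0), (10.0, 11.0)]
print(assignment)  # [0, 0, 1, 1]
```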

select k: try different values of k and pick the one at which the average distance to the centroid stops changing dramatically.
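
One way to turn the rule for selecting k into code is sketched below. The 20% threshold `min_relative_drop`, the function names, and the numbers in `example` are all made up for illustration; they are not results from any dataset.

```python
import math

def avg_distance_to_centroid(points, centroids, assignment):
    """Average distance from each point to the centroid it is assigned to."""
    return sum(math.dist(p, centroids[a])
               for p, a in zip(points, assignment)) / len(points)

def pick_k(avg_dist_by_k, min_relative_drop=0.2):
    """Return the first k after which increasing k no longer reduces the
    average distance by at least min_relative_drop (a hypothetical cutoff)."""
    ks = sorted(avg_dist_by_k)
    for k, k_next in zip(ks, ks[1:]):
        current, nxt = avg_dist_by_k[k], avg_dist_by_k[k_next]
        if current == 0 or (current - nxt) / current < min_relative_drop:
            return k
    return ks[-1]

# Hypothetical averages for k = 1..5, invented to show the shape of the curve
example = {1: 10.0, 2: 6.0, 3: 2.0, 4: 1.8, 5: 1.7}
print(pick_k(example))  # 3
```

In practice the curve is usually inspected by eye; a fixed threshold like the one here is only a rough stand-in for "stops changing dramatically".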
