Aggregation on k-means clustering #31852
Replies: 2 comments 4 replies
-
If I understand correctly, you want to run DBSCAN on Milvus to cluster the whole dataset. (k-means requires you to specify K, but in your case you don't know how many categories you have.) How many embeddings do you have? If fewer than 10 million, Faiss on a single machine could solve this quite simply. If the number of vectors is huge, we can definitely help with that.
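To illustrate why DBSCAN does not need K up front, here is a minimal pure-Python sketch of the algorithm on toy 2-D points. It is not how Milvus or Faiss implement anything; the function name `dbscan`, the parameters `eps` and `min_samples`, and the sample data are illustrative assumptions, and the brute-force neighbor search is O(n²), so a real dataset would need a library implementation (scikit-learn's `DBSCAN`, for example).

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        # Brute-force radius search; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is also a core point: keep expanding through it
    return labels

# Toy data (assumed): two tight groups and one outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.5, min_samples=3)
print(labels)
```

The number of clusters falls out of the density parameters rather than being chosen in advance, which is the property that matters when the number of story themes is unknown.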
-
We are working on a distributed DBSCAN, but it is really best suited to large datasets.
-
An unusual requirement, this one, if you can solve it.
I have a database of unstructured, chatty social media posts (no hashtags, even) and want to identify trending stories, whether up or down. I want to embed each story, group the embeddings into clusters of similar stories, and identify whether the clusters are shrinking or growing over time. This is similar to an SQL GROUP BY query with COUNT.
I am looking at k-means scatter graphs and thinking this kind of clustering is perfect. If I can check over time which clusters are growing or shrinking, then I can look at posts near the centroids to identify the themes of the clusters.
I am not looking for a scatter chart visualisation though, just a list of movers, e.g. "vector db milvus up 12% on last week".
Knowing that the Milvus index stores vectors as k-means clusters, is there a way I can pull that information directly from the index and aggregate the counts by cluster size, or something along those lines? This may be a bad idea; I'm open to suggestions.
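The "group by with count" idea above can be sketched outside the index: assign each post embedding to its nearest centroid, count per cluster for each week, and report the percentage change. This is a minimal pure-Python sketch under stated assumptions: the centroids are fixed in advance (for instance from a k-means run), the embeddings are toy 2-D tuples, and the names `nearest_centroid`, `cluster_counts`, and `movers` are hypothetical helpers. It does not read anything from a Milvus index.

```python
import math
from collections import Counter

def nearest_centroid(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))

def cluster_counts(vectors, centroids):
    """Equivalent of GROUP BY cluster with COUNT(*)."""
    return Counter(nearest_centroid(v, centroids) for v in vectors)

def movers(last_week, this_week, centroids):
    """Rows of (cluster, previous count, current count, % change), biggest movers first."""
    prev = cluster_counts(last_week, centroids)
    curr = cluster_counts(this_week, centroids)
    rows = []
    for c in range(len(centroids)):
        if prev[c] == 0:
            continue  # no baseline, so a percentage change is undefined
        pct = 100.0 * (curr[c] - prev[c]) / prev[c]
        rows.append((c, prev[c], curr[c], pct))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)

# Toy data (assumed): two fixed centroids and two weeks of post embeddings.
centroids = [(0.0, 0.0), (10.0, 10.0)]
last_week = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0)]
this_week = [(0.0, 1.0), (1.0, 0.0), (0.0, 0.0), (1.0, 1.0), (10.0, 9.0)]

result = movers(last_week, this_week, centroids)
for cluster, before, after, pct in result:
    print(f"cluster {cluster}: {before} -> {after} ({pct:+.0f}% on last week)")
```

Keeping the centroids fixed between weeks is what makes the counts comparable; re-clustering every week would change which theme each cluster number refers to.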