Topic Clustering

You can use the get_topic_clusters function to cluster the titles of all pages in a sitemap. The resulting clusters help you find groups of similar content on your site.

This function generates a visualization of the clusters like this:

[Figure: Topic clusters]

You can hover over each link in the analysis generated by the get_topic_clusters function to see the URL of the page.

get_topic_clusters

Group content into the provided number of clusters.

Parameters:

Name        Type  Description                        Default
topics      list  A list of topics to cluster.       required
n_clusters  int   The number of clusters to create.  2

Returns:

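Name  Type  Description
dict  dict  A dictionary of clusters. Each key is a cluster label converted to a string, and each value is the list of topics assigned to that cluster.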
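
Judging by the source listing at the end of this page, the returned dictionary maps string cluster labels to lists of topics. The following is a minimal sketch of reading such a result. The `clusters` value and its titles are made up for illustration (they are not output from a real run), and `summarize` is a hypothetical helper, not part of seotools.

```python
# Hypothetical value shaped like the return value of get_topic_clusters:
# string cluster labels mapped to lists of topic titles.
clusters = {
    "0": ["How to brew coffee", "Espresso basics"],
    "1": ["Building a sitemap parser"],
}


def summarize(clusters: dict) -> list:
    """Return (label, size) pairs, largest cluster first."""
    return sorted(
        ((label, len(titles)) for label, titles in clusters.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


for label, size in summarize(clusters):
    print(f"cluster {label}: {size} page(s)")
```

Sorting by size is only one way to inspect the result; it can make unusually large or single-page clusters easy to spot.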

Example
from seotools.app import Analyzer

analyzer = Analyzer("https://jamesg.blog/sitemap.xml", load_from_disk=True)

analyzer.visualize_with_embeddings()
Source code in seotools/topics.py
def get_topic_clusters(topics: list, n_clusters: int = 2) -> dict:
    """
    Group content into the provided number of clusters.

    Args:
        topics (list): A list of topics to cluster.
        n_clusters (int): The number of clusters to create.

    Returns:
        dict: A dictionary of clusters.

    Example:
        ```python
        from seotools.app import Analyzer

        analyzer = Analyzer("https://jamesg.blog/sitemap.xml", load_from_disk=True)

        analyzer.visualize_with_embeddings()
        ```
    """
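    # `model` and `cluster` are module-level names defined outside this excerpt.
    # Encode each topic. Because this is a dict, duplicate topics collapse
    # into a single entry.
    embeddings = {topic: model.encode(topic) for topic in topics}

    names = list(embeddings.keys())
    X = list(embeddings.values())

    # Fit k-means on the embeddings; labels_ holds one cluster label per row of X.
    kmeans = cluster.KMeans(n_clusters=n_clusters)
    kmeans.fit(X)

    # Group topic names by cluster label, converting each label to a str key.
    clusters = {}
    for name, label in zip(names, kmeans.labels_):
        clusters.setdefault(str(label), []).append(name)

    return clusters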
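
The grouping step at the end of the function can be tried in isolation, without an embedding model or k-means. The sketch below is an illustration rather than the library's code: `group_by_label` is a hypothetical helper, and the titles and labels are invented stand-ins for the values that `kmeans.labels_` would supply.

```python
def group_by_label(names: list, labels: list) -> dict:
    """Group names under the string form of their cluster label."""
    clusters = {}
    for name, label in zip(names, labels):
        # setdefault creates the list the first time a label is seen.
        clusters.setdefault(str(label), []).append(name)
    return clusters


# Hypothetical inputs: one label per title, in the same order.
names = ["Coffee guide", "Espresso basics", "Sitemap parser"]
labels = [1, 1, 0]

grouped = group_by_label(names, labels)
print(grouped)
```

Keys appear in the order their labels are first seen, so the dictionary is not sorted by label. Because the keys are strings, look up a cluster with `grouped["0"]` rather than `grouped[0]`.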