Intent-first clustering: why grouping keywords by intent beats grouping by words

BeKnow Editorial

Most keyword clustering tools in 2026 still do something that was acceptable in 2018 but is now a structural error: they group keywords based on term similarity or SERP URL overlap. This works fine when you just need quantity. It stops working the moment you want your clusters to rank, get cited by LLMs, and build authority. Related reading: Topical Authority in 2026: Why Google Rewards Semantic Coverage Over Individual Keywords.

At BeKnow, we've rewritten clustering logic from scratch around a different principle: intent first, then words. It's called Intent-First Clustering, and it's why editorial calendars generated by our platform produce articles that don't cannibalize each other and that fill clusters instead of just stacking keywords.

What's wrong with traditional clustering

Traditional clustering follows one of these two approaches:

Lexical clustering: groups keywords that share terms. "Best rank tracker" and "free rank tracker" end up together because they share "rank tracker." That seems obvious, and it's wrong: the intents are different (commercial vs transactional).

SERP overlap clustering: groups keywords that share URLs in Google's top 10 results. It is more sophisticated than the lexical approach, but fragile: it depends on the SERPs as they stand today, it rewards sites that are already strong (established SERPs are more stable), and it fails on new or niche keywords where SERPs are volatile.

Both fail for the same reason: they treat keywords as strings, not as manifestations of user needs. But people searching on Google (or asking Perplexity) aren't typing words: they're expressing intentions. And different intentions, even when dressed in similar words, should never be put in the same cluster.
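To make the lexical failure concrete, here is a minimal sketch of term-overlap scoring using the Jaccard index on words. The function name and the 0.5 merge threshold are assumptions for illustration, not the logic of any specific tool; the intent labels are the ones this article assigns to the two keywords.

```python
def token_jaccard(a: str, b: str) -> float:
    """Share of distinct words two keywords have in common (Jaccard index)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Intent labels as used in this article's examples.
keywords = {
    "best rank tracker": "commercial",
    "free rank tracker": "transactional",
}

# Hypothetical threshold a lexical clusterer might use to merge two keywords.
MERGE_THRESHOLD = 0.5

first, second = keywords  # dict keys, in insertion order
score = token_jaccard(first, second)
would_merge = score >= MERGE_THRESHOLD
same_intent = keywords[first] == keywords[second]

print(f"lexical overlap: {score:.2f}")          # lexical overlap: 0.50
print(f"merged by lexical rule: {would_merge}")  # True
print(f"same intent: {same_intent}")             # False
```

The two keywords share two of four distinct words, so a purely lexical rule merges them even though their intents differ.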

The four intent families (and why you only need four)

We debated extensively whether to distinguish 6, 8, or 12 intents. In the end, we returned to four, because everything else is a subcategory.

  1. Informational: the user wants to understand something. "What is topical authority", "how does Perplexity work".

  2. Commercial: the user is evaluating options before buying. "Best rank trackers", "Semrush vs Ahrefs".

  3. Transactional: the user is ready to act. "Buy Semrush", "BeKnow free trial".

  4. Navigational: the user is looking for a specific brand or product. "Search Console login", "BeKnow.io".

Three rigid rules derive from this taxonomy:

  • Never mix different intents in the same cluster. An informational page and a commercial page on the same topic are two different things. They should be written separately, even if the keywords seem close.

  • A cluster hub always has one dominant intent. If the hub tries to be both a guide and a comparison, it fails at both.

  • An informational cluster can "feed" a commercial one via internal links, but it remains a distinct cluster with distinct metrics.
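The first two rules can be checked mechanically once every keyword carries an intent label. The sketch below is one possible check, not BeKnow's implementation: the `Intent` enum and the `mixed_intents` helper are hypothetical names, and the labels applied to the sample keywords follow the examples later in this article.

```python
from collections import Counter
from enum import Enum


class Intent(Enum):
    INFORMATIONAL = "informational"
    COMMERCIAL = "commercial"
    TRANSACTIONAL = "transactional"
    NAVIGATIONAL = "navigational"


def mixed_intents(cluster: dict[str, Intent]) -> set[Intent]:
    """Return every intent in the cluster other than the most common one.

    An empty set means the cluster has a single intent.
    """
    counts = Counter(cluster.values())
    if not counts:
        return set()
    dominant, _ = counts.most_common(1)[0]
    return set(counts) - {dominant}


# A lexically built "rank tracker" cluster, labeled by intent.
lexical_cluster = {
    "best rank tracker": Intent.COMMERCIAL,
    "free rank tracker": Intent.TRANSACTIONAL,
    "how does a rank tracker work": Intent.INFORMATIONAL,
    "rank tracker for agencies": Intent.COMMERCIAL,
    "rank tracker alternatives": Intent.COMMERCIAL,
}

violations = mixed_intents(lexical_cluster)
print(sorted(i.value for i in violations))  # ['informational', 'transactional']
```

Any non-empty result means the cluster should be split before it becomes an editorial plan.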

How Intent-First works in practice

The workflow we've coded into BeKnow has three mandatory steps, in this order. The order is non-negotiable: reversing it reproduces exactly the problems of traditional clustering.

Step 1 — Intent classification for each keyword

Each keyword is passed to a model (Gemini 2.5 Pro for planning in our stack) that assigns it:

  • one of the four intent families

  • an intent value score (how "decisional" it is)

  • a specificity score (how vertical it is)

This is the step most tools skip or approximate. It's what makes the difference between a cluster that ranks and one that stays stuck halfway.
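Whatever model does the classification, its output has to be validated before clustering. The sketch below assumes the model returns one JSON object per keyword and that both scores fall between 0 and 1; the JSON shape, the score range, and the `parse_classification` helper are assumptions for illustration, and the sample values are made up.

```python
import json
from dataclasses import dataclass

INTENT_FAMILIES = {"informational", "commercial", "transactional", "navigational"}


@dataclass(frozen=True)
class ClassifiedKeyword:
    keyword: str
    intent: str
    intent_value: float  # assumed range 0..1: how "decisional" the keyword is
    specificity: float   # assumed range 0..1: how vertical the keyword is


def parse_classification(raw: str) -> ClassifiedKeyword:
    """Parse and validate one (hypothetical) JSON classification response."""
    data = json.loads(raw)
    intent = str(data["intent"]).lower()
    if intent not in INTENT_FAMILIES:
        raise ValueError(f"unknown intent family: {intent!r}")
    scores = {}
    for name in ("intent_value", "specificity"):
        value = float(data[name])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} out of range: {value}")
        scores[name] = value
    return ClassifiedKeyword(keyword=str(data["keyword"]), intent=intent, **scores)


# Illustrative response; the scores are invented for the example.
raw = '{"keyword": "best rank tracker", "intent": "Commercial", "intent_value": 0.8, "specificity": 0.3}'
record = parse_classification(raw)
print(record.intent, record.intent_value, record.specificity)  # commercial 0.8 0.3
```

Rejecting unknown intent labels at this stage keeps a fifth, improvised category from leaking into the later steps.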

Step 2 — Semantic grouping within the same intent

Only at this point does semantic similarity come into play. But we only compare keywords with the same intent: an "informational" keyword never gets put in the same cluster as a "commercial" one, even if they share 90% of the words. Clustering happens through vector embeddings, with distance thresholds calibrated by family (commercial keywords tolerate broader clusters, informational ones need to be kept tight).
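A minimal sketch of this step follows, under several assumptions: the clustering algorithm here is a simple greedy, seed-based pass chosen for brevity (the article does not say which algorithm BeKnow uses), the threshold values are invented, and the 2-dimensional vectors stand in for real embeddings. What it does show is the partition by intent before any distance is computed, and a tighter threshold for informational keywords than for commercial ones.

```python
import math
from collections import defaultdict

# Assumed maximum cosine distance per intent family (illustrative values only).
MAX_DISTANCE = {"informational": 0.15, "commercial": 0.35}
DEFAULT_MAX_DISTANCE = 0.25


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def cluster_by_intent(items):
    """items: iterable of (keyword, intent, embedding).

    Returns {intent: [[keyword, ...], ...]}. Keywords are only ever
    compared with keywords of the same intent.
    """
    by_intent = defaultdict(list)
    for keyword, intent, vector in items:
        by_intent[intent].append((keyword, vector))

    result = {}
    for intent, members in by_intent.items():
        limit = MAX_DISTANCE.get(intent, DEFAULT_MAX_DISTANCE)
        clusters = []  # list of (seed_vector, [keywords])
        for keyword, vector in members:
            for seed, names in clusters:
                if cosine_distance(seed, vector) <= limit:
                    names.append(keyword)
                    break
            else:  # no existing cluster was close enough: start a new one
                clusters.append((vector, [keyword]))
        result[intent] = [names for _, names in clusters]
    return result


# Toy embeddings, invented for the example.
items = [
    ("how does a rank tracker work", "informational", (1.0, 0.0)),
    ("what is a rank tracker", "informational", (0.98, 0.2)),
    ("best rank tracker", "commercial", (0.99, 0.1)),
    ("rank tracker alternatives", "commercial", (0.7, 0.7)),
]
clusters = cluster_by_intent(items)
print(clusters)
```

In this toy data "best rank tracker" sits very close to the informational vectors, yet it never joins them because the comparison is never made. The two commercial keywords are about 0.23 apart, which the looser commercial threshold accepts and the informational threshold would have rejected.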

Step 3 — Hub and spoke selection for clusters

Within each cluster, we select:

  • The hub keyword: the one with the highest intent_value and lowest specificity (the cluster's general intent).

  • The spoke keywords: those with higher specificity (the vertical sub-intents), max 6 as we saw in the Hub & Spoke architecture article.

At this point the cluster is ready to become an editorial plan.
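The selection can be sketched as below. The article names two criteria for the hub (highest intent_value, lowest specificity) without saying how to combine them, so ranking by intent_value minus specificity is an assumption, as are the `Keyword` type, the helper name, and the sample scores.

```python
from dataclasses import dataclass

MAX_SPOKES = 6  # the cap on spokes mentioned above


@dataclass(frozen=True)
class Keyword:
    text: str
    intent_value: float
    specificity: float


def pick_hub_and_spokes(cluster):
    """Return (hub, spokes) for a non-empty list of same-intent keywords."""
    # Assumption: combine the two hub criteria as a single difference score.
    hub = max(cluster, key=lambda k: k.intent_value - k.specificity)
    rest = [k for k in cluster if k is not hub]
    # Most specific keywords first, capped at MAX_SPOKES.
    spokes = sorted(rest, key=lambda k: k.specificity, reverse=True)[:MAX_SPOKES]
    return hub, spokes


# Invented scores for the commercial "rank tracker" keywords.
commercial = [
    Keyword("rank tracker for agencies", intent_value=0.8, specificity=0.7),
    Keyword("best rank tracker", intent_value=0.9, specificity=0.2),
    Keyword("rank tracker alternatives", intent_value=0.7, specificity=0.5),
]
hub, spokes = pick_hub_and_spokes(commercial)
print(hub.text)                  # best rank tracker
print([s.text for s in spokes])  # ['rank tracker for agencies', 'rank tracker alternatives']
```

A different way of weighting the two scores could pick a different hub, so the combination rule is worth making explicit in whatever tool you use.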

What changes in practice: two examples

Traditional example (lexical clustering):

  • "Rank tracker" cluster contains: "best rank tracker", "free rank tracker", "how does a rank tracker work", "rank tracker for agencies", "rank tracker alternatives"

  • Result: one article tries to cover everything. It becomes a confusing guide that ranks none of the keywords well.

Intent-First example:

  • Informational cluster "rank tracker — understanding": "how does a rank tracker work", "what is a rank tracker"

  • Commercial cluster "rank tracker — choosing": "best rank tracker", "rank tracker for agencies", "rank tracker alternatives"

  • Transactional cluster "rank tracker — trying": "free rank tracker", "rank tracker free trial"

Three clusters instead of one. That looks like more work, but it's actually less: each article is vertical, writable in half the time, and each cluster becomes a separate asset that converts (because it speaks to a precise funnel stage).

The merging rules: when two clusters should be combined

Even with Intent-First, edge cases arise: two clusters with the same intent and overlapping keywords. The operational rule is:

  • Same intent + semantic overlap > 70% → merge them.

  • Same intent + semantic overlap between 40% and 70% → keep them separate but put them in a macro-cluster (with the same hub).

  • Same intent + overlap < 40% → independent clusters.

Below 40% overlap, two clusters answer substantially different questions even when they share an intent. Forcing them to merge is as much an error as separating clusters with different intents.
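The thresholds translate directly into a small decision function. This is a sketch: how semantic overlap is measured is not specified here, so the function takes it as a number between 0 and 1, and the function name and return labels are hypothetical.

```python
def merge_decision(intent_a: str, intent_b: str, overlap: float) -> str:
    """Apply the merging rules to two clusters.

    overlap is the semantic overlap between the clusters as a fraction (0..1);
    how it is computed is left to the caller.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError(f"overlap must be between 0 and 1, got {overlap}")
    if intent_a != intent_b:
        return "separate"       # different intents are never mixed
    if overlap > 0.70:
        return "merge"
    if overlap >= 0.40:
        return "macro-cluster"  # separate clusters under the same hub
    return "independent"


print(merge_decision("commercial", "commercial", 0.82))     # merge
print(merge_decision("commercial", "commercial", 0.55))     # macro-cluster
print(merge_decision("commercial", "commercial", 0.20))     # independent
print(merge_decision("commercial", "informational", 0.90))  # separate
```

Note that the intent check comes first: a 90% overlap between clusters with different intents still yields two clusters.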

Why this model also works for answer engines

Worth closing on this point, because it's the near future. Answer engines (Perplexity, ChatGPT Search, Gemini, Copilot) select sources based on two things:

  1. The semantic relevance of content to the query.

  2. The structure of the site hosting the content: dense, well-organized clusters get "read" better.

A site structured for Intent-First has a systematic advantage on the second point. When an answer engine such as Perplexity processes your domain, it can recognize that your informational page sits within a cohesive informational cluster, and that your commercial page sits within a separate commercial cluster. For the model, this means: this site knows what it's talking about, and knows who it's talking to. It's exactly the signal that gets rewarded with citations.

In summary

Intent-First isn't a more sophisticated variant of traditional clustering: it's a paradigm shift. It stops asking "which keywords look similar?" and starts asking "which user needs are the same?". The result is smaller, more vertical, more convertible clusters — and an editorial structure that LLMs and Google read as a signal of true expertise, not just volume.

If you have an editorial calendar generated with traditional clustering, revisiting it by applying intent before words is probably the exercise with the best ROI you can do in the next 30 days. Often just separating two poorly merged clusters is enough to unlock rankings that have been stuck for months.

This closes the cluster on topical authority and strategy: you have the framework (what it is and why it matters), the architecture (Hub & Spoke), damage prevention (cannibalization), and well-built raw materials (Intent-First Clustering). The rest is execution.

