
How to Use AI Crawl Logs to Find Content Gaps
The conversation around Generative Engine Optimization (GEO) is full of noise. Many experts are focused on chasing net-new topics or obsessing over cheap tactical plays, like spinning up an llms.txt file in hopes of a quick algorithmic win. But if you strip away the gimmicks, you’ll quickly see that GEO still relies heavily on foundational SEO principles. E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) hasn’t gone anywhere. If anything, trust now carries more weight than any other signal, because AI systems are deciding which sources are credible enough to cite.
What is missing from the current conversation is guidance on what teams should do with the data they already have. Instead of guessing what an LLM will discover, value and cite, marketers can take a well-informed, data-backed approach. By pulling server logs, we can see exactly how AI crawlers are interacting with our site right now.
When an AI model repeatedly crawls a specific cluster of your pages, it is showing its hand: it already trusts you as a credible source on that subject. While building out net-new topics is still a necessary strategy, reinforcing the existing topics that already show serious promise is a much higher-leverage play.
I recently ran a hub authority analysis to map this exact behavior. Here is how you can use AI crawl logs to stop guessing and start strategically reinforcing the authority you have already earned.
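The first step is isolating AI crawler hits in your raw access logs. The sketch below assumes logs in the standard Apache/Nginx "combined" format and matches a short, non-exhaustive list of crawler user-agent tokens (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, CCBot); the function name `ai_crawls_per_url` is hypothetical. User-agent strings can be spoofed, so a production version would also check requesting IP addresses against published ranges where vendors provide them.

```python
import re
from collections import Counter

# Assumption: logs use the Apache/Nginx "combined" format.
# The pattern captures the request path and the user-agent string.
COMBINED_LOG = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumption: a non-exhaustive list of AI crawler user-agent tokens.
AI_CRAWLER_TOKENS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "CCBot",
)


def ai_crawls_per_url(lines):
    """Count requests per URL path made by known AI crawler user agents."""
    counts = Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if not match:
            continue  # skip malformed or non-matching lines
        agent = match.group("agent")
        if any(token in agent for token in AI_CRAWLER_TOKENS):
            # Drop query strings so /page?x=1 and /page count as one URL
            path = match.group("path").split("?", 1)[0]
            counts[path] += 1
    return counts
```

Feeding it an iterable of log lines (for example, an open file handle) returns a `Counter` keyed by URL path, which becomes the raw input for the hub and topic analysis.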
I find it wise to look at your site from both a hub perspective and a topic perspective. Reading both layers ensures that a highly successful section doesn’t mask an underperforming one, and conversely, that a single high-demand topic doesn’t get lost inside an otherwise quiet hub.
- Hubs (The Structural Layer): These are your broad site sections or top-level folders. Oftentimes, this reflects a traditional pillar-and-spoke SEO strategy, but it can also just be the natural architecture of your site. In a healthcare or MedTech context, that might be a “Conditions” hub, a “Treatment Options” hub, or a “Patient Resources” hub.
- Topics (The Granular Layer): These are narrower. They are the one- or two-word subjects you can pull directly out of individual URL slugs, regardless of which hub they live in. The AI model reads each slug just as you do, and the core subject is usually right there in the text.
| URL | Topic |
| --- | --- |
| /patient-resources/preventing-surgical-site-infections/ | Surgical infections |
| /patient-resources/advanced-wound-care-techniques/ | Wound care |
| /patient-resources/inpatient-fall-prevention-tips/ | Fall prevention |
Topics are critical because they cut horizontally across your entire site structure. An underserved topic such as wound care could be scattered across three different structural hubs, which makes it invisible if you only look at your data at the hub level. Tagging and analyzing at both levels is the only way to see the full picture.
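Tagging both layers can start as simply as splitting each URL path. This sketch treats the first path segment as the hub and maps the slug to a topic using a hypothetical keyword dictionary (`TOPIC_KEYWORDS`); in practice that mapping would come from reviewing your own slugs, by hand or with a classifier. The function name `tag_url` is also hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical keyword-to-topic map built from the example slugs above.
TOPIC_KEYWORDS = {
    "surgical-site-infection": "Surgical infections",
    "wound-care": "Wound care",
    "fall-prevention": "Fall prevention",
}


def tag_url(url):
    """Return (hub, topic): hub is the top-level folder, topic is the first keyword found in the slug."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    hub = segments[0] if segments else "(root)"
    slug = segments[-1] if segments else ""
    topic = next(
        (name for key, name in TOPIC_KEYWORDS.items() if key in slug),
        "Untagged",  # slugs with no keyword match are flagged for manual review
    )
    return hub, topic
```

Because the topic comes from the slug rather than the folder, the same topic label can surface under several hubs, which is exactly what the hub-level view misses.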
Crawls Per Page: The Metric That Reveals Underserved Demand
When analyzing this data, looking at raw crawl totals is a trap. Raw totals naturally favor whichever section is already the largest, which hides opportunities in smaller sections.
To find the true signal, you have to divide total crawls by your volume of published content:
Crawls per page = total crawls in a hub or topic ÷ number of pages in it
By shifting the metric to a per-page ratio, you surface the sections that are most under-supplied relative to the demand they are receiving.
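Computing the ratio correctly depends on the denominator. In this sketch (function and argument names are hypothetical), page counts come from a full inventory of published pages, such as a sitemap export, rather than from the URLs that happened to appear in the logs. Counting only crawled URLs would ignore pages the models skipped and inflate the ratio.

```python
from collections import defaultdict


def crawls_per_page(crawl_counts, page_groups):
    """
    crawl_counts: {url: AI crawler hits for that URL}
    page_groups:  {url: hub or topic label} for every published page,
                  including pages that were never crawled.
    Returns {group: (pages, total_crawls, crawls_per_page)}.
    """
    pages = defaultdict(int)
    crawls = defaultdict(int)
    for url, group in page_groups.items():
        pages[group] += 1
        crawls[group] += crawl_counts.get(url, 0)  # uncrawled pages add 0
    return {g: (pages[g], crawls[g], crawls[g] / pages[g]) for g in pages}
```

Crawled URLs missing from the inventory (redirects, retired pages) are simply dropped here; whether to map them back to a hub instead is a judgment call.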
Here is a simplified example to show how the story flips once you look at the ratio instead of the raw volume:
Analysis by Hub
| Hub | Pages | Total Crawls | Crawls Per Page |
| --- | --- | --- | --- |
| Patient Resources | 200 | 30,000 | 150 |
| Post-Op Recovery | 4 | 13,000 | 3,250 |
If you only look at the totals, “Patient Resources” looks like your powerhouse. But when you calculate crawls per page, you realize that each of the four post-op recovery pages is pulling more than twenty times the per-page demand of your massive legacy section (3,250 versus 150 crawls per page). The models are practically starving for content there.
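The arithmetic behind this comparison is easy to reproduce. This snippet uses the simplified figures from the hub table and shows how the leader changes when you switch from raw totals to the per-page ratio.

```python
# Simplified figures from the hub table
hubs = {
    "Patient Resources": {"pages": 200, "crawls": 30_000},
    "Post-Op Recovery": {"pages": 4, "crawls": 13_000},
}

ratios = {name: h["crawls"] / h["pages"] for name, h in hubs.items()}
by_total = max(hubs, key=lambda n: hubs[n]["crawls"])  # leader on raw volume
by_ratio = max(ratios, key=ratios.get)                 # leader on crawls per page
multiple = ratios["Post-Op Recovery"] / ratios["Patient Resources"]

print(by_total, by_ratio, round(multiple, 1))
# Patient Resources Post-Op Recovery 21.7
```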
Analysis by Topic
| Topic | Pages | Total Crawls | Crawls Per Page |
| --- | --- | --- | --- |
| Surgical Infections | 3 | 9,800 | 3,267 |
| Wound Care | 1 | 4,100 | 4,100 |
| Fall Prevention | 12 | 2,400 | 200 |
The single wound care page is pulling 4,100 crawls, the highest per-page figure in the table, even though Surgical Infections leads on raw totals. That single-page signal is exactly the kind of structural opening that hub-level reporting hides, and it is one most healthcare brands walk right past.
This pattern shows up constantly: educational, top-of-funnel, explainer-style content does the heavy lifting in AI search. The underserved topics in your logs are almost always the informational questions adjacent to what you sell, rather than your primary conversion or product pages.
How to Read the Signal: Grow, Defend, or Consolidate
Once you’ve ranked everything by crawls per page, three patterns tell you what to do.
High crawls per page, few pages. Grow this hub. This is the headline opportunity. The model trusts you here and you’re underserving real demand. Add content and the model is primed to pull from the new pages too. This is where your next few articles should go.
High crawls per page, many pages. Defend and maintain. You’ve already built strong authority in a well-covered area. That’s a position competitors will come for. Keep it fresh, keep it accurate, and don’t let it erode.
Low crawls per page, many pages. Audit and consolidate. A lot of content earning little demand per page usually means one of three things: thin or redundant pages diluting the signal, genuinely low demand, or quality problems spread across too many URLs. This is an opportunity to prune and consolidate.
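One way to make these three patterns repeatable is to split groups at the median. The median cut-offs below are an assumption, since the analysis does not prescribe thresholds, and the function name `classify` is hypothetical. The fourth combination (low crawls per page, few pages) falls outside the three patterns, so it is left unclassified.

```python
from statistics import median


def classify(groups):
    """
    groups: {name: (pages, crawls_per_page)}
    Assumption: "high" vs. "low" ratio and "many" vs. "few" pages are
    judged relative to the median across all groups.
    """
    ratio_cut = median(r for _, r in groups.values())
    page_cut = median(p for p, _ in groups.values())
    actions = {}
    for name, (pages, ratio) in groups.items():
        high_ratio = ratio >= ratio_cut
        many_pages = pages > page_cut
        if high_ratio and not many_pages:
            actions[name] = "Grow"
        elif high_ratio and many_pages:
            actions[name] = "Defend"
        elif many_pages:
            actions[name] = "Audit and consolidate"
        else:
            actions[name] = "Unclassified"  # low ratio, few pages
    return actions
```

Run against the example topic table, Wound Care and Surgical Infections land in Grow and Fall Prevention lands in Audit and consolidate. With only a handful of groups, medians are crude; on a real site you might prefer percentile cut-offs.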
How to Prioritize Your Content Roadmap by Crawls Per Page
The roadmap writes itself from here. Rank every hub and topic by crawls per page, highest first. The entries at the top with the fewest pages are your highest-leverage opportunities. The model has already decided you belong in the answer for those subjects. You just need to show up more.
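As a sketch of that ranking, the snippet below combines the hub and topic figures from the example tables, sorts by crawls per page (highest first), and breaks ties in favor of the entry with fewer pages. The `entries` layout is an illustrative assumption.

```python
# (name, level, pages, total_crawls) from the example tables
entries = [
    ("Patient Resources", "hub", 200, 30_000),
    ("Post-Op Recovery", "hub", 4, 13_000),
    ("Surgical Infections", "topic", 3, 9_800),
    ("Wound Care", "topic", 1, 4_100),
    ("Fall Prevention", "topic", 12, 2_400),
]

# Highest crawls per page first; ties go to the entry with fewer pages
roadmap = sorted(entries, key=lambda e: (-e[3] / e[2], e[2]))

for name, level, pages, crawls in roadmap:
    print(f"{name:<20} {level:<5} {pages:>4} pages  {crawls / pages:>8,.0f} crawls/page")
```

Wound Care tops the list, followed by Surgical Infections and Post-Op Recovery, while the large Patient Resources hub drops to the bottom.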
Why More Content in a Trusted Hub Earns More AI Citations
Three things have to be true at once for this to work, each of which can be validated through crawl data:
- The model has read your content and found it credible. AI systems lean on sources they’ve encountered repeatedly and judged reliable across many queries. Heavy crawl activity in a section is evidence you’ve cleared that bar.
- Users are asking about that topic at volume. A high crawls-per-page ratio means real demand is flowing through a hub you’ve barely built out. You have demand, but need more supply.
- You have room to expand. When you add depth to an area the model already trusts, it’s primed to surface the new pages. Authority compounds within a hub.
Part of the beauty of this approach is that you’re not trying to win ground a federal agency or a top-tier university will always own. You’re finding the specific places the models have already decided you’re a credible source, then meeting the demand that’s sitting there unanswered.
More content in a trusted hub means more citations and greater visibility in LLMs.
The reason I like this analysis is that it’s honest about leverage. Most content planning starts from a wishlist of topics someone wants to rank for. This starts from evidence of where you’ve already earned inclusion.


