
How AI Recommends Local Businesses

Princeton KDD 2024 research found that adding statistics to a webpage increases its AI citation rate by 33%. Here's what that means for local businesses whose websites AI crawlers can't read at all.

By Jackson Gordon
  • AI
  • local search
  • GEO
  • business discovery
  • AI crawlers

Princeton's GEO research, published at KDD 2024, measured a 33% lift in AI citation rate from adding statistics to a webpage. For local businesses, that optimization is irrelevant if AI crawlers can't read the page in the first place.

When we audited our own site, nowseen.ai, using the Princeton GEO methodology, it scored 5.7 out of 10 overall, with 0.3 citable claims per 100 words against a benchmark target of 4.0. The culprit wasn't the content. It was the rendering: a client-side JavaScript SPA that returns an empty HTML shell to any crawler that doesn't execute JavaScript. The crawlers behind ChatGPT, Perplexity, and Google's AI systems generally don't execute JavaScript at crawl time. They read HTML.

This is the local business AI visibility problem in a single audit.

What GEO Research Actually Found

The 2024 paper “GEO: Generative Engine Optimization” (Aggarwal et al., KDD 2024, Princeton University) is the most rigorous published study on what makes content get cited by AI systems. The researchers tested specific content interventions against generative search engines and measured citation rate changes. Key findings:

  • Adding quotations from authoritative sources: +41% AI citation rate
  • Adding statistics and numerical data: +33% AI citation rate
  • Improving fluency and readability: +29% AI citation rate

These gains assume the content is crawlable. A JavaScript-rendered site gets none of them, because AI systems never reach the content to evaluate it.

The Rendering Problem Is Systematic

We’ve audited 221 local service businesses through the Seen AI platform. Of those 221 audits, 210 had complete scoring data. The average overall AI visibility score was 66.2 out of 100. Most local businesses passed the basic readability test (average crawlAccess score of 73.4 out of 100), but struggled on citability: the average answerQuality score was just 60.0, and 32% of sites scored below 50 on that dimension. The pattern is clear. It’s not that AI can’t read local business websites. It’s that those websites give AI nothing worth quoting back to a searcher.

The cause is consistent: local businesses such as plumbers, restaurants, salons, and contractors have increasingly moved to website builders and template platforms that generate client-side React or Vue apps. These look fine in a browser. To an AI crawler, they're blank pages.

Traditional Google search worked around this with a JavaScript rendering queue: Googlebot would fetch a page, see the empty shell, and schedule a second crawl to execute the JavaScript and index the rendered content. The process lagged by days or weeks, but it worked eventually.

AI systems don’t have a rendering queue. They process HTML at crawl time. If the content isn’t in the HTML response, it doesn’t exist.
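
As a rough illustration of what "process HTML at crawl time" means, here is a minimal Python sketch that strips scripts, styles, and tags from a raw HTML response, roughly what a non-JavaScript crawler ends up reading. Both sample pages are hypothetical, and real crawlers parse HTML properly rather than with regexes; this is only a demonstration of the gap.

```python
import re

def visible_text(html: str) -> str:
    """Approximate what a non-JS crawler can read: drop scripts/styles, then strip tags."""
    html = re.sub(r"(?s)<(script|style)[^>]*>.*?</\1>", " ", html)  # remove script/style blocks
    text = re.sub(r"<[^>]+>", " ", html)                            # strip remaining tags
    return re.sub(r"\s+", " ", text).strip()                        # collapse whitespace

# Hypothetical SPA shell: all content lives behind app.js, which never runs for the crawler.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

# Hypothetical statically rendered page: the facts are in the HTML itself.
static_page = "<html><body><h1>Denver Plumbing Co.</h1><p>Licensed in Colorado since 2009.</p></body></html>"

print(repr(visible_text(spa_shell)))   # → '' (the shell exposes no readable text)
print(visible_text(static_page))       # → Denver Plumbing Co. Licensed in Colorado since 2009.
```

The SPA shell yields an empty string: from the crawler's side there is literally nothing to index, let alone cite.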

How AI Recommendation Works

When someone asks an AI assistant to recommend a local plumber, the system draws on two sources: its training data (content crawled before the model’s knowledge cutoff) and, in some cases, real-time retrieval (live web search integrated into the response).

In both cases, the content has to have been in static HTML at some point to be usable. Training data is assembled from web crawls that read HTML. Retrieval systems, such as the Bing index that has powered Perplexity and ChatGPT's web search mode, likewise index HTML content.

The GEO research identifies the content signals AI systems use to decide what to cite. In order of measured impact: statistics and numerical claims, direct quotations from named sources, and fluent, readable prose. A business whose website contains specific, factual, well-written claims in static HTML is substantially more likely to appear in AI-generated recommendations than one with generic marketing copy — and both are ahead of a business whose content is locked inside client-side JavaScript.

What Citability Actually Measures

The GEO methodology scores content on claim density: the number of citable, verifiable factual claims per 100 words. The benchmark target is 4.0 per 100 words. Our nowseen.ai audit returned 0.3 — a gap that reflects how most local business websites are written, not just how they’re rendered.

Most local business homepages say something like: “We’re a trusted, family-owned plumbing company serving the Greater Denver area.” That sentence contains zero citable claims. An AI system can’t cite “trusted.” It can cite: “Licensed in Colorado since 2009. 847 jobs completed in 2024. Average response time: 2.3 hours.”
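
That contrast can be made mechanical. The heuristic below is our own crude proxy, not the actual GEO rubric: it treats any sentence containing a number as a citable claim and reports claims per 100 words, using the two example sentences above.

```python
import re

def claim_density(text: str) -> float:
    """Crude proxy for GEO claim density: sentences containing a digit
    count as citable claims; return claims per 100 words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    claims = sum(1 for s in sentences if re.search(r"\d", s))
    words = len(text.split())
    return round(claims / words * 100, 1) if words else 0.0

generic = "We're a trusted, family-owned plumbing company serving the Greater Denver area."
specific = ("Licensed in Colorado since 2009. 847 jobs completed in 2024. "
            "Average response time: 2.3 hours.")

print(claim_density(generic))   # → 0.0
print(claim_density(specific))  # → 20.0
```

A real scorer would also need to verify that the claims are factual and attributable, which no regex can do; but even this toy version separates marketing copy from citable content.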

This distinction — crawlable content versus citable content — is the full scope of the problem. Technical rendering fixes are necessary but not sufficient.

The Fix: Static HTML Plus Citable Content

Addressing AI visibility requires solving both layers:

Layer 1 — Rendering: Content must exist as static HTML. This means either building with a static site generator (this blog uses Astro, which outputs plain HTML files), or pre-rendering a JavaScript app at build time so crawlers receive real content instead of an empty shell.

Layer 2 — Citability: Content must contain specific, verifiable claims. Business hours, service areas, license numbers, years in operation, named staff, real pricing ranges, actual customer counts. The Princeton GEO research finding that statistics increase citation rates by 33% only applies when the statistics exist in the content.
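
The two layers can be checked together. The sketch below is our own illustration, and the thresholds are assumptions rather than Seen AI's actual scoring model: Layer 1 passes only if the raw HTML exposes a reasonable amount of text without JavaScript, and Layer 2 only if that text reaches the 4.0-claims-per-100-words benchmark.

```python
import re

def audit_html(html: str, min_words: int = 50, target_density: float = 4.0) -> dict:
    """Illustrative two-layer audit; thresholds are assumptions, not a real scoring model."""
    # Layer 1: what survives without JavaScript execution.
    text = re.sub(r"(?s)<(script|style)[^>]*>.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    words = text.split()
    # Layer 2: sentences containing a digit count as citable claims.
    sentences = re.split(r"(?<=[.!?])\s+", text) if text else []
    claims = sum(1 for s in sentences if re.search(r"\d", s))
    density = round(claims / len(words) * 100, 1) if words else 0.0
    return {
        "layer1_static_html": len(words) >= min_words,
        "claims_per_100_words": density,
        "layer2_citable": density >= target_density,
    }

# Hypothetical SPA shell: fails both layers at once.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
result = audit_html(spa_shell)
print(result)
```

On the shell, `layer1_static_html` is `False` and density is `0.0`: fixing the rendering alone would move the site to Layer 2, where the writing itself has to supply the claims.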

What This Blog Is

This blog is built as a working example of the fix. Every post is statically rendered — view source on this page and you’ll find the full article text in the HTML, no JavaScript required. It scores significantly higher on readability (9.7/10 in our own GEO audit) than the nowseen.ai main site did when rendered as a SPA (3.7/10), because the content is actually accessible.

The remaining gap — citability, claim density — is what this kind of editorial work is meant to close. Every post that contains real data, named research, and specific claims is practice for what we help local businesses do with their own content.