Generative Engine Optimization: How to Get Your Business Cited by ChatGPT and Perplexity

Ask ChatGPT "what's the best CRM for a small law firm?" and you don't get ten blue links. You get a paragraph, a recommendation, and — if the model is being careful — a short list of cited sources. The click never happens. The decision does. Generative engine optimization (GEO) is the discipline of making sure your business is the source the model names, quotes, and links to inside that answer.

This is not a rebrand of SEO. It is a second optimization surface that sits on top of the first. Classic SEO still decides whether you rank in Google's blue links and feed Google's AI Overviews. GEO decides whether a large language model — ChatGPT, Perplexity, Gemini, Copilot, Claude, or Grok — reaches for your page when it composes an answer. The mechanics overlap, but the win condition is different: you are no longer fighting for a position, you are fighting to be cited.

This guide walks through where the term came from, the nine tactics with measured impact, the technical signals that make a page citable, and — the part most "GEO" advice skips — how to actually measure whether any of it worked.

Where "GEO" comes from — and why it's measurable

The phrase wasn't invented by a marketing agency. It comes from a 2023 research paper, "GEO: Generative Engine Optimization," by a team from Princeton University, Georgia Tech, The Allen Institute for AI, and IIT Delhi (arXiv:2311.09735). The authors built GEO-bench, a benchmark of 10,000 queries across domains, and tested how different content edits changed a source's visibility inside a generative engine's answer.

Their headline finding: a handful of content changes lifted source visibility in generative engines by up to 40%. The most effective changes in their tests had nothing to do with keyword stuffing; they were adding citations, quotations, and statistics to the source content itself. As the authors put it:

"Our results show that GEO methods can boost the visibility of websites by up to 40% in generative engine responses." — Aggarwal et al., GEO: Generative Engine Optimization (2023)

That matters because it makes GEO falsifiable. You can change a page, re-run the query against the engine, and watch the citation appear or not appear. That is a very different world from "write good content and wait for Google."

The shift is real, and it's fast

Two numbers frame why this is urgent rather than experimental.

First, AI answers suppress clicks. A 2025 Pew Research Center analysis of real browsing behavior found that when a Google search returned an AI summary, users clicked a traditional search result on just 8% of visits, versus 15% when no AI summary appeared — roughly half the click-through (Pew Research Center, 2025). The answer is increasingly the destination, not a waypoint.

Second, AI search is no longer a rounding error. Industry tracking through 2025 showed generative engines like ChatGPT and Perplexity sending a small but rapidly compounding share of referral traffic, with adoption climbing fastest among high-intent commercial and B2B queries — exactly the searches that precede a purchase. Search Engine Land has documented this transition in depth, framing GEO as the practical response marketers need rather than a speculative one (Search Engine Land, 2024).

The takeaway is not "Google is dead." Google handles billions of queries a day and its own AI Overviews run on the same citable-source logic. The takeaway is that one piece of content now has to win two different competitions — the ranking and the citation — and most sites are only built for the first.

The nine GEO tactics, ranked by what they actually lift

The Princeton study tested nine content interventions. Here is what each one is, and the practical version of how to apply it to a real business page.

GEO tactic	What it means in practice	Why engines reward it
Cite sources	Link out to authoritative third parties for every factual claim	Models prefer sources that themselves show provenance
Add statistics	Replace vague claims with specific numbers and ranges	Quantified claims are easier for a model to extract and attribute
Add quotations	Include attributed quotes from named experts or primary research	Quotes signal first-hand reporting, not paraphrase
Improve fluency	Tighten prose; remove hedging and filler	Cleaner passages are easier to lift verbatim into an answer
Authoritative tone	Write declaratively where the facts support it	Confident, sourced phrasing reads as expert
Technical terms	Use the correct domain vocabulary	Matches the entities the model is reasoning over
Keyword coverage	Cover the question's natural language, not just the head term	Improves semantic match to conversational queries
Easy-to-understand	Define jargon; explain the "why"	Broadens the range of queries the page can answer
Unique words	Distinctive, specific phrasing over generic	Reduces interchangeability with competitor pages

The study's strongest performers were the first three — citations, statistics, and quotations — which is why the rest of this playbook treats them as non-negotiable rather than optional polish. A page that makes ten specific, sourced, quantified claims is structurally more citable than a 2,000-word page of confident generalities, even if the second one ranks higher in classic search.
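To make "sourced, quantified claims" something you can count before publishing, here is a rough sketch of a page audit. It is a crude heuristic of our own, not the visibility metric used in the Princeton paper, and the function name `citability_signals` and its regular expressions are assumptions chosen for illustration.

```python
import re

LINK_PATTERN = r"https?://[^\s)\]]+"

def citability_signals(text: str) -> dict:
    """Count rough proxies for citations, statistics, and quotations in plain text or Markdown."""
    # Citations: outbound http(s) links.
    links = re.findall(LINK_PATTERN, text)
    # Remove links first so digits inside URLs are not counted as statistics.
    without_links = re.sub(LINK_PATTERN, " ", text)
    # Statistics: numbers with optional thousands separators, decimals, and a percent sign.
    stats = re.findall(r"\b\d[\d,]*(?:\.\d+)?%?", without_links)
    # Quotations: spans of 20+ characters wrapped in straight or curly double quotes.
    quotes = re.findall(r'["“][^"“”]{20,}["”]', text)
    return {"citations": len(links), "statistics": len(stats), "quotations": len(quotes)}

sample = (
    'Users clicked on 8% of visits, versus 15% without a summary '
    '(https://example.com/study). "We measured this across a large panel '
    'of real browsing sessions," said the lead author.'
)
print(citability_signals(sample))
```

A page that scores zero on all three is the "confident generalities" case described above. A high score does not guarantee a citation; it only tells you the raw material is there.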

What makes a page technically citable

Content quality gets you considered. Technical accessibility gets you parsed. Five signals do most of the work.

1. Let the AI crawlers in. Generative engines are addressed in robots.txt through named tokens. GPTBot (OpenAI), PerplexityBot, ClaudeBot and anthropic-ai (Anthropic), and CCBot (Common Crawl, which seeds many training sets) are crawler user agents that fetch pages. Google-Extended and Applebot-Extended are control tokens: they do not fetch anything themselves, but they govern how Google and Apple may use content their regular crawlers already collect. If your robots.txt blocks these tokens, or your firewall silently challenges the crawlers, you are invisible to those engines no matter how good the content is. Audit the allowlist deliberately; OpenAI publishes the GPTBot ranges and behavior in its bot documentation.
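One way to audit the allowlist is to parse your robots.txt with Python's standard `urllib.robotparser` and ask it about each agent. The sketch below assumes you already have the robots.txt text in hand; `audit_robots` and the example URL are hypothetical names for illustration.

```python
from urllib.robotparser import RobotFileParser

# Agent tokens named in this guide.
AI_AGENTS = [
    "GPTBot", "PerplexityBot", "Google-Extended", "ClaudeBot",
    "anthropic-ai", "CCBot", "Applebot-Extended",
]

def audit_robots(robots_txt: str, page_url: str, agents=AI_AGENTS) -> dict:
    """Return {agent: True/False} for whether robots.txt lets each agent fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in agents}

robots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
for agent, allowed in audit_robots(robots, "https://example.com/guide").items():
    print(f"{agent:20} {'allowed' if allowed else 'BLOCKED'}")
```

This only covers robots.txt rules. A firewall or bot-protection layer that challenges these agents will not show up here, so check your server logs or WAF rules separately.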

2. Publish an llms.txt. The llms.txt proposal, introduced by Jeremy Howard in 2024, is a Markdown file at your site root that gives models a clean, curated map of your most important pages and a plain-language description of what your business does. It is the AI-era analogue of an XML sitemap: where the sitemap is for crawlers indexing everything, llms.txt is for models that want the high-signal summary fast. Adoption is still early, but it is cheap insurance and a clear provenance signal.
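As a minimal sketch of what such a file can look like, the snippet below assembles one following the layout in the llms.txt proposal: an H1 title, a blockquote summary, then H2 sections of annotated links. The business name, URLs, and the `build_llms_txt` helper are all made up for illustration.

```python
def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body: '# name', '> summary', then '## section' link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

print(build_llms_txt(
    "Example Law CRM",
    "Practice-management software for small law firms.",
    {"Guides": [("Pricing", "https://example.com/pricing", "Plans and per-seat costs")]},
))
```

Serve the result as plain text at `/llms.txt`. Keep the list short: the point is a curated map, not a second sitemap.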

3. Ship structured data. Schema.org JSON-LD — Article, FAQPage, Organization, LocalBusiness, BreadcrumbList — gives the engine machine-readable facts instead of asking it to infer them from prose. Google's own guidance is explicit that structured data helps its systems understand page content, and the same markup that earns rich results in classic search gives generative engines an unambiguous fact source (Google Search Central). For a local business, a correct LocalBusiness block with name, address, phone, and aggregateRating is one of the highest-leverage things you can ship, because it directly counters the "the AI quoted the wrong rating" failure mode.
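For illustration, a LocalBusiness block can be generated from the same record your site already uses for its contact page, which keeps the markup and the visible page in sync. The sketch below uses Schema.org's `LocalBusiness`, `PostalAddress`, and `AggregateRating` types; the helper name and all business details are hypothetical.

```python
import json

def local_business_jsonld(name, street, city, region, postal_code, phone,
                          rating, review_count) -> str:
    """Return a <script> tag containing LocalBusiness JSON-LD for the given record."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")

print(local_business_jsonld("Acme Legal CRM", "12 Main St", "Austin", "TX",
                            "78701", "+1-512-555-0100", 4.7, 132))
```

Only publish a rating you can back up with real reviews on the page; markup that disagrees with the visible content creates exactly the conflict you are trying to avoid.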

4. Render content server-side. If the body of your page only appears after client-side JavaScript runs, a fetch-only crawler may see an empty shell. Bing's webmaster guidelines have long recommended that critical content be present in the initial HTML response so it can be reliably indexed (Bing Webmaster Guidelines) — and the same principle protects you with AI crawlers that don't execute JavaScript.
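A quick way to test this is to look at the raw HTML the server returns and check whether your key sentence is in it before any script runs. The sketch below works on an HTML string you have already fetched (for example with `curl` or `urllib.request`); the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def present_without_js(raw_html: str, phrase: str) -> bool:
    """True if phrase appears in the HTML's text content without executing JavaScript."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())  # collapse whitespace
    return phrase.lower() in text.lower()

client_shell = ('<html><body><div id="root"></div>'
                '<script>render("Best CRM for small law firms")</script></body></html>')
server_rendered = "<html><body><h1>Best CRM for small law firms</h1></body></html>"
print(present_without_js(client_shell, "best crm for small law firms"))
print(present_without_js(server_rendered, "best crm for small law firms"))
```

If the check fails on a page that looks fine in your browser, the content is being injected client-side and a fetch-only crawler is probably seeing the empty shell.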

5. Build entity consistency. Engines reason about entities — your brand, its location, its category — not just strings. When your name, address, phone, and description match across your site, your structured data, and third-party directories, the model gets a coherent picture and is more willing to assert facts about you. When they conflict, it hedges or omits you.
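A consistency check can be as simple as normalizing the name, address, and phone from each place they appear and flagging any field that disagrees. The sketch below assumes you have collected those records by hand or from your own exports; it also assumes 10-digit North American phone numbers, and the source labels and business details are invented.

```python
import re

def normalize(record: dict) -> dict:
    """Lowercase, strip punctuation, and reduce the phone to its last 10 digits."""
    return {
        "name": " ".join(record["name"].lower().split()),
        "address": " ".join(re.sub(r"[^\w\s]", "", record["address"].lower()).split()),
        # Assumption: 10-digit national numbers, so a leading country code is dropped.
        "phone": re.sub(r"\D", "", record["phone"])[-10:],
    }

def find_conflicts(records: dict) -> dict:
    """records maps source -> raw record; returns field -> {source: value} where sources disagree."""
    normalized = {src: normalize(rec) for src, rec in records.items()}
    conflicts = {}
    for field in ("name", "address", "phone"):
        values = {src: rec[field] for src, rec in normalized.items()}
        if len(set(values.values())) > 1:
            conflicts[field] = values
    return conflicts

records = {
    "website":   {"name": "Acme Legal CRM", "address": "12 Main St., Austin, TX", "phone": "(512) 555-0100"},
    "json_ld":   {"name": "acme legal crm", "address": "12 Main St, Austin TX",   "phone": "+1 512-555-0100"},
    "directory": {"name": "Acme Legal CRM", "address": "12 Main St, Austin, TX",  "phone": "512-555-0199"},
}
print(find_conflicts(records))
```

Here only the directory's phone number disagrees, which is the kind of quiet mismatch that never surfaces in a classic SEO audit.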

A repeatable GEO workflow

Tactics are useless without a loop. Here is the cycle that actually moves citations, in order.

1. Pick a striking-distance query. Start with conversational, commercial questions where you already rank on page one or two of classic search: you have proven relevance, so the citation is within reach. "Best [category] for [audience]" and "[problem] solution" phrasings convert.
2. Write the citable answer. Cover the question directly in the first 150 words, then go deep. Apply the top three Princeton signals hard: cite a real source for every claim, quantify everything you can, and quote at least one named authority.
3. Stamp the schema. Add Article + FAQPage JSON-LD so the engine can extract your Q&As verbatim. FAQ markup is unusually effective because it pre-formats answers in exactly the shape an engine wants to lift.
4. Open the crawlers. Confirm the page isn't blocked for GPTBot, PerplexityBot, Google-Extended, and ClaudeBot, and that it's listed in llms.txt.
5. Measure the citation. Run the target query against each engine and record whether you're named, linked, or absent, and who got cited instead. This is the step that converts GEO from faith to feedback.
6. Iterate. If a competitor was cited and you weren't, read their page through the nine-tactic lens. Usually they have more sourced specifics. Close that gap and re-test.
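The measurement step in the loop above comes down to recording one of three outcomes per answer, plus who was cited instead. Here is a minimal sketch of that record, assuming you have already copied the answer text and its cited URLs out of the engine; the brand, domains, and function names are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass
class CitationCheck:
    status: str        # "linked", "named", or "absent"
    competitors: list  # other domains the answer cited

def _domain(url: str) -> str:
    """Hostname of url, lowercased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def classify_answer(answer_text: str, cited_urls: list, brand: str,
                    brand_domain: str) -> CitationCheck:
    domains = [_domain(u) for u in cited_urls]
    if brand_domain in domains:
        status = "linked"   # the answer links to your site
    elif brand.lower() in answer_text.lower():
        status = "named"    # mentioned by name, but not linked
    else:
        status = "absent"
    competitors = sorted({d for d in domains if d and d != brand_domain})
    return CitationCheck(status, competitors)

check = classify_answer(
    "For small firms, Acme and Rival are common picks.",
    ["https://www.rival.example/blog/law-firm-crm"],
    brand="Acme",
    brand_domain="acme.example",
)
print(check)
```

The competitor list is the input to the iterate step: those are the pages to read through the nine-tactic lens.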

Measuring it: the part you can't do by hand

Steps 1 through 4 are content work you can do by hand, one page at a time. Step 5 is the one that breaks down manually, because AI answers are non-deterministic: ask the same question twice and the wording, and sometimes the citations, change. A single check tells you almost nothing; you need the same prompt run repeatedly, across multiple engines, tracked over time, so a real trend separates from run-to-run noise.
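To see why one sample is not enough, it helps to put an uncertainty range around the citation rate. The sketch below uses the Wilson score interval, a standard choice for proportions from small samples; picking it here is our assumption, not something the engines or the Princeton paper prescribe, and `citation_rate` is a hypothetical helper.

```python
import math

def citation_rate(outcomes, z: float = 1.96):
    """Return (rate, low, high): observed citation rate and its ~95% Wilson interval.

    outcomes is a list of booleans, one per run of the same prompt (True = cited).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one run")
    p = sum(outcomes) / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)

single = citation_rate([True])                      # one lucky check
batch = citation_rate([True] * 6 + [False] * 14)    # 6 citations in 20 runs
for label, (rate, low, high) in (("1 run", single), ("20 runs", batch)):
    print(f"{label}: rate={rate:.2f}, plausible range {low:.2f} to {high:.2f}")
```

A single cited run is compatible with a true rate anywhere from about one in five up to always; twenty runs narrow that range to something you can actually track from week to week.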

That is the gap purpose-built AI-visibility tracking fills. Instead of one human asking ChatGPT one question once, a tracker runs a stable prompt set across ChatGPT, Perplexity, Gemini, Copilot, Claude, and Grok on a schedule, logs every mention and citation, and charts your share of voice against competitors over weeks. The signal you're after isn't "did I get cited today" — it's "is my citation rate climbing as I ship more sourced content." BrightLocal's research on local consumer behavior has shown for years how heavily buyers lean on what third parties say about a business; AI engines now compress all of that into a single sentence, which makes monitoring that sentence the whole game (BrightLocal Local Consumer Review Survey).
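As a rough sketch of the share-of-voice number such a tracker charts, the snippet below takes a log of prompt runs and computes, per week, your brand's share of all brand citations pooled across engines. The log format, week labels, and brand names are assumptions for illustration, not any particular tool's schema.

```python
from collections import Counter, defaultdict

def share_of_voice(log, brand: str) -> dict:
    """log: iterable of (week, engine, cited_brands), one entry per prompt run.

    Returns {week: brand's citations / all brand citations that week}.
    """
    cited = defaultdict(Counter)  # week -> brand -> number of runs citing it
    for week, _engine, brands in log:
        cited[week].update(set(brands))  # count each brand once per run
    result = {}
    for week in sorted(cited):
        total = sum(cited[week].values())
        result[week] = cited[week][brand] / total if total else 0.0
    return result

log = [
    ("2025-W01", "chatgpt",    ["rival"]),
    ("2025-W01", "perplexity", ["rival", "acme"]),
    ("2025-W02", "chatgpt",    ["acme"]),
    ("2025-W02", "perplexity", ["acme", "rival"]),
]
for week, share in share_of_voice(log, "acme").items():
    print(week, f"{share:.0%}")
```

In this toy log the brand goes from one citation in three to two in three. With real data you would want many more runs per week before reading anything into a change that size.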

The discipline is the same one classic SEO learned a decade ago: you cannot improve what you do not measure, and you cannot measure a noisy signal with a single sample.

Five mistakes that quietly cost you citations

Most businesses don't fail at GEO because they did something wrong — they fail because of omissions that never show up in a classic SEO audit.

Blocking the crawlers by accident. The most common failure is a robots.txt or a web-application firewall that challenges or blocks AI user agents. Many sites added a blanket Disallow for AI crawlers in 2023 over training-data concerns, then never revisited it, and some of those rules also shut out the agents that fetch pages at answer time. Decide deliberately: blocking training crawls (GPTBot, in OpenAI's case) and allowing answer-time retrieval (OAI-SearchBot and ChatGPT-User) are different choices, and OpenAI documents both behaviors in its bot guidance.
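To make "different choices" concrete, here is a sketch that writes a robots.txt blocking a training crawler while allowing answer-time agents, then verifies the result with the standard library parser. It assumes the agent names OpenAI lists in its crawler documentation (GPTBot for training, OAI-SearchBot and ChatGPT-User for retrieval); confirm the current names there before relying on them. The helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

TRAINING_AGENTS = ["GPTBot"]                          # assumed: training crawl
RETRIEVAL_AGENTS = ["OAI-SearchBot", "ChatGPT-User"]  # assumed: answer-time retrieval

def build_robots_txt(block, allow) -> str:
    """One group per agent: blocked agents get 'Disallow: /', allowed ones 'Allow: /'."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in block]
    groups += [f"User-agent: {agent}\nAllow: /" for agent in allow]
    return "\n\n".join(groups) + "\n"

def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

robots = build_robots_txt(TRAINING_AGENTS, RETRIEVAL_AGENTS)
print(robots)
for agent in TRAINING_AGENTS + RETRIEVAL_AGENTS:
    print(agent, can_fetch(robots, agent, "https://example.com/guide"))
```

Whether to block training at all is a business decision. The point of the sketch is only that the two decisions can be made separately instead of with one blanket rule.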

Confident claims with no source. A page that asserts "we're the leading provider" gives a model nothing to attribute. A page that says "independent testing measured a 40% lift (source)" gives it a quotable, attributable fact. The Princeton data is blunt on this: sourced, quantified passages are the ones engines lift.

Conflicting business facts. When your phone number on the site differs from your Google Business Profile, and your category differs from your directory listings, the engine can't resolve the entity confidently — so it hedges or omits you. BrightLocal's consumer research has repeatedly shown how much weight buyers put on consistent, corroborated business information, and engines now apply the same standard programmatically (BrightLocal).

Thin FAQ coverage. Conversational queries are the bread and butter of AI search, and an FAQPage-marked Q&A is the single most extractable content block you can ship. A page with six real, specific Q&As answers six more conversational prompts than a page with none — and each one is a fresh citation opportunity.
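A sketch of the markup itself: the snippet below turns a list of question and answer pairs into Schema.org `FAQPage` JSON-LD using the `Question` and `acceptedAnswer` structure. The helper name and the sample Q&A are made up; the answers in the markup should match the text visible on the page.

```python
import json

def faq_jsonld(pairs) -> str:
    """Build FAQPage JSON-LD from an iterable of (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)

faq = faq_jsonld([
    ("Does the CRM support trust accounting?",
     "Yes. Trust ledgers are tracked per client and per matter."),
    ("Is there a per-seat minimum?",
     "No. Plans start at a single seat."),
])
print(faq)
```

Embed the output in a `<script type="application/ld+json">` tag on the same page as the visible Q&As.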

Treating it as one-and-done. Citations drift. An engine that named you this month may name a competitor next month after they publish a stronger sourced page. GEO is a maintenance discipline, not a launch task, which is exactly why the measurement loop matters more than any single edit.

Where this is heading

GEO today rewards the boring fundamentals — real sources, real numbers, real quotes, clean markup, open crawlers. That is good news, because it means the businesses that win AI citations are the ones doing genuinely trustworthy work, not the ones gaming a ranking signal. The Princeton authors were explicit that their methods improve legitimate visibility rather than manipulate it, which is exactly why engines reward them.

The practical move is to stop treating "AI search" as a separate future project. Every page you publish from here on should be built to win both competitions at once: structured and sourced enough to rank, citable and quotable enough to be named in the answer. Then track the citation, because in a world where the click is optional, the citation is the conversion.

Frequently asked questions

What is generative engine optimization (GEO)? GEO is the practice of optimizing content and technical signals so that generative AI engines — like ChatGPT, Perplexity, and Gemini — cite, quote, or recommend your business inside their answers. It complements classic SEO: SEO targets search rankings and clicks, while GEO targets being named as a source in the AI-generated response itself.

Is GEO different from SEO? Yes, though they overlap. SEO optimizes for ranking in a list of links; GEO optimizes for being selected and cited when a model composes a single answer. Many SEO fundamentals (crawlability, structured data, authoritative content) help both, but GEO adds an emphasis on inline citations, statistics, quotations, and clean machine-readable facts that engines can extract and attribute.

How do I get cited by ChatGPT and Perplexity? Make your pages easy to parse and worth quoting: cite authoritative sources for every claim, include specific statistics and attributed expert quotes, add Schema.org JSON-LD (Article + FAQPage), allow the AI crawlers (GPTBot, PerplexityBot, Google-Extended, ClaudeBot) in robots.txt, and publish an llms.txt. Then run the target query against each engine to confirm whether you were cited.

What is llms.txt and do I need one? llms.txt is a Markdown file at your site root that gives AI models a curated map of your most important pages plus a plain-language description of your business. It is the AI-era counterpart to a sitemap. Adoption is early, but it is inexpensive to add and provides a clear provenance signal, so it is worth shipping.

How do I measure whether AI engines actually cite my business? Run a fixed set of prompts across multiple engines on a recurring schedule and log every mention and citation. Because AI answers are non-deterministic, a single check is unreliable — you need repeated, multi-engine tracking over time to see a real trend. Purpose-built AI-visibility trackers automate this and show your citation share versus competitors.
