Engagement Foundation Review | Tonic.ai

GEO Readiness

Where You Stand Today

Before we measure citation visibility in the unstructured data de-identification space, these three signals tell us whether AI crawlers can access and trust Tonic.ai's content.

Technical Readiness

Needs Attention

1 high-severity finding: stale content on high-value content marketing pages (9 of 15 pages scored 0.2 or below on freshness). 6 medium-severity findings across heading hierarchy, sitemap structure, thin content, and content duplication.

Content Freshness

At Risk

Critical finding: 15 content marketing pages average 0.32 freshness, outside the 2–3 month citation window where AI platforms concentrate 76.4% of citations. 9 pages older than 180 days, 3 pages older than 365 days. 3 pages updated within 90 days. 27 product pages with no detectable date — verify manually.

Crawl Coverage

Good

All 7 major AI crawlers (GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Googlebot, Bytespider) are allowed via robots.txt. Sitemap accessible with 1,710 URLs indexed.
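
The crawl coverage check above can be re-run at any time with Python's standard `urllib.robotparser`. The sketch below is illustrative rather than the method used for this review: the `blocked_crawlers` helper and the sample robots.txt are hypothetical, and only the crawler names come from the finding.

```python
from urllib.robotparser import RobotFileParser

# User agents named in the Crawl Coverage finding.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "Google-Extended", "Googlebot", "Bytespider",
]


def blocked_crawlers(robots_txt: str, url: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the agents that the given robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt, used only to exercise the helper.
# It is NOT the content of Tonic.ai's real robots.txt.
SAMPLE_ROBOTS = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE_ROBOTS, "https://www.tonic.ai/"))
```

An empty list means every listed crawler may fetch the URL. To check the live site, the text of the real robots.txt would be passed in place of the sample.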

Executive Summary

What You Need to Know

AI search is reshaping how buyers discover unstructured data de-identification solutions. Companies establishing GEO visibility now gain a compounding advantage as AI platforms learn to trust cited domains. Tonic Textual operates in a market with 5 primary competitors — Private AI, Microsoft Presidio, Google Cloud DLP, Amazon Comprehend/Macie, and Nightfall AI — and 5 buyer personas spanning security, compliance, privacy, data science, and government, with the CISO and VP of Compliance holding veto authority over purchase decisions.

Layer 1 reveals one high-severity finding: "Stale Content on High-Value Content Marketing Pages" — 60% of content marketing pages scored 0.2 or below on freshness, meaning competitors with fresher comparison and guide content will be preferentially cited in AI responses to evaluation-stage queries. Six medium-severity findings include multiple H1 tags on commercial pages, a sitemap missing lastmod dates across all 1,710 URLs, and thin content on core product pages. The technical foundation is otherwise sound — all major AI crawlers are allowed and the site renders substantive content.

Two actions before the validation call: (1) Validate the three newly added personas — DPO/Head of Privacy, VP of Data Science, and Government Records Officer — at the call, as each drives a distinct query cluster (GDPR unstructured compliance, AI training data decontamination, FOIA redaction); if any are irrelevant to Tonic Textual's actual deal motion, we remove their query clusters entirely. (2) Engineering should start on sitemap lastmod dates and the multiple-H1 CMS template fix now — these are structural improvements that don't depend on validation call decisions.

TL;DR — Action Items

🟡 High: Stale Content on High-Value Content Marketing Pages — Content team should prioritize refreshing the 3 pages over 365 days old and adding visible dates to all 4 case studies to restore freshness signal for AI citation.
🟣 Validate at the Call: DPO/Head of Privacy, VP of Data Science, Government Records Officer — These three personas each generate 15-20 intent-specific queries; if any don't appear in actual deals, removing them sharpens the query set and avoids diluting results across irrelevant buyer types.
🟣 Validate at the Call: Linda Park (VP of Compliance) confidence level - This persona is sourced from LLM inference at medium confidence. If her role, department, or veto power is wrong, the compliance-focused query cluster gets mistargeted.
✅ Start Now: Sitemap lastmod dates on all 1,710 URLs — Engineering can add lastmod timestamps without waiting for the call — this restores crawl prioritization for recently updated content across the entire site.
✅ Start Now: Fix multiple H1 tags on 8+ commercial pages — The CMS template issue that outputs multiple H1s per page dilutes topical authority; engineering can audit and fix the template independently.
📋 Validation Call: Are the 5 new competitors the right competitive set for Tonic Textual? — Confirming Private AI, Presidio, Google Cloud DLP, Amazon Comprehend/Macie, and Nightfall AI as the primary set determines which 30-40 head-to-head comparison queries drive the audit.

How This Works

Reading This Document

Three things to know before you start.

What This Is: This document maps the competitive landscape, buyer personas, feature taxonomy, and technical baseline for Tonic.ai in the unstructured data de-identification and document redaction market. Every element directly feeds the query set that the audit will test against AI platforms. If something is wrong here, the audit tests the wrong questions.

What We Need From You: Purple boxes like this one appear throughout the document. Each one asks a specific question whose answer changes how the audit runs. Collect your answers before the validation call, or bring team leads who can answer on the spot.

Confidence Badges: Every data point carries a confidence badge. High means sourced from public data (competitor sites, review platforms, product pages). Medium means inferred from category patterns or partial data. Low means a best guess that needs validation. Focus your review time on medium and low confidence items.

Company Profile

Tonic.ai

The company profile anchors every query in the audit. If the category or product focus is wrong, queries target the wrong buying conversation.

Company Details

Company Name: Tonic.ai (confidence: High)

Domain: tonic.ai

Name Variants: Tonic, TonicAI, Tonic AI, Tonic.ai Inc., Tonic.ai Inc

Category: Unstructured data de-identification, document redaction, PII detection and masking for free text and documents, and AI training data privacy

Segment: Startup

Key Products: Tonic Textual

Validate: Tonic.ai offers multiple products (Structural, Textual, Fabricate) but this audit focuses exclusively on Tonic Textual. Does the buying conversation for Textual happen independently from Structural/Fabricate, or do enterprise buyers evaluate the full Tonic platform as a bundle? If bundled, we should add platform-level comparison queries alongside the Textual-specific set.

Buyer Personas

Who Buys Tonic Textual

5 personas: 2 decision-makers, 3 evaluators. Each persona drives a distinct query cluster targeting their specific buying concerns in the unstructured data de-identification purchase decision.

Critical Review Area: Personas are the highest-leverage input in this document. Adding or removing a persona changes the query set by 15-20 queries. Changing a persona's influence level changes whether we test evaluation-stage or approval-stage queries for that role.

Data Sourcing: Name, role, department, seniority, influence level, veto power, and technical level are sourced from the knowledge graph (KG). Buying jobs, query focus areas, and role descriptions are synthesized from the KG data to illustrate how each persona maps to audit queries. Review the KG-sourced fields for accuracy; the synthesized fields will update automatically.

Priya Sharma

Chief Information Security Officer

Decision-maker High

C-Suite security leader responsible for document and unstructured data privacy, PII exposure risk in free text and documents. Evaluates whether de-identification solutions meet security standards before sensitive data workflows are approved.

Veto power: Yes — can block any solution that doesn't meet security and data privacy requirements

Technical level: High

Primary buying jobs: Assess PII exposure risk across unstructured data, validate that de-identification is complete and auditable, ensure compliance with security frameworks

Query focus areas: Unstructured data PII detection accuracy, document redaction security, de-identification completeness verification, data privacy compliance for free text

Source: Review mining

→ Does the CISO evaluate Tonic Textual independently, or does InfoSec delegate unstructured data privacy to a dedicated Data Privacy team? If delegated, we should add a Director of Data Privacy persona and shift security-specific queries to that role.

Linda Park

VP of Compliance & Data Governance

Decision-maker Med

VP-level compliance leader focused on document redaction at scale and proving unstructured data de-identification for regulatory compliance. Drives the business case when regulatory pressure makes manual redaction untenable.

Veto power: Yes — can block deployment if compliance audit trail requirements aren't met

Technical level: Low

Primary buying jobs: Prove to auditors that unstructured PII was properly removed, build the compliance case for automated redaction over manual processes, evaluate audit trail completeness

Query focus areas: HIPAA/GDPR document de-identification proof, automated redaction audit trails, compliance reporting for unstructured data, regulatory document processing

Source: LLM inference

→ Is the VP of Compliance a real buyer in Textual deals, or does procurement flow through Security (CISO)? If compliance is advisory rather than decision-making, we reclassify as evaluator and reduce validation-stage query weight for this persona.

DPO / Head of Privacy

Data Protection Officer / Head of Privacy

Evaluator

Director-level privacy specialist in Privacy & Compliance, focused on GDPR compliance and unstructured PII identification and remediation. Evaluates whether solutions can handle the specific PII patterns and regulatory requirements of their jurisdiction.

Primary buying jobs: Evaluate PII detection accuracy for GDPR-regulated data types, assess cross-border data handling, validate right-to-erasure compliance for unstructured data stores

Query focus areas: GDPR unstructured data compliance tools, PII detection for European data, right-to-erasure automation, cross-border document redaction

→ Does a dedicated DPO or Head of Privacy appear in Tonic Textual deals, or is GDPR compliance handled by the CISO or VP of Compliance? If this role doesn't exist in deals, we merge GDPR queries into the CISO cluster and remove the dedicated persona.

VP/Director of Data Science

VP of Data Science / ML Engineering

Evaluator

VP-level data science leader in Data Science & ML, focused on AI training data decontamination and removing PII from unstructured training corpora. Needs de-identification that preserves data utility for model training while guaranteeing PII removal.

Primary buying jobs: Validate PII removal from training datasets without destroying data utility, assess batch processing throughput for large corpora, evaluate integration with ML pipelines

Query focus areas: AI training data PII removal, LLM fine-tuning data privacy, unstructured data decontamination for ML, PII-safe training corpus generation

→ Do data science/ML teams evaluate Tonic Textual for AI training data decontamination, or is this use case sold through the CISO or compliance path? If data science teams are the primary driver, we weight AI training queries higher in the query set.

Government Records Officer

Government Records / FOIA Officer

Evaluator

Director-level government specialist responsible for FOIA redaction, public records processing, and government document de-identification. Evaluates automated redaction tools that can handle high-volume FOIA request backlogs with compliance-grade audit trails.

Primary buying jobs: Evaluate automated FOIA redaction tools, assess government compliance (FedRAMP, FISMA), validate audit trail requirements for public records, measure throughput for large document volumes

Query focus areas: FOIA redaction software, government document de-identification, public records automation, FedRAMP-compliant redaction tools

→ Is government/public sector a real vertical for Tonic Textual today, or an aspirational market? If Textual doesn't have active government deals or case studies, we deprioritize FOIA-specific queries and reallocate to enterprise verticals where deals are actually closing.

Missing Personas? These roles sometimes appear in unstructured data de-identification deals — do they show up in yours? General Counsel / Head of Legal (if legal drives redaction procurement separately from compliance). VP of Engineering / Platform Engineering (if API integration and pipeline embedding is a distinct evaluation track). Chief Data Officer (if unstructured data governance reports to a dedicated CDO rather than CISO). Who else shows up in Tonic Textual deals?

Competitive Landscape

Who You're Compared Against

5 primary competitors identified. Tier assignments determine which head-to-head comparison queries the audit tests.

Competitive GEO Context: Each primary competitor generates 6-8 head-to-head queries like "Tonic Textual vs Private AI for document redaction" or "best PII detection tool for unstructured data." With 5 primary competitors, getting these tiers right determines which approximately 30-40 queries test direct competitive positioning vs. category awareness. All 5 competitors are new additions focused on the Textual market; this is a complete reset from the previous Structural-focused competitive set.

Primary Competitors

Private AI

Primary

Direct competitor in unstructured data de-identification. Specializes in PII detection and redaction for documents and free text with a privacy-first approach.

Microsoft Presidio

Primary

Open-source PII detection and de-identification framework from Microsoft. Strong developer adoption and Azure integration, competes on extensibility and cost.

Google Cloud DLP

Primary

Google Cloud's data loss prevention service with PII detection and de-identification capabilities. Enterprise cloud-native solution with deep GCP ecosystem integration.

Amazon Comprehend / Macie

Primary

AWS's NLP and data security services offering PII detection (Comprehend) and sensitive data discovery (Macie). Competes on AWS ecosystem lock-in and scale.

Nightfall AI

Primary

Cloud-native DLP platform specializing in detecting and remediating sensitive data across SaaS applications, APIs, and data flows. Focuses on real-time PII detection.

Validate: This competitive set is a complete rebuild focused on the Tonic Textual market. Three questions: (1) Are there vendors we missed who appear in actual Textual deals, particularly specialized document redaction tools or emerging AI-native privacy platforms? (2) Should any of the cloud-native solutions (Google Cloud DLP, Amazon Comprehend/Macie) be moved to secondary tier if buyers don't directly compare Textual against cloud-platform tools? (3) Is Microsoft Presidio a real deal competitor or primarily an open-source alternative that buyers evaluate differently from commercial solutions?

Feature Taxonomy

What Buyers Evaluate

8 buyer-level capabilities mapped. Feature strength ratings determine which capability queries emphasize Tonic Textual's advantages vs. areas where competitors may lead.

Free-Text & Document De-identification (strength: Strong; confidence: High)

Detect and redact sensitive information in documents, PDFs, free-text fields, and files before using them for AI training or testing

AI & LLM Training Data Decontamination (strength: Strong; confidence: High)

Prepare safe, realistic training datasets for AI models and LLM fine-tuning without exposing production PII

Redaction Audit Trails & Compliance Reporting (strength: Moderate; confidence: Med)

Generate privacy reports and audit trails proving data was properly de-identified for HIPAA, GDPR, and SOC 2 audits

Named Entity Recognition & PII Detection

Automated identification of named entities and PII across unstructured text and documents

Document & PDF Redaction

Automated redaction of sensitive information from PDFs and structured documents

Guided Human-in-the-Loop Redaction

Interactive redaction workflows where humans review and confirm automated PII detection before redaction is applied

Multi-Language PII Detection & Redaction

PII detection and redaction across multiple languages and character sets

Bulk & Batch Document Processing

Processing large volumes of documents for de-identification at enterprise scale

Incomplete Strength Ratings: 5 of 8 features have no strength rating or confidence badge because they are newly added capabilities that haven't been assessed against competitors yet. At the validation call, we need Tonic.ai's assessment of where Textual is strong, moderate, or weak on each unrated feature relative to Private AI, Presidio, Google Cloud DLP, Amazon Comprehend/Macie, and Nightfall AI. Without strength ratings, the audit can't differentiate between capability queries where Textual should lead vs. where it needs to play defense.

Validate: Three questions for the call: (1) For the 5 unrated features (NER & PII Detection, Document & PDF Redaction, Guided Redaction, Multi-Language Support, Bulk Processing), where does Textual honestly stand relative to the cloud-native competitors (Google DLP, Amazon Comprehend) who have massive scale advantages? (2) Are there capabilities we're missing that differentiate Textual from open-source alternatives like Presidio? (3) Should "Guided Human-in-the-Loop Redaction" and "Free-Text & Document De-identification" be merged, or do buyers evaluate interactive review as a separate capability?

Pain Point Taxonomy

What Buyers Struggle With

6 pain points, all rated high severity. Pain point buyer language shapes how queries are phrased; these are the words real buyers use when searching for solutions.

Unstructured Data Is the Biggest Privacy Blind Spot (severity: High; confidence: High)

"We masked the database columns but our documents and free-text fields still have customer names and SSNs all over them"

Personas: CISO, VP of Compliance & Data Governance

AI Teams Blocked by Unstructured PII in Training Data (severity: High; confidence: High)

"Our data science team is blocked because legal won't let them train models on production data, and the synthetic alternatives they tried don't preserve the patterns they need"

Personas: CISO, VP of Compliance & Data Governance

Cannot Prove Unstructured Data Was Properly De-identified (severity: High; confidence: High)

"Our compliance team can't prove to auditors that test environments don't contain real PHI — we're manually spot-checking and hoping for the best"

Personas: VP of Compliance & Data Governance, CISO

Document Redaction at Scale Is a Manual Nightmare (severity: High)

"Legal and compliance teams are manually redacting sensitive information from thousands of documents — slow, error-prone, and impossible to scale"

FOIA & Public Records Redaction Overwhelms Government Teams (severity: High)

"Government agencies face growing FOIA request volumes but redaction is manual, creating massive backlogs and compliance risk"

Manual Redaction Is Slow, Expensive, and Misses PII (severity: High)

"Human reviewers manually redacting documents miss PII, introduce inconsistencies, and cannot keep up with volume — every missed redaction is a potential breach"

Validate: All 6 pain points are rated high severity. Three questions: (1) Is "FOIA & Public Records Redaction" actually a pain point Tonic Textual's current customers experience, or is this aspirational? If aspirational, we should reduce severity or remove it to avoid testing queries for a market Textual isn't actively serving. (2) Do buyers distinguish between "Document Redaction at Scale" and "Manual Redaction Is Slow, Expensive, and Misses PII" as separate problems, or are these the same pain expressed differently, in which case we should merge them? (3) Are there pain points we're missing around data residency / cross-border PII handling or multi-format document support (images, scanned PDFs, handwritten notes) that Textual buyers frequently cite?

Site Analysis

Layer 1 Technical Findings

9 findings from the technical site analysis. These are engineering items — several can start before the validation call.

Engineering Action Required: The top finding, "Stale Content on High-Value Content Marketing Pages," affects 9 of 15 content marketing pages, all scoring 0.2 or below on freshness. The content team should prioritize refreshing the 3 pages over 365 days old and adding visible dates to case studies. Engineering should independently start on two structural items: (1) add lastmod dates to all 1,710 sitemap URLs and (2) fix the CMS template that outputs multiple H1 tags on 8+ commercial pages. These don't require the validation call.

🟡 Stale Content on High-Value Content Marketing Pages

What we found: 9 of 15 content marketing pages (60%) scored 0.2 or below on freshness, indicating content older than 180 days or missing date signals entirely. Three pages are confirmed over 365 days old: the K2View entity modeling blog (March 2024), the enterprise test data strategy guide (March 2025), and the data de-identification guide (April 2024). All four case studies lack visible publication dates, defaulting to the minimum freshness score. The category-weighted freshness average across content marketing is 0.32.

Why it matters: AI platforms heavily weight content freshness when selecting sources to cite. Research shows 76.4% of ChatGPT's most-cited pages were updated within 30 days. Content marketing pages (comparisons, guides, case studies) compete directly for informational and evaluation queries — stale content in this category means competitors with fresher content get cited instead.

Business consequence: Queries like "best PII detection tool for unstructured data" or "document redaction software comparison" may preferentially cite competitors with fresher guide and comparison content, giving Private AI, Nightfall AI, and others a citation advantage in evaluation-stage queries where Tonic Textual's stale content is deprioritized.

Recommended fix: Prioritize refreshing the three pages over 365 days old with updated data, current product capabilities, and fresh dates. Add visible publication and last-updated dates to all case studies. Establish a 90-day review cadence for comparison and guide content to maintain freshness within the dominant AI citation window.

Impact: High; Effort: 1-2 weeks; Owner: Content; Affected: 9 content marketing pages including 3 guides, 2 comparison pages, and 4 case studies
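
To support the 90-day review cadence, pages can be bucketed by age against the thresholds this finding uses (90, 180, and 365 days). The sketch below is a minimal illustration, not the scoring model behind the freshness numbers in this report. The `age_bucket` helper, the page keys, the day-of-month values, and the `AS_OF` review date are all assumptions made for the example.

```python
from collections import Counter
from datetime import date
from typing import Optional


def age_bucket(last_updated: Optional[date], as_of: date) -> str:
    """Classify a page by days since its last visible update."""
    if last_updated is None:
        return "no_date"  # e.g. case studies with no visible publication date
    age_days = (as_of - last_updated).days
    if age_days <= 90:
        return "within_90_days"
    if age_days <= 180:
        return "91_to_180_days"
    if age_days <= 365:
        return "181_to_365_days"
    return "over_365_days"


# Hypothetical review date and page keys. The months come from the finding;
# the first-of-month days are placeholders.
AS_OF = date(2026, 4, 15)
pages = {
    "k2view-entity-modeling-blog": date(2024, 3, 1),
    "enterprise-test-data-strategy-guide": date(2025, 3, 1),
    "data-de-identification-guide": date(2024, 4, 1),
    "case-study-without-visible-date": None,
}

summary = Counter(age_bucket(updated, AS_OF) for updated in pages.values())
print(summary)
```

Running a report like this on a schedule makes it easy to see which guides and comparison pages are about to leave the 90-day bucket.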

🔵 Multiple H1 Tags on Commercial Pages

What we found: At least 8 commercially important pages have multiple H1 tags: the homepage (6 H1s), Tonic Datasets product page (6 H1s), government redaction capability page (7 H1s), Salesforce integration page (5 H1s), clinical notes for AI page (5 H1s), K2View comparison page (multiple H1s), Private AI comparison page (multiple H1s), and Tonic Subset (2 H1s). This appears to be a CMS template issue where each section hero block outputs its own H1.

Why it matters: AI crawlers and search engines use the H1 tag to identify the primary topic of a page. Multiple H1s dilute topical authority and make passage extraction unreliable — the AI system cannot determine which H1 represents the page's primary topic. This directly reduces the page's probability of being cited in response to topic-specific queries.

Business consequence: When a buyer asks "Tonic Textual vs Private AI for document redaction," AI platforms may struggle to extract the correct comparison narrative from a page with several competing H1 headings, potentially citing a competitor's cleaner comparison page instead.

Recommended fix: Audit all page templates in the CMS and ensure each page renders exactly one H1 tag. Convert secondary hero headings to H2 or styled div elements. Prioritize the homepage, government redaction, and Salesforce integration pages, which are among the pages with the most heading violations.

Impact: Medium; Effort: 1-3 days; Owner: Engineering; Affected: 8+ pages across product, comparison, capability, and integration page types
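
A template audit for this finding can be automated by counting H1 start tags in each page's HTML. The sketch below uses only Python's standard `html.parser`. The `count_h1` helper and the sample markup are hypothetical, and fetching the pages is left out so the example stays self-contained.

```python
from html.parser import HTMLParser


class H1Counter(HTMLParser):
    """Counts <h1> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "h1":
            self.count += 1


def count_h1(html: str) -> int:
    parser = H1Counter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup imitating a template where each hero block emits an H1.
sample = "<h1>Product</h1><section><h1>Hero block</h1></section>"
print(count_h1(sample))  # any value other than 1 flags the page for review
```

A count of 0 flags a page that is missing its H1 entirely, so the same check covers both heading issues.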

🔵 Sitemap Missing lastmod Dates on All 1,710 URLs

What we found: The sitemap at https://www.tonic.ai/sitemap.xml contains 1,710 URLs, none of which include lastmod timestamps. The sitemap is a flat file (not a sitemap index), mixing product pages, blog posts, release notes, and guides without date differentiation.

Why it matters: AI crawlers use sitemap lastmod dates to prioritize which pages to re-crawl and to assess content freshness without fetching each page. Without lastmod, crawlers must either fetch every URL to check for updates or rely on HTTP headers alone. This means recently updated content gets no crawl priority advantage over stale content, reducing the freshness signal available to AI citation algorithms.

Business consequence: Even after Tonic Textual refreshes comparison and guide content, AI crawlers have no sitemap signal to re-crawl those updated pages faster than the other 1,700 URLs — delaying the freshness benefit of content updates.

Recommended fix: Add lastmod dates to all sitemap URLs, sourced from the CMS's actual last-modified timestamp for each page. Consider splitting the monolithic sitemap into a sitemap index with separate child sitemaps for pages, blog posts, guides, and release notes — this helps crawlers identify commercially relevant content faster.

Impact: Medium; Effort: 1-3 days; Owner: Engineering; Affected: All 1,710 URLs in the sitemap
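
Once lastmod dates are added, engineering can confirm coverage by listing sitemap entries that still lack a value. The sketch below assumes the standard sitemaps.org namespace. The `urls_missing_lastmod` helper is hypothetical, and the sample XML and its date are made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_missing_lastmod(sitemap_xml: str) -> list[str]:
    """Return the <loc> of every <url> entry with no (or empty) <lastmod>."""
    root = ET.fromstring(sitemap_xml)
    missing = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not lastmod:
            missing.append(loc)
    return missing


# Hypothetical two-entry sitemap; the lastmod date is a placeholder.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.tonic.ai/</loc><lastmod>2026-01-15</lastmod></url>
  <url><loc>https://www.tonic.ai/products/tonic-subset</loc></url>
</urlset>"""

print(urls_missing_lastmod(SAMPLE_SITEMAP))
```

If the sitemap is later split into a sitemap index, the same check would need to run against each child sitemap.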

🔵 Thin Content on Core Product and Capability Pages

What we found: Six commercially important pages scored 0.40 or below on content depth: Tonic Validate (0.20), Tonic Datasets (0.25), Tonic Subset (0.30), Tonic NoSQL (0.30), the partners listing page (0.30), and the compliance solution page (0.40). These pages rely on marketing language and template-driven layouts with minimal substantive content.

Why it matters: AI models need substantive, specific content to generate accurate citations. Pages scoring 0.40 or below on content depth lack sufficient detail for an LLM to answer specific buyer questions. Competitors with deeper content on the same topics will be preferentially cited.

Business consequence: When buyers query "how does Tonic Textual handle document redaction at scale," AI platforms may not find enough substantive detail on Tonic.ai's thin capability pages to generate a citation, defaulting to competitors with richer technical content.

Recommended fix: Expand thin product pages with technical detail: specific capabilities with explanations, benchmarks or performance data, customer use case examples, and differentiated content per page. Prioritize Tonic Validate (open-source RAG evaluation) and Tonic Subset (patented subsetting) with technical explanations and getting-started content.

Impact: Medium; Effort: 2-4 weeks; Owner: Content; Affected: 6 pages: /products/validate, /products/tonic-datasets, /products/tonic-subset, /products/tonic-nosql, /partners, /solutions/use-case/compliance

🔵 Near-Duplicate Content Between Capability Pages

What we found: The government redaction page (/capabilities/government-redaction) and enterprise guided redaction page (/capabilities/guided-redaction-enterprise) share near-identical capability descriptions for their core workflow features (AI detection, human-in-the-loop, collaboration, audit trails, scale). The shared content blocks appear to be the same CMS components rendered on both pages.

Why it matters: Near-duplicate content creates a cannibalization risk for AI citation. When two pages contain substantially similar text, AI systems may reduce confidence in both or arbitrarily select one, rather than citing the most contextually appropriate page.

Business consequence: Queries like "FOIA document redaction software" and "enterprise guided redaction tool" should each return the contextually appropriate Tonic Textual page, but near-duplicate content means AI platforms may cite neither or the wrong one.

Recommended fix: Differentiate the two pages with unique, vertical-specific content. The government page should include FOIA-specific workflows, FedRAMP/FISMA compliance language, and agency case studies. The enterprise page should develop finance, legal, and healthcare verticals with vertical-specific examples.

Impact: Medium; Effort: 1-2 weeks; Owner: Content; Affected: 2 pages: /capabilities/government-redaction and /capabilities/guided-redaction-enterprise

🔵 Missing H1 Tag on eBay Case Study

What we found: The eBay case study page renders its title as an H2 rather than an H1. All other case study pages use H1 for the title.

Why it matters: The H1 tag signals the page's primary topic to AI crawlers. Without it, the page's topical authority is weakened. The eBay case study contains a strong enterprise proof point (8 PB to 1 GB subsetting) from a VP of Engineering — this content deserves full structural support for AI extraction.

Business consequence: The eBay case study is a powerful enterprise proof point, but without a proper H1, AI platforms may not extract it as confidently when responding to queries like "enterprise PII de-identification case studies" or "Tonic.ai customer results."

Recommended fix: Update the eBay case study template to render the page title as an H1 tag, consistent with other case study pages.

Impact: Medium; Effort: < 1 day; Owner: Engineering; Affected: 1 page: /case-study/getting-ebay-developers-the-data-theyre-looking-for-with-tonic

Manual Verification Checklist

The following items could not be assessed through our analysis method (rendered markdown). We recommend your engineering team verify these manually before the validation call.

Schema Markup Cannot Be Assessed

What to check: JSON-LD structured data (schema.org markup) is not visible in the rendered markdown output. Verify whether product pages use Product schema, blog posts and case studies use Article schema (schema.org defines no dedicated CaseStudy type), and FAQ sections use FAQPage schema.

Recommended action: Audit all page types using Google's Rich Results Test or Schema Markup Validator. Ensure Product schema on product pages, Article schema with datePublished/dateModified on blog/guide pages, FAQPage schema on pages with FAQ sections, Organization schema on the about page.

Effort: 1-3 days; Owner: Engineering
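
For reference, the Article markup with datePublished/dateModified recommended above can be generated as JSON-LD. This is a minimal sketch, not Tonic.ai's actual markup. The `article_json_ld` helper, the headline, the URL path, and both dates are hypothetical, and a production page would typically add fields such as author and publisher.

```python
import json
from datetime import date


def article_json_ld(headline: str, url: str, published: date, modified: date) -> str:
    """Build a schema.org Article object serialized as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


# Hypothetical values for illustration only.
markup = article_json_ld(
    headline="Guide to De-identifying Unstructured Data",
    url="https://www.tonic.ai/guides/example-guide",
    published=date(2024, 4, 1),
    modified=date(2026, 4, 1),
)

# The JSON-LD is embedded in the page inside a script tag of this type.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Keeping dateModified in sync with the CMS's real last-modified timestamp gives the page the same date signal that the sitemap lastmod fix provides.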

Client-Side Rendering Status Cannot Be Assessed

What to check: The site appears to be built on Webflow or a similar platform. Test 3-5 representative pages with JavaScript disabled. If content is absent or significantly reduced, AI crawlers that don't execute JavaScript may see empty pages.

Recommended action: Test with JavaScript disabled in a browser. If content is absent, implement server-side rendering (SSR) or static site generation (SSG) for commercially important pages.

Effort: < 1 day; Owner: Engineering
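
The JavaScript-disabled test can be approximated in a script by counting the words present in the raw HTML before any script runs. The sketch below is an assumption-laden stand-in for the browser test: the `static_word_count` helper and the sample markup are hypothetical, and the raw HTML would need to be fetched separately (for example with `urllib.request`).

```python
from html.parser import HTMLParser


class StaticText(HTMLParser):
    """Counts words a non-JavaScript client would see in raw HTML."""

    # Content inside these elements is not rendered as page text.
    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def static_word_count(html: str) -> int:
    parser = StaticText()
    parser.feed(html)
    parser.close()
    return parser.words


# Hypothetical pages: one rendered client-side, one served as static HTML.
client_rendered = '<div id="root"></div><script>render("lots of words here")</script>'
server_rendered = "<h1>Tonic Textual</h1><p>Redact PII in documents.</p>"
print(static_word_count(client_rendered), static_word_count(server_rendered))
```

A commercially important page with a near-zero count is a candidate for the SSR or SSG fix described above.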

Meta Descriptions and OG Tags Cannot Be Assessed

What to check: Verify that all commercially important pages have unique, descriptive meta descriptions (150-160 characters) and complete OG tags (og:title, og:description, og:image).

Recommended action: Use a social preview tool or view-source to audit all commercially relevant pages for meta descriptions and OG tags.

Effort: 1-3 days; Owner: Content
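
The meta description and OG tag check can also be scripted per page. The sketch below applies the 150-160 character guideline and the three OG properties named above. The `meta_issues` helper and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:description", "og:image")


class MetaCollector(HTMLParser):
    """Collects the meta description and any non-empty required OG tags."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.og = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = (attributes.get("content") or "").strip()
        if (attributes.get("name") or "").lower() == "description":
            self.description = content
        prop = (attributes.get("property") or "").lower()
        if prop in REQUIRED_OG and content:
            self.og.add(prop)


def meta_issues(html: str) -> list[str]:
    """Return a list of human-readable problems; empty means the page passes."""
    collector = MetaCollector()
    collector.feed(html)
    collector.close()
    issues = []
    if not collector.description:
        issues.append("missing meta description")
    elif not 150 <= len(collector.description) <= 160:
        issues.append(f"meta description is {len(collector.description)} characters")
    issues += [f"missing {prop}" for prop in REQUIRED_OG if prop not in collector.og]
    return issues


# Hypothetical page head with a short description and no og:image.
sample_head = (
    '<meta name="description" content="Too short.">'
    '<meta property="og:title" content="Tonic Textual">'
    '<meta property="og:description" content="De-identify documents.">'
)
print(meta_issues(sample_head))
```

The check does not test uniqueness across pages; comparing descriptions between pages would need a separate pass.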

Site Analysis Summary

Total Pages Analyzed: 45

Commercially Relevant Pages: 45

Avg Heading Hierarchy: 0.64

Avg Content Depth: 0.55

Freshness (weighted): 0.32 (blog: 0.32, product: unable to assess, structural: unable to assess)

Avg Passage Extractability: 0.58

Schema Coverage: Unable to assess (45 pages unscored)

Partial Assessment: Schema coverage could not be assessed for any of the 45 pages through the rendered markdown analysis method. 30 pages (27 product + 3 structural) have no freshness score due to missing date signals. Engineering should verify schema markup and publication dates across these pages before the validation call.

Next Steps

What Happens Next

Why Now

• AI search adoption is accelerating — buyer discovery patterns are shifting quarter over quarter
• Early citations compound: domains that AI platforms learn to trust now get cited more frequently as training data accumulates
• Competitors who establish GEO visibility first create a structural disadvantage for late movers
• Unstructured data de-identification is still early-innings in GEO optimization — acting now means competing against inaction, not against entrenched strategies

The full audit will measure citation visibility across buyer queries in the unstructured data de-identification space, including queries like "best PII detection tool for documents," "FOIA redaction software comparison," and "how to remove PII from AI training data." You'll see exactly which queries return results that include Private AI, Microsoft Presidio, Google Cloud DLP, or Nightfall AI but not Tonic Textual — and what it would take to appear in them. Fixing the technical items identified in Layer 1 now improves your baseline visibility before the audit measures it.

01

Validation Call

45-60 minute call to walk through this document together. Confirm personas, competitors, features, and pain points. Every correction sharpens the query set.

02

Query Generation & Execution

Buyer queries generated from the validated knowledge graph (KG) and executed across selected AI platforms. Each persona and competitor combination produces targeted queries testing Tonic Textual's visibility.

03

Full Audit Delivery

Complete visibility analysis with competitive positioning, content gap prioritization based on actual query data, and a three-layer action plan targeting the highest-impact opportunities.

Start Now: Before the Call

These don't depend on the rest of the audit and will improve your baseline visibility before we even measure it:

• Add lastmod dates to all 1,710 sitemap URLs — restores crawl prioritization so recently updated content gets re-crawled faster
• Fix the multi-H1 CMS template issue — audit page templates and ensure each page renders exactly one H1 tag; prioritize homepage, government redaction, and Salesforce integration pages
• Fix the eBay case study H1 — change the title from H2 to H1 to match other case study pages
• Verify schema markup — use Rich Results Test to confirm Product, Article, and FAQPage schema are present on relevant page types
• Verify client-side rendering — test 3-5 pages with JavaScript disabled to confirm content is accessible to AI crawlers
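To confirm the lastmod fix after it ships, the sitemap can be checked programmatically. This is a minimal sketch, assuming the sitemap XML has been downloaded to a string and follows the standard sitemaps.org urlset format; the function name and sample URLs are hypothetical. A sitemap index file that points to child sitemaps would need each child checked separately.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_missing_lastmod(sitemap_xml):
    """Return the <loc> of every <url> entry with no lastmod or an empty one."""
    root = ET.fromstring(sitemap_xml)
    missing = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not lastmod:
            missing.append(loc)
    return missing


# Hypothetical sample sitemap, not taken from the audited site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2025-01-15</lastmod></url>
  <url><loc>https://example.com/b</loc></url>
  <url><loc>https://example.com/c</loc><lastmod></lastmod></url>
</urlset>"""

print(urls_missing_lastmod(SAMPLE_SITEMAP))
```

An empty list on the live sitemap would indicate that every URL entry carries a lastmod value.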

Before the Call

Your Pre-Call Checklist

Two jobs before we meet. The questions below require your judgment; no one knows your business better than you. The engineering tasks that follow don't require the call at all.

Questions for You

Are Private AI, Presidio, Google Cloud DLP, Amazon Comprehend/Macie, and Nightfall AI the right primary competitive set for Tonic Textual?

If wrong: 30-40 head-to-head comparison queries target the wrong competitors

Does the Tonic Textual buying conversation happen independently from Structural/Fabricate, or do enterprise buyers evaluate the full platform as a bundle?

If wrong: we need platform-level comparison queries alongside Textual-specific ones

Does the CISO evaluate Tonic Textual independently, or does InfoSec delegate unstructured data privacy to a dedicated Data Privacy team?

If wrong: we add a Director of Data Privacy persona and shift security queries to that role

Is the VP of Compliance a real decision-maker in Textual deals, or does procurement flow through the CISO?

If wrong: we reclassify as evaluator and reduce validation-stage queries for compliance

Does a dedicated DPO or Head of Privacy appear in Textual deals, or is GDPR compliance handled by the CISO?

If wrong: we merge GDPR queries into the CISO cluster and remove the dedicated persona

Do data science/ML teams evaluate Tonic Textual for AI training data decontamination, or is this sold through the CISO/compliance path?

If wrong: we reweight AI training queries or remove the VP of Data Science persona

Is government/public sector a real vertical for Tonic Textual today, or an aspirational market?

If wrong: we deprioritize FOIA-specific queries and reallocate to enterprise verticals

For the 5 unrated features, where does Textual honestly stand relative to cloud-native competitors on NER, PDF redaction, guided redaction, multi-language, and bulk processing?

If wrong: audit can't differentiate capability queries where Textual leads vs. plays defense

Is "FOIA & Public Records Redaction" a real customer pain point, or aspirational?

If wrong: queries test a market Textual isn't actively serving

Should Google Cloud DLP and Amazon Comprehend/Macie be secondary tier if buyers don't directly compare Textual against cloud-platform tools?

If wrong: ~12-16 head-to-head queries test matchups that don't reflect real deal dynamics

For Engineering — Start Now

Add lastmod dates to all 1,710 sitemap URLs

Restores crawl prioritization so updated content gets re-crawled faster

Fix the CMS template that outputs multiple H1 tags on 8+ commercial pages

Restores topical authority for AI passage extraction on commercial pages

Update the eBay case study to render title as H1 (currently H2)

Ensures enterprise proof point gets proper structural support for AI extraction

Verify schema markup across all page types using Rich Results Test

Confirms whether Product, Article, and FAQPage schema are present

Test 3-5 representative pages with JavaScript disabled

Confirms content is accessible to AI crawlers that don't execute JavaScript
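The two H1 tasks above can be verified with a small script that counts H1 tags per page. The sketch below is illustrative: it assumes the rendered HTML of each page is available as a string, and the class name, function names, and status labels are hypothetical.

```python
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collects the text of every <h1> element on a page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.h1_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.h1_texts.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Text inside nested tags such as <span> is still part of the heading.
        if self._in_h1:
            self.h1_texts[-1] += data


def h1_headings(raw_html):
    """Return the H1 texts found on the page, with whitespace normalized."""
    parser = H1Collector()
    parser.feed(raw_html)
    parser.close()
    return [" ".join(text.split()) for text in parser.h1_texts]


def h1_status(raw_html):
    """Classify a page as 'ok', 'missing H1', or 'N H1 tags'."""
    count = len(h1_headings(raw_html))
    if count == 0:
        return "missing H1"
    if count == 1:
        return "ok"
    return f"{count} H1 tags"


# Hypothetical sample pages, not taken from the audited site.
print(h1_status("<h1>Product</h1><h2>Features</h2>"))
print(h1_status("<h2>Case study title</h2>"))
print(h1_status("<h1>Hero</h1><h1>Second <span>hero</span></h1>"))
```

Running this over each page built from the affected CMS template would show whether the template fix produced exactly one H1 everywhere.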

Alignment

We're Aligned On

This isn't a contract — it's a shared understanding. The audit runs against what's below. If something changes between now and the call, we adjust. The goal is to make sure we're asking the right questions for the right buyers against the right competitors.

Already Confirmed

Competitive set — 5 primary competitors in the unstructured data de-identification market

Persona set — 5 personas: 2 decision-makers, 3 evaluators

Feature taxonomy — 8 buyer-level capabilities mapped (3 with strength ratings, 5 pending assessment)

Pain point set — 6 buyer frustrations, all rated high severity

Layer 1 technical audit — 9 findings logged (1 high, 6 medium diagnostic + 3 verification items), engineering notified

Decided at the Call

Feature strength ratings for 5 unrated capabilities — without these, the audit can't differentiate strength vs. defense queries

Government/FOIA vertical validation — if not a real market for Textual, removes ~15 government-specific queries and the Government Records Officer persona

Competitor tier validation — whether cloud-native tools (Google DLP, Amazon Comprehend/Macie) belong in primary or secondary tier

Top 3 features to emphasize in competitive differentiation queries

Pain point prioritization — top 3 buyer problems to test first, and whether to merge the two manual-redaction pain points

Any persona corrections — DPO/Head of Privacy and VP of Data Science presence in actual deals

Client

Date
