Your AI agent is only as good as the information it has. The Documents page is where you build your knowledge base — upload files, import web pages, and manage what your AI knows. The Knowledge Gaps page shows you what it doesn't know, so you can fill the holes.
Uploading Documents
Navigate to Documents in the sidebar. The upload zone at the top supports two methods:
File Upload
Drag and drop files onto the upload area, or click to browse. Supported formats:
| Format | Extension | Notes |
|---|---|---|
| PDF | .pdf | Extracts text from all pages. Page count tracked. |
| Word | .docx | Extracts raw text from Word documents. |
| Plain Text | .txt | Direct text extraction. |
| Markdown | .md | Text extracted, YAML frontmatter stripped. |
Maximum file size: 25 MB per file. Total knowledge base size is limited by your plan's page allocation (1 page = roughly 2,500 characters).
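To get a feel for how the page allocation adds up, here is a minimal sketch of the arithmetic. It assumes pages are counted by rounding up and that 25 MB means 25 × 1024 × 1024 bytes; neither detail is stated above, and the function names are hypothetical rather than part of SupportHQ.

```python
import math

CHARS_PER_PAGE = 2_500              # "1 page = roughly 2,500 characters"
MAX_FILE_BYTES = 25 * 1024 * 1024   # assumption: 25 MB interpreted as 25 MiB


def estimate_pages(text: str) -> int:
    """Estimate how many plan pages a piece of extracted text consumes.

    Rounding up is an assumption; the real accounting may differ.
    """
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_PAGE)


def fits_upload_limit(size_in_bytes: int) -> bool:
    """Check a file size against the per-file upload limit."""
    return size_in_bytes <= MAX_FILE_BYTES
```

Under these assumptions, a 10,000-character FAQ would count as 4 pages.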
URL Import
Paste a URL into the URL field and click Add URL. SupportHQ fetches the page, strips navigation and scripts, and extracts the article content. This is great for importing:
- Existing help center articles
- Product documentation pages
- FAQ pages
- Blog posts with product information
Auto-Sync for URLs (Pro and Scale)
On Pro and Scale plans, you can set an auto-sync interval for imported URLs: every 24 hours, 48 hours, or 7 days. When the interval elapses, SupportHQ automatically re-fetches the URL, updates the extracted content, and regenerates the embeddings. This keeps your AI's knowledge current without manual updates.
You can also click Sync due URLs now to trigger an immediate re-sync of all URLs that are past their scheduled sync time.
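The idea of a URL being "past its scheduled sync time" can be sketched as a simple timestamp comparison. This is an illustration only: the record fields (`url`, `last_synced_at`, `interval_hours`) and function names are hypothetical, not SupportHQ's actual data model.

```python
from datetime import datetime, timedelta, timezone

# The three intervals offered on Pro and Scale: 24 hours, 48 hours, 7 days.
ALLOWED_INTERVALS_HOURS = (24, 48, 168)


def is_sync_due(last_synced_at: datetime, interval_hours: int, now: datetime) -> bool:
    """Return True once the configured interval has elapsed since the last sync."""
    if interval_hours not in ALLOWED_INTERVALS_HOURS:
        raise ValueError(f"unsupported interval: {interval_hours}h")
    return now - last_synced_at >= timedelta(hours=interval_hours)


def due_urls(records: list[dict], now: datetime) -> list[str]:
    """Collect the URLs that a 'sync due URLs now' action would re-fetch."""
    return [
        r["url"]
        for r in records
        if is_sync_due(r["last_synced_at"], r["interval_hours"], now)
    ]
```

A URL on a 24-hour interval that last synced 30 hours ago would be selected; one on a 7-day interval that synced yesterday would not.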
Categories
Before uploading, you can optionally assign a category to organize your documents: FAQ, Policy, Product, Technical, Pricing, or Other. Categories help you manage a large knowledge base but don't affect how the AI searches — all documents are searched equally.
How Documents Are Processed
When you upload a file or import a URL, here's what happens behind the scenes:
- Text extraction — the content is parsed into plain text (PDF pages, DOCX paragraphs, HTML article content)
- Chunking — the text is split into ~800-character chunks with 150-character overlap between chunks. This ensures the AI can find relevant passages without missing context at chunk boundaries.
- Embedding — each chunk is converted into a 1,536-dimension vector using an AI embedding model. These vectors capture the semantic meaning of the text.
- Storage — the vectors are stored in PostgreSQL with pgvector, enabling fast similarity search across your entire knowledge base.
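The chunking step can be illustrated with a fixed-size sliding window. This is a minimal sketch that uses the 800-character size and 150-character overlap described above; the real pipeline may split on sentence or paragraph boundaries (the "~" suggests chunk sizes vary), so treat the function as a hypothetical simplification.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 150) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks
```

With these defaults the window advances 650 characters at a time, so a sentence that straddles a chunk boundary still appears whole in at least one chunk as long as it is shorter than the overlap.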
Document Statuses
| Status | Meaning |
|---|---|
| Pending | Queued for processing |
| Processing | Currently being parsed, chunked, and embedded |
| Ready | Successfully processed — the AI can use this document |
| Error | Something went wrong. Click Retry to re-process. |
How the AI Uses Your Documents
When a customer asks a question, the AI:
- Converts the question into a vector embedding
- Searches your knowledge base for the 5 most similar chunks
- Filters out chunks below a similarity threshold (0.30)
- Uses the relevant chunks as context to generate an accurate, grounded answer
If no chunks pass the similarity threshold, the AI either answers from its general knowledge or records a knowledge gap (see below).
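The retrieval steps above can be sketched in plain Python. Two assumptions are worth flagging: the similarity metric is taken to be cosine similarity (the text only says "similar"), and a chunk scoring exactly 0.30 is treated as passing. In production the search would run inside the database rather than in application code, and the function names here are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, top_k=5, threshold=0.30):
    """Take the top_k most similar chunks, then drop any below the threshold.

    `chunks` is a list of (text, vector) pairs. Returns (score, text) pairs,
    best match first. An empty result means no usable context was found.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, text) for score, text in scored[:top_k] if score >= threshold]
```

Note the order of operations: the threshold is applied after the top 5 are selected, so the AI may receive fewer than 5 chunks, or none at all.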
Managing Documents
The document list shows each document with its name, size, page count, chunk count, category, status, and creation date. For URL imports, it also shows the source URL and last sync timestamp.
- Retry — appears on Error documents. Re-triggers the processing pipeline.
- Delete — removes the document, its stored file, and all chunks. The AI will no longer reference this content.
- Clear All — bulk-deletes every document in your knowledge base. Use with caution.
Knowledge Gaps
Navigate to Knowledge Gaps in the sidebar. This page shows questions your AI couldn't answer because no relevant documents were found.
How Gaps Are Detected
A knowledge gap is recorded when all three conditions are true during a conversation:
- The AI found no relevant document chunks for the customer's question (every candidate chunk scored below the 0.30 similarity threshold)
- The question requires domain-specific knowledge (not a general greeting or chitchat)
- The customer is not asking for a live agent or changing the language
When the same question (or a very similar one) is asked again, SupportHQ increments the existing gap's frequency instead of creating a duplicate. This tells you which missing topics are asked about most.
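The detection rule and the frequency counter can be sketched as follows. This is a simplified illustration: all names are hypothetical, and "very similar" questions are approximated by comparing lowercased, whitespace-normalized text, whereas the real system presumably uses a fuzzier match.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    question: str
    frequency: int = 1
    status: str = "open"


def should_record_gap(relevant_chunks: int, needs_domain_knowledge: bool,
                      is_handoff_or_language_request: bool) -> bool:
    """All three conditions must hold for a gap to be recorded."""
    return (
        relevant_chunks == 0
        and needs_domain_knowledge
        and not is_handoff_or_language_request
    )


def normalize(question: str) -> str:
    """Stand-in for similarity matching: lowercase and collapse whitespace."""
    return " ".join(question.lower().split())


def record_gap(gaps: dict[str, Gap], question: str) -> Gap:
    """Increment an existing gap's frequency, or create a new one."""
    key = normalize(question)
    existing = gaps.get(key)
    if existing is not None:
        existing.frequency += 1
        return existing
    gap = Gap(question=question[:500])  # stored question is capped at 500 characters
    gaps[key] = gap
    return gap
```

Asking "How do I reset my password?" twice, even with different capitalization, leaves a single gap with a frequency of 2.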
The Knowledge Gaps List
Each gap shows:
- Question — the customer's original question (up to 500 characters)
- Frequency badge — how many times this question has been asked (e.g., "5x asked")
- Last seen — when the question was most recently asked
Gap Statuses
| Status | Meaning |
|---|---|
| Open | Needs attention — the AI still can't answer this |
| Resolved | You've addressed it (uploaded a document, wrote an article, etc.) |
| Ignored | You've decided this isn't relevant to your knowledge base |
Working with Gaps
For each open gap, you have three actions:
- Mark as Resolved (green checkmark) — use this after you've uploaded a document or created a help center article that answers the question
- Mark as Ignored (eye-off icon) — use this for questions that are irrelevant, off-topic, or not something you want the AI to answer
- Delete (trash icon) — permanently remove the gap record
Resolved and Ignored gaps can be reopened if needed by clicking the eye icon.
The Feedback Loop
Knowledge gaps and documents work together as a continuous improvement cycle:
- Customer asks a question the AI can't answer
- A knowledge gap is recorded with the question text
- You see the gap on the Knowledge Gaps page, sorted by frequency
- You create content — upload a document, import a URL, or write a help center article
- You mark the gap as resolved
- Next time the question is asked, the AI finds the new content and answers correctly
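If you track gaps outside the dashboard, the triage order in step 3 is easy to reproduce. The sketch below assumes each gap is a plain dictionary with hypothetical `frequency`, `last_seen` (an ISO date string), and `status` keys; it is not an export format that SupportHQ is known to provide.

```python
def triage_order(gaps: list[dict]) -> list[dict]:
    """Return open gaps, most frequently asked first.

    Ties are broken by the most recent `last_seen` value, which works
    because ISO date strings sort chronologically.
    """
    open_gaps = [g for g in gaps if g["status"] == "open"]
    return sorted(
        open_gaps,
        key=lambda g: (g["frequency"], g["last_seen"]),
        reverse=True,
    )
```

Resolved and ignored gaps are filtered out first, so the top of the list is always the most-asked question that still has no answer.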
Plan Limits
| Plan | Document Pages | Auto-Sync |
|---|---|---|
| Starter | 5,000 pages | No |
| Pro | 50,000 pages | Yes (24h / 48h / 7d) |
| Scale | Unlimited | Yes (24h / 48h / 7d) |
Tips
- Start with your FAQ — your most frequently asked questions are the highest-value content. Upload your FAQ document first and the AI covers the most ground immediately.
- One topic per document — smaller, focused documents produce better search results than one massive file covering everything.
- Use URL import for existing docs — if you already have a help center or documentation site, import the URLs. With auto-sync, updates propagate automatically.
- Check gaps weekly — sort by frequency to find the most-asked unanswered questions. Address the high-frequency ones first for maximum impact.
- Don't ignore everything — it's tempting to ignore gaps that seem off-topic, but some may reveal customer confusion about your product that's worth addressing.
- Retry failed documents — if a document errors during processing, click Retry. Transient issues (network timeouts, temporary API errors) usually resolve on retry.
- Use categories for organization — as your knowledge base grows, categories help you manage content. They don't affect AI search quality, but they make the Documents page easier to navigate.
- Monitor chunk counts — a document with 0 chunks means no searchable content was extracted. Check the source file for issues (scanned PDFs without OCR, empty pages, etc.).