How to Train Your AI Chatbot: Upload Documents, Import URLs, and Fix Knowledge Gaps

Your AI agent is only as good as the information it has. The Documents page is where you build your knowledge base — upload files, import web pages, and manage what your AI knows. The Knowledge Gaps page shows you what it doesn't know, so you can fill the holes.

Uploading Documents

Navigate to Documents in the sidebar. The upload zone at the top supports two methods:

File Upload

Drag and drop files onto the upload area, or click to browse. Supported formats:

Format	Extension	Notes
PDF	.pdf	Extracts text from all pages. Page count tracked.
Word	.docx	Extracts raw text from Word documents.
Plain Text	.txt	Direct text extraction.
Markdown	.md	Text extracted, YAML frontmatter stripped.

Maximum file size: 25 MB per file. The total size of your knowledge base is limited by your plan's page allocation, where 1 page is roughly 2,500 characters.
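
To get a feel for the page arithmetic, here is a minimal sketch that estimates how many pages a piece of extracted text might count against your plan. It assumes partial pages round up and that 25 MB means binary megabytes; neither detail is stated above, and the names `estimate_pages` and `fits_upload_limit` are hypothetical, not part of SupportHQ.

```python
import math

CHARS_PER_PAGE = 2_500             # from "1 page = roughly 2,500 characters"
MAX_FILE_BYTES = 25 * 1024 * 1024  # 25 MB per file (assumes binary megabytes)


def estimate_pages(text: str) -> int:
    """Estimate the pages used by extracted text (assumption: round up)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_PAGE)


def fits_upload_limit(size_bytes: int) -> bool:
    """Check a file's size against the per-file upload limit."""
    return size_bytes <= MAX_FILE_BYTES
```

Under these assumptions, a 10,000-character FAQ would count as about 4 pages.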

URL Import

Paste a URL into the URL field and click Add URL. SupportHQ fetches the page, strips navigation and scripts, and extracts the article content. This is great for importing:

Existing help center articles
Product documentation pages
FAQ pages
Blog posts with product information

Auto-Sync for URLs (Pro and Scale)

On Pro and Scale plans, you can set an auto-sync interval for imported URLs: every 24 hours, 48 hours, or 7 days. Each time the interval elapses, SupportHQ automatically re-fetches the URL, updates the extracted content, and re-processes the embeddings. This keeps your AI's knowledge current without manual updates.

You can also click Sync due URLs now to trigger an immediate re-sync of all URLs that are past their scheduled sync time.
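
The idea of a URL being "due" can be sketched as a simple time comparison. This is only an illustration: the record fields (`url`, `interval`, `last_synced`) and the function names are hypothetical, and SupportHQ's actual scheduler may work differently.

```python
from datetime import datetime, timedelta, timezone

# The three intervals offered on Pro and Scale plans.
SYNC_INTERVALS = {
    "24h": timedelta(hours=24),
    "48h": timedelta(hours=48),
    "7d": timedelta(days=7),
}


def is_sync_due(last_synced: datetime, interval: str, now: datetime) -> bool:
    """A URL is due once its interval has fully elapsed since the last sync."""
    return now - last_synced >= SYNC_INTERVALS[interval]


def due_urls(records: list[dict], now: datetime) -> list[str]:
    """Return the URLs that are past their scheduled sync time."""
    return [
        r["url"]
        for r in records
        if is_sync_due(r["last_synced"], r["interval"], now)
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        {"url": "https://example.com/faq", "interval": "24h",
         "last_synced": now - timedelta(hours=30)},
        {"url": "https://example.com/docs", "interval": "7d",
         "last_synced": now - timedelta(days=2)},
    ]
    print(due_urls(records, now))  # only the FAQ URL is past its interval
```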

How Documents Are Processed

When you upload a file or import a URL, here's what happens behind the scenes:

Text extraction — the content is parsed into plain text (PDF pages, DOCX paragraphs, HTML article content)
Chunking — the text is split into ~800-character chunks with 150-character overlap between chunks. This ensures the AI can find relevant passages without missing context at chunk boundaries.
Embedding — each chunk is converted into a 1,536-dimension vector using an AI embedding model. These vectors capture the semantic meaning of the text.
Storage — the vectors are stored in PostgreSQL with pgvector, enabling fast similarity search across your entire knowledge base.
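
The chunking step can be illustrated with a fixed-size sliding window. This is a simplified sketch, not SupportHQ's implementation: it cuts at exact character offsets, whereas the "~800" above suggests the real splitter may adjust boundaries. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 150) -> list[str]:
    """Split text into windows of `size` characters that share `overlap`
    characters with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each new chunk starts 650 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "x" * 2000
    parts = chunk_text(sample)
    print(len(parts), [len(p) for p in parts])
```

With the defaults, a 2,000-character text yields three chunks starting at offsets 0, 650, and 1,300, and the last 150 characters of each chunk repeat at the start of the next one.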

Document Statuses

Status	Meaning
Pending	Queued for processing
Processing	Currently being parsed, chunked, and embedded
Ready	Successfully processed — the AI can use this document
Error	Something went wrong. Click Retry to re-process.

How the AI Uses Your Documents

When a customer asks a question, the AI:

Converts the question into a vector embedding
Searches your knowledge base for the 5 most similar chunks
Filters out chunks below a similarity threshold (0.30)
Uses the relevant chunks as context to generate an accurate, grounded answer

If no chunks pass the similarity threshold, the AI either answers from its general knowledge or records a knowledge gap (see How Gaps Are Detected below).
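
The retrieval steps above can be sketched in a few lines. This example makes two assumptions that the article does not confirm: that "similarity" means cosine similarity, and that a chunk scoring exactly 0.30 is kept. It also searches an in-memory list for clarity, whereas the real search runs in PostgreSQL with pgvector. The names `cosine_similarity` and `retrieve` are hypothetical.

```python
import math

TOP_K = 5                    # "the 5 most similar chunks"
SIMILARITY_THRESHOLD = 0.30  # chunks scoring below this are dropped


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float],
             chunks: list[tuple[str, list[float]]]) -> list[tuple[float, str]]:
    """Rank chunks by similarity, keep the top 5, then apply the threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, t) for s, t in scored[:TOP_K] if s >= SIMILARITY_THRESHOLD]


if __name__ == "__main__":
    kb = [("refund policy", [1.0, 0.0]), ("shipping times", [0.0, 1.0])]
    print(retrieve([1.0, 0.1], kb))
```

An empty result from `retrieve` corresponds to the case where no chunks pass the threshold.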

Managing Documents

The document list shows each document with its name, size, page count, chunk count, category, status, and creation date. For URL imports, it also shows the source URL and last sync timestamp.

Retry — appears on Error documents. Re-triggers the processing pipeline.
Delete — removes the document, its stored file, and all chunks. The AI will no longer reference this content.
Clear All — bulk-deletes every document in your knowledge base. Use with caution.

Knowledge Gaps

Navigate to Knowledge Gaps in the sidebar. This page shows questions your AI couldn't answer because no relevant documents were found.

How Gaps Are Detected

A knowledge gap is recorded when all three conditions are true during a conversation:

The AI found no relevant document chunks for the customer's question (every candidate scored below the 0.30 similarity threshold)
The question requires domain-specific knowledge (not a general greeting or chitchat)
The customer is not asking for a live agent or changing the language

When the same question (or a very similar one) is asked again, SupportHQ increments the existing gap's frequency instead of creating a duplicate. The frequency tells you which missing topics customers ask about most.
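
The detection rule and the frequency counter can be sketched as follows. This is a simplified illustration with hypothetical names (`Gap`, `should_record_gap`, `record_gap`). In particular, it treats two questions as "the same" only when they match after lowercasing and collapsing whitespace, which is a stand-in for whatever similarity check SupportHQ actually uses.

```python
from dataclasses import dataclass

MAX_QUESTION_CHARS = 500  # the gap list shows up to 500 characters


@dataclass
class Gap:
    question: str
    frequency: int = 1
    status: str = "open"


def should_record_gap(found_relevant_chunks: bool,
                      needs_domain_knowledge: bool,
                      is_handoff_or_language_request: bool) -> bool:
    """All three conditions must hold for a gap to be recorded."""
    return (not found_relevant_chunks
            and needs_domain_knowledge
            and not is_handoff_or_language_request)


def _normalize(question: str) -> str:
    # Assumption: a crude duplicate key, not real semantic matching.
    return " ".join(question.lower().split())


def record_gap(gaps: dict[str, Gap], question: str) -> Gap:
    """Increment an existing gap's frequency, or create a new open gap."""
    key = _normalize(question)
    if key in gaps:
        gaps[key].frequency += 1
    else:
        gaps[key] = Gap(question=question[:MAX_QUESTION_CHARS])
    return gaps[key]
```

Sorting `gaps.values()` by `frequency` in descending order then gives the "most asked" view of what is missing.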

The Knowledge Gaps List

Each gap shows:

Question — the customer's original question (up to 500 characters)
Frequency badge — how many times this question has been asked (e.g., "5x asked")
Last seen — when the question was most recently asked

Gap Statuses

Status	Meaning
Open	Needs attention — the AI still can't answer this
Resolved	You've addressed it (uploaded a document, wrote an article, etc.)
Ignored	You've decided this isn't relevant to your knowledge base

Working with Gaps

For each open gap, you have three actions:

Mark as Resolved (green checkmark) — use this after you've uploaded a document or created a help center article that answers the question
Mark as Ignored (eye-off icon) — use this for questions that are irrelevant, off-topic, or not something you want the AI to answer
Delete (trash icon) — permanently remove the gap record

To reopen a Resolved or Ignored gap, click the eye icon.

The Feedback Loop

Knowledge gaps and documents work together as a continuous improvement cycle:

Customer asks a question the AI can't answer
A knowledge gap is recorded with the question text
You see the gap on the Knowledge Gaps page, sorted by frequency
You create content — upload a document, import a URL, or write a help center article
You mark the gap as resolved
Next time the question is asked, the AI finds the new content and answers correctly

Plan Limits

Plan	Document Pages	Auto-Sync
Starter	5,000 pages	No
Pro	50,000 pages	Yes (24h / 48h / 7d)
Scale	Unlimited	Yes (24h / 48h / 7d)
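
If you want to check ahead of time whether a batch of documents fits your plan, the table above translates into a small lookup. This is a hypothetical helper for your own planning, not a SupportHQ API; how the product behaves when a limit is reached is not covered here.

```python
from typing import Optional

# Page allocations from the table above; None stands for "Unlimited".
PLAN_PAGE_LIMITS: dict[str, Optional[int]] = {
    "Starter": 5_000,
    "Pro": 50_000,
    "Scale": None,
}


def fits_plan(plan: str, used_pages: int, new_pages: int) -> bool:
    """Return True if adding new_pages stays within the plan's allocation."""
    limit = PLAN_PAGE_LIMITS[plan]
    return limit is None or used_pages + new_pages <= limit
```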

Tips

Start with your FAQ — your most frequently asked questions are the highest-value content. Upload your FAQ document first and the AI covers the most ground immediately.
One topic per document — smaller, focused documents produce better search results than one massive file covering everything.
Use URL import for existing docs — if you already have a help center or documentation site, import the URLs. With auto-sync, updates propagate automatically.
Check gaps weekly — sort by frequency to find the most-asked unanswered questions. Address the high-frequency ones first for maximum impact.
Don't ignore everything — it's tempting to ignore gaps that seem off-topic, but some may reveal customer confusion about your product that's worth addressing.
Retry failed documents — if a document errors during processing, click Retry. Transient issues (network timeouts, temporary API errors) usually resolve on retry.
Use categories for organization — as your knowledge base grows, categories help you manage content. They don't affect AI search quality, but they make the Documents page easier to navigate.
Monitor chunk counts — a document with 0 chunks means no searchable content was extracted. Check the source file for issues (scanned PDFs without OCR, empty pages, etc.).