Import website content into your AI Agent
If you have website content that doesn't live in a Zendesk or Salesforce knowledge base, you can import it into your AI Agent using Ada's web scraper. The scraper goes through your public-facing website content, and saves the text in new articles on your Knowledge page.
Before you begin
Before importing your website, review the following limitations, and learn how to adjust your website if the scraper isn't importing your content properly.
Limitations
- You can only scrape content on public-facing websites (i.e., you can't require users to log in to see it).
- You can only have one active scrape job going at a time.
- The website scraper can only import articles up to 100KB in size. It will skip any articles that are larger.
- If you import a website, the scraper follows the sitemap to scrape articles up to five levels in (e.g., www.website.com/level_1/level_2/level_3/level_4/level_5). It won't import any articles that are deeper into your page hierarchy.
- Your AI Agent can have a maximum of 50,000 articles in it, and a web source can have up to 1,000 articles in it. If your AI Agent reaches either number, the import will stop.
- It may not be possible to import certain types of websites. Web imports work best with websites that are written in static semantic HTML. Some websites may not import properly:
  - Websites with web crawlers blocked - if your website has a blocker for web crawlers, Ada's scraper won't be able to access its content.
  - Websites that are not written with semantic HTML - articles scraped from these websites may contain content from things like navigation menus, headers, footers, or other page elements that don't belong in your AI Agent's knowledge.
- Your AI Agent checks to let you know if you're importing sources with the same names or URLs, but you can still import duplicate content into your AI Agent. Make sure you manage your sources carefully to avoid any issues.
- You can set availability rules for your imported articles to control which of your customers can see them. However, every time you re-import your website, the availability rules reset and your imported articles become available to everyone again.
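If you want to spot pages that might be skipped before you import, you can run a rough pre-check on your own URLs. The sketch below is not part of Ada's product. It assumes that "levels" means path segments after the domain, that 100KB means 100 * 1024 bytes of page text, and that each URL includes its scheme (`https://`). The names `path_depth` and `precheck` are made up for illustration.

```python
from typing import List
from urllib.parse import urlparse

MAX_DEPTH = 5                    # "up to five levels in", per the limitation above
MAX_ARTICLE_BYTES = 100 * 1024   # assumption: 100KB = 100 * 1024 bytes of text


def path_depth(url: str) -> int:
    """Count the non-empty path segments after the domain."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def precheck(url: str, page_text: str) -> List[str]:
    """Return the reasons a page might be skipped (empty list if none)."""
    problems = []
    if path_depth(url) > MAX_DEPTH:
        problems.append("deeper than five levels")
    if len(page_text.encode("utf-8")) > MAX_ARTICLE_BYTES:
        problems.append("larger than 100KB")
    return problems
```

The scraper may measure depth and size differently, so treat a clean result as a hint rather than a guarantee.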
Understand HTML elements the scraper ignores
Your website likely contains page elements you don't want to have scraped and saved in your AI Agent (e.g., headers and footers). By default, the scraper is programmed to skip the elements that are least likely to contain relevant page information.
If you have additional page elements that you want the scraper to ignore, you can add any of the roles or attributes below to those elements. For more information on HTML best practices for your knowledge base, see Prepare your knowledge base as a source for AI generated content.
The website scraper is programmed to ignore HTML elements that match the following selectors:
nav
footer
script
style
noscript
svg
img
audio
video
[role="alert"]
[role="banner"]
[role="dialog"]
[role="alertdialog"]
[role="region"][aria-label*="skip" i]
[aria-modal="true"]
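To preview roughly which text would survive these rules, you can approximate them locally. The sketch below uses Python's standard `html.parser` module and is only an approximation of the selector list above, not Ada's scraper. It assumes well-formed HTML where every non-void element is closed, and the names `is_ignored`, `KeptText`, and `visible_text` are hypothetical.

```python
from html.parser import HTMLParser

IGNORED_TAGS = {"nav", "footer", "script", "style", "noscript",
                "svg", "img", "audio", "video"}
IGNORED_ROLES = {"alert", "banner", "dialog", "alertdialog"}
# Void elements never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def is_ignored(tag, attrs):
    """Approximate the selector list above for a single element."""
    a = {name: (value or "") for name, value in attrs}
    role = a.get("role", "")
    if tag in IGNORED_TAGS or role in IGNORED_ROLES:
        return True
    if a.get("aria-modal") == "true":
        return True
    # [role="region"][aria-label*="skip" i]: case-insensitive substring match
    return role == "region" and "skip" in a.get("aria-label", "").lower()


class KeptText(HTMLParser):
    """Collect text that is not nested inside an ignored element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an ignored element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth or is_ignored(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = KeptText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, wrapping a promotional block in `<div role="banner">` keeps its text out of the result of `visible_text`, while text inside `<main>` is kept.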
Import your website's content
- On the Ada dashboard, go to Training > Knowledge, then click Import website. The Import website window opens.
- Under Source name, give your source a name. Each source name in your AI Agent must be unique, so you can identify and filter by the source on your Knowledge page.
- Under Content to import, choose the pages you want to import.
  - To import your entire website, where you provide a single URL and your AI Agent follows the links on that website and scrapes those pages too, select Every webpage starting from one URL. Then, add the URL you want your AI Agent to start scraping from.
    Tips:
    - For best results, use a root domain, like https://mywebsite.com, instead of a section of your website, like https://mywebsite.com/pages.
    - Be aware of any redirects in your website. The scraper will import redirected pages, as long as they start with the URL you enter.
  - To import specific pages on your website, where your AI Agent only scrapes the pages you provide, select A specific list of webpages. Then, add the list of URLs you want your AI Agent to import, separating the URLs with commas. Your list can be up to 5000 characters long.
- Click Import. The Import website window closes, and your AI Agent saves your page source on the External sources tab and starts importing its content.
Large imports may take a few minutes. You can check back to see when your import is done, but your AI Agent will also send you an email to let you know when it's finished.
By default, all of your imported articles are set to Active, but you can change availability settings as needed for any of your articles. For more information, see Manage your knowledge content.
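If you choose A specific list of webpages and have many URLs, you may find it easier to build the comma-separated list with a short script than by hand. The sketch below is one way to do that. It assumes the 5000-character limit counts the commas, that the field expects absolute http or https URLs, and that none of your URLs contain a comma. The function name `build_url_list` is made up for illustration.

```python
from urllib.parse import urlparse

MAX_LIST_CHARS = 5000  # "Your list can be up to 5000 characters long"


def build_url_list(urls):
    """Join URLs with commas, dropping blanks and exact duplicates."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        seen.add(url)
        cleaned.append(url)
    joined = ",".join(cleaned)
    if len(joined) > MAX_LIST_CHARS:
        raise ValueError(
            f"list is {len(joined)} characters; the limit is {MAX_LIST_CHARS}"
        )
    return joined
```

If the list is too long, you can split your URLs across more than one source, keeping in mind that each source name must be unique.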
Re-import your website's content
If your website has been updated, you can re-import its content into your AI Agent to ensure your customers are always getting current information.
Your AI Agent doesn't pick up changes to your website automatically, so make sure you re-import your website often!
When you re-import your website, your AI Agent deletes all of the articles from the last import, and imports a fresh version of your website again.
- On the Ada dashboard, go to Training > Knowledge, then click the Sources tab.
- Find the website you want to re-import, and click Settings. The Import website window opens, with the import settings pre-populated.
- Click Import again. The Import website window closes, and your AI Agent starts re-importing your website.
Refresh your page to see updates on your import's progress. Large imports may take a while to fully import.
Delete your website's content
You can remove your website as a source from your AI Agent.
Removing a website as a source also deletes all of the articles from that website from your AI Agent.
- On the Ada dashboard, go to Training > Knowledge, then click the Sources tab.
- Find the website you want to delete, and click Settings. The Import website window opens, with the import settings pre-populated.
- Click Delete source. A confirmation message appears to remind you that deleting the source also deletes all of that source's articles from your AI Agent. To proceed, click Delete.