Web import

If you have website content that doesn’t live in a Zendesk or Salesforce knowledge base, you can import it into your AI Agent using Ada’s web scraper. The scraper crawls your public-facing website content and saves the text as new articles on your Knowledge page.

Before you begin

Before importing your website, review the following limitations, and learn how to adjust your website if you find that content isn’t scraping properly.

Limitations

  • You can only scrape content on public-facing websites (i.e., you can’t require users to log in to see it).
  • You can only have one active scrape job at a time.
  • The website scraper can only import articles up to 100KB in size. It will skip any articles that are larger.
  • The scraper follows links from your starting URL up to five levels deep (e.g., www.website.com/level_1/level_2/level_3/level_4/level_5). It won’t import any articles that are deeper into your page hierarchy.
  • Your AI Agent can contain a maximum of 50,000 articles, and a single web source can contain up to 1,000 articles. If your AI Agent reaches either limit, the import stops.
  • It may not be possible to import certain types of websites. Web imports work best with websites that are written in static semantic HTML. Some websites may not import properly:
    • Websites that block web crawlers - if your website blocks web crawlers, Ada’s scraper won’t be able to access its content.
    • Websites that are not written with semantic HTML - articles scraped from these websites may contain content from things like navigation menus, headers, footers, or other page elements that don’t belong in your AI Agent’s knowledge.
  • URLs cannot exceed 1024 characters.
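Before starting an import, you might want to screen candidate URLs against the limits above. The following Python sketch is purely illustrative and is not part of Ada’s product: it approximates page-hierarchy depth by counting path segments, and uses the standard library’s robots.txt parser to check whether crawlers are blocked for a given site.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

MAX_URL_LENGTH = 1024
MAX_DEPTH = 5  # levels of links followed from the starting URL

def check_url(url: str) -> list[str]:
    """Return a list of warnings for limits a candidate URL would hit."""
    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append(f"URL exceeds {MAX_URL_LENGTH} characters")
    # Rough heuristic: treat each non-empty path segment as one level
    depth = len([seg for seg in urlparse(url).path.split("/") if seg])
    if depth > MAX_DEPTH:
        problems.append(f"page is {depth} levels deep (limit is {MAX_DEPTH})")
    return problems

def crawler_allowed(url: str, user_agent: str = "*") -> bool:
    """Check the site's robots.txt to see whether crawlers are blocked."""
    parsed = urlparse(url)
    rp = RobotFileParser(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    rp.read()  # fetches robots.txt over the network
    return rp.can_fetch(user_agent, url)
```

Note that path-segment depth is only an approximation of link depth; the scraper follows links, so a shallow URL can still sit deep in your link hierarchy.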

Understand HTML elements the scraper ignores

Your website likely contains page elements you don’t want to have scraped and saved in your AI Agent (e.g., headers and footers). By default, the scraper is programmed to skip the elements that are least likely to contain relevant page information.

The website scraper is programmed to ignore HTML elements that match the following tags:

  • audio
  • button
  • img
  • meta
  • nav
  • noscript
  • picture
  • script
  • style
  • svg
  • video

Other elements like footers or banners may still be scraped if they contain text. To minimize unwanted content, structure your page so that key content is in dedicated content containers.
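To get a feel for how this kind of tag filtering works, here is a minimal sketch using Python’s standard-library HTML parser. This is not Ada’s implementation; it simply skips text nested inside the container tags listed above (void elements like img and meta have no closing tag and never contain text, so they need no tracking).

```python
from html.parser import HTMLParser

# Container tags from the ignore list above; text inside them is dropped.
IGNORED = {"button", "nav", "noscript", "picture", "script",
           "style", "svg", "audio", "video"}

class TextExtractor(HTMLParser):
    """Collect page text, skipping anything nested inside an ignored tag."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many ignored tags we're currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```

For example, `extract_text('<nav>Menu</nav><p>Real content</p>')` returns only `"Real content"`, which is why text in a footer (not on the ignore list) would still come through.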

For more information on HTML best practices for your knowledge base, see Prepare your knowledge base as a source for AI generated content.

Import your website’s content

  1. On the Ada dashboard, go to Training > Knowledge, then click Add Source and select Website. The Import website window opens.
  2. Under Source name, give your source a name. Each source name in your AI Agent must be unique, so you can identify and filter by the source on your Knowledge page.
  3. Under Content to import, choose the pages you want to import.
    • To import your entire website, select Every webpage starting from one URL, then add the URL you want your AI Agent to start scraping from. Your AI Agent follows the links on that page and scrapes the linked pages too.

      • For best results, use a root domain, like https://mywebsite.com, instead of a section of your website, like https://mywebsite.com/pages.
      • Be aware of any redirects on your website. The scraper imports redirected pages, as long as their URLs start with the URL you enter.
    • To import specific pages on your website, where your AI Agent only scrapes the pages you provide, select A specific list of webpages. Then, add the list of URLs you want your AI Agent to import, separating the URLs with commas. Your list can be up to 5000 characters long.

  4. Click Add. The Import website window closes, and your AI Agent saves your page source on the External sources tab and starts importing its content.
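If you assemble the comma-separated URL list programmatically, a quick sanity check can catch entries that would hit the limits described above. This is an illustrative sketch; the character limits come from this article, while the requirement that each entry be an absolute http(s) URL is an assumption, not a documented rule.

```python
def validate_url_list(raw: str) -> list[str]:
    """Validate a comma-separated URL list against the documented limits."""
    errors = []
    if len(raw) > 5000:
        errors.append(f"list is {len(raw)} characters (limit is 5000)")
    for url in (u.strip() for u in raw.split(",")):
        if len(url) > 1024:
            errors.append(f"URL exceeds 1024 characters: {url[:40]}...")
        elif not url.startswith(("http://", "https://")):
            # Assumption: entries should be absolute http(s) URLs
            errors.append(f"not an absolute URL: {url}")
    return errors
```

An empty result means the list passed every check; otherwise each entry in the result describes one problem to fix before importing.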

By default, all of your imported articles are set to Active, but you can change availability settings as needed for any of your articles. For more information, see Manage your knowledge content.

You can set availability rules for your imported articles to control which of your customers can see them. Availability rules are preserved when your website is re-imported; only the article content is updated.

Keep your website content up to date

Ada automatically keeps your website content up to date with daily syncs that detect new or updated pages and remove pages that no longer exist.

Daily automatic syncs: Ada re-crawls your site once per day to keep your Knowledge base current.

Manual refresh: You can trigger an immediate re-import at any time from the Sources tab without waiting for the next scheduled sync.

Article updates preserve existing references in your coaching and rules, ensuring your automations continue to work seamlessly.

To manually re-import your website:

  1. On the Ada dashboard, go to Training > Knowledge, then click the Sources tab.
  2. Find the website you want to re-import, and click Settings. The Import website window opens, with the import settings pre-populated.
  3. Click Sync now. The Import website window closes, and your AI Agent starts re-importing your website.

Delete your website’s content

You can remove your website as a source from your AI Agent.

Removing a website as a source also deletes all of the articles from that website from your AI Agent.

  1. On the Ada dashboard, go to Training > Knowledge, then click the Sources tab.
  2. Find the website you want to delete, and click Settings. The Import website window opens, with the import settings pre-populated.
  3. Click Delete source. A confirmation message appears, reminding you that deleting the source also deletes all of that website’s articles from your AI Agent. To proceed, click Delete.