Import website content into your AI Agent

If you have website content that doesn't live in a Zendesk or Salesforce knowledge base, you can import it into your AI Agent using Ada's web scraper. The scraper goes through your public-facing website content, and saves the text in new articles on your Knowledge page.

Before you begin

Before importing your website, make sure you understand the following limitations, and know how to adjust your website if the scraper isn't importing your content properly.

Limitations

  • You can only scrape content on public-facing websites (i.e., you can't require users to log in to see it).
  • The website scraper can only import articles up to 100KB in size. It will skip any articles that are larger.
  • Your AI Agent can have a maximum of 50,000 articles in it. If your AI Agent reaches that number, the import will stop.
  • It may not be possible to import certain types of websites. Web imports work best with websites that are written in static semantic HTML. Some websites may not import properly:
    • Websites that have dynamically generated content - if your website uses scripts to dynamically serve content, the scraper won't have HTML to convert to Markdown and save in your AI Agent.
    • Websites with web crawlers blocked - if your website has a blocker for web crawlers, Ada's scraper won't be able to access its content.
    • Websites that are not written with semantic HTML - articles scraped from these websites may contain content from things like navigation menus, headers, footers, or other page elements that don't belong in your AI Agent's knowledge.
  • Your AI Agent doesn't check for duplicate content. It does check to ensure your sources are distinct from each other (i.e., they have different names and URLs), but it still may be possible to import duplicate sites into your AI Agent. Make sure you manage your sources carefully to avoid any issues.
  • You can set availability rules for your imported articles to control which of your customers can see them. However, every time you re-import your website, the availability rules reset and your imported articles become available to everyone again.
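
You can check a couple of these limitations yourself before you start an import. The sketch below is a rough pre-flight check, not Ada's implementation. It assumes the 100KB limit is measured in bytes of page content (taken here as 100 * 1024; the exact measure isn't stated above), and it uses robots.txt rules as one possible stand-in for "web crawlers blocked", since this article doesn't say how Ada's scraper identifies itself or which blocking mechanisms affect it. The function names and the `"*"` user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "100KB" means 100 * 1024 bytes of UTF-8 encoded content.
SIZE_LIMIT_BYTES = 100 * 1024


def within_size_limit(page_text: str, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if the encoded page content fits under the size limit."""
    return len(page_text.encode("utf-8")) <= limit


def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching the URL.

    Assumption: the crawler honors robots.txt and matches the "*" rules.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(within_size_limit("short article"))
    print(robots_allows(robots, "https://mywebsite.com/help"))
    print(robots_allows(robots, "https://mywebsite.com/private/notes"))
```

A page that passes both checks can still fail to import for other reasons, such as content that only appears after scripts run.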

Understand HTML elements the scraper ignores

Your website likely contains page elements you don't want to have scraped and saved in your AI Agent (e.g., headers and footers). By default, the scraper is programmed to skip the elements that are least likely to contain relevant page information.

If you have additional page elements that you want your scraper to ignore, you can add any of the roles below to your page elements. For more information on HTML best practices for your knowledge base, see Prepare your knowledge base as a source for AI generated content.

The website scraper is programmed to ignore HTML elements that match the following selectors:

  • nav
  • footer
  • script
  • style
  • noscript
  • svg
  • img
  • audio
  • video
  • [role="alert"]
  • [role="banner"]
  • [role="dialog"]
  • [role="alertdialog"]
  • [role="region"][aria-label*="skip" i]
  • [aria-modal="true"]
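
To get a feel for what these selectors remove, the following is a minimal sketch that approximates the list above using Python's standard `html.parser`. It is an illustration only, not how Ada's scraper is implemented: the class and function names are hypothetical, and the matching is a simplified reading of the selectors (tag names, exact `role` values, `aria-modal="true"`, and a case-insensitive "skip" check on `aria-label` for `role="region"`).

```python
from html.parser import HTMLParser

IGNORED_TAGS = {"nav", "footer", "script", "style", "noscript",
                "svg", "img", "audio", "video"}
IGNORED_ROLES = {"alert", "banner", "dialog", "alertdialog"}
# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def is_ignored(tag, attrs):
    """Approximate the selector list: True if this element would be skipped."""
    a = {name: (value or "") for name, value in attrs}
    role = a.get("role", "")
    if tag in IGNORED_TAGS or role in IGNORED_ROLES:
        return True
    if a.get("aria-modal") == "true":
        return True
    return role == "region" and "skip" in a.get("aria-label", "").lower()


class KeptTextParser(HTMLParser):
    """Collect text that is not inside any ignored element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, ignored) for each open element
        self.skipping = 0    # number of open ignored elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        ignored = is_ignored(tag, attrs)
        self.stack.append((tag, ignored))
        self.skipping += int(ignored)

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray closing tag
        while self.stack:
            open_tag, ignored = self.stack.pop()
            self.skipping -= int(ignored)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.skipping == 0 and data.strip():
            self.chunks.append(data.strip())


def kept_text(html):
    parser = KeptTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

For example, given a page with a `role="banner"` header, a `nav`, a modal dialog, and a `footer` around the main content, `kept_text` returns only the text of the main content. Adding one of the roles above to an unwanted element on your own pages has the same effect in this sketch.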

Import your website's content

  1. On the Ada dashboard, go to Training > Knowledge, then click Import website. The Import website window opens.

  2. Under Source name, give your source a name. Each source name in your AI Agent must be unique, so you can identify and filter by the source on your Knowledge page.

  3. Under Content to import, choose the pages you want to import.

    • To import your entire website, select Every webpage starting from one URL. Then, add the URL you want your AI Agent to start scraping from. Your AI Agent follows the links it finds from that URL and scrapes those pages too.

      tips
      • For best results, use a root domain, like https://mywebsite.com, instead of a section of your website, like https://mywebsite.com/pages.
      • Be aware of any redirects in your website. The scraper will import redirected sites, as long as they start with the URL you enter.
    • To import specific pages on your website, where your AI Agent only scrapes the pages you provide, select A specific list of webpages. Then, add the list of URLs you want your AI Agent to import, separating the URLs with commas.

  4. Click Import. The Import website window closes, and your AI Agent saves your page source on the External sources tab and starts importing its content.

    Refresh your page to see updates on your import's progress. Large imports may take a few minutes.
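
If you prepare your URLs outside the dashboard, the following sketch shows two small helpers for the options in step 3. It is illustrative only: the function names are hypothetical, and the scope check is a literal prefix comparison based on the wording "start with the URL you enter", which may differ from the scraper's actual matching rule.

```python
def stays_in_scope(start_url: str, final_url: str) -> bool:
    """Return True if a (possibly redirected) URL still begins with the start URL.

    Assumption: scope is decided by a plain string-prefix comparison.
    """
    return final_url.startswith(start_url)


def to_url_list(urls) -> str:
    """Build a comma-separated URL list, trimming blanks and dropping repeats."""
    cleaned = (u.strip() for u in urls)
    unique = dict.fromkeys(u for u in cleaned if u)  # preserves first-seen order
    return ",".join(unique)


if __name__ == "__main__":
    # A redirect that stays under the start URL would still be imported.
    print(stays_in_scope("https://mywebsite.com",
                         "https://mywebsite.com/pages/faq"))
    # A redirect outside a narrower start URL would not.
    print(stays_in_scope("https://mywebsite.com/pages",
                         "https://mywebsite.com/blog/post"))
    print(to_url_list(["https://mywebsite.com/a", " https://mywebsite.com/b "]))
```

The second call illustrates why a root domain tends to work better as a starting point than a section URL: fewer linked or redirected pages fall outside the prefix.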

By default, all of your imported articles are set to Active, but you can change availability settings as needed for any of your articles. For more information, see Manage your knowledge content.

Re-import your website's content

If your website has been updated, you can re-import its content into your AI Agent to ensure your customers are always getting current information.

important

Your AI Agent doesn't pick up changes to your website automatically, so make sure you re-import your website often!

When you re-import your website, your AI Agent deletes all of the articles from the last import, and imports a fresh version of your website again.

  1. On the Ada dashboard, go to Training > Knowledge, then click the External sources tab.

  2. Find the website you want to re-import, and click Settings. The Import website window opens, with the import settings pre-populated.

  3. Click Import again. The Import website window closes, and your AI Agent starts re-importing your website.

    Refresh your page to see updates on your import's progress. Large imports may take a while to fully import.

Delete your website's content

You can remove your website as a source from your AI Agent.

caution

Removing a website as a source also deletes all of the articles from that website from your AI Agent.

  1. On the Ada dashboard, go to Training > Knowledge, then click the External sources tab.
  2. Find the website you want to delete, and click Settings. The Import website window opens, with the import settings pre-populated.
  3. Click Delete source. A confirmation message appears, to remind you that deleting the source also deletes all of the articles from your AI Agent. To proceed, click Delete.

Have any questions? Contact your Ada team.