Create a Data Source

You can create a Data Source in the Gen AI Builder console by specifying the required configuration for your chosen Data Source type.

Follow these steps to create a data source. For this example, we will create a data source from a web page.

  1. Navigate to the Data Sources screen.
  2. Click Create data source.
  3. Select a type of data source. For this example, choose Web Page.
  4. Give your data source a name and a description (optional).
  5. Enter the URL of a web page that you want to use as a data source, for example https://www.griptape.ai.
  6. Click Create to submit the form and create your data source.

Data Source Types

Web Page

You can scrape and ingest a single, public web page by providing a URL. To scrape multiple pages, create one Data Source per page. You can then add all of those Data Sources to the same Knowledge Base if you wish to access the content from the pages together.

Amazon S3

You can connect Amazon S3 buckets, objects, and prefixes by providing their S3 URI(s). Supported file extensions include .pdf, .csv, .md, and most text-based file types.
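Before submitting a list of S3 URIs, it can help to check them locally. The following is a minimal sketch, not part of the product: the helper names `split_s3_uri` and `has_named_extension` are hypothetical, and the extension set contains only the three types named above (the service also accepts other text-based types, so a `False` result here does not mean a file is unsupported).

```python
from pathlib import PurePosixPath

# Only the extensions explicitly named in the text; other text-based
# file types are also supported by the service.
NAMED_EXTENSIONS = {".pdf", ".csv", ".md"}


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key).

    The key is empty for a bare bucket URI and ends with '/' for a prefix.
    """
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


def has_named_extension(uri: str) -> bool:
    """Return True if the object key ends in one of NAMED_EXTENSIONS."""
    _, key = split_s3_uri(uri)
    return PurePosixPath(key).suffix.lower() in NAMED_EXTENSIONS
```

A bucket URI such as `s3://my-bucket` or a prefix such as `s3://my-bucket/docs/` has no file extension, so `has_named_extension` returns `False` for both; that is expected, since buckets and prefixes are valid inputs in their own right.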

Google Drive

Connect individual Google Drive files or entire folders. Supported file types include Google Apps file types such as Docs, Sheets, and Slides, as well as most text-based file types such as PDF, CSV, and Markdown.

Atlassian Confluence

You can connect to your personal or company Confluence by providing a URL, an Atlassian API Token, and the email address of the token holder's account. Each Confluence Data Source can be limited to a single Space in Confluence by providing the URL for that Space.

Gen AI Builder Data Lake

You can connect a Bucket and a list of Asset Paths as a Data Source. Supported file types include PDF, CSV, Markdown, and most text-based file types.

Custom Data Sources via Structures (Experimental)

You can specify a Structure to run as a Data Source as long as your Structure returns a TextArtifact or ListArtifact from the Gen AI Builder SDK. You can use this as a way to build custom Data Sources.
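As a rough illustration of the return shape described above, the sketch below wraps raw strings into a list of text artifacts. The `TextArtifact` and `ListArtifact` classes here are simplified stand-ins defined only so the example runs on its own; a real Structure would use the SDK's classes instead. The function `load_release_notes` and its input are hypothetical.

```python
from dataclasses import dataclass, field


# Simplified stand-ins for the SDK's artifact classes (assumption: each
# artifact exposes its payload as `.value`). Replace with the SDK's own
# classes in a real Structure.
@dataclass
class TextArtifact:
    value: str


@dataclass
class ListArtifact:
    value: list = field(default_factory=list)


def load_release_notes(raw_notes: list[str]) -> ListArtifact:
    """Hypothetical custom source: one TextArtifact per non-empty note."""
    cleaned = (note.strip() for note in raw_notes)
    return ListArtifact([TextArtifact(note) for note in cleaned if note])
```

Returning a ListArtifact of TextArtifacts keeps each record separate, while a single TextArtifact would suit a source that yields one block of text.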

Other Data Source Types

If you do not see the Data Source configuration you want to use, you can submit a request via Discord or hello@griptape.ai.

Adding Structure as Transform to Data Source (Experimental)

When creating any Data Source, you can optionally specify a Structure to run as a transform step of your data ingestion before the data is loaded into the vector store. Ensure that the Structure you select as a transform takes a ListArtifact as its first positional argument and returns either a TextArtifact or a ListArtifact.
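The transform contract above (a ListArtifact in as the first positional argument, a TextArtifact or ListArtifact out) can be sketched as a plain function. This is not the Find and Replace Sample Structure itself: the artifact classes are simplified stand-ins so the example is self-contained, and `find_and_replace` with its `find` and `replace` parameters is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field


# Simplified stand-ins for the SDK's artifact classes (assumption: each
# artifact exposes its payload as `.value`).
@dataclass
class TextArtifact:
    value: str


@dataclass
class ListArtifact:
    value: list = field(default_factory=list)


def find_and_replace(artifacts: ListArtifact, find: str, replace: str) -> ListArtifact:
    """Return a new ListArtifact with `find` replaced in every text item.

    The incoming ListArtifact is the first positional argument and is
    left unmodified; new TextArtifacts are built for the output.
    """
    return ListArtifact(
        [TextArtifact(item.value.replace(find, replace)) for item in artifacts.value]
    )


if __name__ == "__main__":
    ingested = ListArtifact([TextArtifact("colour chart"), TextArtifact("no match")])
    transformed = find_and_replace(ingested, "colour", "color")
    for item in transformed.value:
        print(item.value)
```

Building new artifacts rather than editing the input in place keeps the transform free of side effects, which makes it easier to test outside the ingestion pipeline.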

Take a look at the Find and Replace Sample Structure for more details on how to implement this for your own Structure.

