Search datasources

With Dust, you can ask your assistants to search through your selected data sources and pick the most relevant documents to answer from. Dust provides _RAG_ capabilities: assistants can search your data semantically before sending your message to the chosen model (cf. Understanding Retrieval Augmented Generation (RAG) in Dust).

To use this, you first need to have created some data sources: Connections, Folders, and/or Websites.


You will then need to select exactly what you want this assistant to be able to search from: any of your Connections, Folders, or Websites. You can even pick sub-pages or specific channels.

Pick from any of your data sources


Once you've picked a data source, you can fine-tune the pages available to your assistant


Searching my data, how exactly?

When this method is activated, the assistant searches your data for the right information by understanding the context of the user's question. This is possible through a process called semantic search.

πŸ“˜

Semantic search

Semantic search in RAG is a method for finding relevant information based on meaning and context, not just keywords. It uses advanced techniques like embeddings and similarity measures to understand query intent and retrieve contextually appropriate information from a knowledge base. This enhances RAG models' ability to generate more accurate and informed responses by augmenting the language model with pertinent, up-to-date information.
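Dust handles all of this internally, but as a rough illustration of the mechanics, here is a minimal sketch of similarity-based retrieval. The `toy_embed` function is a stand-in: a real system maps text to dense vectors with a learned embedding model, not word counts.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model. A production system produces dense
    # semantic vectors; a simple bag-of-words vector is used here only to
    # show how similarity scoring ranks documents.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Measures how "close" two vectors are, independent of their length.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Embed the query once, score every document, return the best matches.
    q = toy_embed(query)
    ranked = sorted(documents,
                    key=lambda d: cosine_similarity(q, toy_embed(d)),
                    reverse=True)
    return ranked[:top_k]

docs = [
    "Parental leave policy: employees are entitled to 16 weeks of paid leave.",
    "Office snacks are restocked every Monday morning.",
    "How to request parental leave: notify HR at least one month in advance.",
]
results = search("What should I prepare for a parental leave?", docs)
```

With real embeddings, a query like "time off after having a baby" would also surface the parental-leave documents even though it shares no keywords with them; that meaning-level matching is what distinguishes semantic search from keyword search.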

Let’s take an example:

Imagine you are creating an assistant whose goal is to expose HR documentation to your employees. They can search for anything regarding their holidays (PTO), hiring policies, parental leave, etc. They already have access to your documentation (Confluence, Notion, etc.), but the information is buried in a sea of articles and many of them can't find it easily. The assistant will:

  1. Understand the user's question (e.g. "What should I prepare for parental leave?") and search for the right articles, and the right chunks inside those articles, that actually contain the answer.
  2. Feed this content to the model along with the user's full query.
  3. Generate a response to the user's question, taking the results from the data source into account and linking to the right sources.
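The three steps above can be sketched end to end. All names here (`rag_answer`, the `retrieve` and `generate` callables, and the example URL) are hypothetical stand-ins for Dust's internals, shown only to make the retrieve-then-generate flow concrete:

```python
def rag_answer(question: str, retrieve, generate, top_k: int = 3) -> str:
    # Step 1: semantic search picks the chunks most relevant to the question.
    chunks = retrieve(question, top_k)

    # Step 2: the retrieved content is combined with the user's full query.
    context = "\n".join(f"- {c['text']} (source: {c['url']})" for c in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"

    # Step 3: the model answers using the context, citing the source links.
    return generate(prompt)

# Hypothetical stubs standing in for the real retrieval index and model.
def fake_retrieve(question, top_k):
    return [{"text": "Parental leave: notify HR one month in advance.",
             "url": "https://example.com/hr/parental-leave"}][:top_k]

def fake_generate(prompt):
    return f"Answer based on:\n{prompt}"

answer = rag_answer("What should I prepare for a parental leave?",
                    fake_retrieve, fake_generate)
```

Because the retrieved chunks carry their source URLs into the prompt, the model can cite exactly where each piece of the answer came from.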

When should I use the Search option?

This method is particularly useful when the data comes from data sources you have synced with Dust: Google Drive, Confluence, Notion, Slack, etc.

Granularity

While it is tempting to add all of your data sources to an assistant, it is not always the right choice! (And you can already query all of your data sources using the @dust assistant, if your workspace is configured with it.)

LLMs perform best when provided with the right data, so assistants connected to a focused set of data sources will usually perform better than those with access to everything.

And remember: when creating an assistant, try different options to see which one suits you best!