Chunks and documents concepts

The Chunk model

The Chunk model represents a chunk of a document. See the Data Sources documentation to better understand how documents are chunked as part of a data source to enable semantic search.

Properties

hash: string
A hash of the chunk text as well as parent document information.

text: string
The text of the chunk as it was embedded.

offset: integer
The offset of the chunk in the parent document.

score: float
The similarity score of the chunk as returned by the semantic search.

vector: []float
The embedding vector associated with the chunk.
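To make the Chunk properties concrete, the sketch below mirrors them as a client-side data class and shows how a similarity `score` could be attached to each chunk during a search. It is illustrative only: the `Chunk` class, `cosine_similarity`, and `rank_chunks` are hypothetical names, and cosine similarity is an assumption chosen for demonstration, since this page does not state which similarity measure the semantic search actually uses.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    # Hypothetical client-side mirror of the Chunk properties above.
    hash: str
    text: str
    offset: int
    vector: List[float]
    score: Optional[float] = None  # only meaningful after a search


def cosine_similarity(a: List[float], b: List[float]) -> float:
    # Assumption: cosine similarity stands in for the search score.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(query_vector: List[float], chunks: List[Chunk], top_k: int) -> List[Chunk]:
    # Set each chunk's score in place, then return the top_k highest-scoring chunks.
    for chunk in chunks:
        chunk.score = cosine_similarity(query_vector, chunk.vector)
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:top_k]


chunks = [
    Chunk(hash="h1", text="alpha", offset=0, vector=[1.0, 0.0]),
    Chunk(hash="h2", text="beta", offset=1, vector=[0.0, 1.0]),
]
best = rank_chunks([0.9, 0.1], chunks, top_k=1)
print(best[0].text, round(best[0].score, 3))
```

With the query vector `[0.9, 0.1]`, the chunk whose vector points in nearly the same direction ("alpha") ranks first. The actual service may use a different metric or score range.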

The Document model

The Document model represents a Data Source document.

Properties

document_id: string
The document ID as specified at insertion.

created: integer
Epoch in ms at which the document was inserted.

timestamp: integer
User-specified timestamp (epoch in ms) for the document. Can be used to filter documents by timestamp when querying the Data Source. If not specified, defaults to the value of created.

tags: []string
User-specified list of string tags. Can be used to filter results by tag when querying the Data Source. See the data_source block for more details. If not specified, defaults to the empty list.

source_url: string
User-specified URL for the document.

text_size: integer
The size in bytes of the document's text.

chunk_count: integer
The number of chunks that were generated from the document's original text for embedding.

chunks: []Chunk
The document's chunks. When searching, includes only the relevant chunks. When creating a document, includes all the chunks that were generated.

text: string
The document's full text. When searching, only present if full_text is true. Always set when retrieving a document by API.
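The defaults described above (timestamp falling back to created, tags falling back to the empty list) can be sketched as a client-side data class. This is a minimal illustration under stated assumptions: `Document` and `filter_documents` are hypothetical names, only a subset of the properties is mirrored (chunks are omitted), and the filter assumes "match any tag" and "timestamp at or after a cutoff" semantics, which may differ from the actual data_source query filters.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    # Hypothetical client-side mirror of a subset of the Document properties.
    document_id: str
    created: int  # epoch in ms at which the document was inserted
    timestamp: Optional[int] = None  # epoch in ms; defaults to created
    tags: List[str] = field(default_factory=list)  # defaults to the empty list
    source_url: Optional[str] = None
    text: Optional[str] = None

    def __post_init__(self) -> None:
        # Apply the documented default: timestamp falls back to created.
        if self.timestamp is None:
            self.timestamp = self.created


def filter_documents(
    docs: List[Document],
    tags_any: Optional[List[str]] = None,
    since_ms: Optional[int] = None,
) -> List[Document]:
    # Assumed semantics: keep documents sharing at least one tag with tags_any
    # and whose timestamp is at or after since_ms; None disables a filter.
    result = []
    for doc in docs:
        if tags_any is not None and not set(doc.tags) & set(tags_any):
            continue
        if since_ms is not None and doc.timestamp < since_ms:
            continue
        result.append(doc)
    return result


docs = [
    Document(document_id="a", created=1000),
    Document(document_id="b", created=1000, timestamp=5000, tags=["news"]),
]
print([d.document_id for d in filter_documents(docs, tags_any=["news"])])
```

Document "a" inherits its timestamp from created and has no tags, so only "b" survives a tag filter on "news".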