Fix heading hierarchy for Semantic Search pages. (#72)
cjcenizal committed Aug 17, 2023
1 parent 19d1f4d commit 4c10205
Showing 2 changed files with 36 additions and 28 deletions.
21 changes: 12 additions & 9 deletions www/docs/common-use-cases/semantic-search/overview.md

sidebar_label: Overview

import {Config} from '@site/docs/definitions.md';

A common use case in <Config v="names.product"/> is to build a semantic,
LLM-powered search application. This page outlines what <Config v="names.product"/>
can do for this use case as well as why and how to employ these features for the
best overall end-user experience.

## Large Language Models (LLMs)

[LLMs](https://en.wikipedia.org/wiki/Large_language_model) are deep neural nets
built specifically for the task of understanding human language. These
models can be a great asset to many different use cases, including search and
language generation.

These models generally work by reading immense amounts of text to build a model
and then using that model to convert text into vectors, both at index and at
query time. For many use cases, this obviates the need for the language rules of
traditional keyword systems, such as synonym management, stemming, and phrase
parsing, because the LLM can inherently understand what the user is asking.
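
To make the index-time and query-time flow concrete, here is a minimal,
illustrative sketch that uses an open-source embedding model as a stand-in for
the managed LLMs (which you never call directly like this):

```python
# Illustrative sketch only: "all-MiniLM-L6-v2" is an open-source stand-in
# model, not the product's own LLM.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# "Index time": convert documents to vectors once and store them.
docs = [
    "Our return policy allows refunds within 30 days of purchase.",
    "The warehouse ships orders every weekday before noon.",
]
doc_vectors = model.encode(docs, convert_to_tensor=True)

# "Query time": convert the user's question to a vector the same way.
query_vector = model.encode("can I get my money back?", convert_to_tensor=True)

# Rank documents by vector similarity; no synonym lists or stemming rules are
# needed for "money back" to land on the refund-policy document.
scores = util.cos_sim(query_vector, doc_vectors)[0]
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")
```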

The team behind <Config v="names.product"/> has built LLMs that work across a
wide variety of languages and verticals. When you index data into <Config v="names.product"/>
or perform a search, the text is converted to one or more vectors via an LLM,
and those vectors are then used to answer your users' questions.

## Zero-shot models

[Zero-shot](https://en.wikipedia.org/wiki/Zero-shot_learning) models have an
excellent understanding of language in general. They can understand
and respond to the semantic meaning of questions without any additional tuning.
This obviates much of the need for fine-tuning/specialized training on a
particular dataset or in a particular vertical.
that have been developed by the team to allow your end users to query using
the language and verbiage of their choosing and find the right documents,
regardless of the domain your documents are in.

## Hybrid search

While zero-shot LLMs work very well in the vast majority of search use cases,
there are some occasions where they suffer. In particular, many zero-shot LLMs
don't work as well when users perform queries for things that have little
semantic meaning. For example, a UPC code, barcode number, or particular named
configuration setting has little or no semantic meaning, and if you expect your
users to perform this type of search, it's best to look into our
[hybrid search](/docs/api-reference/search-apis/lexical-matching) documentation.
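
A rough way to see the problem: an identifier such as a UPC carries almost no
distributional meaning for an embedding model, but it is trivial to match as an
exact token. The sketch below is purely illustrative (a toy tokenizer and
scorer, not the hybrid search implementation):

```python
# Illustrative only: a toy lexical scorer, not the product's hybrid search.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical_score(query: str, doc: str) -> float:
    """Fraction of query tokens that appear verbatim in the document."""
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0

docs = [
    "SKU 036000291452: sparkling water, 12-pack.",
    "Refreshing carbonated beverage in recyclable cans.",
]

# The UPC has no real semantics for an embedding model to latch onto, but an
# exact token match pins down the right document immediately.
for doc in docs:
    print(f"{lexical_score('UPC 036000291452', doc):.2f}  {doc}")
```
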
43 changes: 24 additions & 19 deletions www/docs/common-use-cases/semantic-search/scores.md

sidebar_label: Relevance Tuning

import {Config} from '@site/docs/definitions.md';

By default, <Config v="names.product"/> uses a form of "question-answer"
similarity to produce the scoring. This provides a very robust ability to
answer your users' questions. Scores range from -1 to 1, where a
score of -1 would be "completely irrelevant" and a score of 1 would be a
near/exact match. There are several controls that affect these scores and
the associated result rankings.
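
As an illustration only, the made-up thresholds below show one way an
application might translate a score in that range into a label; in practice you
would pick cut-offs by inspecting scores on your own data:

```python
# Illustrative only: the cut-offs are arbitrary examples, not recommended values.
def describe(score: float) -> str:
    if not -1.0 <= score <= 1.0:
        raise ValueError("scores are expected to lie in [-1, 1]")
    if score >= 0.8:
        return "near/exact match"
    if score >= 0.4:
        return "likely relevant"
    if score >= 0.0:
        return "weakly related"
    return "probably irrelevant"

for s in (0.92, 0.55, 0.1, -0.4):
    print(f"{s:+.2f} -> {describe(s)}")
```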

## Custom dimensions

[Custom dimensions](custom-dimensions) are a fixed set of additional "dimensions"
that contain user-defined numerical values and are stored in addition to the
vectors that <Config v="names.product"/> automatically extracts and stores from

Custom dimensions are great to hold metadata like "upvotes" of a post, number
of times a product has been purchased, and similar measures of business/relevance
value.
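
As a purely conceptual sketch (the weights and the formula below are
illustrative, not the product's actual scoring), custom dimensions let you
nudge the semantic score with numeric business signals:

```python
# Conceptual sketch only: how numeric "custom dimensions" might be blended
# with a semantic score. The formula and weights are illustrative.
import math

def adjusted_score(semantic_score: float, dims: dict, weights: dict) -> float:
    boost = sum(weights.get(name, 0.0) * value for name, value in dims.items())
    return semantic_score + boost

raw = {"upvotes": 120, "purchases": 35}
# Log-scale the raw counts so a hugely popular item can't drown out relevance.
dims = {name: math.log1p(value) for name, value in raw.items()}

print(adjusted_score(0.62, dims, weights={"upvotes": 0.02, "purchases": 0.05}))
```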

## Hybrid search

By default, <Config v="names.product"/> uses purely semantic similarity when
evaluating whether a document/section is responsive to a particular search.
However, we often find that with a _slight_ introduction of keyword-focused
algorithms, the relevance can be much better. <Config v="names.product"/>
supports this out of the box via [hybrid search](/docs/api-reference/search-apis/lexical-matching).
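
One common way to picture that "slight" keyword influence is a weighted blend
of the two scores. The sketch below is illustrative only; the 0.1 weight is an
arbitrary example, and the real controls live behind the hybrid search API:

```python
# Illustrative only: a toy linear blend, not the product's hybrid scoring.
def hybrid_score(semantic: float, lexical: float, lex_weight: float = 0.1) -> float:
    """Mostly semantic similarity, with a slight lexical nudge."""
    return (1.0 - lex_weight) * semantic + lex_weight * lexical

# Two documents with similar semantic scores; the one that actually contains
# the query's rare keyword wins after the lexical nudge.
print(hybrid_score(semantic=0.70, lexical=1.0))  # contains the exact term
print(hybrid_score(semantic=0.72, lexical=0.0))  # semantically close only
```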

## Alternative similarity measures

While <Config v="names.product"/> uses question-answer style similarity by
default, sometimes it's advantageous to use document-document similarity. For
example, think of a case where a user asks "where can I find great tacos?" You
typically wouldn't want to match the _closest_ document to that question (e.g.
one that just has the text "where can I find great tacos") but instead a document
that _answers_ that question (e.g. "you can find the best tacos at **\_\_\_**").
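
The difference is easy to observe with any symmetric (document-document)
embedding model. The sketch below uses an open-source model purely as an
illustration; it is not the product's question-answer model:

```python
# Illustrative sketch: a symmetric embedding model tends to rank a restatement
# of the question above the document that answers it.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # open-source stand-in

query = "where can I find great tacos?"
docs = [
    "Where can I find great tacos?",                # restates the question
    "You can find the best tacos at La Taqueria.",  # answers the question
]

scores = util.cos_sim(model.encode(query, convert_to_tensor=True),
                      model.encode(docs, convert_to_tensor=True))[0]
for doc, score in zip(docs, scores.tolist()):
    print(f"{score:.3f}  {doc}")
# A question-answer model is trained to prefer the answering document instead.
```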

However, there are times when finding the most semantically similar documents
is advantageous. In particular, [recommendation systems](/docs/common-use-cases/recommendation-systems/recommender-overview)
tend to make heavy use of document similarity metrics. These metrics can also
be useful in other use cases, including [matching questions](/docs/common-use-cases/question-answer/question-answer-overview)
in FAQ search systems.

## Interpreting scores

If you want to understand a bit more about why <Config v="names.product"/>
produced a particular score, have a look at our
[interpreting scores](/docs/api-reference/search-apis/interpreting-responses/intepreting-scores)
documentation.

## Low-level indexing controls

Sometimes, the best way to change relevance is to adjust the low-level
indexing controls. <Config v="names.product"/> supports fine-grained tuning of
this in the [low-level](/docs/api-reference/indexing-apis/core_indexing) API.
There, you can pre-segment your documents into sections and
tell <Config v="names.product"/> what the context is around the documents.
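
For illustration only, a pre-segmented document might be modeled along these
lines; the field names here are hypothetical, so consult the low-level indexing
API reference for the actual schema:

```python
# Hypothetical payload shape for illustration only; see the core indexing API
# reference for the real field names and structure.
pre_segmented_document = {
    "document_id": "returns-faq",
    "title": "Returns and refunds FAQ",
    "context": "Customer support documentation for the online store",
    "sections": [
        {"title": "Eligibility", "text": "Items can be returned within 30 days."},
        {"title": "Process", "text": "Start a return from the Orders page."},
    ],
}

# You decide where each section boundary falls, instead of letting the platform
# segment the document for you.
for section in pre_segmented_document["sections"]:
    print(section["title"], "->", section["text"])
```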

Note that we consider it a bit of a failure on our part if anyone _needs_ to use
this API, since it means we haven't provided robust-enough APIs! If you find that
you need to use this API because you're getting poor quality without it, please do
[let us know](https://discuss.vectara.com) about your use case so we can consider
adding structured APIs around it.
