Commit

Merge branch 'main' into classic-token-filter-docs
Signed-off-by: AntonEliatra <[email protected]>
AntonEliatra authored Oct 3, 2024
2 parents 82a44bd + 2bb90a9 commit 9a65e3a
Showing 234 changed files with 11,561 additions and 1,778 deletions.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -1 +1 @@
* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99 @epugh
* @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @stephen-crawford @epugh
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -2,7 +2,7 @@
_Describe what this change achieves._

### Issues Resolved
_List any issues this PR will resolve, e.g. Closes [...]._
Closes #[_insert issue number_]

### Version
_List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._
7 changes: 6 additions & 1 deletion .github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -77,9 +77,11 @@ Levenshtein
[Mm]ultivalued
[Mm]ultiword
[Nn]amespace
[Oo]versamples?
[Oo]ffline
[Oo]nboarding
[Oo]versamples?
pebibyte
p\d{2}
[Pp]erformant
[Pp]laintext
[Pp]luggable
@@ -101,8 +103,10 @@ pebibyte
[Rr]eenable
[Rr]eindex
[Rr]eingest
[Rr]eprovision(ed|ing)?
[Rr]erank(er|ed|ing)?
[Rr]epo
[Rr]escor(e|ed|ing)?
[Rr]ewriter
[Rr]ollout
[Rr]ollup
@@ -126,6 +130,7 @@ stdout
[Ss]ubvector
[Ss]ubwords?
[Ss]uperset
[Ss]uperadmins?
[Ss]yslog
tebibyte
[Tt]emplated
2 changes: 1 addition & 1 deletion .github/workflows/pr_checklist.yml
@@ -32,7 +32,7 @@ jobs:
const prOwners = ['Naarcha-AWS', 'kolchfa-aws', 'vagimeli', 'natebower'];
if (!prOwners.includes(assignee)) {
assignee = 'hdhalter'
assignee = 'kolchfa-aws'
}
github.rest.issues.addAssignees({
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ Gemfile.lock
*.iml
.jekyll-cache
.project
vendor/bundle
10 changes: 10 additions & 0 deletions CONTRIBUTING.md
@@ -78,6 +78,8 @@ Follow these steps to set up your local copy of the repository:

1. Navigate to your cloned repository.

##### Building using locally installed packages

1. Install [Ruby](https://www.ruby-lang.org/en/) if you don't already have it. We recommend [RVM](https://rvm.io/), but you can use any method you prefer:

```
@@ -98,6 +100,14 @@ Follow these steps to set up your local copy of the repository:
bundle install
```

##### Building using containerization

Assuming you have `docker-compose` installed, run the following command:

```
docker compose -f docker-compose.dev.yml up
```

#### Troubleshooting

Try the following troubleshooting steps if you encounter an error when trying to build the documentation website:
9 changes: 7 additions & 2 deletions MAINTAINERS.md
@@ -6,12 +6,17 @@ This document lists the maintainers in this repo. See [opensearch-project/.githu

| Maintainer | GitHub ID | Affiliation |
| ---------------- | ----------------------------------------------- | ----------- |
| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon |
| Fanit Kolchina | [kolchfa-aws](https://github.com/kolchfa-aws) | Amazon |
| Nate Archer | [Naarcha-AWS](https://github.com/Naarcha-AWS) | Amazon |
| Nathan Bower | [natebower](https://github.com/natebower) | Amazon |
| Melissa Vagi | [vagimeli](https://github.com/vagimeli) | Amazon |
| Miki Barahmand | [AMoo-Miki](https://github.com/AMoo-Miki) | Amazon |
| David Venable | [dlvenable](https://github.com/dlvenable) | Amazon |
| Stephen Crawford | [scraw99](https://github.com/scrawfor99) | Amazon |
| Stephen Crawford | [stephen-crawford](https://github.com/stephen-crawford) | Amazon |
| Eric Pugh | [epugh](https://github.com/epugh) | OpenSource Connections |

## Emeritus

| Maintainer | GitHub ID | Affiliation |
| ---------------- | ----------------------------------------------- | ----------- |
| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon |
1 change: 0 additions & 1 deletion README.md
@@ -21,7 +21,6 @@ The following resources provide important guidance regarding contributions to th

If you encounter problems or have questions when contributing to the documentation, these people can help:

- [hdhalter](https://github.com/hdhalter)
- [kolchfa-aws](https://github.com/kolchfa-aws)
- [Naarcha-AWS](https://github.com/Naarcha-AWS)
- [vagimeli](https://github.com/vagimeli)
52 changes: 30 additions & 22 deletions _about/index.md
@@ -22,16 +22,21 @@ This section contains documentation for OpenSearch and OpenSearch Dashboards.

## Getting started

- [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/intro/)
- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/)
To get started, explore the following documentation:

- [Getting started guide]({{site.url}}{{site.baseurl}}/getting-started/):
- [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/)
- [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/)
- [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/)
- [Ingest data]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/)
- [Search data]({{site.url}}{{site.baseurl}}/getting-started/search-data/)
- [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/)
- [Install OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/)
- [Install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/)
- [See the FAQ](https://opensearch.org/faq)
- [FAQ](https://opensearch.org/faq)

## Why use OpenSearch?

With OpenSearch, you can perform the following use cases:

<table style="table-layout: auto ; width: 100%;">
<tbody>
<tr style="text-align: center; vertical-align:center;">
@@ -41,35 +46,38 @@ With OpenSearch, you can perform the following use cases:
<td><img src="{{site.url}}{{site.baseurl}}/images/4_tracking.png" class="no-border" alt="Operational health tracking" height="100"/></td>
</tr>
<tr style="text-align: left; vertical-align:top; font-weight: bold; color: rgb(0,59,92)">
<td>Fast, Scalable Full-text Search</td>
<td>Application and Infrastructure Monitoring</td>
<td>Security and Event Information Management</td>
<td>Operational Health Tracking</td>
<td>Fast, scalable full-text search</td>
<td>Application and infrastructure monitoring</td>
<td>Security and event information management</td>
<td>Operational health tracking</td>
</tr>
<tr style="text-align: left; vertical-align:top;">
<td>Help users find the right information within your application, website, or data lake catalog. </td>
<td>Easily store and analyze log data, and set automated alerts for underperformance.</td>
<td>Easily store and analyze log data, and set automated alerts for performance issues.</td>
<td>Centralize logs to enable real-time security monitoring and forensic analysis.</td>
<td>Use observability logs, metrics, and traces to monitor your applications and business in real time.</td>
<td>Use observability logs, metrics, and traces to monitor your applications in real time.</td>
</tr>
</tbody>
</table>

**Additional features and plugins:**
## Key features

OpenSearch provides several features to help index, secure, monitor, and analyze your data:

OpenSearch has several features and plugins to help index, secure, monitor, and analyze your data. Most OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface.
- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) - Identify atypical data and receive automatic notifications
- [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) - Find “nearest neighbors” in your vector data
- [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) - Monitor and optimize your cluster
- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) - Use SQL or a piped processing language to query your data
- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) - Automate index operations
- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) - Train and execute machine-learning models
- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) - Run search requests in the background
- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) - Replicate your data across multiple OpenSearch clusters
- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) -- Identify atypical data and receive automatic notifications.
- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) -- Use SQL or a Piped Processing Language (PPL) to query your data.
- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) -- Automate index operations.
- [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/knn/) -- From traditional lexical search to advanced vector and hybrid search, discover the optimal search method for your use case.
- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- Integrate machine learning models into your workloads.
- [Workflow automation]({{site.url}}{{site.baseurl}}/automating-configurations/index/) -- Automate complex OpenSearch setup and preprocessing tasks.
- [Performance evaluation]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) -- Monitor and optimize your cluster.
- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) -- Run search requests in the background.
- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) -- Replicate your data across multiple OpenSearch clusters.


## The secure path forward
OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords.

OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. To get started, see [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/).

## Looking for the Javadoc?

4 changes: 4 additions & 0 deletions _about/version-history.md
@@ -9,6 +9,9 @@ permalink: /version-history/

OpenSearch version | Release highlights | Release date
:--- | :--- | :---
[2.17.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.17.1.md) | Includes bug fixes for ML Commons, anomaly detection, k-NN, and security analytics. Adds various infrastructure and maintenance updates. For a full list of release highlights, see the Release Notes. | 1 October 2024
[2.17.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.17.0.md) | Includes disk-optimized vector search, binary quantization, and byte vector encoding in k-NN. Adds asynchronous batch ingestion for ML tasks. Provides search and query performance enhancements and a new custom trace source in trace analytics. Includes application-based configuration templates. For a full list of release highlights, see the Release Notes. | 17 September 2024
[2.16.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.16.0.md) | Includes built-in byte vector quantization and binary vector support in k-NN. Adds new sort, split, and ML inference search processors for search pipelines. Provides application-based configuration templates and additional plugins to integrate multiple data sources in OpenSearch Dashboards. Includes an experimental Batch Predict ML Commons API. For a full list of release highlights, see the Release Notes. | 06 August 2024
[2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types. Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024
[2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. For a full list of release highlights, see the Release Notes. | 14 May 2024
[2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) | Makes agents and tools and the OpenSearch Assistant Toolkit generally available. Introduces vector quantization within OpenSearch. Adds LLM guardrails and hybrid search with aggregations. Adds the Bloom filter skipping index for Apache Spark data sources, I/O-based admission control, and the ability to add an alerting cluster that manages all alerting tasks. For a full list of release highlights, see the Release Notes. | 2 April 2024
@@ -30,6 +33,7 @@ OpenSearch version | Release highlights | Release date
[2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022
[2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. | 26 May 2022
[2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022
[1.3.19](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.19.md) | Includes bug fixes and maintenance updates for OpenSearch security, OpenSearch security Dashboards, and anomaly detection. | 27 August 2024
[1.3.18](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.18.md) | Includes maintenance updates for OpenSearch security. | 16 July 2024
[1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024
[1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024
24 changes: 23 additions & 1 deletion _analyzers/index.md
@@ -170,6 +170,28 @@ The response provides information about the analyzers for each field:
}
```

## Normalizers
Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization resolves these issues by converting tokens into a standard format. This ensures that similar terms are matched appropriately, even if they are not identical.

### Normalization techniques

The following normalization techniques can help address variations in token forms:
1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello".

2. **Stemming**: Reduces words to their root form. For instance, "cars" is stemmed to "car", and "running" is normalized to "run".

3. **Synonym handling:** Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run".

### Normalization

A search for `Hello` will match documents containing `hello` because of case normalization.

A search for `cars` will also match documents containing `car` because of stemming.

A query for `running` can retrieve documents containing `jogging` using synonym handling.

Normalization ensures that searches are not limited to exact term matches, allowing for more relevant results. For instance, a search for `Cars running` can be normalized to match `car run`.
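
The normalization techniques described in this hunk can be sketched in a few lines of Python. This is an editorial illustration only: it is not part of the commit and does not show how OpenSearch implements analysis. The `SYNONYMS` map, the `toy_stem` suffix rules, and the `normalize` function are hypothetical names chosen for the example, and the stemming rules are deliberately naive.

```python
# Toy normalizer: case normalization, synonym handling, and naive stemming.
# All names here are hypothetical; real analyzers use far more careful rules.

# Assumed synonym map: each key is rewritten to its value before stemming.
SYNONYMS = {"jogging": "running"}


def toy_stem(token: str) -> str:
    """Strip a small set of suffixes. A stand-in for a real stemmer."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # Undo a doubled final consonant: "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def normalize(text: str) -> list[str]:
    """Lowercase, split on whitespace, map synonyms, then stem each token."""
    tokens = text.lower().split()
    return [toy_stem(SYNONYMS.get(token, token)) for token in tokens]


if __name__ == "__main__":
    print(normalize("Cars running"))
    print(normalize("car run"))
    print(normalize("jogging"))
```

Under these assumptions, `Cars running` and `car run` normalize to the same token list, which is why a query for one can match a document containing the other.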

## Next steps

- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
4 changes: 2 additions & 2 deletions _analyzers/token-filters/apostrophe.md
@@ -2,7 +2,7 @@
layout: default
title: Apostrophe
parent: Token filters
nav_order: 110
nav_order: 10
---

# Apostrophe token filter
@@ -22,7 +22,7 @@ PUT /custom_text_index
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard", // splits text into words
"tokenizer": "standard",
"filter": [
"lowercase",
"apostrophe"