Pre-STF 1.5.3 Documentation Walkthrough and Cleanup #517

Merged · 10 commits · Nov 30, 2023
@@ -23,7 +23,7 @@ endif::[]
ifeval::["{SupportedOpenShiftVersion}" != "{NextSupportedOpenShiftVersion}"]
* An {OpenShift} version inclusive of {SupportedOpenShiftVersion} through {NextSupportedOpenShiftVersion} is running.
endif::[]
* You have prepared your {OpenShift} environment and ensured that there is persistent storage and enough resources to run the {ProjectShort} components on top of the {OpenShift} environment. For more information, see https://access.redhat.com/articles/4907241[Service Telemetry Framework Performance and Scaling].
* You have prepared your {OpenShift} environment and ensured that there is persistent storage and enough resources to run the {ProjectShort} components on top of the {OpenShift} environment. For more information about {ProjectShort} performance, see the Red Hat Knowledge Base article https://access.redhat.com/articles/4907241[Service Telemetry Framework Performance and Scaling].
* Your environment is fully connected. {ProjectShort} does not work in {OpenShift}-disconnected environments or network proxy environments.

ifeval::["{build}" == "downstream"]
@@ -40,7 +40,8 @@ endif::[]

* For more information about Operators, see the https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/operators/understanding/olm-what-operators-are.html[_Understanding Operators_] guide.
* For more information about Operator catalogs, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/operators/understanding/olm-rh-catalogs.html[_Red Hat-provided Operator catalogs_].
//* For more information about how to remove {ProjectShort} from the {OpenShift} environment, see xref:assembly-removing-stf-from-the-openshift-environment_{}[].
* For more information about the cert-manager Operator for Red Hat OpenShift, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/security/cert_manager_operator/index.html[_cert-manager Operator for Red Hat OpenShift overview_].
* For more information about {ObservabilityOperator}, see https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/monitoring/cluster_observability_operator/cluster-observability-operator-overview.html[_Cluster Observability Operator overview_].

include::../modules/con_deploying-stf-to-the-openshift-environment.adoc[leveloffset=+1]

@@ -4,6 +4,11 @@
[role="_abstract"]
You can configure multiple {OpenStack} ({OpenStackShort}) clouds to target a single instance of {Project} ({ProjectShort}). When you configure multiple clouds, every cloud must send metrics and events on its own unique message bus topic. In the {ProjectShort} deployment, Smart Gateway instances listen on these topics to save information to the common data store. Data that is stored by the Smart Gateway in the data storage domain is filtered by using the metadata that each of the Smart Gateways creates.

[WARNING]
====
Be sure that every cloud deployment has a unique cloud domain configuration. For more information about configuring the domain for your cloud deployment, see xref:setting-a-unique-cloud-domain_assembly-completing-the-stf-configuration[].
====

[[osp-stf-multiple-clouds]]
.Two {OpenStackShort} clouds connect to {ProjectShort}
image::363_OpenStack_STF_updates_0923_topology_2.png[An example of two {OpenStackShort} clouds connecting to {ProjectShort}]
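For example, the following is a minimal sketch of the `clouds` list in a `ServiceTelemetry` object for two clouds, each with its own subscription addresses. The cloud names and addresses are placeholders only; use the values that match your deployment:

[source,yaml]
----
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  clouds:
  - name: cloud1
    metrics:
      collectors:
      - collectorType: collectd
        subscriptionAddress: collectd/cloud1-telemetry  # unique topic for cloud1
  - name: cloud2
    metrics:
      collectors:
      - collectorType: collectd
        subscriptionAddress: collectd/cloud2-telemetry  # unique topic for cloud2
----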
@@ -2,13 +2,13 @@
= High availability

[role="_abstract"]
With high availability, {Project} ({ProjectShort}) can rapidly recover from failures in its component services. Although {OpenShift} restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of {ProjectShort} components, which reduces recovery time to approximately 2 seconds. To protect against failure of an {OpenShift} node, deploy {ProjectShort} to an {OpenShift} cluster with three or more nodes.

[WARNING]
====
{ProjectShort} high availability (HA) mode is deprecated and is not supported in production environments. {OpenShift} is a highly-available platform, and you can cause issues and complicate debugging in {ProjectShort} if you enable HA mode.
====

With high availability, {Project} ({ProjectShort}) can rapidly recover from failures in its component services. Although {OpenShift} restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of {ProjectShort} components, which reduces recovery time to approximately 2 seconds. To protect against failure of an {OpenShift} node, deploy {ProjectShort} to an {OpenShift} cluster with three or more nodes.

Enabling high availability has the following effects:

* The following components run two pods instead of the default one:
@@ -27,6 +27,9 @@ The following values are available:
| No storage or alerting components are deployed
|===

[NOTE]
====
Newly deployed {ProjectShort} environments as of 1.5.3 default to `use_redhat`. Existing {ProjectShort} deployments created before 1.5.3 default to `use_community`.
====

To migrate an existing {ProjectShort} deployment to `use_redhat`, see https://access.redhat.com/articles/7011708[Migrating STF to fully supported operators].
To migrate an existing {ProjectShort} deployment to `use_redhat`, see the Red Hat Knowledge Base article link:https://access.redhat.com/articles/7011708[Migrating {Project} to fully supported operators].
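As an illustration, the following is a minimal sketch of how the strategy appears in a `ServiceTelemetry` object; the object name and namespace shown are the defaults used in this guide:

[source,yaml]
----
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  # New deployments created with 1.5.3 or later default to use_redhat
  observabilityStrategy: use_redhat
----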
@@ -113,7 +113,7 @@ spec:
Older versions of {ProjectShort} would manage Elasticsearch objects for the community-supported Elastic Cloud on Kubernetes (ECK) Operator. Elasticsearch management functionality is deprecated as of {ProjectShort} 1.5.3. Future versions of the Service Telemetry Operator will continue to support forwarding to an existing Elasticsearch instance (which can be deployed and managed by ECK), but will not manage the creation of Elasticsearch objects. When you upgrade an {ProjectShort} deployment, any existing Elasticsearch object and deployment remain intact, but are no longer managed by {ProjectShort}.

ifeval::["{build}" == "downstream"]
Refer to this article for additional information about https://access.redhat.com/articles/7031236[Using Service Telemetry Framework with Elasticsearch]
For more information about using Elasticsearch with {ProjectShort}, see the Red Hat Knowledge Base article https://access.redhat.com/articles/7031236[Using Service Telemetry Framework with Elasticsearch].
endif::[]

====
@@ -249,6 +249,11 @@ Use the `graphing` parameter to control the creation of a Grafana instance. By d
[discrete]
== The highAvailability parameter

[WARNING]
====
{ProjectShort} high availability (HA) mode is deprecated and is not supported in production environments. {OpenShift} is a highly-available platform, and you can cause issues and complicate debugging in {ProjectShort} if you enable HA mode.
====

Use the `highAvailability` parameter to control the instantiation of multiple copies of {ProjectShort} components to reduce recovery time of components that fail or are rescheduled. By default, `highAvailability` is disabled. For more information, see xref:high-availability_assembly-advanced-features[].
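For illustration only, because HA mode is deprecated, the parameter is a simple toggle in the `ServiceTelemetry` spec. This is a sketch; the default value is shown:

[source,yaml]
----
spec:
  highAvailability:
    enabled: false  # default; setting this to true enables the deprecated HA mode
----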

[id="transports_{context}"]
@@ -11,4 +11,4 @@ The amount of resources that you require to run {Project} ({ProjectShort}) depen

.Additional resources

* For recommendations about sizing for metrics collection, see https://access.redhat.com/articles/4907241[Service Telemetry Framework Performance and Scaling].
* For recommendations about sizing for metrics collection, see the Red Hat Knowledge Base article https://access.redhat.com/articles/4907241[Service Telemetry Framework Performance and Scaling].
24 changes: 15 additions & 9 deletions doc-Service-Telemetry-Framework/modules/con_stf-architecture.adoc
@@ -80,15 +80,6 @@ For client side metrics, collectd provides infrastructure metrics without projec

When you collect and store events, collectd and Ceilometer deliver event data to the server side by using the {MessageBus} transport. Another Smart Gateway forwards the data to a user-provided Elasticsearch datastore.

[NOTE]
====
In previous releases of {ProjectShort}, the Service Telemetry Operator requested instances of Elasticsearch from the Elastic Cloud on Kubernetes (ECK) Operator. {ProjectShort} now uses a forwarding model, where events are forwarded from a Smart Gateway instance to a user-provided instance of Elasticsearch. The management of an Elasticsearch instance by Service Telemetry Operator is deprecated.

In new `ServiceTelemetry`deployments, the `observabilityStrategy` parameter has a value of `use_redhat`, that does not request Elasticsearch instances from ECK. Deployments of `ServiceTelemetry` that are version {ProjectShort} 1.5.3 or older have the `observabilityStrategy` parameter set to `use_community`, which matches the previous architecture. If a user deployed an Elasticsearch instance with {ProjectShort}, the Service Telemetry Operator updates the `ServiceTelemetry` custom resource object to have the `observabilityStrategy` parameter set to `use_community`, and functions similar to previous releases. For more information about observability strategies, see xref:observability-strategy-in-service-telemetry-framework_assembly-preparing-your-ocp-environment-for-stf[].

For more information about migration to the `use_redhat` observability strategy, see link:https://access.redhat.com/articles/7011708[Migrating Service Telemetry Framework to fully supported operators].
====

Server-side {ProjectShort} monitoring infrastructure consists of the following layers:

* {Project} {ProductVersion}
@@ -103,3 +94,18 @@ endif::[]
[[osp-stf-server-side-monitoring]]
.Server-side STF monitoring infrastructure
image::363_OpenStack_STF_updates_0923_deployment_prereq.png[Server-side STF monitoring infrastructure]

== {ProjectShort} architecture changes

In releases of {ProjectShort} prior to 1.5.3, the Service Telemetry Operator requested instances of Elasticsearch from the Elastic Cloud on Kubernetes (ECK) Operator. {ProjectShort} now uses a forwarding model, where events are forwarded from a Smart Gateway instance to a user-provided instance of Elasticsearch.

[NOTE]
====
The management of Elasticsearch instances by the Service Telemetry Operator is deprecated.
====

In new `ServiceTelemetry` deployments, the `observabilityStrategy` parameter has a value of `use_redhat`, which does not request Elasticsearch instances from ECK. Deployments of `ServiceTelemetry` that were created with {ProjectShort} version 1.5.2 or older and updated to 1.5.3 have the `observabilityStrategy` parameter set to `use_community`, which matches the previous architecture.

If a user previously deployed an Elasticsearch instance with {ProjectShort}, the Service Telemetry Operator updates the `ServiceTelemetry` custom resource object to set the `observabilityStrategy` parameter to `use_community`, and {ProjectShort} functions similarly to previous releases. For more information about observability strategies, see xref:observability-strategy-in-service-telemetry-framework_assembly-preparing-your-ocp-environment-for-stf[].
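To check which strategy an existing deployment uses, you can inspect the `ServiceTelemetry` object. The following is a sketch that assumes the default object name `default` and the `service-telemetry` namespace:

[source,bash,options="nowrap"]
----
oc get stf default --namespace service-telemetry --output jsonpath='{.spec.observabilityStrategy}'
----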

It is recommended that users of {ProjectShort} migrate to the `use_redhat` observability strategy. For more information about migration to the `use_redhat` observability strategy, see the Red Hat Knowledge Base article link:https://access.redhat.com/articles/7011708[Migrating Service Telemetry Framework to fully supported operators].
@@ -48,4 +48,5 @@ smart-gateway-operator-58d77dcf7-6xsq7 1/1 Running 0

.Additional resources

For more information about configuring additional clouds or to change the set of supported collectors, see xref:deploying-smart-gateways_assembly-completing-the-stf-configuration[]
* For more information about configuring additional clouds or to change the set of supported collectors, see xref:deploying-smart-gateways_assembly-completing-the-stf-configuration[].
* To migrate an existing {ProjectShort} deployment to `use_redhat`, see the Red Hat Knowledge Base article link:https://access.redhat.com/articles/7011708[Migrating {Project} to fully supported operators].
@@ -146,10 +146,6 @@ If you use the `collectd-write-qdr.yaml` file with a custom `CollectdAmqpInstanc

. Deploy the {OpenStack} overcloud.

ifdef::include_when_17[]
include::con_ansible-based-deployment.adoc[leveloffset=+1]
Contributor:

I just reviewed the content of this. It's been a while since I thought about this work, but I guess the TLDR on the whole thing is that ansible-based tripleo deployments were tech-preview and are never going to be anything more, so we're removing the mention of it?

Member (Author):

I'm glad you noticed this, because I kinda felt bad about removing it.

I saw it and it felt a bit awkward removing it, but since it won't ever move out of tech-preview, and we don't really do any significant testing with that deployment method + STF, I thought it might be best to just avoid having customers try and use it, then report issues, and cause us some undue support issues.

I can certainly put it back in, but I feel like we might not want to encourage the use in 17.1.

endif::include_when_17[]

.Additional resources

* For information about how to validate the deployment, see xref:validating-clientside-installation_assembly-completing-the-stf-configuration[].
@@ -79,8 +79,7 @@ EOF
+
[source,bash,options="nowrap",role="white-space-pre"]
----
oc get csv --namespace cert-manager-operator --selector=operators.coreos.com/openshift-cert-manager-operator.cert-manager-operator
oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=cert-manager-operator --selector=operators.coreos.com/openshift-cert-manager-operator.cert-manager-operator

NAME DISPLAY VERSION REPLACES PHASE
cert-manager-operator.v1.12.1 cert-manager Operator for Red Hat OpenShift 1.12.1 cert-manager-operator.v1.12.0 Succeeded
clusterserviceversion.operators.coreos.com/cert-manager-operator.v1.12.1 condition met
----
@@ -5,7 +5,7 @@

[role="_abstract"]
// https://access.redhat.com/articles/7011708 covers migration to COO from community-operators Prometheus Operator.
The Cluster Observability Operator (COO) must be pre-installed before creating an instance of Service Telemetry Framework (STF) if the `observabilityStrategy` is set to `use_redhat` and the `backends.metrics.prometheus.enabled` is set to `true` in the `ServiceTelemetry` object. For more information about COO, see link:https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/monitoring/cluster_observability_operator/cluster-observability-operator-overview.html[Cluster Observability Operator overview].
The Cluster Observability Operator (COO) must be pre-installed before creating an instance of Service Telemetry Framework (STF) if the `observabilityStrategy` is set to `use_redhat` and the `backends.metrics.prometheus.enabled` is set to `true` in the `ServiceTelemetry` object. For more information about COO, see link:https://docs.openshift.com/container-platform/{NextSupportedOpenShiftVersion}/monitoring/cluster_observability_operator/cluster-observability-operator-overview.html[Cluster Observability Operator overview] in the _OpenShift Container Platform Documentation_.
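If COO is already installed, one way to confirm that it is ready is to wait for its ClusterServiceVersion to report the `Succeeded` phase, similar to the cert-manager verification earlier in this guide. This is a sketch only; the namespace and label selector are assumptions that depend on how the Operator was subscribed:

[source,bash,options="nowrap"]
----
oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=openshift-operators --selector=operators.coreos.com/cluster-observability-operator.openshift-operators
----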

.Procedure

@@ -18,7 +18,13 @@ grafanadashboard.integreatly.org/rhos-dashboard-1 created
. Import the cloud dashboard:
+
[WARNING]
In the `enable-stf.yaml` file, ensure you set the value of the collectd `virt` plugin parameter `hostname_format` to `name uuid hostname`, otherwise some of the panels on the cloud dashboard display no information. For more information about the `virt` plugin, see link:{defaultURL}/operational_measurements/collectd-plugins_assembly[collectd plugins].
In the `enable-stf.yaml` file, ensure that you set the value of the collectd `virt` plugin parameter `hostname_format` to `name uuid hostname`. Otherwise, some of the panels on the cloud dashboard display no information.
ifdef::include_before_17[]
For more information about the `virt` plugin, see link:{defaultURL}/operational_measurements/collectd-plugins_assembly[collectd plugins].
endif::include_before_17[]
ifdef::include_when_17[]
For more information about the `virt` plugin, see link:{defaultURL}/managing_overcloud_observability/collectd-plugins_assembly[collectd plugins].
endif::include_when_17[]
+
[source,bash,options="nowrap"]
----
@@ -2,7 +2,7 @@
= Setting a unique cloud domain

[role="_abstract"]
To ensure that {MessageBus} router connections from {OpenStack} ({OpenStackShort}) to {Project} ({ProjectShort}) are unique and do not conflict, configure the `CloudDomain` parameter.
To ensure that telemetry sent from different {OpenStack} ({OpenStackShort}) clouds to {Project} ({ProjectShort}) can be uniquely identified and does not conflict, configure the `CloudDomain` parameter.

WARNING: Ensure that you do not change host or domain names in an existing deployment. Host and domain name configuration is supported in new cloud deployments only.
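For example, the following is a minimal sketch of a heat environment file that sets a unique domain for one cloud; the domain value is a placeholder:

[source,yaml]
----
parameter_defaults:
  # Each cloud that connects to STF must use a different value
  CloudDomain: cloud1.localdomain
----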
