
Istio standard metrics do not increase after 1 hour or so even though real-time traffic is flowing #51100

Open
PGpmg opened this issue May 13, 2024 · 10 comments · May be fixed by istio/proxy#5592

Comments

@PGpmg

PGpmg commented May 13, 2024

Hi,

We recently upgraded Istio from 1.16.4 to 1.19.1 and started noticing an issue where Istio standard metrics such as the istio_tcp_received_bytes_total and istio_tcp_sent_bytes_total counters stop increasing after an hour or so of deployment.

Once we restart the application K8s pods, or update the Istio configmap by removing and re-adding defaultProviders:metrics:prometheus, the Istio standard metrics start working again, but they stop after some time.
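
For context, the defaultProviders setting mentioned above lives in mesh config. A minimal sketch of re-applying it, assuming an IstioOperator-based install (adjust for other install methods; the file name is hypothetical):

# re-applies the default Prometheus metrics provider in mesh config
cat <<'EOF' > mesh-metrics.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultProviders:
      metrics:
      - prometheus
EOF
istioctl install -f mesh-metrics.yaml -y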

Current Istio version: 1.19.1
K8s version: 1.26

We have noticed the same issue with Istio 1.21.2 version in our environment.

Any leads to resolve this issue would be highly appreciated.

@SanjayaKumarSahoo

Requesting help on this, since the issue happens in a sporadic manner.

@kyessenov
Contributor

CC @zirain. This sounds like it is related to metric rotation? We need Prometheus to scrape often enough.

@zirain
Member

zirain commented May 15, 2024

Metric rotation is disabled by default.

@SanjayaKumarSahoo

SanjayaKumarSahoo commented May 16, 2024

Thanks for the input,

We tried setting the env variable METRIC_ROTATION_INTERVAL to 10s in the pilot config. Initially the metrics started flowing in, but then we observed that they stopped coming again, so we had to remove the env variable.
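
(For reference, a minimal sketch of one way to push this interval to the sidecars, via meshConfig.defaultConfig.proxyMetadata; the variable name is taken from the thread, but this placement is an assumption, since the comment above sets it in the pilot config:)

# assumes an IstioOperator-based install; file name is hypothetical
cat <<'EOF' > metric-rotation.yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      proxyMetadata:
        METRIC_ROTATION_INTERVAL: "10s"
EOF
istioctl install -f metric-rotation.yaml -y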

When we port-forward to the pod's Envoy metrics endpoint ("http://{host}:15090/stats/prometheus"), we observe that the TCP sent/received bytes counters are not increasing, even though data processing is happening.
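
A hedged example of that check (pod and namespace are placeholders):

kubectl -n <namespace> port-forward pod/<pod-name> 15090:15090 &
# the counters below should move while traffic is flowing
curl -s http://localhost:15090/stats/prometheus | grep 'istio_tcp_.*_bytes_total'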

Requesting help on this.


@zirain
Member

zirain commented May 16, 2024

istio/api#3121 (comment)

@zirain zirain pinned this issue May 16, 2024
@zirain zirain unpinned this issue May 16, 2024
@zirain zirain transferred this issue from istio/istio.io May 16, 2024
@PGpmg
Author

PGpmg commented May 27, 2024

While investigating this issue in detail, we discovered a behavioural change between the 1.18.7 and 1.19.0 versions. We tested with a sample application which writes data to the DB continuously in a loop.

Observations with 1.18.7 and below: as the application writes data to the DB, we can see the Istio TCP standard metrics get populated immediately and keep increasing.

PG@C02G40YWMD6M istio-1.18.7 % bin/istioctl x es pvos-switch-state-publisher-c4f47bf8d-6r7qn.acp-system -oprom | grep _bytes_total
TYPE envoy_cluster_upstream_cx_rx_bytes_total counter
envoy_cluster_upstream_cx_rx_bytes_total{cluster_name="xds-grpc"} 102980
TYPE envoy_cluster_upstream_cx_tx_bytes_total counter
envoy_cluster_upstream_cx_tx_bytes_total{cluster_name="xds-grpc"} 38003
TYPE istio_tcp_received_bytes_total counter
istio_tcp_received_bytes_total{reporter="source",source_workload="pvos-switch-state-publisher",source_canonical_service="pvos-switch-state-publisher",source_canonical_revision="latest",source_workload_namespace="acp-system",source_principal="unknown",source_app="pvos-switch-state-publisher",source_version="",source_cluster="Kubernetes",destination_workload="cnx-arango-cluster-crdn-4e7dkxzr-53ae10",destination_workload_namespace="default",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="cnx-arango-cluster.default.svc.cluster.local",destination_canonical_service="arangodb",destination_canonical_revision="latest",destination_service_name="cnx-arango-cluster",destination_service_namespace="default",destination_cluster="Kubernetes",request_protocol="tcp",response_flags="-",connection_security_policy="unknown"} 81077
TYPE istio_tcp_sent_bytes_total counter
istio_tcp_sent_bytes_total{reporter="source",source_workload="pvos-switch-state-publisher",source_canonical_service="pvos-switch-state-publisher",source_canonical_revision="latest",source_workload_namespace="acp-system",source_principal="unknown",source_app="pvos-switch-state-publisher",source_version="",source_cluster="Kubernetes",destination_workload="cnx-arango-cluster-crdn-4e7dkxzr-53ae10",destination_workload_namespace="default",destination_principal="unknown",destination_app="unknown",destination_version="unknown",destination_service="cnx-arango-cluster.default.svc.cluster.local",destination_canonical_service="arangodb",destination_canonical_revision="latest",destination_service_name="cnx-arango-cluster",destination_service_namespace="default",destination_cluster="Kubernetes",request_protocol="tcp",response_flags="-",connection_security_policy="unknown"} 110884

Observations with 1.19.0 and above, up to 1.22.0: as the application writes data to the DB, we see the Istio TCP standard metrics get populated only after the connection is added to the cleanup list (in the istio-proxy debug logs we can see the message "adding to cleanup list"). Until the connection is terminated, we don't see the Istio standard metrics.

PG@C02G40YWMD6M istio-1.19.0 % bin/istioctl x es pvos-switch-state-publisher-c4f47bf8d-l72wr.acp-system -oprom | grep _bytes_total
TYPE envoy_cluster_upstream_cx_rx_bytes_total counter
envoy_cluster_upstream_cx_rx_bytes_total{cluster_name="xds-grpc"} 167743
TYPE envoy_cluster_upstream_cx_tx_bytes_total counter
envoy_cluster_upstream_cx_tx_bytes_total{cluster_name="xds-grpc"} 41407
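
(For anyone reproducing this, the "adding to cleanup list" message mentioned above is a debug-level log; one way to surface it, assuming istioctl is available against the cluster, is to raise the sidecar's log level:)

istioctl proxy-config log <pod-name>.<namespace> --level debug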

NOTE: up to Istio 1.18.7, stats are captured properly in our clusters; there are no issues with metrics not being reported. But from 1.19.0 onwards, Istio stops reporting after an hour or so.

Could you please help us understand this behavioural change in detail? Could it be a concern for getting proper stats?

@SanjayaKumarSahoo

SanjayaKumarSahoo commented May 27, 2024

Hi @kyessenov, in 1.19 we can see there is a PR (istio/proxy#4887) on metadata exchange. Do you think it could have introduced the above behavior? Requesting your input on this.

@PGpmg
Author

PGpmg commented May 31, 2024

Found an issue in the istio proxy code. Please check.

Issue:
When updating the peerId in the metadata-not-found case, the key used is "wasm.envoy.wasm.metadata_exchange.peer_unknown" (the "wasm." prefix is added later, in the updatePeerId() function). But when fetching the peerInfo, the key "envoy.wasm.metadata_exchange.peer_unknown" is used, without the "wasm." prefix.

Updating peer id:
https://github.com/istio/proxy/blob/1.19.0/source/extensions/filters/network/metadata_exchange/metadata_exchange.cc#L314

Fetching the peerInfo:
https://github.com/istio/proxy/blob/1.19.0/source/extensions/filters/http/istio_stats/istio_stats.cc#L97

The "envoy.wasm.metadata_exchange.peer_unknown" change was added in 1.19.0; before that, the keys were either downstream_peer_id or upstream_peer_id. Hence, stats were reported up to 1.18.7 and stopped from 1.19.0 onwards.

As @SanjayaKumarSahoo pointed out, this got introduced as part of PR istio/proxy#4887.
@kyessenov, could you please help fix this issue ASAP?

@PGpmg
Author

PGpmg commented Jun 6, 2024

Any update on this issue, please?

@zirain
Member

zirain commented Jun 6, 2024

Kuat is out; I will take a look if I have bandwidth. I think it happens after 1h because the idle timeout defaults to 1h.
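
(One way to test whether the 1h idle timeout is the relevant variable might be to shorten the TCP idle timeout for the affected destination and check whether the point at which the counters change moves accordingly. A minimal sketch, assuming your Istio release supports connectionPool.tcp.idleTimeout in DestinationRule; check the reference docs for your version, and replace the placeholder host, name and namespace:)

kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: tcp-idle-timeout-test   # hypothetical name
  namespace: <namespace>
spec:
  host: <destination-service>.<namespace>.svc.cluster.local
  trafficPolicy:
    connectionPool:
      tcp:
        idleTimeout: 300s
EOF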

@zirain zirain linked a pull request Jun 6, 2024 that will close this issue