
KEDA Unable to Retrieve correct Kafka Metrics from ScaledObject on GKE #5730

Open
converge opened this issue Apr 23, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@converge

KEDA is unable to retrieve metrics correctly from a ScaledObject/ScaleTarget using a Kafka trigger when deployed to a GKE cluster (it works locally).

Expected Behavior

When the HPA calculates the current metric value, it should return a valid Kafka lag rather than -9223372036854775808m.

Actual Behavior

When the Kafka ScaledObject is deployed to GKE:

  • The HorizontalPodAutoscaler reports a negative metric value (-9223372036854775808m)

Steps to Reproduce the Problem

  1. Deploy KEDA in a GKE cluster
  2. Create a ScaledObject with a Kafka source as the scale target
  3. Run kubectl get hpa -A (it will show the metric with the negative value)

Logs from KEDA operator

There are no errors or warnings in the KEDA operator logs.

KEDA Version

2.13.1

Kubernetes Version

1.27

Platform

Google Cloud

Scaler Details

Kafka

Anything else?

No response

@converge added the bug label on Apr 23, 2024
@SpiritZhou
Contributor

Could you change the log level to debug in the operator and send the operator logs?

@dttung2905
Contributor

In addition to the logs, could you provide the ScaledObject YAML config too?

@converge
Author

converge commented Jun 5, 2024

Sorry for the delay. I still have this issue and created a new GCP/GKE cluster specifically to debug it. I was able to reproduce the issue and captured the logs from the exact moment the current metric value in the HPA switched from <unknown> to -9223372036854775808m.

The controller logs when this switch happened:

2024-06-05T19:43:42Z	DEBUG	kafka_scaler	Kafka scaler: Providing metrics based on totalLag 1, topicPartitions 1, threshold 60	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:43:42Z	DEBUG	scale_handler	Getting metrics and activity from scaler	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "scaler": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics", "metrics": [{"metricName":"s0-kafka-knative-group-topics","metricLabels":null,"timestamp":"2024-06-05T19:43:42Z","value":"1"}], "activity": true, "scalerError": null}

2024-06-05T19:43:42Z	DEBUG	scale_handler	Scaler for scaledObject is active	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "scaler": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics"}
2024-06-05T19:43:42Z	INFO	scaleexecutor	Successfully updated ScaleTarget	{"scaledobject.Name": "kafka-scaledobject", "scaledObject.Namespace": "app", "scaleTarget.Name": "kafka-source", "Original Replicas Count": 0, "New Replicas Count": 1}

2024-06-05T19:43:42Z	DEBUG	events	Scaled sources.knative.dev/v1beta1.kafkasource app/kafka-source from 0 to 1	{"type": "Normal", "object": {"kind":"ScaledObject","namespace":"app","name":"kafka-scaledobject","uid":"01537a1d-1543-48ea-adce-e7c3069fff86","apiVersion":"keda.sh/v1alpha1","resourceVersion":"314645"}, "reason": "KEDAScaleTargetActivated"}
2024-06-05T19:44:12Z	DEBUG	kafka_scaler	with topic name [knative-demo-topic] the list of topic metadata is [0xc000a1b400]	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:44:12Z	DEBUG	kafka_scaler	Kafka scaler: Providing metrics based on totalLag 1, topicPartitions 1, threshold 60	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:44:12Z	DEBUG	scale_handler	Getting metrics and activity from scaler	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "scaler": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics", "metrics": [{"metricName":"s0-kafka-knative-group-topics","metricLabels":null,"timestamp":"2024-06-05T19:44:12Z","value":"1"}], "activity": true, "scalerError": null}
2024-06-05T19:44:12Z	DEBUG	scale_handler	Scaler for scaledObject is active	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "scaler": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics"}
2024-06-05T19:44:42Z	DEBUG	kafka_scaler	with topic name [knative-demo-topic] the list of topic metadata is [0xc000ae9400]	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:44:42Z	DEBUG	kafka_scaler	Kafka scaler: Providing metrics based on totalLag 1, topicPartitions 1, threshold 60	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:44:42Z	DEBUG	scale_handler	Getting metrics and activity from scaler	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "scaler": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics", "metrics": [{"metricName":"s0-kafka-knative-group-topics","metricLabels":null,"timestamp":"2024-06-05T19:44:42Z","value":"1"}], "activity": true, "scalerError": null}
2024-06-05T19:44:42Z	DEBUG	scale_handler	Scaler for scaledObject is active	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "scaler": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics"}
2024-06-05T19:44:52Z	DEBUG	kafka_scaler	with topic name [knative-demo-topic] the list of topic metadata is [0xc000460050]	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:44:52Z	DEBUG	kafka_scaler	Kafka scaler: Providing metrics based on totalLag 1, topicPartitions 1, threshold 60	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}

2024-06-05T19:44:52Z	DEBUG	scale_handler	Getting metrics from trigger	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "trigger": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics", "metrics": [{"metricName":"s0-kafka-knative-group-topics","metricLabels":null,"timestamp":"2024-06-05T19:44:52Z","value":"1"}], "scalerError": null}
2024-06-05T19:44:52Z	DEBUG	grpc_server	Providing metrics	{"scaledObjectName": "kafka-scaledobject", "scaledObjectNamespace": "app", "metrics": "&ExternalMetricValueList{ListMeta:{   <nil>},Items:[]ExternalMetricValue{ExternalMetricValue{MetricName:s0-kafka-knative-group-topics,MetricLabels:map[string]string{},Timestamp:2024-06-05 19:44:52.805581005 +0000 UTC m=+3545.120570672,WindowSeconds:nil,Value:{{1000 -3} {<nil>}  DecimalSI},},},}"}
2024-06-05T19:45:08Z	DEBUG	kafka_scaler	with topic name [knative-demo-topic] the list of topic metadata is [0xc000550c30]	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:45:08Z	DEBUG	kafka_scaler	Kafka scaler: Providing metrics based on totalLag 1, topicPartitions 1, threshold 60	{"type": "ScaledObject", "namespace": "app", "name": "kafka-scaledobject"}
2024-06-05T19:45:08Z	DEBUG	scale_handler	Getting metrics from trigger	{"scaledObject.Namespace": "app", "scaledObject.Name": "kafka-scaledobject", "trigger": "kafkaScaler", "metricName": "s0-kafka-knative-group-topics", "metrics": [{"metricName":"s0-kafka-knative-group-topics","metricLabels":null,"timestamp":"2024-06-05T19:45:08Z","value":"1"}], "scalerError": null}
2024-06-05T19:45:08Z	DEBUG	grpc_server	Providing metrics	{"scaledObjectName": "kafka-scaledobject", "scaledObjectNamespace": "app", "metrics": "&ExternalMetricValueList{ListMeta:{   <nil>},Items:[]ExternalMetricValue{ExternalMetricValue{MetricName:s0-kafka-knative-group-topics,MetricLabels:map[string]string{},Timestamp:2024-06-05 19:45:08.16711765 +0000 UTC m=+3560.482107310,WindowSeconds:nil,Value:{{1000 -3} {<nil>}  DecimalSI},},},}"}
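A side note on reading these grpc_server entries: the Value:{{1000 -3} {<nil>}  DecimalSI} fragment is the internal dump of a Kubernetes resource.Quantity, which stores an int64 value plus a base-10 scale, so {1000 -3} means 1000 × 10⁻³ = 1. In other words, the operator side appears to be serving the correct lag of 1; the corruption happens later. A sketch of that decoding in plain Go (without the apimachinery dependency, so quantityValue is an illustrative helper, not a real API):

```go
package main

import "fmt"

// quantityValue decodes the (value, scale) pair that resource.Quantity
// logs as "{1000 -3}": the numeric value is value * 10^scale.
func quantityValue(value int64, scale int) float64 {
	result := float64(value)
	for scale < 0 {
		result /= 10
		scale++
	}
	for scale > 0 {
		result *= 10
		scale--
	}
	return result
}

func main() {
	fmt.Println(quantityValue(1000, -3)) // 1
}
```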

The ScaledObject:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: app
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      name: notify-friends
  scaleTargetRef:
    apiVersion: sources.knative.dev/v1beta1
    name: kafka-source
    kind: kafkasource
  pollingInterval: 30
  minReplicaCount: 1
  maxReplicaCount: 11
  idleReplicaCount: 0
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: pkc-<...>.gcp.confluent.cloud:9092
        consumerGroup: knative-group
        topic:
        lagThreshold: '60'
        offsetResetPolicy: earliest
        scaleToZeroOnInvalidOffset: "false"
        tls: enable
        sasl: plaintext
      authenticationRef:
        name: keda-kafka-credentials

@converge
Author

converge commented Jun 5, 2024

Found people with a similar issue: https://kubernetes.slack.com/archives/CKZJ36A5D/p1709761505122509

@converge
Author

converge commented Jun 7, 2024

@SpiritZhou @dttung2905 Yesterday I created an EKS cluster with the same setup/versions as on GCP, and it works perfectly. Can you think of anything that could be different between GCP and AWS? Any kind of blocker or anything else that could be causing the issue?

I have also followed the k8s events and couldn't find anything bad in GCP. I wanted to look at the horizontal pod autoscaler logs, but couldn't find them.

3 participants