If you're trying to connect to your SageMaker HyperPod cluster and you see the error "An error occurred (TargetNotConnected)", there are a couple of common causes:
An error occurred (TargetNotConnected) when calling the StartSession operation: sagemaker-cluster:..._controller-machine-i-... is not connected.
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535
To troubleshoot, check a few things:
Check that your AWS credentials are configured for the right account:
aws sts get-caller-identity --query Account --output text
Check that the region is correct:
aws configure get region
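The two checks above can be wrapped in a small helper that compares each value against what you expect. This is only a sketch: the function name check_setting and the expected account and region values are hypothetical placeholders, not something HyperPod or the AWS CLI provides.

```shell
# Hypothetical expected values (assumptions); replace with your own.
EXPECTED_ACCOUNT="111122223333"
EXPECTED_REGION="us-west-2"

# Compare one setting against its expected value and report the result.
# $1 = label, $2 = expected value, $3 = actual value
check_setting() {
  if [ "$2" = "$3" ]; then
    echo "OK: $1 is $3"
  else
    echo "MISMATCH: $1 is '$3', expected '$2'"
    return 1
  fi
}

# Usage (requires the AWS CLI and valid credentials):
#   check_setting account "$EXPECTED_ACCOUNT" \
#     "$(aws sts get-caller-identity --query Account --output text)"
#   check_setting region "$EXPECTED_REGION" "$(aws configure get region)"
```

Note that aws configure get region reads the configured profile, so it may print nothing if the region is only set through an environment variable.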
If those don't help, try to SSM into a compute node instead. You'll need the cluster-id, worker-group name, and instance-id, which you can get from the aws sagemaker list-cluster-nodes --cluster-name <cluster-name> CLI call.
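As a rough sketch, the three values can be joined into an SSM target string. The layout used here (sagemaker-cluster:, then the cluster-id, an underscore, the worker-group name, a hyphen, and the instance-id) is inferred from the target shown in the error message above, and the helper name hyperpod_ssm_target is hypothetical.

```shell
# Build the SSM target string for a HyperPod node.
# $1 = cluster-id, $2 = worker-group name, $3 = instance-id
# The layout is an assumption inferred from the TargetNotConnected error text.
hyperpod_ssm_target() {
  printf 'sagemaker-cluster:%s_%s-%s\n' "$1" "$2" "$3"
}

# Usage (requires the AWS CLI and the Session Manager plugin):
#   aws sagemaker list-cluster-nodes --cluster-name <cluster-name>
#   aws ssm start-session \
#     --target "$(hyperpod_ssm_target <cluster-id> <worker-group> <instance-id>)"
```

Keeping the string construction in one function makes it easier to spot a wrong worker-group name or instance-id before starting the session.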
Once you're there, you can get the IP address of the controller node by running:
That'll show:
Use the CustomerIpAddress (10.1.39.83 in this example) to SSH into the head node from that compute node.