Synthetics private minion cannot be launched in EKS

I tried to update the Synthetics minion for our private location using Helm (version 1.0.34), and the StatefulSet is going into CrashLoopBackOff. The error I see on the pod is below; the minion expects exactly one volume to be bound:

2021-07-26 17:58:07,308 - Minion will use 2 heavy workers
2021-07-26 17:58:07,315 - Minion will use 50 lightweight workers
2021-07-26 17:58:10,519 - Minion Container System: KUBERNETES
2021-07-26 17:58:12,889 - Minion deployment mode: PRIVATE_MINION_POD_KUBERNETES
2021-07-26 17:58:12,983 - One and only one volume is expected to be bound to the minion appdev-nr-private-cpm-synthetics-minion-0 - volumes found: [Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=minion-volume, nfs=null, persistentVolumeClaim=PersistentVolumeClaimVolumeSource(claimName=minion-volume-appdev-nr-private-cpm-synthetics-minion-0, readOnly=null, additionalProperties={}), photonPersistentDisk=null, portworxVolume=null, projected=null, quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={}), Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=kube-api-access-vd6k8, nfs=null, persistentVolumeClaim=null, photonPersistentDisk=null, portworxVolume=null, projected=ProjectedVolumeSource(defaultMode=420, sources=[VolumeProjection(configMap=null, downwardAPI=null, secret=null, serviceAccountToken=ServiceAccountTokenProjection(audience=null, expirationSeconds=3607, path=token, additionalProperties={}), additionalProperties={}), VolumeProjection(configMap=ConfigMapProjection(items=[KeyToPath(key=ca.crt, mode=null, path=ca.crt, additionalProperties={})], name=kube-root-ca.crt, optional=null, additionalProperties={}), downwardAPI=null, secret=null, serviceAccountToken=null, additionalProperties={}), VolumeProjection(configMap=null, downwardAPI=DownwardAPIProjection(items=[DownwardAPIVolumeFile(fieldRef=ObjectFieldSelector(apiVersion=v1, fieldPath=metadata.namespace, additionalProperties={}), mode=null, path=namespace, resourceFieldRef=null, additionalProperties={})], additionalProperties={}), secret=null, serviceAccountToken=null, additionalProperties={})], additionalProperties={}), quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={})]

I started having this problem after I upgraded my cluster to EKS 1.21.1. The volume attach is successful:
AttachVolume.Attach succeeded for volume "pvc-081647ec-4830-40a2-b5c3-d7c886d8715e". The configuration looks good, since I can see the volume attached to the node as well as to the pod.
But the minion still does not see the volume when it starts up.
Exec'ing into the pod, the volume is in fact mounted, just as the Pod spec says:

groups: cannot find name for group ID 3729
I have no name!@appdev-nr-private-cpm-synthetics-minion-0:/opt/newrelic/synthetics$ df -h
Filesystem Size Used Avail Use% Mounted on
overlay 20G 4.6G 16G 24% /
tmpfs 64M 0 64M 0% /dev
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/nvme1n1 9.8G 39M 9.7G 1% /tmp
/dev/nvme0n1p1 20G 4.6G 16G 24% /etc/hosts
shm 64M 0 64M 0% /dev/shm
tmpfs 3.9G 12K 3.9G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 3.9G 0 3.9G 0% /proc/acpi
tmpfs 3.9G 0 3.9G 0% /sys/firmware
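
For completeness, I also listed the volumes from the pod spec; it shows the PVC-backed minion-volume plus the auto-injected service account token volume that the error complains about (a quick check; pod and namespace names are from my deployment):

kubectl get pod appdev-nr-private-cpm-synthetics-minion-0 -n appdev \
  -o jsonpath='{range .spec.volumes[*]}{.name}{"\n"}{end}'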

Logs with debug level -

2021-07-26 19:14:10,905 [main] c.n.s.minion.MinionApplication DEBUG Private Location mode activated
2021-07-26 19:14:10,969 [main] c.n.s.c.configs.PopulateFromEnv INFO Minion will use 2 heavy workers
2021-07-26 19:14:10,970 [main] c.n.s.c.configs.PopulateFromEnv INFO Minion will use 50 lightweight workers
2021-07-26 19:14:11,676 [main] c.n.s.minion.MinionApplication DEBUG Configuration not yet available, so can't work out the Location
2021-07-26 19:14:11,886 [main] c.n.s.m.c.PrivateMinionLaunchCommandBase DEBUG Network Healthcheck is enabled, Private Minion Network Healthcheck will be checked.
2021-07-26 19:14:12,383 [main] c.n.s.c.c.ValidateConfigCommand DEBUG Passed configuration validation check
2021-07-26 19:14:13,992 [main] c.n.s.m.g.ContainerSystemDriverModule INFO Minion Container System: KUBERNETES
2021-07-26 19:14:14,505 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Kubernetes Version : 1.21+ (v1.21.2-13+d2965f0db10712) - linux/amd64
2021-07-26 19:14:14,568 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Kubernetes API Version : v1
2021-07-26 19:14:14,568 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Kubernetes Master URL : https://172.20.0.1:443/
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG ################################### MINION POD INFO ##############################################
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Name : appdev-nr-private-cpm-synthetics-minion-0
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Scheduled Node : ip-xxxxxxxxx.ec2.internal
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Available CPUs : 1
2021-07-26 19:14:15,184 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Volumes : [minion-volume, kube-api-access-clwpv]
2021-07-26 19:14:15,185 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Containers Info : [{
Container Name : synthetics-minion
Image : quay.io/newrelic/synthetics-minion:3.0.48
Volume Mounts : [
VolumeName : 'minion-volume'
subPath : 'appdev-nr-private-cpm-synthetics-minion/tmp'
MountPath : '/tmp',

VolumeName : 'kube-api-access-clwpv'
subPath : 'null'
MountPath : '/var/run/secrets/kubernetes.io/serviceaccount']
}]
2021-07-26 19:14:15,185 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG ################################### JOBS IN NAMESPACE appdev ##############################################
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Jobs Info : []
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG ################################### MINION KUBERNETES CONFIG ##############################################
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Runner AppArmor profile: null
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG {
"minionPodName" : "appdev-nr-private-cpm-synthetics-minion-0",
"minionKubernetesNamespace" : "appdev"
}
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG #######################################################################################################
2021-07-26 19:14:16,168 [main] c.n.s.minion.MinionApplication INFO Minion deployment mode: PRIVATE_MINION_POD_KUBERNETES
2021-07-26 19:14:16,279 [main] c.n.s.m.c.k.ContainerSystemDriverKubernetes ERROR One and only one volume is expected to be bound to the minion appdev-nr-private-cpm-synthetics-minion-0 - volumes found: [Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=minion-volume, nfs=null, persistentVolumeClaim=PersistentVolumeClaimVolumeSource(claimName=minion-volume-appdev-nr-private-cpm-synthetics-minion-0, readOnly=null, additionalProperties={}), photonPersistentDisk=null, portworxVolume=null, projected=null, quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={}), Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=kube-api-access-clwpv, nfs=null, persistentVolumeClaim=null, photonPersistentDisk=null, portworxVolume=null, projected=ProjectedVolumeSource(defaultMode=420, sources=[VolumeProjection(configMap=null, downwardAPI=null, secret=null, serviceAccountToken=ServiceAccountTokenProjection(audience=null, expirationSeconds=3607, path=token, additionalProperties={}), additionalProperties={}), VolumeProjection(configMap=ConfigMapProjection(items=[KeyToPath(key=ca.crt, mode=null, path=ca.crt, additionalProperties={})], name=kube-root-ca.crt, optional=null, additionalProperties={}), downwardAPI=null, secret=null, serviceAccountToken=null, additionalProperties={}), VolumeProjection(configMap=null, downwardAPI=DownwardAPIProjection(items=[DownwardAPIVolumeFile(fieldRef=ObjectFieldSelector(apiVersion=v1, fieldPath=metadata.namespace, additionalProperties={}), mode=null, path=namespace, resourceFieldRef=null, additionalProperties={})], additionalProperties={}), secret=null, serviceAccountToken=null, additionalProperties={})], additionalProperties={}), quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={})]

Hey @szd2013,

Thanks for writing about this issue! It is going to be something more and more people face.

We have discovered a compatibility issue with Kubernetes version 1.21+. The only known workaround is to use cluster version 1.20.x with EKS.
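
If you need to stand up an EKS cluster pinned to 1.20 in the meantime (EKS clusters can't be downgraded in place), a minimal eksctl sketch looks like this; the cluster name and region are placeholders:

# placeholders: choose your own cluster name and region
eksctl create cluster --name <cluster-name> --region us-west-2 --version 1.20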

In clusters with access to the control plane, i.e. not EKS:

A workaround is available by disabling the BoundServiceAccountTokenVolume feature gate on the cluster.

When this feature gate is active, the cluster injects a projected service account token volume instead of a secret-based one. We require a secret-based volume, which may be why the minion pod keeps restarting. More details are in the K8s docs.
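
For those clusters, a rough sketch of the workaround (kubeadm-style control plane; the manifest path may differ in your distribution) is to add the gate to the kube-apiserver flags and then recreate the minion pod so it picks up a secret-based token volume:

# edit the kube-apiserver static pod manifest, e.g. /etc/kubernetes/manifests/kube-apiserver.yaml,
# and add this flag to the kube-apiserver command (the kubelet restarts the static pod on save):
#   --feature-gates=BoundServiceAccountTokenVolume=false
# then recreate the minion pod:
kubectl delete pod <minion-pod-name> -n <namespace>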

Describing the minion pod:

Volumes:
  kube-api-access-t84dr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true

Our K8s CPM requirements doc also mentions this issue.

I'm currently exploring the possibility of using eksctl to set the feature gates on an EKS cluster, but so far I've been unable to effect any change to the feature gates that get set at cluster creation.

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: kmullaney-k8s-cpm-1-21-test
  region: us-west-2
  version: "1.21"

nodeGroups:
  - name: ng1
    instanceType: m5.large
    minSize: 1
    desiredCapacity: 1
    maxSize: 2
    volumeSize: 50
    volumeType: gp3
    volumeEncrypted: false
    availabilityZones: ["us-west-2c", "us-west-2b", "us-west-2d"]
    ssh:
      allow: true
    kubeletExtraConfig:
      featureGates:
        RotateKubeletServerCertificate: true
        BoundServiceAccountTokenVolume: false

I would have thought the above YAML would do the trick, but no dice (presumably because kubeletExtraConfig only sets kubelet feature gates on the nodes, while BoundServiceAccountTokenVolume is enforced by the API server, which EKS doesn't let you configure).

Note: To see feature-gates set in EKS, enable the api logging type:

cloudWatch:
  clusterLogging:
    enableTypes:
      - "api"

Then scroll to the very beginning of the CW log group and you’ll see something like:

2021-07-30T09:26:59.000-07:00	I0730 16:26:59.786979 1 flags.go:59] FLAG: --feature-gates="CSIServiceAccountToken=true,ExternalKeyService=true,TTLAfterFinished=true"
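
For an existing cluster, I believe the same logging can be enabled after the fact with eksctl, and the flag line above can then be pulled out of CloudWatch with the AWS CLI (a sketch; cluster name and region are placeholders):

eksctl utils update-cluster-logging --cluster <cluster-name> --region us-west-2 --enable-types api --approve
aws logs filter-log-events --log-group-name /aws/eks/<cluster-name>/cluster --filter-pattern '"feature-gates"'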

If you find a way to set the feature gates in EKS, I'm sure others would love to hear about your method here!

I just found this thread. I'm facing the exact same issue trying to deploy the private minion into one of my EKS clusters. I'm going to work with our team to see if we can disable BoundServiceAccountTokenVolume in our environment; however, it would be great to have a fix for this sooner rather than later.

We will keep the community posted if there is any update on this, but in the meantime, please let us know how that works out for you and your team!

Hi,
Any update? I am facing a similar issue.

Thanks.

Hi @cchokshi,

Other than the two options mentioned earlier (disable the BoundServiceAccountTokenVolume feature gate, or downgrade the K8s cluster to 1.20), our product manager does not anticipate having a fix for this issue until sometime next year. It is possible a fix could arrive sooner if more people are affected, which I anticipate will happen as clusters upgrade to K8s 1.21+.

Since this issue is a blocker for some, especially those using EKS or OpenShift where the feature gate cannot be disabled, I wanted to be as clear as I could about the timeframe. Though it’s as clear as mud, sometime next year is our current best estimate based on the workload already assigned to the engineering team.

Please contact your account rep, if you have one, to try and escalate this issue. I will continue to do the same from within New Relic Support to help prioritize a fix.

Hi @kmullaney. This is a total blocker for us. We're using AKS, where you cannot switch off the BoundServiceAccountTokenVolume feature gate. AKS 1.20 will be end of life in Feb 2022 (https://docs.microsoft.com/en-us/azure/aks/supported-kubernetes-versions?tabs=azure-cli#aks-kubernetes-release-calendar), and we have already switched to 1.21 for the volume snapshot feature, so we're totally screwed. Any chance to prioritize this issue?

Hi @kmullaney, this has been a blocker for us as well since we upgraded all our EKS clusters to 1.21. I have already escalated it to our account rep, but there is no assurance of a fix in the near future.
We have moved to running the Docker CPM on an EC2 instance, but we lose the scalability and the features that the StatefulSet and pod-based minion offer. We expect a fix ASAP; I reported this a long time ago, and it should be at the top of your roadmap.
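
For anyone else making the same interim move, the EC2 setup is just the documented single-container Docker CPM, roughly as follows (the private location key is a placeholder; check the CPM install docs for the exact flags):

docker run \
  --name synthetics-minion \
  -e "MINION_PRIVATE_LOCATION_KEY=<your-private-location-key>" \
  -v /tmp:/tmp:rw \
  -v /var/run/docker.sock:/var/run/docker.sock:rw \
  -d --restart unless-stopped \
  quay.io/newrelic/synthetics-minion:latest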

Thanks!

Hey @cin and @szd2013,

That’s totally reasonable. The escalations you’ve made via your account reps are also a good step to help prioritize this.

I couldn’t agree more that this kind of environmental blocker should be addressed quickly. I’ve been advocating for it and bringing it up in every meeting I have with the Synthetics engineering team.

If I hear anything new, I'll keep y'all posted.

I appreciate that this is on your product manager, but this is a totally unacceptable timeline. 1.21 has been GA for 6 months and is only a few months out from being in maintenance mode.

Hi @christopher.duffin,

Thanks for contributing here! I’ve made sure everyone involved is aware of the Kubernetes EOL timeline for 1.20+.

We are experiencing this issue also. We are trying to upgrade to 3.0.57 to mitigate CVE-2021-44228. Has anyone found a workaround?

Hi @pgrant1 ,

Hope the week has been going well so far! Thanks for posting!
As mentioned earlier in this thread, we are actively advocating for this, and the Synthetics engineering team is aware of the end of life for Kubernetes 1.20+.

v3.0.57 does remediate CVE-2021-45046 - there are some great steps on updating the CPM here

Understood. I have read the steps on updating the CPM, but we are using EKS 1.21, so this documentation is useless to us, or am I missing something?

Hey @pgrant1 - hope the new year has been going well so far!

As @kmullaney mentioned, we are actively advocating every chance we get, as noted in earlier posts, to unblock EKS 1.21; the Synthetics engineering team is also still aware of the Kubernetes EOL timeline for 1.20+.

That’s great news.

…Do you have an ETA on this perhaps?

Thanks,
Paul

An ETA would be good to have since we’re seeing the same issue as well.

Also blocked by this, any update appreciated.

+1 for not able to deploy private minion on EKS