Synthetic private minion cannot be launched in EKS

I tried to update the synthetics minion using Helm for private location version 1.0.34, and the StatefulSet is going into CrashLoopBackOff. This is the error message I see on the pod; as far as I can tell, only one volume (the PVC) is bound to the minion:

2021-07-26 17:58:07,308 - Minion will use 2 heavy workers

2021-07-26 17:58:07,315 - Minion will use 50 lightweight workers

2021-07-26 17:58:10,519 - Minion Container System: KUBERNETES

2021-07-26 17:58:12,889 - Minion deployment mode: PRIVATE_MINION_POD_KUBERNETES

2021-07-26 17:58:12,983 - One and only one volume is expected to be bound to the minion appdev-nr-private-cpm-synthetics-minion-0 - volumes found: [Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=minion-volume, nfs=null, persistentVolumeClaim=PersistentVolumeClaimVolumeSource(claimName=minion-volume-appdev-nr-private-cpm-synthetics-minion-0, readOnly=null, additionalProperties={}), photonPersistentDisk=null, portworxVolume=null, projected=null, quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={}), Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=kube-api-access-vd6k8, nfs=null, persistentVolumeClaim=null, photonPersistentDisk=null, portworxVolume=null, projected=ProjectedVolumeSource(defaultMode=420, sources=[VolumeProjection(configMap=null, downwardAPI=null, secret=null, serviceAccountToken=ServiceAccountTokenProjection(audience=null, expirationSeconds=3607, path=token, additionalProperties={}), additionalProperties={}), VolumeProjection(configMap=ConfigMapProjection(items=[KeyToPath(key=ca.crt, mode=null, path=ca.crt, additionalProperties={})], name=kube-root-ca.crt, optional=null, additionalProperties={}), downwardAPI=null, secret=null, serviceAccountToken=null, additionalProperties={}), VolumeProjection(configMap=null, downwardAPI=DownwardAPIProjection(items=[DownwardAPIVolumeFile(fieldRef=ObjectFieldSelector(apiVersion=v1, fieldPath=metadata.namespace, additionalProperties={}), mode=null, path=namespace, resourceFieldRef=null, additionalProperties={})], additionalProperties={}), secret=null, serviceAccountToken=null, additionalProperties={})], additionalProperties={}), quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={})]
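For reference, here is how I listed the volumes Kubernetes reports for the pod (pod and namespace names are from my deployment):

kubectl get pod appdev-nr-private-cpm-synthetics-minion-0 -n appdev \
  -o jsonpath='{.spec.volumes[*].name}'
# minion-volume kube-api-access-vd6k8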

I am having this problem after I upgraded my cluster to 1.21.1 in EKS. The volume attach is successful -
AttachVolume.Attach succeeded for volume "pvc-081647ec-4830-40a2-b5c3-d7c886d8715e". The configuration looks good, since I can see the volume attached to the node as well as to the pod.
But the minion still does not see the volume when it starts up.
Exec'ing into the pod, the volume is mounted, just as the pod spec says it is -

groups: cannot find name for group ID 3729
I have no name!@appdev-nr-private-cpm-synthetics-minion-0:/opt/newrelic/synthetics$ df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          20G  4.6G   16G  24% /
tmpfs            64M     0   64M   0% /dev
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/nvme1n1    9.8G   39M  9.7G   1% /tmp
/dev/nvme0n1p1   20G  4.6G   16G  24% /etc/hosts
shm              64M     0   64M   0% /dev/shm
tmpfs           3.9G   12K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs           3.9G     0  3.9G   0% /proc/acpi
tmpfs           3.9G     0  3.9G   0% /sys/firmware
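In case it's useful, these are the commands I ran to check the claim and the mount from outside the pod (names are specific to my StatefulSet):

# Check the PVC created by the statefulset's volumeClaimTemplate
kubectl get pvc minion-volume-appdev-nr-private-cpm-synthetics-minion-0 -n appdev

# Check the Volumes / Mounts sections of the pod spec
kubectl describe pod appdev-nr-private-cpm-synthetics-minion-0 -n appdev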

Logs at debug level -

2021-07-26 19:14:10,905 [main] c.n.s.minion.MinionApplication DEBUG Private Location mode activated
2021-07-26 19:14:10,969 [main] c.n.s.c.configs.PopulateFromEnv INFO Minion will use 2 heavy workers
2021-07-26 19:14:10,970 [main] c.n.s.c.configs.PopulateFromEnv INFO Minion will use 50 lightweight workers
2021-07-26 19:14:11,676 [main] c.n.s.minion.MinionApplication DEBUG Configuration not yet available, so can’t work out the Location
2021-07-26 19:14:11,886 [main] c.n.s.m.c.PrivateMinionLaunchCommandBase DEBUG Network Healthcheck is enabled, Private Minion Network Healthcheck will be checked.
2021-07-26 19:14:12,383 [main] c.n.s.c.c.ValidateConfigCommand DEBUG Passed configuration validation check
2021-07-26 19:14:13,992 [main] c.n.s.m.g.ContainerSystemDriverModule INFO Minion Container System: KUBERNETES
2021-07-26 19:14:14,505 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Kubernetes Version : 1.21+ (v1.21.2-13+d2965f0db10712) - linux/amd64
2021-07-26 19:14:14,568 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Kubernetes API Version : v1
2021-07-26 19:14:14,568 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Kubernetes Master URL : https://172.20.0.1:443/
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG ################################### MINION POD INFO ##############################################
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Name : appdev-nr-private-cpm-synthetics-minion-0
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Scheduled Node : ip-xxxxxxxxx.ec2.internal
2021-07-26 19:14:15,183 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Available CPUs : 1
2021-07-26 19:14:15,184 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Volumes : [minion-volume, kube-api-access-clwpv]
2021-07-26 19:14:15,185 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Containers Info : [{
Container Name : synthetics-minion
Image : quay.io/newrelic/synthetics-minion:3.0.48
Volume Mounts : [
VolumeName : 'minion-volume'
subPath : 'appdev-nr-private-cpm-synthetics-minion/tmp'
MountPath : '/tmp',

VolumeName : 'kube-api-access-clwpv'
subPath : 'null'
MountPath : '/var/run/secrets/kubernetes.io/serviceaccount']
}]
2021-07-26 19:14:15,185 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG ################################### JOBS IN NAMESPACE appdev ##############################################
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Jobs Info : []
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG ################################### MINION KUBERNETES CONFIG ##############################################
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG Runner AppArmor profile: null
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG {
"minionPodName" : "appdev-nr-private-cpm-synthetics-minion-0",
"minionKubernetesNamespace" : "appdev"
}
2021-07-26 19:14:15,282 [main] c.n.s.m.g.ContainerSystemDriverModule DEBUG #######################################################################################################
2021-07-26 19:14:16,168 [main] c.n.s.minion.MinionApplication INFO Minion deployment mode: PRIVATE_MINION_POD_KUBERNETES
2021-07-26 19:14:16,279 [main] c.n.s.m.c.k.ContainerSystemDriverKubernetes ERROR One and only one volume is expected to be bound to the minion appdev-nr-private-cpm-synthetics-minion-0 - volumes found: [Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=minion-volume, nfs=null, persistentVolumeClaim=PersistentVolumeClaimVolumeSource(claimName=minion-volume-appdev-nr-private-cpm-synthetics-minion-0, readOnly=null, additionalProperties={}), photonPersistentDisk=null, portworxVolume=null, projected=null, quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={}), Volume(awsElasticBlockStore=null, azureDisk=null, azureFile=null, cephfs=null, cinder=null, configMap=null, csi=null, downwardAPI=null, emptyDir=null, fc=null, flexVolume=null, flocker=null, gcePersistentDisk=null, gitRepo=null, glusterfs=null, hostPath=null, iscsi=null, name=kube-api-access-clwpv, nfs=null, persistentVolumeClaim=null, photonPersistentDisk=null, portworxVolume=null, projected=ProjectedVolumeSource(defaultMode=420, sources=[VolumeProjection(configMap=null, downwardAPI=null, secret=null, serviceAccountToken=ServiceAccountTokenProjection(audience=null, expirationSeconds=3607, path=token, additionalProperties={}), additionalProperties={}), VolumeProjection(configMap=ConfigMapProjection(items=[KeyToPath(key=ca.crt, mode=null, path=ca.crt, additionalProperties={})], name=kube-root-ca.crt, optional=null, additionalProperties={}), downwardAPI=null, secret=null, serviceAccountToken=null, additionalProperties={}), VolumeProjection(configMap=null, downwardAPI=DownwardAPIProjection(items=[DownwardAPIVolumeFile(fieldRef=ObjectFieldSelector(apiVersion=v1, fieldPath=metadata.namespace, additionalProperties={}), mode=null, path=namespace, resourceFieldRef=null, additionalProperties={})], additionalProperties={}), secret=null, serviceAccountToken=null, additionalProperties={})], additionalProperties={}), quobyte=null, rbd=null, scaleIO=null, secret=null, storageos=null, vsphereVolume=null, additionalProperties={})]

Hey @szd2013,

Thanks for writing about this issue! It is going to be something more and more people face.

We have discovered a compatibility issue with Kubernetes version 1.21+. The only known workaround is to use cluster version 1.20.x with EKS.
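If you're spinning up a fresh cluster to test with, a minimal sketch of pinning the version with eksctl looks something like this (cluster name, region, and node count here are just placeholders):

eksctl create cluster \
  --name cpm-test \
  --region us-west-2 \
  --version 1.20 \
  --nodes 1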

In clusters with access to the control plane (i.e. not EKS):

A workaround is available: disable the BoundServiceAccountTokenVolume feature gate on the cluster.

When this feature is enabled, Kubernetes creates a projected volume instead of a secret-based volume for the service account token. The minion requires a secret-based volume, which is likely why the minion pod keeps restarting. There are more details in the K8s docs.
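For example, on a kubeadm-managed control plane this would mean passing the gate to the API server; a sketch only, since the exact mechanism depends on how your control plane is provisioned:

# kubeadm ClusterConfiguration snippet - disables the gate on kube-apiserver
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "BoundServiceAccountTokenVolume=false"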

Describing the minion pod:

Volumes:
  kube-api-access-t84dr:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
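For anyone checking their own cluster, the output above comes from a plain describe (substitute your own pod and namespace); the Type: Projected line under Volumes is the giveaway:

kubectl describe pod <minion-pod-name> -n <namespace>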

Our K8s CPM requirements doc also mentions this issue.

I’m currently exploring the possibility of using eksctl to set the feature-gates on an EKS cluster, but so far I’ve been unable to effect any change to the feature-gates that get set on cluster creation.

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: kmullaney-k8s-cpm-1-21-test
  region: us-west-2
  version: "1.21"

nodeGroups:
  - name: ng1
    instanceType: m5.large
    minSize: 1
    desiredCapacity: 1
    maxSize: 2
    volumeSize: 50
    volumeType: gp3
    volumeEncrypted: false
    availabilityZones: ["us-west-2c", "us-west-2b", "us-west-2d"]
    ssh:
      allow: true
    kubeletExtraConfig:
      featureGates:
        RotateKubeletServerCertificate: true
        BoundServiceAccountTokenVolume: false

I would have thought the above YAML would do the trick, but no dice - kubeletExtraConfig only sets feature gates on each node's kubelet, while BoundServiceAccountTokenVolume is controlled by the API server, which EKS manages for you.

Note: To see which feature-gates are set in EKS, enable the api control plane logging type:

cloudWatch:
  clusterLogging:
    enableTypes:
      - "api"

Then scroll to the very beginning of the CW log group and you’ll see something like:

2021-07-30T09:26:59.000-07:00	I0730 16:26:59.786979 1 flags.go:59] FLAG: --feature-gates="CSIServiceAccountToken=true,ExternalKeyService=true,TTLAfterFinished=true"
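If you'd rather not scroll through the console, something like this should surface the same line via the CLI - the log group name follows the usual /aws/eks/<cluster-name>/cluster convention, so swap in your own cluster name and region:

aws logs filter-log-events \
  --log-group-name /aws/eks/kmullaney-k8s-cpm-1-21-test/cluster \
  --filter-pattern '"feature-gates"' \
  --region us-west-2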

If you find a way to set the feature-gates in EKS, I’m sure others would love to hear about your method here!

I just found this thread. I’m facing the exact same issue trying to deploy the private minion into one of my EKS clusters. I’m going to work with our team to see if we can disable BoundServiceAccountTokenVolume in our environment; however, it would be great to have a fix for this sooner rather than later.

We will keep the community posted if there is any update on this, but in the meantime, please let us know how that works out for you and your team!