stale relation data for sdn-ip affects kubelet clusterDNS

Bug #2022151 reported by Kevin W Monroe
This bug affects 1 person
Affects: Kubernetes Control Plane Charm
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: 1.29

Bug Description

We observed a problem where a kubernetes-control-plane (k-c-p) leadership change can leave stale DNS info on the kube-control relation, resulting in invalid config on kubernetes-worker units.

Consider the scenario where k-c-p/0 is the leader and discovers the cluster DNS service IP to be x.y.z.119. It transmits this value over the kube-control relation; the kubernetes-worker units eventually consume it and write it to /root/cdk/kubelet/config.yaml as:

...
clusterDNS:
- x.y.z.119
...

This data originates from the send_cluster_dns_detail handler:

https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/main/reactive/kubernetes_control_plane.py#L1399
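
For context, a rough sketch of the handler's shape (hedged; the real code is at the link above, and the helper names, the kube-control flag name, and the set_dns() call below are placeholders/approximations rather than verified signatures):

from charms.reactive import when

# Placeholder helpers standing in for however the charm actually discovers
# the cluster DNS service; the values match the example above.
def get_dns_ip():
    return 'x.y.z.119'

def get_dns_domain():
    return 'cluster.local'

# The 'cdk-addons.configured' gate is only ever set on the current leader.
@when('cdk-addons.configured', 'kube-control.connected')
def send_cluster_dns_detail(kube_control):
    # Publish the DNS details (including the 'sdn-ip' key) over the
    # kube-control relation for kubernetes-worker units to consume.
    kube_control.set_dns(53, get_dns_domain(), get_dns_ip(), True)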

That handler is gated by the `cdk-addons.configured` flag, which is only set on the current leader, so the leader's own relation data is always valid. However, the consuming side of the relation sees data from all control-plane units as a combined view, and that view prefers the lowest relation ID and lowest unit name as the source of truth for each key:

https://github.com/juju-solutions/charms.reactive/blob/master/charms/reactive/endpoints.py#L783-L784

If leadership changes to k-c-p/1 and the DNS service IP changes, kubernetes-worker units will see both the previous and the current IP on the relation and prefer the old leader's value for sdn-ip (k-c-p/0 sorts before k-c-p/1). This misconfigures the kubelet service on the workers.
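
To make the preference concrete, here is a minimal, standalone illustration (plain Python, not charm or charms.reactive code) of a combined view that keeps the first value it sees in (relation id, unit name) order; the new leader's IP x.y.z.201 is a made-up stand-in:

# Both units have published 'sdn-ip': the old leader's stale value and the
# new leader's current one.
unit_data = [
    ('kube-control:0', 'kubernetes-control-plane/0', {'sdn-ip': 'x.y.z.119'}),  # old leader, stale
    ('kube-control:0', 'kubernetes-control-plane/1', {'sdn-ip': 'x.y.z.201'}),  # new leader, current
]

received = {}
for _relid, _unit, data in sorted(unit_data):
    for key, value in data.items():
        received.setdefault(key, value)  # first (lowest relid/unit name) wins

print(received['sdn-ip'])  # x.y.z.119 -- the stale IP lands in the kubelet config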

There are a few ways to fix this:
- adjust charms.reactive to detect when a leader is sending data over a relation and prefer that
- clear relation data keys from k-c-p units on leadership change
- fire send_cluster_dns_detail for all k-c-p units regardless of leadership

Option 3 feels the safest to implement, since it keeps the DNS info consistent across all k-c-p units regardless of leadership.
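
A hedged sketch of what option 3 could look like, reusing the placeholder helpers from the sketch above (again, flag names other than `cdk-addons.configured` and the set_dns() call are assumptions, not the charm's verified API):

from charms.reactive import when

# Drop the leader-only 'cdk-addons.configured' gate so every control-plane
# unit publishes the current DNS detail. A per-unit readiness check may
# still be needed so a unit only publishes once it can discover the IP.
@when('kube-control.connected')
def send_cluster_dns_detail(kube_control):
    # Leaders and followers now publish identical, current values, so the
    # workers' combined view can no longer mix a former leader's stale
    # sdn-ip with the current leader's fresh one.
    kube_control.set_dns(53, get_dns_domain(), get_dns_ip(), True)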

Changed in charm-kubernetes-master:
status: New → Triaged
importance: Undecided → High
Changed in charm-kubernetes-master:
milestone: none → 1.28
Adam Dyess (addyess) wrote:

It seems likely that this bug will be fixed by the rewrite in the ops framework.

Changed in charm-kubernetes-master:
milestone: 1.28 → 1.28+ck1
Adam Dyess (addyess)
Changed in charm-kubernetes-master:
milestone: 1.28+ck1 → 1.29