Cannot upgrade calico when kubernetes-master units are running in LXD

Bug #1982738 reported by Ebrar Leblebici
This bug affects 2 people
Affects: Calico Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

When I upgrade Charmed Kubernetes from 1.19 to 1.24 with the master and worker units running on VMs, the cluster upgrades directly to 1.24 successfully.

But if the master units are running in LXD (not tested for workers, but the result will most probably be the same), the calico upgrade gets stuck in the "waiting" state.

I deployed the attached bundle. In this deployment the kubernetes-master units are running in LXD. I followed the upgrade documentation [1].

First, I upgraded containerd, etcd, easyrsa and kubeapi-load-balancer successfully:

juju upgrade-charm containerd --switch ch:containerd --channel 1.24/stable
juju upgrade-charm etcd --switch ch:etcd --channel 1.24/stable
juju upgrade-charm easyrsa --switch ch:easyrsa --channel 1.24/stable
juju upgrade-charm api-lb --switch ch:kubeapi-load-balancer --channel 1.24/stable

But after I ran the command below:

juju upgrade-charm calico --switch ch:calico --channel 1.24/stable

The calico units got stuck in the "waiting" state with the message "Waiting to retry BGP peer configuration".

I'm also attaching the juju debug-log and the calico logs.

[1] https://ubuntu.com/kubernetes/docs/1.24/upgrading

Ebrar Leblebici (birru2) wrote :

Attached calico logs.

Ebrar Leblebici (birru2) wrote :

Attached juju debug-log.

Mateusz Kozakowski (hypeitnow) wrote :

Hi Ebrar,

I am facing the same problem right now. Have you found a solution yet?

George Kraft (cynerva) wrote :

Before anything else, I should mention that direct upgrades from 1.19 to 1.24 are not supported. We only support and test upgrades across single minor versions, so to get from 1.19 to 1.24, you should first upgrade to 1.20, then 1.21, etc.
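
To make the supported path concrete, here is a small Python sketch that lists the minor releases to step through one at a time. The function name upgrade_path is hypothetical and only illustrates the "one minor version per upgrade" rule; it does not talk to Juju and says nothing about which charm channels exist for each release.

```python
def upgrade_path(current, target):
    """Return each minor release to upgrade to, in order, ending at target.

    Hypothetical helper: it only enumerates versions and assumes
    "major.minor" strings such as "1.19".
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target release is older than the current release")
    # One step per minor version, never skipping a release.
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.19", "1.24"))
# → ['1.20', '1.21', '1.22', '1.23', '1.24']
```

Each entry is a full upgrade of the cluster to that release before moving on to the next one.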

From the attached juju debug-log, I see that the install_calico_service handler[1] never ran, and that held up other handlers down the line as well. I think that the calico units are stuck waiting for kubernetes-master to provide a kubeconfig-hash[2, 3], which would set the cni.kubeconfig.available flag[4], which would allow install_calico_service to run. These changes were introduced to both kubernetes-master and calico in Charmed Kubernetes 1.22.

Can you try upgrading the kubernetes-master charm? I suspect doing so will allow the calico units to proceed.
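
The dependency chain I suspect can be pictured with a toy Python model. This is a simplified sketch, not the real charm code: the class CalicoUnitModel and its methods are hypothetical, and it assumes that an older kubernetes-master simply publishes no kubeconfig-hash on the cni relation. Only the flag name, the relation key and the handler name come from the links below.

```python
class CalicoUnitModel:
    """Toy model of the flag gating; hypothetical, not the charm's implementation."""

    def __init__(self):
        self.flags = set()
        self.handlers_run = []

    def receive_cni_relation_data(self, data):
        # Assumption: the flag is set only once a kubeconfig-hash is published.
        if data.get("kubeconfig-hash"):
            self.flags.add("cni.kubeconfig.available")

    def dispatch(self):
        # install_calico_service is gated on the flag; later handlers wait on it.
        if ("cni.kubeconfig.available" in self.flags
                and "install_calico_service" not in self.handlers_run):
            self.handlers_run.append("install_calico_service")

    def can_proceed(self):
        return "install_calico_service" in self.handlers_run


unit = CalicoUnitModel()

# Upgraded calico talking to a not-yet-upgraded kubernetes-master (assumed: no hash).
unit.receive_cni_relation_data({})
unit.dispatch()
print(unit.can_proceed())  # False: the handler never runs, so the unit keeps waiting

# After kubernetes-master is upgraded and publishes a hash (placeholder value).
unit.receive_cni_relation_data({"kubeconfig-hash": "abc123"})
unit.dispatch()
print(unit.can_proceed())  # True
```

If this picture is right, the calico units stay in "waiting" until the other side of the relation is upgraded, which is why upgrading kubernetes-master is the first thing to try.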

[1]: https://github.com/charmed-kubernetes/layer-calico/blob/a164af47a9824e17742d732675d4edeedabfb159/reactive/calico.py#L257
[2]: https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/54e02bbe6fc9bd37406574b9f1658f9def26d095/reactive/kubernetes_control_plane.py#L2253
[3]: https://github.com/juju-solutions/interface-kubernetes-cni/blob/362c811cfa106d8b3c889dbe091aa8bd2bcc4b88/provides.py#L78-L81
[4]: https://github.com/juju-solutions/interface-kubernetes-cni/blob/362c811cfa106d8b3c889dbe091aa8bd2bcc4b88/requires.py#L18-L19
