ceph osd fails on worker nodes but not on master: server name not found: ceph-mon-discovery.ceph.svc.cluster.local

Bug #1765014 reported by chinasubbareddy mallavarapu

This bug report was converted into a question: question #668055: ceph osd fails on worker nodes not on master-server name not found: ceph-mon-discovery.ceph.svc.cluster.local.

This bug affects 1 person
Affects: openstack-helm
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When setting up a multinode deployment, the ceph-osd pods fail on the worker nodes but not on the master node.

root@ceph-mon1:~# kubectl get po -n ceph -o wide |grep osd
ceph-osd-default-83945928-5czl6 0/1 CrashLoopBackOff 432 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-osd-default-83945928-9psxt 0/1 CrashLoopBackOff 432 1d 10.142.0.2 ceph1.c.kube5s-199510.internal
ceph-osd-default-83945928-kg5t6 1/1 Running 0 1d 10.142.0.5 ceph-mon1.c.kube5s-199510.internal

Here are the logs from one of the failing pods:

root@ceph-mon1:~# kubectl logs ceph-osd-default-83945928-5czl6 -n ceph
LAUNCHING OSD: in directory:directory mode
+ echo 'LAUNCHING OSD: in directory:directory mode'
+ exec /tmp/osd-directory.sh
+ export LC_ALL=C
+ LC_ALL=C
+ : ceph2
+ : 'root=default host=ceph2'
+ : /var/lib/ceph/osd/ceph
+ : /var/lib/ceph/journal
+ : /var/lib/ceph/bootstrap-osd/ceph.keyring
+ is_available rpm
+ command -v rpm
+ is_available dpkg
+ command -v dpkg
+ OS_VENDOR=ubuntu
+ source /etc/default/ceph
++ TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728
++ ceph -v
++ egrep -q '12.2|luminous'
++ echo 0
+ [[ 0 -ne 0 ]]
+ [[ ! -d /var/lib/ceph/osd ]]
+ '[' -z ceph2 ']'
++ find /var/lib/ceph/osd -prune -empty
+ [[ -n /var/lib/ceph/osd ]]
+ echo 'Creating osd'
Creating osd
++ uuidgen
+ UUID=5f7a4e0d-3de6-4620-bd94-6f8676a06b6c
++ ceph-authtool --gen-print-key
+ OSD_SECRET=AQCQJtdaO4gCNRAAqLweK/IhObI5EKAvYZ0Rpg==
++ echo '{"cephx_secret": "AQCQJtdaO4gCNRAAqLweK/IhObI5EKAvYZ0Rpg=="}'
++ ceph osd new 5f7a4e0d-3de6-4620-bd94-6f8676a06b6c -i - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring
unable to parse addrs in 'ceph-mon-discovery.ceph.svc.cluster.local'
InvalidArgumentError does not take keyword arguments
+ OSD_ID='server name not found: ceph-mon-discovery.ceph.svc.cluster.local (Temporary failure in name resolution)'

We could resolve the name successfully from the node, so it is not clear why it is failing.

root@ceph-mon1:~# nslookup ceph-mon-discovery.ceph.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53

Non-authoritative answer:
Name: ceph-mon-discovery.ceph.svc.cluster.local
Address: 10.142.0.5
Name: ceph-mon-discovery.ceph.svc.cluster.local
Address: 10.142.0.3
Name: ceph-mon-discovery.ceph.svc.cluster.local
Address: 10.142.0.2
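
Note that the lookup above was run from the master node, while the "Temporary failure in name resolution" in the OSD log happens inside the pod on the worker node. A minimal sketch for repeating the check from inside the failing pod (pod name taken from the listing above; this assumes the container stays up long enough for kubectl exec to attach and that getent is available in the image):

# Resolve the mon discovery service from inside the failing OSD pod
kubectl exec -n ceph ceph-osd-default-83945928-5czl6 -- getent hosts ceph-mon-discovery.ceph.svc.cluster.local
# Inspect which DNS server the pod is actually using
kubectl exec -n ceph ceph-osd-default-83945928-5czl6 -- cat /etc/resolv.conf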

In fact, the whole Ceph cluster looks like this:

root@ceph-mon1:~# kubectl get po -n ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE
ceph-bootstrap-rcjqn 0/1 CrashLoopBackOff 434 1d 192.168.108.17 ceph2.c.kube5s-199510.internal
ceph-cephfs-provisioner-56cd9948c5-rh2sf 0/1 Init:0/1 0 1d 192.168.193.209 ceph1.c.kube5s-199510.internal
ceph-cephfs-provisioner-56cd9948c5-snqmr 0/1 Init:0/1 0 1d 192.168.108.19 ceph2.c.kube5s-199510.internal
ceph-mds-679f98dd45-w99x4 0/1 Init:0/2 0 1d 192.168.108.15 ceph2.c.kube5s-199510.internal
ceph-mgr-7c66bd658-wbjtx 0/1 CrashLoopBackOff 448 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-mon-9fgt8 0/1 Running 1 1d 10.142.0.5 ceph-mon1.c.kube5s-199510.internal
ceph-mon-check-74b98c966b-vt9wr 1/1 Running 0 1d 192.168.193.205 ceph1.c.kube5s-199510.internal
ceph-mon-vnfd8 0/1 CrashLoopBackOff 201 1d 10.142.0.2 ceph1.c.kube5s-199510.internal
ceph-mon-vxgw9 0/1 CrashLoopBackOff 202 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-osd-default-83945928-5czl6 0/1 CrashLoopBackOff 433 1d 10.142.0.3 ceph2.c.kube5s-199510.internal
ceph-osd-default-83945928-9psxt 0/1 CrashLoopBackOff 432 1d 10.142.0.2 ceph1.c.kube5s-199510.internal
ceph-osd-default-83945928-kg5t6 1/1 Running 0 1d 10.142.0.5 ceph-mon1.c.kube5s-199510.internal
ceph-rbd-pool-qzwr6 0/1 CrashLoopBackOff 409 1d 192.168.108.21 ceph2.c.kube5s-199510.internal
ceph-rbd-provisioner-69c59fb6f6-22nfc 0/1 Init:0/1 0 1d 192.168.193.210 ceph1.c.kube5s-199510.internal
ceph-rbd-provisioner-69c59fb6f6-kcb8f 0/1 Init:0/1 0 1d 192.168.108.16 ceph2.c.kube5s-199510.internal
ceph-rgw-85d66f9658-84rw4 0/1 Init:0/3 0 1d 192.168.193.206 ceph1.c.kube5s-199510.internal

chinasubbareddy mallavarapu (chinasubbareddy) wrote:

The issue is solved. It turned out to be a problem in my network: this setup runs on Google Cloud, and I had to add a firewall rule to allow traffic between the nodes.

GCE blocks traffic between hosts by default. Run the following command to allow Calico traffic to flow between containers on different hosts (the source-ranges parameter assumes you created your project with the default GCE network parameters; modify the address range if yours is different):

gcloud compute firewall-rules create calico-ipip --allow 4 --network "default" --source-ranges "10.128.0.0/9"

You can verify the rule with this command:

gcloud compute firewall-rules list
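
Protocol 4 in the --allow flag is IP-in-IP, which Calico's IPIP overlay uses to carry pod-to-pod traffic between hosts. A minimal sketch for confirming the rule and then checking cross-node pod connectivity (the target pod IP is taken from the listing above; the nettest pod name and busybox image are illustrative, and the test pod needs to land on a different node than the target for the check to be meaningful):

# Show the rule's allowed protocol and source ranges
gcloud compute firewall-rules describe calico-ipip
# Ping a pod IP hosted on another node from a throwaway test pod
kubectl run nettest --rm -it --image=busybox --restart=Never -- ping -c 3 192.168.108.17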

chinasubbareddy mallavarapu (chinasubbareddy) wrote:

Converting this to a question, as the problem is with the environment, which is running on top of Google Cloud.

Changed in openstack-helm:
status: New → Invalid