ceph osd fails on worker nodes, not on master: server name not found: ceph-mon-discovery.ceph.svc.cluster.local
This bug report was converted into a question: question #668055: ceph osd fails on worker nodes, not on master: server name not found: ceph-mon-discovery.ceph.svc.cluster.local.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openstack-helm | Invalid | Undecided | Unassigned |
Bug Description
When setting up a multinode deployment, the ceph osd pods fail on the worker nodes but not on the master node.
root@ceph-mon1:~# kubectl get po -n ceph -o wide |grep osd
ceph-osd-
ceph-osd-
ceph-osd-
Here are the logs from them:
root@ceph-mon1:~# kubectl logs ceph-osd-
LAUNCHING OSD: in directory:directory mode
+ echo 'LAUNCHING OSD: in directory:directory mode'
+ exec /tmp/osd-
+ export LC_ALL=C
+ LC_ALL=C
+ : ceph2
+ : 'root=default host=ceph2'
+ : /var/lib/
+ : /var/lib/
+ : /var/lib/
+ is_available rpm
+ command -v rpm
+ is_available dpkg
+ command -v dpkg
+ OS_VENDOR=ubuntu
+ source /etc/default/ceph
++ TCMALLOC_
++ ceph -v
++ egrep -q '12.2|luminous'
++ echo 0
+ [[ 0 -ne 0 ]]
+ [[ ! -d /var/lib/ceph/osd ]]
+ '[' -z ceph2 ']'
++ find /var/lib/ceph/osd -prune -empty
+ [[ -n /var/lib/ceph/osd ]]
+ echo 'Creating osd'
Creating osd
++ uuidgen
+ UUID=5f7a4e0d-
++ ceph-authtool --gen-print-key
+ OSD_SECRET=
++ echo '{"cephx_secret": "AQCQJtdaO4gCNR
++ ceph osd new 5f7a4e0d-
unable to parse addrs in 'ceph-mon-
InvalidArgument
+ OSD_ID='server name not found: ceph-mon-
We can resolve the name successfully from the host, so I am not sure why it is failing:
root@ceph-mon1:~# nslookup ceph-mon-
Server: 10.96.0.10
Address: 10.96.0.10#53
Non-authoritative answer:
Name: ceph-mon-
Address: 10.142.0.5
Name: ceph-mon-
Address: 10.142.0.3
Name: ceph-mon-
Address: 10.142.0.2
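One thing worth noting: the nslookup above runs on the host and goes through the node's own resolver, while the ceph commands inside the osd pod resolve through kube-dns over the pod network. A minimal check of the pod-side path, assuming <failing-osd-pod> is replaced with one of the osd pod names above, the container stays up long enough to exec into, and the image ships nslookup (getent hosts is a common fallback):
kubectl -n ceph exec <failing-osd-pod> -- nslookup ceph-mon-discovery.ceph.svc.cluster.local
# fallback if nslookup is not in the image:
kubectl -n ceph exec <failing-osd-pod> -- getent hosts ceph-mon-discovery.ceph.svc.cluster.local
If this fails while the host-level lookup succeeds, the problem is pod-to-pod networking (reaching kube-dns) rather than the DNS records themselves.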
In fact, the whole ceph cluster looks like this; note that the ceph-mon pods on the worker nodes are also in CrashLoopBackOff:
root@ceph-mon1:~# kubectl get po -n ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE
ceph-bootstrap-
ceph-cephfs-
ceph-cephfs-
ceph-mds-
ceph-mgr-
ceph-mon-9fgt8 0/1 Running 1 1d 10.142.0.5 ceph-mon1.
ceph-mon-
ceph-mon-vnfd8 0/1 CrashLoopBackOff 201 1d 10.142.0.2 ceph1.c.
ceph-mon-vxgw9 0/1 CrashLoopBackOff 202 1d 10.142.0.3 ceph2.c.
ceph-osd-
ceph-osd-
ceph-osd-
ceph-rbd-pool-qzwr6 0/1 CrashLoopBackOff 409 1d 192.168.108.21 ceph2.c.
ceph-rbd-
ceph-rbd-
ceph-rgw-
The issue is solved: it was a problem in my network. I have this setup on gcloud, and I had to add one firewall rule to allow the nodes to talk to each other.
GCE blocks traffic between hosts by default; run the following command to allow Calico traffic to flow between containers on different hosts (the source-ranges parameter assumes you created your project with the default GCE network parameters; modify the address range if yours is different). The --allow 4 argument permits IP protocol 4 (IP-in-IP), which Calico uses for its IPIP encapsulation:
gcloud compute firewall-rules create calico-ipip --allow 4 --network "default" --source-ranges "10.128.0.0/9"
You can verify the rule with this command:
gcloud compute firewall-rules list
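After the rule is in place, a quick sanity check, with the pod name and IP as placeholders to be taken from the kubectl get po -o wide output, is to confirm that a pod on one node can reach a pod IP on another node (assuming the image ships ping) and that the crashlooping pods recover:
# exec into a pod on one node and ping a pod IP on a different node;
# with Calico in IPIP mode this traffic crosses hosts inside protocol-4 packets:
kubectl -n ceph exec <pod-on-node-A> -- ping -c 3 <pod-ip-on-node-B>
# the mon, osd, and rbd-pool pods should leave CrashLoopBackOff once cross-node traffic flows:
kubectl -n ceph get po -o wide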