Otherwise live instance goes deaf: connection refused

Bug #587340 reported by Walt Corey
This bug affects 1 person
Affects              Status   Importance  Assigned to  Milestone
eucalyptus (Ubuntu)  Expired  Medium      Unassigned

Bug Description

I had an instance up and running successfully for hours. I left it alone for a while, and when I went back to that command-line session the window was dead, much as when you shell into another machine and then drop the VPN: the entire window was unresponsive.

I tried opening another session and logging in from there. No good.
I tried rebooting the instance and then logging back in. No good. I do not believe the instance rebooted at all.
I did a describe-instances and it showed as running.
I did a terminate-instance and it shut down.
I restarted it and still cannot get in. To my knowledge nothing happened at the cluster.

No matter what I do, after a delay I receive:

ssh: connect to host 192.168.3.100 port 22: Connection refused

I tried the private IP address with the same result.
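
A minimal Python sketch for telling "connection refused" apart from a timeout when probing the SSH port. A refusal generally means that something at that address actively rejected the connection, while a timeout suggests the packets are being dropped or not routed at all. The function name `probe_tcp` and its defaults are hypothetical, chosen only for illustration; the address in the usage comment is the one from the error above.

```python
import socket


def probe_tcp(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    (Hypothetical helper, not part of any Eucalyptus or UEC tooling.)
    """
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or something answering for that address) rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: packets dropped or not routed.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host" or "Network is unreachable".
        return f"error: {exc}"


# Example usage (addresses taken from this report):
#   print(probe_tcp("192.168.3.100"))   # public address
#   print(probe_tcp("172.19.1.2"))      # private address seen in the DHCP log
```

Running the probe against both the public and the private address, from the client and from the cluster controller, would at least show which path is answering with the refusal.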

I suspect that isn't enough for anyone to diagnose what happened, unless it is a known bug.

Where should I look for the answer and, better yet, what would prevent this in the future? It had been running fine. I had the MySQL and Tomcat 6 tasksel tasks up and running on it. I had copied over a complete database of more than 2M rows and was accessing it through the instance, until it went silent.

Thanks,

Walt

Walt Corey (waltc) wrote :

There is more history and documentation in Question #112157.

Effectively, I am running the cluster controller of a MANAGED-NOVLAN UEC on an installation of Ubuntu 10.04 Desktop. Over time I noticed that running instances would go deaf, often with what look like valid IP addresses listed in the describe-instances output. Both the public and private IP addresses would be unreachable.
What's odd is that when a VPN session intervened, the IP endpoints to that cloud instance were removed after the VPN session was closed.

Even if there were a separate, dedicated CC, would one not still lose connectivity from the client machine? My existing (and previously existing) network is 192.168.0.xxx, served by my wireless router. The segment reserved for the cloud instances' public IP addresses was 192.168.3.0 to 192.168.3.50, or some such limited range.

What is Avahi, and why is it withdrawing the endpoint IPs to the cloud instance?

More information...consider this:
May 28 18:26:19 cor720 NetworkManager: <info> Maximum Segment Size (MSS): 0
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 10.0.0.0/8 Next Hop: 10.0.0.0
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 192.168.251.0/24 Next Hop: 192.168.251.0
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 192.168.22.0/24 Next Hop: 192.168.22.0
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 192.168.23.0/24 Next Hop: 192.168.23.0
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 192.168.24.0/24 Next Hop: 192.168.24.0
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 63.131.134.0/24 Next Hop: 63.131.134.0
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 208.111.81.157/32 Next Hop: 208.111.81.157
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 208.111.81.159/32 Next Hop: 208.111.81.159
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 72.20.25.16/32 Next Hop: 72.20.25.16
May 28 18:26:19 cor720 NetworkManager: <info> Static Route: 209.249.222.54/32 Next Hop: 209.249.222.54
May 28 18:26:19 cor720 NetworkManager: <info> Internal IP4 DNS: 10.50.33.21
May 28 18:26:19 cor720 NetworkManager: <info> Internal IP4 DNS: 10.5.4.1
May 28 18:26:19 cor720 NetworkManager: <info> DNS Domain: 'na.global.ad'
May 28 18:26:19 cor720 NetworkManager: <info> Login Banner:
May 28 18:26:19 cor720 NetworkManager: <info> -----------------------------------------
May 28 18:26:19 cor720 NetworkManager: <info> -----------------------------------------
May 28 18:26:19 cor720 vpnc[3537]: can't open pidfile /var/run/vpnc/pid for writing
May 28 18:26:20 cor720 NetworkManager: <info> VPN connection 'Monster (Maynard)' (IP Config Get) complete.
May 28 18:26:20 cor720 NetworkManager: <info> Policy set 'Monster (Maynard)' (tun0) as default for routing and DNS.
May 28 18:26:20 cor720 vmnetBridge: RTM_NEWROUTE: index:5
May 28 18:26:20 cor720 NetworkManager: <info> VPN plugin state changed: 4
May 28 18:26:20 cor720 nm-dispatcher.action: Script '/etc/NetworkManager/dispatcher.d/01ifupdown' exited with error status 1.
May 28 18:27:36 cor720 dhcpd: DHCPREQUEST for 172.19.1.2 from d0:0d:30:cf:06:f7 via eth0
May 28 18:27:36 cor720 dhcpd: DHCPACK on 172.19.1...


Thierry Carrez (ttx) wrote :

Which version of the eucalyptus packages are you running?
You mention running the CC on a regular 10.04 desktop. Could you give us details of your networking setup?

Changed in eucalyptus (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Walt Corey (waltc) wrote :

I believe it is 1.6.2.

I followed https://help.ubuntu.com/community/UEC/PackageInstall to the letter.
The only variation was that, rather than installing on Ubuntu Server 10.04, I installed it on Ubuntu Desktop 10.04.

To ensure that the public addresses given to the install script for the cluster components did not interfere with anything already in existence, I gave it the IP address range 192.168.3.100-192.168.3.130 (or maybe the upper bound was 150, but I think it was 130), so there would be about 30 public IPs available.

I do have vpnc configured, but I do not know what magic it does with IP addresses.

My Netgear router serves 192.168.0.2 to 192.168.0.254; the netmask is 255.255.0.0.

The output from ifconfig is:

eth0 Link encap:Ethernet HWaddr 00:1e:4f:b4:e2:b5
          inet addr:192.168.0.6 Bcast:192.168.255.255 Mask:255.255.0.0
          inet6 addr: fe80::21e:4fff:feb4:e2b5/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:3928276 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4744355 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2479644119 (2.4 GB) TX bytes:3283396517 (3.2 GB)
          Interrupt:17

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:23077137 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23077137 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:16565722660 (16.5 GB) TX bytes:16565722660 (16.5 GB)

vmnet1 Link encap:Ethernet HWaddr 00:50:56:c0:00:01
          inet addr:192.168.5.1 Bcast:192.168.5.255 Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12230 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

vmnet8 Link encap:Ethernet HWaddr 00:50:56:c0:00:08
          inet addr:172.16.207.1 Bcast:172.16.207.255 Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12239 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

vmnet1 and vmnet8 are, apparently, from VMware Workstation, which I use for a Windows XP image.

According to the instructions for the aforementioned installation from packages, the range I gave UEC was an unused, isolated range.
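
Whether the range is isolated at the IP level can be checked against the interface masks in the ifconfig output above. The following is a small sketch using Python's standard `ipaddress` module; the helper name `overlapping_interfaces` is hypothetical, and the upper bound of the public range is the reporter's recollection rather than a confirmed value. It only shows which local subnets contain the public range; whether that overlap has anything to do with the instances going deaf is not established in this report.

```python
import ipaddress

# Interface addresses and masks as reported by ifconfig in this comment.
INTERFACES = {
    "eth0": "192.168.0.6/255.255.0.0",
    "vmnet1": "192.168.5.1/255.255.255.0",
    "vmnet8": "172.16.207.1/255.255.255.0",
}

# Public range given to UEC (upper bound as best recalled by the reporter).
PUBLIC_FIRST = ipaddress.ip_address("192.168.3.100")
PUBLIC_LAST = ipaddress.ip_address("192.168.3.130")


def overlapping_interfaces(first, last, interfaces):
    """Return {name: network} for interfaces whose subnet intersects [first, last]."""
    hits = {}
    for name, cidr in interfaces.items():
        net = ipaddress.ip_interface(cidr).network
        # Two closed ranges intersect when each starts before the other ends.
        if first <= net.broadcast_address and last >= net.network_address:
            hits[name] = net
    return hits


# Inclusive size of the public range.
print(int(PUBLIC_LAST) - int(PUBLIC_FIRST) + 1)  # 31

for name, net in overlapping_interfaces(PUBLIC_FIRST, PUBLIC_LAST, INTERFACES).items():
    print(f"{name}: public range overlaps {net}")  # eth0: public range overlaps 192.168.0.0/16
```

With the 255.255.0.0 mask on eth0, the desktop treats 192.168.3.100-192.168.3.130 as part of the same on-link network as the router's 192.168.0.x addresses, so the range is unused but not on a separate subnet.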

The UEC dhcp.conf is:
walt@cor720:~$ cat /var/run/eucalyptus/net/euca-dhcp.conf
# automatically generated config file for DHCP server
default-lease-time 1200;
max-lease-time 1200;
ddns-update-style none;

shared-network euca {
subnet 172.19.1.0 netmask 255.255.255.224 {
  option subnet-mask 255.255.255.224;
  option broadcast-address 172.19.1.31;
  option domain-name-servers 19...

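
As a quick consistency check on the visible part of the generated DHCP config, the sketch below derives the prefix length, broadcast address, and host count from the `subnet` line, and tests whether the address from the earlier `DHCPREQUEST` log line belongs to it. It uses only Python's standard `ipaddress` module and the values quoted in this report; it says nothing about the truncated remainder of the file.

```python
import ipaddress

# "subnet 172.19.1.0 netmask 255.255.255.224" from euca-dhcp.conf
subnet = ipaddress.ip_network("172.19.1.0/255.255.255.224")

print(subnet.with_prefixlen)      # 172.19.1.0/27
print(subnet.broadcast_address)   # 172.19.1.31, matching "option broadcast-address"
print(subnet.num_addresses - 2)   # 30 usable host addresses

# Address requested in the dhcpd log line quoted earlier in this report.
leased = ipaddress.ip_address("172.19.1.2")
print(leased in subnet)           # True
```

The declared broadcast address agrees with the netmask, and the leased private address lies inside the subnet, so this part of the generated configuration looks internally consistent.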

Walt Corey (waltc) wrote :

OK, you asked for more information, and I gave you the requested information. Could this bug be placed in a status where it won't self-destruct if nobody picks it up in 53 days?

Thanks

Walt Corey (waltc) wrote :

What I discovered is that, in this case, the private address remained valid, which means there are several paths to this problem. I can connect via the private address.

I tried to connect and then release, with the intent of reconnecting the original IP address of 192.168.3.100, but in both cases I got the error "Address: Permission denied while trying to release address: 192.168.3.100"

The same error occurred when prefixed by sudo.

Launchpad Janitor (janitor) wrote :

[Expired for eucalyptus (Ubuntu) because there has been no activity for 60 days.]

Changed in eucalyptus (Ubuntu):
status: Incomplete → Expired