Comment 3 for bug 1270651

Revision history for this message
Brian Aker (brianaker) wrote : Re: [Bug 1270651] Re: [patch included] libmemcached exhibits odd behaviour on TCP connection close

Investigating

On Jan 19, 2014, at 3:14 PM, Rob Day <email address hidden> wrote:

> https://bugs.launchpad.net/libmemcached/+bug/1251482 seems to be fixing
> a similar issue, but I think my fix has the advantage that if a server
> is genuinely down (e.g. the attempt to reconnect over TCP fails),
> backoff logic will still kick in and protect it, rather than being
> bypassed entirely.
>
> --
> You received this bug notification because you are subscribed to
> libmemcached.
> https://bugs.launchpad.net/bugs/1270651
>
> Title:
> [patch included] libmemcached exhibits odd behaviour on TCP connection
> close
>
> Status in libmemcached - A C and C++ client library for memcached:
> New
>
> Bug description:
> I created a libmemcached client connection to a memcached server over
> TCP. This connection was long-lived and infrequently used. At some
> point, my memcached server was restarted, which caused it to close the
> libmemcached connection with a TCP FIN, which was ACKed.
>
> A short while later (when the memcached server was back up and
> running), I tried to send a request over the libmemcached client
> connection. libmemcached tried to use the previous connection that had
> been closed with a FIN, which resulted in a TCP RST from the other
> end. This failure caused libmemcached to set a backoff timeout on the
> connection, so that even though the TCP FIN had arrived several
> minutes ago, the memcached server was running again and the
> reconnection attempt succeeded, libmemcached wouldn't let me make any
> more requests on that connection for a short while.
>
> I also noticed that the behaviour varied depending on what type of
> request I was making - if I tried to store data in memcached, the
> error code was "MEMCACHED_CONNECTION_FAILED" (which was useful as I
> could simply notice that and retry the request) but when retrieving
> data from memcached the error code was "MEMCACHED_NOT_FOUND" which was
> much less useful.
>
> The attached patch (against the v1.0 trunk, though I was using libmemcached 1.0.10 when I saw this behaviour) fixes both these issues:
> * for TCP connections, an failure to send data will never set a backoff timeout on the server - only a failure to connect will do that. This protects against the case where a remote server drops the connection, but this is not noticed for a short while, during which the server recovers. The request to send data will still fail, but if the request is retried, libmemcached will successfully re-establish the connection and send the data rather than rejecting the request due to backoff logic. (The behaviour for UDP is unchanged.)
>
> * for get/gets requests, if there is no MEMCACHED_SUCCESS result and
> one or more servers returned a MEMCACHED_CONNECTION_FAILED error, the
> return code will be MEMCACHED_CONNECTION_FAILED rather than
> MEMCACHED_NOT_FOUND. This lets the client make informed decisions
> about whether to retry.
>
> I'm afraid my editor has stripped trailing spaces in a couple of
> places - hopefully that's not too confusing.
>
> I have permission from my employer to contribute this back under
> libmemcached's 3-clause BSD license.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/libmemcached/+bug/1270651/+subscriptions