Class FD_SOCK

  • All Implemented Interfaces:
    java.lang.Runnable

    public class FD_SOCK
    extends Protocol
    implements java.lang.Runnable
    Failure detection protocol based on sockets. Failure detection is ring-based. Each member creates a server socket and announces its address together with the server socket's address in a multicast.

    A pinger thread will be started when the membership goes above 1 and will be stopped when it drops below 2. The pinger thread connects to its neighbor on the right and waits until the socket is closed. When the socket is closed by the monitored peer in an abnormal fashion (IOException), the neighbor will be suspected.

    The main feature of this protocol is that no ping messages need to be exchanged between any 2 peers, as failure detection relies entirely on TCP sockets. The advantage is that no activity will take place between 2 peers as long as they are alive (i.e. have their server sockets open). The disadvantage is that hung servers or crashed routers will not cause sockets to be closed, therefore they won't be detected.

    The costs involved are 2 additional threads: one that monitors the client side of the socket connection (to monitor a peer) and another one that manages the server socket. However, those threads will be idle as long as both peers are running.

    Author:
    Bela Ban May 29 2001
    • Field Detail

      • bind_addr

        protected java.net.InetAddress bind_addr
      • external_addr

        protected java.net.InetAddress external_addr
      • external_port

        protected int external_port
      • get_cache_timeout

        protected long get_cache_timeout
      • cache_max_elements

        protected int cache_max_elements
      • cache_max_age

        protected long cache_max_age
      • suspect_msg_interval

        protected long suspect_msg_interval
      • num_tries

        protected int num_tries
      • start_port

        protected int start_port
      • client_bind_port

        protected int client_bind_port
      • port_range

        protected int port_range
      • keep_alive

        protected boolean keep_alive
      • sock_conn_timeout

        protected int sock_conn_timeout
      • num_suspect_events

        protected int num_suspect_events
      • suspect_history

        protected final BoundedList<java.lang.String> suspect_history
      • members

        protected volatile java.util.List<Address> members
      • suspected_mbrs

        protected final java.util.Set<Address> suspected_mbrs
      • pingable_mbrs

        protected final java.util.List<Address> pingable_mbrs
      • srv_sock_sent

        protected volatile boolean srv_sock_sent
      • get_cache_promise

        protected final Promise<java.util.Map<Address,​IpAddress>> get_cache_promise
        Used to rendezvous on GET_CACHE and GET_CACHE_RSP
      • got_cache_from_coord

        protected volatile boolean got_cache_from_coord
      • local_addr

        protected Address local_addr
      • srv_sock

        protected java.net.ServerSocket srv_sock
      • srv_sock_addr

        protected IpAddress srv_sock_addr
      • ping_dest

        protected Address ping_dest
      • ping_sock

        protected java.net.Socket ping_sock
      • ping_input

        protected java.io.InputStream ping_input
      • pinger_thread

        protected volatile java.lang.Thread pinger_thread
      • lock

        protected final java.util.concurrent.locks.Lock lock
      • regular_sock_close

        protected volatile boolean regular_sock_close
      • shuttin_down

        protected volatile boolean shuttin_down
      • log_suspected_msgs

        protected boolean log_suspected_msgs
    • Constructor Detail

      • FD_SOCK

        public FD_SOCK()
    • Method Detail

      • getLocalAddress

        public java.lang.String getLocalAddress()
      • getMembers

        public java.lang.String getMembers()
      • getPingableMembers

        public java.lang.String getPingableMembers()
      • getSuspectedMembers

        public java.lang.String getSuspectedMembers()
      • getNumSuspectedMembers

        public int getNumSuspectedMembers()
      • getPingDest

        public java.lang.String getPingDest()
      • getNumSuspectEventsGenerated

        public int getNumSuspectEventsGenerated()
      • isNodeCrashMonitorRunning

        public boolean isNodeCrashMonitorRunning()
      • isLogSuspectedMessages

        public boolean isLogSuspectedMessages()
      • setLogSuspectedMessages

        public void setLogSuspectedMessages​(boolean log_suspected_msgs)
      • getClientBindPortActual

        public int getClientBindPortActual()
      • printSuspectHistory

        public java.lang.String printSuspectHistory()
      • printCache

        public java.lang.String printCache()
      • startNodeCrashMonitor

        public boolean startNodeCrashMonitor()
      • init

        public void init()
                  throws java.lang.Exception
        Description copied from class: Protocol
        Called after instance has been created (null constructor) and before protocol is started. Properties are already set. Other protocols are not yet connected and events cannot yet be sent.
        Overrides:
        init in class Protocol
        Throws:
        java.lang.Exception - Thrown if protocol cannot be initialized successfully. This will cause the ProtocolStack to fail, so the channel constructor will throw an exception
      • start

        public void start()
                   throws java.lang.Exception
        Description copied from class: Protocol
        This method is called on a JChannel.connect(String). Starts work. Protocols are connected and queues are ready to receive events. Will be called from bottom to top. This call will replace the START and START_OK events.
        Overrides:
        start in class Protocol
        Throws:
        java.lang.Exception - Thrown if protocol cannot be started successfully. This will cause the ProtocolStack to fail, so JChannel.connect(String) will throw an exception
      • stop

        public void stop()
        Description copied from class: Protocol
        This method is called on a JChannel.disconnect(). Stops work (e.g. by closing multicast socket). Will be called from top to bottom. This means that at the time of the method invocation the neighbor protocol below is still working. This method will replace the STOP, STOP_OK, CLEANUP and CLEANUP_OK events. The ProtocolStack guarantees that when this method is called all messages in the down queue will have been flushed
        Overrides:
        stop in class Protocol
      • up

        public java.lang.Object up​(Event evt)
        Description copied from class: Protocol
        An event was received from the protocol below. Usually the current protocol will want to examine the event type and - depending on its type - perform some computation (e.g. removing headers from a MSG event type, or updating the internal membership list when receiving a VIEW_CHANGE event). Finally the event is either a) discarded, or b) an event is sent down the stack using down_prot.down() or c) the event (or another event) is sent up the stack using up_prot.up().
        Overrides:
        up in class Protocol
      • up

        public java.lang.Object up​(Message msg)
        Description copied from class: Protocol
        A single message was received. Protocols may examine the message and do something (e.g. add a header) with it before passing it up.
        Overrides:
        up in class Protocol
      • down

        public java.lang.Object down​(Event evt)
        Description copied from class: Protocol
        An event is to be sent down the stack. A protocol may want to examine its type and perform some action on it, depending on the event's type. If the event is a message MSG, then the protocol may need to add a header to it (or do nothing at all) before sending it down the stack using down_prot.down().
        Overrides:
        down in class Protocol
      • run

        public void run()
        Runs as long as there are 2 members and more. Determines the member to be monitored and fetches its server socket address (if n/a, sends a message to obtain it). The creates a client socket and listens on it until the connection breaks. If it breaks, emits a SUSPECT message. It the connection is closed regularly, nothing happens. In both cases, a new member to be monitored will be chosen and monitoring continues (unless there are fewer than 2 members).
        Specified by:
        run in interface java.lang.Runnable
      • isPingerThreadRunning

        protected boolean isPingerThreadRunning()
      • resetPingableMembers

        protected void resetPingableMembers​(java.util.Collection<Address> new_mbrs)
      • hasPingableMembers

        protected boolean hasPingableMembers()
      • removeFromPingableMembers

        protected boolean removeFromPingableMembers​(Address mbr)
      • printPingableMembers

        protected java.lang.String printPingableMembers()
      • suspect

        protected void suspect​(java.util.Set<Address> suspects)
      • unsuspect

        protected void unsuspect​(Address mbr)
      • handleSocketClose

        protected void handleSocketClose​(java.lang.Exception ex)
      • startPingerThread

        protected boolean startPingerThread()
        Does *not* need to be synchronized on pinger_mutex because the caller (down()) already has the mutex acquired
      • interruptPingerThread

        protected void interruptPingerThread​(boolean sendTerminationSignal)
        Interrupts the pinger thread. The Thread.interrupt() method doesn't seem to work under Linux with JDK 1.3.1 (JDK 1.2.2 had no problems here), therefore we close the socket (setSoLinger has to be set !) if we are running under Linux. This should be tested under Windows. (Solaris 8 and JDK 1.3.1 definitely works).

        Oct 29 2001 (bela): completely removed Thread.interrupt(), but used socket close on all OSs. This makes this code portable and we don't have to check for OSs.

      • stopPingerThread

        protected void stopPingerThread()
      • sendPingTermination

        protected void sendPingTermination()
      • sendPingSignal

        protected void sendPingSignal​(int signal)
      • startServerSocket

        protected void startServerSocket()
                                  throws java.lang.Exception
        Throws:
        java.lang.Exception
      • stopServerSocket

        public void stopServerSocket​(boolean graceful)
      • setupPingSocket

        protected boolean setupPingSocket​(IpAddress dest)
        Creates a socket to dest, and assigns it to ping_sock. Also assigns ping_input
      • teardownPingSocket

        protected void teardownPingSocket()
      • getCacheFromCoordinator

        protected void getCacheFromCoordinator()
        Determines coordinator C. If C is null and we are the first member, return. Else loop: send GET_CACHE message to coordinator and wait for GET_CACHE_RSP response. Loop until valid response has been received.
      • broadcastSuspectMessage

        protected void broadcastSuspectMessage​(Address suspected_mbr)
        Sends a SUSPECT message to all group members. Only the coordinator (or the next member in line if the coord itself is suspected) will react to this message by installing a new view. To overcome the unreliability of the SUSPECT message (it may be lost because we are not above any retransmission layer), the following scheme is used: after sending the SUSPECT message, it is also added to the broadcast task, which will periodically re-send the SUSPECT until a view is received in which the suspected process is not a member anymore. The reason is that - at one point - either the coordinator or another participant taking over for a crashed coordinator, will react to the SUSPECT message and issue a new view, at which point the broadcast task stops.
      • broadcastUnuspectMessage

        protected void broadcastUnuspectMessage​(Address mbr)
      • sendIHaveSockMessage

        protected void sendIHaveSockMessage​(Address dst,
                                            Address mbr,
                                            IpAddress addr)
        Sends or broadcasts a I_HAVE_SOCK response. If 'dst' is null, the reponse will be broadcast, otherwise it will be unicast back to the requester
      • fetchPingAddress

        protected IpAddress fetchPingAddress​(Address mbr)
        Attempts to obtain the ping_addr first from the cache, then by unicasting q request to mbr, then by multicasting a request to all members.
      • determinePingDest

        protected Address determinePingDest()
      • unmarshal

        protected java.util.Map<Address,​IpAddress> unmarshal​(byte[] buffer,
                                                                   int offset,
                                                                   int length)
      • determineCoordinator

        protected Address determineCoordinator()
      • signalToString

        protected static java.lang.String signalToString​(int signal)