Tuesday, February 13, 2018

Analyze broken pipe error in Hive client via tcpdump


I recently ran into a strange issue: a Hive job for one specific source data pipeline would randomly fail with a "Broken pipe" exception. The Metastore logs showed no exception explaining why the connection was broken, and system resources did not appear to be the problem either.

To investigate, I took TCP dumps on both the client node and the Metastore node:

Client node:     tcpdump -i <any or eth0> host <HS meta IP> and port <HS meta Port> -w client.pcap
Metastore node:  tcpdump -i <any or eth0> host <Client IP> -w meta.pcap

Opening the pcap files in Wireshark showed that the connection had been terminated gracefully, and that a new connection was created only about 7 minutes later.




The first three packets show the client sending a FIN to terminate the session with the server on port 11000, the server agreeing to close the connection gracefully, and the client then acknowledging the close.
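The same orderly close can be observed from the application side with a minimal Python sketch on localhost. This is an illustration only, not the Hive client: the function name `graceful_close_demo` and the payload are made up. A server (host B) sends data and a client (host A) closes first. The kernels generate the actual FIN and ACK segments. Each application only sees `recv()` return `b""` once the other side's FIN has arrived.

```python
import socket
import threading


def graceful_close_demo():
    """Host B (server) sends data; host A (client) reads it and closes first.

    Returns (data A received, what B's recv() returned after A closed,
    what A's recv() returned after B closed).
    """
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = listener.getsockname()[1]
    b_saw = []

    def host_b():
        conn, _ = listener.accept()
        with conn:
            conn.sendall(b"metadata reply")  # B -> A data packet
            b_saw.append(conn.recv(1024))    # returns b"" once A's FIN arrives
        # leaving the with-block closes conn, so B's kernel sends its own FIN

    thread = threading.Thread(target=host_b)
    thread.start()
    with socket.create_connection(("127.0.0.1", port)) as a:
        data = a.recv(1024)          # A reads B's data (A's kernel ACKs it)
        a.shutdown(socket.SHUT_WR)   # A's kernel sends FIN: A is done sending
        a_saw = a.recv(1024)         # returns b"" once B's FIN arrives
    thread.join()
    listener.close()
    return data, b_saw[0], a_saw
```

Both `recv()` calls made after a close return `b""`. This is how an application sees the FIN/ACK exchange captured above.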


  • Host B (server) sends a data packet to host A (client),
  • and host A then wants to close the connection.
  • Host A (depending on timing) can respond with [FIN,ACK], indicating that it received the data packet and wants to close the session.
  • Host B should then respond with a [FIN,ACK], indicating that it received the termination request (the ACK part) and that it too will close the connection (the FIN part).
  • Host A finally sends an [ACK] for host B's FIN, which completes the close.

  • Note:

    [ACK] Acknowledges that the previously sent data was received.
    [FIN] Sent by a host when it wants to terminate the connection; TCP requires each endpoint to send its own termination request (FIN).

    [PSH] Marks the packet as a PUSH packet. Normally the receiver may hold incoming data in a buffer for a while before passing it to the application. PSH tells the receiver to hand the data to the application immediately.

    [RST] Resets the connection. One example is a host sending RST in response to a packet received for a closed socket.

    [SYN] Starts the connection and synchronizes the sequence numbers. Only the first packet from each end has this flag set.
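With these flags in mind, a minimal pure-Python sketch can list only the connection-control packets (SYN, FIN, RST) from client.pcap without opening Wireshark. It makes several assumptions. It expects the classic pcap format that `tcpdump -w` writes, with microsecond timestamps, and handles IPv4 only. It also expects the link type tcpdump uses for `-i eth0` (Ethernet) or for `-i any` (Linux cooked capture v1, or v2 in newer versions). pcapng files, VLAN tags and IPv6 are not handled. The names `control_packets` and `flag_names` are hypothetical.

```python
import socket
import struct

# TCP flag bits in byte 13 of the TCP header.
TCP_FLAGS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"), (0x08, "PSH"), (0x10, "ACK")]

# pcap link type -> (link-layer header length, offset of its 2-byte protocol field).
LINK_LAYERS = {
    1: (14, 12),    # Ethernet (tcpdump -i eth0)
    113: (16, 14),  # Linux cooked capture v1 (tcpdump -i any)
    276: (20, 0),   # Linux cooked capture v2 (tcpdump -i any, newer versions)
}


def flag_names(flags: int) -> str:
    """Turn the TCP flags byte into a label such as "FIN,ACK"."""
    return ",".join(name for bit, name in TCP_FLAGS if flags & bit)


def control_packets(raw: bytes):
    """Yield (epoch seconds, src, dst, flags) for IPv4/TCP packets with SYN, FIN or RST set."""
    if raw[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif raw[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", raw[20:24])
    if linktype not in LINK_LAYERS:
        raise ValueError(f"unsupported link type {linktype}")
    link_len, proto_off = LINK_LAYERS[linktype]

    pos = 24  # skip the 24-byte global header
    while pos + 16 <= len(raw):
        ts_sec, ts_usec, incl_len, _ = struct.unpack(endian + "IIII", raw[pos:pos + 16])
        frame = raw[pos + 16:pos + 16 + incl_len]
        pos += 16 + incl_len
        if len(frame) < link_len + 20:
            continue  # too short to hold an IPv4 header
        (proto,) = struct.unpack("!H", frame[proto_off:proto_off + 2])
        ip = frame[link_len:]
        if proto != 0x0800 or ip[9] != 6:
            continue  # keep IPv4 packets carrying TCP only
        tcp = ip[(ip[0] & 0x0F) * 4:]  # skip the IPv4 header (IHL is in 32-bit words)
        if len(tcp) < 14:
            continue
        sport, dport = struct.unpack("!HH", tcp[:4])
        if tcp[13] & 0x07:  # FIN, SYN or RST
            yield (ts_sec + ts_usec / 1e6,
                   f"{socket.inet_ntoa(ip[12:16])}:{sport}",
                   f"{socket.inet_ntoa(ip[16:20])}:{dport}",
                   flag_names(tcp[13]))
```

For example, iterating over `control_packets(open("client.pcap", "rb").read())` prints one line per SYN, FIN or RST. A long gap between a FIN and the next SYN then stands out directly.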

    Interestingly, the client logs show the broken pipe exception at 15:04. This means the client code tried to get information from the Metastore after the connection between the client and the Metastore had already been closed.
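A broken pipe is the operating system's EPIPE error: a process wrote to a TCP connection that had already been torn down. A minimal Python sketch can reproduce it on localhost under simplified assumptions. Here the server side closes first, purely to make the error easy to trigger; this does not replay the captured session. The first write after the close is answered with an RST, and a later write then fails. Python ignores SIGPIPE by default, so the failure surfaces as `BrokenPipeError`, or `ConnectionResetError` depending on timing and platform. The function name is hypothetical.

```python
import socket
import time


def write_after_peer_closed():
    """Write repeatedly on a TCP connection whose peer has closed; return the OSError raised."""
    listener = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()
    conn.close()       # the peer closes: its kernel sends FIN to the client
    listener.close()
    time.sleep(0.1)    # let the FIN arrive; the client socket is now in CLOSE_WAIT
    try:
        for _ in range(100):
            # The first write still succeeds locally; the closed peer answers it with RST,
            # and a following write then fails.
            client.sendall(b"request")
            time.sleep(0.01)
    except (BrokenPipeError, ConnectionResetError) as exc:
        return exc
    finally:
        client.close()
    return None  # no error observed within 100 writes
```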

    FIX: The client code was updated to check for an active connection before making any call to the Metastore.
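The actual client change is not shown here. The following is a hedged Python sketch of the idea. `peer_has_closed` peeks at the socket without blocking and without consuming data. A return of `b""` means the peer has sent FIN, and `BlockingIOError` means the connection is still open with nothing to read. `ReconnectingClient` is a hypothetical wrapper, not a Hive or Thrift API, that reopens the connection before a call when the check fails.

```python
import socket


def peer_has_closed(sock: socket.socket) -> bool:
    """Return True if the peer has closed the connection (FIN or RST), without consuming data.

    Note: if unread data is still queued ahead of the FIN, this returns False.
    """
    timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        data = sock.recv(1, socket.MSG_PEEK)  # peek: the byte stays in the receive buffer
    except BlockingIOError:
        return False  # nothing to read yet: the connection is still open
    except ConnectionResetError:
        return True   # the peer sent RST
    finally:
        sock.settimeout(timeout)  # restore the original blocking mode/timeout
    return data == b""  # b"" is EOF: the peer sent FIN


class ReconnectingClient:
    """Hypothetical wrapper (not a Hive/Thrift API): verify the connection before each call."""

    def __init__(self, host: str, port: int):
        self.address = (host, port)
        self.sock = None

    def _ensure_connected(self) -> socket.socket:
        if self.sock is not None and peer_has_closed(self.sock):
            self.sock.close()
            self.sock = None
        if self.sock is None:
            self.sock = socket.create_connection(self.address)
        return self.sock

    def call(self, request: bytes) -> bytes:
        sock = self._ensure_connected()
        sock.sendall(request)
        return sock.recv(4096)  # simplified: a real protocol frames its messages
```

The check is inherently racy, because the peer can close right after it returns. Retrying a failed call once on a fresh connection is therefore still worthwhile.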


    Note: For reference, the new connection opened at "2018-02-08 15:07:57" in the earlier screenshot begins with the TCP 3-way handshake:

    (#1) The client sends a packet with the SYN flag set and a random number (R1) in the sequence number field.
    (#2) The server sends a packet with the SYN and ACK flags set. Its sequence number field contains a new random number (R2), and its acknowledgement number field contains the client's sequence number + 1 (R1+1), which is the next sequence number the server expects from the client.

    (#3) The client acknowledges the server's SYN by sending a packet with the ACK flag set and R2+1 in the acknowledgement number field, which is the next sequence number the client expects from the server.
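The sequence and acknowledgement arithmetic of the three segments can be written out as a small Python function (a sketch; `three_way_handshake` is a made-up name). Sequence numbers are 32-bit and wrap modulo 2^32. The SYN itself consumes one sequence number, which is why the client's third segment carries sequence number R1+1. Wireshark displays relative sequence numbers by default, so in a capture R1 and R2 usually appear as 0.

```python
import random

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(r1=None, r2=None):
    """Return (flags, seq, ack) for the three handshake segments.

    r1 and r2 are the client's and server's initial sequence numbers; random if omitted.
    """
    r1 = random.randrange(SEQ_SPACE) if r1 is None else r1
    r2 = random.randrange(SEQ_SPACE) if r2 is None else r2
    return [
        ("SYN", r1, None),                                     # (#1) client -> server, ACK flag not set
        ("SYN,ACK", r2, (r1 + 1) % SEQ_SPACE),                 # (#2) server -> client
        ("ACK", (r1 + 1) % SEQ_SPACE, (r2 + 1) % SEQ_SPACE),   # (#3) client -> server
    ]
```

For example, `three_way_handshake(100, 5000)` gives `[("SYN", 100, None), ("SYN,ACK", 5000, 101), ("ACK", 101, 5001)]`.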


