Experimental Physics and
Industrial Control System

zhang hao <[email protected]> · Fri, 17 Dec 1999 15:19:15 +0800 (CST)

---------- Forwarded message ----------
Date: Tue, 14 Dec 1999 10:20:55 -0700
From: Jeff Hill <[email protected]>
To: Garrett D. Rinehart <[email protected]>, [email protected]
Subject: RE: CA monitors...

> 
> Then I put everything back on the thinwire.
> 
> iocRCS_2> ifShow
> ei (unit number 0):
>      Flags: (0x63) UP BROADCAST ARP RUNNING 
>      Internet address: 164.54.250.4
>      Broadcast address: 164.54.251.255
>      Netmask 0xffff0000 Subnetmask 0xfffffe00
>      Ethernet address is 08:00:3e:29:5e:af
>      Metric is 0
>      Maximum Transfer Unit size is 1500
>      81109763 packets received; 215219682 packets sent
>      19888 input errors; 88009 output errors
>      4431000 collisions <========================================

The root cause of your problem appears to be the large number of
collisions on your thinwire Ethernet network. This could be caused by
wiring problems, or by a high network load. With Ethernet, the back off
delay grows exponentially with each successive collision. So, if this
network interface card collides several times while trying to transmit
the same message, the delays can be quite large.
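
The exponential back off described above can be sketched as follows. This
is only a minimal sketch, assuming the standard truncated binary
exponential backoff parameters for 10 Mbps Ethernet (51.2 microsecond slot
time, exponent capped at 10, frame abandoned after 16 attempts). The
function names are hypothetical and are not taken from any driver.

```python
import random

SLOT_TIME_US = 51.2   # assumed: 512 bit times at 10 Mbps
BACKOFF_LIMIT = 10    # assumed: exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16    # assumed: frame is abandoned after 16 attempts


def max_backoff_slots(collision_count):
    """Largest number of slot times a station may wait after the
    given number of consecutive collisions on the same frame."""
    return 2 ** min(collision_count, BACKOFF_LIMIT) - 1


def backoff_delay_us(collision_count, rng=random):
    """Pick a random delay (in microseconds) for this collision."""
    slots = rng.randint(0, max_backoff_slots(collision_count))
    return slots * SLOT_TIME_US


def worst_case_delay_us():
    """Sum of the longest possible waits before the frame is dropped.
    A wait follows each of collisions 1..15; the 16th failure aborts."""
    slots = sum(max_backoff_slots(n) for n in range(1, ATTEMPT_LIMIT))
    return slots * SLOT_TIME_US


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 10, 15):
        print(n, "collisions -> up to", max_backoff_slots(n), "slots")
    print("worst case total wait (ms):", worst_case_delay_us() / 1000.0)
```

The point of the sketch is only that the permitted wait roughly doubles
with every repeated collision on the same frame, so a few collisions in a
row already cost far more than a single one.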

I think this URL has a nice explanation of Ethernet wiring issues:

http://wwwhost.ots.utexas.edu/ethernet/ethernet-home.html

> 
> > Was the sniffer attached to the thinwire segment of the network, or
> > was it attached to a different tap off of the switch? Due to the
> > nature of switches, sniffers will generally not see traffic
> > (broadcasts are an exception) on other parts of the switched network.
>  
> Yes, it was attached to the thinwire segment all the iocs were sharing.
> 

It should have seen the collisions if your thinwire network is based
only on hubs and repeaters (not switches). Perhaps your quiescent load is 
low, but there are periodic bursts of saturation load which are synchronized
by your timing system. Our sniffer is able to report when stations are
experiencing an excessive number of collision retries, and it was 
complaining about this until we placed our new Solaris OPI hosts on
the other side of a switch from our original Cabletron hub-based Ethernet 
network.
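
For a rough sense of scale, the cumulative counters in the ifShow output
quoted earlier can be turned into ratios. This is a minimal sketch using
only the numbers from that output; the variable and function names are
hypothetical. Keep in mind that these counters are totals since boot, so
the ratio is a long-term average and says nothing about how bursty the
collisions were.

```python
# Counters copied from the ifShow output for interface ei0.
PACKETS_SENT = 215219682
OUTPUT_ERRORS = 88009
COLLISIONS = 4431000


def ratio(count, total):
    """Return count/total as a fraction, guarding against a zero total."""
    return count / total if total else 0.0


collisions_per_sent = ratio(COLLISIONS, PACKETS_SENT)
output_errors_per_sent = ratio(OUTPUT_ERRORS, PACKETS_SENT)

if __name__ == "__main__":
    print("collisions per packet sent: %.4f" % collisions_per_sent)
    print("output errors per packet sent: %.6f" % output_errors_per_sent)
```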

Note that the typical maximum usable load on shared (non-switched) 10 Mbps
Ethernet is about 20-30% of the nominal 10 Mbps, that is, roughly 2-3 Mbps.

> Intermittently, "dbel" showed channels falling behind and "tt" 
> had much different info:
> iocRCS_2> dbel "Qacc"
>  VAL VALUE LOG ALARM
>  VAL VALUE LOG ALARM
> List of events (monitors).
> task 1c95a34 select 5 pfield 1f6eaf4 behind by 0
> task 1db54d8 select 5 pfield 1f6eaf4 behind by 8

This indicates that the CA server has monitor updates in its event queue 
waiting until the TCP virtual circuit will accept additional bytes.

> value = 0 = 0x0
> iocRCS_2> tt 0x1db54d8
>  67dde _vxTaskEntry   +10 : _event_task (1e22b74, 0, 0, 0, 0, 0, 
> 0, 0, 0, 0)
> 1ec6450 _event_task    +a2 : 1ec65c6 (1d86fb0)
> 1ec6644 _event_task    +296: 1ed2818 (1d9b918, 1da802c, 0)
> 1ed29a8 _write_notify_reply+c02: _cas_send_msg (1df4cd4, 0)
> 1ecfb32 _cas_send_msg  +aa : _sendto (1e, 1df4d24, 140, 0, 1df4d14, 10)
>  39f10 _sendto        +42 : _bsdSendto ([1e, 1df4d24, 140, 0, 1df4d14])
>  3a7f6 _bsdSendto     +9e : _sosend ([1d19a1c, 1d1a980, 1df4d24, 
> 8, 1db53a0])
>  74110 _sosend        +1b2: _sbwait (1d19aac)
>  75404 _sbwait        +10 : _semQPut ([1d19acc, 1db5370, 74116, 
> 1d19aac, 0])
> value = 0 = 0x0
> iocRCS_2>
> 
> It should be noted that I NEVER saw this kind of response with 
> the other hookup.

This shows that the CA server's event task is blocking until the 
TCP circuit will accept additional bytes.

> 
> "inetstatShow" also had one task listed that would run up to 
> 16384 in the sendQ
> and seem to hold it for quite a while (relatively) before dumping 
> it. Then the 
> count would shoot right back up there again.

This indicates that 16384 bytes of mbufs in the IP kernel are holding
CA protocol messages which have been delivered to a TCP "socket", but
which the thinwire Ethernet isn't yet ready to accept.
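
The same effect can be reproduced on a workstation. This is a minimal
sketch, assuming a local socket pair stands in for the TCP circuit and a
peer that never reads stands in for the congested wire; the function name
is hypothetical, and the number of bytes that fit is OS-specific (it will
not be the 16384 seen on the IOC). Once the send queue is full, a
non-blocking send fails, and a blocking send would wait just as the CA
event task does in the trace above.

```python
import socket


def fill_send_queue(chunk_size=1024):
    """Write to a socket whose peer never reads, and return how many
    bytes the kernel accepted before the send queue was full."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)  # fail instead of blocking when full
    queued = 0
    try:
        while True:
            try:
                queued += writer.send(b"x" * chunk_size)
            except BlockingIOError:
                # Queue is full; a blocking socket would wait here.
                break
    finally:
        writer.close()
        reader.close()
    return queued


if __name__ == "__main__":
    print("bytes queued before the sender would block:", fill_send_queue())
```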

> 
> The output from "casr" indicated several seconds "since last 
> send" to the display ioc and the OPI workstation. Neither should 
> have been over 1/30th of a second as
> that is the data sampling rate and it changes EVERY time.

This indicates that the CA event task is blocking somewhere (in
this case we see that it is blocking waiting for the TCP virtual
circuit to accept additional bytes).

> 
> These are typical results from mbufShow, ifShow, and tcpstatShow:
> iocRCS_2> mbufShow
> type        number
> ---------   ------
> FREE    :    867
> DATA    :    114
> HEADER  :     19
> SOCKET  :      0
> PCB     :     55
> RTABLE  :      2
> HTABLE  :      0
> ATABLE  :      0
> SONAME  :      1
> ZOMBIE  :      0
> SOOPTS  :      0
> FTABLE  :      0
> RIGHTS  :      0
> IFADDR  :      2
> TOTAL   :    1060
> number of mbufs: 1060
> number of clusters: 39
> number of interface pages: 0
> number of free clusters: 32
> number of times failed to find space: 0
> number of times waited for space: 0
> number of times drained protocols for space: 0
> value = 47 = 0x2f = '/'

The number of mbufs in the system has grown, but there
appears to be a reasonable number of free mbufs. Perhaps
the mbuf count has grown because at certain points in time
there is quite a bit of data waiting in the IP kernel until 
it can be placed on the wire.

Jeff
