Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Channel Access PV Gateway Server Problems (more investigation)
From: Ralph Lange <Ralph.Lange@mail.bessy.de>
To: "Ernest L. Williams Jr." <ernesto@ornl.gov>
Cc: kvs@aps.anl.gov, Carl Lionberger <lionbergerca@ornl.gov>, "David H. Thompson" <thompsondh@ornl.gov>, ernesto@sns.gov, nypaverdj@ornl.gov, johill@lanl.gov, evans@aps.anl.gov, hicksse@ornl.gov, Greg Lawson <lawsongs@ornl.gov>, kasemir@lanl.gov, mrk@aps.anl.gov, Jeff Hill <johill@lanl.gov>, Core-Talk Archive <core-talk@aps.anl.gov>
Date: Fri, 24 Jan 2003 12:12:08 +0100

Hi Ernest,

Please include Jeff on the list, as this seems to touch his field of
expertise. And core-talk, so we have this thread archived.

  > ---------------------------------------------------------------------
  > Our PV gateway is running on a "stand-alone" Linux Box and has 6 network
  > interfaces:

  > eth4: 160.91.228.17 (External Interface configured as CA Server)

  > eth0: 172.31.72.152 (Internal interface configured as CA client)
  > eth1: 172.31.124.102 (Internal interface configured as CA client)
  > eth2: 172.31.80.110 (Internal interface configured as CA client)

  > eth5: 172.31.96.104 (Internal interface to isolate server tape backups)
  > -------------------------------------------------------------------------
  > The only network interface that is configured to have a GATEWAY is eth4.
  > The gateway for eth4 is 160.91.228.1



  > So let's sit on cf-ics-srv1 (i.e. 172.31.80.100)
  > Next tell Channel Access Client where to find/search for PVs
  > -----------------------------------------------------------
  > export EPICS_CA_ADDR_LIST=160.91.228.17
  > export EPICS_CA_AUTO_ADDR_LIST=NO
  > ----------------------------------------------------------
  > Now, here's what happens when we ask for a channel:
  > [williams@cf-srv1 williams]$ caget RFQ_Vac:CVG_5:P
  > RFQ_Vac:CVG_5:P               766.667
  > [williams@cf-srv1 williams]$ caget RFQ_Vac:CVG_5:P
  > RFQ_Vac:CVG_5:P               763.333

If correctly configured, 160.91.228.17 should _not_ be visible from this
machine. There seems to be a route from cf-ics-srv1 that reaches the
Gateway machine on its eth4 interface.

Try pinging the Gateway machine from cf-ics-srv1. If you can reach
160.91.228.17, you have an IP configuration problem here. Either
cf-ics-srv1 itself or the router its default route entry points to is
allowing connections that it shouldn't.

What is the default gateway entry on cf-ics-srv1?

You are right: if you can reach 160.91.228.17 from your 172.31.80.x
network, your so-called "internal" network obviously isn't internal.

If the Gateway server is not pingable, there might be a configuration
problem on the Gateway server so that at some level it misses the
distinction between the network interfaces.

NB: A default gateway routing entry is always per host, not per
interface. So the statement "The gateway for eth4 is 160.91.228.1" does
not make sense.

Bottom line: IMHO you have an IP routing configuration problem, not a
Gateway configuration problem.
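
As a rough sketch of the routing argument: a host reaches an address
directly only if that address lies in one of its attached subnets.
Everything else goes through a router, normally the one named in the
host's default route. The /22 netmask for cf-ics-srv1 below is an
assumption (it is not stated in this mail, though it would match the
broadcast address 172.31.83.255 seen in the tcpdump output quoted
further down), and is_on_link is just an illustrative helper name.

```python
import ipaddress

# Assumption: cf-ics-srv1 has one interface with a /22 netmask.
# The real mask is not given in the mail.
local_iface = ipaddress.ip_interface("172.31.80.100/22")

def is_on_link(dest, iface=local_iface):
    """True if dest lies in the subnet directly attached to iface."""
    return ipaddress.ip_address(dest) in iface.network

for dest in ("172.31.80.110", "160.91.228.17"):
    if is_on_link(dest):
        how = "directly attached"
    else:
        how = "only reachable via a router (default route)"
    print(f"{dest}: {how}")
```

Under that assumption 160.91.228.17 is not on-link, so any reply from
it means some router is forwarding the packets.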

  > Next test, 
  > Once again sit on 172.31.80.100
  > This time we will not use unicast but broadcast on the subnet
  > that we sit on:
  > ----------------------------------------------------------
  > export EPICS_CA_ADDR_LIST=""
  > export EPICS_CA_AUTO_ADDR_LIST=YES

  > Now, here's what happens when we ask for a channel:
  > [williams@cf-srv1 williams]$ caget RFQ_Vac:CVG_5:P
  > CA.Client.Diagnostic..............................................
  >     Message: "Ambiguous channel host (multiple IOC's have a channel by
  > that name)"
  >     Severity: "Warning" Context: "Channel: RFQ_Vac:CVG_5:P Accepted:
  > ics-srv-cagate1-cf.ics.sns.gov Rejected:
  > ics-srv-cagate1-bkup.ics.sns.gov:5064 "
  >     Source File: ../service.c Line Number: 905
  > ..................................................................
  > RFQ_Vac:CVG_5:P               763.333
  > --------------------------------------------------------------------

  > Note: that ics-srv-cagate1-cf.ics.sns.gov  =  172.31.80.110
  > Note: that ics-srv-cagate1-bkup.ics.sns.gov  = 172.31.96.104

Ehhm ... where is the original server for this channel?
So, are both of the machines that are answering wrong, or only one?

Why can two hosts on different networks answer the same broadcast? What
are your subnet masks on these nets?

I can only think of two reasons for getting answers to one broadcast
from two networks.
 - You have configured a router to forward broadcasts in one subnet to a
   list of IPs in another subnet (-> redirected broadcast; some Cisco
   routers allow that technique).
 - You have a misconfiguration in default gateway and/or subnet entries
   on one or more of the machines.
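
The subnet question can be checked with a little arithmetic. The sketch
below takes the client address and the broadcast address that appear in
the tcpdump output quoted further down (172.31.80.100 sending to
172.31.83.255), works out which prefix length fits both, and then tests
whether the two answering addresses fall inside that subnet. It only
infers the client's mask from one captured packet; the masks actually
configured on the other machines are not known here.

```python
import ipaddress

host = ipaddress.ip_address("172.31.80.100")        # client, from the mail
seen_bcast = ipaddress.ip_address("172.31.83.255")  # broadcast target in the tcpdump

def prefixes_matching(host, bcast):
    """Prefix lengths for which host's subnet has bcast as broadcast address."""
    hits = []
    for plen in range(8, 31):
        net = ipaddress.ip_network(f"{host}/{plen}", strict=False)
        if net.broadcast_address == bcast:
            hits.append(plen)
    return hits

plens = prefixes_matching(host, seen_bcast)
print("candidate prefix lengths:", plens)

net = ipaddress.ip_network(f"{host}/{plens[0]}", strict=False)
answering = (("cagate1-cf", "172.31.80.110"), ("cagate1-bkup", "172.31.96.104"))
for name, addr in answering:
    print(name, addr, "in", net, "->", ipaddress.ip_address(addr) in net)
```

If the inferred subnet is right, 172.31.96.104 lies outside the
broadcast domain of the client, which is why an answer carrying that
address needs one of the two explanations above.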

  > Let's perform a tcpdump on the PV gateway server to look at the traffic
  > when a request is made for a PV from a CA client with an internal
  > address.
  > ----------------------------------------------------------------------------
  > [root@ics-srv-cagate1 root]# tcpdump -i eth2 -lnep host 172.31.80.100
  > tcpdump: listening on eth2
  > 21:07:05.628132 0:6:5b:ec:8c:50 Broadcast ip 74: 172.31.80.100.32863 >
  > 172.31.83.255.5064:  udp 32 (DF)
  > 21:07:05.629018 0:e0:b6:4:71:8c Broadcast arp 42: arp who-has
  > 172.31.80.100 tell 172.31.80.110
  > 21:07:05.629188 0:6:5b:ec:8c:50 0:e0:b6:4:71:8c arp 60: arp reply
  > 172.31.80.100 is-at 0:6:5b:ec:8c:50
  > 21:07:05.629198 0:e0:b6:4:71:8c 0:6:5b:ec:8c:50 ip 66:
  > 172.31.80.110.5064 > 172.31.80.100.32863:  udp 24 (DF)
  > 21:07:05.631214 0:6:5b:ec:8c:50 0:e0:b6:4:71:8c ip 74:
  > 172.31.80.100.33319 > 172.31.80.110.5064: S 177884468:177884468(0) win
  > 5840 <mss 1460,sackOK,timestamp 771949911 0,nop,wscale 0> (DF)
  > 21:07:05.631264 0:e0:b6:4:71:8c 0:6:5b:ec:8c:50 ip 74:
  > 172.31.80.110.5064 > 172.31.80.100.33319: S 192559080:192559080(0) ack
  > 177884469 win 5792 <mss 1460,sackOK,timestamp 331098
  > 771949911,nop,wscale 0> (DF)
  > 21:07:05.631446 0:6:5b:ec:8c:50 0:e0:b6:4:71:8c ip 66:
  > 172.31.80.100.33319 > 172.31.80.110.5064: . ack 1 win 5840
  > <nop,nop,timestamp 771949911 331098> (DF)
  > 21:07:05.632229 0:6:5b:ec:8c:50 0:e0:b6:4:71:8c ip 170:
  > 172.31.80.100.33319 > 172.31.80.110.5064: P 1:105(104) ack 1 win 5840
  > <nop,nop,timestamp 771949911 331098> (DF)
  > 21:07:05.632240 0:e0:b6:4:71:8c 0:6:5b:ec:8c:50 ip 66:
  > 172.31.80.110.5064 > 172.31.80.100.33319: . ack 105 win 5792
  > <nop,nop,timestamp 331098 771949911> (DF)
  > -----------------------------------------------------------------------------

  > This is very interesting, right?  This is not what I want to happen.
  > I don't want traffic destined for 160.91.228.17 routed through
  > 172.31.80.110 (i.e. eth2) or vice versa.

I don't get you here. Where is 160.91.228.17 in the above printout? What
traffic is destined for 160.91.228.17? I thought this was the
misbehaviour?
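
To make that check mechanical, here is a small sketch that pulls the IP
endpoints out of tcpdump lines and lists the addresses involved. The
sample holds only a few lines copied from the capture quoted above, and
the parsing assumes the "a.b.c.d.port > a.b.c.d.port:" layout shown
there; the names ENDPOINT and endpoints are made up for this example.

```python
import re

# A few lines copied from the quoted tcpdump output (not the full capture).
capture = """\
172.31.80.100.32863 > 172.31.83.255.5064:  udp 32 (DF)
172.31.80.110.5064 > 172.31.80.100.32863:  udp 24 (DF)
172.31.80.100.33319 > 172.31.80.110.5064: S 177884468:177884468(0) win 5840
172.31.80.110.5064 > 172.31.80.100.33319: . ack 105 win 5792
"""

# tcpdump prints each endpoint as a.b.c.d.port
ENDPOINT = re.compile(r"(\d+\.\d+\.\d+\.\d+)\.(\d+)")

def endpoints(text):
    """Return the set of (ip, port) pairs that appear as source or destination."""
    found = set()
    for line in text.splitlines():
        src, _, rest = line.partition(" > ")
        dst = rest.split(":", 1)[0]
        for ep in (src, dst):
            m = ENDPOINT.fullmatch(ep.strip())
            if m:
                found.add((m.group(1), int(m.group(2))))
    return found

eps = endpoints(capture)
ips = sorted({ip for ip, _ in eps})
print(ips)
print("160.91.228.17 seen:", "160.91.228.17" in ips)
```

On these lines only 172.31.80.x addresses and the subnet broadcast show
up; 160.91.228.17 is not among them.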

  > Another problem we had was due to the gateway.killer not working
  > properly.  This would result in two gateway servers starting with the
  > second server unnoticed.  The only way that we knew another server was
  > running, resulted from looking at our firewall logs. The second gateway
  > server would choose a high port number since 5064 was consumed by the
  > first server.  The end result was that external clients were actually
  > trying to talk to the second CA server.  The second CA server was being
  > blocked by our firewall.  So, the firewall was doing its job.  After a
  > reboot, the Second server was gone and everything returned to normal. 
  > Ken did you guys experience this problem on your Solaris system?

I don't understand this at all. You are probably misinterpreting things
here.

If a second Gateway starts and detects the port is taken, it prints an
error message and exits. There is no code in CA that would allow it to
take a higher port.

How would external clients even try to connect to a high port number?
There is again no code in CA that allows this. How would the client know
which high port number to connect to? The CA network protocol
doesn't allow this kind of negotiation. Any CA client _always_ connects
to 5064 (or another port if configured accordingly). High port numbers
belong to CA _clients_. I.e. a running Gateway would take port 5064 on
its server side and use a bunch of high port numbers on all its client
side connections.
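
For what "the port is taken" means at the operating-system level, a
minimal sketch with plain sockets: a second attempt to bind a TCP port
that another socket is already listening on is refused. This is not CA
or Gateway code. It uses the loopback address and an OS-chosen free
port instead of 5064 so that it can run anywhere, and it does not model
what a CA server does after the bind fails.

```python
import errno
import socket

def try_bind(port, host="127.0.0.1"):
    """Bind and listen on host:port; return the socket or raise OSError."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
    except OSError:
        s.close()
        raise
    return s

first = try_bind(0)               # port 0: let the OS choose a free port
port = first.getsockname()[1]     # the port the first "server" now owns

try:
    second = try_bind(port)       # second "server" wants the same port
    conflict = False
    second.close()
except OSError as exc:
    conflict = exc.errno == errno.EADDRINUSE

first.close()
print("second bind refused:", conflict)
```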

Cheers,
Ralph
