Experimental Physics and Industrial Control System


Subject: PV CAS Gateway Problem
From: "Allison, Stephanie" <[email protected]>
To: "'[email protected]'" <[email protected]>
Date: Thu, 17 Jan 2002 09:44:06 -0800
Greetings to PV gateway users,

Since mid-November, one of our gateways has been hanging intermittently.
In the last few days, I've been able to capture some more detail about it,
which I summarize here in case somebody else has already found and fixed
the problem and would like to share.

We run gateway version 1.2.0 and EPICS 3.13.2 on Solaris 8.  We've
been running 1.2.0 for over a year with no problems.  We use the default
values of the timeouts: connection (1 sec), inactive (2 min), and dead (2 hrs).
Though we run 4 gateway processes, each with a different interface address,
the problem has been seen only in the one that's most heavily used.
I would normally expect fewer than 10 clients (mostly running DM or DM2K)
and fewer than 10 servers, though eight of the servers (old niCpu030 IOCs) are
quite loaded.  In early November, our Solaris host was upgraded to a much
faster machine.  This may be a coincidence.  We may also be running more
clients than ever.

A few days ago, I noticed these statistics on the gateway (I haven't
added these PVs to the archiver yet, shame on me!):

  Active PVs = 7621606  !!!!!
  Alive  PVs = 1051
  Total  VCs = 418      (does VC mean virtual channel?)
  Total  PVs = 1051
  Client Event Rate = 39
  Client Post  Rate = 33
  Exist  Test  Rate = 0
  Loop         Rate = 909
  Server Event Rate = 59
  Server Post  Rate = 59

I noticed these errors flooding the gateway log file:

      PV not found - Server unable to create a new PV

The only place that logs such an error is casStrmClient.cc.
It appears to indicate either a memory allocation problem or pointer
corruption.  I suspect the problem is a memory leak in the gateway code.
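
To put a number on how badly the log is being flooded, a minimal Python sketch along these lines could count the matching lines. The message text is copied from the log excerpt above; the one-message-per-line layout, the function name, and the sample lines are assumptions for illustration only, not the real gateway log format.

```python
# Message text copied from the log excerpt; everything else is hypothetical.
ERROR_TEXT = "PV not found - Server unable to create a new PV"


def count_error_lines(lines, needle=ERROR_TEXT):
    """Return how many of the given lines contain the error text."""
    return sum(1 for line in lines if needle in line)


# Made-up sample standing in for lines read from the gateway log file.
sample = [
    "      PV not found - Server unable to create a new PV",
    "some unrelated message",
    "      PV not found - Server unable to create a new PV",
]

print(count_error_lines(sample))
```

For a real log, the same function could be fed an open file object, since iterating over a file yields its lines.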

Restarting the gateway (using gateway.restart) fixes the problem, and
the gateway then runs for days or weeks with no problems.

Normally, #active PVs = #total VCs and #alive PVs = #total PVs.  So
the abnormally large number of active PVs is a clue.  I've tried (!) 
to follow the creation and deletion of PVs and VCs in the gateway code.  
I will continue studying it.  I will also recompile and run the gateway 
with debug lines on.  
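
The two equalities above can be checked mechanically against the statistics listing. The following is a rough Python sketch, assuming the counters are available as plain "name = value" text as in the listing; the parsing approach and function names are illustrative and not part of the gateway.

```python
import re

# Counter values copied from the statistics listing in this message.
STATS_TEXT = """
  Active PVs = 7621606
  Alive  PVs = 1051
  Total  VCs = 418
  Total  PVs = 1051
"""


def parse_stats(text):
    """Parse 'name = integer' lines into a dict, collapsing repeated spaces in names."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s*=\s*(\d+)", line)
        if m:
            name = " ".join(m.group(1).split())
            stats[name] = int(m.group(2))
    return stats


def check_invariants(stats):
    """Return a list of violated equalities (empty if both hold)."""
    problems = []
    for left, right in (("Active PVs", "Total VCs"), ("Alive PVs", "Total PVs")):
        if stats[left] != stats[right]:
            problems.append(f"{left} ({stats[left]}) != {right} ({stats[right]})")
    return problems


for problem in check_invariants(parse_stats(STATS_TEXT)):
    print(problem)
```

With the numbers above, only the first equality fails: the alive and total PV counts still agree, and the active PV count is the one that is off.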

Any comments would be appreciated.  Thanks very much,

Stephanie Allison  [email protected]


Replies:
Re: PV CAS Gateway Problem Ralph . Lange