Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Archiver: Problems with disconnected PVs
From: Gabriel de Souza Fedel <gabriel.fedel@lnls.br>
To: "Shankar, Murali" <mshankar@slac.stanford.edu>, "tech-talk@aps.anl.gov" <tech-talk@aps.anl.gov>
Date: Wed, 6 Sep 2017 08:22:52 -0300


On 05-09-2017 14:33, Shankar, Murali wrote:
When I restart the archiver, the "broken" PVs go back to normal.
I will keep monitoring them.
This is always a good idea. I have a script that uses the getCurrentlyDisconnectedPVs BPL and validates these on a path independent of whatever the archiver is using. That is, if the archiver goes through a gateway, this monitoring script checks the liveness of the PVs directly on the VLAN.

Great idea! I will try something like this too.
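
A monitoring script of the kind described above could be sketched roughly as follows. This is not Murali's script, only a minimal illustration under assumptions: the mgmt URL is the localhost one used elsewhere in this thread, each entry of the getCurrentlyDisconnectedPVs report is assumed to carry a "pvName" key, and the liveness check assumes pyepics is installed and that the script's own Channel Access environment (for example EPICS_CA_ADDR_LIST) points directly at the IOC VLAN rather than at the gateway the archiver uses. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: the mgmt BPL is reachable here; adjust host and port for your site.
MGMT_BPL = "http://localhost:17665/mgmt/bpl"


def pv_names_from_report(report):
    """Extract PV names from the decoded JSON report."""
    # Assumption: each report entry is a JSON object with a "pvName" key.
    return [entry["pvName"] for entry in report]


def fetch_disconnected(base_url=MGMT_BPL):
    """Ask the archiver which PVs it currently considers disconnected."""
    url = base_url + "/getCurrentlyDisconnectedPVs"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return pv_names_from_report(json.load(resp))


def is_alive(pv, timeout_s=5.0):
    """Check a PV over this script's own CA path, independent of the archiver."""
    # Assumption: pyepics is installed; it is imported lazily so the rest of
    # the module still works without it.
    import epics
    # epics.caget returns None when the PV does not answer within the timeout.
    return epics.caget(pv, timeout=timeout_s) is not None


def live_but_disconnected(pv_names, check=is_alive):
    """Return the PVs that answer directly although the archiver lost them."""
    return [pv for pv in pv_names if check(pv)]


def main():
    for pv in live_but_disconnected(fetch_disconnected()):
        print(pv)
```

Calling main() prints the PVs that are alive on the direct path but disconnected in the archiver, which are the ones worth investigating. The check is passed in as a parameter so a different liveness test can be swapped in.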
Please let me know if you find any more details.
Sure, if I discover something new I will post it here.

Regards,
Murali
Thank you again,

Regards

________________________________________
From: Gabriel de Souza Fedel <gabriel.fedel@lnls.br>
Sent: Monday, September 4, 2017 5:28 AM
To: Shankar, Murali; tech-talk@aps.anl.gov
Cc: (mdavidsaver@gmail.com)
Subject: Re: Archiver: Problems with disconnected PVs

Hi,

When I restart the archiver, the "broken" PVs go back to normal. I will keep
monitoring them.

On 01-09-2017 16:43, Shankar, Murali wrote:
Also, which version is this? Apologies if you mentioned it before but I naturally assume you have a reasonably recent version.
Our version is from October 2016.


Regards,
Murali
Regards, and thank you again for the help.



________________________________________
From: Shankar, Murali
Sent: Friday, September 1, 2017 12:23 PM
To: Gabriel de Souza Fedel; tech-talk@aps.anl.gov
Cc: (mdavidsaver@gmail.com)
Subject: Re: Archiver: Problems with disconnected PVs

One thing looks strange: max_array_bytes.
This would probably affect your waveforms mostly...

and a few have problems (they eventually disconnect).
Are all these PVs from one or two IOCs? In this case I would look at the IOC.

Your PV Details page for this PV could have information
There should be lines here for "When did we request CA to make a connection to this PV?" and "Time elapsed since search request (s)". Do these look OK? Since you paused/resumed the PV, this should be the time you resumed the PV.
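
To pull just those two rows out of the PV Details without opening the web page, something like the following might work. It is a rough sketch under assumptions: the URL follows the getPVDetails pattern quoted further down in this thread, and the response is assumed to be a JSON list of objects with "name" and "value" keys. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Row names as quoted in this thread.
ROWS_OF_INTEREST = (
    "When did we request CA to make a connection to this PV?",
    "Time elapsed since search request (s)",
)


def pv_details_url(pv, base_url="http://localhost:17665/mgmt/bpl"):
    """Build the getPVDetails URL for one PV (the PV name is URL-encoded)."""
    return base_url + "/getPVDetails?" + urllib.parse.urlencode({"pv": pv})


def select_rows(details, wanted=ROWS_OF_INTEREST):
    """Keep only the rows of interest, as a name -> value mapping."""
    # Assumption: details is a list of objects with "name" and "value" keys.
    return {row["name"]: row.get("value")
            for row in details if row.get("name") in wanted}


def fetch_pv_details(pv):
    """Fetch and decode the PV Details for one PV."""
    with urllib.request.urlopen(pv_details_url(pv), timeout=30) as resp:
        return json.load(resp)
```

For a PV that was paused and resumed, select_rows(fetch_pv_details("Your_PV")) should then show a connection request time close to the moment of the resume.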

The "Currently Disconnected PVs" report also has some additional information; this is not always helpful, but if all your live but disconnected PVs are in the same CAJ context ID then a restart may be required. It might be useful to get a stack trace of all the threads; there might be some clue here on where the search thread for that context is stuck.
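
A thread dump of the engine JVM can be taken with the JDK's jstack tool and saved to a file. The sketch below is one possible way to narrow such a dump down to the threads whose stacks mention the CAJ package; it assumes the usual jstack layout with one blank line between threads, and the function name is hypothetical.

```python
import re


def caj_thread_blocks(dump_text, marker="com.cosylab.epics.caj"):
    """Return the per-thread blocks of a thread dump that mention the marker."""
    # Threads in a jstack dump are separated by blank lines.
    blocks = [block.strip() for block in re.split(r"\n\s*\n", dump_text)]
    return [block for block in blocks if block and marker in block]


def print_caj_threads(path):
    """Read a saved thread dump and print only the CAJ-related threads."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for block in caj_thread_blocks(handle.read()):
            print(block)
            print()
```

Comparing two dumps taken a minute or so apart can help show whether a thread is genuinely stuck or merely idle.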

I took a look, but I can't see anything strange.
I think after this we get into wireshark territory (aka I'm out of ideas). You'll need to see if we are issuing search requests properly and getting proper responses etc.

Regards,
Murali
________________________________________
From: Gabriel de Souza Fedel <gabriel.fedel@lnls.br>
Sent: Friday, September 1, 2017 11:59 AM
To: Shankar, Murali; tech-talk@aps.anl.gov
Cc: (mdavidsaver@gmail.com)
Subject: Re: Archiver: Problems with disconnected PVs

On 01-09-2017 15:10, Shankar, Murali wrote:
is there another location
Depends on your setup of course but I see something like this in my arch.log right at the beginning.

<context class="com.cosylab.epics.caj.CAJContext">
  <preemptive_callback>true</preemptive_callback>
  <addr_list>gateway:5076 gateway:5077 gateway:5078 other-gateway:5064</addr_list>
  <auto_addr_list>false</auto_addr_list>
  <connection_timeout>30.0</connection_timeout>
  <beacon_period>15.0</beacon_period>
  <repeater_port>5069</repeater_port>
  <server_port>5076</server_port>
  <max_array_bytes>80000000</max_array_bytes>
  <event_dispatcher class="org.epics.archiverappliance.engine.epics.JCAEventDispatcherBasedOnPVName"/>
</context>

I found it:


<context class="com.cosylab.epics.caj.CAJContext">
  <preemptive_callback>true</preemptive_callback>
  <addr_list></addr_list>
  <auto_addr_list>true</auto_addr_list>
  <connection_timeout>30.0</connection_timeout>
  <beacon_period>30.0</beacon_period>
  <repeater_port>5065</repeater_port>
  <server_port>5064</server_port>
  <max_array_bytes>30.0</max_array_bytes>
  <event_dispatcher class="org.epics.archiverappliance.engine.epics.JCAEventDispatcherBasedOnPVName"/>
</context>

One thing looks strange: max_array_bytes looks a bit low. Could that be the
problem?
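
To compare two CAJ context dumps like the ones above field by field, a small helper along these lines could be used. It is only a sketch: it assumes each context block has been copied out of the log as a standalone XML string, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def context_settings(xml_text):
    """Flatten a <context> block into a tag -> value mapping."""
    settings = {}
    for child in ET.fromstring(xml_text):
        text = (child.text or "").strip()
        # Elements without text (such as event_dispatcher) are represented
        # by their class attribute, if they have one.
        settings[child.tag] = text or child.attrib.get("class", "")
    return settings


def diff_settings(first, second):
    """Return tag -> (first value, second value) for every differing tag."""
    tags = sorted(set(first) | set(second))
    return {tag: (first.get(tag), second.get(tag))
            for tag in tags if first.get(tag) != second.get(tag)}
```

Applied to the two blocks in this message, the differing fields would be addr_list, auto_addr_list, beacon_period, repeater_port, server_port and max_array_bytes; the address and port differences are expected between sites, which leaves max_array_bytes as the value that stands out.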

pause/resume I tried
Pausing/resuming tears down and recreates the CAJ channel for the PV so if you are unable to connect even after this you probably have some misconfiguration; that is, you can rule out most transient errors.

IOC's log on ioc machine right
Yes; sometimes there could be stuck tasks on the IOC side; you can check for that.

Your PV Details page for this PV could have information there that could help. You can get to this using something like so - http://localhost:17665/mgmt/bpl/getPVDetails?pv=Your_PV

I took a look, but I can't see anything strange.
Finally, would you be in a position to attempt a restart?
I will try it, but next week.

The strangest thing is that a lot of PVs work well, and a few have
problems (they eventually disconnect).

Thank you again

Regards


Regards,
Murali




________________________________________
From: Gabriel de Souza Fedel <gabriel.fedel@lnls.br>
Sent: Friday, September 1, 2017 10:52 AM
To: Shankar, Murali; tech-talk@aps.anl.gov
Cc: (mdavidsaver@gmail.com)
Subject: Re: Archiver: Problems with disconnected PVs

On 01-09-2017 13:39, Shankar, Murali wrote:
This seems like it might be a CA client configuration issue

This is the most likely case. The engine prints out its CAJ
configuration on startup and you should be able to see this in your logs.

I didn't find it. I found engine/logs/catalina.err; is there another
location?


You can try a couple of things. You can pause/resume the PVs in
question and see if they reconnect. You can also look in the IOC's
logs and see if there is anything interesting going on there.

I tried pause/resume. The IOC's log is on the IOC machine, right? Apparently
there are no errors (on the EPICS console).



Regards,

Regards and thank you for the answer

Murali



--
Gabriel Fedel
Software de Operação das Linhas de Luz
Laboratório Nacional de Luz Síncrotron – (LNLS)
Centro Nacional de Pesquisa em Energia e Materiais (CNPEM)
gabriel.fedel@lnls.br | +55 (19) 3512 1226
www.lnls.cnpem.br

