Experimental Physics and
Industrial Control System

Subject:	RE: 3.14 Gateway Performance
From:	"Kenneth Evans, Jr." <[email protected]>
To:	"Marty Kraimer" <[email protected]>, "Ned Arnold" <[email protected]>
Cc:	"Johnson, Andrew N." <[email protected]>, "Ralph Lange" <[email protected]>
Date:	Mon, 2 Dec 2002 12:04:38 -0600

Marty,

Replies below.

-Ken

-----Original Message-----
From: Marty Kraimer [mailto:[email protected]]
Sent: Wednesday, November 27, 2002 7:48 AM
To: Ned Arnold; [email protected]
Cc: Johnson, Andrew N.; Marty Kraimer; Ralph Lange
Subject: Gateway

At the EPICS Core Working Group meeting at JLAB the problem of running the
gateway on 3.14 was discussed. My understanding is that when we build the
gateway against 3.14 it uses so much cpu time that it doesn't work correctly.
Someone mentioned that on 3.13 it already uses 75% of the cpu.
[KE] It works correctly, but it uses more CPU than the old one, possibly up to a factor of 2 more.
Some questions.

Is this true? Ken should know the answers. Do you have some actual performance
numbers?
[KE] The only real-life test is for our Hydra Gateway. It typically uses about 75% CPU with the old version, and the CPU is saturated at about 95% with the new one. All other programs tend to be sluggish when it is running, and the machine is too close to the limit. It ran for a week or two during shutdown and crashed only once, compared to much more frequently for the old one. It also has some useful new features and bug fixes. We (including Ned) decided not to use it further. The performance is, of course, time dependent. I have quite a lot of StripTool plots if you want more information. The dramatic ones are the ones taken when we changed versions. Only the CPU changes.

[KE] This could all be fixed by a faster machine, but replacing all our Gateway machines would be expensive. There is no reason 3.14 should be using more CPU. It is a problem that should be fixed now. This is the last (probably) of a series of problems with CAS in 3.14. Jeff has fixed the others as the Gateway has uncovered them. The long time spent getting the 3.14 Gateway running went not into the conversion, which was done in a few days, but into fixing the succession of problems in CAS for 3.14. I trust Jeff can fix this one, too, if he works on it.

If this is true then it sounds like only a matter of time until even the 3.13
version will fail.
[KE] It's been doing OK for some time. The load doesn't appear to be increasing much with time. You should verify this with Marty Smith.

I assume this only applies to the gateway for ASD, not the gateways for the CATs.
[KE] I would guess Hydra is the most used. Some of the CATs probably have a light load. Marty Smith or Mohan would be the person to contact about this. All the stats are available from http://www.aps4.anl.gov/user_operations/index.html.

For the ASD gateway can't we have another solution?
[KE] Yes, we can buy a faster computer, perhaps even a Linux one. Saturn does much better than Hydra in my tests. But the problem probably affects all portable servers. It is possibly actually in CA, as the routine using all the time is tcp_recv_thread in CA. Hence it may affect CA clients. It may be owing to thread scheduling, which is used in CA but not in CAS, according to my understanding. Why would we not investigate and try to fix it?

[KE] A long-standing problem with the Gateway is that it first calls fdManager then ca_poll. Neither quits if it has work to do. Hence the other one cannot get time to empty its queues, etc., when one is running. Filled queues affect the one running. And the problem compounds and accelerates. It would be nice to multiplex them in some way as a long-term solution.
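
The starvation effect described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Gateway code: it does not call fdManager or ca_poll, and the function peak_backlog and its parameters (burst, period, ticks, budget) are hypothetical names invented for the illustration. Side 0 starts with a burst of queued work, side 1 receives one new item every few ticks, and whichever side is running handles one item per tick. With budget=None a side keeps running until its own queue is empty, which is the behaviour described in the paragraph above. With a finite budget it hands over after that many items, which is one possible way of multiplexing the two.

```python
def peak_backlog(burst, period, ticks, budget=None):
    """Return the largest backlog seen on side 1 of a two-sided loop.

    burst  -- items already queued on side 0 at the start (hypothetical load)
    period -- side 1 receives one new item every `period` ticks
    ticks  -- number of time steps to simulate
    budget -- max items a side may process per turn; None means a side
              only yields when its own queue is empty (greedy behaviour)
    """
    backlog = [burst, 0]
    peak = 0
    current = 0   # index of the side that is running now
    used = 0      # items processed so far in the current turn

    for t in range(ticks):
        # Work for side 1 keeps arriving regardless of who is running.
        if t % period == 0:
            backlog[1] += 1
            peak = max(peak, backlog[1])

        # The running side handles one item per tick, if it has any.
        if backlog[current] > 0:
            backlog[current] -= 1
            used += 1

        # Hand over when out of work, or when the turn budget is spent.
        out_of_work = backlog[current] == 0
        out_of_budget = budget is not None and used >= budget
        if out_of_work or out_of_budget:
            current = 1 - current
            used = 0

    return peak


if __name__ == "__main__":
    greedy = peak_backlog(burst=1000, period=4, ticks=2000)
    bounded = peak_backlog(burst=1000, period=4, ticks=2000, budget=10)
    print("peak backlog on side 1, greedy loop: ", greedy)
    print("peak backlog on side 1, bounded turns:", bounded)
```

In this model the greedy loop lets side 1's queue grow for the whole time side 0 is busy, while bounded turns keep it short. The model leaves out the feedback mentioned above, where filled queues slow down the side that is running, so it shows the starvation but not the acceleration.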

Some possibilities.

Run separate gateways for phoebus and oxygen.
Get a more powerful gateway machine.

Marty


ANJ, 02 Feb 2012