EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Gige performance increasing.
From: Mark Rivers <[email protected]>
To: "[email protected]" <[email protected]>
Cc: "[email protected] Talk" <[email protected]>
Date: Mon, 22 Apr 2013 15:15:33 +0000
Tom,

Thanks so much!!!!

That Knowledge Base article from Point Grey (a Prosilica/AVT competitor!) was the key!

The problem is that the default values of net.core.rmem_default and net.core.rmem_max are too small for these GigE cameras when the system starts to get busy.

Here are the net.core.rmem settings before I changed anything:

Colorado:areaDetector/iocBoot/iocPS3>sudo sysctl -a | grep net.core
...
net.core.rmem_max = 131071
net.core.rmem_default = 126976
...

So they are about 128 KB. I then increased both to 1 MB, as suggested by the Point Grey article:

Colorado:areaDetector/iocBoot/iocPS3>sudo sysctl -w net.core.rmem_max=1048576 net.core.rmem_default=1048576

Colorado:areaDetector/iocBoot/iocPS3>sudo sysctl -a | grep net.core.rmem
net.core.rmem_max = 1048576
net.core.rmem_default = 1048576
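The before/after check above is easy to script. The following is a minimal Python sketch, assuming the input is text in the `key = value` format that `sysctl -a` prints. The function names and the `RECOMMENDED_RMEM` constant are illustrative, not part of any EPICS, AVT or Point Grey tool, and the 1048576-byte threshold is simply the value used in this thread rather than a universal recommendation.

```python
# Sketch: flag net.core.rmem_* values below the 1 MB used in this thread.
# Names here are hypothetical helpers, not from any existing tool.

RECOMMENDED_RMEM = 1048576  # 1 MB, the value from the Point Grey article


def parse_sysctl(text):
    """Return a dict mapping sysctl key -> value string from 'key = value' lines."""
    settings = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip '...' and any other non-setting lines
        key, value = line.split(" = ", 1)
        settings[key.strip()] = value.strip()
    return settings


def rmem_too_small(text, minimum=RECOMMENDED_RMEM):
    """Return the rmem keys present in `text` whose value is below `minimum` bytes."""
    settings = parse_sysctl(text)
    keys = ("net.core.rmem_max", "net.core.rmem_default")
    return [k for k in keys if k in settings and int(settings[k]) < minimum]


if __name__ == "__main__":
    before = "net.core.rmem_max = 131071\nnet.core.rmem_default = 126976"
    after = "net.core.rmem_max = 1048576\nnet.core.rmem_default = 1048576"
    print(rmem_too_small(before))  # ['net.core.rmem_max', 'net.core.rmem_default']
    print(rmem_too_small(after))   # []
```

On a live host the same text could be captured with `subprocess.run(["sysctl", "-a"], capture_output=True, text=True).stdout`. Note that `sysctl -w` changes only the running kernel; to keep the values after a reboot they are normally added to `/etc/sysctl.conf` or a file under `/etc/sysctl.d/`.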

With those settings the problem of lost frames is essentially eliminated. I can run 3 cameras at 10 Hz with the JPEG and Statistics plugins running. All 4 cores are 100% saturated, but I lost only 5 frames out of 3000, about 0.2%. Before I changed the buffer sizes I was losing more than 50%. At 5 frames/sec with the increased buffer sizes I did not lose any frames, whereas the SLAC folks reported 5% frame loss under those conditions, even on a system with a separate GigE card per camera and 12 cores.
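The loss figures in this thread are simple ratios. A small sketch of the arithmetic, assuming each 3000-frame total is 3 cameras x 1000 frames, as in the test procedure described in the original report quoted below; `loss_percent` is a hypothetical helper name.

```python
def loss_percent(lost, total):
    """Return the percentage of `total` frames that were lost."""
    if total <= 0:
        raise ValueError("total must be a positive frame count")
    return 100.0 * lost / total


if __name__ == "__main__":
    # After the buffer change: 5 frames lost out of 3000.
    print(f"{loss_percent(5, 3000):.2f}%")  # 0.17%
    # A single lost frame is the smallest loss a 3000-frame run can show.
    print(f"{loss_percent(1, 3000):.3f}%")  # 0.033%
```

So "about 0.2%" is 0.17% rounded, and a run with zero losses only shows that fewer than 1 frame in 3000 was lost during that run.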

I will put a note about this in the Prosilica documentation for areaDetector.

Moral of the story: ask tech-talk sooner rather than later!

Thanks again,
Mark

________________________________________
From: [email protected] [[email protected]]
Sent: Monday, April 22, 2013 2:48 AM
To: Mark Rivers
Subject: RE: Gige performance increasing.

When using Point Grey cameras with aravis, I had to set rmem_max higher to avoid significant numbers of dropped frames:
http://www.ptgrey.com/support/kb/index.asp?a=4&q=354

AVT and Prosilica cameras with aravis seem to be equally happy with either setting of rmem_max, but we run all cameras on the same 1 Gig network interface, so we're not dealing with the same amount of data that you are.

Thanks,
Tom Cobb

> -----Original Message-----
> From: [email protected] [mailto:tech-talk-
> [email protected]] On Behalf Of Mark Rivers
> Sent: 20 April 2013 15:38
> To: [email protected] Talk
> Subject: FW: Gige performance increasing.
>
> Folks,
>
> I apologize for the previous post; the tech-talk mailer converts
> messages to plain text, so the table was lost. I have now attached it
> as a PDF; hopefully this will work.
>
> I am forwarding a message reporting some interesting problems running
> multiple Prosilica cameras on a single machine.
>
> Suggestions are most appreciated!
>
> Mark
>
>
> From: Mark Rivers
> Sent: Saturday, April 20, 2013 9:07 AM
> To: Slava Isaev
> Cc: Matjaz Kobal; Spencer J. Gessner; Matthew Boyes; Luciano Piccoli;
> Williams Jr., Ernest L.; Andrew Johnson
> Subject: RE: Gige performance increasing.
>
> Folks,
>
> Yesterday I set up a system in my lab to try to reproduce the results
> that Matjaz presented in the report he sent me on Nov. 7, 2012. I
> have attached the complete report. The following is Table 1 from the
> report.
>
> He was testing on a Linux system with 8 network cards (dedicated card
> per camera) and dual 6-core CPUs (12 cores total). Each camera ran in
> its own IOC, so in its own process on Linux.
>
> Note that with 3 cameras running at 10 Hz and all 3 plugins (JPEG,
> Process, and Analyze) running, he observed 80% usage of a single CPU
> and 340% total CPU usage. There was 15% frame loss between the
> cameras and the computer under these conditions.
>
> The system I have available for testing has a single GigE network card
> and only 4 cores. It is a dual-boot system with Fedora 14 and
> Windows 7 64-bit.
>
> I first tested on Fedora. I ran tests for each camera to collect 1000
> frames, which took 100 seconds at 10 frames/sec. Running 3 cameras at
> 10 Hz, I observed results similar to Matjaz's. It was dropping
> less than 0.1% of the frames when I had no plugins enabled, and more
> than 10% of the frames when I enabled the JPEG and Statistics plugins
> on each camera, which was putting more than 80% load on each CPU. I
> will get some more precise numbers next week before I come, but I
> believe I am effectively seeing results similar to Matjaz.
>
> However, I then booted the system with Windows 7 and conducted the
> identical tests. With all 3 cameras running at 10 Hz, and the JPEG and
> Statistics plugins running, Windows was reporting over 90% CPU
> utilization. Windows Task Manager reports %CPU utilization such
> that 100% means that all cores are saturated, unlike Linux, which
> reports NCores*100% when all cores are saturated. Thus the system was
> very close to full CPU saturation. Under these conditions the cameras
> did not drop a SINGLE frame! This was 0 dropped frames out of 3000
> total, so less than 0.03%, compared to the 15% dropped frames that Matjaz
> measured under almost identical conditions on a much more powerful
> Linux server with a dedicated Ethernet port per camera and 3 times as many
> cores. The plugins also did not drop any frames, although I was
> monitoring the free queue size in each plugin, and they occasionally
> came close to depleting the queue and dropping frames. That is exactly
> what I expect as the CPUs approach saturation.
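The two operating systems scale %CPU differently, which makes the Linux and Windows numbers hard to compare directly. A minimal conversion sketch, assuming the Linux figures are top-style values where each fully busy core counts as 100%, and the Windows figure is the Task Manager total where 100% means every core is busy. The function names are illustrative.

```python
def linux_to_taskmgr_percent(linux_pct, ncores):
    """Convert Linux top-style %CPU (100% per busy core) to the
    Windows Task Manager scale (100% = all cores busy)."""
    return linux_pct / ncores


def taskmgr_to_linux_percent(taskmgr_pct, ncores):
    """Inverse conversion: Task Manager scale to top-style %CPU."""
    return taskmgr_pct * ncores


if __name__ == "__main__":
    # Matjaz's server: 340% total on 12 cores.
    print(round(linux_to_taskmgr_percent(340, 12), 1))  # 28.3
    # Lab machine: just over 90% in Task Manager on 4 cores.
    print(taskmgr_to_linux_percent(90, 4))  # 360
```

On a common scale, the 12-core server was only about 28% busy while losing 15% of its frames, whereas the 4-core Windows machine was above 90% busy and lost none, which is consistent with the conclusion that CPU load alone does not explain the losses.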
>
> So my conclusion is that whatever is causing the dropped frames is not
> really a problem with the areaDetector driver or architecture, but is
> something specific to the Linux Ethernet driver or perhaps the Linux
> AVT driver library.
>
>
> IMPORTANT NOTE:
>
> I tried to automate my testing by writing an IDL script to turn the
> cameras on, wait for them to finish, and then read the statistics on
> the dropped frames. This was using the normal IDL channel access
> library. I observed VERY WEIRD behavior which I do not understand at
> all. Here is what I observe:
>
> - If I start each camera acquiring by using medm to set the Acquire PV
> to 1, then it almost always works fine. I press Acquire on camera 1,
> then quickly press Acquire on camera 2, and then camera 3. If I do
> that with all plugins disabled, then each camera starts acquiring at
> 10 frames/sec with essentially no dropped frames.
> - However, if I do the "identical" operation using IDL to set the
> Acquire PV to 1 on each camera in succession, here is what I see:
>   - If I start only 2 cameras, rather than 3, it works fine. Both
>     cameras acquire at 10 Hz with no dropped frames.
>   - If I start 3 cameras with, say, a 2 second delay between starting
>     each one (to simulate my delay when using medm), then the first
>     2 cameras begin acquiring at 10 Hz. But as soon as the third camera
>     is started, all 3 cameras begin dropping MORE THAN 90% of their frames!
>   - This behavior of dropping 90% of frames when camera 3 starts
>     happens no matter what delay (0.1 to 5 seconds) I put between
>     starting successive cameras.
>   - I see the identical behavior on Linux and Windows.
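The IDL test boils down to writing 1 to each camera's Acquire PV in turn with a fixed delay. The following Python sketch reproduces that sequence so it could be rerun from a different Channel Access client. It is hypothetical: the `put` callable stands in for whatever CA put function is available, `start_staggered` is an illustrative name, and the PV names in the dry run are placeholders.

```python
import time


def start_staggered(put, pvs, delay_s):
    """Write 1 to each PV in `pvs` in order, waiting `delay_s` seconds between writes.

    `put` is any callable taking (pvname, value), e.g. a Channel Access put
    function; it is passed in so this sketch does not depend on a CA library.
    Returns the time.monotonic() timestamp at which each put was issued.
    """
    issued = []
    for i, pv in enumerate(pvs):
        if i > 0:
            time.sleep(delay_s)  # the delay that was varied from 0.1 to 5 s
        issued.append(time.monotonic())
        put(pv, 1)
    return issued


if __name__ == "__main__":
    # Dry run: print instead of writing to real PVs (placeholder names).
    stamps = start_staggered(lambda pv, v: print(f"put {pv} = {v}"),
                             ["cam1:Acquire", "cam2:Acquire", "cam3:Acquire"], 0.1)
    print([round(t - stamps[0], 2) for t in stamps])
```

With pyepics installed, its `epics.caput(pvname, value)` function could be passed as `put`; substitute the real Acquire PV names for the placeholders.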
>
> IDL and medm are both running on another Linux machine, not the machine
> running the camera IOCs, so these are channel access put operations
> from the same remote machine.
>
> I am totally baffled by this. Why does it make a difference if I start
> the cameras with medm or IDL? They should both result in similar
> channel access put operations. Furthermore, what can the IDL put
> operation be doing that causes the cameras to suddenly begin to drop
> 90% of their frames?
>
> I see one other behavior that I don't understand. When I use the
> "caput" program from EPICS base (3.14.12.3) to write to any PV in the
> camera IOC I see about a 2 second delay before the caput completes:
>
> corvette:~>date ; /usr/local/epics/base-3.14.12.3/bin/linux-x86/caput 13PS1:cam1:Gain.DESC "Test" ; date
> Sat Apr 20 08:58:54 CDT 2013
> Old : 13PS1:cam1:Gain.DESC Test
> New : 13PS1:cam1:Gain.DESC Test
> Sat Apr 20 08:58:56 CDT 2013
>
> Note that "date" is reporting that this operation took about 2 seconds.
> There is a very noticeable delay between when the "New" value of the PV
> is printed, and when the Linux shell prompt returns. Why? This
> happens when none of the 3 cameras is acquiring, so it cannot be a problem
> with Ethernet loading. It happens whether the camera IOCs are running
> on Windows or Linux.
>
> If I write to a PV in a vxWorks IOC I do not see this delay, the Linux
> prompt returns "immediately" with no perceptible delay.
>
> corvette:~>date ; /usr/local/epics/base-3.14.12.3/bin/linux-x86/caput 13LAB:m1.DESC "Test" ; date
> Sat Apr 20 08:59:09 CDT 2013
> Old : 13LAB:m1.DESC test
> New : 13LAB:m1.DESC Test
> Sat Apr 20 08:59:09 CDT 2013
>
> I wonder if this could be related to the problem I am seeing with IDL
> starting the cameras?
>
> Cheers,
> Mark


--
Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd.
Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom





