EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: Using all the cores available on modern processors
From: Mark Rivers <[email protected]>
To: "'Pearson, Matthew R.'" <[email protected]>, "[email protected]" <[email protected]>
Cc: EPICS Tech Talk <[email protected]>
Date: Fri, 21 Jun 2013 21:52:47 +0000
Just so you know, the Linux machine I was testing on today was running Fedora Core 15.

corvette:~/devel/areaDetector/documentation>more /proc/version
Linux version 2.6.43.8-1.fc15.x86_64 ([email protected]) (gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC) ) #1 SMP Mon Jun 4 20:33:44 UTC 2012


Mark

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Pearson, Matthew R.
Sent: Friday, June 21, 2013 4:45 PM
To: [email protected]
Cc: EPICS Tech Talk
Subject: Re: Using all the cores available on modern processors


Hi Nick,

I saw the problem using a simulation too (no sockets, just a thread filling an array with data and creating NDArray objects). I'm pretty sure I wasn't writing files either. I just had threads doing processing and scalar/array callbacks.

One way to narrow down the problem would be to try a completely different Linux distribution, and also try the machines that are running real-time patched kernels (the ethercat machines that DLS has). It's possible that the RHEL6 that you're using is configured in some way that is causing the issue.
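
When comparing distributions or kernels in this way, it helps to record exactly what each test machine is running. The following is a minimal sketch, not something posted in this thread, assuming Python 3 is available on the test hosts. `describe_host` is a hypothetical helper name, and the scheduler and affinity calls are only used on platforms that expose them.

```python
import os
import platform


def describe_host():
    """Collect basic facts about the host that are worth logging with each test run."""
    info = {
        "system": platform.system(),
        "kernel_release": platform.release(),
        "kernel_version": platform.version(),
        "logical_cpus": os.cpu_count(),
    }
    # CPUs this process is actually allowed to run on (not available everywhere).
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    # Scheduling policy of the current process (not available everywhere).
    if hasattr(os, "sched_getscheduler"):
        policy = os.sched_getscheduler(0)
        names = {}
        for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
            if hasattr(os, name):
                names[getattr(os, name)] = name
        info["sched_policy"] = names.get(policy, str(policy))
    return info


if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```

Saving this output next to each benchmark result makes it easier to tell later whether two runs differed in kernel, CPU count, or scheduling policy.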

Cheers,
Matt

On Jun 21, 2013, at 5:04 PM, [email protected] wrote:

> Hi Mark,
> 
> That sounds interesting. If I understand correctly, the Windows system was most interesting because it had 8 threads and 11 processes and wasn't getting into the same problem that Matt and I had seen. Maybe it is the file writing that is killing it because it probably is yielding the CPU and not getting it back when it needs it. However, your results are different to ours so we will have to look at this in detail. I think all our problems have been seen when we had real detectors (and/or were listening on sockets) and were writing data to a file.
> 
> Cheers,
> 
> Nick Rees
> Principal Software Engineer           Phone: +44 (0)1235-778430
> Diamond Light Source                  Fax:   +44 (0)1235-446713
> 
> -----Original Message-----
> From: Mark Rivers [mailto:[email protected]]
> Sent: 21 June 2013 18:34
> To: Rees, Nick (DLSLtd,RAL,DIA); [email protected]
> Subject: RE: Using all the cores available on modern processors
> 
> Hi Nick,
> 
> I just did a quick study of the performance on a multi-core Linux and Windows system using the simDetector and 11 plugins all running in their own threads.  The simDetector was producing 1024x1024 8-bit images.  The details are in the attached PDF file.
> 
> Here are the main conclusions on a Linux system with dual quad-core CPUs (Intel(R) Xeon(R) CPU, E5630 @ 2.53GHz). The system thus has 8 physical cores, but has hyper-threading enabled, and so has 16 virtual cores.
> 
> As the frame rate increased from 20 frames/s to 440 frames/s the percentage CPU increased from 300% to 1050%. Plugins began saturating their threads and dropping frames at different frame rates depending on the computation required by that plugin.
> 
> A simple view of the system is that there are 11 plugins, each running in its own thread, plus the simDetector thread that computes the images. When the system is running at its maximum possible rate there should thus be 12 cores running at 100% CPU, or 1200% CPU time in "top". In fact we reached 1050% CPU, or 10.5 cores, and at that point the ROI threads were not saturated since they were dropping only a few frames.
> 
> Conclusion: the areaDetector plugin architecture and the Linux scheduler are not getting in the way of nearly ideal scaling as the frame rate increases.
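
The arithmetic behind the Linux figures quoted above can be written out as a small sketch. It only restates the numbers in the message (11 plugin threads plus 1 simDetector thread, 1050% in "top"); the function names are hypothetical, and "ideal" here assumes every thread could saturate one core.

```python
def cores_in_use(top_percent):
    """Convert an aggregate %CPU figure from top into an equivalent number of busy cores."""
    return top_percent / 100.0


def fraction_of_ideal(top_percent, busy_threads):
    """Measured load as a fraction of the ideal load, where each thread saturates one core."""
    return top_percent / (100.0 * busy_threads)


# Figures quoted in the message for the Linux test.
threads = 11 + 1          # 11 plugins plus the simDetector thread
measured_percent = 1050.0  # aggregate %CPU reported by top

print(cores_in_use(measured_percent))                # 10.5
print(fraction_of_ideal(measured_percent, threads))  # 0.875
```

On these numbers the system reached 87.5% of the 1200% ceiling, which is consistent with the observation that the ROI threads were not yet saturated.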
> 
> I then repeated the tests on a Windows 7 64-bit computer system with dual quad-core CPUs (Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz). The system thus has 8 physical cores, and does not have hyper-threading, so has 8 cores total.
> 
> On Windows I increased the frame size to 2048 x 2048 because the minimum time for epicsThreadSleep() did not permit as high a frame rate with the simDetector as on Linux.
> 
> As the frame rate increased from 20 frames/s to 150 frames/s the percent CPU (using the same scale as Linux) increased from 240% to 800% (fully saturated). Plugins began saturating their threads and dropping frames at different frame rates depending on the computation required by that plugin.
> 
> Under the maximum frame rate conditions the system is 100% CPU busy. But the simDetector thread was not being held up by the plugins. As on Linux the processing and some statistics plugins are dropping over 90% of the frames, but other plugins and the simDetector main thread are not being held back by these plugins.
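
One way a slow consumer can drop frames without holding up the producer is a bounded queue with a non-blocking put. The sketch below illustrates that general technique only; it is not the areaDetector implementation, and `offer_frame` and the queue depth of 3 are hypothetical.

```python
import queue


def offer_frame(plugin_queue, frame):
    """Try to hand a frame to a plugin without blocking the producer.

    Returns True if the frame was queued, False if it was dropped because the queue was full.
    """
    try:
        plugin_queue.put_nowait(frame)
        return True
    except queue.Full:
        return False


# A plugin that never keeps up: nothing drains the queue in this example.
plugin_queue = queue.Queue(maxsize=3)  # hypothetical queue depth
dropped = 0
for frame_number in range(10):
    if not offer_frame(plugin_queue, frame_number):
        dropped += 1

print(plugin_queue.qsize(), dropped)  # 3 queued, 7 dropped
```

The producer loop always runs to completion at its own pace; only the saturated consumer loses frames, which matches the behaviour described in the message.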
> 
> Conclusion: the areaDetector plugin architecture and the Windows scheduler are not getting in the way of nearly ideal scaling as the frame rate increases.
> 
> This study was performed without any file-writing plugins active.
> 
> I will now extend the test to include file-writing plugins.
> 
> Mark
> 
> 
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Thursday, June 20, 2013 5:55 AM
> To: [email protected]
> Subject: Using all the cores available on modern processors
> 
> This is probably aimed at Mark Rivers (sorry Mark!) but I am hoping other people may have seen the same symptoms. As I mentioned at the last EPICS meeting, on a number of development projects over the past year (mostly with area detector systems, but with some others) we have run into what superficially looks like a common problem.
> 
> When we configure a system simply (e.g. an areaDetector configuration which just reads out data and writes it to disk and does nothing else apart from scalar status callbacks) then the main CPU load is on a single core and it can use up to 90% of that core quite happily.
> 
> When we make things more complicated and add processing plugins, for example, the data throughput to disk drops off dramatically and frames start buffering up (and some may be dropped). The CPU load is distributed across all cores, but they are only loaded at the 20-30% level, so the system seems largely idle with plenty of processing power available, but it can't be utilised.
> 
> Typically we have seen these problems on dual socket NUMA architecture Intel systems, with a non pre-emptively scheduled Windows or Linux OS.
> 
> In theorising about this we have conjectured a number of scenarios (in some sort of priority order):
> 
> 1. AreaDetector has a locking issue and processing is held up by tasks holding locks that they shouldn't have.
> 2. Because this has happened on non-preemptively scheduled systems, the high priority tasks aren't getting CPU for some reason (despite there being CPU available). This may be related to (1) so maybe the system isn't overcoming priority inversions properly.
> 3. There is some other resource bottleneck - such as a bottleneck in the QPI because of the NUMA architecture.
> 4. There is a problem with top or the Windows performance monitor which doesn't account for overheads properly in a busy multi-core system.
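
Scenario 1 can be illustrated with a toy model: if every worker holds one shared lock for the whole of its processing, the workers run one after another and each core looks mostly idle. This is a sketch under stated assumptions, not areaDetector code; `run_plugins` is a hypothetical name and `time.sleep` stands in for per-frame processing.

```python
import threading
import time


def run_plugins(n_threads, work_seconds, hold_lock_during_work):
    """Run n_threads workers and return the elapsed wall-clock time in seconds."""
    shared_lock = threading.Lock()

    def plugin():
        if hold_lock_during_work:
            # Lock held across all of the work: workers are serialized.
            with shared_lock:
                time.sleep(work_seconds)
        else:
            # Lock held only briefly, then the work proceeds in parallel.
            with shared_lock:
                pass
            time.sleep(work_seconds)

    workers = [threading.Thread(target=plugin) for _ in range(n_threads)]
    start = time.perf_counter()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return time.perf_counter() - start


serialized = run_plugins(4, 0.05, hold_lock_during_work=True)
parallel = run_plugins(4, 0.05, hold_lock_during_work=False)
print(f"lock held during work: {serialized:.3f} s")
print(f"lock released early:   {parallel:.3f} s")
```

With the lock held during the work, the elapsed time is roughly the sum of the individual work times; with the lock released early it is roughly the time of a single worker. A real investigation would need a profiler or lock tracing on the running IOC rather than a model like this.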
> 
> So, at this stage we have not looked at this problem in any detail, we have just noticed the symptoms. Has anyone else come across it, and does anyone have any pointers before we invest a serious amount of effort looking at it? I suspect that if we are going to understand this it may take some time, and may have some design implications for EPICS on modern architectures.
> 
> Cheers,
> 
> Nick Rees
> Principal Software Engineer           Phone: +44 (0)1235-778430
> Diamond Light Source                  Fax:   +44 (0)1235-446713
> 
> 
> --
> This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail.
> Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd.
> Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message.
> Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom
> 



