EPICS Home

Experimental Physics and Industrial Control System


 

Subject: RE: Area Detector and high performance NVME devices
From: "Mark S. Engbretson" <[email protected]>
To: "'Mark Rivers'" <[email protected]>
Cc: [email protected]
Date: Thu, 29 Jun 2017 15:29:24 -0500
So thread safe on Linux, but still with a mutex? Bummer. May as well confirm this under Windows . . . 

-----Original Message-----
From: Mark Rivers [mailto:[email protected]] 
Sent: Thursday, June 29, 2017 3:16 PM
To: 'Mark S. Engbretson' <[email protected]>
Subject: RE: Area Detector and high performance NVME devices

I just tested running 3 HDF5 plugins with NDPluginScatter in the simDetector on Linux.  Unlike Mark's test on Windows, this did not crash.  However, as Ulrik predicted there is no performance increase.

This is the performance with 1 HDF5 writer:

13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:05:28.581398 137
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:05:32.581403 140
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:05:33.581351 139

So it is doing about 140 frames/s.

This is with 2 writers:

13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:05.581385 108
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:05.581442 19
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:06.581393 63
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:06.581484 68
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:07.581383 59
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:07.581442 66
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:08.581382 110
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:08.581435 18
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:09.581388 114
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:09.581500 15
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:10.581351 106
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:10.581404 25
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:11.581362 121
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:11.581445 11
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:02:12.581331 117
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:02:12.581383 13

If you sum the rates at each time point it is about 130 frames/s.  So both threads are writing, but they are blocking each other and the total rate does not increase over a single thread.

This is with 3 writers:

13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:30.581377 29
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:30.581433 33
13SIM1:HDF3:ArrayRate_RBV      2017-06-29 15:03:30.581437 31
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:31.581387 34
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:31.581492 35
13SIM1:HDF3:ArrayRate_RBV      2017-06-29 15:03:31.581497 35
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:32.581384 39
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:32.581449 31
13SIM1:HDF3:ArrayRate_RBV      2017-06-29 15:03:32.581454 39
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:33.581360 32
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:33.581425 36
13SIM1:HDF3:ArrayRate_RBV      2017-06-29 15:03:33.581430 33
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:34.581368 34
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:34.581440 34
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:35.581394 32
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:35.581478 37
13SIM1:HDF3:ArrayRate_RBV      2017-06-29 15:03:35.581497 38
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:36.581394 38
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:36.581493 31
13SIM1:HDF3:ArrayRate_RBV      2017-06-29 15:03:36.581501 40
13SIM1:HDF1:ArrayRate_RBV      2017-06-29 15:03:37.581347 41
13SIM1:HDF2:ArrayRate_RBV      2017-06-29 15:03:37.581400 32
13SIM1:HDF3:ArrayRate_RBV      2017-06-29 15:03:37.581407 44

Again if you sum the rates at the same time point it is about 110 frames/s.  So all 3 threads are writing, but the net throughput is decreasing as the number of threads increases.

So the bottom line is that if you want to increase the HDF5 rate you need to do it in multiple processes, as I did in my previous test.

Mark



-----Original Message-----
From: Mark S. Engbretson [mailto:[email protected]]
Sent: Thursday, June 29, 2017 3:03 PM
To: Mark Rivers
Subject: RE: Area Detector and high performance NVME devices

Actually, it appears that the Windows thread-safe code might only work within a DLL, unless someone actually writes the setup and cleanup code somewhere in main. With DLLs this can be handled as part of the DLL initialization and cleanup. In a static lib, someone has to do it themselves. Or at least that was the case as of last October.

Rinse and repeat . . . 



-----Original Message-----
From: Mark Rivers [mailto:[email protected]]
Sent: Thursday, June 29, 2017 2:44 PM
To: 'Mark S. Engbretson' <[email protected]>; [email protected]
Cc: [email protected]
Subject: RE: Area Detector and high performance NVME devices

The actual flag passed to the compiler when building HDF5 thread safe is H5_HAVE_THREADSAFE.

The ADSupport/supportApp/hdf5Src/Makefile contains this line:

       USR_CFLAGS_WIN32 += -D H5_HAVE_THREADSAFE

So it is building the library thread safe on Windows.

Mark

-----Original Message-----
From: Mark S. Engbretson [mailto:[email protected]]
Sent: Thursday, June 29, 2017 2:37 PM
To: [email protected]; Mark Rivers
Cc: [email protected]
Subject: RE: Area Detector and high performance NVME devices

Mark Rivers was just suggesting that, at the moment, that does not seem to be the build configuration. 

One more thing to test and try.


-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: Thursday, June 29, 2017 2:11 PM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: Re: Area Detector and high performance NVME devices

Yes, this is the way to go - multiple writer processes. The HDF5 library does have global data structures internally, and thread safety was added later with a global lock (which you have to enable at build time with the --enable-threadsafe configure flag).
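
For reference, when building the HDF5 library yourself from the autotools source, that global lock is enabled with something like the following (standard HDF5 configure options; --enable-unsupported is only needed if the high-level or C++ interfaces are built as well):

  ./configure --enable-threadsafe --enable-unsupported
  make && make install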

So if you are limited by your simDetector you can also just benchmark the file system/HDF5 library with this application that I’ve written for testing the SWMR functionality - it’s just a tight loop around appending a 2D image to a 3D dataset. It uses Direct Chunk Write. Our system administrators use it to benchmark our file systems occasionally.

If you have RedHat 6 or 7 then the prebuilt binary on release 0.4.0 might just run directly out of the tar file. If you need to build from source then take the ‘fordistribution’ branch and follow the instructions in the README.md. It has never been built for Windows.

https://github.com/ulrikpedersen/swmr-testapp/releases 
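
For anyone rolling their own quick benchmark, the core append-and-direct-chunk-write loop looks roughly like the sketch below. It is not the swmr-testapp code: file and dataset names, uint8 frames and one-frame chunks are assumptions, and the extra setup needed for real SWMR writing (opening the file for SWMR access and flushing periodically) is omitted.

  /* Minimal sketch: append uint8 frames as whole chunks to an unlimited
   * 3D dataset using the HDF5 direct chunk write call.  H5DOwrite_chunk
   * lives in the high-level library (hdf5_hl) from 1.8.11; later releases
   * also expose it as H5Dwrite_chunk. */
  #include <hdf5.h>
  #include <hdf5_hl.h>
  #include <stdlib.h>

  #define W 4096
  #define H 3078
  #define NFRAMES 3000

  int main(void)
  {
      hsize_t dims[3]  = {0, H, W};
      hsize_t maxd[3]  = {H5S_UNLIMITED, H, W};
      hsize_t chunk[3] = {1, H, W};                 /* one frame per chunk */

      hid_t space = H5Screate_simple(3, dims, maxd);
      hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
      H5Pset_chunk(dcpl, 3, chunk);

      hid_t file = H5Fcreate("bench.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_UCHAR, space,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);

      unsigned char *frame = malloc((size_t)H * W);   /* dummy image data */

      for (hsize_t i = 0; i < NFRAMES; i++) {
          hsize_t newdims[3] = {i + 1, H, W};
          H5Dset_extent(dset, newdims);               /* grow dataset by one frame */
          hsize_t offset[3] = {i, 0, 0};
          /* bypass the HDF5 filter pipeline and write the chunk directly */
          H5DOwrite_chunk(dset, H5P_DEFAULT, 0, offset, (size_t)H * W, frame);
      }

      free(frame);
      H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
      return 0;
  }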

Cheers,
Ulrik

> On 29 Jun 2017, at 19:35, Mark Rivers <[email protected]> wrote:
> 
> Mark,
> 
> I just did some tests on a relatively high-power Linux box.
> 
> RHEL7, 20 cores, 128GB RAM.
> 
> I created a 64GB ramdisk (tmpfs).  I then tested writing 3000 frames (4096x3078 Int8 = 36GB total) with a single HDF5 plugin to that disk.  The plugin was doing about 160 frames/s = ~1.9GB/s.  This is about 2.5 times faster than you were getting to your SSD on Windows.  But it is still not quite fast enough to keep up with your detector.
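
A 64GB tmpfs ramdisk like the one described above can be created with something along these lines (the mount point is arbitrary):

  mount -t tmpfs -o size=64G tmpfs /mnt/ramdisk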
> 
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:45.207142 148
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:47.207139 150
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:48.207152 149
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:49.207140 151
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:51.207131 146
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:52.207154 152
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:53.207148 153
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:54.207138 162
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:55.207140 163
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:56.207122 164
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:57.207151 165
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:11:58.207125 164
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:12:01.207131 166
> 13SIM1:HDF1:ArrayRate_RBV      2017-06-29 11:12:02.207124 163
> 
> I then implemented what I proposed earlier, running
> 
> - simDetector IOC, NDPluginScatter, 3 NDPluginPva plugins
> - 3 IOCs running pvaDriver and NDFileHDF5
> 
> I tested with 1, 2 and 3 NDPluginPva enabled, so 1, 2 or 3 simultaneous file writers.
> 
> 1 NDPluginPva running, so 1 HDF5 file being written:
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:22:12.989192 96
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:22:13.989140 98
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:22:14.989172 99
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:22:15.989180 98
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:22:16.989140 102
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:22:18.989212 103
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:22:19.989186 112
> This is doing about 100 frames/s (1.2GB/s).  The previous test without NDPluginPva shows that this is not limited by NDFileHDF5. It is actually limited by the pvaDriver, which can only handle a bit more than 100 frames/s.
> 
> 2 NDPluginPva running, so 2 HDF5 files being written simultaneously:
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 12:57:16.920113 90
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 12:57:16.989199 97
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 12:57:17.920112 99
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 12:57:17.989114 99
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 12:57:18.920129 98
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 12:57:18.989177 98
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 12:57:19.920076 99
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 12:57:19.989159 99
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 12:57:20.920111 98
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 12:57:20.989184 97
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 12:57:21.989209 99
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 12:57:22.989164 98
> This is also doing 100 frames/s per pvaDriver, but now the total file writing speed is 200 frames/s = 2.4 GB/s.  This meets the performance specs of Mark's detector.
> 
> 3 NDPluginPva running, so 3 HDF5 files being written simultaneously:
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 13:04:37.920132 72
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:04:37.989185 72
> 13PVA3:HDF1:ArrayRate_RBV      2017-06-29 13:04:38.123926 72
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 13:04:38.920141 73
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:04:38.989200 73
> 13PVA3:HDF1:ArrayRate_RBV      2017-06-29 13:04:39.123877 73
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 13:04:40.920114 72
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:04:40.989201 72
> 13PVA3:HDF1:ArrayRate_RBV      2017-06-29 13:04:41.123883 72
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 13:04:41.920078 73
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:04:41.989188 73
> 13PVA3:HDF1:ArrayRate_RBV      2017-06-29 13:04:42.123879 73
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 13:04:42.920174 72
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:04:42.989204 72
> 13PVA3:HDF1:ArrayRate_RBV      2017-06-29 13:04:43.123939 72
> 13PVA2:HDF1:ArrayRate_RBV      2017-06-29 13:04:44.920105 73
> 13PVA1:HDF1:ArrayRate_RBV      2017-06-29 13:04:44.989221 73
> 13PVA3:HDF1:ArrayRate_RBV      2017-06-29 13:04:45.123883 73
> Each pvaDriver is now writing 73 frames/s = 220 frames/s total = 2.6GB/s.  This is about 10% faster than 2 writers.  It is being limited by the speed at which simDetector can generate frames.  simDetector can do 300 frames/s (3.6 GB/s) when the HDF5 file writers are not active, but when they are active it drops to 220 frames/s.  This is probably limited by memory bandwidth.
> 
> Mark
> 
> 
> 
> -----Original Message-----
> From: Mark S. Engbretson [mailto:[email protected]]
> Sent: Thursday, June 29, 2017 11:01 AM
> To: [email protected]; Mark Rivers
> Cc: [email protected]
> Subject: RE: Area Detector and high performance NVME devices
> 
> 
> This seemed to crash the simDetector on my system, on trivial 
> examples; maybe on the first file, maybe after 10.
> 
> Will have to rebuild code with debugging, although that *really* slows 
> down write speeds on my systems.
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> Sent: Thursday, June 29, 2017 9:34 AM
> To: [email protected]; [email protected]
> Cc: [email protected]
> Subject: RE: Area Detector and high performance NVME devices
> 
> Hi Mark R and Mark E,
> 
>> You load 1 NDPluginScatter plugin and 3 NDFileHDF5 plugins.  Each
>> HDF5
> plugin is writing to a different file.
> 
> This would use the HDF5 library in 3 separate threads. The threadsafe 
> mechanism in the HDF5 library is a crude mutex lock on each library 
> call. I don't think performance will scale with this solution.
> 
> Cheers,
> Ulrik
> 
> -----Original Message-----
> From: Mark S. Engbretson [mailto:[email protected]]
> Sent: 29 June 2017 14:36
> To: 'Mark Rivers'; Pedersen, Ulrik (DLSLtd,RAL,TEC)
> Cc: [email protected]
> Subject: RE: Area Detector and high performance NVME devices
> 
> That is what I intend to try now just to see what happens. 
> 
> Thanks to both Mark Rivers and Ulrik Pedersen for a road map of what 
> to try next.
> 
> Me
> 
> 
> -----Original Message-----
> From: Mark Rivers [mailto:[email protected]]
> Sent: Thursday, June 29, 2017 8:01 AM
> To: [email protected]; [email protected]
> Cc: [email protected]
> Subject: RE: Area Detector and high performance NVME devices
> 
> Mark,
> 
> In your case it appears that the underlying file system is fast enough 
> to keep up with the detector, but the HDF5 plugin is not. In this case 
> you could take advantage of the NDPluginScatter feature that was added 
> in ADCore R3-0.
> 
> Let's say that the detector and file system are somewhat less than 3 
> times faster than the HDF5 file writing plugin. You can then build the 
> following pipeline.  You load 1 NDPluginScatter plugin and 3
> NDFileHDF5 plugins.  Each
> HDF5 plugin is writing to a different file.
> 
>                    Driver
>                        |
>             NDPluginScatter
>               |            |              |
>     HDF5#1   HDF5#2  HDF5#3
> 
> 
> NDPluginScatter distributes the NDArrays on a round-robin basis, i.e. 
> the first array goes to HDF5#1, the second to HDF5#2, etc.  So you 
> will end up with 3 files, where each contains 1/3 of the data.  As 
> long as the HDF5 plugins can keep up then it will be deterministic 
> that file #1 has arrays 1, 4, 7, ..., file #2 has arrays 2, 5, 8, ..., 
> and file 3 has arrays 3, 6, 9, .... If arrays are dropped then you can 
> tell which file has which arrays via the UniqueId which is always stored for each array in the file.
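
Put another way, with no dropped frames the mapping is just modular arithmetic. A hypothetical helper (not part of ADCore; frame and file numbers both starting at 1):

  /* which file holds a given frame when n_writers receive frames round-robin */
  #include <stdio.h>

  static int file_for_frame(int frame_number, int n_writers)
  {
      return ((frame_number - 1) % n_writers) + 1;
  }

  int main(void)
  {
      printf("frame 7 with 3 writers -> file %d\n", file_for_frame(7, 3));  /* prints 1 */
      return 0;
  }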
> 
> Obviously you can scale this to more NDFileHDF5 plugins if you need 
> more than a factor of 3 performance gain.
> 
> Mark
> 
> 
> ________________________________________
> From: [email protected] [[email protected]]
> Sent: Thursday, June 29, 2017 7:10 AM
> To: [email protected]
> Cc: Mark Rivers; [email protected]
> Subject: RE: Area Detector and high performance NVME devices
> 
> Hi Mark,
> 
> Apologies, I'll just use this opportunity for a bit of shameless promotion:
> Using HDF5 with high performance detectors is a very wide and complex 
> topic that can be discussed in great lengths. So come along and do 
> just that at the HDF5 workshop at ICALEPCS2017 in sunny Barcelona:
> http://www.icalepcs2017.org/index.php/program/workshops#HDF5
> 
> 
>> I know that I could probably create multiple HDF file plugins - just 
>> no
> idea of what would actually happen if they all look at the same image 
> buffer stream. Would they lock so that each one would get a unique 
> NDArray, or might the same image appear in multiple output files?
> 
> There is no point in creating multiple instances of the HDF5 file 
> writer
> plugin: the HDF5 library implements thread-safety with a crude global lock.
> So you don't get a performance increase.
> 
>> In theory, under Cygwin or Linux, one can build the Open-MPI Libs 
>> that
> PHDF5 requires, but I would suspect that the HDF file Plugin would 
> still not automagically become multithreaded and vastly faster running the same code.
> 
> I have done some work a few years ago to use the parallel HDF5 to try 
> and increase performance. MPI scales in processes, not threads. Each 
> process runs a single thread HDF5. You can't build the parallel HDF5 
> library with MPI into areaDetector as it runs in a single process. If 
> you receive the datastream in an areaDetector driver then you would 
> need to split and fan-out the data stream over some IPC mechanism to 
> the MPI/pHDF5 file writer processes.
> 
> In my experience this does not scale to perform as well as one would 
> expect and I would not advise to go this way. At least not unless you 
> scale to 100s or 1000s of writer nodes. Writing from multiple 
> independent (i.e. not mpi) processes to individual files works much 
> faster. You can then tie these datasets together using the new HDF5 
> Virtual Dataset (VDS) feature to provide a single, coherent dataset 
> 'view' for reading/processing. VDS is available from HDF5 version 
> 1.10. This is what we are doing at Diamond now for new fast/parallel detectors.
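
A rough sketch of the VDS idea for the three-writer, round-robin case discussed in this thread (not Diamond's code; file names, dataset name and frame counts are made up, and HDF5 >= 1.10 is required):

  #include <hdf5.h>
  #include <stdio.h>

  int main(void)
  {
      const int nwriters = 3;
      const hsize_t per = 1000, H = 3078, W = 4096;   /* frames per writer file */

      hsize_t vdims[3] = {per * nwriters, H, W};      /* the combined view */
      hid_t vspace = H5Screate_simple(3, vdims, NULL);

      hsize_t sdims[3] = {per, H, W};                 /* one source file */
      hid_t sspace = H5Screate_simple(3, sdims, NULL);

      hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
      for (int i = 0; i < nwriters; i++) {
          char fname[64];
          snprintf(fname, sizeof(fname), "writer%d.h5", i + 1);
          /* writer i holds frames i, i+nwriters, i+2*nwriters, ... */
          hsize_t start[3]  = {(hsize_t)i, 0, 0};
          hsize_t stride[3] = {(hsize_t)nwriters, 1, 1};
          hsize_t count[3]  = {per, 1, 1};
          hsize_t block[3]  = {1, H, W};
          H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, stride, count, block);
          H5Pset_virtual(dcpl, vspace, fname, "data", sspace);
      }

      hid_t file = H5Fcreate("full_view.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
      hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_UCHAR, vspace,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);

      H5Dclose(dset); H5Fclose(file);
      H5Pclose(dcpl); H5Sclose(sspace); H5Sclose(vspace);
      return 0;
  }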
> 
>> "Direct Chunk". Hmmm, don't supposed that This is already in the HDF5 
>> libs
> and one could tweak some code to make use out of it, would be 
> interested to see a spped test on the nvme hardware.
> 
> Yes, Direct Chunk Write is available from HDF5 1.8.11 onwards. Even 
> h5py now have support for that so you can script some benchmarking in python!
> 
> 
> Cheers,
> Ulrik
> 
> -----Original Message-----
> From: Mark S. Engbretson [mailto:[email protected]]
> Sent: 28 June 2017 23:05
> To: Pedersen, Ulrik (DLSLtd,RAL,TEC)
> Cc: [email protected]; [email protected]
> Subject: RE: Area Detector and high performance NVME devices
> 
> The impressive hardware is probably the Euresys Coaxlink frame grabber 
> with a 2.5 GB/s readout rate and the current NVME disk technology 
> which claims write benchmarks of 8 GB/s (in a raid-0 configuration).
> But I have heard people talking about cameras with 17 GB/s acquisition 
> rates and even ones in the 100s that are/will be available Real Soon Now.
> 
> Yes, everyone wants HDF5 formatted data, but as you pointed out, the 
> single thread HDF5 pipeline appears to choke before it hits write 
> limits on modern hardware. Raw binary file writing on this hardware is 
> actually easy, with similar caveats as Lustre has (i.e., writes must be 
> a multiple of the NVMe sector size, buffers might need to be memory 
> page aligned). But when the hardware runs at twice the write speed of 
> the camera, lots of issues go away.
> 
> I know that I could probably create multiple HDF file plugins - just 
> no idea of what would actually happen if they all look at the same 
> image buffer stream. Would they lock so that each one would get a 
> unique NDArray, or might the same image appear in multiple output files?
> 
> In theory, under Cygwin or Linux, one can build the Open-MPI Libs that
> PHDF5 requires, but I would suspect that the HDF file Plugin would 
> still not automagically become multithreaded and vastly faster running the same code.
> 
> "Direct Chunk". Hmmm, don't supposed that This is already in the HDF5 
> libs and one could tweak some code to make use out of it, would be 
> interested to see a spped test on the nvme hardware.
> 
> Me
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]
> Sent: Wednesday, June 28, 2017 3:40 PM
> To: [email protected]
> Cc: [email protected]; [email protected]
> Subject: Re: Area Detector and high performance NVME devices
> 
> Hi Marks,
> 
> So this is quite an impressive piece of detector! We have some fast 
> detectors here at Diamond but we have not used the areaDetector HDF5 
> file writer beyond what can fit through a 10gbps Ethernet pipe.
> Writing faster than that presents a number of challenges as you have noticed.
> 
> Writing 'raw' binary files will probably always be the most performant 
> options (if you do it right and tune the I/O pattern for the file system).
> However, you lose a lot of goodness by not having a container like
> HDF5 around it.
> 
> There are a lot of tuning parameters available in the HDF5 library 
> that I assume you have played around with: chunking and flushing 
> parameters, boundary alignment, etc. The first thing you have to 
> figure out about your file system is what size of 'chunks' (i.e.
> individual IO writes) it likes in order to perform best - and does it 
> require write operations to start on specified boundaries? Our Lustre 
> and GPFS file systems like 1MB and 4MB boundaries for example.
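
In HDF5 API terms those knobs look roughly like the sketch below (the values are examples only; the right numbers depend on the file system):

  /* sketch: file-access and chunking tuning for large streaming writes */
  #include <hdf5.h>

  int main(void)
  {
      hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
      H5Pset_alignment(fapl, 64 * 1024, 4 * 1024 * 1024);  /* align objects >= 64 kB on 4 MB boundaries */

      hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
      hsize_t chunk[3] = {1, 3078, 4096};                   /* one full frame (~12 MB) per chunk */
      H5Pset_chunk(dcpl, 3, chunk);

      hid_t file = H5Fcreate("tuned.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
      /* ... create the dataset with dcpl and stream frames to it as usual ... */

      H5Fclose(file); H5Pclose(dcpl); H5Pclose(fapl);
      return 0;
  }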
> 
> The HDF5 library operates a pipeline when doing file I/O: all 
> read/write operations pass the data through this pipeline by default 
> in order to do things like compression or datatype conversions.
> However, even when you're not using these features (like when you're 
> "just" streaming a lot of uint16 pixels to an image dataset) the data 
> still passes through the pipeline and that has a certain performance overhead
> - i.e. a CPU or perhaps even memory I/O bottleneck.
> 
> Fortunately there is a way to circumvent this internal HDF5 pipeline 
> using a feature that was developed for Dectris because they wanted to 
> be able to write out HDF5 datasets consisting of pre-compressed 
> images. The single-threaded HDF5 pipeline was too slow for their 
> compression requirements. The Direct Chunk Write [1] feature can be 
> used with or without compressed datasets and because it circumvents 
> the pipeline it basically just does a simple write under the hood.
> 
> From my tests (a good while ago now) the Direct Chunk Write from 
> multiple processes performed much better than the parallel HDF5 (which 
> btw is not supported in the areaDetector HDF5 file writer).
> 
> The Direct Chunk Write functionality should be added to the 
> areaDetector
> HDF5 file writer. I will raise a ticket on github to discuss how best 
> to do that. That should enable nearly the same performance as your 'raw binary'
> write.
> 
> If that turns out to not be enough, the next step would be to 
> parallelise the problem by splitting the stream into multiple file writer processes.
> This is also something we are working on at Diamond for our parallel 
> high-performance detector systems.
> 
> Cheers,
> Ulrik
> 
> [1]: Direct Chunk Write
> https://support.hdfgroup.org/HDF5/doc/Advanced/DirectChunkWrite/
> 
> 
>> On 27 Jun 2017, at 21:07, Mark S. Engbretson <[email protected]> wrote:
>> 
>> Pete suggested something like that, have the HDF file hold a pointer 
>> to each raw image file, but I cannot sustain the write rate with 
>> single files.
>> That is why I was originally trying to tweak this file writer into something 
>> that would acquire all the data, but would then generate reasonable 
>> output after the fact.
>> 
>> 2BM seems to be willing to have a starting point of perhaps one 15 
>> minute acquire per hour, in which the other 45 minutes are spent either 
>> processing the data or getting it off the computer to prepare for 
>> another. Which may just be enough if the raw data is buffered and the HDF 
>> file is written out at whatever rate it can keep up at.
>> 
>> I hadn't thought of this raw plugin abstracting a much smaller image 
>> that could be a placeholder in an HDF file. Mostly since someone 
>> would still have to post process both of these files after the fact.
>> No file plugins have implemented the read functions yet.
>> 
>> -----Original Message-----
>> From: Mark Rivers [mailto:[email protected]]
>> Sent: Tuesday, June 27, 2017 2:48 PM
>> To: Mark S. Engbretson <[email protected]>; [email protected]
>> Subject: RE: Area Detector and high performance NVME devices
>> 
>> Hi Mark,
>> 
>> I just tried with simDetector on a Windows 7 system with 8 cores, 15K 
>> RPM SAS Raid-0 disk, 96 GB RAM.
>> 
>> 4096 x 3078 Int8 images = 12 MB/image.
>> 
>> The simDetector is generating about 150 frames/s = 1.8 GB/s.
>> 
>> This is the output of camonitor on the ArrayRate_RBV and 
>> WriteFile_RBV PVs in the HDF5 plugin:
>> 
>> corvette:simDetectorIOC/iocBoot/iocSimDetector>camonitor -tc 
>> 13SIM1:HDF1:ArrayRate_RBV 13SIM1:HDF1:WriteFile_RBV 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:21.809029) 0 
>> 13SIM1:HDF1:WriteFile_RBV (2017-06-27 14:15:21.809162) Done 
>> 13SIM1:HDF1:WriteFile_RBV (2017-06-27 14:15:31.451841) Writing STATE 
>> MINOR 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:33.410624) 34 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:34.411782) 54 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:35.410801) 55 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:37.408966) 52 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:38.408085) 54 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:39.407156) 48 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:40.408254) 53 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:41.409449) 55 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:42.410515) 45 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:43.411498) 50 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:44.411601) 47 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:45.410666) 53 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:46.409790) 50 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:47.408919) 54 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:48.407977) 53 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:49.407138) 52 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:50.408088) 60 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:51.409265) 62 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:52.410457) 52 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:53.411547) 59 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:54.411524) 62 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:55.411724) 55 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:56.410814) 60 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:57.409946) 57 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:58.408960) 59 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:15:59.407069) 60 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:00.408095) 42 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:01.409358) 17 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:02.410576) 27 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:03.411677) 25 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:04.413487) 27 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:05.411605) 25 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:06.411612) 24 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:07.409803) 23 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:11.410220) 19 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:12.411320) 18 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:13.412261) 21 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:15.412525) 20 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:16.411553) 22 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:17.410808) 24 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:20.409031) 28 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:21.411129) 25 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:22.411199) 24 
>> 13SIM1:HDF1:WriteFile_RBV (2017-06-27 14:16:22.818068) Done 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:23.412388) 1 
>> 13SIM1:HDF1:ArrayRate_RBV (2017-06-27 14:16:24.413361) 0
>> 
>> So for the first 25 seconds or so it is writing at about 55 frames/s = 
>> 660 MB/s. This is probably filling the Windows file cache. It then 
>> slows down to about 23 frames/s = 280 MB/s, which is probably the 
>> steady state write speed of the disks.
>> 
>> So I agree that it will probably be difficult to write HDF5 files at 
>> the full rate of your camera, which is 190 frames/s = 2.3 GB/s.
>> 
>> One possible solution would be to write your RAW files that can keep 
>> up, and also write HDF5 files of "thumbnail" data that is either 
>> cropped or binned to 512x384 for example. You can then store all the 
>> metadata in the HDF file and the images in the RAW file. Later on you 
>> can either merge these 2 files into a large HDF5 file, or just keep 
>> them
> separate.
>> 
>> Mark
>> 
>> ________________________________
>> From: Mark S. Engbretson [[email protected]]
>> Sent: Tuesday, June 27, 2017 1:51 PM
>> To: Mark Rivers; [email protected]
>> Subject: RE: Area Detector and high performance NVME devices
>> 
>> I do not have enough memory to create a queue large enough to buffer 
>> all the images. 2BM wants to acquire the camera stream for at least
>> 15 minutes and ideally as long as possible. So talking about 2-4 TB.
>> Or larger if they buy huge NVMe chips or multiple Turbo Z units - the 
>> computer supports having 3 of them.
>> 
>> I have allocated very large buffers, but for a 4096 by 3078 image 
>> being created at 190 FPS, the file plugins would have to be able to 
>> keep
> up . . .
>> and they don't.
>> 
>> Using the simDetector, I can create such images at ~260 FPS. HDF5 is 
>> only writing the file out at about 60 FPS, using whatever the 
>> default settings are.
>> netCDF writes about 30 FPS. A buffer of 4000 in both cases only 
>> lasted for about a minute. The raw file plugin slows the simDetector 
>> acquire rate to about 170-180 FPS, which the plugin can keep up with.
>> The standard arrays plugin by itself also slows simDetector to about the 
>> same rate.
>> 
>> 
>> From: Mark Rivers [mailto:[email protected]]
>> Sent: Tuesday, June 27, 2017 1:06 PM
>> To: 'Mark S. Engbretson' <[email protected]>; [email protected]
>> Subject: RE: Area Detector and high performance NVME devices
>> 
>> Hi Mark,
>> 
>> I am surprised that a raw file plugin is significantly faster than 
>> netCDF or HDF5. I would like to see the tests, and figure out what is 
>> actually slowing them down, i.e. is it CPU bound, waiting for a 
>> semaphore, etc.? Can you post actual benchmark results for the 
>> different plugins, i.e. frames/s and MB/s?
>> 
>> You should not need to do anything special to create a FIFO to buffer 
>> images while the disk is busy. Every areaDetector plugin comes with 
>> such a FIFO, i.e. its input queue. Just increase the QueueSize to be 
>> large enough to buffer all the images you need to store in one 
>> "burst". You can also use the CircularBuffer plugin to do this, but 
>> it should really not be necessary, that is intended more for "triggered"
>> applications where the buffer is emptied when a trigger condition is
> satisfied.
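
For example, assuming the standard plugin QueueSize record (PV prefix taken from the simDetector example used elsewhere in this thread), something like this enlarges the input queue at runtime:

  caput 13SIM1:HDF1:QueueSize 4000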
>> 
>> Mark
>> 
>> 
>> From: Mark S. Engbretson [mailto:[email protected]]
>> Sent: Tuesday, June 27, 2017 12:24 PM
>> To: Mark Rivers; [email protected]<mailto:[email protected]>
>> Subject: Area Detector and high performance NVME devices
>> 
>> Mark -
>> 
>> I have the Adimec camera which generates data at ~2.5 GB/s. I 
>> recently got my hands on a newer HP 840 with an HP Turbo Z NVMe drive 
>> which claims a sustained write speed of 6 GB/s. None of the existing 
>> file plugins see any performance increase when writing to this device
>> - I do not think that any are actually write limited. I have modified 
>> a raw binary file plugin that I obtained from Keenan Lang that easily 
>> sustains the camera's write rate until the device is full.
>> 
>> Problem is - Raw data really doesn't do anyone much good. I was 
>> thinking that perhaps a quick solution to my problem might be to 
>> change this Raw File plugin to look/act like a disk based fifo or 
>> circular buffer. This could collect to the limit of the hardware at 
>> full speed, and if someone wanted HDF output, they would just drain 
>> this queue at the speed that HDF files are generated. Or is there an 
>> easier/better solution? I.e. any way that file plugins can use the 
>> new
> multi-thread model of AD 3.0?
>> 
>> I know that HDF files can be generated at very high speeds on Lustre 
>> file systems, but this seems to be using parallel HDF5. Is this 
>> something that Area Detector supports?
>> 
> 
> ----------------------------------------------
> Ulrik Kofoed Pedersen
> Head of Beamline Controls
> Diamond Light Source Ltd
> Phone: +44 1235 77 8580
> 
> 
> 

----------------------------------------------
Ulrik Kofoed Pedersen
Head of Beamline Controls
Diamond Light Source Ltd
Phone: +44 1235 77 8580


--
This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. 
Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom



