EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: archive server discussion
From: <[email protected]>
To: <[email protected]>
Date: Mon, 2 Aug 2010 15:59:55 +0100
Hi
 
At Diamond we are upgrading our archiver hardware, and before the EPICS meeting I'd like to open a discussion on archive server software. The obvious question is: is there a standard EPICS archive server suitable for everyone? And if not, why not?
 
There are a few servers in the community (apologies to their authors if I get the details wrong):
 
Channel Archiver v1
index?
data in file system, files contain per-channel blocks of time-sorted data up to 8k samples long
 
Channel Archiver v2
Hashtable + RTree index
data as above
 
CZAR
Index in RDB
data storage ?
compression using arithmetic encoding
 
MYA
RDB replacement for CZAR
 
BEAUTY
Partitioned RDB for index and data
 
Hyperarchiver
RDB for configuration
Hypertable for index and data (zlib-style compression)
 
I only have experience with Channel Archiver v2. In our installation, which archives 3 GB a day, a great deal of time is spent in random write operations to the individual data blocks. We buffer for 30 seconds, then write approximately 30 samples to each of 30,000 channels. Because of the way the data is stored on disk (channel1[8192], channel2[2048], channel3[1024] etc.), that is a lot of random writes. The RAID array copes well, but the heavy write load reduces read performance to the point that backups and large data-mining jobs are very slow.
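To put a rough number on that write load, here is a back-of-envelope sketch using the figures above. It assumes, as a simplification, that each flush costs one block write per channel and that flushes are spread evenly over the buffer interval; the function names are only illustrative.

```python
def block_writes_per_second(channels: int, flush_interval_s: float) -> float:
    """Average block writes per second, assuming one write per channel per flush."""
    return channels / flush_interval_s


def samples_per_flush(channels: int, samples_per_channel: int) -> int:
    """Total samples written in one flush cycle."""
    return channels * samples_per_channel


# Figures from the message: 30,000 channels, 30 s buffer, ~30 samples per channel.
print(block_writes_per_second(30_000, 30.0))  # 1000.0
print(samples_per_flush(30_000, 30))          # 900000
```

Under those assumptions the disks see on the order of a thousand small scattered writes per second, each carrying only about 30 samples.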
 
Our typical archiver access pattern is append-only sample writes, plus reads of a contiguous time range from up to 100 channels. Assuming magnetic disks will store our 10TB+ datasets for the next few years, an ideal database would reduce the number of slow random IO operations and store data on disk in a layout that reflects the typical access pattern. A suitable sort key might be (channel, timestamp), or (month, channel, timestamp) to partition the data by date. This is how Hypertable and some real-time relational databases get their single-node performance: an in-memory table for recently arrived data, a sequential logfile for durability, periodic sorting and flushing of the in-memory table to disk in a contiguous write, and periodic compaction into larger sorted tables. The commercial real-time relational databases KDB and Vertica use this architecture. If any Oracle or MySQL expert knows whether something similar can be done with them, I'd be very interested. There are alternative storage back-ends for MySQL for data warehousing, but they only support efficient writes in bulk.
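The architecture described above (in-memory table, sequential log, sorted flush, compaction) can be sketched in a few lines. This is a toy illustration only, not how Hypertable, KDB or Vertica are implemented: the class `SampleStore`, its methods, the JSON-lines file format and the flush threshold are all hypothetical, and the log is flushed but not fsynced.

```python
import heapq
import json
import os
import tempfile


def sort_key(sample):
    # Samples are (channel, timestamp, value); order by channel, then time.
    return (sample[0], sample[1])


class SampleStore:
    """Toy log-structured store; all names here are hypothetical."""

    def __init__(self, directory, flush_threshold=1000):
        self.directory = directory
        self.flush_threshold = flush_threshold
        self.memtable = []      # recently arrived samples, unsorted
        self.segments = []      # paths of sorted on-disk segment files
        self.next_id = 0
        self.log_path = os.path.join(directory, "write.log")
        self.log = open(self.log_path, "a")

    def append(self, channel, timestamp, value):
        # Sequential log append (a real system would also fsync), then buffer.
        self.log.write(json.dumps([channel, timestamp, value]) + "\n")
        self.log.flush()
        self.memtable.append((channel, timestamp, value))
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort the buffer and write it out as one contiguous segment.
        if not self.memtable:
            return
        self.memtable.sort(key=sort_key)
        self.segments.append(self._write_segment(self.memtable))
        self.memtable = []
        self.log.close()
        self.log = open(self.log_path, "w")  # flushed samples no longer need the log

    def compact(self):
        # Merge all sorted segments into one larger sorted segment.
        if len(self.segments) < 2:
            return
        streams = [self._read_segment(p) for p in self.segments]
        merged_path = self._write_segment(heapq.merge(*streams, key=sort_key))
        for old in self.segments:
            os.remove(old)
        self.segments = [merged_path]

    def read(self, channel, start, end):
        # Toy scan of every segment plus the memtable; a real store would
        # use the sort order to seek directly to the requested range.
        hits = [s for p in self.segments for s in self._read_segment(p)
                if s[0] == channel and start <= s[1] <= end]
        hits += [s for s in self.memtable
                 if s[0] == channel and start <= s[1] <= end]
        return sorted(hits, key=sort_key)

    def close(self):
        self.log.close()

    def _write_segment(self, samples):
        path = os.path.join(self.directory, f"segment-{self.next_id:06d}.jsonl")
        self.next_id += 1
        with open(path, "w") as f:
            for s in samples:
                f.write(json.dumps(list(s)) + "\n")
        return path

    def _read_segment(self, path):
        with open(path) as f:
            for line in f:
                yield tuple(json.loads(line))


with tempfile.TemporaryDirectory() as d:
    store = SampleStore(d, flush_threshold=3)
    for t in range(5):
        store.append("ch2", t, 1.0 * t)
        store.append("ch1", t, 2.0 * t)
    store.compact()
    print(store.read("ch1", 1, 3))
    store.close()
```

The point of the sketch is that every disk write is either a sequential log append or a contiguous segment write, and that after compaction each channel's samples sit next to each other in time order, which matches the read pattern.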
 
At Diamond we plan to run the v2 archiver with 64-bit index files, which will let us merge our indices into one file, and to benchmark its performance on some updated hardware. I also need to merge the 10TB of data files from a primary and standby archive server used for high availability, either by combining the data files or by adding a merging reader to the storage library. I would like to know what other facilities are using and whether there are any other current archiver developments.
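As an illustration of the merging-reader idea, the sketch below combines two time-sorted sample streams for one channel. It is not the Channel Archiver storage library API: `merged_samples` is a hypothetical name, samples are plain (timestamp, value) pairs, and it assumes that both servers record identical timestamps for the same sample, so duplicates can be dropped by timestamp with the primary taking precedence.

```python
import heapq
from itertools import groupby


def merged_samples(primary, standby):
    """Yield (timestamp, value) pairs in time order from two sorted streams.

    Where both streams hold a sample with the same timestamp, the primary's
    copy is kept and the standby's is dropped.
    """
    # Tag each sample with a source priority so the primary sorts first on ties.
    tagged_primary = ((ts, 0, value) for ts, value in primary)
    tagged_standby = ((ts, 1, value) for ts, value in standby)
    merged = heapq.merge(tagged_primary, tagged_standby,
                         key=lambda s: (s[0], s[1]))
    for ts, group in groupby(merged, key=lambda s: s[0]):
        _, _, value = next(group)  # first entry in the group has top priority
        yield ts, value


primary = [(1, "a"), (2, "b"), (5, "e")]             # gap at t=3..4
standby = [(2, "B"), (3, "c"), (4, "d"), (5, "E")]   # gap at t=1
print(list(merged_samples(primary, standby)))
```

Because both inputs are consumed lazily, a reader of this kind would fill the gaps in either archive without rewriting the data files.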
 
James
 

 

 


