Hi,

At Diamond we are upgrading our archiver hardware, and before the EPICS meeting I'd like to open the discussion on archive server software. The obvious question is: is there a standard EPICS archiver server suitable for everyone? And if not, why not?

There are a few servers in the community (apologies to their authors if I get the details wrong):

- Channel Archiver v1: index ?; data in the file system, with files containing per-channel blocks of time-sorted data up to 8k samples long
- Channel Archiver v2: hashtable + RTree index; data as above
- CZAR: index in an RDB; data storage ?; compression using arithmetic encoding
- MYA: RDB replacement for CZAR
- BEAUTY: partitioned RDB for both index and data
- Hyperarchiver: RDB for configuration; Hypertable for index and data (zlib-style compression)

I only have experience with Channel Archiver v2. In our installation, which archives about 3 GB a day, a great deal of time is spent in random write operations to the individual data blocks. We buffer for 30 seconds, then write approximately 30 samples to each of 30,000 channels. Because of the way the data is stored on disk (channel1[8192], channel2[2048], channel3[1024], etc.) that's a lot of random writes. The RAID array copes well, but the heavy write load reduces read performance to the point that backups and large data-mining jobs are very slow.
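For scale, a rough back-of-envelope using the numbers above shows why this hurts: the sequential bandwidth involved is tiny, but the random-write rate is not.

```python
# Back-of-envelope for the write pattern described above: 30,000
# channels buffered for 30 s, one block write per channel per flush,
# and roughly 3 GB of data archived per day.
channels = 30_000
flush_interval_s = 30
bytes_per_day = 3e9

random_writes_per_s = channels / flush_interval_s       # one per channel
sequential_kb_per_s = bytes_per_day / 86_400 / 1024     # averaged rate

print(f"~{random_writes_per_s:.0f} random writes/s")    # ~1000
print(f"~{sequential_kb_per_s:.0f} KB/s sustained")     # ~34
```

So the disks are being asked for around a thousand random writes a second to move an average of only a few tens of kilobytes.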
Our typical archiver access pattern is append-only sample writes, plus reads of a contiguous time range across up to 100 channels. Assuming magnetic disks will store our 10 TB+ datasets for the next few years, an ideal database would reduce the number of slow random IO operations and lay data out on disk to reflect this access pattern. A suitable sort key might be (channel, timestamp), or (month, channel, timestamp) to partition the data by date. This is how Hypertable and some real-time relational databases get their single-node performance: an in-memory table for recently arrived data, a sequential logfile for durability, periodic sorting and flushing of the in-memory table to disk in one contiguous write, and periodic compaction into larger sorted tables. The commercial real-time relational databases KDB and Vertica use this architecture; if any Oracle or MySQL expert knows whether something similar can be done with them I'd be very interested. There are alternative storage back-ends for MySQL aimed at data warehousing, but they only support efficient writes in bulk.
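A minimal sketch of that log-structured pattern in Python (names are made up for illustration; real implementations add the durability log, lazy merging, compaction and compression):

```python
import bisect
import itertools

class LsmArchive:
    """Toy log-structured store keyed by (channel, timestamp).

    Recent samples go to an in-memory table (a real implementation
    would also append them to a sequential logfile for durability);
    flush() sorts the table and emits one contiguous sorted run, so
    the disk would see one sequential write per flush instead of one
    random write per channel.
    """

    def __init__(self):
        self.memtable = []   # unsorted recently-arrived samples
        self.runs = []       # flushed runs, each sorted by (channel, ts)

    def append(self, channel, timestamp, value):
        self.memtable.append(((channel, timestamp), value))

    def flush(self):
        if self.memtable:
            self.runs.append(sorted(self.memtable))
            self.memtable = []

    def read_range(self, channel, t0, t1):
        """Time-ordered samples for one channel with t0 <= ts <= t1."""
        self.flush()
        # A real store would merge the sorted runs lazily and compact
        # them periodically; here we just concatenate and re-sort.
        merged = sorted(itertools.chain.from_iterable(self.runs))
        keys = [key for key, _ in merged]
        lo = bisect.bisect_left(keys, (channel, t0))
        hi = bisect.bisect_right(keys, (channel, t1))
        return merged[lo:hi]
```

Because the data ends up sorted by (channel, timestamp), the typical read — a contiguous time range for one channel — becomes a single contiguous slice of a sorted run.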
At Diamond we plan to run the v2 archiver with 64-bit index files, which will allow us to merge our indices into one file, and to benchmark its performance on updated hardware. I also need to merge the 10 TB of data files from the primary and standby archive servers we run for high availability, either by combining the data files or by adding a merging reader to the storage library. I would like to know what other facilities are using, and whether there are any other current archiver developments.
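A merging reader of that kind could be little more than a lazy heap merge of the two time-sorted per-channel streams, dropping the duplicates that both servers recorded. A sketch in Python (the (timestamp, value) tuple shape is an assumption for illustration, not the Channel Archiver storage API):

```python
import heapq

def merge_streams(primary, standby):
    """Merge two time-sorted (timestamp, value) iterables into one
    duplicate-free, time-sorted stream.

    heapq.merge never loads either input fully into memory, and for
    equal keys it yields from the earlier iterable first, so the
    primary server's copy of a duplicated sample wins.
    """
    last_ts = None
    for ts, value in heapq.merge(primary, standby, key=lambda s: s[0]):
        if ts == last_ts:
            continue  # same sample recorded by both servers
        last_ts = ts
        yield ts, value
```

Since it only ever holds one sample per input stream, the same approach would scale to merging the full 10 TB archive channel by channel.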
James
ANJ, 02 Sep 2010