Hi,

At Diamond we are upgrading our archiver hardware, and before the EPICS meeting I'd like to open the discussion on archive server software. The obvious question is: is there a standard EPICS archiver server suitable for everyone? And if not, why not?

There are a few servers in the community (apologies to their authors if I get the details wrong):

- Channel Archiver v1: index ?; data in the file system, with files containing per-channel blocks of time-sorted data up to 8k samples long
- Channel Archiver v2: hashtable + RTree index; data as above
- CZAR: index in an RDB; data storage ?; compression using arithmetic encoding
- MYA: RDB replacement for CZAR
- BEAUTY: partitioned RDB for both index and data
- Hyperarchiver: RDB for configuration; Hypertable for index and data (zlib-style compression)

I only have experience with Channel Archiver v2. In our installation, which archives about 3 GB a day, a great deal of time is spent in random write operations to the individual data blocks. We buffer for 30 seconds, then write approximately 30 samples to each of 30,000 channels. Because of the way the data is stored on disk (channel1[8192], channel2[2048], channel3[1024], etc.) that's a lot of random writes. The RAID array copes well, but the heavy write load reduces read performance to the point that backups and large data-mining jobs are very slow.
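For scale, a rough back-of-envelope using the numbers above shows why this hurts: the sequential bandwidth involved is tiny, but the random-write rate is not.

```python
# Back-of-envelope for the write pattern described above: 30,000
# channels buffered for 30 s, one block write per channel per flush,
# and roughly 3 GB of data archived per day.
channels = 30_000
flush_interval_s = 30
bytes_per_day = 3e9

random_writes_per_s = channels / flush_interval_s       # one per channel
sequential_kb_per_s = bytes_per_day / 86_400 / 1024     # averaged rate

print(f"~{random_writes_per_s:.0f} random writes/s")    # ~1000
print(f"~{sequential_kb_per_s:.0f} KB/s sustained")     # ~34
```

So the disks are being asked for around a thousand random writes a second to move an average of only a few tens of kilobytes.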
Our typical archiver access pattern is append-only sample writes, plus reads of a contiguous time range across up to 100 channels. Assuming magnetic disks will store our 10 TB+ datasets for the next few years, an ideal database would reduce the number of slow random IO operations and lay data out on disk to reflect this access pattern. A suitable sort key might be (channel, timestamp), or (month, channel, timestamp) to partition the data by date. This is how Hypertable and some real-time relational databases get their single-node performance: an in-memory table for recently arrived data, a sequential logfile for durability, periodic sorting and flushing of the in-memory table to disk in one contiguous write, and periodic compaction into larger sorted tables. The commercial real-time relational databases KDB and Vertica use this architecture; if any Oracle or MySQL expert knows whether something similar can be done with them I'd be very interested. There are alternative storage back-ends for MySQL aimed at data warehousing, but they only support efficient writes in bulk.
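A minimal sketch of that log-structured pattern in Python (names are made up for illustration; real implementations add the durability log, lazy merging, compaction and compression):

```python
import bisect
import itertools

class LsmArchive:
    """Toy log-structured store keyed by (channel, timestamp).

    Recent samples go to an in-memory table (a real implementation
    would also append them to a sequential logfile for durability);
    flush() sorts the table and emits one contiguous sorted run, so
    the disk would see one sequential write per flush instead of one
    random write per channel.
    """

    def __init__(self):
        self.memtable = []   # unsorted recently-arrived samples
        self.runs = []       # flushed runs, each sorted by (channel, ts)

    def append(self, channel, timestamp, value):
        self.memtable.append(((channel, timestamp), value))

    def flush(self):
        if self.memtable:
            self.runs.append(sorted(self.memtable))
            self.memtable = []

    def read_range(self, channel, t0, t1):
        """Time-ordered samples for one channel with t0 <= ts <= t1."""
        self.flush()
        # A real store would merge the sorted runs lazily and compact
        # them periodically; here we just concatenate and re-sort.
        merged = sorted(itertools.chain.from_iterable(self.runs))
        keys = [key for key, _ in merged]
        lo = bisect.bisect_left(keys, (channel, t0))
        hi = bisect.bisect_right(keys, (channel, t1))
        return merged[lo:hi]
```

Because the data ends up sorted by (channel, timestamp), the typical read — a contiguous time range for one channel — becomes a single contiguous slice of a sorted run.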
At Diamond we plan to run the v2 archiver with 64-bit index files, which will allow us to merge our indices into one file, and to benchmark its performance on updated hardware. I also need to merge the 10 TB of data files from the primary and standby archive servers we run for high availability, either by combining the data files or by adding a merging reader to the storage library. I would like to know what other facilities are using, and whether there are any other current archiver developments.
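A merging reader of that kind could be little more than a lazy heap merge of the two time-sorted per-channel streams, dropping the duplicates that both servers recorded. A sketch in Python (the (timestamp, value) tuple shape is an assumption for illustration, not the Channel Archiver storage API):

```python
import heapq

def merge_streams(primary, standby):
    """Merge two time-sorted (timestamp, value) iterables into one
    duplicate-free, time-sorted stream.

    heapq.merge never loads either input fully into memory, and for
    equal keys it yields from the earlier iterable first, so the
    primary server's copy of a duplicated sample wins.
    """
    last_ts = None
    for ts, value in heapq.merge(primary, standby, key=lambda s: s[0]):
        if ts == last_ts:
            continue  # same sample recorded by both servers
        last_ts = ts
        yield ts, value
```

Since it only ever holds one sample per input stream, the same approach would scale to merging the full 10 TB archive channel by channel.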
James
ANJ, 02 Sep 2010