A Record Run for the APS X-ray Source
FEBRUARY 23, 2012
X-ray beams and their many useful properties bring thousands of scientists each year to synchrotron light source facilities such as the U.S. Department of Energy Office of Science’s Advanced Photon Source (APS) at Argonne National Laboratory to carry out world-class experiments.
Researchers have much invested in their visits to these facilities. They may travel significant distances at some expense, with much on the line in terms of scientific objectives and professional goals.
APS management and the three APS divisions, namely the Accelerator Systems Division (ASD), the APS Engineering Support Division (AES), and the X-ray Science Division (XSD), do everything possible to ensure that x-rays are available to users as promised.
So there were many smiling faces around the APS when the recently completed user-beam run 2011-3 set a new APS record for machine availability and reliability.
During this run the APS delivered x-rays to users for 99.6% of the scheduled 1,545 hours. That availability was nearly 0.5 percentage points better than the previous best, which corresponds to more than a factor-of-two reduction in unavailable time. During the 1,545 hours there were 7 faults (periods when the x-ray beam was unavailable), 36% fewer than the previous lowest count. The mean time between those faults was 219.7 hours, an improvement of almost 100 hours over the previous record. And the longest time without a fault was 606 hours, 44 hours better than the longest previous uninterrupted period. (For a complete explanation of these terms, see the end of this article.)
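The "factor of two" remark follows from comparing unavailable time rather than available time. A minimal Python sketch of that arithmetic is given here; the previous-best availability of 99.1% is an assumption inferred from "nearly 0.5" better and is not a figure stated in this article, and the variable names are purely illustrative.

```python
# Figures quoted in the article for run 2011-3.
SCHEDULED_HOURS = 1545.0
availability_now = 0.996

# ASSUMPTION: previous best inferred from "nearly 0.5" better; not a quoted value.
availability_prev = 0.991

# Compare the fraction of scheduled time WITHOUT beam in each case.
unavailable_now = 1.0 - availability_now
unavailable_prev = 1.0 - availability_prev
reduction_factor = unavailable_prev / unavailable_now

# Approximate hours of scheduled time without beam in run 2011-3.
downtime_hours = SCHEDULED_HOURS * unavailable_now

print(f"Unavailable time fell by a factor of about {reduction_factor:.2f}")
print(f"Downtime in run 2011-3: roughly {downtime_hours:.1f} of {SCHEDULED_HOURS:.0f} h")
```

Because the quoted percentages are rounded, the sketch only indicates the size of the effect: a small gain in availability near 100% is a large relative cut in downtime.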
“Our microcrystallography beamlines are run as a high-throughput research program that allows many users to study a large number of proteins in a very short time,” said Bob Fischetti, Associate Director of the General Medicine and Cancer Institutes Collaborative Access Team, one of the facilities at the APS that specializes in the study of biological proteins. “Our users generally get only one day’s worth of beam time, so if there is a fault and they lose even one hour, that’s a significant fraction of their allotted time. Having a month where there are no faults at all, which occurred in this last run, is really fantastic for our users.”
A reliable source of x-rays is also important for future researchers and for dynamic studies. “There are groups that bring students here to work on their thesis experiment,” said Denis Keane, Director of the DuPont-Northwestern–Dow Collaborative Access Team at the APS, where users study materials and solve industrial-related problems. “A student may be scheduled for only three or four hours of beam and they have a very tight schedule. Any kind of minor disturbance in the scheduled beam time means a student won’t get their data until they come back perhaps six months later, which can have serious implications for their doctoral program.
“In cases where researchers are studying materials transformations or processing,” Keane said, “if the beam goes down during an irreversible change of the material or one that’s difficult to reproduce, that represents a loss of data that will be very difficult to re-collect.”
From its earliest days of operation, the APS has maintained an enviable rate of x-ray availability.
User-beam-run statistics kept by the Accelerator Operations and Physics Group in ASD show that since the fourth run of fiscal year 1994—when the APS began routinely scheduling more than 1,000 hours of x-rays each run for users—x-ray availability has averaged 97.2%, making the APS the standard-bearer for synchrotron light source performance world-wide.
That performance has been achieved with many thousands of pieces of sophisticated technology working in harmony.
Machines like the APS that accelerate electrons to nearly the speed of light are among the most technologically complex machines in existence. For instance, at last count the APS accelerator system (linac, particle accumulator ring, booster, and storage ring) included:
• More than 2,000 conventional electromagnets and 16 pulsed electromagnets;
• More than 700 beam-position monitors, 600 corrector magnets, and 80 computer systems to monitor and correct the orbit of the electrons and steer x-ray beams onto experiment samples to micro-tolerances;
• More than 120 computers monitoring the more than 25,000 signals that make up the radiation interlock systems protecting personnel and equipment;
• Beam diagnostics controlling multiple x-ray beams simultaneously while utilizing more than 500 ultrahigh-resolution beam-position monitors, each resolving beam motion that is a fraction of the size of the period at the end of a sentence, while nearly 100 remote computers collect data from the 500 monitors and re-steer x-rays 1,500 times per second;
• A beam control system that comprises 80 workstations, 227 distributed input/output computers (IOCs), more than 7,000 replaceable hardware components, and more than 100,000 IOC points monitoring or controlling more than 450,000 technical parameters.
“In one way or another, everyone at the APS contributes to our successful x-ray delivery program,” said Sasha Zholents, Director of ASD, the division that has primary responsibility for APS accelerator development. “Excellent technical design based on a sound understanding of accelerator physics, proactive preventive maintenance and quality assurance programs, and best-practice reliability engineering all help us keep machine operations at a record level of reliability.”
There are many groups in the APS organization that have direct responsibilities for one or more elements of the accelerator and beamline systems. All fulfill important roles in keeping the x-ray beams coming. In a short article it is not possible to mention all these roles, but a few examples follow.
Randy Flood of ASD is the Chief of Operations and supervisor of the APS Main Control Room (MCR), where accelerator operators and physicists monitor all accelerator systems and components during runs.
“MCR operators are extensively trained and periodically re-certified to minimize recovery times in a wide variety of potential beam-loss scenarios,” said Flood. “Our operators’ ability to quickly analyze and correct issues helps to mitigate faults when they do occur.
“The two major contributors to downtime every run are the radio-frequency [rf] and power supply systems,” Flood said, “because they have more equipment, and more complex equipment, that directly affects accelerator operations. The ASD Power Systems Group under Ju Wang and ASD Radio Frequency Group under Ali Nassiri have been doing intensive preventive maintenance for the last two or three years. This was the first run where we saw a significant payback. In fact, all of the technical groups have implemented preventive maintenance programs.”
“We instituted a proactive approach at the rf system and component level to predict potential problem areas ahead of time,” said Doug Horan, Nassiri’s deputy, “so that during the runs we minimize faults. For instance, each rf area — linac, particle accumulator ring, booster, and storage ring system — has its own experienced system engineer who is responsible for looking after all operational aspects of that system. The system engineers monitor and track the performance of rf systems routinely and are able to address problem areas proactively. Our rf technicians carry out twice-daily walkthroughs during runs so that they can let the system engineer know when something is not functioning correctly. The group chief technician holds regular weekly meetings with the technicians to review work assignments and to discuss potential problems. System engineers work closely with the chief tech to address problems before they become a headache for operations during the run.”
“A good example is the high-voltage capacitors that are used within the booster and storage ring high-voltage power systems,” said Gian Trento, also of the RF Group, who oversees the operation and maintenance of these systems. “A failure of one would trigger a cascade of failures throughout the system, so we immediately went into a program of upgrading all our capacitors.”
“From day one we have made sure that our power supply designs have been very sound,” said Ju Wang. “The extremely high reliability we enjoy now is also a result of years of continuous improvement and upgrade of our power supply equipment. The technicians who build and repair the power supplies are required to do quality work. And we follow the IPC-Association Connecting Electronics Industries standards for electronic work.
“Power supplies are like a consumable good: eventually, they will fail. We don’t wait for a failure. Our proactive preventive maintenance program lets us identify weak components early so that we can repair or replace them before they cause a problem. At the start of each run we do a very thorough power supply test and stress the power supplies, running them harder than they normally would during operations, trying to see if a component is operating outside parameters. When we find one that does, we replace it.
“We track power supply faults in a database—what the problem was, what kind of repairs were done—so that we can understand why a fault happened,” Wang said, “and whether it was a design weakness or a part that failed or a component that wore out after a number of years.
“Finally, our quality assurance program defines a QA process for each scenario (new components and repair of existing components), who is responsible for the process, and who has the authority for the process. Our goal is to provide the best power supply we can.”
George Goeppner is Group Leader of the Mechanical Operations and Maintenance Group (or MOM) in AES with responsibility for two major accelerator systems: vacuum and water. “MOM has the broadest responsibility for accelerator components,” Goeppner said. “Most of the work we do to assure reliable operations cuts across all four accelerators.
“Our group does everything from leak-checking the storage ring vacuum system to looking for radiation damage to cable insulation in the storage ring. We do routine inspections of the accelerator magnets and we upgrade the beamline and front-end pneumatics.
“We try to predict where there might be a component failure, and we have backups for most of the systems,” Goeppner said. “We’re seeing results from aggressive maintenance during the periods when the accelerator is between runs, and from being vigilant and proactive during operations periods.”
“One of our group’s roles is that of monitoring other things,” said Richard Farnsworth, leader of the Controls Group in AES. “We’ve done failure modes and effects analysis, because if you can predict what’s likely to fail, then you can direct your preventive maintenance toward those components and increase your statistical chance of not having a failure.
“What we aim for is 100% reliability 24 by 7, 365 days per year on all of our [controls] systems. The reliability bathtub curve, well known to reliability engineers, says that at the beginning of any process or system, you get early failures.
“We try to make sure that there are no untested pieces of software when we do upgrades at the end of every shutdown,” Farnsworth said, “and we do significant amounts of preventive maintenance with our hardware. As far as the work that preceded this particular run, we did exactly what we always do.”
Said John Grimmer, manager of APS insertion device operations in the ASD Mechanical Devices Group, “For many years, APS insertion devices have performed flawlessly, and have never been a significant factor that adversely impacts x-ray availability. This reflects not only the fault-free performance of APS insertion devices in relation to the electron beam in the storage ring, but also the 99.6% uptime record in delivery of x-ray beams to the APS insertion device beamlines. Obviously, run 2011-3 has continued this well-established tradition of performance.”
APS Director Brian Stephenson offered his congratulations to all. “As a long-time user of the APS, I have always been impressed and gratified by the ability of APS personnel to deliver x-rays as promised. Now, looking at it also from the management perspective, I can see exactly how much effort and ingenuity are required. I know all of us aim to maintain the highest possible level of performance for our x-ray source.”
Reliability and Availability
Storage Ring Reliability: A measure of the mean time between beam losses (faults), or MTBF. MTBF is calculated by dividing the hours of delivered beam by the total number of faults. The APS targets, and routinely exceeds, 70 hours MTBF. A fault is defined as complete unavailability of beam, either through beam loss or through removal of the shutter permit (i.e., the x-ray shutters, which allow the x-rays to reach the experiments, are not permitted to be opened). A fault also occurs when the beam has decayed to the point where stability and orbit can no longer be considered reliable. At the APS, this threshold is 50 mA, half the nominal operating value.
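As a rough illustration of the MTBF definition, the following Python sketch divides delivered beam hours by the number of faults, using the run 2011-3 figures quoted in this article. The function name is hypothetical, and the delivered hours are derived here from the rounded 99.6% availability, so the result only approximates the reported 219.7 hours.

```python
def mean_time_between_faults(delivered_hours: float, fault_count: int) -> float:
    """Return delivered beam hours divided by the number of faults."""
    if fault_count <= 0:
        # With no faults the ratio is undefined; report that explicitly.
        raise ValueError("MTBF is undefined for a run with no faults")
    return delivered_hours / fault_count


# Figures quoted in the article for run 2011-3.
SCHEDULED_HOURS = 1545.0
AVAILABILITY = 0.996        # rounded, so delivered hours are approximate
FAULTS = 7
MTBF_TARGET_HOURS = 70.0    # APS target stated in the definition

# ASSUMPTION: delivered hours estimated as scheduled hours times availability.
delivered = SCHEDULED_HOURS * AVAILABILITY
mtbf = mean_time_between_faults(delivered, FAULTS)

print(f"Delivered beam: about {delivered:.0f} h")
print(f"MTBF: about {mtbf:.0f} h (target {MTBF_TARGET_HOURS:.0f} h)")
print("Target exceeded" if mtbf > MTBF_TARGET_HOURS else "Target missed")
```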
X-ray Availability: The number of hours that the beam is available to the users divided by the number of hours of beam delivery that were scheduled before the beginning of the run. Beam counts as available when the APS Main Control Room has granted users permission to open their shutters and there is more than 50 mA of stored beam in the storage ring.
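The availability definition can be sketched in Python as a sum over logged intervals, counting only those where both conditions hold. The `Interval` record, the function names, and the 24-hour log are made up for illustration and do not describe actual APS software or data; only the 50 mA threshold and the 100 mA nominal current implied by "half the nominal operating value" come from this article.

```python
from dataclasses import dataclass

CURRENT_THRESHOLD_MA = 50.0  # from the article: stored beam must exceed 50 mA


@dataclass
class Interval:
    hours: float               # length of the logged interval
    shutter_permit: bool       # True if users are permitted to open shutters
    stored_current_ma: float   # stored beam current during the interval


def is_available(interval: Interval) -> bool:
    """Both conditions from the definition must hold at the same time."""
    return interval.shutter_permit and interval.stored_current_ma > CURRENT_THRESHOLD_MA


def xray_availability(intervals: list[Interval], scheduled_hours: float) -> float:
    """Available hours divided by the hours scheduled before the run."""
    available = sum(i.hours for i in intervals if is_available(i))
    return available / scheduled_hours


# Hypothetical 24-hour log, invented for this example.
log = [
    Interval(20.0, True, 100.0),   # normal delivery at nominal current
    Interval(1.0, False, 0.0),     # beam lost: a fault
    Interval(0.5, False, 100.0),   # beam restored, permit not yet granted
    Interval(2.5, True, 100.0),    # delivery resumes
]

availability = xray_availability(log, scheduled_hours=24.0)
print(f"Availability: {availability:.2%}")
```

Note that the half hour with beam restored but no shutter permit still counts as unavailable, since the definition requires the permit and the current threshold together.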