[Xrays@aps.anl.gov] 2006 XSD Scientific Software Workshop User Survey
Bruce Ravel
bravel at anl.gov
Fri Jun 16 10:53:01 CDT 2006
> Dear Colleague,
>
> We have been asked by the XSD Division to organize a workshop to
> determine our fundamental needs and opportunities in scientific
> software systems for x-ray data reduction, analysis, modeling and
> simulation. The workshop has been scheduled for August 29, 2006
> at the Advanced Photon Source.
>
> In order to prepare for this workshop we would like your input on
> what you see as the needs and opportunities for scientific
> software development at the APS and in the X-ray community, as
> well as information that would support making a funding proposal
> for such resources. In particular:
>
> 1. What are the limitations of current tools for
> x-ray data reduction, analysis, modeling, and simulation?
>
> 2. What additional tools are needed?
>
> 3. How can the existing tools be improved?
>
> 4. What will most affect the scientific impact of your work?
I think I come at this survey from a different perspective than many
of your respondents. Many of my software needs are met by my own work
and by my collaboration with one of the members of your committee.
Given that background, the most serious shortcomings I see in the
software landscape are infrastructural rather than end product. I see
three big areas that need attention. I'll illustrate these by example
from my own experience.
Fundamental data
================
One of my software efforts is a little tool based on the periodic
table and on tables of x-ray absorption coefficients. Hephaestus
(most of my programs have names that come from a minor obsession
with mythology) tries to provide many of the small but useful
calculations that one needs while preparing for an XAS experiment or
while sitting at the beam-line. It does the additional interesting
thing of incorporating five different tabulations of absorption
coefficients, thus allowing the user to compare and contrast.
When I started writing Hephaestus, these various tabulations existed
in a variety of inconvenient forms. One had been transcribed from a
paper publication into a Fortran program. Three others were hidden
behind web interfaces, each using a collection of data files, one
per element, as the repository of the information. The most
recently published was an oddly formatted flat text file available
as the supplementary information from the journal in which it was
published. What a mess!
I gathered these various resources together and wrote a bunch of
one-off scripts to convert all of the data into a database. I then
encapsulated these data behind a bit of OO code written in the same
language as Hephaestus. That served my purpose very well -- I was
able to access data from any resource with a common interface. My
solution, however, serves other developers very poorly -- unless
they happen to choose the same programming language I use.
The APS could serve its users very well by identifying information
fundamental to the various experimental disciplines, encapsulating
that data into a package using a low-level language, and providing a
nice wrapping mechanism such that the information can be deployed in
any context and using any programming language. The model I have in
mind is a database plus a wrapper written in C. SWIG could then be
used to generate bindings for almost any language.
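A minimal sketch of the "one database, one common interface" idea
follows. It uses Python and the standard sqlite3 module purely for
brevity, not the C-plus-SWIG stack proposed above. The table layout,
the source names "table_a" and "table_b", and every number in it are
placeholders invented for illustration; none of them are real
absorption data. Plain linear interpolation is used only to keep the
example short.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding placeholder tabulations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE absorption ("
        " source TEXT, element TEXT, energy_ev REAL, mu REAL)"
    )
    # Placeholder rows only: these numbers are NOT real coefficients.
    rows = [
        ("table_a", "Fe", 7000.0, 1.0),
        ("table_a", "Fe", 8000.0, 2.0),
        ("table_b", "Fe", 7000.0, 1.5),
        ("table_b", "Fe", 8000.0, 2.5),
    ]
    conn.executemany("INSERT INTO absorption VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup(conn, source, element, energy_ev):
    """Interpolate linearly between the two bracketing tabulated points.

    Returns None when the energy lies outside the tabulated range.
    """
    base = (
        "SELECT energy_ev, mu FROM absorption"
        " WHERE source = ? AND element = ? AND energy_ev {} ?"
        " ORDER BY energy_ev {} LIMIT 1"
    )
    args = (source, element, energy_ev)
    below = conn.execute(base.format("<=", "DESC"), args).fetchone()
    above = conn.execute(base.format(">=", "ASC"), args).fetchone()
    if below is None or above is None:
        return None
    (e0, m0), (e1, m1) = below, above
    if e1 == e0:
        return m0  # the energy is exactly a tabulated point
    return m0 + (m1 - m0) * (energy_ev - e0) / (e1 - e0)


def compare(conn, element, energy_ev):
    """Return {source: value} for every tabulation that has the element."""
    query = (
        "SELECT DISTINCT source FROM absorption"
        " WHERE element = ? ORDER BY source"
    )
    sources = [row[0] for row in conn.execute(query, (element,))]
    return {s: lookup(conn, s, element, energy_ev) for s in sources}
```

A caller would write something like compare(build_demo_db(), "Fe",
7500.0) and never need to know how each tabulation was originally
distributed.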
APIs to theory calculations
===========================
Theory is great! Without it, we would be stumped by most of what we
measure at the APS. As great as theory is, it is all-too-often
written by brilliant scientists with no training in computer
science. The consequence is that many theory packages are clunky,
monolithic black boxes that are difficult to deploy in contexts that
their developers did not anticipate.
My main experience is with Feff, a popular, first-principles,
absorption spectroscopy program. When version 5 appeared back in
the early 90s, it was a hit and fueled a rapid development of the
XAS technique. Sadly, Feff remains stuck in the early 90s from a
user interface perspective. Information goes into Feff via a flat
text input file and information comes out of Feff via inconsistently
formatted flat text files. Subsequent software developed to use
Feff as its theory engine must deal with a clumsy and fragile system
of file-based IO and system calls.
Feff would be much more powerful with some kind of API. One part of
the code uses previously-computed atomic potentials to calculate the
contribution from a multiple scattering geometry given a set of
atomic coordinates. As part of a data analysis package, it would be
attractive to be able to move atom positions and recompute the
contribution on the fly. Currently, such a thing requires file IO
and system calls. The capacity to do this via an API would allow
the development of different, interesting analysis software.
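To make "move atom positions and recompute on the fly" concrete,
here is a toy sketch in Python. The ScatteringPath class and its
methods are hypothetical and are not part of Feff. The real
scattering calculation is replaced by a purely geometric quantity,
half the length of the closed path, so the example shows only the
shape of an in-memory interface, with no input file and no system
call between moving an atom and getting a new answer.

```python
import math


class ScatteringPath:
    """Toy stand-in for a scattering path held in memory.

    Hypothetical class: it is not part of Feff and computes geometry only.
    """

    def __init__(self, coords):
        # coords: sequence of (x, y, z) positions. The first entry is the
        # absorber, and the path closes by returning to it.
        self.coords = [tuple(float(v) for v in c) for c in coords]

    def move_atom(self, index, dx=0.0, dy=0.0, dz=0.0):
        """Displace one atom in place; nothing is written to disk."""
        x, y, z = self.coords[index]
        self.coords[index] = (x + dx, y + dy, z + dz)

    def half_length(self):
        """Return half of the total length of the closed path."""
        closed = self.coords + [self.coords[0]]
        total = sum(math.dist(a, b) for a, b in zip(closed, closed[1:]))
        return total / 2.0
```

An analysis program could then call move_atom() inside a fitting loop
and read half_length() (or, in a real API, the recomputed
contribution) after each step.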
I fully understand that Feff is a poor example of something that the
APS could work on. Feff is the research project of a particular
university research group and is encumbered with a restrictive
license. The point of this example was to illustrate a way in which
a dedicated synchrotron software group could serve its constituency
by making established theory available for novel uses.
Documentation of file formats
=============================
When I collect imaging data from 2ID-D, I come home with maps in the
mda file format. Fortunately, I get to use Stefan Vogt's MAPS to
import these files and to export individual pixel spectra in an
easier-to-use format.
But ... what's an mda file? Perhaps its contents are documented
somewhere, but that place isn't easy to find. Perhaps I don't
google cleverly enough, but I was never able to find any
documentation on its contents. It may have been helpful to me to be
able to examine the data independently of MAPS, but I never had that
option. I am not opposed to big, binary blobs in principle. But
without documentation and a clean API, they are not particularly
useful.
Good documentation of file formats used on the floor and a central,
easy-to-locate repository of information would be very useful
indeed.
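As an illustration of how little it takes to make a binary format
usable by third parties, here is a sketch in Python. The "DEMO"
layout below is made up for this example and has nothing to do with
the actual mda format. The point is only that once the byte layout is
written down in one place, a reader and a writer fit in a few lines.

```python
import struct

# Hypothetical layout, documented in one place (all little-endian):
#   bytes 0-3   magic string b"DEMO"
#   bytes 4-5   format version, unsigned 16-bit integer
#   bytes 6-7   number of columns (nx), unsigned 16-bit integer
#   bytes 8-9   number of rows (ny), unsigned 16-bit integer
#   bytes 10-   nx * ny 32-bit floats, stored row by row
HEADER = struct.Struct("<4sHHH")
MAGIC = b"DEMO"


def pack_map(rows, version=1):
    """Serialize a rectangular list of rows into the layout above."""
    ny = len(rows)
    nx = len(rows[0]) if rows else 0
    flat = [value for row in rows for value in row]
    body = struct.pack(f"<{nx * ny}f", *flat)
    return HEADER.pack(MAGIC, version, nx, ny) + body


def unpack_map(blob):
    """Parse bytes in the layout above; return (version, rows)."""
    magic, version, nx, ny = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a DEMO file")
    flat = struct.unpack_from(f"<{nx * ny}f", blob, HEADER.size)
    rows = [list(flat[r * nx:(r + 1) * nx]) for r in range(ny)]
    return version, rows
```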
There is a theme to this long-winded email. None of my concerns are
about end-user software products. Each is about the things that are
used to build end-user software products. The APS draws lots of
clever, motivated people to its doors. Were those clever, motivated
people given nice tools, they could build great things.
Hope that helps,
B
--
Bruce Ravel ---------------------------------------------- bravel at anl.gov
Molecular Environmental Science Group, Building 203, Room E-165
MRCAT, Sector 10, Advanced Photon Source, Building 433, Room B007
Argonne National Laboratory phone and voice mail: (1) 630 252 5033
Argonne IL 60439, USA fax: (1) 630 252 9793
My homepage: http://cars9.uchicago.edu/~ravel
EXAFS software: http://cars9.uchicago.edu/~ravel/software/exafs/