[Xrays@aps.anl.gov] 2006 XSD Scientific Software Workshop User Survey

Bruce Ravel bravel at anl.gov
Fri Jun 16 10:53:01 CDT 2006


> Dear Colleague,
>
> We have been asked by the XSD Division to organize a workshop to
> determine our fundamental needs and opportunities in scientific
> software systems for x-ray data reduction, analysis, modeling and
> simulation.  The workshop has been scheduled for August 29, 2006
> at the Advanced Photon Source.
>
> In order to prepare for this workshop we would like your input on
> what you see as the needs and opportunities for scientific
> software development at the APS and in the X-ray community, as
> well as information that would support making a funding proposal
> for such resources.  In particular:
>
> 1. What are the limitations of current tools for
>    x-ray data reduction, analysis, modeling, and simulation?
>
> 2. What additional tools are needed?
>
> 3. How can the existing tools be improved?
>
> 4. What will most affect the scientific impact of your work?


I think I come at this survey from a different perspective than many
of your respondents.  Many of my software needs are met by my own work
and by my collaboration with one of the members of your committee.
Given that background, the most serious shortcomings I see in the
software landscape are infrastructural rather than end product.  I see
three big areas that need attention.  I'll illustrate these by example
from my own experience.

Fundamental data
================
  One of my software efforts is a little tool based on the periodic
  table and on tables of x-ray absorption coefficients.  Hephaestus
  (most of my programs have names that come from a minor obsession
  with mythology) tries to provide many of the small but useful
  calculations that one needs while preparing for an XAS experiment or
  while sitting at the beam-line.  It does the additional interesting
  thing of incorporating five different tabulations of absorption
  coefficients, thus allowing the user to compare and contrast.

  When I started writing Hephaestus, these various tabulations existed
  in a variety of inconvenient forms.  One had been transcribed from a
  paper publication into a Fortran program.  Three others were hidden
  behind web interfaces, each using a collection of data files, one
  per element, as the repository of the information.  The most
  recently published was an oddly formatted flat text file available
  as the supplementary information from the journal in which it was
  published.  What a mess!

  I gathered these various resources together and wrote a bunch of
  one-off scripts to convert all of the data into a database.  I then
  encapsulated these data behind a bit of OO code written in the same
  language as Hephaestus.  That served my purpose very well -- I was
  able to access data from any resource with a common interface.  My
  solution, however, serves other developers very poorly -- unless
  they happen to choose the same programming language I use.

  The APS could serve its users very well by identifying information
  fundamental to the various experimental disciplines, encapsulating
  that data into a package using a low-level language, and providing a
  nice wrapping mechanism such that the information can be deployed in
  any context and using any programming language.  The model I have in
  mind is a database + a wrapper written in C.  SWIG could then be
  used with almost any language.
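
  As a rough illustration of the "common interface" idea, here is a
  minimal sketch in Python using the standard sqlite3 module.  It only
  shows the shape of such an interface; it is not the C wrapper
  proposed above, and it is not the code behind Hephaestus.  The
  schema, the class and method names, the source labels "table_a" and
  "table_b", the element symbol "Xx", and every number are
  placeholders invented for the example.  Linear interpolation is
  likewise an assumption made to keep the sketch short.

```python
import sqlite3

# Hypothetical schema: one row per (tabulation, element, energy) point.
# Columns: source tabulation, element symbol, photon energy, coefficient.
SCHEMA = """
CREATE TABLE absorption (
    source  TEXT NOT NULL,
    element TEXT NOT NULL,
    energy  REAL NOT NULL,
    mu      REAL NOT NULL
)
"""

_BELOW = ("SELECT energy, mu FROM absorption "
          "WHERE source = ? AND element = ? AND energy <= ? "
          "ORDER BY energy DESC LIMIT 1")
_ABOVE = ("SELECT energy, mu FROM absorption "
          "WHERE source = ? AND element = ? AND energy >= ? "
          "ORDER BY energy ASC LIMIT 1")


class AbsorptionTables:
    """One interface over several tabulations kept in a single database."""

    def __init__(self, connection):
        self.db = connection

    def sources(self):
        """Names of all tabulations present in the database."""
        rows = self.db.execute(
            "SELECT DISTINCT source FROM absorption ORDER BY source")
        return [row[0] for row in rows]

    def mu(self, source, element, energy):
        """Linearly interpolate between the two bracketing table rows."""
        args = (source, element, energy)
        below = self.db.execute(_BELOW, args).fetchone()
        above = self.db.execute(_ABOVE, args).fetchone()
        if below is None or above is None:
            raise ValueError("energy outside the tabulated range")
        if above[0] == below[0]:
            return below[1]
        frac = (energy - below[0]) / (above[0] - below[0])
        return below[1] + frac * (above[1] - below[1])

    def compare(self, element, energy):
        """Same query against every tabulation, for side-by-side comparison."""
        return {s: self.mu(s, element, energy) for s in self.sources()}


def demo_tables():
    """Build an in-memory database filled with placeholder numbers only."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO absorption VALUES (?, ?, ?, ?)", [
        ("table_a", "Xx", 1000.0, 10.0),
        ("table_a", "Xx", 2000.0, 20.0),
        ("table_b", "Xx", 1000.0, 12.0),
        ("table_b", "Xx", 2000.0, 24.0),
    ])
    return AbsorptionTables(db)
```

  A caller would ask demo_tables().compare("Xx", 1500.0) and get one
  value per tabulation without knowing how any of them was originally
  stored.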


APIs to theory calculations
===========================

  Theory is great!  Without it, we would be stumped by most of what we
  measure at the APS.  As great as theory is, it is all-too-often
  written by brilliant scientists with no training in computer
  science.  The consequence is that many theory packages are clunky,
  monolithic black boxes that are difficult to deploy in contexts that
  their developers did not think of.

  My main experience is with Feff, a popular, first-principles,
  absorption spectroscopy program.  When version 5 appeared back in
  the early 90s, it was a hit and fueled a rapid development of the
  XAS technique.  Sadly, Feff remains stuck in the early 90s from a
  user interface perspective.  Information goes into Feff via a flat
  text input file and information comes out of Feff via inconsistently
  formatted flat text files.  Subsequent software developed to use
  Feff as its theory engine must deal with a clumsy and fragile system
  of file-based IO and system calls.

  Feff would be much more powerful with some kind of API.  One part of
  the code uses previously-computed atomic potentials to calculate the
  contribution from a multiple scattering geometry given a set of
  atomic coordinates.  As part of a data analysis package, it would be
  attractive to be able to move atom positions and recompute the
  contribution on the fly.  Currently, such a thing requires file IO
  and system calls.  The capacity to do this via an API would allow
  the development of different, interesting analysis software.
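
  To make the contrast with file IO concrete, here is a hedged sketch
  in Python of what such an in-memory interface might look like.  This
  is emphatically not Feff's interface: the class PathCalculator and
  its methods are hypothetical, the potentials are treated as an
  opaque object computed once elsewhere, and the quantity returned
  (half the total length of a closed scattering path) is simple
  geometry standing in for the real scattering calculation.

```python
import math


class PathCalculator:
    """Hypothetical in-memory interface; NOT the API of Feff or any real code.

    `potentials` stands in for previously computed atomic potentials.  It is
    stored once and reused, so moving an atom needs no files or system calls.
    """

    def __init__(self, potentials, coordinates):
        self.potentials = potentials
        # Map of atom label -> (x, y, z) position.
        self.coordinates = {name: tuple(xyz)
                            for name, xyz in coordinates.items()}

    def move_atom(self, name, xyz):
        """Update one atom position in place."""
        if name not in self.coordinates:
            raise KeyError(name)
        self.coordinates[name] = tuple(xyz)

    def half_path_length(self, path):
        """Half the total length of a closed scattering path.

        `path` lists atom labels starting at the absorber; the final leg
        back to the absorber is implied.  This is a placeholder for the
        real contribution calculation, chosen because it is plain geometry.
        """
        stops = list(path) + [path[0]]
        total = sum(math.dist(self.coordinates[a], self.coordinates[b])
                    for a, b in zip(stops, stops[1:]))
        return total / 2.0
```

  An analysis program could then call move_atom() inside a fitting
  loop and ask for the recomputed value each time, with the potentials
  never leaving memory.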

  I fully understand that Feff is a poor example of something that the
  APS could work on.  Feff is the research project of a particular
  university research group and is encumbered with a restrictive
  license.  The point of this example was to illustrate a way in which
  a dedicated synchrotron software group could serve its constituency
  by making established theory available for novel uses.
  


Documentation of file formats
=============================

  When I collect imaging data from 2ID-D, I come home with maps in the
  mda file format.  Fortunately, I get to use Stefan Vogt's MAPS to
  import these files and to export individual pixel spectra in an
  easier-to-use format.

  But ... what's an mda file?  Perhaps its contents are documented
  somewhere, but that place isn't easy to find.  Perhaps I don't
  google cleverly enough, but I was never able to find any
  documentation on its contents.  It may have been helpful to me to be
  able to examine the data independently of MAPS, but I never had that
  option.  I am not opposed to big, binary blobs in principle.  But
  without documentation and a clean API, they are not particularly
  useful.

  Good documentation of file formats used on the floor and a central,
  easy-to-locate repository of information would be very useful
  indeed.
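
  As a small illustration of why a written-down layout matters, here
  is a sketch in Python of a reader for an invented binary format.
  The "DEMO" layout in the comment is entirely hypothetical and has
  nothing to do with the actual contents of an mda file.  The point is
  only that, once the byte layout is documented, a few lines in any
  language suffice to examine the data independently.

```python
import struct

# Hypothetical layout, written the way a format document might state it.
# This is NOT the mda format; every field below is invented.
#   bytes 0-3   magic, ASCII "DEMO"
#   bytes 4-5   format version, unsigned 16-bit, big-endian
#   bytes 6-9   number of points N, unsigned 32-bit, big-endian
#   bytes 10-   N values, 64-bit IEEE floats, big-endian
HEADER = struct.Struct(">4sHI")


def write_blob(values, version=1):
    """Serialize a list of floats according to the layout above."""
    header = HEADER.pack(b"DEMO", version, len(values))
    return header + struct.pack(f">{len(values)}d", *values)


def read_blob(blob):
    """Parse a blob written by write_blob; return (version, values)."""
    magic, version, count = HEADER.unpack_from(blob, 0)
    if magic != b"DEMO":
        raise ValueError("not a DEMO blob")
    values = struct.unpack_from(f">{count}d", blob, HEADER.size)
    return version, list(values)
```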






There is a theme to this long-winded email.  None of my concerns are
about end-user software products.  Each is about the things that are
used to build end-user software products.  The APS draws lots of
clever, motivated people to its doors.  Were those clever, motivated
people given nice tools, they could build great things.

Hope that helps,
B


-- 
 Bruce Ravel  ---------------------------------------------- bravel at anl.gov

 Molecular Environmental Science Group, Building 203, Room E-165
 MRCAT, Sector 10, Advanced Photon Source, Building 433, Room B007

 Argonne National Laboratory         phone and voice mail: (1) 630 252 5033
 Argonne IL 60439, USA                                fax: (1) 630 252 9793

 My homepage:    http://cars9.uchicago.edu/~ravel 
 EXAFS software: http://cars9.uchicago.edu/~ravel/software/exafs/




