The AgentX Framework.

Introduction.

Increasingly, several computational codes need to be used in close co-operation to address particular scientific problems. There are many examples in the domain of biology where, for example, hybrid methods must be employed to study complex systems such as enzymes. In order to form such workflows, information needs to pass between the applications in an automated fashion. Unfortunately, this is complicated by the fact that most simulation codes work with their own well-established format for data and exchanging information between them often requires converters and wrappers to be written. This is a very error-prone and time-consuming process.

Communities working with a particular collection of simulation codes will develop tools that can be used to analyse or visualise data generated from these codes. Often, the functionality of these tools will be similar to that of tools developed by other communities. Very few computational communities employ shared data standards and this makes it difficult for them to share data analysis tools. Problems exchanging information between simulation codes and analysis tools lead to much duplication of effort that detracts from the original scientific goals.

Often, data generated during experimental processes require some post-processing before they can be used as input for computational simulations. Information needs to be exchanged between software used to represent information gathered from the experiment and other applications used to post-process and analyse the data. Users of experimental facilities may wish to use a range of applications from an increasing set available for data processing. The development and maintenance of bespoke converters and wrappers that are required to facilitate this exchange of information is a heavy drain on available resources.

The CCLRC e-Science Centre has developed a framework, called AgentX, which allows the simple and automated exchange of information between components of a scientific workflow. AgentX makes use of some of the latest standards for managing scientific information, adopting technologies from the W3C and semantic web communities. The framework consists of three components: ontology, mappings and a library. The library can be used directly by scientific applications to extract information from data documents generated from any data source. Once an application has been interfaced to AgentX, it is possible for it to extract information without the need to understand the document format; the developer does not have to put any more effort into wrappers and converters. AgentX allows different applications to be used together in a plug-and-play fashion, facilitating some of the processes outlined above. The library can be used by applications written in various languages (Python, Perl, Fortran, C, C++) and is portable across a range of platforms.
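
The role of the mappings can be illustrated with a minimal sketch. It is not the AgentX API: the concept names ("Atom", "element"), the two XML formats and the function select_elements are all hypothetical, chosen only to show how one query phrased in terms of shared concepts might be resolved against documents with different layouts.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings from shared concepts ("Atom", "element") to the
# place where that information lives in two made-up XML formats.
MAPPINGS = {
    "format_a": {"Atom": ".//atom", "element": "elementType"},
    "format_b": {"Atom": ".//site", "element": "species"},
}

DOC_A = "<molecule><atom elementType='O'/><atom elementType='H'/></molecule>"
DOC_B = "<cell><site species='Si'/><site species='O'/></cell>"


def select_elements(xml_text, fmt):
    """Return the element symbol of every Atom, whatever the document format."""
    mapping = MAPPINGS[fmt]
    root = ET.fromstring(xml_text)
    # The caller asks for concepts; the mapping supplies the format details.
    return [node.get(mapping["element"]) for node in root.findall(mapping["Atom"])]


print(select_elements(DOC_A, "format_a"))
print(select_elements(DOC_B, "format_b"))
```

The calling code never names a tag or attribute of either format; supporting a third format would mean adding a mapping entry rather than writing a new converter.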

Work Packages.

1. AgentX core development.

Plug-in architecture.

AgentX can be used with XML data documents and makes use of the libxml2 library for parsing. However, AgentX could be developed to work with any data source. This would be an extremely important development: any application that has been interfaced to AgentX would be able, with no further modification, to extract information from many different data sources with different underlying data models. For example, quantum chemistry codes like GAMESS-UK, which have already been interfaced to AgentX to work with XML, would be able to extract information from binary files (such as those generated using HDF/netCDF, NEXUS) and relational databases. AgentX needs to support plug-ins, where each plug-in will manage a different data source (XML, HDF, relational databases). This requires substantial development of the AgentX core. There is a significant opportunity here to work together with the Digital Curation Centre on data access and preservation issues.
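
One possible shape for such a plug-in layer is sketched below. This is an illustration under stated assumptions, not the AgentX design: the class names, the values method, the "energy" concept and the database schema are all hypothetical. The point is that the application function depends only on the plug-in interface, so the same call works against XML and against a relational database.

```python
import sqlite3
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class DataSourcePlugin(ABC):
    """Hypothetical plug-in interface: one subclass per kind of data source."""

    @abstractmethod
    def values(self, concept):
        """Return all values stored for the named concept, as strings."""


class XMLPlugin(DataSourcePlugin):
    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)

    def values(self, concept):
        # Assumption: the concept name matches the XML tag name.
        return [node.text for node in self._root.iter(concept)]


class SQLitePlugin(DataSourcePlugin):
    def __init__(self, connection):
        self._connection = connection

    def values(self, concept):
        # Assumption: a table "data" with columns "concept" and "value".
        rows = self._connection.execute(
            "SELECT value FROM data WHERE concept = ? ORDER BY rowid", (concept,)
        )
        return [row[0] for row in rows]


def read_energies(source):
    """Application code: it only knows the plug-in interface."""
    return [float(value) for value in source.values("energy")]


def demo_sources():
    """Build one XML source and one database source holding the same value."""
    xml_source = XMLPlugin("<run><energy>-76.02</energy></run>")
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE data (concept TEXT, value TEXT)")
    connection.execute("INSERT INTO data VALUES ('energy', '-76.02')")
    return xml_source, SQLitePlugin(connection)


xml_source, db_source = demo_sources()
print(read_energies(xml_source), read_energies(db_source))
```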

Large data sets.

Many simulation codes produce large amounts of data that must be efficiently represented in the output files. XML is advocated as being a good standard for representing data because it is portable, self-describing and there are many existing tools that can be employed to parse the documents. However, it is difficult to use XML for large data sets; marked-up documents become very large and popular methods used for parsing (DOM-based) become impractical. The AgentX core needs to be developed so that it implements recent research into efficient index-based schemes for handling large XML documents. This will enable applications to efficiently extract information from very large data documents, such as those generated during molecular dynamics simulations or experimental runs.
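
The cost of DOM-based parsing comes from holding the whole document tree in memory at once. The sketch below shows a simpler alternative, incremental (streaming) parsing with the Python standard library, to make the memory argument concrete. It is not one of the index-based schemes referred to above, and the element names "step" and "energy" are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a large trajectory file (hypothetical element names).
SAMPLE = (
    b"<trajectory>"
    b"<step><energy>1.5</energy></step>"
    b"<step><energy>2.5</energy></step>"
    b"</trajectory>"
)


def sum_step_energies(stream):
    """Sum the text of every <energy> element while parsing incrementally."""
    total = 0.0
    count = 0
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == "energy":
            total += float(element.text)
            count += 1
        elif element.tag == "step":
            # Release the contents of the finished step; only an empty
            # element stays attached to the root.
            element.clear()
    return count, total


print(sum_step_energies(io.BytesIO(SAMPLE)))
```

Streaming suits a single pass over the data. An index-based scheme goes further by allowing direct access to a chosen part of the document without reading everything before it.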

Linking across data sources.

It is a common requirement to be able to work with data from a range of data sources. For example, large data sets relating to molecular orbital vectors generated from a quantum chemistry simulation could be stored in binary files while the molecular structure may be represented using XML. Currently, RDF provides the best way of linking data sets across different sources. AgentX needs to be able to handle these RDF assertions to enable an application to move between data from different sources in a seamless, automated way.
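
An RDF assertion is a subject-predicate-object triple, and following such triples is what lets an application move from one source to another. The sketch below models the idea with plain Python tuples rather than an RDF library; the identifiers, the predicate names ("storedIn", "hasOrbitalVectors") and the file names are all hypothetical.

```python
# Each assertion is a (subject, predicate, object) triple, as in RDF.
# All identifiers, predicates and file names here are hypothetical.
TRIPLES = [
    ("run42#molecule", "storedIn", "run42.xml"),
    ("run42#orbitals", "storedIn", "run42_vectors.h5"),
    ("run42#molecule", "hasOrbitalVectors", "run42#orbitals"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def locate_linked(subject, link, triples=TRIPLES):
    """Follow `link` from `subject` and report where each linked item is stored."""
    return {
        target: objects(target, "storedIn", triples)
        for target in objects(subject, link, triples)
    }


print(locate_linked("run42#molecule", "hasOrbitalVectors"))
```

Starting from the molecule described in the XML document, the application learns that its orbital vectors live in a separate binary file, without that file name being hard-coded anywhere in the application.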

Rules and inferences.

AgentX can be used to extract information from a data source using a series of logical queries. It is the ontology and mappings that provide AgentX with the information used to evaluate these queries. Currently, all this information has to be provided explicitly, even if some of it can be inferred. The AgentX architecture is being developed to support the long-term incorporation of technologies like inference engines (e.g. racer). These can be used in conjunction with a rule-based system to make relationships explicit. There is a significant opportunity to investigate how AgentX might work alongside rule-based data management systems like iRODS (the successor to SRB).
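
To show what "making relationships explicit" could mean in practice, the following is a minimal forward-chaining sketch over triples. It is far simpler than a description-logic reasoner and does not reflect how AgentX or any particular inference engine works; the two rules and the example facts are assumptions chosen for illustration.

```python
def infer(facts, rules):
    """Apply the rules repeatedly until no new facts appear (forward chaining)."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


def subclass_transitivity(facts):
    # If A subClassOf B and B subClassOf C, then A subClassOf C.
    return {
        (a, "subClassOf", c)
        for a, p1, b in facts if p1 == "subClassOf"
        for b2, p2, c in facts if p2 == "subClassOf" and b2 == b
    }


def type_inheritance(facts):
    # If x isA A and A subClassOf B, then x isA B.
    return {
        (x, "isA", b)
        for x, p1, a in facts if p1 == "isA"
        for a2, p2, b in facts if p2 == "subClassOf" and a2 == a
    }


# Hypothetical explicit facts; everything else is derived by the rules.
FACTS = {
    ("HydrogenAtom", "subClassOf", "Atom"),
    ("Atom", "subClassOf", "Particle"),
    ("h1", "isA", "HydrogenAtom"),
}

closure = infer(FACTS, [subclass_transitivity, type_inheritance])
print(sorted(closure - FACTS))
```

Only three facts are stated, yet a query for every "Particle" would now find h1, because the rules have added the implied relationships to the set of known facts.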

2. Tool development.

AgentX Toolkit.

The AgentX framework provides a layer of abstraction away from the physical structure of data. This makes it a very powerful layer upon which higher-level tools can be developed. These tools will be more domain specific and of a direct interest to non-application-developer scientists. The continued development and maintenance of new and existing (AXGraph, AXTransform and AXGrep) tools is required. These tools are designed to make the examination and manipulation of data simpler for the scientist. For example, AXGraph makes it possible to produce an SVG graph from simulation code or experimental data, which can then be viewed in a web browser. This could be extended to a range of other visualisation techniques.
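
As an indication of how little is needed to get from extracted numbers to a graph that a browser can display, the sketch below writes a line graph as SVG using only the Python standard library. It is not the AXGraph implementation; the function name, default sizes and sample data are assumptions.

```python
def svg_line_graph(points, width=300, height=200, margin=20):
    """Return an SVG document that plots (x, y) points as a polyline."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Avoid division by zero when all values on an axis are equal.
    x_span = (max(xs) - min(xs)) or 1.0
    y_span = (max(ys) - min(ys)) or 1.0

    def to_pixels(x, y):
        px = margin + (x - min(xs)) / x_span * (width - 2 * margin)
        # The SVG y axis points down, so the data axis is flipped.
        py = height - margin - (y - min(ys)) / y_span * (height - 2 * margin)
        return f"{px:.1f},{py:.1f}"

    path = " ".join(to_pixels(x, y) for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{path}"/>'
        "</svg>"
    )


# Hypothetical data, e.g. an energy value recorded at three steps.
svg = svg_line_graph([(0, 0.0), (1, 1.0), (2, 0.5)])
print(svg)
```

Saving the returned string to a file with an .svg extension is enough for it to be opened in a web browser.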

3. Testing, quality control and documentation.

It is important that software development best practice is employed so that AgentX is simple to deploy and maintain, and so that both can be done with a good level of confidence. Resources need to be allocated for the development of test frameworks and the implementation of configuration, build and installation standards (e.g. GNU Autotools).

4. User support/integration.

Supporting existing communities.

The AgentX Framework has been developed in a generic way and is already being used within a number of core e-Science projects (MaterialsGrid, e-Minerals), by developers of computational applications (GAMESS-UK, SIESTA, DL_POLY) and visualisation and analysis tools (CCP1-GUI, AtomEye, Ambrosia) and by end-user scientists; a continued level of support is required for these communities (e.g. bug fixes, new features).

Supporting new communities.

Workshops and training events are required to introduce new communities to AgentX and other associated data management concepts. In addition, support is required to interface new applications to AgentX (e.g. CSED simulation codes and analysis tools). This support will be vital in the development work required to interface AgentX to applications used in CCLRC Departments and Large Scale Facilities.

AgentX (last edited 2009-02-11 10:28:15 by RobAllan)

This website maintained by Research Computing Services, University of Manchester