Scalable Software Services for Life Science

Standards for Data Exchange and Management

About

We aim to develop and promote standards for the storage and exchange of the ever-increasing amount of simulation data in life science. This will include XML representations of input/output file formats for simulation data, standardized compressed file formats for simulation data, and high-level APIs.

We are currently working on:

  • Analyzing requirements for data storage and exchange formats in life science
  • Formulating file format standards for complex molecular data and job descriptions
  • Developing open application programming interfaces (APIs) for molecular modelling
  • Setting up automatic procedures for the setup and analysis of MD simulations
  • Developing a database structure for storing trajectories

Work in progress

  • Data Storage System
    • Molecular dynamics trajectories constitute one of the most challenging problems in terms of data storage and transmission in bioinformatics. The ScalaLife project is working towards a database infrastructure for biomolecular simulations. To facilitate searches on stored trajectories, as well as the design and storage of analysis data, a second-level database is being discussed and proposed in the Competence Center.

  • Standard File Format for Molecular Dynamics
    • The definition of data file formats for Molecular Dynamics trajectories will help us to rationalize simulation results and secondary data analysis. The Competence Center is our channel for connecting the community, developers and the ScalaLife project; there you will find the current status of the work, from preliminary discussions to future XML schema specifications.

  • UMM File Format
    • The UMM (Unified Molecular Modeling) file format specifies descriptions of molecular systems and associated simulation procedures. The format uses a simplified XML-like syntax that aims to be human-readable. The file is processed by a converter script that generates a standards-compliant XML file, which is then used as the actual input. The latter is more 'computer-readable' and can be easily processed by standard XML libraries.

  • TNG File Format for binary trajectory data
    • The TNG (Trajectory Next Generation) file format is a container-type format that supports storage of different types of payload and different levels of compression, including temporal compression. It is intended to be used as the default standard for storing molecular simulation data.

  • MDWeb: The Automatic Input Generator
    • The web portal MDWeb provides a friendly environment to set up new systems, run test simulations and perform analysis within a guided interface. MDWeb currently supports ScalaLife Molecular Dynamics applications: Gromacs and AMBER. The platform can also prepare and launch MD using Amber and NAMD. It provides users with a personal workspace where intermediate data, trajectories and analysis results can be stored. Registration is free but necessary to maintain a permanent workspace. The primary entry is a structure (uploaded or obtained from PDB) for setup or a trajectory for analysis. The input structure or trajectory acts as the root of a tree to which new sets of data are added according to the operations performed. Results of trajectory analysis are presented through alphanumerical values, 2D plots or Jmol-based 3D visualizations, as appropriate.
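
The tree-shaped workspace described for MDWeb can be pictured with a small data structure. The following is only an illustrative sketch in Python, not MDWeb's actual implementation or API: the class `WorkspaceNode`, the function `build_example`, and all file and operation names are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class WorkspaceNode:
    """One data set in a hypothetical workspace tree (not MDWeb's real model)."""
    name: str
    # Operation that produced this data set; None marks the root entry.
    operation: Optional[str] = None
    children: List["WorkspaceNode"] = field(default_factory=list)

    def add(self, name: str, operation: str) -> "WorkspaceNode":
        """Attach a new data set derived from this one and return it."""
        child = WorkspaceNode(name, operation)
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "WorkspaceNode"]]:
        """Yield (depth, node) pairs in depth-first order, starting here."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def build_example() -> WorkspaceNode:
    """Build a made-up workspace: structure -> setup -> trajectory -> analysis."""
    root = WorkspaceNode("uploaded_structure.pdb")           # hypothetical input
    system = root.add("prepared_system", "setup")            # hypothetical step
    trajectory = system.add("test_trajectory", "test simulation")
    trajectory.add("analysis_result", "analysis")
    return root


if __name__ == "__main__":
    for depth, node in build_example().walk():
        label = node.operation or "root"
        print("  " * depth + f"{node.name} [{label}]")
```

Each operation adds a child under the data set it was applied to, so every result can be traced back to the original structure or trajectory by following the path to the root.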