Workflows

According to the BMBF call, the goal is the establishment of digital workflows in the sense of a decentralized data or simulation concept by active agents within the software environment of the innovation platform. The sustainability of the software solutions and the uniform access to data and tools are decisive. The platform will therefore provide two workflow environments called "pyiron" and "SimStack" right from the start. Beneficiaries are expected to design their solutions in such a way that they can be made executable within one of these workflow environments.

What is our point of view on workflows?

A workflow is a chain of well-documented process steps to create or handle data for a specific problem in order to deliver a particular set of outputs. Within PMD we provide the environment for a digitalization of these workflows (i.e., making each individual, often still manual step of this chain accessible, interpretable, and storable by the machine). Advantages of using this workflow environment include:

  • Providing engineers, data scientists, etc. with a user-friendly interface to a large variety of tools
  • Enabling non-experts to use standardized computational procedures that are based on complex connections of individual software tools
  • Capturing complex individual computational workflows for documentation and distribution (e.g. for a paper, IP application, collaboration, etc.)
  • Automated deposition of final results as well as of all relevant intermediate steps (e.g. in database systems, repositories, ...)
  • Integration and easy access to HPC resources
  • Connection to community-wide semantics and knowledge graphs (ontologies) due to the description of input/output of individual tools within a workflow chain

Currently, the PMD supports two workflow environments: pyiron and SimStack.

Levels of workflow implementations within the PMD

Within the PMD we distinguish between four levels (A, B, C, D) of workflow implementation. Within a project these levels will be exploited step-by-step, with various levels of implementation effort, workflow control and user support. It is, however, possible to combine different levels of implementation for the different steps of a single workflow.

A: Script based job

The user provides a script job for an individual task of a workflow, with well-defined input and output parameters for each step. Input parameters can be passed into the script as results of other computations, and its outputs can be processed in a subsequent computation step. The parameters and the script are stored and documented to ensure reproducibility of the workflow step and to avoid recomputing results that already exist. The file formats used by the script for input and output do not need to be identical to the file format used for storage, for example by pyiron.
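The idea of a script-based step can be illustrated with a minimal sketch. It does not use the pyiron or SimStack API; the helper `run_script_job`, the two toy steps, and the JSON record layout are all assumptions made for illustration. The sketch stores the inputs and outputs of each step and reuses a stored result when the same script is called again with the same inputs.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_script_job(script, inputs, store_dir):
    """Run script(**inputs) once; document inputs and outputs as a JSON record.

    Hypothetical helper, not part of pyiron or SimStack.
    Returns (outputs, reused), where reused is True if a stored result was used.
    """
    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    # Identify the step by the script name and a canonical dump of its inputs.
    key_source = json.dumps(
        {"script": script.__name__, "inputs": inputs}, sort_keys=True
    )
    key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()[:16]
    record_file = store / f"{key}.json"
    if record_file.exists():
        # Same script and same inputs as before: skip the recomputation.
        record = json.loads(record_file.read_text())
        return record["outputs"], True
    outputs = script(**inputs)
    record = {"script": script.__name__, "inputs": inputs, "outputs": outputs}
    record_file.write_text(json.dumps(record, indent=2))
    return outputs, False


def hooke_stress(youngs_modulus_gpa, strain):
    """Toy step 1: linear-elastic stress in GPa."""
    return {"stress_gpa": youngs_modulus_gpa * strain}


def safety_factor(stress_gpa, yield_strength_gpa):
    """Toy step 2: consumes the output of step 1."""
    return {"safety_factor": yield_strength_gpa / stress_gpa}


with tempfile.TemporaryDirectory() as demo_dir:
    first, _ = run_script_job(
        hooke_stress, {"youngs_modulus_gpa": 200.0, "strain": 0.001}, demo_dir
    )
    # The output of one step becomes an input of the next step.
    second, _ = run_script_job(
        safety_factor,
        {"stress_gpa": first["stress_gpa"], "yield_strength_gpa": 0.4},
        demo_dir,
    )
    print(second)
```

In a real workflow environment the record would also capture the script version and the execution environment; the sketch only keys on the script name and its inputs.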

B: Predefined, but extendible workflow class / workflow components

A predefined job type for a simulation tool can be created and integrated into the workflow system, i.e. either pyiron or SimStack. In pyiron, this class defines and handles the import/export and storage of input/output, as well as the serialization of the job attributes for communication with HPC. In SimStack, this is accomplished by the WaNo. In this way, well-defined problems with a subset of parameters (compared to the full functionality of the tools) can be executed as a step of a workflow. The advantage of this approach is that users who are not familiar with a specific software tool do not have to learn attributes that are not essential for the present workflow, because these are provided by a simplified, readable, standard pyiron interface or a structured XML format (with a well-documented and easy-to-learn terminology).

At the same time, the predefined job types can easily be extended to new job types that provide additional or redefined functionality. In this way, advanced users can employ the environment to develop workflows with the same flexibility as provided by scripts. The benefit of using the environment for this purpose is the integration of various analysis, visualization, and simulation tools, which can be used for each intermediate step of the workflow, as well as the automated transfer to and execution on remote compute resources.
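A predefined job type and its extension can be sketched as follows. This is an illustration only and not the pyiron job class or a SimStack WaNo: the class names, the parameters, and the "key value" text format of the tool are invented. The base class exposes a small subset of parameters, writes the tool input, imports the tool output, and serializes itself; the subclass shows how an advanced user might add functionality.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TensileTestJob:
    """Hypothetical job type exposing only the parameters this workflow needs."""

    job_name: str
    strain_rate: float = 1e-3
    temperature_k: float = 300.0
    output: dict = field(default_factory=dict)

    def write_input(self):
        """Export the exposed parameters in the (invented) input format of the tool."""
        return (
            f"strain_rate {self.strain_rate}\n"
            f"temperature {self.temperature_k}\n"
        )

    def collect_output(self, raw_text):
        """Import 'key value' lines written by the tool into the job."""
        for line in raw_text.splitlines():
            if line.strip():
                key, value = line.split()
                self.output[key] = float(value)

    def to_json(self):
        """Serialize all job attributes, e.g. for transfer to a remote machine."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        """Restore a job from its serialized form."""
        return cls(**json.loads(text))


@dataclass
class CyclicTensileTestJob(TensileTestJob):
    """Extension by an advanced user: adds one parameter, reuses everything else."""

    n_cycles: int = 1

    def write_input(self):
        return super().write_input() + f"cycles {self.n_cycles}\n"


job = CyclicTensileTestJob(job_name="demo", n_cycles=5)
print(job.write_input())
job.collect_output("max_stress 412.5\n")
restored = CyclicTensileTestJob.from_json(job.to_json())
print(restored.output)
```

The non-expert user only sees `strain_rate`, `temperature_k`, and `n_cycles`; everything else the underlying tool might accept stays hidden behind the class.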

The combination and exchange of different simulation tools in metajobs, the handling of interactive jobs, and the loop over a large set of cases are straightforward at this implementation level.
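The notion of a metajob that loops over many cases and can exchange the underlying tool might look like the sketch below. The two "tools" are toy functions with a shared call signature, and `minimum_energy_volume` is a hypothetical metajob; none of these names come from pyiron or SimStack.

```python
def harmonic_tool(volume):
    """Toy stand-in for simulation tool 1: energy as a function of volume."""
    return 0.5 * (volume - 16.0) ** 2


def quartic_tool(volume):
    """Toy stand-in for simulation tool 2, same interface as tool 1."""
    return (volume - 16.0) ** 4 + 0.1


def minimum_energy_volume(tool, volumes):
    """Hypothetical metajob: run one case per volume, then pick the lowest energy."""
    results = {volume: tool(volume) for volume in volumes}
    best_volume = min(results, key=results.get)
    return best_volume, results


volumes = [14.0, 15.0, 16.0, 17.0, 18.0]
for tool in (harmonic_tool, quartic_tool):
    # Exchanging the tool does not change the metajob itself.
    best_volume, _ = minimum_energy_volume(tool, volumes)
    print(tool.__name__, best_volume)
```

Because both tools accept and return the same quantities, the loop and the analysis stay untouched when one tool is swapped for the other.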

C: Graphical interface for predefined workflows

Once a workflow is established, less involved users in particular do not want to deal with command lines and their execution. To this end, a graphical user interface can be provided that generates the output based on an often limited, predefined set of input parameters.

D: Interoperable workflows (ensuring community standards)

The usage of generic input and output parameters for predefined classes allows a description of (part of) a workflow in a notation that is generic, i.e., independent of the specific software tool. Generic parameters are also key to enabling interoperability between software tools. Such a standard exists in pyiron for atomistic simulations (ASE), but needs to be implemented by domain experts for other communities. For example, the VMAP standard can be used for FEM simulations.
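A minimal sketch of generic parameters, under stated assumptions: `GenericStructure` is a simplified stand-in for a community standard such as the ASE structure object and does not reproduce its API, and the second output format is invented. The same generic description is translated into two tool-specific inputs, so the workflow itself never refers to a particular tool format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericStructure:
    """Tool-independent description of a structure (hypothetical, cubic cell only)."""

    symbols: tuple      # chemical symbols, one per atom
    positions: tuple    # Cartesian (x, y, z) in angstrom, one per atom
    cell_length: float  # edge length of the cubic cell in angstrom


def to_xyz(structure):
    """Writer for tool 1: plain XYZ text (atom count, comment line, coordinates)."""
    lines = [str(len(structure.symbols)), f"cubic cell a={structure.cell_length}"]
    for symbol, (x, y, z) in zip(structure.symbols, structure.positions):
        lines.append(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines)


def to_fractional_table(structure):
    """Writer for tool 2: invented format using fractional coordinates."""
    lines = [f"cell {structure.cell_length:.4f}"]
    for symbol, position in zip(structure.symbols, structure.positions):
        fractional = [coordinate / structure.cell_length for coordinate in position]
        lines.append(symbol + " " + " ".join(f"{value:.4f}" for value in fractional))
    return "\n".join(lines)


bcc_iron = GenericStructure(
    symbols=("Fe", "Fe"),
    positions=((0.0, 0.0, 0.0), (1.435, 1.435, 1.435)),
    cell_length=2.87,
)
print(to_xyz(bcc_iron))
print(to_fractional_table(bcc_iron))
```

Adding support for a further tool then only requires one more writer; the generic description and all workflow steps built on it stay unchanged.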

What kind of connection to ontology are we aiming at?

The interplay of workflows and ontologies within the PMD has different facets:

  • A workflow can be used to read data from an ontology-based data store as input, modify them according to the process chain and feed the output back into the ontology store.
  • The information of a workflow developed within the environment (i.e. the exploration of dependencies that were not known before) can be automatically exported into a materials knowledge-graph.
  • If the functionality of a tool (including the input and the output) is described in terms of an ontology, it can be integrated into a workflow environment without the need for tool-specific parsers.
  • If a workflow (including the input and the output of each simulation module) is described generically (e.g. in terms of a standardized ontology), a tool-independent formulation is achieved. Individual tools within a complex tool chain can then be replaced easily.
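The last two points can be illustrated with a small sketch. The concept labels such as "ex:Volume" are placeholders and not terms of an existing ontology, and `AnnotatedTool`, `chain_is_consistent`, `run_chain`, and `workflow_triples` are hypothetical names. Each tool declares which concept it consumes and which it produces; a chain is valid when neighbouring declarations match, a tool can be replaced by any other tool with the same declarations, and the chain can be exported as simple subject/predicate/object triples for a knowledge graph.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AnnotatedTool:
    """A tool described by the concepts of its input and output (placeholders)."""

    name: str
    consumes: str
    produces: str
    run: Callable


def chain_is_consistent(tools):
    """True if every tool produces the concept that the next tool consumes."""
    return all(a.produces == b.consumes for a, b in zip(tools, tools[1:]))


def run_chain(tools, value):
    """Execute the tools in order after checking the concept annotations."""
    if not chain_is_consistent(tools):
        raise ValueError("output and input concepts of neighbouring tools differ")
    for tool in tools:
        value = tool.run(value)
    return value


def workflow_triples(tools):
    """Export the chain as (subject, predicate, object) triples."""
    triples = []
    for tool in tools:
        triples.append((tool.name, "ex:consumes", tool.consumes))
        triples.append((tool.name, "ex:produces", tool.produces))
    return triples


cell_volume = AnnotatedTool(
    "cell_volume", "ex:LatticeConstant", "ex:Volume", lambda a: a ** 3
)
energy_a = AnnotatedTool(
    "energy_a", "ex:Volume", "ex:Energy", lambda v: 0.5 * (v - 8.0) ** 2
)
energy_b = AnnotatedTool(
    "energy_b", "ex:Volume", "ex:Energy", lambda v: abs(v - 8.0)
)

print(run_chain([cell_volume, energy_a], 2.0))
# energy_b carries the same annotations, so it can replace energy_a directly.
print(run_chain([cell_volume, energy_b], 2.0))
print(workflow_triples([cell_volume, energy_b]))
```

A production setup would use real ontology terms and a triple store instead of string labels and tuples; the sketch only shows why matching annotations make tools interchangeable.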

Members of working group workflows:

Muhammad Hassani
Tilmann Hickel
Celso Rego
Jörg Schaarschmidt
Jörg Unger
Philipp von Hatrott

Contact us:

Forum
Contact Form

Workflow Frameworks

...

Pyiron - a Python-based integrated development environment for computer-aided materials science

A Python-based framework called pyiron will be developed to coordinate method development in computer-aided materials science and to integrate existing methods into a common platform. It provides all necessary tools to interactively execute complex simulation protocols that combine different computer codes and perform millions of separate calculations on powerful computer clusters. At the same time, pyiron allows users to develop, implement, and test these simulation protocols interactively, similar to an integrated development environment (IDE). Because structured and unstructured data, metadata, and workflows are integrated within the same platform, they are automatically stored in an efficient hierarchical database. Thus, the complete materials science expertise of both developers and users is preserved and made accessible in a standardized ontology.

The basic idea behind this framework is to provide a single tool with a uniform interface for various simulation codes as well as analysis and visualization tools. The availability of this IDE allows the user to focus on science rather than on technical details such as input/output formats of codes and tools.


Further information about Pyiron can be found here:

...

SimStack workflow environment

A central difficulty in integrating materials simulations into the product design cycle is the need for simulation workflows that are tailored to each application and usually consist of several modules. In addition, running the available patchwork solutions requires specialized know-how both in the methodology and in the operation of HPC systems.

The workflow environment SimStack enables the efficient design and adaptation of complex workflows ("rapid prototyping") with software modules from different providers via drag-and-drop, whereby only the settings relevant for the respective use case are exposed. Together with the automated execution of workflows on HPC systems, this minimizes the complexity for the end user and the required expertise. This enables the transfer of complex scientific multi-scale methods to industry.


Further information about SimStack is given here: