According to the BMBF call, the goal is the establishment of digital workflows in the sense of a decentralized data or simulation concept by active agents within the software environment of the innovation platform. The sustainability of the software solutions and the uniform access to data and tools are decisive. The platform will therefore provide two workflow environments called "pyiron" and "SimStack" right from the start. Beneficiaries are expected to design their solutions in such a way that they can be made executable within one of these workflow environments.
A workflow is a chain of well-documented process steps to create or handle data for a specific problem in order to deliver a particular set of outputs. Within PMD we provide the environment for a digitalization of these workflows (i.e., making each individual, often still manual step of this chain accessible, interpretable, and storable by the machine). Advantages of using this workflow environment include:
Currently, the PMD supports two workflow environments: pyiron and SimStack.
Within the PMD we distinguish between four levels (A, B, C, D) of workflow implementation. Within a project, these levels will be exploited step by step, with different degrees of implementation effort, workflow control, and user support. It is, however, possible to combine different levels of implementation for the different steps of a single workflow.
The user provides a script job for an individual task of a workflow, with well-defined input and output parameters for each step. Input parameters can be passed into the script as results of other computations, and the outputs can be processed in a subsequent computation step. The parameters and the script are stored and documented to ensure reproducibility of the workflow step and to avoid recomputing previously computed results. The file formats used by the script for input and output do not need to be identical to the file format used for storage by, for example, pyiron.
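The storage and reuse of a script step can be illustrated with a minimal sketch. It is not the pyiron or SimStack implementation; the function `run_step`, the JSON record layout, and the two toy steps are assumptions made only for illustration. The step name and its parameters are hashed, and a stored record with the same hash is loaded instead of recomputing the result.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_step(func, params, store):
    """Run one workflow step, reusing a stored result when the inputs match."""
    # Hash the step name plus its sorted parameters to get a stable key.
    payload = json.dumps({"step": func.__name__, "params": params}, sort_keys=True)
    key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    record_path = Path(store) / f"{key}.json"

    if record_path.exists():
        # Same step and inputs were computed before: load instead of recomputing.
        return json.loads(record_path.read_text())["output"]

    output = func(**params)
    # Store inputs and outputs together so the step stays documented.
    record = {"step": func.__name__, "params": params, "output": output}
    record_path.write_text(json.dumps(record))
    return output


calls = []  # records every real execution of the toy step below


def strain_energy(stiffness, strain):
    """Toy step (hypothetical): elastic energy of a linear spring."""
    calls.append((stiffness, strain))
    return {"energy": 0.5 * stiffness * strain ** 2}


def scale_energy(energy, factor):
    """Toy follow-up step that consumes the output of the first step."""
    return {"scaled": energy * factor}


store_dir = tempfile.mkdtemp()
first = run_step(strain_energy, {"stiffness": 4.0, "strain": 0.5}, store_dir)
second = run_step(strain_energy, {"stiffness": 4.0, "strain": 0.5}, store_dir)
chained = run_step(scale_energy, {"energy": first["energy"], "factor": 2.0}, store_dir)
```

The second call with identical parameters is served from the store, so `strain_energy` runs only once, and its output is passed on as input to the next step.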
A predefined job type for a simulation tool can be created and integrated into the workflow system, i.e., either pyiron or SimStack. In pyiron, the job type is a class that defines and handles the import/export and storage of input/output, as well as the serialization of the job attributes for communication with HPC systems. In SimStack, this is accomplished by the WaNo. In this way, well-defined problems with a subset of parameters (compared to the full functionality of the tools) can be executed as a step of a workflow. The advantage of this approach is that users who are not familiar with a specific software tool do not have to learn attributes that are not essential for the present workflow, as the required attributes are provided by a simplified, readable, standard pyiron interface or a structured XML format (with a well-documented and easy-to-learn terminology).
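A rough sketch of what such a job type does is given here. The class `TensileTestJob`, its parameter names, and the `key = value` file format are hypothetical and do not reproduce the actual pyiron job classes or the SimStack WaNo format; the sketch only shows the three responsibilities named above: exposing a parameter subset, translating input and output, and serializing the job.

```python
import json


class TensileTestJob:
    """Hypothetical job type exposing only the parameters this workflow needs."""

    # The full tool may accept many more settings; this is the exposed subset.
    exposed_defaults = {"temperature_K": 300.0, "strain_rate_per_s": 1e-3}

    def __init__(self, job_name, **inputs):
        unknown = set(inputs) - set(self.exposed_defaults)
        if unknown:
            raise ValueError(f"Parameters not exposed by this job type: {sorted(unknown)}")
        self.job_name = job_name
        self.input = {**self.exposed_defaults, **inputs}
        self.output = {}

    def write_input(self):
        """Translate the generic input into the (made-up) format of the tool."""
        return "\n".join(f"{key} = {value}" for key, value in sorted(self.input.items()))

    def collect_output(self, raw_text):
        """Parse the tool's 'key = value' output lines into the output dict."""
        for line in raw_text.splitlines():
            if not line.strip():
                continue
            key, _, value = line.partition("=")
            self.output[key.strip()] = float(value)

    def to_dict(self):
        """Serialize the job, e.g. for storage or transfer to a remote cluster."""
        return {"job_name": self.job_name, "input": self.input, "output": self.output}

    @classmethod
    def from_dict(cls, data):
        """Rebuild a job from its serialized form."""
        job = cls(data["job_name"], **data["input"])
        job.output = dict(data["output"])
        return job


job = TensileTestJob("steel_sample", temperature_K=500.0)
input_file = job.write_input()
job.collect_output("yield_strength_MPa = 355.0")
restored = TensileTestJob.from_dict(json.loads(json.dumps(job.to_dict())))
```

Rejecting unknown parameters in the constructor is one possible design choice: it keeps the interface small and makes a typo in a parameter name fail early rather than being silently ignored.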
At the same time, predefined job types can easily be extended to new job types that provide additional or redefined functionality. In this way, advanced users can employ the environment to develop workflows with the same flexibility as scripts provide. The benefit of using the environment for this purpose is the integration of various analysis, visualization, and simulation tools, which can be used for each intermediate step of the workflow, as well as the automated transfer to and execution on remote compute resources.
The combination and exchange of different simulation tools in metajobs, the handling of interactive jobs, and loops over a large set of cases are straightforward at this implementation level.
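The idea of a metajob that loops over many cases and allows the underlying tool to be exchanged can be sketched as follows. The function `run_metajob` and the two stand-in tools are hypothetical; their formulas are toy expressions, not physical models, and real metajobs in pyiron or SimStack are configured differently.

```python
from itertools import product


def harmonic_tool(volume, temperature):
    """Stand-in for one simulation code (toy formula, not a physical model)."""
    return volume * 2.0 + temperature


def tabulated_tool(volume, temperature):
    """Stand-in for a second code with the same inputs and output."""
    return volume * 2.5 + temperature


def run_metajob(tool, grid):
    """Run one child calculation per combination of the parameter grid."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        case = dict(zip(names, values))
        results.append({**case, "energy": tool(**case)})
    return results


grid = {"volume": [10.0, 12.0], "temperature": [0.0, 300.0]}
results_a = run_metajob(harmonic_tool, grid)
results_b = run_metajob(tabulated_tool, grid)
```

Because both tools accept the same generic inputs, the loop itself does not change when one tool is replaced by the other.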
Once a workflow is established, less involved users in particular do not want to deal with command lines and their execution. To this end, a graphical user interface can be provided that generates the output from a predefined, often limited set of input parameters.
The use of generic input and output parameters for predefined classes allows (part of) a workflow to be described in a notation that is generic, i.e., independent of the specific software tool. Generic parameters are also key to enabling interoperability between software tools. Such a standard exists in pyiron for atomistic simulations (ASE), but needs to be implemented by domain experts for other communities. For example, the VMAP standard can be used for FEM simulations.
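How a generic description decouples a workflow from a specific tool can be shown with a small sketch. The `GenericStructure` schema and the keyword-based input format are assumptions invented for this example; they are neither the ASE data model nor the VMAP standard. Only the XYZ layout (atom count, comment line, one line per atom) follows a widely used convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericStructure:
    """Tool-independent description of a structure (hypothetical schema)."""
    symbols: tuple
    positions: tuple  # Cartesian coordinates in angstrom, one triple per atom
    cell: tuple       # edge lengths of an orthorhombic cell in angstrom


def to_xyz(structure):
    """Adapter for a tool that reads the common XYZ text format."""
    lines = [str(len(structure.symbols)), "generated from GenericStructure"]
    for symbol, (x, y, z) in zip(structure.symbols, structure.positions):
        lines.append(f"{symbol} {x:.3f} {y:.3f} {z:.3f}")
    return "\n".join(lines)


def to_keyword_input(structure):
    """Adapter for a made-up tool that expects 'keyword value' lines."""
    a, b, c = structure.cell
    return "\n".join([f"natoms {len(structure.symbols)}", f"box {a:.3f} {b:.3f} {c:.3f}"])


fe_pair = GenericStructure(
    symbols=("Fe", "Fe"),
    positions=((0.0, 0.0, 0.0), (1.435, 1.435, 1.435)),
    cell=(2.87, 2.87, 2.87),
)
xyz_text = to_xyz(fe_pair)
keyword_text = to_keyword_input(fe_pair)
```

The workflow only ever handles `GenericStructure`; supporting another tool then means adding one more adapter function rather than changing the workflow description.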
The interplay of workflows and ontologies within the PMD has different facets:
A Python-based framework called pyiron will be developed to coordinate method development in computer-aided materials science and to integrate existing methods into a common platform. It provides all necessary tools to interactively execute complex simulation protocols that combine different computer codes and perform millions of separate calculations on powerful computer clusters. At the same time, pyiron allows users to interactively develop, implement, and test these simulation protocols, similar to an integrated development environment (IDE). Because structured and unstructured data, metadata, and workflows are integrated within the same platform, they are automatically stored in an efficient hierarchical database. Thus, the complete materials science expertise of both developers and users is preserved and made accessible in a standardized ontology.
The basic idea behind this framework is to provide a single tool with a uniform interface for various simulation codes as well as analysis and visualization tools. The availability of this IDE allows the user to focus on science rather than on technical details such as input/output formats of codes and tools.
A central difficulty in integrating material simulations into the product design cycle is the need to integrate simulation workflows that are tailored to each application and usually consist of several modules. In addition, the execution of available patchwork solutions requires specialized know-how both in the methodology and in the operation of mainframes.
The workflow environment SimStack enables the efficient design and adaptation of complex workflows ("rapid prototyping") with software modules from different providers via drag-and-drop, exposing only the settings relevant to the respective use case. Together with the automated execution of workflows on mainframes, this minimizes the complexity for the end user and the expertise required. This enables the transfer of complex scientific multi-scale methods to industry.