Reimann, Lars: Safe-DS: A Toolkit for Safe Development of Data Science Pipelines. - Bonn, 2025. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-83337
@phdthesis{handle:20.500.11811/13159,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-83337},
doi = {https://doi.org/10.48565/bonndoc-584},
author = {{Lars Reimann}},
title = {Safe-DS: A Toolkit for Safe Development of Data Science Pipelines},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = jun,

note = {This work introduces the Safe-DS stack for developing data science (DS) pipelines:
The Safe-DS Python library offers commonly needed DS operations behind a simple, integrated, and consistent API. It is implemented on top of state-of-the-art DS libraries using the adapter design pattern. Usage counts of API elements and documentation are used to infer suitable API transformations for the original libraries. Further changes are applied in a custom GUI, and adapter code is generated automatically. We call this transformation process "adaptoring".
The Safe-DS language consists of a simple pipeline language for writing DS pipelines and a stub language for safely integrating Python libraries. The Safe-DS library is included by default. We statically catch type errors, boundary errors (a number outside the legal interval), and state errors (inferring with an untrained ML model). Schema errors (e.g. accessing a non-existent column) are detected by running minimal code.
Our execution system for Safe-DS pipelines is correct (it runs all required code in the right order) and minimal (it runs only the required code). It derives dependencies between individual operations, such as calls. Call results are cached between runs to avoid expensive recomputation.
The Safe-DS IDE offers comprehensive support for the Safe-DS language, such as code completion. A graphical view complements the textual pipeline language: operations are displayed as nodes of a graph, and data flow as its edges. Results of pipeline runs are presented in dedicated views. For example, tables are opened in a custom GUI, which displays data, statistics, quality checks, and plots.
Our evaluation indicates that the Safe-DS stack is considerably more usable, lets DS novices get a lot more work done, and greatly accelerates development compared to a traditional Python stack.
},

url = {https://hdl.handle.net/20.500.11811/13159}
}

The following license files are associated with this item:

InCopyright