Safe-DS: A Toolkit for Safe Development of Data Science Pipelines

dc.contributor.advisor | Lehmann, Jens | |
dc.contributor.author | Reimann, Lars | |
dc.date.accessioned | 2025-06-25T12:59:15Z | |
dc.date.available | 2025-06-25T12:59:15Z | |
dc.date.issued | 25.06.2025 | |
dc.identifier.uri | https://hdl.handle.net/20.500.11811/13159 | |
dc.description.abstract | This work introduces the Safe-DS stack for developing data science (DS) pipelines. The Safe-DS Python library offers commonly needed DS operations behind a simple, integrated, and consistent API. It is implemented on top of state-of-the-art DS libraries using the adapter design pattern. Usage counts of API elements and documentation are used to infer suitable API transformations for the original libraries; further changes are applied in a custom GUI, and the adapter code is generated automatically. We call this transformation process "adaptoring". The Safe-DS language consists of a simple pipeline language for writing DS pipelines and a stub language for safely integrating Python libraries; the Safe-DS library is included by default. We statically catch type errors, boundary errors (a number outside the legal interval), and state errors (running inference with an untrained ML model). Schema errors (e.g., accessing a non-existent column) are detected by running minimal code. Our execution system for Safe-DS pipelines is correct (it runs all required code in the right order) and minimal (it runs only the required code). It derives dependencies between individual operations, such as calls, and caches call results between runs to avoid expensive recomputation. The Safe-DS IDE offers comprehensive support for the Safe-DS language, such as code completion. A graphical view complements the textual pipeline language: operations are displayed as nodes of a graph, and data flow as its edges. Results of pipeline runs are presented in dedicated views; for example, tables are opened in a custom GUI that displays data, statistics, quality checks, and plots. Our evaluation indicates that, compared to a traditional Python stack, the Safe-DS stack is considerably more usable, lets DS novices get considerably more work done, and greatly accelerates development. | en |
dc.language.iso | eng | |
dc.rights | In Copyright | |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
dc.subject | Data Science | |
dc.subject | Safety | |
dc.subject | API | |
dc.subject | DSL | |
dc.subject | IDE | |
dc.subject | Machine Learning | |
dc.subject | Usability | |
dc.subject | Learnability | |
dc.subject.ddc | 004 Computer science | |
dc.title | Safe-DS: A Toolkit for Safe Development of Data Science Pipelines | |
dc.type | Dissertation or Habilitation | |
dc.identifier.doi | https://doi.org/10.48565/bonndoc-584 | |
dc.publisher.name | Universitäts- und Landesbibliothek Bonn | |
dc.publisher.location | Bonn | |
dc.rights.accessRights | openAccess | |
dc.identifier.urn | https://nbn-resolving.org/urn:nbn:de:hbz:5-83337 | |
dc.relation.doi | https://doi.org/10.1109/SANER60148.2024.00027 | |
dc.relation.doi | https://doi.org/10.1109/icse-nier58687.2023.00029 | |
dc.relation.doi | https://doi.org/10.1109/icse-nier58687.2023.00019 | |
dc.relation.doi | https://doi.org/10.1145/3510455.3512789 | |
dc.relation.doi | https://doi.org/10.1145/3397537.3397552 | |
ulbbn.pubtype | First publication | |
ulbbnediss.affiliation.name | Rheinische Friedrich-Wilhelms-Universität Bonn | |
ulbbnediss.affiliation.location | Bonn | |
ulbbnediss.thesis.level | Dissertation | |
ulbbnediss.dissID | 8333 | |
ulbbnediss.date.accepted | 12.06.2025 | |
ulbbnediss.dissNotes.extern | In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of University of Bonn's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink. | |
ulbbnediss.institute | Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik | |
ulbbnediss.fakultaet | Mathematisch-Naturwissenschaftliche Fakultät | |
dc.contributor.coReferee | Bauckhage, Christian | |
dcterms.hasSupplement | https://github.com/Safe-DS | |
ulbbnediss.contributor.orcid | https://orcid.org/0000-0002-5129-3902 | |
This item appears in the following collection: E-Dissertationen