Safe-DS: A Toolkit for Safe Development of Data Science Pipelines

dc.contributor.advisor: Lehmann, Jens
dc.contributor.author: Reimann, Lars
dc.date.accessioned: 2025-06-25T12:59:15Z
dc.date.available: 2025-06-25T12:59:15Z
dc.date.issued: 25.06.2025
dc.identifier.uri: https://hdl.handle.net/20.500.11811/13159
dc.description.abstract: This work introduces the Safe-DS stack for developing data science (DS) pipelines:
The Safe-DS Python library offers commonly needed DS operations behind a simple, integrated, and consistent API. It is implemented on top of state-of-the-art DS libraries using the adapter design pattern. Usage counts of API elements and documentation are used to infer suitable API transformations for the original libraries. Further changes are applied in a custom GUI and adapter code is generated automatically. We call this transformation process "adaptoring".
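The adapter design pattern mentioned above can be illustrated with a minimal sketch. This is not the Safe-DS API: the `Column` class and its method names are hypothetical, and Python's standard `statistics` module stands in for the wrapped DS library.

```python
import statistics
from typing import Sequence


class Column:
    """Hypothetical adapter: a small, consistent API over the stdlib `statistics` module."""

    def __init__(self, name: str, values: Sequence[float]) -> None:
        self.name = name
        self._values = list(values)

    def mean(self) -> float:
        # Delegate to the wrapped library; callers never import it directly.
        return statistics.fmean(self._values)

    def median(self) -> float:
        return statistics.median(self._values)

    def standard_deviation(self) -> float:
        # The adapter exposes a descriptive name instead of the library's `stdev`.
        return statistics.stdev(self._values)


ages = Column("age", [20, 30, 40])
print(ages.mean(), ages.median(), ages.standard_deviation())
```

Because callers depend only on the adapter, the underlying library can be renamed, restricted, or swapped without changing pipeline code.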
The Safe-DS language consists of a simple pipeline language to write DS pipelines, and a stub language to safely integrate Python libraries. The Safe-DS library is included by default. We statically catch type errors, boundary errors (number outside the legal interval), and state errors (inferring with an untrained ML model). Schema errors (e.g. accessing a non-existent column) are detected by running minimal code.
Our execution system for Safe-DS pipelines is correct (runs all required code in the right order), and minimal (runs only the required code). It derives dependencies between individual operations, like calls. Call results are cached between runs to avoid expensive recomputation.
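The two properties named above, running required code in dependency order and running only required code, can be sketched with a small memoizing dependency graph. This is an illustrative assumption, not the Safe-DS execution system: the `Pipeline` class, its `add` and `run` methods, and the step names are hypothetical, and the sketch omits cycle detection and cache invalidation.

```python
from typing import Any, Callable


class Pipeline:
    """Hypothetical runner: executes a target and its dependencies, caching results."""

    def __init__(self) -> None:
        self._steps: dict[str, tuple[Callable[..., Any], tuple[str, ...]]] = {}
        self._cache: dict[str, Any] = {}
        self.executed: list[str] = []  # log of steps that actually ran

    def add(self, name: str, func: Callable[..., Any], *depends_on: str) -> None:
        self._steps[name] = (func, depends_on)

    def run(self, target: str) -> Any:
        # Cached results are reused instead of being recomputed.
        if target in self._cache:
            return self._cache[target]
        func, deps = self._steps[target]
        # Dependencies run first, so every input exists before it is used.
        args = [self.run(dep) for dep in deps]
        self.executed.append(target)
        result = func(*args)
        self._cache[target] = result
        return result


pipeline = Pipeline()
pipeline.add("load", lambda: [1, 2, 3, 4])
pipeline.add("clean", lambda rows: [r for r in rows if r % 2 == 0], "load")
pipeline.add("total", lambda rows: sum(rows), "clean")
pipeline.add("plot", lambda rows: f"plot of {rows}", "load")

first = pipeline.run("total")   # runs load, clean, total; "plot" is skipped
second = pipeline.run("total")  # served from the cache, nothing is re-run
```

Requesting `"total"` never executes `"plot"`, and a later request for `"plot"` reuses the cached result of `"load"`.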
The Safe-DS IDE offers comprehensive support for the Safe-DS language, like code completion. A graphical view complements the textual pipeline language: operations are displayed as nodes of a graph, and data flow as its edges. Results of pipeline runs are presented in dedicated views. For example, tables are opened in a custom GUI, which displays data, statistics, quality checks, and plots.
Our evaluation indicates that the Safe-DS stack is considerably more usable, lets DS novices get a lot more work done, and greatly accelerates development compared to a traditional Python stack.
en
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Data Science
dc.subject: Maschinelles Lernen
dc.subject: Gebrauchstauglichkeit
dc.subject: Erlernbarkeit
dc.subject: Safety
dc.subject: API
dc.subject: DSL
dc.subject: IDE
dc.subject: Machine Learning
dc.subject: Usability
dc.subject: Learnability
dc.subject.ddc: 004 Computer science
dc.title: Safe-DS: A Toolkit for Safe Development of Data Science Pipelines
dc.type: Dissertation or Habilitation
dc.identifier.doi: https://doi.org/10.48565/bonndoc-584
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-83337
dc.relation.doi: https://doi.org/10.1109/SANER60148.2024.00027
dc.relation.doi: https://doi.org/10.1109/icse-nier58687.2023.00029
dc.relation.doi: https://doi.org/10.1109/icse-nier58687.2023.00019
dc.relation.doi: https://doi.org/10.1145/3510455.3512789
dc.relation.doi: https://doi.org/10.1145/3397537.3397552
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 8333
ulbbnediss.date.accepted: 12.06.2025
ulbbnediss.dissNotes.extern: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of University of Bonn's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Bauckhage, Christian
dcterms.hasSupplement: https://github.com/Safe-DS
ulbbnediss.contributor.orcid: https://orcid.org/0000-0002-5129-3902

