Paurat, Daniel: Intuitive Exploration of Multivariate Data. - Bonn, 2017. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.

Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5n-46856

@phdthesis{handle:20.500.11811/7166,

urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5n-46856},

author = {Paurat, Daniel},

title = {Intuitive Exploration of Multivariate Data},

school = {Rheinische Friedrich-Wilhelms-Universität Bonn},

year = 2017,

month = jul,

note = {Approaching a dataset with an analysis question is usually not a trivial process. Apart from integrating, cleaning and pre-processing the data, typical issues are to generate and validate hypotheses, to understand which algorithms to apply, to estimate parameter settings and to interpret intermediate analysis results. To this end, it is often helpful to first explore the data in order to find and understand its main characteristics, the driving influences, structures and relations among the data records, as well as to reveal outliers. Exploratory data analysis, a term coined by John W. Tukey (Tukey, 1977), is a loose set of methods, mostly graphical in nature, to summarize and understand the main characteristics of the data at hand. This work extends the set of exploratory data analysis methods by proposing several new methods that support the analyst in his or her task of understanding the data. Over the course of this thesis, two conceptually different approaches are investigated.

The first approach studies pattern mining algorithms, a family of methods that find and report hypotheses which describe interesting sub-populations of the dataset to the analyst, where the interestingness is measured by different quality functions. As the results of pattern mining methods are interpretable by a human expert, these algorithms are often utilized to study a dataset in an exploratory way. Note that many pattern mining algorithms address the problem of finding a small set of diverse, high-quality patterns. To this end, this work introduces two new algorithms, one for relevant and one for Δ-relevant subgroup discovery. In addition, an algorithmic framework for sampling patterns according to different pattern quality measures is introduced. The second approach towards exploratory data analysis leaves the discovery of interesting sub-populations to the analyst and enables him or her to study a two-dimensional projection of the data and interact with it. A scatter plot visualization of the projected data lets the analyst observe the data collection as a whole and visually uncover interesting structures. Manipulating the locations of individual data records within the plot further enables the analyst to alter the projection angle and to actively steer the projection. This way, relations among the data records can be set or discovered, and aspects of the data’s underlying distribution can be explored in a visual manner. Finding the corresponding projections is not trivial, and throughout this thesis three novel approaches are proposed to do so.

The thesis concludes with a synthesis of both approaches. Classical pattern mining algorithms often aim at reducing the output of patterns to a small set of highly interesting and diverse patterns. However, by discarding most of the patterns, a trade-off has to be made between ruling out potentially insightful patterns and possibly drowning the analyst in results. Combining interactive visual exploration techniques with pattern discovery, on the other hand, excels at working with larger pattern collections, as the underlying pattern distribution emerges more clearly. This way, the analyst not only retains an overview of the underlying structure of the dataset, but can also survey the relations among the interesting aspects of the dataset.},

url = {https://hdl.handle.net/20.500.11811/7166}

}
