Scalable Visual Analytics

SPP 1335 - Scalable Visual Analytics


DFG MA2555/6-1



12.07.2011 Our paper on Synthetic Generation of High-dimensional Datasets has been accepted at InfoVis 2011.

11.07.2011 Our paper on Perception-based Visual Quality Measures has been accepted at VAST 2011.

18.11.2009 Our paper on Combining automated analysis and visualization techniques for effective exploration of high-dimensional data has won the SPP Collaboration Award in the DFG priority program on Scalable Visual Analytics (SPP 1335).

08.12.2008 SPP Kick-off Meeting - Dagstuhl

Abstract

The goal of this research project is to develop and evaluate a fundamentally new approach to exhaustively search for and interactively characterize any non-random mutual relationship between attribute dimensions in general data sets. To be able to systematically consider all possible attribute combinations, we propose to apply image analysis to visualization results in order to automatically pre-select only those attribute combinations featuring non-random relationships. To characterize the information found and to build mathematical descriptions, we rely on interactive visual inspection and visualization-assisted interactive information modeling. In this way, we intend to discover and explicitly characterize all information implicitly represented in unbiased sets of multi-dimensional data points.

Publications

Dirk J. Lehmann, Georgia Albuquerque, Martin Eisemann, Marcus Magnor, and Holger Theisel:
"Selecting Coherent and Relevant Plots in Large Scatterplot Matrices",
Computer Graphics Forum, April 2012.
Part of project "Scalable Visual Analytics".
[pdf] [bib]

The scatterplot matrix (SPLOM) is a well-established technique to visually explore high-dimensional data sets. It is characterized by the number of scatterplots (plots) of which it consists. Unfortunately, this number grows quadratically with the number of the data set’s dimensions. Thus, an SPLOM scales very poorly, and its usefulness is restricted to a small number of dimensions. Several approaches already exist to explore such ‘small’ SPLOMs, but they address the scalability problem only indirectly and do not solve it. Therefore, we introduce a new greedy approach to manage ‘large’ SPLOMs with more than 100 dimensions. We establish a combined visualization and interaction scheme that produces intuitively interpretable SPLOMs by combining known quality measures, a pre-process reordering and a perception-based abstraction. With this scheme, the user can interactively find large numbers of relevant plots in large SPLOMs.
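To make the quadratic growth concrete, the following minimal sketch counts the distinct plots of an SPLOM, assuming one plot per unordered pair of dimensions (diagonal and mirrored plots are not counted). The function name `splom_plot_count` is illustrative and does not come from the paper.

```python
def splom_plot_count(num_dimensions: int) -> int:
    """Count distinct scatterplots: one per unordered pair of dimensions."""
    if num_dimensions < 0:
        raise ValueError("num_dimensions must be non-negative")
    # d choose 2 = d * (d - 1) / 2, which grows quadratically in d.
    return num_dimensions * (num_dimensions - 1) // 2


if __name__ == "__main__":
    for d in (5, 10, 100):
        print(d, splom_plot_count(d))
```

Under this counting, a data set with 100 dimensions already yields 4950 distinct plots, which illustrates why manual inspection of a ‘large’ SPLOM is impractical.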

Georgia Albuquerque, Thomas Löwe, and Marcus Magnor:
"Synthetic Generation of High-dimensional Datasets",
IEEE Transactions on Visualization and Computer Graphics (TVCG, Proc. Visualization / InfoVis), vol. 17, no. 12, pp. 2317–2324, December 2011.
doi: http://dx.doi.org/10.1109/TVCG.2011.237
Part of project "Scalable Visual Analytics".
[pdf] [bib] [linux-version]

Generation of synthetic datasets is a common practice in many research areas. Such data is often generated to meet specific needs or certain conditions that may not be easily found in the original, real data. The nature of the data varies according to the application area and includes text, graphs, social or weather data, among many others. The common process to create such synthetic datasets is to implement small scripts or programs, restricted to small problems or to a specific application. In this paper we propose a framework designed to generate high-dimensional datasets. Users can interactively create and navigate through multi-dimensional datasets using a suitable graphical user interface. The data creation is driven by statistical distributions based on a few user-defined parameters. First, a grounding dataset is created according to given inputs, and then structures and trends are included in selected dimensions and orthogonal projection planes. Furthermore, our framework supports the creation of complex non-orthogonal trends and classified datasets. It can successfully be used to create synthetic datasets simulating important trends such as multidimensional clusters, correlations and outliers.
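The two-step idea described above (a grounding dataset first, then structure injected into selected dimensions) can be sketched as follows. This is only an illustration under stated assumptions, not the framework from the paper: the function `generate_dataset`, its parameters, the uniform grounding distribution and the linear trend with Gaussian noise are all hypothetical choices.

```python
import random


def generate_dataset(num_points, num_dims, correlated_pair=(0, 1),
                     noise=0.1, seed=0):
    """Return num_points rows of num_dims values with one injected trend.

    Assumed design: uniform noise as the grounding dataset, then a linear
    correlation written into a single axis-aligned projection plane.
    """
    i, j = correlated_pair
    if not (0 <= i < num_dims and 0 <= j < num_dims) or i == j:
        raise ValueError("correlated_pair must name two distinct dimensions")
    rng = random.Random(seed)
    # Step 1: grounding dataset of independent uniform values in [0, 1).
    data = [[rng.random() for _ in range(num_dims)]
            for _ in range(num_points)]
    # Step 2: overwrite dimension j so it follows dimension i plus noise.
    for row in data:
        row[j] = row[i] + rng.gauss(0.0, noise)
    return data


if __name__ == "__main__":
    sample = generate_dataset(num_points=5, num_dims=4)
    for row in sample:
        print([round(value, 3) for value in row])
```

Only the projection plane spanned by the chosen pair carries structure here; all other planes remain unstructured noise, which makes such a dataset a simple test case for methods that search for informative projections.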

Georgia Albuquerque, Martin Eisemann, and Marcus Magnor:
"Perception-based Visual Quality Measures",
in Proc. IEEE Symposium on Visual Analytics Science and Technology (VAST) 2011, pp. 13–20, October 2011.
Part of project "Scalable Visual Analytics".
[pdf] [bib]

In recent years, diverse quality measures to support the exploration of high-dimensional data sets have been proposed. Such measures can be very useful to rank and select information-bearing projections of very high-dimensional data when the visual exploration of all possible projections becomes infeasible. But even though a ranking of the low-dimensional projections may support the user in the visual exploration task, different measures deliver different distances between the views that do not necessarily match the expectations of human perception. As an alternative solution, we propose a perception-based approach that, similar to the existing measures, can be used to select information-bearing projections of the data. Specifically, we construct a perceptual embedding for the different projections based on the data from a psychophysics study and multi-dimensional scaling. This embedding, together with a ranking function, is then used to estimate the value of the projections for a specific user task in a perceptual sense.

Martin Eisemann, Georgia Albuquerque, and Marcus Magnor:
"Data Driven Color Mapping",
in Proc. EuroVA: International Workshop on Visual Analytics 2011, Bergen, Norway, May 2011.
Part of project "Scalable Visual Analytics".
[pdf] [bib]

In this paper we present a simple yet effective method to map data set values of different distributions to a color map in order to reveal interesting structures. We make use of an ordering and a simple projection technique to transform the data set before color mapping. Our transformation yields convincing results for various distributions. It also relieves the user of the burden of testing several mappings beforehand. A simple angular interpolation technique allows the user to interactively project the data values of the visualization as desired.
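As a rough illustration of why an ordering step helps before color mapping, the sketch below replaces each value by its normalized rank, so that a strongly skewed distribution still spreads over the whole color map. This rank transform is a swapped-in, generic technique and an assumption on our part; it is not the transformation or the angular interpolation from the paper, and the names `rank_normalize` and `to_gray` are hypothetical.

```python
def rank_normalize(values):
    """Map each value to its normalized rank in [0, 1].

    Tied values share the average of their ranks.
    """
    n = len(values)
    if n == 0:
        return []
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda idx: values[idx])
    result = [0.0] * n
    pos = 0
    while pos < n:
        # Extend the run [pos, end] over all values equal to the current one.
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2
        for k in range(pos, end + 1):
            result[order[k]] = average_rank / (n - 1)
        pos = end + 1
    return result


def to_gray(t):
    """Map t in [0, 1] to an RGB gray level (a stand-in color map)."""
    level = round(255 * t)
    return (level, level, level)


if __name__ == "__main__":
    skewed = [1, 10, 100, 1000, 10000]
    for value, t in zip(skewed, rank_normalize(skewed)):
        print(value, t, to_gray(t))
```

With a plain linear mapping, the first four values of the example would fall into the darkest tenth of the color map; after the rank transform they are spaced evenly.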

Andrada Tatu, Georgia Albuquerque, Martin Eisemann, Peter Bak, Holger Theisel, Marcus Magnor, and Daniel Keim:
"Automated Analytical Methods to Support Visual Exploration of High-Dimensional Data",
IEEE Transactions on Visualization and Computer Graphics (TVCG), vol. 17, no. 5, pp. 584–597, February 2011.
Part of project "Scalable Visual Analytics".
[pdf] [bib]

Visual exploration of multivariate data typically requires projection onto lower-dimensional representations. The number of possible representations grows rapidly with the number of dimensions, and manual exploration quickly becomes ineffective or even infeasible. This paper proposes automatic analysis methods to extract potentially relevant visual structures from a set of candidate visualizations. Based on features, the visualizations are ranked in accordance with a specified user task. The user is provided with a manageable number of potentially useful candidate visualizations, which can be used as a starting point for interactive data analysis. This can effectively ease the task of finding truly useful visualizations and potentially speed up the data exploration task. In this paper, we present ranking measures for class-based as well as non-class-based scatterplots and parallel coordinates visualizations. The proposed analysis methods are evaluated on different datasets.
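The general rank-and-select pattern can be sketched in a few lines: score every candidate scatterplot with a quality measure and hand only the top candidates to the user. The sketch below uses the absolute Pearson correlation as a stand-in measure, which is an assumption for illustration; the measures proposed in the paper are different, and the names `pearson` and `rank_scatterplots` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_scatterplots(columns, top_k=3):
    """Return the top_k dimension pairs as (name_a, name_b, score).

    `columns` maps a dimension name to its list of values. Every unordered
    pair of dimensions is one candidate scatterplot.
    """
    scored = [
        (abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(a, b, score) for score, a, b in scored[:top_k]]


if __name__ == "__main__":
    table = {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10],
        "z": [5, 1, 4, 2, 3],
    }
    for name_a, name_b, score in rank_scatterplots(table, top_k=2):
        print(name_a, name_b, round(score, 3))
```

Any other measure with the same signature could replace `pearson`, for example one tailored to class separation when the user task concerns labeled data.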

Georgia Albuquerque, Martin Eisemann, Dirk J. Lehmann, Holger Theisel, and Marcus Magnor:
"Improving the Visual Analysis of High-dimensional Datasets Using Quality Measures",
in Proc. IEEE Symposium on Visual Analytics Science and Technology (VAST) 2010, Salt Lake City, Utah, USA, pp. 19–26, October 2010.
Part of project "Scalable Visual Analytics".
[pdf] [bib]

Modern visualization methods need to cope with very high-dimensional data. Efficient visual analytical techniques are required to extract the inherent information content. The large number of possible projections for each method, which usually grows quadratically or even exponentially with the number of dimensions, makes it necessary to employ automatic reduction techniques, such as automatically sorting or selecting the projections based on their information-bearing content. Different quality measures have been successfully applied for several specified user tasks and established visualization techniques, such as Scatterplots, Scatterplot Matrices or Parallel Coordinates. Many other popular visualization techniques exist, but due to their structural differences, the measures are not directly applicable to them and new approaches are needed. In this paper we propose new quality measures for three popular visualization methods: Radviz, Pixel-Oriented Displays and Table Lenses. Our experiments show that these measures efficiently guide the visual analysis task.

Dirk J. Lehmann, Georgia Albuquerque, Martin Eisemann, Andrada Tatu, Heidrun Schumann, Marcus Magnor, and Holger Theisel:
"Visualisierung und Analyse multidimensionaler Datensätze",
Informatik-Spektrum, vol. 33, no. 5, pp. 589–600, September 2010.
Part of project "Scalable Visual Analytics".
[pdf] [bib]

For multi-dimensional data sets, there exist many visual as well as automatic techniques to detect inherent relations and characteristics. Due to the (increasing) size and complexity of such data, it is necessary to combine both approaches. In this article, we therefore present established visual and automatic data analysis approaches and describe modern methods to combine them, with the goal of enhancing the data analysis process. All explanations are supported by examples to ease the reader's understanding.

Georgia Albuquerque, Martin Eisemann, Dirk J. Lehmann, Holger Theisel, and Marcus Magnor:
"Quality-Based Visualization Matrices",
in Proc. Vision, Modeling and Visualization (VMV) 2009, Braunschweig, Germany, pp. 341–349, November 2009.
Part of project "Scalable Visual Analytics".
[pdf] [bib]

Parallel coordinates and scatterplot matrices are widely used to visualize multi-dimensional data sets. But these visualization techniques are insufficient when the number of dimensions grows. To solve this problem, different approaches to preselect the best views or dimensions have been proposed in recent years. However, these methods still have several shortcomings. In this paper we present three new methods to explore multivariate data sets: a parallel coordinates matrix, in analogy to the well-known scatterplot matrix; a class-based scatterplot matrix that aims at finding good projections for each class pair; and an importance-aware algorithm to sort the dimensions of scatterplot and parallel coordinates matrices.

Andrada Tatu, Georgia Albuquerque, Martin Eisemann, Jörn Schneidewind, Holger Theisel, Marcus Magnor, and Daniel Keim:
"Combining automated analysis and visualization techniques for effective exploration of high-dimensional data",
in Proc. IEEE Symposium on Visual Analytics Science and Technology (VAST) 2009, Atlantic City, New Jersey, USA, pp. 59–66, October 2009.
Won the SPP Collaboration Award in the DFG priority program on Scalable Visual Analytics (SPP 1335).
Part of project "Scalable Visual Analytics".
[pdf] [bib]

Visual exploration of multivariate data typically requires projection onto lower-dimensional representations. The number of possible representations grows rapidly with the number of dimensions, and manual exploration quickly becomes ineffective or even infeasible. This paper proposes automatic analysis methods to extract potentially relevant visual structures from a set of candidate visualizations. Based on features, the visualizations are ranked in accordance with a specified user task. The user is provided with a manageable number of potentially useful candidate visualizations, which can be used as a starting point for interactive data analysis. This can effectively ease the task of finding truly useful visualizations and potentially speed up the data exploration task. In this paper, we present ranking measures for class-based as well as non-class-based Scatterplots and Parallel Coordinates visualizations. The proposed analysis methods are evaluated on different datasets.

