Seminar Computer Vision SS'26
Seminar
Prof. Dr.-Ing. Martin Eisemann
Audience: Bachelor & Master
Contact: seminarcv@cg.cs.tu-bs.de
Module: INF-STD-66, INF-STD-68
Course nos.: 4216031, 4216032
Topic: Recent research in Visual Computing

Latest News
Content
In this seminar we discuss current research results in computer vision, visual computing, and image/video processing. The task of each participant is to understand a given research topic and explain it to the other participants. In a block seminar in the middle of the semester, the background knowledge required for the final talks is presented in oral presentations; at the end of the semester, each participant presents their research topic in a final talk. This talk must be rehearsed beforehand in front of another student, and their suggestions for improvement must be incorporated.
Participants
The course is aimed at bachelor's and master's students from the fields of computer science (Informatik), IST, business informatics (Wirtschaftsinformatik), and data science.
Registration takes place centrally via Stud.IP. The number of participants is initially limited to 8 students, but can be extended at the kickoff if necessary.
Important Dates
All dates listed here must be adhered to. Attendance at all events is mandatory.
Events in person
Submission deadlines / action required
- xx.xx.2026 12:00 - xx.xx.2026 12:00: Registration via Stud.IP (dates haven't been announced yet)
- 07.04.2026, 10:30-12:00, (G30, ICG): Kickoff Meeting
- 20.04.2026: End of the deregistration period
- 23.04.2026, 10:30-12:00, G30 (ICG): Gather topics for fundamentals talk
- 20.05.2026: Submission of presentation slides for fundamentals talk (please use the following naming scheme: Lastname_FundamentalsPresentation_SeminarCV.pdf)
- 21.05.2026, 09:00 - 12:00, G30 (ICG): Fundamentals presentations, Block
- Until 14.01.2026: Trial presentation for the final talk (between tandem partners from the fundamentals talk)
- 24.06.2026: Submission of presentation slides for final talk (ALL participants!) (please use the following naming scheme: Lastname_FinalPresentation_SeminarCV.pdf)
- 25.06.2026, 09:00 - 15:00, G30 (ICG): Presentations - Block Event Part 1
- 26.06.2026, 09:00 - 15:00, G30 (ICG): Presentations - Block Event Part 2 (probably not needed)
Registered students may deregister up to two weeks after the official start of lectures for this semester. For a successful deregistration, you must notify the seminar supervisor.
Submissions are made by email to seminarcv@cg.cs.tu-bs.de and your advisor, and, if necessary, to your tandem partner. Unless otherwise communicated, submissions are due by 11:59 pm on the submission day.
If you would like to be provided with a presentation notebook for your talk, please let us know and send your presentation directly or via download link (TU-Cloud) in PPTX or PDF format at least 3 days in advance to seminarcv@cg.cs.tu-bs.de.
If you have any questions about the event, please contact seminarcv@cg.cs.tu-bs.de.
Format
- The topics for the final talks will be distributed amongst the participants during the Kickoff event.
- The topics for the fundamentals talks will be distributed amongst the participants during the second meeting.
- The topics will be presented in approximately 20-minute presentations, each followed by a discussion; see Important Dates.
- For the on-site talks, an institute laptop or your own laptop can be used. If the institute laptop is to be used, contact seminarcv@cg.cs.tu-bs.de in time, at least two weeks before the presentations. In this case, the presentation slides must be made available at least one week before the talk.
- The presentations will be given on site. If, for some reason, the presentations take place online, BigBlueButton will be used as the platform. In this case, students need their own PC with a microphone. In addition, video transmission during your own talk is desirable. If these requirements cannot be met, contact seminarcv@cg.cs.tu-bs.de in time.
- The language for the presentations can be either German or English.
- Giving the presentations is a mandatory requirement for passing the course.
Files and Templates
- Kickoff-Slides
- Slide-Template (optional usage)
Topics - Bachelor Level
- Dual Photography
Sen, P., Chen, B., Garg, G., Marschner, S. R., Horowitz, M., Levoy, M., & Lensch, H. P. (2005). In ACM SIGGRAPH 2005 Papers (pp. 745-755).
[ paper ]
We present a novel photographic technique called dual photography, which exploits Helmholtz reciprocity to interchange the lights and cameras in a scene. With a video projector providing structured illumination, reciprocity permits us to generate pictures from the viewpoint of the projector, even though no camera was present at that location. The technique is completely image-based, requiring no knowledge of scene geometry or surface properties, and by its nature automatically includes all transport paths, including shadows, interreflections and caustics. In its simplest form, the technique can be used to take photographs without a camera; we demonstrate this by capturing a photograph using a projector and a photo-resistor. If the photo-resistor is replaced by a camera, we can produce a 4D dataset that allows for relighting with 2D incident illumination. Using an array of cameras we can produce a 6D slice of the 8D reflectance field that allows for relighting with arbitrary light fields. Since an array of cameras can operate in parallel without interference, whereas an array of light sources cannot, dual photography is fundamentally a more efficient way to capture such a 6D dataset than a system based on multiple projectors and one camera. As an example, we show how dual photography can be used to capture and relight scenes.
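In matrix form, stacking the projector pixels into a vector p and the camera pixels into a vector c gives c = T p for a light transport matrix T, and Helmholtz reciprocity implies that the dual image seen from the projector's viewpoint is obtained with the transpose, p'' = T^T c''. A minimal numpy sketch of this primal/dual relationship (sizes and variable names are illustrative, not from the paper's implementation):

    import numpy as np

    m, n = 64, 48                     # m camera pixels, n projector pixels (toy sizes)
    rng = np.random.default_rng(0)
    T = rng.random((m, n)) * 0.01     # transport matrix, captured with structured illumination

    p = rng.random(n)                 # projector illumination pattern
    c = T @ p                         # primal photograph: what the camera records

    c_virtual = rng.random(m)         # virtual illumination placed at the camera
    p_dual = T.T @ c_virtual          # dual photograph: the scene from the projector's viewpoint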
Advisor: Fabian Friederichs
- Acquiring the Reflectance Field of a Human Face
Debevec, P., Hawkins, T., Tchou, C., Duiker, H. P., Sarokin, W., & Sagar, M. (2000, July). In Proceedings of the 27th annual conference on Computer graphics and interactive techniques (pp. 145-156).
[ paper | project page ]
We present a method to acquire the reflectance field of a human face and use these measurements to render the face under arbitrary changes in lighting and viewpoint. We first acquire images of the face from a small set of viewpoints under a dense sampling of incident illumination directions using a light stage. We then construct a reflectance function image for each observed image pixel from its values over the space of illumination directions. From the reflectance functions, we can directly generate images of the face from the original viewpoints in any form of sampled or computed illumination. To change the viewpoint, we use a model of skin reflectance to estimate the appearance of the reflectance functions for novel viewpoints. We demonstrate the technique with renderings of a person’s face under novel illumination and viewpoints.
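Since light transport is linear, relighting with the captured reflectance field reduces to a weighted sum of the per-direction basis images. A hedged numpy sketch of that step (array names and shapes are assumptions for illustration):

    import numpy as np

    n_lights, H, W = 2048, 256, 256
    R = np.zeros((n_lights, H, W, 3))    # one image per sampled light direction, from the light stage

    L = np.ones(n_lights) / n_lights     # novel illumination: one weight per direction
    relit = np.tensordot(L, R, axes=1)   # (H, W, 3) rendering under the new lighting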
Advisor: Fabian Friederichs
- Fast separation of direct and global components of a scene using high frequency illumination
Nayar, S. K., Krishnan, G., Grossberg, M. D., & Raskar, R. (2006). In ACM SIGGRAPH 2006 Papers (pp. 935-944).
[ paper ]
We present fast methods for separating the direct and global illumination components of a scene measured by a camera and illuminated by a light source. In theory, the separation can be done with just two images taken with a high frequency binary illumination pattern and its complement. In practice, a larger number of images are used to overcome the optical and resolution limitations of the camera and the source. The approach does not require the material properties of objects and media in the scene to be known. However, we require that the illumination frequency is high enough to adequately sample the global components received by scene points. We present separation results for scenes that include complex interreflections, subsurface scattering and volumetric scattering. Several variants of the separation approach are also described. When a sinusoidal illumination pattern is used with different phase shifts, the separation can be done using just three images. When the computed images are of lower resolution than the source and the camera, smoothness constraints are used to perform the separation using a single image. Finally, in the case of a static scene that is lit by a simple point source, such as the sun, a moving occluder and a video camera can be used to do the separation. We also show several simple examples of how novel images of a scene can be computed from the separation results.
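In the two-image case with a 50% high-frequency pattern and its complement, each scene point receives its full direct component in exactly one of the two images and roughly half of the global component in both, so per pixel L_max = L_d + L_g/2 and L_min = L_g/2. A minimal numpy sketch of that separation (function and variable names are mine):

    import numpy as np

    def separate(img_pattern, img_complement):
        # Two-image direct/global separation with a high-frequency 50% on/off pattern.
        L_max = np.maximum(img_pattern, img_complement)  # pixel directly lit by the pattern
        L_min = np.minimum(img_pattern, img_complement)  # pixel unlit: receives only global light
        direct = L_max - L_min
        global_ = 2.0 * L_min
        return direct, global_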
Advisor: Fabian Friederichs
- Shape-from-shading: a survey
Zhang, R., Tsai, P. S., Cryer, J. E., & Shah, M. (1999). IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8), 690-706.
[ paper ]
Since the first shape-from-shading (SFS) technique was developed by Horn in the early 1970s, many different approaches have emerged. In this paper, six well-known SFS algorithms are implemented and compared. The performance of the algorithms was analyzed on synthetic images using mean and standard deviation of depth (Z) error, mean of gradient (p, q) error, and CPU timing. Each algorithm works well for certain images, but performs poorly for others. In general, minimization approaches are more robust, while the other approaches are faster. The implementation of these algorithms in C and the images used in this paper are available by anonymous ftp under the /tech_paper/survey directory at eustis.cs.ucf.edu (132.170.108.42). These are also part of the electronic version of the paper.
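SFS methods start from the image irradiance equation I(x, y) = R(p, q), which for a Lambertian surface under a distant light source with gradient-space direction (ps, qs) uses the classic reflectance map. A small numpy sketch of that textbook formula (illustrative, not code from the survey):

    import numpy as np

    def lambertian_reflectance_map(p, q, ps, qs):
        # R(p, q) for surface gradient (p, q) and light source direction (ps, qs).
        num = 1.0 + p * ps + q * qs
        den = np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + ps**2 + qs**2)
        return np.clip(num / den, 0.0, None)  # self-shadowed orientations clamp to zero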
Advisor: Fabian Friederichs
- Recovering High Dynamic Range Radiance Maps from Photographs
Debevec, P. E., & Malik, J. (2023). In Seminal Graphics Papers: Pushing the Boundaries, Volume 2 (pp. 643-652).
[ paper ]
We present a method of recovering high dynamic range radiance maps from photographs taken with conventional imaging equipment. In our method, multiple photographs of the scene are taken with different amounts of exposure. Our algorithm uses these differently exposed photographs to recover the response function of the imaging process, up to a factor of scale, using the assumption of reciprocity. With the known response function, the algorithm can fuse the multiple photographs into a single, high dynamic range radiance map whose pixel values are proportional to the true radiance values in the scene. We demonstrate our method on images acquired with both photochemical and digital imaging processes. We discuss how this work is applicable in many areas of computer graphics involving digitized photographs, including image-based modeling, image compositing, and image processing. Lastly, we demonstrate a few applications of having high dynamic range radiance maps, such as synthesizing realistic motion blur and simulating the response of the human visual system.
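Once the log inverse response g has been solved for, each pixel Z_ij of each exposure yields an estimate ln E_i = g(Z_ij) - ln Δt_j, and the estimates are fused with a hat-shaped weighting that trusts mid-range pixel values most. A compact numpy sketch of that fusion step (assuming g is already recovered for 8-bit values):

    import numpy as np

    def fuse_hdr(images, exposures, g):
        # images: list of uint8 arrays, exposures: shutter times, g: (256,) log response.
        w = lambda Z: np.minimum(Z, 255 - Z).astype(np.float64)  # hat weighting
        num = np.zeros(images[0].shape, dtype=np.float64)
        den = np.zeros_like(num)
        for Z, dt in zip(images, exposures):
            num += w(Z) * (g[Z] - np.log(dt))
            den += w(Z)
        return np.exp(num / np.maximum(den, 1e-8))  # radiance map, up to scale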
Advisor: Fabian Friederichs
- Stable fluids
Stam, J. (1999). SIGGRAPH
[ paper ]
Building animation tools for fluid-like motions is an important and challenging problem with many applications in computer graphics. The use of physics-based models for fluid flow can greatly assist in creating such tools. Physical models, unlike key frame or procedural based techniques, permit an animator to almost effortlessly create interesting, swirling fluid-like behaviors. Also, the interaction of flows with objects and virtual forces is handled elegantly. Until recently, it was believed that physical fluid models were too expensive to allow real-time interaction. This was largely due to the fact that previous models used unstable schemes to solve the physical equations governing a fluid. In this paper, for the first time, we propose an unconditionally stable model which still produces complex fluid-like flows. As well, our method is very easy to implement. The stability of our model allows us to take larger time steps and therefore achieve faster simulations. We have used our model in conjunction with advecting solid textures to create many fluid-like animations interactively in two- and three-dimensions.
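The unconditional stability comes from the semi-Lagrangian advection step: each grid point traces the velocity field backward over one time step and samples the old field there, so new values are interpolations of existing ones and cannot blow up. A minimal 2D numpy sketch of that step (grid layout and boundary handling are simplified assumptions):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def advect(q, u, v, dt):
        # Semi-Lagrangian advection of field q by velocity components (u, v) on a unit grid.
        ny, nx = q.shape
        y, x = np.mgrid[0:ny, 0:nx].astype(np.float64)
        x_back = x - dt * u               # trace backward along the velocity field...
        y_back = y - dt * v
        return map_coordinates(q, [y_back, x_back], order=1, mode='nearest')  # ...and sample there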
Advisor: Jannis Möller
- Denoising Diffusion Probabilistic Models
Ho, J., Jain, A., & Abbeel, P. (2020). NeurIPS
[ paper | project page ]
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN.
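The simplified training objective reduces to noise prediction: corrupt x_0 to x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε in closed form and regress the noise ε. A hedged PyTorch-style sketch of one training step (the model and noise schedule are placeholders):

    import torch

    def ddpm_loss(model, x0, alpha_bar):
        # x0: (B, C, H, W) images; alpha_bar: (T,) cumulative products of the noise schedule.
        B = x0.shape[0]
        t = torch.randint(0, alpha_bar.shape[0], (B,), device=x0.device)  # random timesteps
        ab = alpha_bar[t].view(B, 1, 1, 1)
        eps = torch.randn_like(x0)
        x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps   # closed-form forward diffusion
        return torch.nn.functional.mse_loss(model(x_t, t), eps)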
Advisor: Jannis Möller
- Instant neural graphics primitives with a multiresolution hash encoding
Müller, T., Evans, A., Schied, C., & Keller, A. (2022). SIGGRAPH
[ paper | project page ]
Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations. A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of 1920x1080.
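At each resolution level, integer grid-corner coordinates are mapped into a fixed-size feature table with an XOR spatial hash (the primes below are the ones given in the paper for 3D), and corner features are then interpolated. A minimal numpy sketch of the per-level lookup (trilinear blending omitted for brevity):

    import numpy as np

    PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

    def hash_index(coords, table_size):
        # coords: (..., 3) integer grid coordinates -> indices into the feature table.
        c = coords.astype(np.uint64)
        h = c[..., 0] * PRIMES[0]
        for d in (1, 2):
            h ^= c[..., d] * PRIMES[d]                 # multiply-XOR hash, wraps mod 2^64
        return h % np.uint64(table_size)

    table = np.random.default_rng(0).standard_normal((2**14, 2)).astype(np.float32)
    corner = np.array([[12, 7, 3]])                    # one grid corner at some level
    feature = table[hash_index(corner, table.shape[0])]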
Advisor: Jannis Möller
Topics - Master Level
- Stochastic Ray Tracing of Transparent 3D Gaussians
Sun, X., Georgiev, I., Fei, Y. (Raymond), & Hasan, M. (2025). Eurographics Symposium on Rendering.
[ paper ]
3D Gaussian splatting has been widely adopted as a 3D representation for novel-view synthesis, relighting, and 3D generation tasks. It delivers realistic and detailed results through a collection of explicit 3D Gaussian primitives, each carrying opacity and view-dependent color. However, efficient rendering of many transparent primitives remains a significant challenge. Existing approaches either rasterize the Gaussians with approximate per-view sorting or rely on high-end RTX GPUs. This paper proposes a stochastic ray-tracing method to render 3D clouds of transparent primitives. Instead of processing all ray-Gaussian intersections in sequential order, each ray traverses the acceleration structure only once, randomly accepting and shading a single intersection (or N intersections, using a simple extension). This approach minimizes shading time and avoids primitive sorting along the ray, thereby minimizing register usage and maximizing parallelism even on low-end GPUs. The cost of rays through the Gaussian asset is comparable to that of standard mesh-intersection rays. The shading is unbiased and has low variance, as our stochastic acceptance achieves importance sampling based on accumulated weight. The alignment with Monte Carlo philosophy simplifies implementation and integration into a conventional path-tracing framework.
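The single-pass acceptance matches the classic weighted reservoir sampling pattern: as intersections stream in along the ray, each one is accepted with probability proportional to its weight relative to the running total, so exactly one survivor is kept with probability w_i / W. A generic sketch of that pattern (not the paper's CUDA implementation; the weights stand in for the per-Gaussian contributions):

    import random

    def pick_intersection(hits):
        # hits: iterable of (weight, payload) in traversal order; no sorting needed.
        total = 0.0
        chosen = None
        for weight, payload in hits:
            total += weight
            if random.random() * total < weight:   # accept with probability weight / total
                chosen = payload
        return chosen, total                       # total is needed to unbias the estimate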
Advisor: Fabian Friederichs
- Neural Importance Sampling
Müller, T., McWilliams, B., Rousselle, F., Gross, M., & Novák, J. (2019). ACM Transactions on Graphics (ToG), 38(5), 1-19.
[ paper ]
We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent components estimation (NICE), which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly increase the modeling power of individual coupling layers. Second, we propose to preprocess the inputs of neural networks using one-blob encoding, which stimulates localization of computation and improves inference. Third, we derive a gradient-descent-based optimization for the Kullback-Leibler and the χ² divergence for the specific application of Monte Carlo integration with unnormalized stochastic estimates of the target distribution. Our approach enables fast and accurate inference and efficient sample generation independently of the dimensionality of the integration domain. We show its benefits on generating natural images and in two applications to light-transport simulation: first, we demonstrate learning of joint path-sampling densities in the primary sample space and importance sampling of multi-dimensional path prefixes thereof. Second, we use our technique to extract conditional directional densities driven by the product of incident illumination and the BSDF in the rendering equation, and we leverage the densities for path guiding. In all applications, our approach yields on-par or higher performance than competing techniques at equal sample count.
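A coupling transform leaves one part of the input unchanged and warps the rest conditioned on it, which keeps the inverse and the Jacobian determinant cheap; the paper's piecewise-polynomial warps generalize the simpler affine version sketched here (a stand-in, not the paper's transform):

    import numpy as np

    def affine_coupling(x, net):
        # x: (N, D). The first half conditions an elementwise affine warp of the second half.
        d = x.shape[1] // 2
        xa, xb = x[:, :d], x[:, d:]
        log_s, t = net(xa)                        # any function of xa, typically a neural network
        yb = xb * np.exp(log_s) + t               # invertible given xa
        log_det = log_s.sum(axis=1)               # triangular Jacobian -> cheap determinant
        return np.concatenate([xa, yb], axis=1), log_det

    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 3)) * 0.1
    net = lambda xa: (xa @ W, xa @ W)             # toy conditioner standing in for a trained net
    y, log_det = affine_coupling(rng.standard_normal((4, 6)), net)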
Advisor: Fabian Friederichs
- Fluid Simulation on Neural Flow Maps
Deng, Y., Yu, H.-X., Zhang, D., Wu, J., & Zhu, B. (2023). SIGGRAPH Asia
[ paper | project page ]
This work introduces Neural Flow Maps, a novel method bridging implicit neural representations with flow map theory to achieve state-of-the-art inviscid fluid simulation. It utilizes a hybrid representation fusing small neural networks with multi-resolution sparse grids to compactly and accurately model long-term spatiotemporal velocity fields. This neural velocity buffer enables the symmetric computation of long-term, bidirectional flow maps and their Jacobians, drastically improving accuracy over existing solutions. These flow maps provide high advection accuracy with low dissipation, facilitating high-fidelity incompressible simulations of intricate vortical structures.
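Flow-map advection transports a quantity by the backward map ψ, i.e. q(x, t1) = q(ψ(x), t0), where ψ comes from integrating the velocity field backward in time. A generic RK4 backtrace under that formulation (the velocity is a placeholder callable; the paper additionally buffers it neurally and tracks the map's Jacobian):

    import numpy as np

    def backward_flow_map(x, velocity, t1, t0, steps=32):
        # Integrate positions x from time t1 back to t0 through the velocity field.
        dt = (t0 - t1) / steps                    # negative step: backward in time
        for i in range(steps):
            t = t1 + i * dt
            k1 = velocity(x, t)
            k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
            k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
            k4 = velocity(x + dt * k3, t + dt)
            x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        return x                                  # psi(x): where the material at x came from

    swirl = lambda x, t: np.stack([-x[..., 1], x[..., 0]], axis=-1)  # toy rotational field
    src = backward_flow_map(np.array([[1.0, 0.0]]), swirl, t1=1.0, t0=0.0)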
Advisor: Jannis Möller
- 4Deform: Neural Surface Deformation for Robust Shape Interpolation
Sang, L., Canfes, Z., Cao, D., Marin, R., Bernard, F., & Cremers, D. (2025). CVPR
[ paper | project page ]
Generating realistic intermediate shapes between non-rigidly deformed shapes is a challenging task in computer vision, especially with unstructured data (e.g., point clouds) where temporal consistency across frames is lacking, and topologies are changing. Most interpolation methods are designed for structured data (i.e., meshes) and do not apply to real-world point clouds. In contrast, our approach leverages neural implicit representation (NIR) to enable free-topology changing shape deformation. Unlike previous mesh-based methods, which learn vertex-based deformation fields, our method learns a continuous velocity field in Euclidean space, making it suitable for less structured data such as point clouds. Additionally, our method does not require intermediate-shape supervision during training; instead, we incorporate physical and geometrical constraints to regularize the velocity field. We reconstruct intermediate surfaces using a modified level-set equation, directly linking our NIR with the velocity field. Experiments show that our method significantly outperforms previous NIR approaches across various scenarios (e.g., noisy, partial, topology-changing, non-isometric shapes) and, for the first time, enables new applications like 4D Kinect sequence upsampling and real-world high-resolution mesh deformation.
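The level-set link mentioned above builds on the standard advection of an implicit function phi (whose zero set is the shape) by a velocity field v, d(phi)/dt + v . grad(phi) = 0; the paper uses a modified variant of this equation. A one-step explicit sketch on a 2D grid (illustrative only, not the paper's scheme):

    import numpy as np

    def level_set_step(phi, v, dt, h=1.0):
        # One explicit Euler step of  d(phi)/dt + v . grad(phi) = 0  on a 2D grid.
        gy, gx = np.gradient(phi, h)
        return phi - dt * (v[..., 0] * gx + v[..., 1] * gy)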
Advisor: Jannis Möller
- Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
Wu, J. Z., Zhang, Y., Turki, H., Ren, X., Gao, J., Shou, M. Z., Fidler, S., Gojcic, Z., & Ling, H. (2025). CVPR
[ paper | project page ]
Neural Radiance Fields and 3D Gaussian Splatting have revolutionized 3D reconstruction and novel-view synthesis tasks. However, achieving photorealistic rendering from extreme novel viewpoints remains challenging, as artifacts persist across representations. In this work, we introduce Difix3D+, a novel pipeline designed to enhance 3D reconstruction and novel-view synthesis through single-step diffusion models. At the core of our approach is Difix, a single-step image diffusion model trained to enhance and remove artifacts in rendered novel views caused by underconstrained regions of the 3D representation. Difix serves two critical roles in our pipeline. First, it is used during the reconstruction phase to clean up pseudo-training views that are rendered from the reconstruction and then distilled back into 3D. This greatly enhances underconstrained regions and improves the overall 3D representation quality. More importantly, Difix also acts as a neural enhancer during inference, effectively removing residual artifacts arising from imperfect 3D supervision and the limited capacity of current reconstruction models. Difix3D+ is a general solution, a single model compatible with both NeRF and 3DGS representations, and it achieves an average 2x improvement in FID score over baselines while maintaining 3D consistency.
Advisor: Jannis Möller
- PatchFusionVR: Multitask Prediction of User Gaze, Reaction Time, and Cognitive Load in Virtual Reality from Multimodal Signals
Pavel, M. I., Mahmud, M. R., Setu, J. N., Desai, K., & Quarles, J. (2025, November). In Proceedings of the 2025 31st ACM Symposium on Virtual Reality Software and Technology (pp. 1-11).
[ paper ]
Enhancing user experience and performance, including task load in immersive environments, requires accurate prediction of user gaze point, reaction time, and mental and physical load uptake. Current gaze prediction approaches focus primarily on motion-based information, lacking physiological data, which leads to poor prediction accuracy in highly dynamic virtual reality (VR) environments. Traditional cognitive load measurements rely on post-task analysis without proper multimodal data integration and fail to capture the real-time dynamics of user states during interaction. Likewise, reaction time or attention load are often assessed only after the interaction, without using real-time immersive sensor data, which limits adaptive responsiveness. To tackle these limitations, we leveraged a comprehensive multimodal dataset, VRWalking, which recorded timestamped eye-tracking metrics, physiological signals (heart rate and galvanic skin response), and behavioral performance data during real-time engagement in a VR environment. We developed a unified multitask model based on the MultiPatchFormer architecture, which processes multimodal VR signals through dual patch projection branches for gaze and classification inputs. The model employs multiscale patch embeddings, cross-attention between gaze and classification pathways, channel attention, and transformer encoders to jointly predict continuous user gaze and classify reaction time and cognitive load (mental and physical load). Our methodology achieved excellent predictive performance: 95.64% for reaction time, 98.01% for mental load, and 97.45% for physical load, with a MAPE (Mean Absolute Percentage Error) of 15.24% for gaze prediction. We applied Shapley Additive Explanations (SHAP) analysis to interpret the model’s behavior across all features, including eye-tracking, head-tracking, and physiological signals. The analysis revealed which features most influenced the predictions of user gaze, reaction time, mental load, and physical load. Our methods, while based only on the VRWalking dataset, demonstrated strong performance across all tasks, suggesting promising potential for real-world VR applications such as interactive training systems that respond to user attention lapses, educational platforms that adapt to cognitive load, and performance assessments that consider physiological indicators.
Advisor: Anika Jewst
- AR-TMT: Investigating the Impact of Distraction Types on Attention and Behavior in AR-based Trail Making Test
Baek, S., Qu, Z., & Gorlatova, M. (2025, November). In Proceedings of the 2025 31st ACM Symposium on Virtual Reality Software and Technology (pp. 1-11).
Despite the growing use of AR in safety-critical domains, the field lacks a systematic understanding of how different types of distraction affect user behavior in AR environments. To address this gap, we present AR-TMT, an AR adaptation of the Trail Making Test that spatially renders targets for sequential selection on the Magic Leap 2. We implemented distractions in three categories: top-down, bottom-up, and spatial distraction, based on Wolfe’s Guided Search model, and captured performance, gaze, motor behavior, and subjective load measures to analyze user attention and behavior. A user study with 34 participants revealed that top-down distraction degraded performance through semantic interference, while bottom-up distraction disrupted initial attentional engagement. Spatial distraction destabilized gaze behavior, leading to more scattered and less structured visual scanning patterns. We also found that performance was correlated with attention control (R² = .20-.35) under object-based distraction conditions, where distractors possessed task-relevant features. The study offers insights into distraction mechanisms and their impact on users, providing opportunities for generalization to ecologically relevant AR tasks while underscoring the need to address the unique demands of AR environments.
Advisor: Anika Jewst
Useful Resources
Example of a good presentation (video on the website under the Presentation section; note how little text is needed and how much is visualized to create an intuitive understanding).
General writing tips for scientific papers (mainly intended for writing scientific articles, but also useful for summaries).