Seminar Computer Vision WS'25/26
Seminar
Prof. Dr.-Ing. Martin Eisemann
Audience: Bachelor & Master
Contact: seminarcv@cg.cs.tu-bs.de
Module: INF-STD-66, INF-STD-68
Course no.: 4216031, 4216032
Topic: Recent research in Visual Computing

Latest News
If you would like us to provide a presentation laptop for your talk, please let us know in advance.
Content
In this seminar we discuss current research results in computer vision, visual computing, and image/video processing. Each participant's task is to understand a specific research topic and explain it to the other participants. In a block seminar in the middle of the semester, the background knowledge required for the final talks is presented in oral presentations; at the end of the semester, each participant presents their research topic in a final talk. The final talk must be rehearsed beforehand in front of another student, and that student's suggestions for improvement must be incorporated.
Participants
The course is aimed at bachelor's and master's students from the fields of computer science (Informatik), IST, business informatics (Wirtschaftsinformatik), and data science.
Registration takes place centrally via Stud.IP. The number of participants is initially limited to 8 students but may be increased at the kickoff if necessary.
Important Dates
All dates listed here must be adhered to. Attendance at all events is mandatory.
- 04.08.2025 12:00 - 08.08.2025 12:00: Registration via Stud.IP
- 28.10.2025, 10:30-12:00: Kickoff Meeting (G30, ICG)
- 03.11.2025: End of the deregistration period
- 13.11.2025, 10:30-12:00, G30 (ICG): Gather topics for fundamentals talk
- 10.12.2025: Submission of presentation slides for fundamentals talk (please use the following naming scheme: Lastname_FundamentalsPresentation_SeminarCV.pdf)
- 11.12.2025, 09:00 - 12:00, G30 (ICG): Fundamentals Presentations - Block Event
- By 14.01.2026: Trial presentation for the final talk (between tandem partners from the fundamentals talk)
- 21.01.2026: Submission of presentation slides for final talk (ALL participants!) (please use the following naming scheme: Lastname_FinalPresentation_SeminarCV.pdf)
- 22.01.2026, 09:00 - 15:00, G30 (ICG): Presentations - Block Event Part 1
- 29.01.2026, 09:00 - 15:00, G30 (ICG): Presentations - Block Event Part 2 (probably not needed)
Registered students can deregister until 03.11.2025 at the latest. To deregister successfully, you must notify the seminar supervisor.
Submissions are made by email to seminarcv@cg.cs.tu-bs.de and to your advisor, and, where applicable, to your tandem partner. Unless otherwise communicated, submissions are due by 11:59 pm on the submission day.
If you have any questions about the event, please contact seminarcv@cg.cs.tu-bs.de.
Format
- The topics for the final talks will be distributed amongst the participants during the Kickoff event.
- The topics for the fundamentals talks will be distributed amongst the participants during the second meeting.
- The topics will be presented in approximately 20-minute talks, each followed by a discussion (see Important Dates).
- For the on-site talks, either an institute laptop or your own laptop can be used. If you want to use the institute laptop, contact seminarcv@cg.cs.tu-bs.de in time, at least two weeks before the presentations. In this case, the presentation slides must be made available at least one week before the talk.
- The presentations will be given on site. If, for some reason, the presentations take place online, Big Blue Button will be used as the platform. In this case, students need their own PC with a microphone. In addition, video transmission during your own talk is desirable. If these requirements cannot be met, contact seminarcv@cg.cs.tu-bs.de in good time.
- The language for the presentations can be either German or English.
- Giving the presentations is a mandatory requirement for passing the course.
Files and Templates
- Kickoff-Slides
- Slide-Template (optional usage)
Topics
- MineVRA: Exploring the Role of Generative AI-Driven Content Development in XR Environments through a Context-Aware Approach
(Santarnecchi et al.) IEEE VR 2025
The convergence of Artificial Intelligence (AI), Computer Vision (CV), Computer Graphics (CG), and Extended Reality (XR) is driving innovation in immersive environments. A key challenge in these environments is the creation of personalized 3D assets, traditionally achieved through manual modeling, a time-consuming process that often fails to meet individual user needs. More recently, Generative AI (GenAI) has emerged as a promising solution for automated, context-aware content generation. In this paper, we present MineVRA (MultImodal generative artificial iNtelligence for contExt-aware Virtual Reality Assets), a novel Human-In-The-Loop (HITL) XR framework that integrates GenAI to facilitate coherent and adaptive 3D content generation in immersive scenarios. To evaluate the effectiveness of this approach, we conducted a comparative user study analyzing the performance and user satisfaction of GenAI-generated 3D objects compared to those retrieved from Sketchfab in different immersive contexts.
Paper
Advisor: Anika Jewst
- Generative AI for Context-Aware 3D Object Creation Using Vision-Language Models in Augmented Reality
(Behravan et al.) IEEE AixVR 2025
We present a novel Artificial Intelligence (AI) system that functions as a designer assistant in augmented reality (AR) environments. Leveraging Vision Language Models (VLMs) like LLaVA and advanced text-to-3D generative models, users can capture images of their surroundings with an Augmented Reality (AR) headset. The system analyzes these images to recommend contextually relevant objects that enhance both functionality and visual appeal. The recommended objects are generated as 3D models and seamlessly integrated into the AR environment for interactive use. Our system utilizes open-source AI models running on local systems to enhance data security and reduce operational costs. Key features include context-aware object suggestions, optimal placement guidance, aesthetic matching, and an intuitive user interface for real-time interaction. By addressing the challenge of providing context-aware object recommendations in AR, our system expands the capabilities of AI applications in this domain. It enables users to create personalized digital spaces efficiently, leveraging AI for contextually relevant suggestions.
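As a rough illustration for the talk, the loop the abstract describes (capture, VLM analysis, text-to-3D generation, placement) might look like the sketch below; every function name and the prompt are hypothetical placeholders, not the authors' actual code:

```python
# Hypothetical sketch of the capture -> VLM -> text-to-3D loop described in
# the abstract; none of these functions correspond to the authors' real API.

def capture_headset_image() -> bytes:
    """Placeholder: grab an RGB frame from the AR headset camera."""
    return b"fake-image-bytes"

def query_vlm(image: bytes, prompt: str) -> list[str]:
    """Placeholder: a locally hosted VLM (e.g. LLaVA) suggests fitting objects."""
    return ["floor lamp", "potted plant"]

def text_to_3d(description: str) -> str:
    """Placeholder: a text-to-3D generator returns a path to a mesh file."""
    return f"generated/{description.replace(' ', '_')}.glb"

def recommend_and_place() -> None:
    image = capture_headset_image()
    suggestions = query_vlm(
        image, "Suggest objects that would fit this room functionally and aesthetically."
    )
    for name in suggestions:
        mesh = text_to_3d(name)
        print(f"placing '{name}' ({mesh}) into the AR scene")

recommend_and_place()
```

In the actual system, the models run as open-source components on local hardware for data security and cost reasons, as the abstract notes.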
Paper
Advisor: Anika Jewst
- Contextual Matching Between Learning and Testing Within VR Does Not Always Enhance Memory Retrieval
(Mizuho et al.) VRST 2024
Episodic memory is influenced by environmental contexts, such as location and auditory stimuli. The most well-known effect is the reinstatement effect, which refers to the phenomenon where contextual matching between learning and testing enhances memory retrieval. Previous studies have investigated whether the reinstatement effect can be observed within immersive virtual environments. However, only a limited number of studies have reported a significant reinstatement effect using virtual reality, while most have failed to detect it. In this study, we re-examined the reinstatement effect using 360-degree video-based virtual environments. Specifically, we carefully selected virtual environments to elicit different emotional responses, which has been suggested as a key factor in inducing a robust reinstatement effect in the physical world.
Paper
Advisor: Anika Jewst
- TransparentGS: Fast Inverse Rendering of Transparent Objects with Gaussians
(Huang et al.) SIGGRAPH 2025
TransparentGS adapts 3D Gaussian Splatting to transparent materials. It introduces transparent Gaussian primitives, a probe-based light-field encoding, and a depth-guided query to handle refraction. The pipeline reconstructs glass-like objects from multi-view images faster than previous inverse rendering approaches, renders scenes in real time, and reports high accuracy with fewer artifacts in challenging scenes, indicating potential value for graphics and vision research.
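As background for the fundamentals talk: standard 3D Gaussian Splatting composites depth-sorted Gaussian primitives front to back per pixel; TransparentGS builds on this by changing what the primitives store so that refracted radiance can be queried. The compositing rule below is the standard 3DGS formulation, not this paper's extension:

$$ C = \sum_{i=1}^{N} c_i\, \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j) $$

where $c_i$ and $\alpha_i$ are the color and opacity contributions of the $i$-th depth-sorted Gaussian along the ray.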
Paper | Project Page
Advisor: Jannis Möller
- LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans
(Huang et al.) arXiv 2025
We propose LiteReality, a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality not only reconstructs scenes that visually resemble reality but also supports key features essential for graphics pipelines—such as object individuality, articulation, high-quality physically based rendering materials, and physically based interaction. At its core, LiteReality first performs scene understanding and parses the results into a coherent 3D layout and objects, with the help of a structured scene graph. It then reconstructs the scene by retrieving the most visually similar 3D artist-crafted models from a curated asset database. Later, the Material Painting module enhances the realism of retrieved objects by recovering high-quality, spatially varying materials. Finally, the reconstructed scene is integrated into a simulation engine with basic physical properties applied to enable interactive behavior. The resulting scenes are compact, editable, and fully compatible with standard graphics pipelines, making them suitable for applications in AR/VR, gaming, robotics, and digital twins. In addition, LiteReality introduces a training-free object retrieval module that achieves state-of-the-art similarity performance, as benchmarked on the Scan2CAD dataset, along with a robust Material Painting module capable of transferring appearances from images of any style to 3D assets—even in the presence of severe misalignment, occlusion, and poor lighting. We demonstrate the effectiveness of LiteReality on both real-life scans and public datasets.
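The stages named in the abstract can be lined up as a sketch of the data flow; all functions below are hypothetical placeholders for the corresponding LiteReality modules, not the authors' code:

```python
# Hypothetical stage-by-stage sketch of the pipeline from the abstract;
# each function is a placeholder, not the real implementation.

def parse_scene(scan: str) -> list[str]:
    return ["chair", "table"]             # scene understanding -> scene graph nodes

def retrieve_asset(node: str) -> str:
    return f"assets/{node}.glb"           # most visually similar artist-made model

def paint_materials(asset: str, scan: str) -> str:
    return asset                          # recover spatially varying PBR materials

def add_physics(assets: list[str]) -> dict:
    return {"interactive_scene": assets}  # integration into a simulation engine

def lite_reality(scan: str) -> dict:
    nodes = parse_scene(scan)
    assets = [retrieve_asset(n) for n in nodes]
    assets = [paint_materials(a, scan) for a in assets]
    return add_physics(assets)

print(lite_reality("livingroom.rgbd"))
```

The real retrieval module is training-free and benchmarked on Scan2CAD; the stubs above only mirror the data flow.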
Website
Advisor: Anika Jewst
- Differentiable Geometric Acoustic Path Tracing using Time-Resolved Path Replay Backpropagation
(Finnendahl et al.) SIGGRAPH 2025
Differentiable rendering has become a key ingredient in solving challenging inverse problems in computer graphics and vision. Existing systems can simulate and differentiate the spatial propagation of light. We exploit the duality of light transport simulations and geometric acoustics to apply differential rendering techniques to established acoustic simulation methods. The resulting system is capable of simulating sound according to the geometrical acoustics model and computing derivatives of the output energy spectrograms with respect to arbitrary parameters of the scene, including materials, emitters, microphones, and scene geometry. Contrary to current differentiable transient rendering, we can handle arbitrary simulation depths and achieve constant memory and linear execution times by presenting a temporal extension of Path Replay Backpropagation. We verify our model against established simulation software, and demonstrate the capabilities of optimization with gradients at examples of inverse acoustics and optimizing room parameters. This opens up a new field of research for acoustic optimization that could be as impactful for the acoustic community as differentiable rendering was for the graphics community.
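One way to frame the contribution for the audience: the gradients enable inverse acoustics as gradient-based optimization over scene parameters $\theta$ (materials, emitters, microphones, geometry). In generic notation (our paraphrase, not the paper's):

$$ \theta^\star = \arg\min_\theta\, \mathcal{L}\big(S(\theta),\, S_{\text{target}}\big), \qquad \theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L} $$

where $S(\theta)$ is the simulated time-resolved energy spectrogram and $\nabla_\theta \mathcal{L}$ is supplied by the temporal extension of Path Replay Backpropagation at constant memory cost.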
Paper | Project Page
Advisor: Jannis Möller
- FlowIE: Efficient Image Enhancement via Rectified Flow
(Zhu et al.) CVPR 2024
FlowIE is a flow-based image enhancement framework that maps a simple distribution directly to high-quality images via conditioned rectified flow. By straightening probability transfer paths, it performs inference an order of magnitude faster than diffusion-based methods. Leveraging knowledge from pretrained diffusion models, FlowIE handles diverse real-world degradations in under five steps. The paper further introduces a midpoint-tangent algorithm derived from Lagrange's Mean Value Theorem to refine path estimation and improve visual fidelity. Extensive experiments on synthetic and real-world datasets demonstrate that FlowIE delivers both superior enhancement quality and efficiency.
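For the fundamentals part, the rectified-flow idea that FlowIE builds on can be stated compactly: train a velocity field on straight interpolation paths between a source sample $x_0$ and a data sample $x_1$,

$$ x_t = (1-t)\,x_0 + t\,x_1, \qquad \min_\theta\; \mathbb{E}_{t,\,x_0,\,x_1}\, \big\| (x_1 - x_0) - v_\theta(x_t, t) \big\|^2 $$

then sample by integrating $\mathrm{d}x_t = v_\theta(x_t, t)\,\mathrm{d}t$; because the learned paths are nearly straight, a handful of integration steps suffices, which is where the speedup over diffusion comes from.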
Paper
Advisor: Fabian Friederichs
- Split-Aperture 2-in-1 Computational Cameras
(Shi et al.) ACM TOG 2024
While conventional cameras offer versatility for applications ranging from amateur photography to autonomous driving, computational cameras allow for domain-specific adaption. Cameras with co-designed optics and image processing algorithms enable high-dynamic-range image recovery, depth estimation, and hyperspectral imaging through optically encoding scene information that is otherwise undetected by conventional cameras. However, this optical encoding creates a challenging inverse reconstruction problem for conventional image recovery, and often lowers the overall photographic quality. Thus computational cameras with domain-specific optics have only been adopted in a few specialized applications where the captured information cannot be acquired in other ways. In this work, we investigate a method that combines two optical systems into one to tackle this challenge. We split the aperture of a conventional camera into two halves: one which applies an application-specific modulation to the incident light via a diffractive optical element to produce a coded image capture, and one which applies no modulation to produce a conventional image capture. Co-designing the phase modulation of the split aperture with a dual-pixel sensor allows us to simultaneously capture these coded and uncoded images without increasing physical or computational footprint. With an uncoded conventional image alongside the optically coded image in hand, we investigate image reconstruction methods that are conditioned on the conventional image, making it possible to eliminate artifacts and compute costs that existing methods struggle with. We assess the proposed method with 2-in-1 cameras for optical high-dynamic-range reconstruction, monocular depth estimation, and hyperspectral imaging, comparing favorably to all tested methods in all applications.
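In strongly simplified notation (ours, not the paper's, and ignoring the depth dependence of the point spread functions that the method exploits), the dual-pixel sensor captures two convolutions of the same scene at once:

$$ y_{\text{coded}} = h_{\text{DOE}} * x + n_1, \qquad y_{\text{conv}} = h_{\text{open}} * x + n_2 $$

Reconstruction then estimates the latent quantities (HDR image, depth, or hyperspectral data) from $y_{\text{coded}}$ while being conditioned on the artifact-free conventional capture $y_{\text{conv}}$.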
Paper
Advisor: Fabian Friederichs
- NeRF-Tex: Neural Reflectance Field Textures
(Baatz et al.) CGF 2021
We investigate the use of neural fields for modeling diverse mesoscale structures, such as fur, fabric, and grass. Instead of using classical graphics primitives to model the structure, we propose to employ a versatile volumetric primitive represented by a neural reflectance field (NeRF-Tex), which jointly models the geometry of the material and its response to lighting. The NeRF-Tex primitive can be instantiated over a base mesh to "texture" it with the desired meso- and microscale appearance. We condition the reflectance field on user-defined parameters that control the appearance. A single NeRF texture thus captures an entire space of reflectance fields rather than one specific structure. This increases the gamut of appearances that can be modeled and provides a solution for combating repetitive texturing artifacts. We also demonstrate that NeRF textures naturally facilitate continuous level-of-detail rendering. Our approach unites the versatility and modeling power of neural networks with the artistic control needed for precise modeling of virtual scenes. While all our training data is currently synthetic, our work provides a recipe that can be further extended to extract complex, hard-to-model appearances from real images.
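In shorthand (ours, not the paper's exact notation), the conditioning is the key extension over a plain reflectance field: a single network

$$ F_\theta : (\mathbf{x}, \boldsymbol{\omega}, \mathbf{p}) \mapsto (\sigma, \mathbf{c}) $$

maps position $\mathbf{x}$, lighting/viewing directions $\boldsymbol{\omega}$, and user-defined appearance parameters $\mathbf{p}$ to density and color, so one NeRF texture spans an entire family of reflectance fields rather than a single structure.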
Paper | Project Page
Advisor: Fabian Friederichs
Useful Resources
Example of a good presentation (video on the website under the Presentation section; note how little text is needed and how much is visualized to create an intuitive understanding).
General writing tips for scientific papers (mainly intended for writing scientific articles, but also useful for summaries).