Seminar Computer Vision WS'23/24
Seminar
Prof. Dr.-Ing. Martin Eisemann
Audience: Bachelor & Master
Contact: seminarcv@cg.cs.tu-bs.de
Module: INF-STD-66, INF-STD-68
Course nos.: 4216031, 4216032
Topic: Current research results in the field of visual computing
Latest News
There has been a change in the presentation schedule: talks are now on the 1st and 2nd of February instead of January 31st.
The talks on the 1st of February will be held in G30 (Seminar room, ICG).
The talks on the 2nd of February will be held in G41B (Hardstyle Lab, ICG).
Content
In this seminar we discuss current research results in computer vision, visual computing, and image/video processing. Participants write a research report, review another student's report in writing, and revise their own report based on the reviews they receive. In a block seminar at the end of the semester, each research report is presented in an oral presentation. The presentation must be rehearsed beforehand in front of another student, and their suggestions for improvement must be incorporated.
Participants
The course is aimed at bachelor's and master's students from the fields of computer science (Informatik), IST, business informatics (Wirtschaftsinformatik), and data science.
Registration takes place centrally via Stud.IP. The number of participants is initially limited to 8 students; the limit can be raised at the kickoff if necessary.
Important Dates
All dates listed here must be adhered to. Attendance at all events is mandatory.
- 13.07.2023 to 03.08.2023: Registration via Stud.IP
- 24.10.2023, 10:30: Kickoff Meeting (G30, ICG)
- 06.11.2023: End of the deregistration period
- 26.11.2023: Submission of first draft of the written paper (please use the following naming scheme: Lastname_Draft_SeminarCV.pdf)
- 06.12.2023: Submission of the review report (please use the following naming scheme: Lastname_Review_SeminarCV.pdf)
- 20.12.2023: Submission of the revised paper (please use the following naming scheme: Lastname_FinalReport_SeminarCV.pdf)
- Until 19.01.2024: Trial presentation (only between the tandem partners)
- 25.01.2024: Submission of the presentation slides
- 01.02.2024, 09:00 - 13:00, G30 (ICG): Presentations - Block Event Part 1 (Topics 1-6)
- 02.02.2024, 09:00 - 13:00, G41B (ICG): Presentations - Block Event Part 2 (Topics 7-12)
Registered students may deregister until two weeks after the start of lectures at the latest. For a successful deregistration, it is necessary to notify the seminar supervisor.
All submissions are made by email to seminarcv@cg.cs.tu-bs.de and to your advisor, and, where applicable, to your tandem partner. Unless otherwise communicated, submissions are due by 11:59 pm on the respective submission day.
If you have any questions about the event, please contact seminarcv@cg.cs.tu-bs.de.
Format
- The topics will be distributed amongst the participants during the Kickoff event.
- For each topic, a report is prepared in LaTeX using the institute template.
The content of the report is a short summary of the work in your own words and an elaboration of its main points, with a minimum length of 8 pages. The report should make clear that the topic has been understood and critically assessed.
- Each participant writes a 1-2 page review of another participant's written report. Particular attention should be paid to the comprehensibility and linguistic style of the summary.
- The topics will be presented in presentations of approximately 20 minutes, each followed by a discussion.
- For the on-site presentations, an institute laptop or your own laptop can be used. If an institute laptop is to be used, contact seminarcv@cg.cs.tu-bs.de in time, at least two weeks before the presentations. In this case, the presentation slides must be made available at least one week before the talk.
- The presentations will be given on site. If, for any reason, they take place online, Big Blue Button will be used as the platform. In this case, students need their own PC with a microphone. In addition, video transmission during your own talk is desirable. If these requirements cannot be met, contact seminarcv@cg.cs.tu-bs.de in time.
- The language for the presentations can be either German or English.
- The presentation, the written review, and the preparation of the report are mandatory requirements for passing the course.
Files and Templates
- Kickoff slides
- LaTeX template (mandatory usage). If you want to use Overleaf as your editor, you can also copy the following project: https://www.overleaf.com/read/jzdzwkfxkjsm (only readable until you copy the project)
- Slide template (optional usage)
- Review template (mandatory usage)
Topics
- Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting
(Bitterli, Benedikt and Wyman, Chris and Pharr, Matt and Shirley, Peter and Lefohn, Aaron and Jarosz, Wojciech) ACM Trans. Graph.
Efficiently rendering direct lighting from millions of dynamic light sources using Monte Carlo integration remains a challenging problem, even for off-line rendering systems. We introduce a new algorithm---ReSTIR---that renders such lighting interactively, at high quality, and without needing to maintain complex data structures. We repeatedly resample a set of candidate light samples and apply further spatial and temporal resampling to leverage information from relevant nearby samples. We derive an unbiased Monte Carlo estimator for this approach, and show that it achieves equal-error 6×-60× faster than state-of-the-art methods. A biased estimator reduces noise further and is 35×-65× faster, at the cost of some energy loss. We implemented our approach on the GPU, rendering complex scenes containing up to 3.4 million dynamic, emissive triangles in under 50 ms per frame while tracing at most 8 rays per pixel.
https://research.nvidia.com/sites/default/files/pubs/2020-07_Spatiotemporal-reservoir-resampling/ReSTIR.pdf
Advisor: Fabian Friederichs
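For orientation on the topic above: the core mechanism of ReSTIR is weighted reservoir sampling used for resampled importance sampling (RIS). The Python sketch below illustrates a single reservoir on a toy 1D problem; the class layout, the toy target function, and all names are illustrative choices, not the paper's implementation.

```python
import random

class Reservoir:
    def __init__(self):
        self.y = None        # currently selected sample
        self.w_sum = 0.0     # running sum of resampling weights
        self.M = 0           # number of candidates seen

    def update(self, x, w):
        self.w_sum += w
        self.M += 1
        # keep candidate x with probability w / w_sum
        if random.random() < w / self.w_sum:
            self.y = x

def resample(candidates, p_hat, p_source):
    """Pick one sample distributed approximately proportional to p_hat."""
    r = Reservoir()
    for x in candidates:
        r.update(x, p_hat(x) / p_source(x))   # RIS weight of candidate x
    W = r.w_sum / (r.M * p_hat(r.y))          # unbiased contribution weight
    return r.y, W

# toy usage: uniform source distribution on [0,1), quadratic target
xs = [random.random() for _ in range(32)]
y, W = resample(xs, p_hat=lambda x: x * x + 1e-6, p_source=lambda x: 1.0)
```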
- ReSTIR GI: Path resampling for real-time path tracing
(Ouyang, Yaobin and Liu, Shiqiu and Kettunen, Markus and Pharr, Matt and Pantaleoni, Jacopo) Computer Graphics Forum
Even with the advent of hardware-accelerated ray tracing in modern GPUs, only a small number of rays can be traced at each pixel in real-time applications. This presents a significant challenge for path tracing, even when augmented with state-of-the-art denoising algorithms. While the recently-developed ReSTIR algorithm [BWP∗20] enables high-quality renderings of scenes with millions of light sources using just a few shadow rays at each pixel, there remains a need for effective algorithms to sample indirect illumination.
This paper introduces an effective path sampling algorithm for indirect lighting that is suitable to highly parallel GPU architectures. Building on the screen-space spatio-temporal resampling principles of ReSTIR, this approach resamples multi-bounce indirect lighting paths obtained by path tracing.
https://research.nvidia.com/publication/2021-06_restir-gi-path-resampling-real-time-path-tracing
Advisor: Fabian Friederichs
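The screen-space reuse the paper above builds on can be pictured as merging one pixel's reservoir into another's. Below is a hedged sketch following the Reservoir fields from the previous topic's example; the visibility checks and bias handling of the real algorithm are omitted, and all names are illustrative.

```python
import random

def merge(res, neigh, W_neigh, p_hat):
    """Fold reservoir `neigh` (with contribution weight W_neigh) into `res`."""
    # treat the neighbour's chosen sample as one candidate whose weight
    # accounts for all neigh.M candidates that reservoir has already seen
    w = p_hat(neigh.y) * W_neigh * neigh.M
    res.w_sum += w
    if random.random() < w / res.w_sum:
        res.y = neigh.y
    res.M += neigh.M
```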
- Learning multiple-scattering solutions for sphere-tracing of volumetric subsurface effects
(Leonard, Ludwig and Hoehlein, Kevin and Westermann, Ruediger) Computer Graphics Forum
Accurate subsurface scattering solutions require the integration of optical material properties along many complicated light paths. We present a method that learns a simple geometric approximation of random paths in a homogeneous volume of translucent material. The generated representation allows determining the absorption along the path as well as a direct lighting contribution, which is representative of all scattering events along the path. A sequence of conditional variational auto-encoders (CVAEs) is trained to model the statistical distribution of the photon paths inside a spherical region in presence of multiple scattering events.
https://ui.adsabs.harvard.edu/abs/2020arXiv201103082L/abstract
Advisor: Fabian Friederichs
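As a rough illustration of the idea above: rather than simulating each scattering event, a trained decoder maps a latent draw plus material parameters to a summary of the whole photon path. `decoder`, its input/output layout, and the stub below are hypothetical stand-ins, not the paper's network.

```python
import numpy as np

def sample_exit(decoder, albedo, sigma_t, g, rng):
    """One multiple-scattering sample: latent + material -> exit summary."""
    cond = np.array([albedo, sigma_t, g])   # material conditioning
    z = rng.standard_normal(4)              # latent drawn from the CVAE prior
    exit_dir, absorption = decoder(z, cond)
    return exit_dir / np.linalg.norm(exit_dir), absorption

# stub standing in for the trained decoder network
stub = lambda z, cond: (np.array([0.0, 0.0, 1.0]) + 0.1 * z[:3], float(abs(z[3])))
d, a = sample_exit(stub, albedo=0.9, sigma_t=50.0, g=0.4, rng=np.random.default_rng(0))
```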
- Non-Line-of-Sight Reconstruction Using Efficient Transient Rendering
(Iseringhausen, Julian and Hullin, Matthias B.) ACM Trans. Graph.
Being able to see beyond the direct line of sight is an intriguing prospect and could benefit a wide variety of important applications. Recent work has demonstrated that time-resolved measurements of indirect diffuse light contain valuable information for reconstructing shape and reflectance properties of objects located around a corner. This paper introduces a novel reconstruction scheme that, by design, produces solutions that are consistent with state-of-the-art physically-based rendering. The method combines an efficient forward model (a custom renderer for time-resolved three-bounce indirect light transport) with an optimization framework to reconstruct object geometry in an analysis-by-synthesis sense.
https://light.informatik.uni-bonn.de/non-line-of-sight-reconstruction-using-efficient-transient-rendering/
Advisor: Fabian Friederichs
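The analysis-by-synthesis principle above can be summarized as: adjust the hidden geometry until a forward renderer reproduces the measured transient. The sketch below only conveys that loop, using a generic finite-difference descent and placeholder functions; the paper's custom transient renderer and optimization framework are far more elaborate.

```python
import numpy as np

def reconstruct(measured, render_transient, params, steps=100, lr=1e-2):
    """Fit geometry parameters so the rendered transient matches `measured`."""
    eps = 1e-3
    for _ in range(steps):
        base = np.sum((render_transient(params) - measured) ** 2)
        grad = np.zeros_like(params)
        for i in range(len(params)):          # finite-difference gradient
            p = params.copy()
            p[i] += eps
            grad[i] = (np.sum((render_transient(p) - measured) ** 2) - base) / eps
        params = params - lr * grad           # descend on the transient residual
    return params

# toy usage: recover one parameter from a two-entry "transient"
measured = np.array([2.0, 4.0])
render = lambda p: np.array([p[0], p[0] ** 2])
print(reconstruct(measured, render, params=np.array([0.5])))   # close to [2.0]
```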
- Low-Cost SPAD Sensing for Non-Line-Of-Sight Tracking, Material Classification and Depth Imaging
(Callenberg, Clara and Shi, Zheng and Heide, Felix and Hullin, Matthias B.) ACM Trans. Graph.
Time-correlated imaging, facilitated by Single-Photon Avalanche Diodes (SPADs), shows promise in lidar ranging, fluorescence lifetime imaging, and non-line-of-sight sensing, yet its high cost has hindered mass market adoption. Cheaper SPADs used in mobile devices as proximity sensors offer a more cost-effective solution, albeit with lower data quality. Through modifying an existing evaluation platform for these affordable SPADs, the paper presents a developed hardware/software system that enables applications like direct time-of-flight (ToF) depth imaging, non-line-of-sight object tracking, and material classification, previously limited to more expensive setups.
https://light.informatik.uni-bonn.de/non-line-of-sight-reconstruction-using-efficient-transient-rendering/
Advisor: Fabian Friederichs
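As background for the direct time-of-flight application above: a SPAD histogram records photon arrival times, the peak bin gives the round-trip time, and depth = c * t / 2. The bin width and histogram below are made-up example values, not the sensor's actual specification.

```python
import numpy as np

C = 299_792_458.0                   # speed of light in m/s
BIN_WIDTH = 100e-12                 # 100 ps per histogram bin (assumed)

def depth_from_histogram(hist):
    t = np.argmax(hist) * BIN_WIDTH     # round-trip time of the return peak
    return C * t / 2.0                  # halve it: light travels out and back

hist = np.zeros(64)
hist[33] = 120                          # synthetic return peak in bin 33
print(depth_from_histogram(hist))       # ~0.49 m
```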
- PanoFormer: Panorama Transformer for Indoor 360° Depth Estimation
(Shen et al.) ECCV
Existing panoramic depth estimation methods based on convolutional neural networks (CNNs) focus on removing panoramic distortions, failing to perceive panoramic structures efficiently due to the fixed receptive field in CNNs. This paper proposes the panorama transformer (named PanoFormer) to estimate depth in panoramic images, with tangent patches from the spherical domain, learnable token flows, and panorama-specific metrics.
https://dl.acm.org/doi/abs/10.1007/978-3-031-19769-7_12
Advisor: Jannis Malte Möller
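The tangent patches mentioned above can be understood via the inverse gnomonic projection, which maps a flat patch around a view direction back to spherical coordinates. The sketch below is a generic construction with an assumed patch size and field of view, not the paper's exact sampling scheme.

```python
import numpy as np

def tangent_patch_grid(lat0, lon0, size=7, fov=np.radians(10)):
    """Spherical (lat, lon) coordinates of a size x size tangent patch."""
    r = np.tan(fov / 2)                        # patch extent on the tangent plane
    x, y = np.meshgrid(np.linspace(-r, r, size), np.linspace(-r, r, size))
    rho = np.hypot(x, y)
    c = np.arctan(rho)                         # angular distance from the centre
    with np.errstate(invalid="ignore", divide="ignore"):   # centre pixel is 0/0
        lat = np.arcsin(np.cos(c) * np.sin(lat0)
                        + np.where(rho > 0, y * np.sin(c) * np.cos(lat0) / rho, 0.0))
    lon = lon0 + np.arctan2(x * np.sin(c),
                            rho * np.cos(lat0) * np.cos(c) - y * np.sin(lat0) * np.sin(c))
    return lat, lon   # pixel lookup: u = (lon/(2*pi)+0.5)*W, v = (0.5-lat/pi)*H

lat, lon = tangent_patch_grid(lat0=0.3, lon0=1.2)   # 7x7 grid of directions
```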
- PanelNet: Understanding 360 Indoor Environment via Panel Representation
(Yu, Haozheng and He, Lu and Jian, Bing and Feng, Weiwei and Liu, Shan) CVPR
Leveraging the continuity and gravity of indoor 360 panoramas, PanelNet uses a novel panel representation of 360 images to understand indoor environments. The approach employs a panel geometry embedding network to mitigate panoramic distortion and a Local2Global Transformer to aggregate local and global context. The method excels in indoor 360 depth estimation, layout estimation, and semantic segmentation.
https://openaccess.thecvf.com/content/CVPR2023/html/Yu_PanelNet_Understanding_360_Indoor_Environment_via_Panel_Representation_CVPR_2023_paper.html
Advisor: Jannis Malte Möller
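A panel representation, as used above, can be pictured as cutting the equirectangular panorama into overlapping vertical slices that wrap around the full 360°. The sketch below shows only this slicing with an assumed panel width and stride; the paper's geometry embedding and Local2Global Transformer are not reproduced.

```python
import numpy as np

def to_panels(pano, panel_w=256, stride=128):
    """Cut an equirectangular panorama into overlapping vertical panels."""
    h, w, c = pano.shape
    wrapped = np.concatenate([pano, pano[:, :panel_w]], axis=1)  # 360° wrap-around
    return np.stack([wrapped[:, s:s + panel_w] for s in range(0, w, stride)])

panels = to_panels(np.zeros((512, 1024, 3)))   # -> shape (8, 512, 256, 3)
```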
- Decomposing NeRF for Editing via Feature Field Distillation
(Kobayashi et al.) NeurIPS
By distilling knowledge from 2D image feature extractors into a 3D feature field, distilled feature fields (DFFs) allow user-specified, query-based local editing of 3D scenes. This process successfully bridges 2D vision and language models to 3D scene representations, enabling effective segmentation and selective editing of NeRFs.
https://pfnet-research.github.io/distilled-feature-fields/
Advisor: Jannis Malte Möller
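Query-based selection on a distilled feature field, as described above, boils down to scoring each 3D sample's feature against a query embedding (e.g. from a language model). The sketch below uses cosine similarity with an assumed threshold; all tensors and the threshold value are placeholders.

```python
import numpy as np

def select(features, query, tau=0.7):
    """Boolean edit mask: cosine similarity of each feature to the query."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    q = query / np.linalg.norm(query)
    return f @ q > tau                 # True where the 3D point matches the query

feats = np.random.randn(1000, 512)     # features predicted by the feature field
mask = select(feats, np.random.randn(512))
```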
- High-Resolution Image Synthesis With Latent Diffusion Models
(Rombach et al.) CVPR
Diffusion models, effective in image synthesis, are resource-intensive due to their operation in pixel space. Applying these models in the latent space of pretrained autoencoders reduces complexity while preserving detail and boosting visual fidelity. Introducing cross-attention layers turns the models into flexible generators for general conditioning inputs and facilitates high-resolution synthesis. The resulting Latent Diffusion Models (LDMs) improve performance on various tasks and significantly reduce computational requirements compared to pixel-based models.
https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent_Diffusion_Models_CVPR_2022_paper.html
Advisor: Jannis Malte Möller
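The key move in latent diffusion is running the standard forward noising process q(z_t | z_0) on autoencoder latents rather than on pixels. The sketch below shows that forward step; the linear beta schedule and the latent shape are common assumptions for illustration, not necessarily the paper's settings.

```python
import numpy as np

def noisy_latent(z0, t, betas, rng):
    """Forward diffusion step q(z_t | z_0) applied to an image latent z0."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])   # cumulative signal fraction kept
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * eps, eps

betas = np.linspace(1e-4, 0.02, 1000)           # linear schedule (assumed)
z0 = np.random.default_rng(0).standard_normal((4, 32, 32))  # "encoded" latent
zt, eps = noisy_latent(z0, t=500, betas=betas, rng=np.random.default_rng(1))
```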
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
(Nichol, Alex and Dhariwal, Prafulla and Ramesh, Aditya and Shyam, Pranav and Mishkin, Pamela and McGrew, Bob and Sutskever, Ilya and Chen, Mark) PMLR
We investigate text-conditional image synthesis with diffusion models, comparing CLIP guidance and classifier-free guidance. The latter yields superior results in photorealism and caption similarity. Our large model even surpasses DALL-E, particularly when paired with classifier-free guidance. Furthermore, these models can be fine-tuned for image inpainting, enabling impressive text-driven image editing.
https://proceedings.mlr.press/v162/nichol22a.html
Advisor: Jannis Malte Möller
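Classifier-free guidance, which the paper above finds superior, extrapolates from the unconditional to the caption-conditional noise prediction: eps = eps_uncond + s * (eps_cond - eps_uncond). The sketch below assumes a placeholder denoiser interface; the guidance scale is an example value.

```python
def guided_eps(model, z_t, t, caption, scale=3.0):
    """Classifier-free guidance: extrapolate conditional vs. unconditional."""
    eps_uncond = model(z_t, t, None)       # denoiser prediction without text
    eps_cond = model(z_t, t, caption)      # prediction conditioned on the caption
    return eps_uncond + scale * (eps_cond - eps_uncond)
```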
- Fast and Accurate Illumination Estimation Using LDR Panoramic Images for Realistic Rendering
(Cheng et al.) TVCG
High dynamic range (HDR) images are commonly used for generating high-quality realistic rendering effects. Compared to the high-cost HDR imaging technique, low dynamic range (LDR) imaging provides a low-cost alternative and is preferable for interactive graphics applications. However, the limited LDR pixel bit depth significantly hampers accurate illumination estimation from LDR images. The conflict between the realism and promptness of illumination estimation for realistic rendering is yet to be resolved. In this paper, an efficient method is proposed that accurately infers the illumination of real-world scenes from LDR panoramic images. It estimates multiple lighting parameters, including the locations, types, and intensities of light sources.
https://ieeexplore.ieee.org/abstract/document/9887904
Advisor: Steve Grogorick
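To build intuition for the topic above: bright pixels in an equirectangular panorama map directly to directions on the sphere. The toy sketch below only thresholds and reports directions and summed intensity; the paper estimates far richer per-source parameters (location, type, intensity), so this is not its method.

```python
import numpy as np

def bright_directions(pano_gray, thresh=0.95):
    """Directions (lat, lon) of saturated pixels in an equirectangular image."""
    h, w = pano_gray.shape
    v, u = np.nonzero(pano_gray >= thresh)
    lat = (0.5 - v / h) * np.pi          # image row    -> latitude
    lon = (u / w - 0.5) * 2.0 * np.pi    # image column -> longitude
    return lat, lon, pano_gray[v, u].sum()
```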
- Luminance Attentive Networks for HDR Image and Panorama Reconstruction
(Yu et al.) CGF
It is very challenging to reconstruct a high dynamic range (HDR) image from a low dynamic range (LDR) image, as the problem is ill-posed. This paper proposes a luminance attentive network named LANet for HDR reconstruction from a single LDR image. We propose a novel normalization method called “HDR calibration” for HDR images stored in relative luminance, calibrating HDR images to a similar luminance scale according to the LDR images, while paying particular attention to the under-/over-exposed areas. In addition, we propose an extended network called panoLANet for HDR panorama reconstruction from an LDR panorama.
https://onlinelibrary.wiley.com/doi/full/10.1111/cgf.14412
Advisor: Steve Grogorick
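The “HDR calibration” quoted above rescales HDR images stored in relative luminance to match the scale of the LDR counterpart. The sketch below is a guess at such a rescaling using well-exposed pixels and medians; the masking rule and the choice of statistic are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def calibrate(hdr_lum, ldr_lum, lo=0.05, hi=0.95):
    """Rescale relative HDR luminance to the LDR image's luminance scale."""
    mask = (ldr_lum > lo) & (ldr_lum < hi)   # drop under-/over-exposed pixels
    scale = np.median(ldr_lum[mask]) / np.median(hdr_lum[mask])
    return hdr_lum * scale
```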
Useful Resources
Example of a good presentation (video on the website under the Presentation section; note how little text is needed and how much is visualized to create an intuitive understanding).
General writing tips for scientific papers (mainly intended for writing scientific articles, but also useful for the summaries).