The work still has a long way to go, but it can already identify Kirby.
Researchers at the University of Maryland have turned eye reflections into (somewhat discernible) 3D scenes. The work builds on Neural Radiance Fields (NeRF), an AI technology that can reconstruct environments from 2D photos. Although the eye-reflection approach has a long way to go before it spawns any practical applications, the study (first reported by Tech Xplore) provides a fascinating glimpse into a technology that could eventually reveal an environment from a series of simple portrait photos.
The team used subtle reflections of light captured in human eyes (using consecutive images shot from a single sensor) to try to discern the person’s immediate environment. They began with several high-resolution images from a fixed camera position, capturing a moving individual looking toward the camera. They then zoomed in on the reflections, isolating them and calculating where the eyes were looking in the photos.
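To see why the eye's position and orientation matter, it helps to picture the cornea as a tiny curved mirror: a ray from the camera bounces off its surface and heads out into the room. The sketch below is only an illustration of that mirror geometry, not the researchers' code. It assumes a perfectly spherical cornea, and the function name `reflect_off_cornea` and its parameters are hypothetical.

```python
import numpy as np

def reflect_off_cornea(ray_dir, hit_point, cornea_center):
    """Mirror-reflect a camera ray off a cornea modeled as a sphere.

    Assumption (illustrative only): the cornea is a perfect sphere, so the
    surface normal at hit_point points from the sphere center to hit_point.
    """
    d = np.asarray(ray_dir, dtype=float)
    d = d / np.linalg.norm(d)  # unit direction of the incoming ray
    n = np.asarray(hit_point, dtype=float) - np.asarray(cornea_center, dtype=float)
    n = n / np.linalg.norm(n)  # unit outward surface normal
    # Standard mirror reflection: r = d - 2 (d . n) n
    return d - 2.0 * np.dot(d, n) * n

# A ray travelling along -z hits the sphere off-center and is deflected sideways.
s = np.sqrt(0.5)
print(reflect_off_cornea([0.0, 0.0, -1.0], [s, 0.0, s], [0.0, 0.0, 0.0]))
```

Because the reflected direction depends on where the eye sits and how it is turned, small errors in the estimated eye pose send these rays to the wrong part of the room.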
The results show a reasonably discernible environmental reconstruction from human eyes in a controlled setting. A scene captured using a synthetic eye produced a more impressive, dreamlike result. However, an attempt to model eye reflections from Miley Cyrus and Lady Gaga music videos produced only vague blobs, which the researchers could only guess were an LED grid and a camera on a tripod, illustrating how far the tech is from real-world use.
The team overcame significant obstacles to reconstruct even crude and fuzzy scenes. For example, the cornea introduces “inherent noise” that makes it difficult to separate the reflected light from humans’ complex iris textures. To address that, they introduced cornea pose optimization (estimating the position and orientation of the cornea) and iris texture decomposition (extracting features unique to an individual’s iris) during training. Finally, radial texture regularization loss (a machine-learning technique that simulates smoother textures than the source material) helped further isolate and enhance the reflected scenery.
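As a rough illustration of the idea behind a radial regularizer, the sketch below penalizes variation along the radius of an iris texture sampled on a polar grid. It is not the researchers' implementation: the function name `radial_variation_penalty`, the polar layout (rows are radii, columns are angles), and the squared-difference form of the penalty are all assumptions made for illustration.

```python
import numpy as np

def radial_variation_penalty(polar_texture):
    """Mean squared change between neighboring radii of a polar texture.

    Assumed layout (hypothetical): polar_texture[r, a] is the texture value
    at radius index r and angle index a. A texture that stays the same along
    each spoke of the iris scores 0; one that changes along the radius
    scores higher.
    """
    tex = np.asarray(polar_texture, dtype=float)
    radial_diff = np.diff(tex, axis=0)  # differences between adjacent radii
    return float(np.mean(radial_diff ** 2))

angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
spoke_like = np.tile(np.sin(angles), (5, 1))       # varies by angle only
ring_like = np.arange(5.0)[:, None] * np.ones((1, 8))  # varies by radius only

print(radial_variation_penalty(spoke_like))
print(radial_variation_penalty(ring_like))
```

Adding a term like this to a training objective would nudge the learned iris texture toward a smooth, spoke-like pattern, leaving the sharper detail to be explained by the reflected scene.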
Despite the progress and clever workarounds, significant barriers remain. “Our current real-world results are from a ‘laboratory setup,’ such as a zoom-in capture of a person’s face, area lights to illuminate the scene, and deliberate person’s movement,” the authors wrote. “We believe more unconstrained settings remain challenging (e.g., video conferencing with natural head movement) due to lower sensor resolution, dynamic range, and motion blur.” Additionally, the team notes that its universal assumptions about iris texture may be too simplistic to apply broadly, especially since eyes typically rotate more widely than they do in this kind of controlled setting.
Still, the team sees its progress as a milestone that can spur future breakthroughs. “With this work, we hope to inspire future explorations that leverage unexpected, accidental visual signals to reveal information about the world around us, broadening the horizons of 3D scene reconstruction.” Although more mature versions of this work could spawn some creepy and unwanted privacy intrusions, at least you can rest easy knowing that today’s version can only vaguely make out a Kirby doll, even under ideal conditions.