Technical University of Munich Researchers Develop AI Robot That Uses Language Models and 3D Vision to Locate Misplaced Objects

TUM researcher Benjamin Bogenberger combines three-dimensional vision with language models.

(IN BRIEF) Researchers at the Technical University of Munich have developed a robot capable of locating misplaced objects by combining three-dimensional vision with language models that encode contextual knowledge about everyday environments. The robot constructs an accurate spatial map of its surroundings and uses information from language models to determine where an item is most likely to be found, allowing it to search rooms nearly 30 percent more efficiently than random scanning. By assigning probability scores to different locations, the robot can prioritize likely search areas and identify changes in its environment with around 95 percent accuracy. The technology demonstrates how visual perception and language-based reasoning can be integrated to improve robotic autonomy. Future developments will focus on enabling robots to physically interact with their surroundings, allowing them to search inside cupboards and drawers. The research contributes to broader efforts at TUM to develop intelligent robots capable of operating in complex real-world environments.

(PRESS RELEASE) MUNICH, 12-Mar-2026 — /EuropaWire/ — Technical University of Munich (TUM) researchers have developed a robot capable of locating misplaced objects by combining three-dimensional vision with knowledge derived from language models. The system integrates information gathered from its surroundings with contextual knowledge to determine where an object is most likely to be found, enabling it to search more efficiently.

The robot, created at Prof. Angela Schoellig’s Learning Systems and Robotics Lab, is designed as a mobile platform with a camera mounted on top. While simple in appearance, the machine represents a significant step forward in robotics because it combines visual perception with reasoning capabilities that help it understand how objects relate to everyday environments.

The search robot has a 3D camera on board to find lost items such as glasses.

To locate an item such as a misplaced pair of glasses, the robot first scans its surroundings and generates a detailed spatial map of the room. Although the camera initially captures two-dimensional images, each pixel contains depth information that allows the system to reconstruct a precise three-dimensional representation of the environment. This spatial model is updated continuously, providing centimeter-level accuracy. A connected laptop processes the images to determine which objects are visible and how they relate to human activity.
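This reconstruction step can be understood through the standard pinhole camera model: each depth pixel is back-projected into a 3D point using the camera's focal lengths and principal point. The short Python sketch below illustrates the general idea with generic intrinsics; the function name, parameter values and synthetic frame are illustrative assumptions and are not taken from the TUM system.

    import numpy as np

    def backproject_depth(depth_m, fx, fy, cx, cy):
        # Convert a depth image (in meters) into a 3D point cloud in camera coordinates.
        # fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
        h, w = depth_m.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_m
        x = (u - cx) * z / fx  # pinhole camera model
        y = (v - cy) * z / fy
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]  # drop pixels without a valid depth reading

    # Example: a synthetic 480x640 frame in which every pixel reads 2 m
    cloud = backproject_depth(np.full((480, 640), 2.0), fx=525.0, fy=525.0, cx=319.5, cy=239.5)
    print(cloud.shape)  # (307200, 3)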

According to Angela Schoellig, the project aims to equip robots with the ability to interpret their environments in a meaningful way. Developing this kind of contextual awareness is essential for machines that must operate autonomously in real-world settings where layouts and objects frequently change. The approach is particularly relevant for future applications such as humanoid robots working in industrial facilities or robotic assistants operating in domestic care environments.

A key element of the system is its ability to translate general knowledge from language models into instructions the robot can use. For instance, the robot understands that everyday items like glasses are commonly placed on surfaces such as tables or window sills, while locations such as stovetops or sinks are unlikely storage spots. The language model identifies relationships between objects and environments, and this information is converted into a format that the robot can incorporate into its spatial map.
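In the simplest terms, such a translation amounts to turning the language model's plausibility judgments for object-surface pairs into a normalized prior over candidate locations. The sketch below shows one possible scheme as an assumption-laden illustration: the prompt wording, the ask_language_model helper and the canned example scores are hypothetical stand-ins, not the method described in the paper.

    def object_location_prior(obj, surfaces, ask_language_model):
        # Turn language-model plausibility judgments into a normalized prior over surfaces.
        # ask_language_model is a hypothetical stand-in for whatever LLM interface is used;
        # it is assumed to return a score between 0 and 1 for each prompt.
        scores = {}
        for surface in surfaces:
            prompt = f"How likely is a {obj} to be placed on a {surface}? Answer 0 to 1."
            scores[surface] = float(ask_language_model(prompt))
        total = sum(scores.values()) or 1.0
        return {s: v / total for s, v in scores.items()}

    # Example with canned scores standing in for the model's answers
    canned = {"table": 0.8, "window sill": 0.6, "stovetop": 0.05, "sink": 0.05}
    prior = object_location_prior("pair of glasses", list(canned),
                                  lambda p: next(v for k, v in canned.items() if k in p))
    print(prior)  # most of the probability mass lands on 'table' and 'window sill'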

Within the map, the robot assigns numerical probabilities to different locations, continuously recalculating where the searched object is most likely to appear. As a result, the robot prioritizes high-probability areas rather than scanning the room randomly. Tests conducted by the research team show that this approach allows the robot to locate objects nearly 30 percent more efficiently than random search strategies. Artificial intelligence plays a dual role in this system: it powers both the visual recognition process and the contextual reasoning provided by the language model.
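Conceptually, the search loop keeps visiting the highest-probability region that has not yet been ruled out. The following minimal sketch captures that loop with made-up regions, probabilities and a placeholder detection result, rather than the team's actual map representation.

    def next_region(region_probs):
        # Pick the not-yet-checked region with the highest current probability.
        return max(region_probs, key=region_probs.get)

    probs = {"table": 0.5, "window sill": 0.3, "sofa": 0.15, "stovetop": 0.05}
    actually_on = "window sill"  # stand-in for what the vision module would report

    while probs:
        region = next_region(probs)
        print("checking", region)
        if region == actually_on:
            print("found the glasses on the", region)
            break
        probs.pop(region)  # rule the region out and move on to the next best guess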

Another important capability of the robot is its ability to remember previously captured images of its environment. By comparing past and current images, the system can detect changes within the room. If a new object suddenly appears in the scene, the robot identifies the difference with an accuracy of around 95 percent and marks the area as a highly probable location for the item being searched.
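One simple way to picture such change detection is a cell-by-cell comparison of the stored and current depth observations of the same viewpoint, flagging cells whose contents have shifted. The sketch below is an illustrative grid comparison under assumed cell size and threshold values; it is not the detection method used in the paper, and the roughly 95 percent accuracy figure refers to the team's own approach.

    import numpy as np

    def changed_cells(reference, current, cell=40, threshold=0.1):
        # Compare two depth images (meters) of the same viewpoint and return grid cells
        # whose mean absolute depth difference exceeds the threshold.
        h, w = reference.shape
        flagged = []
        for y in range(0, h - cell + 1, cell):
            for x in range(0, w - cell + 1, cell):
                diff = np.abs(current[y:y+cell, x:x+cell] - reference[y:y+cell, x:x+cell]).mean()
                if diff > threshold:
                    flagged.append((x, y, round(float(diff), 3)))
        return flagged  # candidate spots whose probability the search map could boost

    # Example: a newly placed object raises the surface by 15 cm in one part of the frame
    ref = np.full((480, 640), 2.0)
    cur = ref.copy()
    cur[120:200, 240:320] -= 0.15
    print(changed_cells(ref, cur))  # flags the four cells covering the changed area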

The research team is already planning the next stage of development. Future versions of the robot will be designed to search inside cupboards, drawers and other enclosed spaces. Achieving this will require the robot to interact physically with its surroundings rather than simply observing them.

To accomplish this, the robot will need robotic arms and hands capable of opening doors and drawers while understanding how different mechanisms work. It must determine how a cupboard opens, identify handles and apply the correct motion to manipulate them. These capabilities would enable robots to perform more complex search tasks in everyday environments where objects are often hidden from view.

The research behind the system is detailed in the scientific paper Where did I leave my glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments, authored by Benjamin Bogenberger, Oliver Harrison, Orrin Dahanaggamaarachchi, Lukas Brunke, Jingxing Qian, Siqi Zhou and Angela P. Schoellig. The study was published on March 3, 2026 in IEEE Robotics and Automation Letters.

The work is also connected to the Munich Institute of Robotics and Machine Intelligence (TUM MIRMI), an interdisciplinary research institute at TUM dedicated to advancing robotics and artificial intelligence. The institute brings together expertise from nearly 80 university chairs to develop innovative robotic and AI-driven technologies across areas including healthcare, environmental monitoring, mobility, work, and security.

Publications

Where did I leave my glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments; Benjamin Bogenberger, Oliver Harrison, Orrin Dahanaggamaarachchi, Lukas Brunke, Jingxing Qian, Siqi Zhou, Angela P. Schoellig; IEEE Robotics and Automation Letters, March 3, 2026; ieeexplore.ieee.org/document/11359697

Further information and links

  • Scientific video: https://utiasdsl.github.io/semi-static-semantic-exploration/
  • Prof. Angela Schoellig is a member of the board of the Munich Institute of Robotics and Machine Intelligence (TUM MIRMI). The institute is an integrative research institute at the Technical University of Munich (TUM) that focuses on robotics and AI. The institute brings together expertise in key areas of robotics, including perception and data science. Nearly 80 TUM chairs are networked within MIRMI to develop innovative robotic and AI-supported solutions for the environment, health, mobility, work, as well as security and defense. TUM MIRMI is headed by Prof. Lorenzo Masia. Further information can be found at www.mirmi.tum.de.

Media Contacts:

Corporate Communications Center
Andreas Schmitz
presse@tum.de

Contacts for this article:

Prof. Angela Schoellig
Chair of Safety, Performance and Reliability for Learning Systems
Technical University of Munich (TUM)
angela.schoellig@tum.de

SOURCE: Technical University of Munich
