Latin America. Video surveillance systems are beginning to incorporate vision-language models (VLMs) so that operators can search and interact with footage in everyday language, without technical filters or specialized training.
This fast-growing technology, based on artificial intelligence, lets users issue instructions in natural language and obtain precise results from images or recordings.
Vision-language models can interpret what the user is looking for, whether an object, an individual, or a particular situation, taking into account factors such as context, location, and time. The technology aims to reduce the complexity of operating conventional systems, which require users to manage multiple filters and predefined attributes.
One example was seen in a recent demonstration, in which a VLM-based video surveillance system carried out complex searches such as "The individual in a red suit who entered through the front door before nightfall", without requiring technical intervention.
"Imagine being able to ask a security system: 'Show me a blue Toyota car in the park yesterday', and doing it without complex filters, without endless menus, without prior training," reads one of the descriptions presented for this technology.
The developers indicate that the potential of VLMs is not limited to retrospective searches. They are also expected to support real-time alerts triggered by rules expressed in natural language, for example: "Let me know if someone comes into the warehouse wearing a helmet after 8 p.m."
Among the companies already working on this type of solution is VIVOTEK, a Taiwanese manufacturer with a presence in Latin America.


