United States. Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), in collaboration with other institutions, devised a system to better ensure privacy in video images from surveillance cameras. Called "Privid," the system allows analysts to submit queries over video data and adds a small amount of noise (extra data) to the final result so that individual people can't be identified. The system is based on a formal definition of privacy, "differential privacy," which allows access to aggregate statistics about private data without revealing personally identifiable information.
Normally, analysts would have access to the entire video and could do whatever they wanted with it, but Privid makes sure the video isn't a free buffet. MIT explains that "honest analysts can gain access to the information they need, but that access is restrictive enough that malicious analysts can't do too much with it." To enable this, instead of running the code over the entire video in one go, Privid splits the video into small chunks and runs the processing code on each chunk. Rather than returning the result from each chunk, Privid aggregates the per-chunk results and adds noise to that aggregate. (The analyst also receives information about the error bound on the result, perhaps a 2 percent margin of error, given the noise that was added.)
For example, the code could output the number of people observed in each video chunk, and the aggregation could be the "sum," to count the total number of people wearing face coverings, or the "average," to estimate crowd density.
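The chunk, process, aggregate, add-noise pipeline described in the two paragraphs above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Privid's code: it assumes the noise is drawn from a Laplace distribution (the textbook differential-privacy mechanism), that each chunk's output is clamped to a fixed range [lo, hi] so a single chunk can only move the total so much, and that the chunk count is public. The names private_aggregate, count_people, and error_bound are hypothetical. The error bound follows from the Laplace tail probability P(|X| > t) = exp(-t / b).

```python
import math
import random
from typing import Any, Callable, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_aggregate(
    frames: Sequence[Any],
    chunk_len: int,
    process_chunk: Callable[[Sequence[Any]], float],
    agg: str,
    lo: float,
    hi: float,
    noise_scale: float,
    rng: random.Random,
) -> float:
    """Hypothetical sketch: run the analyst's code on each chunk, clamp each
    output to [lo, hi], sum, add Laplace noise, optionally convert to an average."""
    chunks = [frames[i:i + chunk_len] for i in range(0, len(frames), chunk_len)]
    outputs = [min(max(process_chunk(chunk), lo), hi) for chunk in chunks]
    # Only this noisy total (or a value derived from it) is returned to the analyst.
    noisy_total = sum(outputs) + laplace_noise(noise_scale, rng)
    if agg == "sum":
        return noisy_total
    if agg == "average":
        # Assumption: the chunk count depends only on video length and is public.
        return noisy_total / len(chunks)
    raise ValueError(f"unsupported aggregation: {agg!r}")


def error_bound(noise_scale: float, confidence: float = 0.95) -> float:
    """Half-width t with P(|noise| <= t) = confidence for Laplace(0, noise_scale)."""
    return noise_scale * math.log(1 / (1 - confidence))


# Toy "video": each frame is the set of (hypothetical) person IDs in view.
video = [{1}, {1, 2}, set(), {3}, {3}, set(), {4, 5}, {5}]


def count_people(chunk: Sequence[set]) -> float:
    """Hypothetical analyst module: number of distinct people seen in one chunk."""
    return len(set().union(*chunk))


rng = random.Random(0)
for agg in ("sum", "average"):
    result = private_aggregate(video, 2, count_people, agg, 0, 10, 2.0, rng)
    print(f"{agg}: {result:.2f}")
print(f"95% error bound on the sum: +/-{error_bound(2.0):.2f}")
```

In this toy video the true per-chunk sum is 6 and the true average is 1.5; each run prints noisy versions of those values. Because the average divides the noisy sum by the number of chunks, its noise shrinks by the same factor.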
Privid allows analysts to use their own deep neural networks, which are common in video analytics today. This gives analysts the flexibility to ask questions that Privid's designers didn't anticipate. Across a variety of videos and queries, Privid was accurate to within 79 to 99 percent of a non-private system.
"We're at a stage right now where cameras are practically ubiquitous. If there's a camera on every street corner, every place you go, and if someone could process all those videos together, you can imagine that entity building a very accurate timeline of when and where a person has gone," says MIT CSAIL PhD student Frank Cangialosi, lead author of a paper on Privid. "People are already concerned about GPS location privacy; video data, taken together, could capture not only their location history, but also moods, behaviors, and more at each location."
Privid introduces a new notion of "duration-based privacy," which decouples the definition of privacy from its enforcement. With obfuscation, if your privacy goal is to protect all people, the enforcement mechanism must do some work to find the people to protect, which it may or may not do perfectly. With Privid's mechanism, you don't need to specify everything completely, and you're not hiding more information than necessary.
Let's say we have a video overlooking a street. Two analysts, Alice and Bob, both say they want to count the number of people who pass by each hour, so they each submit a video processing module and request a sum aggregation.
The first analyst, Alice, works for the city planning department, which hopes to use this information to understand foot traffic patterns and plan city sidewalks. Her model counts people and outputs this count for each video chunk.
The second analyst, Bob, is malicious. He hopes to identify every time "Charlie" passes in front of the camera. His model only looks for Charlie's face and outputs a large number if Charlie is present (i.e., the "signal" he is trying to extract), or zero otherwise. His hope is that the sum will be non-zero if Charlie was present.
From Privid's perspective, these two queries look identical. It's hard to reliably determine what the analysts' models might be doing internally, or what the analysts intend to use the data for. This is where the noise comes in. Privid executes both queries and adds the same amount of noise to each. In the first case, because Alice was counting all the people, this noise will have only a small impact on the result and likely won't affect its usefulness.
In the second case, since Bob was looking for a specific signal (Charlie was only visible for a few chunks), the noise is enough to prevent him from knowing whether Charlie was there or not. If he sees a non-zero result, it could be because Charlie was actually there, or because the model output "zero" and the noise made it non-zero. Privid didn't need to know anything about when or where Charlie appeared; the system only needed to know a rough upper bound on how long Charlie might appear, which is easier to specify than the exact locations that prior methods rely on.
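To see why the same noise barely affects Alice but blinds Bob, here is a small calculation with made-up numbers, again assuming Laplace noise rather than quoting Privid's actual parameters. Everything below is hypothetical: a privacy budget EPSILON of 1, per-chunk outputs capped at 50, Charlie visible in at most 2 chunks, and a true total of 2,400 for Alice's honest count. The code computes Alice's expected relative error and the probability that Bob's simple threshold test answers "Charlie was there" in each case.

```python
import math


def laplace_sf(x: float, scale: float) -> float:
    """P(X > x) for X ~ Laplace(0, scale)."""
    if x >= 0:
        return 0.5 * math.exp(-x / scale)
    return 1 - 0.5 * math.exp(x / scale)


EPSILON = 1.0                         # hypothetical privacy budget
SENSITIVITY = 2 * 50                  # Charlie in at most 2 chunks, each capped at 50
NOISE_SCALE = SENSITIVITY / EPSILON   # the same noise is added to both queries

# Alice counts everyone, so her true total is large relative to the noise.
alice_true_total = 2400
alice_expected_rel_error = NOISE_SCALE / alice_true_total  # E|Laplace(0, b)| = b

# Bob's module outputs 50 for each chunk containing Charlie's face, 0 otherwise.
bob_total_present, bob_total_absent = 2 * 50, 0
threshold = 50  # Bob answers "Charlie was there" if the noisy sum exceeds this

# P(true_total + noise > threshold) = P(noise > threshold - true_total)
p_yes_if_present = laplace_sf(threshold - bob_total_present, NOISE_SCALE)
p_yes_if_absent = laplace_sf(threshold - bob_total_absent, NOISE_SCALE)

print(f"Alice: expected relative error {alice_expected_rel_error:.1%}")
print(f"Bob: P(yes | present) = {p_yes_if_present:.2f}, "
      f"P(yes | absent) = {p_yes_if_absent:.2f}")
```

With these numbers Alice's answer is off by only about 4 percent on average, while Bob's test fires roughly 70 percent of the time when Charlie is present and roughly 30 percent of the time when he is absent, so no single result tells Bob which case he is in. A smaller EPSILON adds more noise and pushes both probabilities toward 50 percent.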
The challenge is determining how much noise to add: Privid wants to add just enough to hide everyone, but not so much that the results become useless to analysts. Adding noise to the data and requiring queries to run over time windows means the result won't be as accurate as it could be, but it remains useful while providing better privacy.
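The article does not give Privid's formula for how much noise to add, but the duration bound from the Charlie example suggests one way to calibrate it. In this sketch, which assumes the Laplace mechanism and uses hypothetical names and parameter values, a person visible for at most rho seconds can appear in at most ceil(rho / c) + 1 chunks of length c. Removing that person therefore changes the clamped sum by at most that many chunks times the per-chunk range (hi - lo), and dividing this sensitivity by a privacy budget epsilon gives the noise scale.

```python
import math


def max_chunks_touched(rho_seconds: float, chunk_seconds: float) -> int:
    """Upper bound on the chunks a person visible for at most rho seconds can
    appear in: ceil(rho / c), plus one because an appearance that starts partway
    through a chunk can spill into one extra chunk."""
    return math.ceil(rho_seconds / chunk_seconds) + 1


def calibrate_noise_scale(rho_seconds: float, chunk_seconds: float,
                          lo: float, hi: float, epsilon: float) -> float:
    """Hypothetical calibration: Laplace scale b = sensitivity / epsilon, where the
    sensitivity is the most one person can shift the sum of clamped chunk outputs."""
    sensitivity = max_chunks_touched(rho_seconds, chunk_seconds) * (hi - lo)
    return sensitivity / epsilon


# Hypothetical policy: nobody is visible for more than 60 s, chunks are 30 s long,
# and each chunk reports a count clamped to [0, 5].
for eps in (0.5, 1.0, 2.0):
    b = calibrate_noise_scale(rho_seconds=60, chunk_seconds=30, lo=0, hi=5, epsilon=eps)
    # For Laplace(0, b) noise, the expected absolute error equals b.
    print(f"epsilon={eps}: noise scale (expected absolute error) = {b:.1f}")
```

Halving epsilon doubles the noise, which is the privacy-versus-accuracy trade-off described above; a longer duration bound or a wider per-chunk range raises the noise in the same way.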
Source: MIT.


