Saturday, April 18, 2009

Object Tracking and Image Search

Object Tracking in the Presence of Occlusions via a Camera Network
Ercan, et al.


Distributed Image Search in Sensor Networks
Yan, et al.



The first paper that we discussed this past Thursday addressed the problem of tracking an object in the presence of multiple occluders (objects blocking the camera’s view of the desired object); the occluders could be moving or static. Compared to the other papers we have discussed so far, this one was heavy on the mathematics, mainly because it focused on explaining the model used to simulate the situation. Also, the only idea specific to sensor networks was that of reducing the entire image from a camera to a single scan line to minimize the amount of information transferred over the network. We concluded that this paper falls largely within the “Information Processing” realm of sensor networks.
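To make the bandwidth argument concrete, here is a minimal sketch of what such a scan-line reduction might look like, assuming simple background subtraction followed by a column-wise projection; the paper’s actual reduction method may differ.

```python
import numpy as np

def frame_to_scan_line(frame, background, threshold=30):
    """Reduce a grayscale frame to a single scan line.

    Hypothetical reduction for illustration: background-subtract,
    threshold, then collapse each column to one bit indicating
    whether any foreground pixel occupies it.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    foreground = diff > threshold        # H x W boolean mask
    return foreground.any(axis=0)        # W bits: is this column occupied?

# A 320x240 8-bit frame is ~77 kB; the resulting scan line is 320 bits
# (~40 bytes), a reduction of over three orders of magnitude.
```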

The authors state the many assumptions they made, and it was unclear to some of us what the system would or would not detect in special situations. However, we reached a consensus that the assumptions were valid given the constraints of the mathematical tools the authors used. One situation not brought up in the paper but addressed in the discussion was that of partial occlusions. Reducing the entire frame to a single scan line could be problematic in such cases, and some members of the discussion felt that this reduction simplified the problem too much.

One person brought up an interesting point, questioning whether it is even necessary to track a single object through many occluders. Is it not enough to know whether or not the object is in the room? This was under the presumption that the tracking system would be deployed for security in areas such as malls, airports, etc. A binary output would be much simpler to implement, since each camera only has to report whether or not the desired object is present in its field of view; a sketch of that aggregation appears below. It was also hypothesized by some that this work is a preliminary step toward placing cameras on mobile platforms, so that the system could track objects from vehicles, planes, etc.
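A minimal sketch of that binary alternative (the function and report format here are our own illustration, not something from the paper):

```python
def object_in_room(camera_reports):
    """camera_reports: one boolean per camera, True if that camera
    currently sees the target object. The room-level answer is
    simply the OR of the per-camera bits."""
    return any(camera_reports)

# Three cameras, only one of which sees the object:
assert object_in_room([False, True, False])
```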

The second paper we discussed dealt with the problem of distributed image search in a network where each sensor captures images. The user can send ad-hoc or continuous queries to a proxy, which obtains the best search results from the entire network. First we established the assumption that the SIFT algorithm does a good enough job of distilling an image into features, so that the main problem becomes capturing and querying images on a sensor network. Although it was stated in the paper, it was worthwhile to remind everyone that this system only works when all the objects being monitored are similar. For example, one cannot search for a book cover among images of bird species; in that case, the best-matching visterms would most likely be irrelevant to the search.
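As a rough illustration of the front end of this pipeline, here is a sketch of mapping an image to visterms. It assumes OpenCV’s SIFT implementation and a visual vocabulary built offline (e.g., by k-means over training descriptors); the paper’s exact feature pipeline and vocabulary construction may differ.

```python
import cv2
import numpy as np

def image_to_visterms(image_path, vocabulary):
    """Map an image to a set of visterm IDs.

    vocabulary: K x 128 array of cluster centers, assumed to be
    built offline, as is standard for visual-word approaches.
    """
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(img, None)
    if descriptors is None:
        return set()
    # Quantize each 128-D descriptor to its nearest vocabulary word.
    dists = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    return set(int(i) for i in dists.argmin(axis=1))
```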

After going through the process by which each mote stores images and visterms in an inverted index table (sketched below), we went on to discuss the evaluation section. One person brought up the fact that there is no end-to-end experiment measuring the amount of time it takes to actually pull images from the remote sensor nodes. It would have been nice to see such an experiment, since images can be fairly large (on the order of 1.6 kB) and transferring each image would likely take at least a dozen packets, given that 802.15.4 frames carry at most about 100 bytes of payload. There was some skepticism about the power usage value for the CC2420 radio shown in Table 1. One person made the case that the value of 214.0 mW was too high and that the actual value should be significantly smaller (the CC2420 datasheet puts receive current around 19 mA at 3 V, i.e., roughly 60 mW).
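For reference, here is a toy sketch of such an inverted index; the ranking-by-overlap query is an illustrative assumption, not the paper’s exact on-mote layout, which must contend with flash pages and tight memory.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy per-mote index mapping visterm -> image IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_image(self, image_id, visterms):
        for term in visterms:
            self.postings[term].add(image_id)

    def query(self, visterms):
        # Rank local images by how many query visterms they share.
        scores = defaultdict(int)
        for term in visterms:
            for image_id in self.postings.get(term, ()):
                scores[image_id] += 1
        return sorted(scores.items(), key=lambda kv: -kv[1])
```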

Overall, the paper was received very well, and we agreed that every major issue pertaining to distributed image search was addressed, from storing images on a constrained platform to searching for them. However, the image capture rate of one image every 30 seconds was viewed as unrealistic; a rate on the order of 10 Hz would have been a better target. Also, after doing some quick calculations, we found that if the motes had been battery-powered, their lifetime would be on the order of 1.5 hours, which is unacceptable in a widespread deployment. This led us to conclude that the authors conducted the experiments with motes plugged into AC power rather than running on batteries.

This brought up the larger question: are motes a sustainable platform on which to implement distributed image search? We agreed that the authors did a nice job of heavily optimizing their solution to run on motes, but if that still results in a 1.5-hour battery lifetime, then perhaps the technology is just not there yet to handle such problems. It was also suggested that the gap between what motes can support and what they are being asked to do is simply too wide for distributed image search.
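The back-of-envelope lifetime estimate is just battery energy divided by average draw; the specific numbers below are illustrative assumptions on our part (not figures from the paper), chosen to show how an estimate around 1.5 hours can arise.

```python
def lifetime_hours(battery_wh, avg_draw_w):
    """Back-of-envelope mote lifetime: stored energy / average power."""
    return battery_wh / avg_draw_w

# Illustrative assumption: a small ~3 Wh battery and ~2 W average
# draw for a continuously imaging mote.
print(lifetime_hours(3.0, 2.0))  # -> 1.5
```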