Thursday, April 9, 2009

Regiment and Macrolab

The Regiment Macroprogramming System
Ryan Newton et al.
MacroLab: A Vector-based Macroprogramming Framework for Cyber-Physical Systems
Timothy Hnat et al.

Last Thursday in class we discussed Regiment and Macrolab, continuing our tour through programming models for wireless sensor networks. First we looked at Regiment's evaluation, generally agreeing that number of messages sent was a reasonable rough estimate. However, the number of retries and low-power-listening evesdropping would exponentially increase with network density and affect the energy efficiency. Then we looked at node failure, which would drop the entire subtree until the next tree formation round (epoch). There didn't seem to be a good measure of reliability built in, how many nodes contributed to the value at a given time. Overall we preferred the region metaphor in Regiment to the "vectors" in Macrolab, though most of us are not Matlab or database folks.

We discussed where it fell in the programming language spectrum, deciding eventually that it was between tiered and declarative as it had controls to manage granularity. The big differences between the different languages and programming environments we've read about is in the mental models. Earlier systems addressed problems in great detail just above the hardware level while the more recent papers present intermediate and high level modes of thinking about the problem. The ability to define nested regions is the real difference between Regiment and TinyDB, along with time being a primitive. We returned once again to the problem of time synchronization, what values were really reported at the base station and how accurately they reflected the current state of the sensor space. TinyDB merges old data with new as packets get relayed slowly from the outer reaches of the network.

Someone suggested we might be able to fix this by adding a local timestamp as part of the gathered data and buffering the last bunch of values, returning the one that best matches the requested time when queried. But we still need to map between local and global time to adjust for clock drift. Distributed algorithms in general need a lot more information in order to obtain precise measurements - network depth can have a huge impact on the time required to know everything that happened at a specific instant even discounting the overhead of time synchronization.

We ended up deciding the main contributions were defining functions over regions, allowing programmers to deal with mid-level abstractions with more flexibility than TinyDB but still taking care of all the details of compiling global network definitions to individual node code.

We then moved on to Macrolab, which defined programs in terms of the data needed. Several expressed concerns about trusting their decomposer to generate reasonable code, which others likened to the relational database revolution. Others worried about the cost model, though all thought the idea of where to store the data under what network and workload conditions was a real concern worth study. The paper fell just short of demonstrating run-time adaptivity that would be extremely useful. Even with run-time adaptivity, poorly balanced workloads could cause thrashing between modes and poor overall behavior.