Friday, February 27, 2009

Flush and RCRT: Transporting your data reliably

Flush: A Reliable Bulk Transport Protocol for Multihop Wireless Networks
(Kim et al.)

RCRT: Rate-Controlled Reliable Transport for Wireless Sensor Networks
(Paek et al.)

Today we discussed two protocols that solve the problem of reliable, rate-limited transport in wireless sensor networks. First, we discussed the similarities between the two designs and concluded that both papers use end-to-end NACKs, link-layer ARQ, and some mechanism for communicating control information from the nodes. One thing that was not explicitly stated in either paper was the different applications for which Flush and RCRT were designed. This addressed some of the skepticism I had about the feasibility of these implementations for general sensor network deployments, since each paper makes assumptions that are not universally applicable.

I thought the Flush paper somewhat oversimplified the problem of source-to-sink data transport by accounting for only one data flow at any instant. This means that the sink has to directly request data from each source node. While the design works great for a single flow, it also means that all the congestion and rate control information goes only one way: from the source to the sink. There is no mention of the Flush design working in the other direction; packets from the sink are simply transported reliably by each node transmitting to the next node in the routing path, with ACK confirmation at each step. Flush could probably be modified to handle both directions with some additional overhead.

In the introduction, the authors mention that Flush can adapt to varying network conditions. When I first read this, I assumed that Flush would adapt to nodes leaving and re-entering the network, or to scenarios where the routing path changes. In the evaluation section, I was disappointed to see that the test for adaptability was just having the nodes drop packets with probability 0.5 for an interval of 10 seconds. The adaptability claim therefore wasn't properly addressed in terms of node failures, and all other experiments were conducted with the routing tree frozen for every run.

The main discussion point concerning RCRT was the “totalitarian” nature of the relationship between the sink and the network. In other words, when the time a node takes to repair a loss exceeds the congestion threshold (a small multiple of the round-trip time), the sink limits the rate of EVERY node using a multiplicative decrease. To bring the network’s rate back up again, every node has to fall below the threshold. We inferred that the authors did this because modifying the rates of each node on an individual basis is too difficult: the network becomes too complex, and it becomes exceedingly difficult for the sink to understand exactly how the nodes affect and interact with each other.
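
To make the mechanism concrete, here is a rough sketch of that sink-side loop in C. This is my own reconstruction, not code from the paper: the congestion threshold, the AIMD constants, and the equal-share allocation policy are all placeholder assumptions (RCRT supports other allocation policies too).

```c
#include <stddef.h>

#define NUM_NODES        20
#define CONGESTION_RTTS  4.0   /* placeholder: repair time above this many RTTs => congested */
#define BETA             0.5   /* multiplicative-decrease factor */
#define ALPHA            1.0   /* additive increase (pkts/s per adaptation round) */

static double total_rate = NUM_NODES * 1.0;   /* aggregate rate granted by the sink */
static double node_rate[NUM_NODES];

/* Called at the sink each adaptation round with, per node, the observed
 * time-to-repair a loss expressed in units of that node's RTT. */
void sink_adapt_rates(const double repair_time_rtts[NUM_NODES])
{
    int congested = 0;
    for (size_t i = 0; i < NUM_NODES; i++)
        if (repair_time_rtts[i] > CONGESTION_RTTS)
            congested = 1;

    /* The "totalitarian" part: a single congested flow cuts the aggregate
     * rate multiplicatively; only when every flow is healthy does the
     * aggregate creep back up additively. */
    if (congested)
        total_rate *= BETA;
    else
        total_rate += ALPHA;

    /* Re-allocate the aggregate across nodes (equal shares here for simplicity). */
    for (size_t i = 0; i < NUM_NODES; i++)
        node_rate[i] = total_rate / NUM_NODES;
}
```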

One of the more general philosophical points dealt with the inevitable choice between avoiding congestion altogether and detecting congestion and reacting to it. In the RCRT paper, we saw that the network “heals” itself as congestion increases. IFRC, however, takes the opposite approach and does whatever is necessary to prevent the network from becoming congested in the first place. We came to the conclusion that the latter approach will almost always lead to better energy utilization because there is never any wasted “effort”: packet loss wastes energy that can never be regained.

Tuesday, February 24, 2009

Flooding? Trickle and RBP

Trickle: A Self-Regulating Algorithm for Code Propagation and Maintenance in Wireless Sensor Networks

(Levis et al.)

RBP: Robust Broadcast Propagation in Wireless Networks

(Stann et al. SenSys’06)

Today we discussed “flooding” protocols and quickly entered a discussion regarding the definition of flooding. The RBP paper defines flooding as a network-level mechanism that sends information to all nodes in a connected network. However, the RBP paper does not actually introduce a flooding protocol, but rather a robust broadcast protocol that can be used inside a flooding protocol… very confusing. I agree this is a semantics discussion, but an important one when discussing the minutiae of these protocols. Trickle, on the other hand, is a form of flooding, but more specifically a consensus algorithm. We agreed that there was nothing inherent to Trickle that made it suitable only for code propagation. In general, it can be considered a consensus protocol that is used here as a precursor to code propagation.

Many of us missed the oh-so-important point that Trickle does not actually perform code propagation, despite its title. The authors even mention this in section 5.1: “The actual code propagation mechanism is outside the scope of Trickle…” What Trickle actually does is form consensus throughout the network as to what is the most recent version of code. Using this consensus, nodes may request updates if they are out of date, but the actual propagation of code is not done by Trickle… hmm, sneaky. I don’t understand why the authors chose to market Trickle in this manner. Why couldn’t they introduce Trickle as a consensus-forming algorithm for WSNs that, by the way, can be used in a code propagation application? A consensus algorithm would be more general and more useful to other people than what Trickle is now.
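
For a concrete picture of what “forming consensus” means here, below is a toy single-node sketch of Trickle’s polite-gossip rule. This is my own reconstruction in C, not code from the paper; the interval bounds, the suppression threshold k, and the stubbed radio call are placeholders.

```c
#include <stdint.h>
#include <stdlib.h>

#define TAU_LOW    2      /* shortest interval (ticks) */
#define TAU_HIGH 600      /* longest interval (ticks)  */
#define K          1      /* suppression threshold     */

static uint16_t my_version   = 7;   /* version of the code image I hold      */
static uint32_t tau          = TAU_LOW;
static uint32_t t_broadcast;        /* when in the interval to (maybe) chirp  */
static uint32_t heard;              /* consistent summaries heard this interval */

static void radio_broadcast_summary(uint16_t version) { (void)version; }

/* Start a fresh interval; should also be called once at boot. */
static void new_interval(void)
{
    heard = 0;
    t_broadcast = tau / 2 + rand() % (tau / 2 + 1);   /* pick t in [tau/2, tau] */
}

/* Called once per tick with the elapsed time inside the current interval. */
void trickle_tick(uint32_t elapsed)
{
    if (elapsed == t_broadcast && heard < K)
        radio_broadcast_summary(my_version);     /* politely gossip my metadata */
    if (elapsed >= tau) {
        if (tau * 2 <= TAU_HIGH) tau *= 2;       /* all quiet: slow down */
        new_interval();                          /* caller restarts its elapsed counter */
    }
}

/* Called when a neighbor's summary arrives. */
void trickle_hear_summary(uint16_t their_version)
{
    if (their_version == my_version) {
        heard++;                                 /* consistent: count toward suppression */
    } else {
        tau = TAU_LOW;                           /* inconsistency: reset and talk fast   */
        new_interval();
        /* if their_version > my_version, request the new image here --
         * the actual transfer is outside Trickle, as the paper says */
    }
}
```

The key property is in trickle_hear_summary: agreement slows the chatter down exponentially, disagreement resets it, and the actual code transfer is kicked off elsewhere.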

We talked about Trickle’s usefulness in real-world applications. Someone pointed out that Trickle requires nodes to be always on, which severely impacts node lifetimes by wasting energy. We briefly discussed how we could implement Trickle with LPL and quickly realized that this is not as easy a problem as one might think. With LPL you have a duty cycle and a preamble before each packet. In an energy-efficient network with long check intervals and preambles, the latency of each transmission means that code will take a long time to propagate through the network. You can dynamically adjust your duty cycles, but both nodes must agree on the check interval… you can’t do this independently. Maybe you can use RBP to flood the network with the check intervals… maybe not. Another issue that came up was that in Trickle you are limited to a 30-byte routine payload. This seems very unrealistic and very much tailored to Maté. A general code propagation algorithm should accept larger code payloads.
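
To see why the LPL interaction hurts, here is a back-of-the-envelope calculation; the numbers are made up for illustration, not taken from either paper.

```c
#include <stdio.h>

/* With low-power listening, a sender must precede every packet with a
 * preamble at least as long as the receiver's check interval, so each hop
 * costs roughly one check interval of latency. */
int main(void)
{
    double check_interval_ms = 1000.0;   /* aggressive duty cycling          */
    double packet_ms         = 10.0;     /* actual payload transmission time */
    int    hops              = 10;       /* network diameter                 */

    double per_hop_ms = check_interval_ms + packet_ms;
    printf("worst-case one-way latency: %.1f s over %d hops\n",
           hops * per_hop_ms / 1000.0, hops);
    /* ~10 seconds just to push a single announcement across the network,
     * which is why naively layering Trickle on top of LPL is painful. */
    return 0;
}
```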

Even though this argument might not apply to these papers because they are a couple of years old, here goes. Almost all papers that we have read so far claim scalability and demonstrate that claim using simulators. However, most use a small number of nodes in their real-world evaluations (20-40). A common explanation is that they did not have enough motes for a larger test. However, given today’s testbeds (Harvard’s MoteLab, Ohio State’s Kansei, etc.) with over 200 nodes, there is no excuse not to perform large-scale experiments. Most of these testbeds give out accounts to researchers and were created for this purpose. The next time I see a new paper that does a real-world experiment with 5 motes on a desk but claims scalability, I will be very skeptical.

Sunday, February 22, 2009

Networking: Dozer and IP is dead

Dozer: Ultra-Low Power Data Gathering in Sensor Networks
(Nicolas Burri, Pascal von Rickenbach, and Roger Wattenhofer, IPSN '07)


IP is Dead, Long Live IP for Wireless Sensor Networks
(Jonathan Hui and David Culler, SenSys '08)



Thursday we started the discussion by contrasting the two papers at a high level. They both address the problem of networking in wireless sensor networks at all levels, but with very different philosophies: minimalist vs. maximalist. Dozer is perfectly orchestrated, hand-tailored to reduce radio duty cycles while maintaining low message loss under a light load (sampling the environment every minute and a half, with beacon messages every half minute), while the IPv6 work focuses on interoperability and standardization.

One particularly clever bit in Dozer was the use of random jitter within TDMA slots to alleviate transmission conflicts between different branches of the routing tree. Another was tracking the clock drift instead of resetting the clock, as earlier papers have mentioned how sensitive applications can be to small perturbations in timing. Dozer forgoes CSMA and low power listening, gaining efficiency by allowing child nodes to sleep through their siblings' transmission times and by using random (but predictable for parents and children via a shared seed) backoffs. Nodes follow two TDMA schedules: one dictated by their parent and one they generate for their children.
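
A small sketch of the "random but predictable" idea, using my own made-up hash and window size rather than Dozer's actual scheme:

```c
#include <stdint.h>

#define JITTER_WINDOW_MS 16   /* illustrative jitter window */

/* Any cheap deterministic hash will do for the sketch. */
static uint32_t mix(uint32_t x)
{
    x ^= x >> 16; x *= 0x45d9f3b; x ^= x >> 16;
    return x;
}

/* Parent and child both know the child's ID and the epoch number, so both
 * can derive the same pseudo-random offset without exchanging it: the child
 * delays its transmission by this much, and the parent wakes up exactly then.
 * Siblings with different IDs land on different offsets, which breaks
 * persistent collisions between branches of the tree. */
uint32_t slot_jitter_ms(uint16_t child_id, uint32_t epoch)
{
    return mix((uint32_t)child_id ^ (epoch * 2654435761u)) % JITTER_WINDOW_MS;
}
```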

Looking for failure modes and things we would have designed differently, we discussed the problems with adding new nodes in a TDMA scheme, epoch growth negatively impacting turnaround time, and the problems with selecting parents mainly on hop count. Sniffing in low-power listening is normally used to gain an idea of link quality and contention, but that isn't possible when the radio is deliberately powered off during others' turns. Selecting parents based on hop count will frequently mean choosing a node at the edge of the node's detection range, which can be a lower-quality connection and result in lost packets, thus increasing latency. One simple addition would be to factor in RSSI when detecting and choosing parents in the beacon phases. Another would be to track failed packets and switch parents after a certain threshold.

Dozer buffers one message per node to relay upstream, which some thought might mean one per child and others one per descendant (child/grandchild/etc), and it refuses to accept another packet from that child until the current message has been relayed. Thus bursty network behavior will cause congestion for a long time.

We spent a few minutes interpreting the graphs, figuring out what amount of time was represented in each and what the low duty cycle actually was (2% and 0.2% are an order of magnitude apart, and each is stated in a different part of the paper). One graph we would have liked to see was transmission loss over time: did the network converge toward stability, or were the losses constant throughout? One participant took issue with their energy evaluation: radio transmission normally dominates the energy cost, but is it still dominant at a 0.2% duty cycle?

Wrapping up our discussion on Dozer, we agreed they presented a full working vertical solution instead of attacking a single layer, which is a holy grail of WSN networking. The low-transmission-rate assumption limits their applicability, but it was openly admitted and discussed.

Moving on to IPv6 on WSNs, we examined why we would even want to try such an implementation and came up with interoperability and standardization. Several of us wanted our sensor nodes to be able to communicate with other devices (an actuator motor locking a door), to reuse existing tools and applications from regular networking contexts, and to make motes more accessible to programmers. One participant was at the conference talk and relayed how differently this paper was presented there. Jonathan started with "Here is an efficient, working system", went into the benchmarks and graphs, and then described the problem domain and what they encountered along the way of building it (the inverse of most systems talks).

The underlying idea is to move the narrow waist up from the active message layer to the network layer by adopting a recognized standard. The network processing community is concerned with quality of service and reliability; however, WSNs are not just about moving raw data to the base station, since local aggregation can also be important (a local max or average of sensor values). So using IPv6 loses the local-topology benefits of tweaking our own MAC and networking protocols, but gains a lot in accessibility to ordinary programmers. Chirps allow nodes to power down their radios when they know they won't need to listen to a message, which greatly improves performance. Synchronous ACKs allow the application to decide whether to retransmit and to whom.

We briefly talked about Zigbee and Bluetooth as heavyweight protocols, over-engineered for their typical uses. The internet evolved more from tradition than from standards: processing was done at the endpoints as much as possible, though some companies like Akamai are moving away from that now. It's governed more by rough consensus and running code than by strict adherence to particular standards, which is a very different social model. The IP paper comes from that same philosophy: complex but flexible, and demonstrating a full working example before expecting anyone to listen.

Thursday, February 19, 2009

t-kernel: Providing Reliable OS Support to Wireless Sensor Networks
(Lin Gu and John Stankovic, Sensys 2006)

Resource Aware Programming in the Pixie OS
(Konrad Lorincz et al, Sensys 2008 )

Today we took the plunge into the OS design space for wireless sensor networks. The TinyOS system encourages a component-based design where higher-level components abstract away lower-level or hardware components. While designing any application of reasonable complexity, the programming model quickly becomes a web of components, and I have found it difficult to follow the chain of execution without looking into the app.c file. A lot of people would argue that TinyOS is not an operating system but rather a collection of drivers, exposing too many knobs for the end application to manage. At the other end of the spectrum are systems like Eon, which seek to abstract away most of the system complexity and perhaps expose a little too little. As application complexity grows, it will become necessary to support higher-level services like protection domains and virtualized resources, but the question is: do we just strip down existing OSes, or do we need to come up with something more clever?

Gu's paper brought out some interesting debates in the class. First, it seems like the translations would have been easier at compile time. One member thought that this approach was taken because it was easier to implement; another pointed out that it is easier to move a smaller binary to flash. Moreover, it was clear that reasoning about resources is no longer straightforward (assuming that you want to trace or model resource consumption). What would have been a simple call to switch on the LED and measure its power draw can now be preceded by swaps to and from flash, making it difficult to measure the energy consumed by the operation. Thus one form of simplicity was traded off for reliability guarantees. Many people complained that the paper did not bring out the reliability aspect of the system. Some felt that it would have been more convincing if the system had been shown to detect bugs in popular existing "wild" programs, or evaluated using other fault-injection techniques. Perhaps the only guarantee that the system provides is that a buggy application will cause a callback to the kernel; a member argued that this functionality could have been implemented with a hardware watchdog timer that reboots the node if it hangs. The participants also felt that the code bloat was perhaps excessive and did not necessarily achieve the performance acceleration associated with such bloating. Finally, one member vehemently defended t-kernel as a convoluted way to write a micro-kernel, whereas another thought of it as software fault isolation on the mote platform.

The Pixie paper was presented at this year's SenSys (along with Lance). It was pointed out that a number of operating systems, especially those for laptops or mobile phones, tend to be resource aware. However, they might not expose those parameters to the end application; e.g., DVFS allows the kernel to slow down operations without the application really noticing. Having come into this project late, I find the data-flow model somewhat intuitive. However, the fact that tickets are not enforced implies that the scratchpad versions of my programs bypass them. Working with Pixie for the class projects will definitely help uncover shortcomings with the existing system. For a while we have been planning on writing a profiling tool that measures the steady-state resource consumption of individual stages, and hopefully that will come out soon.

All these papers make me wonder if we are focusing too much on managing individual nodes. After all, the purpose of the entire WSN is to monitor a phenomenon, and I think the time is right to take a more holistic approach toward resource management. Perhaps it would be easier to reason about the system as a whole than about its pieces.

Tuesday, February 17, 2009

Operating System Design for WSNs and the nesC Programming Language

The nesC Language: A Holistic Approach to Networked Embedded Systems
(Gay et al., PLDI 2003)

(Levis et al.)

For Thursday’s discussion, we widened our scope a bit, focusing on two papers which addressed overarching aspects of OS design for WSNs. We began by discussing various aspects of nesC, a C-based language designed specifically for programming wireless sensor networks. One aspect of nesC which sparked some debate was the ‘static’ nature of the language, meaning that all memory allocation for a program is set at compile time. While some people agreed with the authors that this design offers the programmer the security of knowing that a mote left running for months at a time will not run out of memory, a number of people took issue with this approach. One issue with static memory allocation is that programmers may be forced to make inefficient use of memory when buffering data of an unpredictable size, since they must allocate all the memory they might need up front. Worse, if one wants to take code written in nesC for a particular mote platform and port it to one with different memory constraints, it may be necessary to manually rewrite all the parts of the code which allocate buffers. Some people argued that this problem could be solved by good programming practice, using constants to clearly define buffer sizes in the program header.


I particularly liked one suggested solution to this problem: an automatic buffer-sizing program which could run at compile time and configure buffers appropriately. This might be accomplished by adding a range syntax when defining buffers, allowing the compiler to pick a specific buffer allocation at compile time given the memory constraints of other components in the program and the hardware platform. It seems to me that something like this would go a long way toward making the TinyOS vision of reusable components (more of) a reality.
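
In the absence of such a tool, the closest one can get today is something like the following C sketch, where a per-platform header picks a size within a declared range. The platform macros and sizes here are invented for illustration, not taken from any real TinyOS platform file.

```c
#include <stdint.h>

#define SAMPLE_BUF_MIN  32          /* smallest buffer the algorithm tolerates */

#if defined(PLATFORM_MICAZ)
#  define SAMPLE_BUF_LEN 64         /* little RAM: stay small                  */
#elif defined(PLATFORM_TELOSB)
#  define SAMPLE_BUF_LEN 256        /* more RAM: buffer more aggressively      */
#else
#  define SAMPLE_BUF_LEN 128        /* conservative default                    */
#endif

#if SAMPLE_BUF_LEN < SAMPLE_BUF_MIN
#  error "platform cannot satisfy the minimum buffer size"
#endif

/* Still statically allocated, as nesC requires -- only the size varies. */
static uint16_t sample_buf[SAMPLE_BUF_LEN];

uint16_t sample_buf_read(uint16_t i) { return sample_buf[i % SAMPLE_BUF_LEN]; }
```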


Another key focus of nesC is dealing with concurrency. In nesC, code can run in one of two ways: asynchronously from an interrupt handler, or synchronously as part of a scheduled 'task'. The creators of nesC are particularly concerned about 'data races' resulting from concurrent attempts to access a single variable. If it is necessary to ensure that a particular block of code cannot be interrupted, we can declare a section of code "atomic." On this point, a number of people worried about the way in which nesC implements atomicity: it turns off all interrupts. This struck me as a rather dramatic approach, and several people worried that a long section of atomic code might lead a mote to miss important sensor information, or render it unable to keep up with the stringent timing requirements of some of the MAC protocols we've been discussing. One person offered a potential fix for this problem, suggesting that the compiler could try to model program timing and detect situations in which the program could be liable to miss a timer interrupt or violate other key timing constraints.
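
Roughly, a nesC atomic block compiles down to something like the following C sketch. This is my own host-side approximation; the hw_* helpers are stand-ins for the MCU-specific save/disable/restore-interrupt operations (on an AVR they would read SREG and execute cli()).

```c
#include <stdint.h>

/* Stubbed interrupt-state primitives so the sketch compiles on a host machine. */
static uint8_t irq_enabled = 1;
static uint8_t hw_disable_interrupts(void) { uint8_t p = irq_enabled; irq_enabled = 0; return p; }
static void    hw_restore_interrupts(uint8_t p) { irq_enabled = p; }

static volatile uint16_t shared_counter;   /* also updated from an interrupt handler */

/* Roughly what `atomic { shared_counter++; }` turns into: save the
 * interrupt-enable state, disable interrupts, run the body, restore state.
 * While interrupts are off, nothing else runs -- which is exactly why long
 * atomic sections can make a node miss radio or timer events. */
void increment_counter(void)
{
    uint8_t prev = hw_disable_interrupts();
    shared_counter++;                      /* read-modify-write, now race-free */
    hw_restore_interrupts(prev);
}
```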


One aspect of nesC which everyone seemed to appreciate was the use of interfaces and parameters to ensure a clean program structure. While this can be overly verbose in trivial programs, it makes programs considerably more readable and helps to enforce good programming practices. As we considered the potential future of nesC and TinyOS, asking whether they would still be prevalent 10 years from now, this clean structure was especially important. The clean, relatively simple structure of both nesC and TinyOS was a vote in their favor in this regard, especially when we considered alternative UNIX-type approaches.


Personally, I'm looking forward to actually trying out nesC over the next few weeks before I make any final calls as to its strengths and weaknesses. I like the clean structure and syntax of the language, and the almost hardware-like metaphor of wiring components together, but I'm curious to see how these aspects of the language will actually feel when I sit down to write some real code. 

Monday, February 16, 2009

Research projects and collaboration

Mike Mitzenmacher posted an interesting article about doing collaborative research on his blog. It's intended for students in his graduate course (CS222) this term, but is just as relevant for this class. Good reading.

Tuesday, February 10, 2009

Z-MAC and Component-Based MAC Architecture

Z-MAC: a Hybrid MAC for Wireless Sensor Networks
(Rhee et al., SenSys 2005)

A Component-Based MAC Architecture for Power-Efficient Media Access Control in Wireless Sensor Networks
(Klues et al., SenSys 2007)

Today we continued our discussion of MAC protocols by discussing Z-MAC, a hybrid CSMA/TDMA protocol. CSMA is ideal for low-traffic conditions, but can yield low throughput under high contention. TDMA can efficiently schedule nodes and maintain high channel utilization, but slots are wasted under low-traffic conditions, and there are other problems, such as distributed slot assignment. Z-MAC is proposed as a hybrid of these two approaches, primarily using CSMA but switching to a TDMA-like mode under high contention. This should, in theory, improve performance at both ends of the utilization spectrum, while mitigating many of TDMA's drawbacks by gracefully degrading to CSMA.

The class raised several interesting points about Z-MAC (specifically the TDMA element). One main concern was the practical difficulties involved in slot assignment. Z-MAC does initial slot assignment at startup, but simultaneous startup may be impractical in real-world deployments. Even if the practical difficulties of slot assignment are handled smoothly, the links are assumed to be static until the setup phase is run again. This may not be entirely true in a real deployment either, as even the paper states that "sensor networks may undergo frequent topology changes." While distributed slot assignment is certainly possible, many of us felt that this may be more difficult to achieve in practice than implied.

We also discussed the HCL (High Contention Level) and LCL (Low Contention Level) modes employed by the protocol. HCL is triggered by an ECN (Explicit Contention Notification) message, and nodes enter a TDMA-like mode where slot owners have a smaller contention window (and thus priority in their assigned slots). One major point that arose during our discussion: why does Z-MAC allow nodes in a one-hop neighborhood to contend for a neighbor's slot when that neighbor does not use its own slot in HCL mode? For example, consider a three-node chain in HCL mode, A--B--C, assigned slots 1, 2, 3 respectively. If the current slot is 2 and the owner (B) does not transmit, then both A and C could contend for the same slot and transmit, since they are both in B's one-hop neighborhood. This could cause a hidden terminal problem. Thus, the only way of avoiding hidden terminals is actually to limit slot contention to the slot owner(s). However, upon further discussion, it was suggested that if a slot is not going to be used, then deliberately allowing a situation where a hidden terminal could arise is better than just wasting the slot entirely. On the other hand, energy efficiency is important in WSNs -- so some argued against allowing the one-hop neighborhood to contend for a slot, because the transmissions might be useless, wasting precious energy.
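
The priority mechanism we debated boils down to how the initial backoff is chosen. A minimal sketch of that rule, with illustrative (not Z-MAC's actual) contention-window sizes:

```c
#include <stdlib.h>

#define OWNER_CW         8    /* contention window (backoff slots) for the slot owner */
#define NONOWNER_CW     32    /* larger window for non-owners in HCL mode             */
#define NONOWNER_OFFSET OWNER_CW   /* non-owners only start contending after the
                                    * owner's window has passed                        */

/* Pick an initial backoff for this node in the current TDMA slot.  Owners
 * contend early; one-hop neighbors may still contend, but only after the
 * owner's window, which is what gives the owner priority -- and what opens
 * the door to the A--B--C hidden-terminal case when the owner stays silent. */
int zmac_backoff(int i_own_this_slot)
{
    if (i_own_this_slot)
        return rand() % OWNER_CW;
    else
        return NONOWNER_OFFSET + rand() % NONOWNER_CW;
}
```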

This brought us to our next topic: energy utilization. Clearly, the overhead of Z-MAC (such as slot assignment / clock synchronization) will utilize more energy than a simple CSMA-only scheme requiring none of this overhead. The big question for us: Do the energy savings of HCL under high contention really justify the energy cost of this overhead? We feel this would be an interesting area for deep exploration, particularly through the lens of practical workloads.

We also discussed practical issues regarding time synchronization. In an application with very low network utilization, synchronizing clocks every 100 packets may result in large amounts of clock drift between nodes (a typical mote crystal drifts by tens of parts per million, so minutes without resynchronization can translate into several milliseconds of skew). If traffic suddenly surges and ECN messages are broadcast, then when nodes switch into HCL mode they may have badly out-of-sync clocks. It's not entirely apparent that the LCL-to-HCL switch is a smooth one in a situation like this.

Ultimately, we wondered how Z-MAC would perform under real-world workloads. We agreed that situations where nodes are transmitting at near-maximum capacity are rare in real-world sensor networks, and we want to see a more detailed analysis of all the popular MAC protocols across a variety of workloads. This brought us back to our previous points regarding the need for standardized MAC protocol benchmarks. In general, while we thought that Z-MAC was a strong idea, we had difficulty becoming comfortable with the exact dynamics of the system. For a practical system deployment, we were undecided whether it would make sense to pick this hybrid MAC or a simpler system with less overhead, like B-MAC.

Although we ran out of time before fully discussing the component-based MAC architecture paper, we did get a chance to hit on a few high-level topics. Before even discussing any details of the MLA (MAC Layer Architecture), we discussed the overall implications of creating a framework for building MAC protocols. As the authors write, "using these components, developers can quickly construct new MAC protocols that meet the demands of their specific applications." While we felt that reuse and structure are good things, is it necessarily good to have a framework for creating more MAC protocols? Do we really need a library of hundreds or thousands of MAC protocols to choose from when developing sensor network applications? In general, most of us felt that tailoring the MAC protocol to a specific application is a good thing. However, many felt that the alternative of having more complete MAC protocols with many more tuning knobs, or perhaps even a self-tuning MAC protocol, could be a viable alternative to creating a framework for developing entirely new MAC protocols.

Monday, February 9, 2009

The MAC soupbowl

http://www.st.ewi.tudelft.nl/~koen/MACsoup/index.php

In view of our current fascination with MAC layer protocols, I found this wonderful website that talks about each of them briefly.

Sunday, February 8, 2009

MDW unplugged

I've decided to start a personal blog where I get a chance to rant and rave about topics dear to my heart, mostly dealing with sensornets, but not necessarily related to this course. For my first post I talk a bit about the relative difficulties of sensor nets research versus other areas of systems. Feel free to check it out and comment.

Friday, February 6, 2009

B-MAC and Taming the underlying challenges

Versatile Low Power Media Access for Wireless Sensor Networks
(Polastre et al., Sensys 2004)

Taming the Underlying Challenges of Reliable Multihop Routing in Sensor Networks
(Woo et al., Sensys 2003)

Today we had another round of interesting discussions on another two low level issues in sensor networks: radio channel characteristics and low power MAC design.

We started by talking about what clear channel assessment (CCA) means when used to detect transmissions. CCA tells you when the channel is 'clear' rather than when someone is actually transmitting a packet. The B-MAC paper showed that using a simple fixed threshold is a poor approach, while applying an EWMA filter to estimate the noise floor and searching for outliers performs reasonably well. Then we discussed the hidden terminal problem and the benefit and overhead of the RTS-CTS scheme as we talked about the comparison of B-MAC and S-MAC. Many of us thought the main difference is configurability: B-MAC exposes more knobs to the application, such as LPL preamble length, check interval, and acknowledgments, while S-MAC's parameters are not controllable by other layers of the system. The result is higher performance from B-MAC, because with application-specific knowledge one can turn these low-level features on and off to minimize the overhead of the MAC protocol. On the flip side, this also means that the application programmer needs to know how to turn knobs that are interrelated. It might seem that each node could tune these parameters dynamically and optimize based on local conditions. However, it is often not so easy, because neighboring nodes need to agree on many of the parameters. For example, you would almost always want the preamble length to be longer than the check interval in order to guarantee that the receiver wakes up. Turning acknowledgments on and off could be very confusing too. So, one might ask whether B-MAC provides too much flexibility and forces the application to be more complicated.
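
As a concrete illustration of the CCA discussion, here is a minimal C sketch of the noise-floor/outlier idea; the constants and the radio hook are my own placeholders, not values or interfaces from the B-MAC paper.

```c
#include <stdint.h>

#define CCA_SAMPLES     5    /* samples taken per assessment                  */
#define OUTLIER_MARGIN  3    /* how far below the floor counts as an outlier  */

static int32_t noise_floor = -95;                /* running noise-floor estimate */

static int32_t rssi_read(void) { return -93; }   /* stand-in for the radio driver */

/* Fold a sample into the noise-floor estimate (EWMA with weight 1/16).
 * B-MAC only does this when the channel is believed idle, so the estimate
 * tracks ambient noise rather than packet energy. */
void cca_update_noise_floor(int32_t sample)
{
    noise_floor += (sample - noise_floor) / 16;
}

/* Declare the channel CLEAR if any sample is an outlier well below the noise
 * floor: a real packet keeps the energy consistently above it.  A single fixed
 * threshold, by contrast, misclassifies whenever the ambient noise level shifts. */
int cca_channel_is_clear(void)
{
    for (int i = 0; i < CCA_SAMPLES; i++) {
        if (rssi_read() < noise_floor - OUTLIER_MARGIN)
            return 1;        /* outlier below the floor => nobody is transmitting */
    }
    return 0;                /* sustained energy => channel busy */
}
```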

Another interesting question brought up today was: why not use transmission power control, a knob already provided by the radio chip? Cell phones do it all the time to save power; motes should do it too. Lowering transmission power can reduce the size of your neighborhood and therefore possibly increase overall throughput. I think this is an interesting option that hasn't been fully exploited by existing sensor network systems. Or perhaps it makes the higher layers, such as the routing protocol, too difficult, so people discarded the idea?

In the B-MAC paper, S-MAC was beaten up in almost every metric considered. Given that observation, students asked how the WSN community usually evaluates MAC protocols. The answer seems to be that every MAC paper chooses different application domains and metrics to show that the proposed protocol performs better than the alternatives, but there really hasn't been a complete study comparing them objectively in a standard set of representative environments. Given how many MAC protocols have been proposed, perhaps it is a good time to do such a study.

One more interesting point brought up in class today was: what makes sensor networks a different research area from mobile ad hoc networks (MANETs), which have been around for a long time? This is similar to the question we asked on Tuesday about why motes are so different from other computers that they need new OSes. The papers we have read so far mention very little about MANET work, even though WSNs share some challenges with MANETs, such as multi-hop routing in wireless networks. I think the distinction still comes from the focus on the resource constraints of sensor nodes. The MANET community has mostly assumed that the nodes in the network are at least PDA- or laptop-class devices. Many of them are even constantly powered (e.g., in military vehicles), so the processing power and energy available to the nodes are plentiful. MANET research therefore tends to focus on infrastructureless network architecture (no cell tower, no base station), node mobility, and dealing with rapid topology changes, rather than efficiently managing system resources on the mobile nodes. Another distinction is the research approach: a lot of MANET results were derived from simulations based on a simple unit-disc model, whereas the WSN research community puts more effort into real experiments that evaluate protocols running on top of real radios.

Tuesday, February 3, 2009

TinyOS and Great Duck Island

System Architecture Directions for Networked Sensors
(Hill et al., ASPLOS 2000)

An Analysis of a Large Scale Habitat Monitoring Application
(Szewczyk et al., Sensys '04)

Today was our first proper class and we had a fantastic discussion! Everyone came prepared to talk about the papers, and I think it helped to do a little icebreaking before we got into the discussion itself. I'm really looking forward to the rest of the semester.

I wanted to kick off the class with these two "classic" papers on sensor networks, to help lay the groundwork for the rest of the term. These are great papers, and though I've read them a few times, it is interesting to revisit them now. TinyOS is almost 10 years old, and it's nice to return to the vintage work with some new perspective.

On reading both papers again, I was struck at how low level they both are. The ASPLOS paper really goes bottom-up from the hardware platform and does not indulge in laying out a broader perspective for the field of sensor networks as a whole. Neither paper portends the fetish for complex in-network processing or fancy decentralized algorithms that has taken over much of the community. (I'm one of those fetishists myself, but it's interesting how straight these papers play it.)

We started out talking about what is fundamentally different about sensor networks compared to conventional computing systems. Most folks seemed to have drunk the TinyOS Kool-Aid and agreed it was all about high concurrency, limited memory, and limited energy. I wanted to press on the point of high concurrency a bit. Aren't ALL computer systems fundamentally concurrent and event-driven? All operating systems are driven by interrupts (keyboard and mouse events, packets coming in, timers going off); are sensor networks really more concurrent than conventional systems? OK, well, maybe it has something to do with the lack of buffering, and energy limitations. Now we are getting somewhere - what is fundamental about this? Finally we hit on what I think is essential here: the limited resources combine with the concurrency demands to force the programmer to be really "intimate" with their events. You can't hide behind threads or processes anymore: events are part and parcel of what you deal with as a programmer in this domain, so the OS structure had better expose them in a lean and flexible way.

I also wanted to challenge the assumption that motes need to be so small and memory-constrained. Where did this idea come from? The ASPLOS paper more or less assumes it; but if you look carefully at a lot of the potential applications, it's not so obvious that you can't get away with, say, a PDA-class device. Strikingly, some of the early design decisions on the mote hardware had some pretty significant impact on the design of TinyOS. My favorite example is the RFM radio, which has to be bit-banged to transmit or receive, requiring a context switch every 50 usec (once per bit)! I wonder what would have happened had Jason Hill chosen a better radio or a more powerful MCU when he designed the WeC mote. We could be in a very different place today.

But, why the asceticism? TinyOS forces you to tie both hands behind your back and peck at the keyboard with your nose to program... no threads, no sockets, heck, not even malloc or printf. Was this some kind of self-flagellation, doing penance for years of building bigger and faster computers (NOW)?

To understand this lineage one has to go back to the origin of the Berkeley TinyOS effort, which originated with Kris Pister's "Smart Dust" project. That group was developing mm^3 devices integrating extremely simple computation, communication, and sensing. David Culler had the bright idea to build a version of this using off-the-shelf components, to approximate that "plausible future" and study the possibilities enabled by this space. So from the get-go you're talking very resource constrained, and (this is critical) SMALL devices. That fundamental tenet of the sensor network design space was established early on and largely continues to this day; I still see sensor networks based around 16-bit MCUs and the CC2420 when they could just as well use a proper processor and 802.11 radio (for example, if the sensors are powered).

I wouldn't call this merely an academic exercise, but I think we learned a great deal about the fundamentals of this new design space by forcing ourselves to abandon some of our long-held attachments to such luxuries as virtual memory and protection domains. Had we tried to shoehorn PDAs running Linux into looking kinda, sorta like "sensor networks," I think we would have missed an important opportunity to dig deeper and reveal some essential truths.


Intro slides posted

For anyone who missed class last Thursday, I've posted the introductory slides for CS263, which summarize what the class is about (and perhaps more importantly, how you will be graded!).