Harvard CS263: Data Aggregation: TinyDB, Beyond Average, and Synopsis Diffusion

(Madden, et al.)

(Hellerstein, et al.)

(Nath, et al.)

Today, we addressed the topic of data aggregation. We began by
exploring the contents of the ACQP and Beyond Average papers. Starting
with ACQP, we were quite pleased to see that the issue of power/energy
consumption was finally being brought to the forefront of a paper
related to sensor networks again (in the previous week, there was a
shared longing for papers that addressed the topic of energy more
directly). Moreover, the ACQP paper presented what one student
suggested was a new "programming model for sensor networks," but, in
particular, was a paradigm of viewing sensor networks as instruments
for scientific studies.

As began to philosophize about this idea, we began discussing the
questions that undergird the distinctions between the systems and
database communities. To a first approximation, we understand that the
systems community tends to approach problems from the bottom-up while
the database community goes from the top-down. A DB example we
discussed regarded the use of query optimizers and other DB internals
to take the declarative statements of a language like SQL and hide the
details of returning the correct answer as quickly and efficiently as
possible. The systems folks in the room tended to agree that we like
to approach problems a (relatively) low-level base.

Ultimately, we were trying to get at the question: Should the system
optimize queries for you? For DB folks, this seemed like a
no-brainer. We even tended to think about the benefits an analogous
(although, not a perfect analogy) of compiler optimization.

But this is where we began to wonder whether the optimizations
presented (e.g., priotization schemes) in this paper seemed
appropriate for sensor networks. While in the compiler analogy,
compilers rearrange instructions but the execution of, say, a
statically-bounded for-loop won't suddenly decided to stop at 10
iterations rather than 20, we questioned the example prioritization
schemes as winavg and delta seemed to defy the correctness that some
sensor network users would desire for their applications. However, we
acknowledged that perhaps specific applications may find those schemes
useful or reasonable, but a skepticism continued to linger on this
point.

Moreover, we felt that the Beyond Average paper might in fact be
breaking the abstraction barrier that it set out to establish
(high-level declarative language - SQL - hiding the underlying details
of sensor network operation). The one example from which this
sentiment emerged was in the Query-Handoff Implementation for tracking
(i.e., Fig 8). The use of SIGNAL EVENT to trigger an ON EVENT
statement appeared a lot like an imperative programming language
method or function call. We weren't certain anymore if the high-level
approach to sensor network application writing was appropriate or
possible after realizing this was what the authors were demonstrating.

In general, we appreciated the abstractions that the DB folks aimed
for but we felt uneasy that this approach to programming sensor
networks would be feasible in the future.

Of course, we also spent some time also examining the Synopsis
Diffusion paper. Mostly, we sought to understand the COUNT algorithm
(which seeks to know how many nodes are present in the network) to
better understand Synopsis Generation, Fusion, and Evaluation. We felt
that there was a little blackbox magic going on for COUNT, especially
in the SE() phase, but we didn't get too hung up on this process.

We seemed to unanimously appreciate the presentation of necessary and
sufficinet properties for ODI-correctness; however, we wished that the
authors would have spent more time discussing how to create
ODI-correct Generations and Fusions.

We had a lot of papers to get through on Tuesday, but each of them
seemed to generate a fair amount of dialogue and interest from the
class, allowing us to explore the range of DB and systems research
philosophy to details of proofs and ODI-correctness.

Harvard CS263

Thursday, March 12, 2009

Data Aggregation: TinyDB, Beyond Average, and Synopsis Diffusion

Course Links

Followers

Blog Archive