Aggregation and Degradation in JetStream: Streaming analytics in the wide area
Abstract
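Two techniques (aggregation and degradation) address the mismatch between data volume and available bandwidth, and keep data from going stale.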
Our adaptive control mechanisms are responsive enough to keep end-to-end latency within a few seconds, even when available bandwidth drops by a factor of two, and are flexible enough to express practical policies.
Latency stays low even when bandwidth drops by a factor of two.
Introduction
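- Aggregation, degradation, MapReduce
- Sensors, storage, and processors are far cheaper than bandwidth, so bandwidth becomes the bottleneck; alternatively, bandwidth ends up over-provisioned because the system cannot adapt.
- Degradation usually reduces accuracy, so the authors want to apply the minimum degradation necessary.
- Challenges of integrating aggregation and degradation into a streaming system:
- The storage system must support real-time aggregation.
- Adaptation must be performed on a timescale of seconds to keep latency low.
- Users must be able to define their own policies in a sufficiently expressive language.
- "We consider our architecture and its associated interfaces to be the key contribution of this paper." That is, the system architecture and its APIs.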
Design Overview
- Integrating structured storage
- Reducing data volumes
- Programming model
Adaptive Degradation
- Many useful degradations have ==data-dependent== bandwidth savings.
- Bandwidth estimation with markers: if the data since the last marker was generated over k seconds, and t is the recorded time between seeing the last marker and receiving this acknowledgment, then k/t is used as the availability estimate.
- If k > t, then k/t > 1: bandwidth is sufficient.
- If k < t, then k/t < 1: bandwidth has become scarce.
- “By default, send all images at maximum fidelity from CCTV cameras to a central repository. If bandwidth is insufficient, switch to sending images at 75% fidelity, then 50% if there still isn’t enough bandwidth. Beyond that point, reduce the frame rate, but keep the images at 50% fidelity.”
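The k/t estimate and the quoted CCTV policy can be combined into a small control loop. The following is a minimal sketch under stated assumptions, not JetStream's actual interface: the names `Level`, `LADDER`, `availability`, and `next_level`, the step-up threshold of 1.2, and the concrete frame rates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    fidelity: int      # image quality in percent
    frame_rate: float  # frames per second (illustrative values)


# Ordered from least to most degraded, following the quoted CCTV policy.
# The frame rates are made-up numbers; the policy only says "reduce the frame rate".
LADDER = [
    Level(100, 30.0),
    Level(75, 30.0),
    Level(50, 30.0),
    Level(50, 15.0),
    Level(50, 5.0),
]


def availability(k: float, t: float) -> float:
    """Return k/t: k seconds of data were generated, t seconds elapsed
    between seeing the last marker and receiving the acknowledgment."""
    if k <= 0 or t <= 0:
        raise ValueError("k and t must be positive")
    return k / t


def next_level(current: int, ratio: float, step_up_at: float = 1.2) -> int:
    """Choose the next index into LADDER from the latest k/t ratio.

    ratio < 1 means bandwidth is scarce, so degrade one step further.
    ratio > step_up_at (an assumed safety margin) means there is headroom,
    so restore one step. Otherwise stay at the current level.
    """
    if ratio < 1.0:
        return min(current + 1, len(LADDER) - 1)
    if ratio > step_up_at:
        return max(current - 1, 0)
    return current


if __name__ == "__main__":
    level = 0
    # Hypothetical (k, t) measurements, one per marker interval.
    for k, t in [(1.0, 2.0), (1.0, 1.5), (1.0, 1.1), (1.0, 0.5)]:
        level = next_level(level, availability(k, t))
        print(LADDER[level])
```

Stepping one level at a time, with a margin above 1 before restoring quality, is one simple way to avoid oscillating between levels; the paper's own controller may behave differently.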
Degradation
The best degradation for a given application depends not only on the statistics of the data, but also on the set of queries that may be applied to the data.
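A small example may make the query dependence concrete. The sketch below uses made-up per-second counts and two hypothetical degradations, `sample_every_other` and `coarsen_to_two_seconds`; the names and data are assumptions for illustration and are not taken from the paper's operator set. Both halve the data sent, but each preserves a different query.

```python
# Hypothetical per-second event counts; not data from the paper.
counts = [4, 9, 2, 7, 1, 30, 5, 6]


def sample_every_other(values):
    """Degradation A: keep every second record."""
    return values[::2]


def coarsen_to_two_seconds(values):
    """Degradation B: sum adjacent pairs into two-second buckets."""
    return [sum(values[i:i + 2]) for i in range(0, len(values), 2)]


sampled = sample_every_other(counts)
coarsened = coarsen_to_two_seconds(counts)

# Query 1: total count.
# Coarsening answers it exactly; sampling can only estimate by scaling up.
total_exact = sum(counts)
total_from_coarsened = sum(coarsened)
total_from_sampled = 2 * sum(sampled)

# Query 2: peak one-second count.
# Coarsening no longer has one-second resolution, and sampling sees the
# peak only if that second happened to survive (here it did not).
peak_exact = max(counts)
peak_from_sampled = max(sampled)

print(total_exact, total_from_coarsened, total_from_sampled)
print(peak_exact, peak_from_sampled)
```

If the expected queries are sums, coarsening is the better choice here; if they need fine time resolution, neither degradation is safe without knowing more about the data.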
Evaluation
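Summary
The paper's emphasis is on the system, mainly the aggregation side of a traditional stream-processing system; on the degradation side, it makes a good start.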