Dynamic Scheduling of Approximate Telemetry Queries

Network telemetry systems provide critical visibility into the state of traffic flowing through modern computer networks. While significant progress has been made by leveraging programmable switch hardware to scale these systems to high and time-varying traffic workloads, less attention has been paid to efficiently utilizing limited hardware resources in the face of dynamics such as changes in the composition of traffic and in the number and types of queries running at a given point in time. To efficiently handle traffic and query dynamics, we develop DynATOS, the first scheduling system for running network traffic queries on constrained switch hardware while adapting to changing query and resource requirements. DynATOS leverages a novel time-division approach to approximation and multiplexes switch hardware resources among submitted queries using an optimization formulation. We prototype and evaluate DynATOS on a runtime-programmable switch hardware telemetry module.

Architecture of DynATOS

Why DynATOS?

  • State-of-the-art telemetry systems fail to meet the demands of simultaneous traffic and query dynamics.

  • Recent breakthroughs in runtime-programmable switch hardware enable dynamic telemetry systems. However, no prior work leverages these breakthroughs to efficiently handle traffic and query dynamics.

  • DynATOS is the first dynamic telemetry system designed to handle both traffic and query dynamics in the highly constrained setting of a single switch ASIC.

Key Points

  • DynATOS handles dynamic traffic scenarios with high accuracy where static approaches fail. Above, we show performance of DynATOS compared to sketch-based methods on a trace from MAWILab with pronounced traffic dynamics. After a change in traffic composition in epoch 20, DynATOS retains high accuracy while other approaches suffer considerable accuracy degradation.
  • DynATOS efficiently multiplexes limited switch resources when executing dynamic query workloads through a novel subepoch-based approach to approximation and scheduling. Above, we show performance of DynATOS on dynamic query workloads with different query arrival rates. DynATOS maintains high query satisfaction rates even when queries arrive at a mean rate of one per second.
[Figure labels: DDoS, Port scan, Superspreader, TCP new connections]
  • The query result approximation method in DynATOS offers benefits similar to those of other approximation methods while providing statistically sound accuracy estimates for each epoch without assumptions about traffic characteristics. Above, we compare DynATOS with sketch-based methods in the tradeoff between accuracy and load on the collector. While performance is similar, DynATOS often offers more flexibility in the tradeoff space.
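
As a rough illustration of how a query that runs in only some subepochs can still produce an epoch-level result with an accuracy estimate, the sketch below applies the textbook simple-random-sampling estimator for a total. It is not the DynATOS implementation: the function name `estimate_epoch_total`, the equal-length subepochs, and the uniformly random choice of sampled subepochs are all assumptions made for this example.

```python
import math
from statistics import mean, variance


def estimate_epoch_total(sampled_counts, total_subepochs):
    """Estimate an epoch-wide total from a random subset of subepochs.

    sampled_counts:  per-subepoch query results (e.g. packet counts) from the
                     subepochs in which the query actually ran (hypothetical input).
    total_subepochs: number of equal-length subepochs in the epoch.

    Returns (estimate, standard_error).
    """
    n = len(sampled_counts)
    N = total_subepochs
    if n < 2 or n > N:
        raise ValueError("need at least 2 and at most total_subepochs samples")

    # Scale the per-subepoch mean up to the whole epoch.
    estimate = N * mean(sampled_counts)

    # Sample variance (n - 1 denominator) of the per-subepoch results.
    s2 = variance(sampled_counts)

    # Finite population correction: the error shrinks to zero as the
    # query runs in more of the epoch's subepochs.
    fpc = 1 - n / N
    standard_error = N * math.sqrt(fpc * s2 / n)
    return estimate, standard_error


# Query ran in 4 of the 8 subepochs of one epoch.
est, se = estimate_epoch_total([10, 12, 8, 10], total_subepochs=8)
print(f"estimated total: {est}, standard error: {se:.2f}")
```

The standard error depends only on the spread of the sampled results and the fraction of subepochs sampled, not on a model of the traffic. Under these assumptions, a scheduler could give a query fewer subepochs when resources are scarce and report a correspondingly wider error estimate.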

Publications

Revisiting Network Telemetry in COIN: A Case for Runtime Programmability
Chris Misa, Ramakrishnan Durairajan, Reza Rejaie, Walter Willinger
IEEE Network, Special Issue on In-Network Computing: Emerging Trends for the Edge-Cloud Continuum, September 2021.
[PAPER]

Dynamic Scheduling of Approximate Telemetry Queries
Chris Misa, Walt O’Connor, Ramakrishnan Durairajan, Reza Rejaie and Walter Willinger
(To appear) In Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI’22)
Renton, WA, April 2022.
[PAPER]

Resources

_

Team Members

_

Acknowledgements

We thank Shahram Davari and Broadcom, Inc. for providing hardware and technical support for our testbed evaluation. This work is supported by the National Science Foundation through CNS 1850297, a Ripple faculty fellowship, and a Ripple graduate fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NSF, Ripple, or Broadcom.

Last updated Oct. 5th, 2021.