MASTER'S THESIS DEFENSE

SIMPLIFYING SELF-ADAPTIVE DISTRIBUTED STREAM PROCESSING IN C++

20/03/2023 - 13h31

STUDENT: Júnior Henrique Löff
ADVISOR: Dr. Luiz Gustavo Leão Fernandes
CO-ADVISOR: Dr. Dalvan Jair Griebler
EXAMINING COMMITTEE: Dr. Lúcia Maria de Assumpção Drummond (IC/UFF), Dr. Fernando Luis Dotti (PPGCC/PUCRS)
DATE: March 30, 2023
LOCATION: Videoconference
TIME: 14:00


ABSTRACT:
Data sources such as IoT sensors, user activity logs, health surveillance, and video streaming are becoming ubiquitous worldwide. These sources often produce large amounts of raw data, which traditional computing systems based on a store-first, compute-later batch paradigm struggle to handle. Stream processing is an effective solution that can manage these massive workloads while meeting low-latency and high-throughput requirements. However, developing a streaming system from scratch is a challenging endeavor. Distributed stream processing systems (DSPSs) such as Apache Flink and Apache Storm already provide many abstractions for transparent fault tolerance, scheduling, communication protocols, and other mechanisms that help programmers write distributed parallel code. These tools are mostly written in higher-level programming languages such as Java and Scala. Nevertheless, C/C++ distributed computing systems are preferred for high-performance computing (HPC), yet programmers in this domain have few high-level programming abstractions available. Consequently, C++ programmers usually rely on low-level MPI to coordinate distributed applications. In addition, when using MPI, programmers often adopt a static programming model to write their distributed applications, in contrast to stream processing, which must dynamically handle irregular workloads that vary in content, format, size, and input rate. Streaming systems should support reconfiguration so that they can self-adapt in response to data-flow spikes, slowdowns, and load-balancing issues. This work addresses these challenges by investigating the adaptability aspects of distributed streaming systems. To that end, we introduce a new C++ framework called MPR (Message Passing Runtime), which simplifies the implementation of distributed stream processing applications. The framework relies on MPI's message-passing communication and implements many programming abstractions, including data transfer, serialization, load balancing, and back pressure. Moreover, we design a novel runtime system that supports MPR's adaptability capabilities.
The runtime system implements algorithms to handle dynamic process creation and includes a consensus protocol for distributed process coordination. The experimental analysis reveals that MPR's dynamic runtime system can achieve performance comparable to a static MPI implementation. We also conduct experiments to evaluate and characterize MPR's adaptability capabilities. These characterization experiments show that MPR can readily reconfigure itself in response to workload variations. As a result of this work, MPR's runtime system on top of MPI is now a valuable tool for testing and evaluating other self-adaptive algorithms for distributed stream processing.


