21/07/2022 - 08h57

THESIS PROPOSAL DEFENSE – Programa de Pós-Graduação em Ciência da Computação (Graduate Program in Computer Science)


STUDENT: Eliã Rafael de Lima Batista

ADVISOR: Dr. Fernando Luís Dotti

CO-ADVISOR: Dr. Fernando Pedone (CSI/USI)

EXAMINING COMMITTEE: Dr. Alysson Neves Bessani (DI/ULisboa), Dr. Luiz Gustavo Leão Fernandes (PPGCC/PUCRS)

DATE: August 5, 2022

LOCATION: Videoconference

TIME: 09:00
Link to access the videoconference:

State machine replication (SMR) is a widely used approach to providing fault tolerance in distributed systems. In SMR, replicas execute requests deterministically and in the same order to ensure state consistency across the system. With respect to system performance, there are three main aspects to consider: request execution, communication across distributed replicas, and state management. In this proposal, we aim to study enhancements to each of these three aspects. In the execution phase, we study techniques that allow concurrent execution of requests in SMR while preserving determinism. One of these techniques, called early scheduling, trades scheduling freedom for simplicity, which expedites scheduling decisions. However, we identified some weaknesses in early scheduling and propose improvements to it, namely the use of busy-wait synchronization and work-stealing mechanisms. We fully implement our proposed improvements and present the results, comparing them to more classic approaches. Regarding the communication layer, we aim to study enhancements to a protocol called ByzCast [15]. This protocol provides a communication abstraction that uses a tree as an overlay to define the communication pattern among nodes in the system. We propose a different approach that replaces ByzCast's tree with a fully connected directed acyclic graph (DAG). We argue that a tree requires intermediary non-destination nodes to propagate a message to lower nodes. With a fully connected DAG, on the other hand, messages could be propagated directly to all other destinations while respecting the properties of a genuine algorithm. Regarding state management, we propose to study enhancements to the state management module of a popular SMR framework by developing an optimized data structure.
Such a data structure should be maintained by the replication library rather than by the application layer, and should be capable of storing data in separate partitions that can be independently transferred and validated. It would enhance state management and recovery protocols in SMR, providing fast and secure state transfer by enabling nonfaulty replicas to send state partitions in parallel, and fast state validation by enabling the recovering replica to validate each partition separately during recovery.
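
As a rough illustration of the partitioned-state idea described above, the following Python sketch splits a key-value state into partitions, serializes each one, and lets a recovering replica validate every partition independently against an expected digest. It is only a sketch under stated assumptions: the function names (`partition_of`, `split_state`, `serialize`, `digest_of`, `validate_partitions`), the use of SHA-256, the JSON encoding, and the key-value state model are all hypothetical choices made for illustration, not the data structure proposed in the thesis nor the API of any SMR framework.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


def partition_of(key: str, num_partitions: int) -> int:
    # Stable assignment of a key to a partition. Python's built-in hash()
    # is salted per process, so a cryptographic digest is used instead.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def split_state(state: dict, num_partitions: int) -> list:
    # Distribute the entries of the state over independent partitions.
    partitions = [{} for _ in range(num_partitions)]
    for key, value in state.items():
        partitions[partition_of(key, num_partitions)][key] = value
    return partitions


def serialize(partition: dict) -> bytes:
    # Canonical encoding, so that equal partitions yield equal bytes.
    return json.dumps(partition, sort_keys=True).encode("utf-8")


def digest_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def validate_partitions(received: list, expected_digests: list) -> list:
    # Each partition is checked on its own, so the checks can be
    # submitted to a thread pool and a bad partition can be re-requested
    # without discarding the ones that validated.
    with ThreadPoolExecutor() as pool:
        digests = list(pool.map(digest_of, received))
    return [got == want for got, want in zip(digests, expected_digests)]


if __name__ == "__main__":
    state = {"a": 1, "b": 2, "c": 3, "d": 4}
    blobs = [serialize(p) for p in split_state(state, 4)]
    expected = [digest_of(b) for b in blobs]

    tampered = list(blobs)
    tampered[0] = b"corrupt"  # simulate a faulty sender for partition 0
    print(validate_partitions(tampered, expected))
```

In this sketch, a corrupted partition fails its own check while the remaining partitions still validate, which is the property that would let a recovering replica fetch partitions from different nonfaulty replicas in parallel and retry only the ones that fail.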