Asynchronous Federated Retrieval-Augmented Generation

Supervisor: Rajshekar Kolichala

Author: N/A

Abstract

This thesis covers the design and deployment of an asynchronous Federated RAG system: a privacy-focused framework for medical question answering (QA) that is trained on partitioned datasets and optimized for heterogeneous client environments. The work offers practical experience across the end-to-end development pipeline, including data partitioning and embedding, fine-tuning compact language models with triplet and cross-entropy losses, implementing RAG pipelines with vector databases, and applying staleness-aware aggregation strategies tailored to federated deployment. Participants will gain exposure to real-world system integration, applied federated machine learning workflows, and collaborative research in AI privacy and efficiency.
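
To make "staleness-aware aggregation" concrete, the following is a minimal sketch rather than the method this thesis will use. It assumes a polynomial staleness decay of the kind commonly used in asynchronous federated optimization: the server mixes in a client's update with a weight that shrinks the more global rounds have passed since that client last synchronized. The function names (staleness_weight, async_aggregate) and the hyperparameters (alpha, a) are hypothetical and chosen only for illustration.

```python
from typing import List


def staleness_weight(staleness: int, alpha: float = 0.6, a: float = 0.5) -> float:
    """Mixing weight alpha * (1 + staleness)^(-a).

    Assumption: polynomial decay; alpha and a are illustrative values,
    not values taken from the thesis.
    """
    if staleness < 0:
        raise ValueError("staleness must be non-negative")
    return alpha * (1.0 + staleness) ** (-a)


def async_aggregate(
    global_params: List[float],
    client_params: List[float],
    global_round: int,
    client_round: int,
    alpha: float = 0.6,
    a: float = 0.5,
) -> List[float]:
    """Blend one client's parameters into the global model.

    client_round is the global round at which the client last pulled
    the model; the gap to global_round is its staleness.
    """
    if len(global_params) != len(client_params):
        raise ValueError("parameter vectors must have the same length")
    w = staleness_weight(global_round - client_round, alpha, a)
    # Convex combination: stale clients (small w) move the model less.
    return [(1.0 - w) * g + w * c for g, c in zip(global_params, client_params)]


if __name__ == "__main__":
    # A client that is 3 rounds behind contributes with a reduced weight.
    print(async_aggregate([0.0, 2.0], [1.0, 0.0], global_round=5, client_round=2))
```

Because the server applies each update as it arrives, slow clients in a heterogeneous deployment never block a round; the decay term only limits how far an outdated update can pull the global model.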