This document outlines the roadmap for the project.
- Env Setup: Set up the Docker workflow, scripts, and project structure.
- Benchmarking Data: Benchmark data loading and preprocessing, and compare performance against CUDA.
- MNIST: A dataset of handwritten digits (0-9) commonly used for image classification tasks.
- Fashion-MNIST (FMNIST): A dataset of Zalando's article images used for image classification tasks.
- cProfile: Python-level profiling to identify CPU-side bottlenecks.
- PyTorch Profiler: Operator-level profiling of training and inference to identify bottlenecks.
- FedAvg: Baseline Federated Averaging (see the aggregation sketch after this list).
- MNIST: A dataset of handwritten digits (0-9) commonly used for image classification tasks.
- Fashion-MNIST (FMNIST): A dataset of Zalando's article images used for image classification tasks.
- CIFAR-10/100: Datasets containing 10 or 100 classes of 32x32 color images, widely used for image recognition tasks.
- Sparse Update: Federated Averaging with sparse updates (see the compression sketches after this list).
- Quantization: Federated Averaging with quantized updates.
- Intel Gaudi Profiler: Profiling on Intel Gaudi (HPU) hardware to identify bottlenecks.
- Shakespeare Dataset: A character-level dataset built from Shakespeare’s plays, used for next-character prediction tasks.
- Sentiment140: A dataset for sentiment analysis containing 1.6 million tweets labeled as positive, negative, or neutral.
- EvoFed: Evolutionary Federated Learning.
- Reddit Dataset: A dataset of user comments from Reddit structured for federated learning tasks like next-word prediction or topic modeling.
- FA-LoRA: Frozen-A Low-Rank Adaptation.
- MAPA (under review): Model-Agnostic Projection Adaptation.
- Testing: Implementing unit tests and integration tests for the project.
- FedAdagrad: Adagrad-style adaptive server optimization for FL.
- FedYogi: Yogi (an Adam variant) adapted as a federated server optimizer.
- FedAdam: Adam adapted as a federated server optimizer (see the server-step sketch after this list).
- SCAFFOLD: Control variates for correcting client drift in local updates.
- Alpaca: General instruction tuning dataset with 52k samples.
- Alpaca-GPT4: GPT-4-generated instruction-response pairs.
- FinGPT: Financial sentiment dataset with 77k samples.
- LoRA: Low-Rank Adaptation for computational and communication efficiency (see the LoRA sketch after this list).
- MA-LoRA (under review): Model-Agnostic Low-Rank Adaptation.
- CI/CD Pipeline: Implementing a CI/CD pipeline for the project.
- MedAlpaca: Medical instruction dataset with 34k samples.
- Code-Alpaca: Code generation dataset with 20k samples.
- MathInstruct: Mathematical instruction tuning dataset with 225k samples.
- UltraFeedback: Value alignment dataset emphasizing helpfulness.
- HH-RLHF: Harmlessness and helpfulness preference dataset with 161k samples.
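
For reference, the sketch below shows the core FedAvg aggregation step in PyTorch: client models are averaged, weighted by local dataset size. It is a minimal illustration only; the names `client_states` and `client_sizes` are hypothetical and do not refer to this project's API.

```python
import torch


def fedavg_aggregate(client_states, client_sizes):
    """Weighted average of client state_dicts, weighted by local dataset size."""
    total = float(sum(client_sizes))
    avg_state = {}
    for key in client_states[0]:
        # Sum each parameter across clients, weighting every client by its
        # share of the total number of training samples.
        avg_state[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return avg_state
```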
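The sparse-update and quantization items above both compress the weight delta a client sends to the server. Below is a minimal sketch of the two ideas, assuming top-k magnitude sparsification and uniform 8-bit quantization; the exact schemes used in the project may differ, and the function names are illustrative.

```python
import torch


def topk_sparsify(delta: torch.Tensor, k_ratio: float = 0.01):
    """Keep only the largest-magnitude fraction of entries of a weight delta."""
    flat = delta.flatten()
    k = max(1, int(flat.numel() * k_ratio))
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices, delta.shape  # values, positions, original shape


def quantize_uint8(delta: torch.Tensor):
    """Uniform 8-bit quantization: map the tensor's value range onto 0..255."""
    lo, hi = delta.min(), delta.max()
    scale = (hi - lo).clamp_min(1e-12) / 255.0
    q = ((delta - lo) / scale).round().to(torch.uint8)
    return q, lo, scale  # the server dequantizes with q.float() * scale + lo
```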
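FedAdagrad, FedYogi, and FedAdam all follow the adaptive federated optimization recipe of Reddi et al.: the averaged client delta is treated as a pseudo-gradient and passed through a server-side optimizer. The FedAdam variant is sketched below; the function name, hyperparameter names, and defaults are illustrative, not the project's actual API.

```python
import torch


def fedadam_server_step(global_state, avg_delta, m, v,
                        lr=1e-2, beta1=0.9, beta2=0.99, tau=1e-3):
    """One FedAdam server update; m and v are per-parameter moment dicts."""
    new_state, new_m, new_v = {}, {}, {}
    for key, w in global_state.items():
        g = avg_delta[key]  # pseudo-gradient: mean of (client weights - global weights)
        new_m[key] = beta1 * m[key] + (1 - beta1) * g
        new_v[key] = beta2 * v[key] + (1 - beta2) * g * g
        new_state[key] = w + lr * new_m[key] / (new_v[key].sqrt() + tau)
    return new_state, new_m, new_v
```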
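Finally, LoRA (and the FA-LoRA / MA-LoRA variants above) wraps a frozen weight matrix with a trainable low-rank update, so only the small A and B factors are trained and communicated. A minimal sketch of a standard LoRA linear layer follows; in FA-LoRA the A factor would additionally be frozen after initialization. This is illustrative only, not the project's implementation.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""

    def __init__(self, in_features, out_features, rank=8, alpha=16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # starts at zero
        self.scaling = alpha / rank

    def forward(self, x):
        # y = base(x) + scaling * x A^T B^T
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```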
We will continue to update this roadmap as the project progresses. The main objective for 2026 is to maintain the project, add new algorithms, datasets, and compression techniques, and improve existing ones based on recent publications and community feedback.