Roadmap

This document outlines the roadmap for the project.


2024 3Q

General:

  • Env Setup: Set up the Docker workflow, scripts, and project structure.
  • Benchmarking Data: Benchmark data loading and preprocessing, and compare performance against CUDA (see the timing sketch below).
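
A minimal sketch of the data-load benchmark, assuming a torchvision MNIST pipeline; the dataset path, batch size, and normalization constants are illustrative choices, not project settings:

```python
# Hedged sketch: time one pass over the training loader on CPU vs. CUDA.
import time

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def time_epoch(device: str) -> float:
    tfm = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST stats
    ])
    ds = datasets.MNIST("./data", train=True, download=True, transform=tfm)
    loader = DataLoader(ds, batch_size=256, num_workers=2)
    start = time.perf_counter()
    for x, y in loader:
        x, y = x.to(device), y.to(device)  # move each batch to the device
    return time.perf_counter() - start

print("cpu :", time_epoch("cpu"))
if torch.cuda.is_available():
    print("cuda:", time_epoch("cuda"))
```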

Federated Learning Algorithms:

Dataset Benchmarks:

  • MNIST: A dataset of handwritten digits (0-9) commonly used for image classification tasks.
  • Fashion-MNIST (FMNIST): A dataset of Zalando's article images used for image classification tasks.

Compression Techniques:

2024 4Q

General:

  • cProfile: Profile the Python-level code to identify per-function bottlenecks.
  • PyTorch Profiler: Profile per-operator execution to identify bottlenecks (see the sketch below).
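
A minimal sketch of both profilers on a stand-in training step; `train_step` below is a hypothetical placeholder for the project's real step:

```python
import cProfile
import pstats

import torch
from torch.profiler import ProfilerActivity, profile

def train_step():
    # Hypothetical stand-in for one real training iteration.
    model = torch.nn.Linear(784, 10)
    loss = model(torch.randn(256, 784)).sum()
    loss.backward()

# cProfile: Python-level, per-function view.
cProfile.run("train_step()", "step.prof")
pstats.Stats("step.prof").sort_stats("cumulative").print_stats(5)

# PyTorch profiler: per-operator view.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    train_step()
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```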

Federated Learning Algorithms:

  • FedAvg: Baseline Federated Averaging (see the aggregation sketch below).
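
A minimal sketch of the FedAvg aggregation step, assuming clients return full `state_dict`s along with per-client sample counts; it illustrates the weighted average only, not the project's training loop:

```python
import torch

def fedavg(client_states, client_sizes):
    """Weighted average of client state_dicts by local dataset size."""
    total = sum(client_sizes)
    avg = {k: torch.zeros_like(v, dtype=torch.float32)
           for k, v in client_states[0].items()}
    for state, n in zip(client_states, client_sizes):
        for k, v in state.items():
            avg[k] += v.float() * (n / total)
    return avg
```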

Dataset Benchmarks:

  • MNIST: A dataset of handwritten digits (0-9) commonly used for image classification tasks.
  • Fashion-MNIST (FMNIST): A dataset of Zalando's article images used for image classification tasks.
  • CIFAR-10/100: Datasets containing 10 or 100 classes of 32x32 color images, widely used for image recognition tasks (see the partitioning sketch below).
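
Federated benchmarks on these datasets usually need a non-IID client split; a common approach (an assumption here, not a stated project choice) is Dirichlet label-skew partitioning:

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Split sample indices across clients with Dirichlet label skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for i, part in enumerate(np.split(idx, cuts)):
            clients[i].extend(part.tolist())
    return clients  # smaller alpha -> more skewed label distributions
```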

Compression Techniques:

2025 1Q

General:

  • Intel Gaudi Profiler: Profile the code on Intel Gaudi accelerators to identify bottlenecks.

Dataset Benchmarks:

  • Shakespeare Dataset: A character-level dataset built from Shakespeare’s plays, used for next-character prediction tasks (see the sketch below).
  • Sentiment140: A sentiment-analysis dataset of 1.6 million tweets labeled as positive or negative.
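
A minimal sketch of the next-character setup used with the Shakespeare data; the text snippet and window length are placeholders:

```python
import torch

text = "To be, or not to be, that is the question"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([stoi[ch] for ch in text])

seq_len = 8
x = ids[:-1].unfold(0, seq_len, 1)  # sliding input windows
y = ids[seq_len:]                   # the character following each window
print(x.shape, y.shape)             # (num_windows, seq_len), (num_windows,)
```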

Compression Techniques:

  • EvoFed: Evolutionary Federated Learning (see the sketch below).
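
EvoFed's core idea, very roughly, is to transmit fitness scores against a seed-shared random population instead of the update itself. The sketch below is a loose evolution-strategies illustration of that encode/decode loop, not the published algorithm; the scoring rule and shapes are assumptions:

```python
import torch

dim, pop = 10_000, 64
gen = torch.Generator().manual_seed(42)   # seed shared by all parties
population = torch.randn(pop, dim, generator=gen)

update = torch.randn(dim)                 # stand-in for a client's update
fitness = population @ update / dim       # transmit pop floats, not dim

# Server side: regenerate the same population, recombine by fitness.
approx = fitness @ population
print(f"sent {fitness.numel()} values instead of {update.numel()}")
```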

2025 2Q

General:

Federated Learning Algorithms:

  • FedProx: Federated Proximal, adding a proximal term to address client heterogeneity (see the sketch below).
  • FedAvgM: Federated Averaging with server-side Momentum.
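
A minimal sketch of the FedProx modification: each client's local loss gains a proximal penalty pulling its weights toward the global model; `mu` and the helper name are illustrative:

```python
import torch

def proximal_loss(task_loss, model, global_params, mu=0.01):
    """FedProx local objective: task loss + (mu/2) * ||w - w_global||^2."""
    prox = sum((p - g).pow(2).sum()
               for p, g in zip(model.parameters(), global_params))
    return task_loss + 0.5 * mu * prox
```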

Dataset Benchmarks:

  • Reddit Dataset: A dataset of user comments from Reddit structured for federated learning tasks like next-word prediction or topic modeling.

Compression Techniques:

2025 3Q

General:

  • Testing: Implementing unit tests and integration tests for the project (see the example test below).
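
A minimal pytest-style example, assuming the FedAvg helper from the 2024 4Q sketch lives in a hypothetical `aggregation` module; identical client states must average back to themselves regardless of client weights:

```python
import torch

from aggregation import fedavg  # hypothetical module holding the earlier sketch

def test_fedavg_identity():
    state = {"w": torch.ones(3)}
    out = fedavg([state, state], client_sizes=[10, 30])
    assert torch.allclose(out["w"], torch.ones(3))
```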

Federated Learning Algorithms:

  • FedAdagrad: Adaptive Gradient-based FL optimization (see the server-optimizer sketch below).
  • FedYogi: Variant of Adam optimized for FL.
  • FedAdam: Adam optimizer adapted for federated setups.
  • SCAFFOLD: Control variates for correcting local updates.
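
A minimal sketch of the server-optimizer pattern behind FedAdagrad, FedYogi, and FedAdam: the averaged client delta is treated as a pseudo-gradient for an adaptive server step (FedAdam shown; hyperparameters are illustrative):

```python
import torch

def fedadam_step(w, delta, m, v, lr=0.1, b1=0.9, b2=0.99, tau=1e-3):
    """One server update; `delta` is the weighted average of client deltas."""
    m = b1 * m + (1 - b1) * delta
    v = b2 * v + (1 - b2) * delta.pow(2)  # Adagrad/Yogi swap this line
    return w + lr * m / (v.sqrt() + tau), m, v
```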

Dataset Benchmarks:

  • Alpaca: General instruction tuning dataset with 52k samples.
  • Alpaca-GPT4: GPT-4-generated instruction-response pairs.
  • FinGPT: Financial sentiment dataset with 77k samples.

Compression Techniques:

  • LoRA: Low-Rank Adaptation for computational and communication efficiency (see the sketch below).
  • MA-LoRA (under review): Model-Agnostic Low-Rank Adaptation.
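
A minimal sketch of a LoRA linear layer: the frozen base weight is augmented by a trainable low-rank product, so only about r*(in+out) parameters are trained and exchanged; the rank and scaling below are illustrative defaults:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_f, out_f, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        for p in self.base.parameters():
            p.requires_grad_(False)                   # freeze pretrained base
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))  # zero init: no-op start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```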

2025 4Q

General:

  • CI/CD: Implementing a continuous integration and deployment pipeline for the project.

Federated Learning Algorithms:

Dataset Benchmarks:

  • MedAlpaca: Medical instruction dataset with 34k samples.
  • Code-Alpaca: Code generation dataset with 20k samples.
  • MathInstruct: Mathematical instruction tuning dataset with 225k samples.
  • UltraFeedback: Value alignment dataset emphasizing helpfulness.
  • HH-RLHF: Harmlessness and helpfulness preference dataset with 161k samples.

Compression Techniques:

2026

We will continue to update this roadmap as the project progresses. The main objective for 2026 is maintenance: adding new algorithms, datasets, and compression techniques, and improving existing ones based on recent publications and community feedback.