Aggregation Strategies Guide

Comprehensive guide to model aggregation strategies in federated learning.

What is Aggregation?

Aggregation combines model updates from multiple clients into a single global model:

graph TB
    A["Client 1 Update: w₁"] --> B["Aggregator"]
    C["Client 2 Update: w₂"] --> B
    D["Client 3 Update: w₃"] --> B
    B --> E["Global Model: w_global = f(w₁, w₂, w₃)"]

FedAvg (Federated Averaging)

Standard averaging aggregation algorithm.

Formula

w_global = Σ(n_i / N) × w_i

where:
- n_i = number of samples on client i
- N = total samples
- w_i = model weights from client i

Configuration

aggregation_strategy: "FedAvg"
fed_avg: true

Characteristics

Simple and effective
Works well with IID data
Fast convergence (IID case)
Struggles with non-IID data
Can diverge under heterogeneity

Usage Example

from src.core.aggregator.fed_avg_aggregator_base import FedAvgAggregator

aggregator = FedAvgAggregator()

# Aggregate client updates
updates = [client1_update, client2_update, client3_update]
weights = [0.3, 0.3, 0.4]  # Based on data sizes

aggregated = aggregator.aggregate(updates, weights)

When to Use

IID or nearly-IID data
Balanced client data
Baseline experiments
Well-behaved convergence expected

Performance

# Example configuration
aggregation_strategy: "FedAvg"
data_distribution_kind: "20"      # Low non-IID
number_of_clients: 10
federated_learning_rounds: 50

Expected Results:

Good convergence with IID data
Accuracy: 90-95% (MNIST)
Convergence rounds: 50-100

FedProx (Federated Proximal)

Addresses non-IID challenges with proximal term.

Formula

minimize_w Σ F_i(w) + μ/2 × ||w - w_global||²

where:
- F_i(w) = local loss on client i
- μ = proximal coefficient
- w_global = global model

Intuition

Adds regularization term to keep local models close to global model:

graph TB
    A["Local Objective<br/>(fit local data)"] --> C["FedProx Objective"]
    B["Proximal Regularization<br/>(stay close to global)"] --> C

Configuration

aggregation_strategy: "FedProx"
fed_avg: false

# Optional: Proximal coefficient (default varies)
# fedprox_mu: 0.01

Characteristics

Handles non-IID data better
Stable with heterogeneous data
Converges with heterogeneity
Additional hyperparameter (μ)
Slightly slower than FedAvg

Usage Example

from src.core.aggregator.fed_prox_aggregator_base import FedProxBase

aggregator = FedProxBase(mu=0.01)

# Aggregate with proximal term
aggregated = aggregator.aggregate(
    updates=updates,
    weights=weights,
    global_model=w_global
)

When to Use

Highly non-IID data
Unbalanced client data
Heterogeneous client environments
Need stable convergence

Performance

# Example configuration for non-IID
aggregation_strategy: "FedProx"
data_distribution_kind: "90"      # High non-IID
number_of_clients: 20
federated_learning_rounds: 100

Expected Results:

Better convergence with non-IID
Accuracy: 85-90% (non-IID CIFAR-10)
More stable training

Comparison

Convergence Behavior

graph LR
    A["Rounds"] --> |"FedAvg"| B["Peak Accuracy"]
    B --> |"Divergence"| C["Unstable"]
    A --> |"FedProx"| D["Steady Improvement"]
    D --> |"Convergence"| E["Stable"]

Feature Comparison

Feature	FedAvg	FedProx
IID Data	Excellent	Excellent
Non-IID Data	Poor	Good
Stability	Can diverge	Stable
Hyperparameters	0	1 (μ)
Computation	Fast	Slightly Slower
Implementation	Simple	Complex

Accuracy vs Non-IID Level

Non-IID Level	FedAvg	FedProx
IID	100%	100%
20%	92%	95%
50%	85%	90%
90%	75%	88%

Configuration Examples

Example 1: IID Data with FedAvg

aggregation_strategy: "FedAvg"
fed_avg: true

data_distribution_kind: "iid"
number_of_clients: 10
federated_learning_rounds: 50

model_type: "resnet18"
dataset_type: "cifar10"

Example 2: Non-IID Data with FedProx

aggregation_strategy: "FedProx"
fed_avg: false

data_distribution_kind: "90"
number_of_clients: 20
federated_learning_rounds: 100

model_type: "resnet18"
dataset_type: "cifar10"

Example 3: Highly Heterogeneous

aggregation_strategy: "FedProx"
fed_avg: false

data_distribution_kind: "90"
dirichlet_beta: 0.01        # Very non-IID
number_of_clients: 50
federated_learning_rounds: 200

model_type: "resnet50"
dataset_type: "cifar100"

Best Practices

Choosing Strategy

graph TD
    A["Check Data Distribution"] --> B{IID or Skewed?}
    B -->|IID/Balanced| C["Use FedAvg"]
    B -->|Non-IID/Skewed| D["Use FedProx"]
    C --> E["Monitor Convergence"]
    D --> E
    E --> F{Issues?}
    F -->|Diverging| G["Switch to FedProx"]
    F -->|Oscillating| H["Reduce Learning Rate"]

Resources

Configuration Guide: Full config reference
Basic Examples: Practical examples
Models & Datasets: Available models

Next: Explore Basic Examples to get more familiar with the concepts of federated learning.