Complete guide to configuring FedPilot for your federated learning experiments. Learn how to customize every aspect of your training pipeline through YAML configuration files.
Configuration File Structure
A FedPilot configuration is a YAML file that defines all aspects of your federated learning experiment:
```yaml
# ============================================================
# 0. EXPERIMENT IDENTITY & RUNTIME
# ============================================================
federation_id: "0.0.32"          # Unique identifier for this experiment
production_mode: false           # If true, reduce extra logging and checks

# ============================================================
# 1. DEVICE & RESOURCE CONFIGURATION
# ============================================================
device: "cuda"                   # "cpu" or "cuda"
gpu_index: 0                     # Single GPU: 0, 1, 2, etc. Multi-GPU: "0:3" (GPUs 0, 1, 2)
random_seed: 42                  # Reproducibility
placement_group_strategy: "SPREAD"  # Ray placement: SPREAD/PACK/STRICT_PACK/STRICT_SPREAD

# ============================================================
# 2. MODEL & FRAMEWORK CONFIGURATION
# ============================================================
runtime_engine: "torch"          # "torch", "tensorflow", "onnx"
model_type: "cnn"                # See Model Reference
transformer_model_size: "base"   # For BERT/ViT: "base", "large"
pretrained_models: false         # Use pre-trained weights
dataset_type: "fmnist"           # See Dataset Reference

# ============================================================
# 3. TRAINING CONFIGURATION
# ============================================================
learning_rate: 0.001             # Learning rate
optimizer: "sgd"                 # "sgd", "adam", "rmsprop"
loss_function: "CrossEntropy"    # Loss function type
weight_decay: null               # L2 regularisation (null = use framework default)

# Training loop parameters
number_of_epochs: 1              # Local epochs per round
train_batch_size: 64             # Local training batch size
test_batch_size: 128             # Evaluation batch size
transform_input_size: 28         # Input image size for transforms

# ============================================================
# 4. DATA DISTRIBUTION & SAMPLING
# ============================================================
# Data distribution
data_distribution_kind: "20"     # Options: "iid", "20", "50", "90", "dir"
dirichlet_beta: 0.1              # Beta for Dirichlet distribution (for "dir")
desired_distribution: null       # null = automatic, or provide custom distribution

# Sampling and federation size
number_of_clients: 10            # Total number of clients
client_sampling_rate: 1.0        # Fraction of clients sampled per round

# ============================================================
# 5. FEDERATED LEARNING PARAMETERS & TOPOLOGY
# ============================================================
# Federation schema and topology
federated_learning_schema: "DecentralizedFederatedLearning"  # or "CentralizedFederatedLearning"
federated_learning_topology: "ring"  # e.g. "star", "ring", "k_connected", "custom"
k_value: 2                       # For "k_connected" topology: number of neighbours

# Optional adjacency matrix for custom graphs
adjacency_matrix_file_name: "adjacency_matrix_2.csv"  # CSV for custom graph (or null)
draw_topology: false             # If true, draw and save a visualisation of the graph
client_role: "train"             # Role of this client in the federation (e.g. "train")

# Core FL parameters
federated_learning_rounds: 8     # Total number of FL rounds
aggregation_strategy: "FedAvg"   # "FedAvg", "FedProx", etc.
fed_avg: true                    # Enable FedAvg aggregation (legacy flag)
aggregation_sample_scaling: false  # Scale updates by client sample size (advanced)

# Clustering
do_cluster: true                 # Enable clustering across clients
clustering_period: 6             # How often to re-cluster (in rounds)
pre_computed_data_driven_clustering: false  # Use external clustering assignments

# Early stopping
stop_avg_accuracy: 0.99          # Stop when average accuracy reaches this threshold

# Model checkpointing
save_before_aggregation_models: false  # Save local client models before aggregation
save_global_models: false        # Save global model after each round

# ============================================================
# 6. DISTANCE & SIMILARITY METRICS
# ============================================================
distance_metric: "cosine"        # "cosine", "euclidean", "coordinate"
distance_metric_on_parameters: true  # true = distance on parameters, false = alternative space
dynamic_sensitivity_percentage: false  # If true, adapt sensitivity over time
sensitivity_percentage: 100      # Percentage of "most important" parameters or chunks
remove_common_ids: false         # Remove coordinates common to many clients (advanced)

# Neighbour-based settings (for k-NN style algorithms)
client_k_neighbors: null         # If set, limit number of neighbours per client

# ============================================================
# 7. MODEL COMPRESSION & OPTIMISATION
# ============================================================
# Chunking (model compression)
chunking: false                  # Enable model segmentation into chunks
chunking_with_gradients: false   # Must be true with chunking=true for importance analysis
chunking_parts: 100              # Number of chunks to split model into
chunking_random_section: false   # true = random selection, false = importance-based selection

# Pruning
do_pruning: false                # Enable model pruning
pruning_threshold: 0.1           # Pruning sensitivity threshold

# ============================================================
# 8. DIFFERENTIAL PRIVACY
# ============================================================
dp_enabled: false                # Enable DP-SGD
dp_epsilon: 1.0                  # Privacy budget (lower = more privacy)
dp_delta: 1e-5                   # Failure probability
dp_clipping_norm: 1.0            # Gradient clipping threshold
dp_noise_multiplier: 0.1         # Noise scale for DP

# ============================================================
# 9. LOGGING, METRICS & OUTPUT
# ============================================================
mean_accuracy_to_csv: true       # Export mean accuracy per round to CSV
ray_dashboard: true              # Start or expose Ray dashboard
ray_dashboard_port: 8266         # Port for Ray dashboard

# Fine-grained metrics control
metrics:
  round: true                    # Per-round summary metrics
  memory: true                   # Memory usage metrics
  performance: true              # Timing/throughput metrics
  communication: true            # Communication volume/latency metrics
  system: true                   # System-level metrics (CPU/GPU etc.)
  convergence: true              # Convergence metrics across rounds
  throughput: true               # Global throughput metrics
  availability: true             # Availability / liveness of clients

# ============================================================
# 10. ADVANCED ANALYSIS & FAIRNESS
# ============================================================
shapley: false                   # Enable Shapley value computation for clients
shapley_type: "value"            # Type of Shapley metric (e.g. "value")
use_global_accuracy_for_noniid: true  # Use a global accuracy metric under non-IID settings

# ============================================================
# 11. NOTES
# ============================================================
# The fields near the end (aggregation_sample_scaling, client_k_neighbors,
# shapley, shapley_type, use_global_accuracy_for_noniid, etc.) are often
# added automatically with framework defaults when you run the config
# validator or auto-fill tools.
```
Configuration Examples
Note that these examples are simplified and omit some required values, so you cannot simply copy-paste them into your config.yaml and start training. After copying an example, run make fill-config to populate the missing values, or use make config to select a base configuration and then modify it accordingly.
Example 1: Quick Testing Setup
```yaml
# Fast training for quick testing
device: "cuda"
gpu_index: 0
random_seed: 42
model_type: "cnn"
dataset_type: "mnist"
learning_rate: 0.01
optimizer: "sgd"
number_of_clients: 5
number_of_epochs: 1
train_batch_size: 128
federated_learning_rounds: 10
clustering_period: 2
data_distribution_kind: "iid"
aggregation_strategy: "FedAvg"
mean_accuracy_to_csv: true
```
Example 2: Realistic Non-IID Scenario
```yaml
# Realistic non-IID data with chunking compression
device: "cuda"
gpu_index: 0
learning_rate: 0.001
model_type: "resnet50"
dataset_type: "cifar100"
number_of_clients: 20
federated_learning_rounds: 50
aggregation_strategy: "FedAvg"

# Non-IID data distribution
data_distribution_kind: "dir"    # Dirichlet label skew
dirichlet_beta: 0.1              # Lower beta = more heterogeneous

# Model compression
chunking: true
chunking_with_gradients: true
chunking_parts: 50

# Communication optimization
sensitivity_percentage: 80       # Send only top 80%
dynamic_sensitivity_percentage: true

mean_accuracy_to_csv: true
```
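To build intuition for what `data_distribution_kind: "dir"` with a small `dirichlet_beta` does to each client's data, the sketch below partitions a toy label array with NumPy. It illustrates Dirichlet label skew in general, not FedPilot's internal partitioner.

```python
import numpy as np

def dirichlet_label_partition(labels, n_clients, beta, seed=42):
    """Split sample indices across clients with Dirichlet label skew.

    For each class, a Dirichlet(beta) draw decides what fraction of that
    class each client receives. Small beta => each class concentrates on
    a few clients (highly non-IID); large beta => roughly uniform.
    Illustration only -- not FedPilot's partitioning code.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        proportions = rng.dirichlet([beta] * n_clients)
        cuts = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client, shard in enumerate(np.split(cls_idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices

# Toy dataset: 1000 samples over 10 classes, split across 20 clients
labels = np.random.default_rng(0).integers(0, 10, size=1000)
shards = dirichlet_label_partition(labels, n_clients=20, beta=0.1)
```

With `beta=0.1` most clients end up holding only a handful of classes, which is why the convergence-related options (`use_global_accuracy_for_noniid`, clustering) matter in this scenario.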
Example 5: Decentralized Federated Learning

```yaml
# Peer-to-peer training over a ring topology
federated_learning_schema: "DecentralizedFederatedLearning"
federated_learning_topology: "ring"
k_value: 2                       # Neighbours per client (for "k_connected")
adjacency_matrix_file_name: null # Or a CSV file for a custom graph
number_of_clients: 10
federated_learning_rounds: 8
aggregation_strategy: "FedAvg"
```

Tip 4: Privacy vs. Utility Trade-off

```yaml
# Less privacy (better utility)
dp_enabled: true
dp_epsilon: 10.0                 # Higher epsilon = more privacy lost

# More privacy (worse utility)
dp_epsilon: 1.0                  # Lower epsilon = more private
```
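The `dp_clipping_norm` and `dp_noise_multiplier` knobs correspond to the two mechanical steps of DP-SGD: clip each per-sample gradient, then add Gaussian noise to the average. The sketch below shows those two steps with NumPy; it is an illustration of what the knobs control, not FedPilot's actual DP engine.

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clipping_norm=1.0, noise_multiplier=0.1, rng=None):
    """One DP-SGD aggregation step.

    Clip each per-sample gradient to `clipping_norm`, average the clipped
    gradients, then add Gaussian noise with standard deviation
    noise_multiplier * clipping_norm / batch_size.
    Illustration of dp_clipping_norm / dp_noise_multiplier only.
    """
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clipping_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0,
                       noise_multiplier * clipping_norm / len(per_sample_grads),
                       size=mean_grad.shape)
    return mean_grad + noise

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
noisy = dp_sgd_step(grads, clipping_norm=1.0, noise_multiplier=0.1)
# The first gradient is scaled down to norm 1.0; the second passes through.
```

More noise (a larger `dp_noise_multiplier`) buys a smaller epsilon for the same number of rounds, at the cost of noisier updates and slower convergence.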
Tip 5: Communication Efficiency
```yaml
# Reduce communication overhead
chunking: true
chunking_with_gradients: true
sensitivity_percentage: 80       # Send only the most important 80%
dynamic_sensitivity_percentage: true
```
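Conceptually, chunking with importance-based selection splits the flat parameter vector into `chunking_parts` pieces, ranks them by gradient magnitude, and transmits only the top `sensitivity_percentage` percent. The sketch below shows that selection logic with NumPy; the function and its ranking rule are illustrative assumptions, not FedPilot's implementation.

```python
import numpy as np

def select_important_chunks(params, grads, n_chunks=100, sensitivity_pct=80):
    """Return indices of the chunks a client would transmit.

    Splits a flat parameter vector into `n_chunks` pieces and keeps the
    chunks whose mean |gradient| is in the top `sensitivity_pct` percent.
    Illustration of chunking + sensitivity_percentage only.
    """
    grad_chunks = np.array_split(np.abs(grads), n_chunks)
    importance = np.array([chunk.mean() for chunk in grad_chunks])
    n_keep = max(1, int(round(n_chunks * sensitivity_pct / 100)))
    keep = np.argsort(importance)[::-1][:n_keep]    # most important first
    return sorted(keep.tolist())

params = np.random.default_rng(1).normal(size=10_000)
grads = np.random.default_rng(2).normal(size=10_000)
sent = select_important_chunks(params, grads, n_chunks=100, sensitivity_pct=80)
# 80 of 100 chunks are transmitted; the other 20 are skipped this round.
```

This is why `chunking_with_gradients: true` is required alongside `chunking: true`: without gradient information there is nothing to rank chunks by, and selection falls back to random (`chunking_random_section: true`).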
Configuration Workflow
Step 1: Choose Base Configuration
```shell
# Browse available templates
make config

# Select one from templates/
# e.g., templates/lenet/label-20/encryption-free/fl.yaml
```
Step 2: Customize for Your Needs
```yaml
# Modify parameters in config.yaml
number_of_clients: 20            # More clients
federated_learning_rounds: 100   # More rounds
learning_rate: 0.005             # Adjust learning rate
```
Step 3: Validate Configuration
```shell
make validate-config
```
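Validation typically means checking for required keys and for inconsistent option combinations. The sketch below shows the kind of checks involved; the required-key list and rules are illustrative assumptions (only the chunking rule is stated by the template itself), so treat `make validate-config` as the authority.

```python
def validate_config(config: dict) -> list:
    """Return a list of human-readable problems found in a config dict.

    Hypothetical checks for illustration; FedPilot's validator is the
    authoritative source of rules.
    """
    problems = []
    required = ["device", "model_type", "dataset_type",
                "number_of_clients", "federated_learning_rounds"]
    for key in required:
        if key not in config:
            problems.append(f"missing required key: {key}")
    # Cross-field rule from the template: chunking needs gradient info
    if config.get("chunking") and not config.get("chunking_with_gradients"):
        problems.append("chunking requires chunking_with_gradients: true")
    rate = config.get("client_sampling_rate", 1.0)
    if not 0.0 < rate <= 1.0:
        problems.append("client_sampling_rate must be in (0, 1]")
    return problems

cfg = {"device": "cuda", "model_type": "cnn", "dataset_type": "mnist",
       "number_of_clients": 5, "federated_learning_rounds": 10,
       "chunking": True}
issues = validate_config(cfg)   # flags the missing chunking_with_gradients
```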
Step 4: Run Training
```shell
make run
# or
python main.py
```
Step 5: Monitor & Analyze
```shell
make logs    # View training logs
```
Advanced Configurations
Multi-GPU Training
```yaml
device: "cuda"
gpu_index: "0:4"                 # Use GPUs 0, 1, 2, 3
placement_group_strategy: "SPREAD"  # Distribute tasks across GPUs
number_of_clients: 8
train_batch_size: 128            # Larger batch with multiple GPUs
```
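The `"start:stop"` form of `gpu_index` uses an exclusive upper bound (so `"0:4"` means GPUs 0, 1, 2, 3, matching the template's `"0:3"` = GPUs 0, 1, 2). A small hypothetical helper makes the convention explicit; FedPilot's real parsing may differ.

```python
def parse_gpu_index(value):
    """Expand a gpu_index value into a list of GPU ids.

    Accepts a single int (e.g. 0) or a "start:stop" string with an
    exclusive upper bound, so "0:4" -> [0, 1, 2, 3]. Hypothetical
    helper for illustration only.
    """
    if isinstance(value, int):
        return [value]
    start, stop = (int(part) for part in value.split(":"))
    return list(range(start, stop))

assert parse_gpu_index(0) == [0]
assert parse_gpu_index("0:4") == [0, 1, 2, 3]
```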
Research Experiment
```yaml
# Multiple variations for hyperparameter search
learning_rate: 0.001
aggregation_strategy: "FedAvg"

# Data heterogeneity studies
data_distribution_kind: "90"
dirichlet_beta: 0.01

# DP privacy studies
dp_enabled: true
dp_epsilon: 1.0
```
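For hyperparameter searches like this, one practical pattern is to generate one config file per grid point from a shared base. The sketch below does this with the standard library only, emitting flat `key: value` lines (enough for scalar options); the output directory, file naming scheme, and grid are illustrative assumptions, not a FedPilot feature.

```python
import itertools
import pathlib

def write_sweep_configs(base: dict, grid: dict, out_dir: str) -> list:
    """Write one flat YAML config per point in the Cartesian grid.

    Hypothetical sweep helper: emits plain `key: value` lines and names
    files sweep_000.yaml, sweep_001.yaml, ... for illustration.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    keys = list(grid)
    for i, combo in enumerate(itertools.product(*grid.values())):
        cfg = {**base, **dict(zip(keys, combo))}
        lines = []
        for k, v in cfg.items():
            if isinstance(v, bool):
                lines.append(f"{k}: {str(v).lower()}")   # YAML booleans
            elif isinstance(v, str):
                lines.append(f'{k}: "{v}"')              # quoted strings
            else:
                lines.append(f"{k}: {v}")                # numbers
        path = out / f"sweep_{i:03d}.yaml"
        path.write_text("\n".join(lines) + "\n")
        written.append(str(path))
    return written

base = {"model_type": "cnn", "dataset_type": "fmnist",
        "aggregation_strategy": "FedAvg", "federated_learning_rounds": 50}
grid = {"learning_rate": [0.001, 0.005],
        "data_distribution_kind": ["iid", "90"]}
paths = write_sweep_configs(base, grid, "sweep_configs")
# 2 x 2 = 4 config files
```

Remember to run make fill-config on each generated file, since these variants carry only the swept and base keys.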