CS | Federated Learning (No.1)
1-0 Quick Break
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized.
The term federated learning was introduced in 2016 by McMahan et al.: “We term our approach Federated Learning, since the learning task is solved by a loose federation of participating devices (which we refer to as clients) which are coordinated by a central server.” The defining set of challenges was introduced as an unbalanced and non-IID (not independent and identically distributed) partitioning of data across a massive number of unreliable devices with limited communication bandwidth.
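To make "unbalanced and non-IID" concrete, here is a minimal simulation sketch. It assumes a labeled proxy dataset and uses a label-sorted shard split with random cut points; the function name `partition_non_iid` and its parameters are hypothetical and chosen only for illustration, and this is one of several possible ways to produce such a partition.

```python
import random


def partition_non_iid(labels, num_clients, shards_per_client=2, seed=0):
    """Split example indices into unbalanced, label-skewed client datasets.

    Hypothetical helper for simulations: indices are sorted by label, cut
    into contiguous shards of unequal size, and each client receives a few
    shards, so it sees only a few labels and client sizes differ.
    """
    num_shards = num_clients * shards_per_client
    if num_shards > len(labels):
        raise ValueError("need at least one example per shard")
    rng = random.Random(seed)
    # Sorting by label makes each contiguous shard cover only a few labels.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    # Distinct random cut points give non-empty shards of unequal size.
    cuts = sorted(rng.sample(range(1, len(order)), num_shards - 1))
    bounds = [0] + cuts + [len(order)]
    shards = [order[a:b] for a, b in zip(bounds, bounds[1:])]
    rng.shuffle(shards)
    clients = []
    for c in range(num_clients):
        own = shards[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append([i for shard in own for i in shard])
    return clients


if __name__ == "__main__":
    toy_labels = [i % 10 for i in range(1000)]  # toy data with 10 classes
    parts = partition_non_iid(toy_labels, num_clients=20)
    for c, idxs in enumerate(parts[:5]):
        seen = sorted({toy_labels[i] for i in idxs})
        print(f"client {c}: {len(idxs)} examples, labels {seen}")
```

Every index is assigned to exactly one client, so the union of the client datasets is the original dataset; only the distribution across clients is skewed.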
Privacy is paramount
A key property of many of the problems discussed is that they are inherently interdisciplinary—solving them likely requires not just machine learning, but techniques from distributed optimization, cryptography, security, differential privacy, fairness, compressed sensing, systems, information theory, statistics, and more.
Applying FL to other applications:
Cross-device federated learning and federated data analysis are now being applied in consumer digital products. Google makes extensive use of federated learning in the Gboard mobile keyboard, as well as in features on Pixel phones and in Android Messages. While Google has pioneered cross-device FL, interest in this setting is now much broader, for example: Apple is using cross-device FL in iOS 13, for applications like the QuickType keyboard and the vocal classifier for “Hey Siri”; doc.ai is developing cross-device FL solutions for medical research, and Snips has explored cross-device FL for hotword detection.
Cross-silo applications have also been proposed or described in myriad domains including finance risk prediction for reinsurance, pharmaceuticals discovery, electronic health records mining, medical data segmentation [15, 139], and smart manufacturing.
Broader Definition of Federated Learning
Federated learning is a machine learning setting where multiple entities (clients) collaborate in solving a machine learning problem, under the coordination of a central server or service provider. Each client’s raw data is stored locally and not exchanged or transferred; instead focused updates intended for immediate aggregation are used to achieve the learning objective.
This definition distinguishes federated learning from fully decentralized (peer-to-peer) learning techniques.
Table 1 contrasts both cross-device and cross-silo federated learning with traditional single-datacenter distributed learning across a range of axes.
For the remainder of this paper, we consider the cross-device FL setting unless otherwise noted, though many of the problems apply to other FL settings as well.
1-1 The Cross-Device Federated Learning Setting
This section takes an applied perspective, and unlike the previous section, does not attempt to be definitional.
Rather, the goal is to describe some of the practical issues in cross-device FL and how they might fit into a broader machine learning development and deployment ecosystem.
1-1-1 The Lifecycle of a Model in Federated Learning
The FL process is typically driven by a model engineer developing a model for a particular application.
For example, a domain expert in natural language processing may develop a next word prediction model for use in a virtual keyboard.
At a high level, a typical workflow is:
1. Problem identification
The model engineer identifies a problem to be solved with FL.
2. Client instrumentation
If needed, the clients (e.g. an app running on mobile phones) are instrumented to store locally (with limits on time and quantity) the necessary training data. In many cases, the app will already have stored this data (e.g. a text messaging app must store text messages, a photo management app already stores photos). However, in some cases additional data or metadata might need to be maintained, e.g. user interaction data to provide labels for a supervised learning task.
3. Simulation prototyping (optional)
The model engineer may prototype model architectures and test learning hyperparameters in an FL simulation using a proxy dataset.
4. Federated model training
Multiple federated training tasks are started to train different variations of the model, or to use different optimization hyperparameters.
5. (Federated) model evaluation
After the tasks have trained sufficiently (typically a few days, see below), the models are analyzed and good candidates selected. Analysis may include metrics computed on standard datasets in the datacenter, or federated evaluation wherein the models are pushed to held-out clients for evaluation on local client data.
6. Deployment
Finally, once a good model is selected, it goes through a standard model launch process, including manual quality assurance, live A/B testing (usually by using the new model on some devices and the previous generation model on other devices to compare their in-vivo performance), and a staged rollout (so that poor behavior can be discovered and rolled back before affecting too many users). The specific launch process for a model is set by the owner of the application and is usually independent of how the model is trained. In other words, this step would apply equally to a model trained with federated learning or with a traditional datacenter approach.
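The federated evaluation mentioned in the workflow above can be illustrated with a small sketch. It assumes a classification model exposed as a plain `predict` function and held-out clients represented as lists of `(input, label)` pairs; `federated_evaluate` is a hypothetical name, and a real system would run the per-client part on the devices themselves.

```python
def federated_evaluate(predict, held_out_clients):
    """Accuracy over held-out clients, weighted by local example count.

    Each client reports only (num_correct, num_examples); in this
    simulation the raw examples never leave the per-client loop.
    """
    reports = []
    for examples in held_out_clients:
        # Computed "on device": evaluate the pushed model on local data.
        correct = sum(1 for x, y in examples if predict(x) == y)
        reports.append((correct, len(examples)))
    total = sum(n for _, n in reports)
    if total == 0:
        raise ValueError("no evaluation examples reported")
    # Server side: combine the per-client counts into one metric.
    return sum(c for c, _ in reports) / total


if __name__ == "__main__":
    # Toy model (an assumption for illustration): predicts True for x >= 0.
    toy_clients = [[(1, True), (-1, False)], [(2, False)]]
    print(federated_evaluate(lambda x: x >= 0, toy_clients))
```

Weighting by example count makes the result equal to accuracy over the pooled held-out data; an unweighted mean of per-client accuracies is a different metric that treats every client equally regardless of how much data it holds.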
1-1-2 A Typical Federated Training Process (Step 4 Above)
We now consider a template for FL training that encompasses the Federated Averaging algorithm of McMahan et al. and many others.
A server (service provider) orchestrates the training process, by repeating the following steps until training is stopped (at the discretion of the model engineer who is monitoring the training process):
1. Client selection
The server samples from a set of clients meeting eligibility requirements. For example, mobile phones might only check in to the server if they are plugged in, on an unmetered wi-fi connection, and idle, in order to avoid impacting the user of the device.
2. Broadcast
The selected clients download the current model weights and a training program (e.g. a TensorFlow graph) from the server.
3. Client computation
Each selected device locally computes an update to the model by executing the training program, which might for example run SGD on the local data (as in Federated Averaging).
4. Aggregation
The server collects an aggregate of the device updates. For efficiency, stragglers might be dropped at this point once a sufficient number of devices have reported results. This stage is also the integration point for many other techniques which will be discussed later, possibly including: secure aggregation for added privacy, lossy compression of aggregates for communication efficiency, and noise addition and update clipping for differential privacy.
5. Model update
The server locally updates the shared model based on the aggregated update computed from the clients that participated in the current round.
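The training template above can be sketched as a single-machine simulation. This is a minimal illustration in the spirit of Federated Averaging, not a production implementation: it assumes a toy linear model y = w*x + b, noiseless synthetic client data, and hypothetical names (`local_sgd`, `federated_round`, `simulate`, `sample_size`, `report_goal`). Secure aggregation, compression, and differential privacy are omitted.

```python
import random


def local_sgd(weights, data, lr=0.1, epochs=1):
    """Client computation: SGD on local data for the model y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    # Report only the model update and the local example count.
    delta = [w - weights[0], b - weights[1]]
    return delta, len(data)


def federated_round(weights, clients, rng, sample_size=10, report_goal=8):
    """One round: select, broadcast, compute, aggregate, update."""
    # Client selection: sample from the eligible clients.
    selected = rng.sample(clients, min(sample_size, len(clients)))
    # Broadcast + client computation: each client starts from `weights`.
    reports = [local_sgd(weights, data) for data in selected]
    # Aggregation: keep the first `report_goal` reports and drop the rest
    # as stragglers (arrival order here is just the random sample order).
    reports = reports[:report_goal]
    total = sum(n for _, n in reports)
    avg_delta = [
        sum(delta[k] * n for delta, n in reports) / total
        for k in range(len(weights))
    ]
    # Model update: apply the example-weighted average update.
    return [w_k + d_k for w_k, d_k in zip(weights, avg_delta)]


def simulate(num_clients=50, rounds=50, seed=0):
    """Train on synthetic clients whose data follows y = 2x + 1."""
    rng = random.Random(seed)
    clients = []
    for _ in range(num_clients):
        size = rng.randint(10, 40)  # unbalanced client dataset sizes
        xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        clients.append([(x, 2.0 * x + 1.0) for x in xs])
    weights = [0.0, 0.0]
    for _ in range(rounds):
        weights = federated_round(weights, clients, rng)
    return weights


if __name__ == "__main__":
    w, b = simulate()
    print(f"learned w={w:.3f}, b={b:.3f} (target w=2, b=1)")
```

Sampling more clients than the report goal requires is what leaves room to drop stragglers without stalling a round. Weighting each update by the client's example count is one common choice for the aggregate; other weightings are possible.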
Table 2 gives typical order-of-magnitude sizes for the quantities involved in a typical federated learning application on mobile devices.
1-2 Federated Learning Research
Needless to say, most researchers working on federated learning problems will likely not be deploying production FL systems, nor have access to fleets of millions of real-world devices.
The need for simulation also has ramifications for the presentation of FL research. While not intended to be authoritative or absolute, we make the following modest suggestions for presenting FL research that addresses the open problems we describe:
It is important to precisely describe the details of the particular FL setting of interest, particularly when the proposed approach makes assumptions that may not be appropriate in all settings.
Of course, details of any simulations should be presented in order to make the research reproducible.
Privacy and communication efficiency are always first-order concerns in FL, even if the experiments are simulations running on a single machine using public data.
Section 2 builds on the ideas in Table 1, exploring other FL settings and problems beyond the original focus on cross-device settings.
Section 3 then turns to core questions around improving the efficiency and effectiveness of federated learning.
Section 4 undertakes a careful consideration of threat models and considers a range of technologies toward the goal of achieving rigorous privacy protections.
As with all machine learning systems, in federated learning applications there may be incentives to manipulate the models being trained, and failures of various kinds are inevitable; these challenges are discussed in Section 5.
Finally, we address the important challenges of providing fair and unbiased models in Section 6.