Learning to control networked-coupled subsystems with unknown dynamics

Speaker: Aditya Mahajan

Date: Thu, Oct 28, 2021

Location: Online

Conference: Workshop on Mean Field Games on Networks

Subject: Mathematics

Class: Scientific

Abstract:

Large-scale systems comprising of multiple subsystems connected over a network arise in a number of applications including power systems, traffic networks, communication networks, and some economic systems. A common feature of such systems is that the state evolutions and costs are coupled, i.e., the state evolution and local cost of one subsystems depend not only on its own state and control action, but also on the state and control actions of other subsystems in the network.

We consider the problem of designing control strategies for such systems when some of the parameters of the system model are not known. Due to the unknown parameters, the control problem is also a learning problem. Directly using existing reinforcement learning algorithms on such network coupled subsystems would incur $O(n^{1.5} \sqrt{T})$ regret over a horizon $T$, where the $n$ is the number of subsystems. This superlinear dependence on $n$ is prohibitive in large scale networked systems because the regret per subsystem (which is $O(\sqrt{nT})$ grows with the size of the network.

We consider networks where the dynamics coupling may be represented by a symmetric matrix (e.g., the adjacency or Laplacian matrix corresponding to a undirected weighted graph) and the cost coupling matrix have the same eigenvalues as the dynamics coupling. We use spectral decomposition of the coupling matrices to decompose the system into (L + 1) systems which are only coupled through the noise, where L is the rank of the coupling matrix. We show that, when the system model is known, the optimal control input at each subsystem can be computing by solving (L+1) decoupled Riccati equations.

Using this structure of the planning solution, we propose a new Thompson sampling algorithm and show that its regret is bounded by $O(n\sqrt{T})$, which increases linearly in the size of the network. We present numerical simulations to illustrate our results on mean-field control and general low-rank networks.

Joint work with Shuang Gao, Sagar Sudhakara Ashutosh Nayyar, and Ouyang Yi