**Theory and Algorithms in Data Science**

This seminar provides a forum for researchers working in foundational areas of Data Science (and related fields). All talks are held at the Alan Turing Institute. For a complete list of talks held in previous years, please click here.

The organizers are
Mihai Cucuringu and
Vidit Nanda.
We represent the
Low-dimensional structure as well as the
Topology and geometry research interest groups within the Turing Institute. **If you'd like to speak in this seminar,
please email Vidit at vnanda@turing.ac.uk**

# Upcoming Talks

## 30 Apr 2019: **Haeran Cho**, University of Bristol

## 09 May 2019: **Degui Li**, University of York

## 16 May 2019: **Ioannis Kosmidis**, University of Warwick

## 11 Jun 2019: **Jia Chen**, University of York

# Recent Talks

## 26 Mar 2019: **David Barber**, University College London

** Speaker **: David Barber, University College London

** Title **: Spread Divergences

**Abstract**: For distributions p and q with different support, the divergence D(p||q) may not exist. We define a spread divergence on modified p and q and describe sufficient conditions for the existence of such a divergence. We give examples of using a spread divergence to train implicit generative models, including linear models (Principal Components Analysis and Independent Components Analysis) and non-linear models (Deep Generative Networks). We show how a general privacy preserving machine learning mechanism is a natural application of the spread divergence.

**Room and Time**: Augusta, 14h - 15h

## 12 Mar 2019: **Jack Jewson**, University of Warwick

** Speaker **: Jack Jewson, University of Warwick

** Title **: Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with beta-Divergences

**Abstract**: We present the first robust Bayesian Online Changepoint Detection algorithm through General Bayesian Inference (GBI) with beta-divergences. The resulting inference procedure is doubly robust for both the parameter and the changepoint (CP) posterior, with linear time and constant space complexity. We provide a construction for exponential models and demonstrate it on the Bayesian Linear Regression model. In so doing, we make two additional contributions: Firstly, we make GBI scalable using Structural Variational approximations that are exact as beta goes to 0. Secondly, we give a principled way of choosing the divergence parameter beta by minimizing expected predictive loss on-line. Reducing False Discovery Rates of CPs from over 90% to 0% on real world data, this offers the state of the art.

Joint work with Jeremias Knoblauch and Theo Damoulas

**Room and Time**: Enigma, 13h - 14h

## 07 Mar 2019: **Edgar Dobriban**, University of Pennsylvania

** Speaker **: Edgar Dobriban, University of Pennsylvania

** Title **: How to deal with big data? Understanding large-scale distributed regression

**Abstract**: Modern massive datasets pose an enormous computational burden to practitioners. Distributed computation has emerged as a universal approach to ease the burden: Datasets are partitioned over machines, which compute locally, and communicate short messages. Distributed data also arises due to privacy reasons, such as with medical databases. It is important to study how to do statistical inference and machine learning in a distributed setting. In this talk, we present results about one-step parameter averaging in statistical linear models under data parallelism. We do linear regression on each machine, and take a weighted average of the parameters. How much do we lose compared to doing linear regression on the full data? Here we study the performance loss in estimation error, test error, and confidence interval length in high dimensions, where the number of parameters is comparable to the training data size. We discover several key phenomena. First, averaging is not optimal, and we find the exact performance loss. Second, different problems are affected differently by the distributed framework. Estimation error and confidence interval length increases a lot, while prediction error increases much less. These results match numerical simulations and a data analysis example. To derive these results, we rely on recent results from random matrix theory, where we also develop a new calculus of deterministic equivalents as a tool of broader interest.

**Room and Time**: Augusta, 14h-15h

## 28 Feb 2019: **Seok Young Hong**, University of Nottingham

** Speaker **: Seok Young Hong, University of Nottingham

** Title **: Nonparametric estimation of infinite order regression and its application to finance.

**Abstract**: In this walk, we study nonparametric estimation of the infinite order regression with stationary and weakly dependent data. We propose a Nadaraya-Watson type estimator that operates with an infinite number of conditioning variables. We propose a bandwidth sequence that shrinks the effects of long lags, so the influence of all conditioning information is modelled in a natural and flexible way, and the econometric issues of omitted information bias and specification error are effectively handled. We establish the asymptotic properties of the estimator under a wide range of static and dynamic regressions framework, thereby allowing various kinds of conditioning variables to be used. We establish pointwise/uniform consistency and the CLTs. We show that the convergence rates are at best logarithmic, and depend on the smoothness of the regression, the distribution of the marginal regressors and their dependence structure in a non-trivial way via the Lambert W function. As an application, the proposed theory is applied to investigate an important question in finance. We report some new empirical evidence on the dynamics of risk-return relation over time and its link with the macroeconomy, and also add supporting evidence for explaining some major puzzles in finance.

**Room and Time**: Enigma, 14h-15h

## 21 Feb 2019: **Weining Wang**, City, University of London

** Speaker **: Weining Wang, City, University of London

** Title **: LASSO-Driven Inference in Time and Space

**Abstract**: We consider the estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak dependence. A sequence of large-scale regressions with LASSO is applied to reduce the dimensionality, and an overall penalty level is carefully chosen by a block multiplier bootstrap procedure to account for multiplicity of the equations and dependencies in the data. Correspondingly, oracle properties with a jointly selected tuning parameter are derived. We further provide high-quality de-biased simultaneous inference on the many target parameters of the system. We provide bootstrap consistency results of the test procedure, which are based on a general Bahadur representation for the $Z$-estimators with dependent data. Simulations demonstrate good performance of the proposed inference procedure.
Finally, we apply the method to quantify spillover effects of textual sentiment indices in a financial market and to test the connectedness among sectors.

**Room and Time**: Augusta, 14h - 15h

## 19 Feb 2019: **Tengyao Wang**, University College London

** Speaker **: Tengyao Wang, University College London

** Title **: Isotonic regression in general dimensions

**Abstract**: We study the least squares regression function estimator over the class of real-valued functions on [0,1]^d that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order n^{-min(2/(d+2),1/d)} in the empirical L_2 loss, up to poly-logarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on k hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of (k/n)^{min(1,2/d)}, again up to poly-logarithmic factors. Previous results are confined to the case d \leq 2. Finally, we establish corresponding bounds (which are new even in the case d=2) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to poly-logarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate.

**Room and Time**: Augusta, 11h-12h

## 14 Feb 2019: **Lyudmila Grigoryeva**, Universität Konstanz

** Speaker **: Lyudmila Grigoryeva, Universität Konstanz

** Title **: Reservoir Computing (RC). Part II: Applicability of RC systems with unbounded inputs and their empirical performance.

**Abstract**: We show an extension of the results presented in the previous talk to unbounded inputs which is a particularly natural setup for stochastic signals. We complement the universality results with an empirical assessment of performance in applications to time series forecasting. In particular, we focus on predicting financial realized covariance matrices which is a question of much importance for practitioners and researchers. We examine the empirical performance of RC in comparison with many conventional state-of-the-art econometric models for various realized covariance estimators, periods, and dimensions. We show that universal RC families consistently demonstrate superior predictive ability for various empirical designs. We conclude the presentation outlining the work in progress and future lines of research.

**Room and Time**: Augusta, 14:40h-15:15h

## 14 Feb 2019: **Juan-Pablo Ortega**, Universität St. Gallen + Centre National de Recherche Scientifique

** Speaker **: Juan-Pablo Ortega, U St Gallen + CNRS

** Title **: Reservoir Computing (RC). Part I: RC paradigm and universal approximation properties of RC systems.

**Abstract**: A relatively recent family of dynamic machine learning paradigms known collectively as reservoir computing is presented which is capable of unprecedented performances in the forecasting of deterministic and stochastic processes. We then focus on the universal approximation properties of the most widely used families of reservoir computers in applications. These results are a much awaited generalization to the dynamic
context of previous well-known static results obtained in the context of neural networks. The universal approximation properties with respect to L^\infty and L^p -type criteria of three important families of reservoir computers with stochastic discrete-time semi-infinite inputs are shown. First, it is proved that linear reservoir systems with either polynomial or neural network readout maps are universal. More importantly, it is proved that the same property holds for two families with linear readouts, namely, state-afine systems and echo state networks. The linearity in the readouts is a key feature in supervised machine learning applications. It guarantees that these systems can be used in high-dimensional situations and in the presence of large datasets.

**Room and Time**: Augusta, 14h - 14:35h

## 07 Feb 2019: **Xinghao Qiao**, London School of Economics

** Speaker **: Xinghao Qiao, London School of Economics

** Title **: A General Theory for Large-Scale Curve Time Series via Functional Stability Measure

**Abstract**: Modelling a large bundle of curves arises in a broad spectrum of real applications. However,
existing literature relies primarily on the critical assumption of independent curve observations.
In this talk, we provide a general theory for large-scale Gaussian curve time series, where the temporal
and cross-sectional dependence across multiple curve observations exist and the number of functional variables,
p, may be large relative to the number of observations, n. We propose a novel functional stability measure
for multivariate stationary processes based on their spectral properties and use it to establish some useful
concentration bounds on the sample covariance matrix function. These concentration bounds serve as a
fundamental tool for further theoretical analysis, in particular, for deriving nonasymptotic upper bounds
on the errors of the regularized estimates in high dimensional settings. As functional principle component
analysis (FPCA) is one of the key techniques to handle functional data, we also investigate the concentration
properties of the relevant estimated terms under a FPCA framework. To illustrate with an important application,
we consider vector functional autoregressive models and develop a regularization approach to estimate
autoregressive coefficient functions under the sparsity constraint. Using our derived nonasymptotic results,
we investigate the theoretical properties of the regularized estimate in a "large p, small n" regime. The
finite sample performance of the proposed method is examined through simulation studies.

**Room and Time**: Augusta, 14h-15h

## 05 Feb 2019: **Clifford Lam**, London School of Economics

** Speaker **: Clifford Lam, London School of Economics

** Title **: Testing for High-Dimensional White Noise

**Abstract**: Testing for white noise is a classical yet important problem in statistics, especially for diagnostic checks in time series modeling and linear regression. For high-dimensional time series in the sense that the dimension p is large in relation to the sample size T, the popular omnibus tests including the multivariate Hosking and Li-McLeod tests are extremely conservative, leading to substantial power loss. To develop more relevant tests for high-dimensional cases, we propose a portmanteau-type test statistic which is the sum of squared singular values of the first q lagged sample autocovariance matrices. It, therefore, encapsulates all the serial correlations (upto the time lag q)
within and across all component series. Using the tools from random matrix theory and assuming both p and T diverge to infinity, we derive the asymptotic normality of the test statistic under both the null and a specific VMA(1) alternative hypothesis. As the actual implementation of the test requires the knowledge of three characteristic constants of the population cross-sectional covariance matrix and the value of the fourth moment of the standardized innovations, non-trivial estimations are proposed for these parameters and their integration leads to a practically usable test. Extensive simulation confirms the excellent finite-sample performance of the new test with accurate size and satisfactory power for a large range of finite (p, T) combinations, therefore ensuring wide applicability in practice. We also talk about alternative tests using maximal correlations. Results can be used even for nonlinear processes and non-iid data.

**Room and Time**: Augusta, 14h-15h

## 29 Jan 2019: **Xiaowen Dong**, University of Oxford

** Speaker **: Xiaowen Dong, University of Oxford

** Title **: Learning Quadratic Games on Networks

**Abstract**: Individuals, or organisations, cooperate with or compete against one another in
a wide range of practical situations. In the economics literature, such strategic
interactions are often modelled as games played on networks, where an
individual's payoff depends not only on her action but also that of her neighbours.
The current literature has largely focused on analysing the characteristics of
network games in the scenario where the structure of the network, which is
represented by a graph, is known beforehand. It is often the case, however,
that the actions of the players are readily observable while the underlying
interaction network remains hidden. In this talk, I will introduce two novel
frameworks for learning, from the observations on individual actions, network
games with linear-quadratic payoffs, and in particular the structure of the
interaction network. Both frameworks are based on the Nash equilibrium of
such games and involve solving a joint optimisation problem for the graph
structure and the individual marginal benefits. We test the proposed frameworks
on both synthetic and real world examples, and show that they can effectively
and more accurately learn the games than the baselines. The proposed
approach has both theoretical and practical implications for understanding
strategic interactions in a network environment.

**Room and Time**: TBA, 14h - 15h