**Theory and Algorithms in Data Science**

This seminar provides a forum for researchers working in foundational areas of Data Science (and related fields). All talks are held at the Alan Turing Institute.

The organizers are
Mihai Cucuringu,
Armin Eftekhari,
Vidit Nanda and
Hemant Tyagi.
We represent the
Low-dimensional structure as well as the
Topology and geometry research interest groups within the Turing Institute. **If you'd like to speak in this seminar,
please email Vidit at vnanda@turing.ac.uk**

# Upcoming Talks

## 21 May 2018: **Volkhan Cevher**, EPFL

** Speaker **: Volkhan Cevher,
École Polytechnique Fédérale de Lausanne

** Title **: Trade-offs in resource constrained machine learning systems

**Abstract**: Massive data poses a fundamental challenge to learning algorithms,
which is captured by the following computational dogma: the running time of an
algorithm increases with the size of its input data. The available computational
power, however, is growing slowly relative to data sizes. Hence, large-scale machine
learning problems of interest require increasingly more time to solve.

Our research demonstrates that this dogma is false in general, and supports an emerging perspective in computation: data should be treated as a resource that can be traded off with other resources, such as running time. For data acquisition and communications, we have also shown related sampling, energy, and circuit area trade-offs.

This talk will summarize our work confronting these challenges by building on the new mathematical foundations on how we generate data via sampling, how we set up learning objectives that govern our fundamental goals, and how we optimize these goals to obtain solutions and to make optimal decisions. We then demonstrate task-specific, end-to-end trade-offs (e.g., samples, power, computation, storage, and statistical precision) in broad domains.

**Room and Time**: Augusta, 14h-15h

## 23 May 2018: **Paul Constantine**, University of Colorado Boulder

** Speaker **: Paul Constantine,
University of Colorado Boulder

** Title **: Tools and techniques for subspace-based parameter space dimension
reduction

**Abstract**:
Scientists and engineers use computer simulations to study relationships between a physical model's input parameters and its
output predictions. However, thorough parameter studies---e.g., constructing response surfaces, optimizing, or averaging---are
challenging, if not impossible, when the simulation is expensive and the model has several inputs. To enable parameter studies
in these instances, the engineer may attempt to reduce the dimension of the model's input parameter space using techniques such as
sensitivity analysis or variable screening to identify unimportant variables that can be fixed for model analysis. Generalizing
classical coordinate-based reduction, there are several emerging subspace-based dimension reduction tools that identify important
directions in the input parameter space with respect to a particular model output. I will motivate and provide an overview of
subspace-based dimension reduction techniques and discuss strategies for exploiting such low-dimensional structures---including
analysis and computation---to enable otherwise infeasible parameter studies. For more information,
see activesubspaces.org.

**Room and Time**: Augusta, 14h-15h

## 25 Jun 2018: **Stephen Wright**, University of Wisconsin-Madison

** Speaker **: Stephen J Wright,
University of Wisconsin-Madison

** Title **: TBA

**Abstract**: TBA

**Room and Time**: Enigma, 14h-15h

# Past Talks

## 15 May 2018: **Heng Guo**, University of Edinburgh

** Speaker **: Heng Guo,
University of Edinburgh

** Title **: A polynomial-time approximation algorithm for all-terminal network reliability

**Abstract**: All-terminal network reliability is the probability that, in a
undirected graph, assuming each edge fails independently, the remaining graph is
still connected. We will present a fully polynomial-time randomised approximation
scheme (FPRAS) for this problem. Our main contribution is to confirm a conjecture
by Gorodezky and Pak (Random Struct. Algorithms, 2014), that the expected running
time of the "cluster-popping" algorithm in bi-directed graphs is bounded by a
polynomial in the size of the input.

Joint work with Mark Jerrum (QMUL)

**Room and Time**: Augusta, 15h-16h

## 30 Apr 2018: **Matteo Barigozzi**, London School of Economics

** Speaker **: Matteo Barigozzi,
London School of Economics

** Title **: Sequential testing for structural stability in approximate factor models

**Abstract**: We develop an on-line monitoring procedure to detect a change in a
large approximate factor model. Our statistics are based on a well-known property of the
(r + 1)-th eigenvalue of the sample covariance matrix of the data (having defined r as the
number of common factors): whilst under the null the (r + 1)-th eigenvalue is bounded,
under the alternative of a change (either in the loadings, or in the number of factors
itself) it becomes spiked. Given that the sample eigenvalue cannot be estimated consistently
under the null, we regularise the problem by randomising the test statistic in conjunction
with sample conditioning, obtaining a sequence of i.i.d., asymptotically chi-square statistics
which are then employed to build the monitoring scheme. Numerical evidence shows that our
procedure works very well in finite samples, with a very small probability of false detections
and tight detection times in presence of a genuine change-point.

**Room and Time**: Augusta, 14h-15h

## 27 Apr 2018: **Guy Nason**, University of Bristol

** Speaker **: Guy Nason, University of Bristol

** Title **: Network Time Series

**Abstract**:
A network time series is a multivariate time series where the individual series are known to be linked by some underlying network structure. Sometimes this network is known a priori, and sometimes the network has to be inferred, often from the multivariate series itself. Network time series are becoming increasingly common, long, and collected over a large number of variables. We are particularly interested in network time series whose network structure changes over time.

We describe some recent developments in the modeling and analysis of network time series via network autoregressive integrated moving average (NARIMA) process models. NARIMA models provide a network extension to a familiar environment that can be used to extract valuable information and aid prediction. As with classical ARIMA models, trend can impair the estimation of NARIMA parameters. The scope for trend removal is somewhat wider with NARIMA models and we exhibit some possibilities.

We will illustrate the operation of NARIMA modeling on data sets arising from human and veterinary epidemiology.

This is joint work with Kathryn Leeming (Bristol), Marina Knight (York) and Matt Nunes (Lancaster).

**Room and Time**: Augusta, 14h-15h

## 13 Apr 2018: **Hao Chen**, University of California, Davis

** Speaker **: Hao Chen, University
of California, Davis

** Title **: Change-point Detection for Multivariate and Object Data

**Abstract**:
After observing snapshots of a network, can we tell if there has been a change in dynamics? We developed a general
nonparametric framework for change-point detection utilizing similarity among observations. This new approach, which relies
on graph-based tests, can be applied to high-dimensional data, as well as data from non-Euclidean sample spaces,
such as network data. In addition, this framework is combined with a new way of permutation, circular block permutation with
a random starting point, to further deal with local dependence, which is common in such data sequences. Analytic approximations
to the significance of this graph-based scan statistics are derived for
both the independence and local dependence scenarios, facilitating their
applications to real datasets. We illustrate the methods through the
analysis of a phone-call network from the MIT Reality Mining.

**Room and Time**: Augusta, 14h-15h

## 06 Apr 2018: **François-Xavier Briol**, University of Warwick

** Speaker **: François-Xavier Briol, University of Warwick

** Title **: Bayesian Quadrature for Multiple Related Integrals.

**Abstract**: Bayesian probabilistic numerical methods are a set of tools
providing posterior distributions on the output of numerical methods. The use
of these methods is usually motivated by the fact that they can represent our
uncertainty due to incomplete/finite information about the continuous mathematical
problem being approximated. In this work, we demonstrate that this paradigm can
provide additional advantages, such as the possibility of transferring information
between several numerical methods. This allows users to represent uncertainty in
a more faithful manner and, as a by-product, provide increased numerical efficiency.
We propose the first such numerical method by extending the well-known Bayesian
quadrature algorithm to the case where we are interested in computing the integral of several related functions. We then prove convergence rates for the method in the well-specified and misspecified cases, and demonstrate its efficiency in the context of multi-fidelity models for complex engineering systems and a problem of global illumination in computer graphics.

**Room and Time**: Augusta, 14h-15h

## 27 Mar 2018: **Pranay Seshadri**, University of Cambridge

** Speaker **: Pranay Seshadri, University of Cambridge

** Title **: On Dimension Reduction Investigations in Turbomachinery

**Abstract**: Blades in modern jet engines are parameterized by 20-300 design variables. It is impossible to visualize, and more importantly, effectively explore such vast design spaces. Yet such an exploration may be required not solely for optimization and uncertainty quantification endeavors, but more importantly, for understanding the physics that underpins key designs characteristics. For instance, when carrying out an optimization, the designs that are often associated with high efficiency are likely to be different from those associated with high flow capacity and high pressure ratio. Likewise, designs that are insensitive to variations in operational boundary conditions are likely to be different from those that are sensitive under the very same conditions. So what is the physical rationale for different design characteristics and how do we identify them?

Our goal is to be able to identify key linear combinations of designs that influence certain outputs, especially when the total number of design variables is large. These salient concerns motivate an output-based—i.e., efficiency, flow capacity or pressure ratio—dimension reduction strategy. In this talk, a new algorithm for subspace-based dimension reduction is introduced. It combines manifold optimization with classical Gaussian process regression. The fundamental idea we exploit is that for many physics-driven problems, outputs can be expressed as a non-linear function of a few linear combinations of all the variables. There are parallels between this idea (known in the applied mathematics community under the moniker of `ridge functions'), neural networks, active subspaces and classical sufficient dimension reduction methods. We test our algorithm on a range of different data sets where the number of dimensions vary from 2 to 200. Our results demonstrate that if the problem does admit a ridge structure, our technique provides near optimal recovery. Finally, we describe how the computed dimension reducing subspace can be used for optimization, uncertainty quantification and more importantly for gaining physical insight into turbomachinery fluid mechanics.

**Room and Time**: Augusta, 15h-16h

## 27 Mar 2018: **Wei Dai**, Imperial College London

** Speaker **: Wei Dai, Imperial College London

** Title **: Blind Estimation of Origin-Destination Flows in Networks

**Abstract**: This work focuses on the inverse problem of estimating origin-destination (OD) flows from link flows. OD flow estimation is essential to many network analysis tasks, such as urban planning, as OD information specifies the travel demands in the network. Approaches in the literature for OD estimation typically require some prior information of the OD flows as well as the mapping from OD flows to link flows. In particular, a linear model is often used to map OD flows to link flows and the corresponding linear operator is commonly referred to as the traffic assignment. In practice, prior information of the traffic assignment can be inaccurate and incomplete. Furthermore, even with full information of the traffic assignment, the OD flow estimation problem is still ill-posed and therefore further prior information of OD flows is required.

The main contribution of this work is estimating OD flows with no or little prior information. Inspired by the terminology blind deconvolution in signal processing community, we refer to the OD estimation without knowledge of traffic assignment as blind estimation. Different from the literature where linear systems are built on OD flows, a new linear model is introduced of which the inquired flows are specified with only their origins but not their destinations. The new model preserves the OD information and at the same time reduces the dimension of the inquired flows in orders of magnitude. Due to this dimension reduction, the OD flow estimation problem becomes well-posed when the corresponding traffic assignment information is available. More importantly it is possible to estimate the OD flows and the traffic assignment simultaneously with no prior information of them. Detailed discussions are provided regarding the uniqueness of solutions for the simultaneous estimation. Simulations are presented for the demonstration purpose.

**Room and Time**: Augusta, 14h-15h

## 20 Mar 2018: **Jacek Brodzki**, University of Southampton

** Speaker **: Jacek Brodzki, University of Southampton

** Title **: Geometry, Topology, and the structure of data

**Abstract**: Modern data is astonishing in its variety, and is far removed from anything that
could be held in spreadsheet or analysed using traditional methods. Through intense
research effort, topology has emerged as a source of novel methodology to provide
insight into the structure of very complex, high dimensional data. It provided us
with tools like persistent homology, which are used to compute numerical topological
characteristics of the data. More recently, these methods have been augmented by
geometric insights, which are valuable in capturing the structure of and
relationships between complex shapes.

In this talk, I will provide an overview of new techniques from topology and geometry and illustrate them on particular examples. We will discuss applications of persistent homology to the study of the set of CT scans of human lungs. The geometric part of the talk will concern classification of three-dimensional shapes through synchronisation.

**Room and Time**: Augusta, 13h-14h

## 06 Mar 2018: **Coralia Cartis**, University of Oxford

** Speaker **: Coralia Cartis, University of Oxford

** Title **: Complexity of Nonconvex Optimization: The Story So Far

**Abstract**: Establishing the global rate of convergence of standard algorithms or their
global evaluation complexity for nonconvex smooth optimization problems is a natural but
challenging aspect of algorithm analysis. In the last decade, substantial progress has been
made and continues to be made, in this direction. We review some of the key developments,
illustrating the crucial role played by cubic regularisation methods, a credible alternative to
linesearch and trust-region.

We then focus on two recent results. Firstly, we consider a general/new class of adaptive regularization methods, that use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size. We investigate the worst-case evaluation complexity of these algorithms, when the level of sufficient smoothness of the objective and its derivatives may be unknown or may even be absent, and we find that the methods accurately reflect in their complexity the (unknown) degree of smoothness of the objective/derivatives and satisfy increasingly better bounds with the order of the derivatives available. We then focus on the antipodal context, when no derivatives of the objective are available, and the local models in adaptive cubic regularisation methods are constructed for example, by sampling of function values. We show that the evaluation complexity of the ensuing methods do not change in the order of the accuracy from their deterministic counterparts, increasing only by a constant factor which depends on the probability of the sampled models being occasionally, sufficiently accurate.

Time permitting, we will also discuss the stochastic case when even function evaluations may be inaccurate. This work is joint with Nick Gould (Rutherford Appleton Laboratory, UK), Philippe Toint (University of Namur, Belgium) and Katya Scheinberg (Lehigh University, USA).

**Room and Time**: Augusta, 12h-13h

## 06 Mar 2018: **Ruggerio Bellio**, University of Udine

** Speaker **: Ruggero Bellio, University of Udine

** Title **: Modern Likelihood-Frequentist Inference

**Abstract**: This talk illustrates some results of our recent paper Modern Likelihood-Frequentist
Inference, which is an exposition of the development of likelihood asymptotics obtained by many
workers. The exposition is intended to be accessible to a wide audience. It does this by charting a
narrow course through the developments over more than 30 years. This covers the likelihood ratio
approximation to the distribution of the parameter estimator, approximate ancillary conditioning,
transformation to a highly accurate approximation to the likelihood ratio test, and related topics.

The talk focuses in particular on the R package likelihoodAsy,
that accompanies the paper. The package implements some tools for likelihood inference, including in particular
the Skovgaard version of Barndorff-Nielsen's modified signed likelihood ratio test statistic r^{*}.
The main point of this is to approximate the *sample space derivatives*, that held up this
theory for a decade until the Skovgaard breakthrough. The user needs only to provide R code for the likelihood
function, for generating a sample from the model, and for the formulation of the interest parameter.
The latter is allowed to be a rather general function of the model parameter, and this is the most important
feature of the package.

This is a joint work with Donald A. Pierce, Oregon State University.

Pierce, D.A. and Bellio, R. (2017). Modern likelihood-frequentist inference. *International Statistical Review*, **85**, 519-541.

**Room and Time**: Augusta, 11h-12h

## 22 Feb 2018: **Yuji Nakatsukasa**, University of Oxford

** Speaker **: Yuji Nakatsukasa, University of Oxford

** Title **: Global optimization via eigenvalues

**Abstract**: While non-convex optimization problems are generally regarded as
difficult or computationally intractable, the Courant-Fischer theorem for symmetric
eigenvalue problems suggests that each eigenpair corresponds to a stationary point
of a non-convex optimization problem (in this case the Rayleigh quotient). More
generally, matrix eigenvalue problems can be regarded as an important class of
optimization problems that can be solved efficiently. This observation suggests
conversely that perhaps a global solution for some non-convex optimization problems
could be obtained by solving an eigenvalue problem. In this work we identify such
optimization problems, and show that they lead to a variety of eigenvalue problems,
including generalized, polynomial and multiparameter eigenvalue problems, all of
which can be solved reliably and in polynomial time, and essentially non-iteratively.
Specifically, we consider non-convex QCQP (quadratically constrained quadratic
programming) QCQP is known to be NP-hard in general (when the number of constraints
k is not fixed), and for k ≥ 2 fixed, it has been an open problem whether there are
polynomial-time solvable until recently. We show that (i) for QCQP with one
constraint, k = 1, it can be reduced to essentially a single generalized eigenvalue
problem, (ii) for QCQP with two constraints, k = 2, the problem can be reduced to
a two-parameter eigenvalue problem, and (iii) unconstrained optimization problems
with cubic regularization can be reduced to quadratic eigenvalue problems.

Based on joint work with Satoru Adachi, Satoru Iwata, Shinsaku Sakaue and Akiko Takeda (University of Tokyo).

**Room and Time**: Augusta, 14h-15h

## 20 Feb 2018: **Mu Niu**, U of Plymouth + **Pokman Cheung**, Goldman Sachs

** Speakers **: Mu Niu, University of Plymouth and Pokman Cheung, Goldman Sachs

** Title **: Intrinsic Gaussian processes on complex constrained domains

**Abstract**:
We propose a class of intrinsic Gaussian processes (in-GPs) for interpolation
and regression on manifolds with a primary focus on complex constrained domains
or irregular-shaped spaces arising as subsets or submanifolds of R^{2},
R^{3} and beyond.
For example, in-GPs can accommodate spatial domains arising as complex subsets of
Euclidean space. in-GPs respect the potentially complex boundary or interior
conditions as well as the intrinsic geometry of the spaces. The key novelty of the
proposed approach is to utilise the relationship between heat kernels and the
transition density of Brownian motion on manifolds for constructing and
approximating valid and computation- ally feasible covariance kernels.
This enables in-GPs to be practically applied in great generality, while
existing approaches for smoothing on constrained domains are limited to simple
special cases. The broad utilities of the in-GP approach is illustrated through
simulation studies and data examples.

**Room and Time**: Augusta, 14h-15h

## 09 Jan 2018: **Sebastian Stich**, EPFL

** Speaker **: Sebastian Stich, École Polytechnique Fédérale de Lausanne

** Title **: Adaptive importance sampling for convex optimization

**Abstract**: In the last years, randomized optimization algorithms, and especially
coordinate descent methods attracted more and more attention in the optimization
community. These methods often replace the efficient -- but often relatively expensive --
iterations of the classical methods with much cheaper -- but less efficient -- updates.
Consequently, such methods can be applied to problems of very big size. This application
is sometimes also theoretically justified by very attractive worst-case efficiency
guarantees.

In this talk, we will first provide a tutorial-like introduction to random decent methods and review their complexity. We will discuss variable metric or quasi-Newton type of methods [1] and accelerated schemes [2]. In the second part of the talk we further introduce the concept of (adaptive) importance sampling and present a scheme that computes the optimal sampling frequencies based on safe estimates of the gradient [3]. The proposed sampling is provably the best sampling with respect to the given bounds and always better than uniform sampling and fixed importance sampling. We conclude by discussing open problems in this field of research.

[1] Joint work with B. Gartner, C. Muller (Math. Prog. 2016)

[2] Joint work with Y. Nesterov (SIAM J. Opt. 2017)

[3] Joint work with A. Raj, M. Jaggi (NIPS 2017)

**Room and Time**: Augusta, 14h-15h

# (2017)

## 20 Dec 2017: **Nicolas Boumal**, Princeton University

** Speaker **: Nicolas Boumal, Princeton University

** Title **: Near optimal bounds for phase synchronization

**Abstract**: Phase synchronization consists in estimating n phases (unit-modulus complex numbers) given measurements of all relative phases. We assume the measurements are corrupted by i.i.d. Gaussian noise. The maximum likelihood estimator is the solution to a non-convex optimization problem, NP-hard in general. Yet, it has been observed some years ago that the maximum likelihood estimator can be computed in polynomial time through semidefinite relaxation, even in the face of tremendous amounts of noise.

In this talk, I will present proof techniques that allow to explain this observation. This is based on the generalized power method, replicas to reduce statistical dependence, and dual certificates.

Joint work with Yiqiao Zhong, Afonso Bandeira and Amit Singer.

**Room and Time**: Augusta, 11h-12h

## 20 Dec 2017: **Rajen Shah**, University of Cambridge

** Speaker **: Rajen Shah, University of Cambridge

** Title **: The xyz algorithm for fast interaction search in high-dimensional data

**Abstract**: When performing regression on a dataset with p variables, it is often of interest to go beyond using main effects and include interactions as products between individual variables. However, if the number of variables p is large, as is common in genomic datasets, the computational cost of searching through O(p^2) interactions can be prohibitive. In this talk I will introduce a new randomised algorithm called xyz that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in p. The underlying idea is to transform interaction search into a much simpler closest pair problem. We will see how strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires O(p^u) operations for 1 < u < 2 depending on their strength. An application of xyz to a genome-wide association study shows how more than 10^11 interactions can be screened in minutes using a standard laptop. This is joint work with Gian Thanei and Nicolai Meinshausen (ETH Zurich).

**Speaker Bio**: Rajen Shah is a lecturer in Statistics at the Statistical Laboratory, University of Cambridge, having completed his PhD under the supervision of Richard Samworth (also at Cambridge) in 2014. His main research interests are in high-dimensional statistics and large-scale data analysis. Rajen is currently an Associate Editor for the Journal of the Royal Statistical Society Series B, and a Turing Fellow.

**Room and Time**: Augusta, 14h-15h

## 28 Nov 2017: **Adilet Otemissov**, University of Oxford

** Speaker **: Adilet Otemissov, University of Oxford

** Title **: Dimensionality reduction techniques for global optimization.

**Abstract**: **Abstract**: (Joint work with Coralia Cartis) The problem of finding the most
extreme value of a function, also known as global optimization, is a challenging
task. The difficulty is associated with the exponential increase in the computational
time for a linear increase in the dimension. This is known as the ``curse of
dimensionality''. In this talk, we demonstrate that such challenges can be overcome
for functions with low effective dimensionality – functions which are constant along
certain linear subspaces. Such functions can often be found in applications, for
example, in hyper-parameter optimization for neural networks, heuristic algorithms
for combinatorial optimization problems and complex engineering simulations.

We propose the use of random subspace embeddings within a(ny) global minimisation algorithm, extending the approach in Wang et al. (2013). We introduce two techniques (REGO and AREGO) that transform the high-dimensional optimization problem into a low-dimensional problem. REGO is formulated by adding constraints in the low-dimensional embedded space and AREGO by including constraints in the original space. We derive probabilistic bounds on the success of the dimension-reduced problems and present encouraging numerical results.

**Speaker Bio**: Adilet is a second year DPhil Student at the Alan Turing Institute and University of Oxford. He graduated from Nazarbayev University with a BSc in Mathematics and then pursued a master's degree at the University of Manchester. His research interests include high-dimensional non-convex and global optimization.

**Room and Time**: Augusta, 13h-14h

## 21 Nov 2017: **Rodrigo Mendoza Smith**, University of Oxford

** Speaker **: Rodrigo Mendoza Smith, University of Oxford

** Title **: Embarrassingly parallel reduction of boundary matrices in Persistent Homology

**Abstract**: Topological Data Analysis is a relatively new paradigm in
data-science that models datasets as point-clouds sampled from a shape embedded
in an Euclidean space. The inference problem is to estimate the essential
topological features of the underlying shape from a point-cloud sampled from
it. This is done through a technique called persistent homology which is a
mathematical formalism based on algebraic topology. A necessary step in the
persistent homology pipeline is the reduction a so-called boundary matrix,
which is a process reminiscent to Gaussian elimination. In this talk, I present
a number of structural dependencies in boundary matrices and use them to design
a novel embarrassingly parallel algorithm for their reduction. Simulations on
synthetic examples show that the computational burden can be conducted in a
small fraction of the number of iterations needed by traditional methods. For
example, numerical experiments show that for a boundary matrix with 10^4
columns, the reduction completed to within 1% in about ten iterations as
opposed to nearly approximately eight thousand iterations for traditional
methods.

**Speaker Bio**: Rodrigo Mendoza Smith is a DPhil candidate in Applied Mathematics working with Prof. Jared Tanner. He is affiliated with the Numerical Analysis Group in Oxford and with the Alan Turing Institute in London. He is working on the design of provably fast numerical algorithms for sparse and low-rank data models. His research interests lie in numerical computing, compressed-sensing, optimisation, machine-learning, network theory and computational algebraic topology. His motivation and long term goal is to contribute to the understanding of Artificial Intelligence and its interplay with geometry, statistics and computability

**Room and Time**: Augusta, 14h-15h

## 16 Nov 2017: **Guillame Lecue**, CREST + ENSAE

** Speaker **: Guillame Lecue, Center for Research in Economics and Statistics

** Title **: Learning from MOM’s principles

**Abstract**: (Joint work with Matthieu Lerasle)
We obtain theoretical and practical performances for median of means estimators.

From a theoretical point of view, estimation and prediction error bounds achieved by the MOM estimators hold with exponentially large probability -- as in the gaussian framework with independent noise-- under only weak moments assumptions on the data and without assuming independence between the noise and the design. Moreover, MOM procedures are robust since a large part of the data may have nothing to do with the oracle we want to reconstruct. Our general risk bound is of order max(minimax rate of convergence in the i.i.d. setup, (number of outliers)/number of observations)). In particular, the number of outliers may be as large as (number of data)*(minimax rate) without affecting the statistical properties of the MOM estimator.

A regularization norm may also be used together with the MOM criterium. In that case, any norm can be used for regularization. When it has some sparsity inducing power we recover sparse rates of convergence and sparse oracle inequalities. For example, the minimax rate s log(d/s)/N of recovery of a s-sparse vector in R^d is achieved by a median-of-means version of the LASSO when the noise has q_0 moments for some q_0>2, the design matrix C_0\log(d) moments and the dataset is corrupted by s log(d/s) outliers. This result holds with exponentially large probability as if the noise and the design were i.i.d. Gaussian random variables.

On the practical side, MOM estimators (and their associated regularized versions) can easily be implemented. Actually, most gradient descent algorithms used to implement (non-robust) estimators like the LASSO can easily be transformed into a robust one by using a MOM approach.

**Room and Time**: Augusta, 15h-16h

## 14 Nov 2017: **Piotr Fryzlewicz**, London School of Economics

** Speaker **: Piotr Fryzlewicz,
London School of Economics

**Title**: Multiscale thinking in data analysis, recursive algorithms, and data-adaptive change-point detection

**Abstract**: The talk starts on a general note: we first attempt to define a "multiscale" method / algorithm as a recursive program acting on a dataset in a suitable way. Wavelet transformations, unbalanced wavelet transformations and binary segmentation are all examples of multiscale methods in this sense. Using the example of binary segmentation, we illustrate the benefits of the recursive formulation of multiscale algorithms from the software implementation and theoretical tractability viewpoints.

We then turn more specific and study the canonical problem of a-posteriori detection of multiple change-points in the mean of a piecewise-constant signal observed with noise. Even in this simple set-up, many publicly available state-of-the-art methods struggle for certain classes of signals. In particular, this misperformance is observed in methods that work by minimising a "fit to the data plus a penalty" criterion, the reason being that it is challenging to think of a penalty that works well over a wide range of signal classes. To overcome this issue, we propose a new approach whereby methods "learn" from the data as they proceed, and, as a result, operate differently for different signal classes. As an example of this approach, we revisit our earlier change-point detection algorithm, Wild Binary Segmentation, and make it data-adaptive by equipping it with a recursive mechanism for deciding "on the fly" how many sub-samples of the input data to draw, and where to draw them. This is in contrast to the original Wild Binary Segmentation, which is not recursive. We show that this significantly improves the algorithm particularly for signals with frequent change-points.

**Bio**: Piotr Fryzlewicz is a professor of statistics in the Department of Statistics at the London School of Economics. He has previously worked at Winton Capital Management, the University of Bristol and Imperial College London. His research interests are in multiscale modelling and estimation, time series, change-point detection, high-dimensional statistical inference and statistical learning. He currently serves as Joint Editor of the Journal of the Royal Statistical Society Series B.

**Room and Time**: Augusta, 14h-15h

## 07 Nov 2017: **Quentin Berthet**, University of Cambridge

** Speaker **: Quentin Berthet, University of Cambridge

** Title **: Bandit Optimization with Upper-Confidence Frank-Wolfe

**Abstract**: We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we introduce the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We show upper bounds on the optimization error of this algorithm over various classes of functions, and discuss the optimality of these results.
This is joint work with Vianney Perchet (ENS Paris-Saclay and Criteo Research)

**Speaker Bio**: Quentin Berthet is a Turing Fellow and a Lecturer in the Statslab, in the DPMMS at the University of Cambridge, since 2015. He is a former student of the Ecole Polytechnique, received a Ph.D. from Princeton University in 2014, and was a CMI postdoctoral fellow at Caltech. His research interests are in Statistics, Optimization and Computer Science.

**Room and Time**: Augusta, 14h-15h

## 01 Nov 2017: **Yiannis Koutis**, New Jersey Institute of Technology

** Speaker **: Yiannis Koutis,
New Jersey Institute of Technology

**Title**: A survey of provably fast linear system solvers

**Abstract**: The design of fast linear system solvers has been a very active research area for decades. Solvers based on various approaches, from multigrid methods to incomplete matrix factorization techniques have been quite successful in practical engineering problems, but lack fast convergence guarantees for broader classes of unstructured linear systems that are common in modern data science problems. Solvers that are provably fast for broad classes of unstructured linear systems are a fairly recent discovery. We will survey this recent progress, and discuss key enabling facts from the theory of concentration bounds for random sums of positive matrices. We will also discuss known obstacles that stand on the way to progress for general linear systems.

**Bio**: Yiannis Koutis is an associate professor in the Ying Wu College of Computing at the New Jersey Institute of Technology. He completed his Ph.D. in Compute Science at Carnegie Mellon University, and got his undergraduate Diploma in Computer Engineering and Informatics from the University of Patras in Greece. Before NJIT, he was an associate professor at the University of Puerto Rico, Rio Piedras. In the past he has also worked as a Systems Scientist at Carnegie Mellon University.

**Room and Time**: Augusta, 15h-16h

## 31 Oct 2017: **Dino Sejdinovic**, University of Oxford

** Speaker **: Dino Sejdinovic,
University of Oxford

**Title**: Inference with Approximate Kernel Embeddings

**Abstract**: Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric hypothesis testing and for learning on distributional inputs. I will give an overview of this framework and present some of the applications of the approximate kernel embeddings to Bayesian computation. Further, I will discuss a recent modification of MMD which aims to encode invariance to additive symmetric noise and leads to learning on distributions robust to the distributional covariate shift, e.g. where measurement noise on the training data differs from that on the testing data. ArXiv:1703.07596

** Bio**: Dino Sejdinovic is an Associate Professor at the Department of Statistics, University of Oxford, a Fellow of Mansfield College, Oxford, and a Turing Fellow of the Alan Turing Institute. He previously held postdoctoral positions at the Gatsby Computational Neuroscience Unit, University College London and at the Department of Mathematics, University of Bristol. He received a PhD in Electrical and Electronic Engineering from the University of Bristol (2009). His research is at the interface between machine learning and statistical methodology, with a focus on nonparametric and kernel methods.

**Room and Time**: Augusta, 14h-15h

## 17 Oct 2017: **Varun Kanade**, University of Oxford

** Speaker **: Varun Kanade, University of Oxford

** Title **: From which world is your graph?

**Abstract**: Discovering statistical structure from links is a fundamental problem
in the analysis of social networks. Choosing a misspecified model, or
equivalently, an incorrect inference algorithm will result in an invalid
analysis or even falsely uncover patterns that are in fact artifacts of the
model. This work focuses on unifying two of the most widely used link-formation
models: the stochastic blockmodel (SBM) and the small world (or latent space)
model (SWM). Integrating techniques from kernel learning, spectral graph
theory, and nonlinear dimensionality reduction, we develop the first
statistically sound polynomial-time algorithm to discover latent patterns in
sparse graphs for both models. When the network comes from an SBM, the
algorithm outputs a block structure. When it is from an SWM, the algorithm
outputs estimates of each node's latent position. We report experiments
performed on a twitter dataset collected between October and November 2016.
Based on joint work with Cheng Li, Zhenming Liu and Felix Wong.

**Speaker Bio**: Varun Kanade is a Research Lecturer in the Department of Computer Science, Oxford University. He has been a Simons Postdoctoral Fellow at the University of California, Berkeley and an FSMP postdoctoral fellow at ENS, Paris. He obtained his Ph.D. from Harvard University in 2012. His research interests are in machine learning and theoretical computer science and focuses on mathematical foundations of machine learning, in particular computational and statistical tradeoffs and improving our understanding of methods that have been successful in practice.

**Room and Time**: Augusta, 14h-15h

## 09 Oct 2017: **Martin Lotz**, University of Manchester

**Speaker**: Martin Lotz, University of Manchester

**Title**: Randomized dimensionality reduction in persistent homology

**Abstract**: We discuss a few ideas related to dimensionality reduction at the point-cloud level
for the computation of persistent homology. In particular, it is shown that in certain cases, the ambient
dimension of the problem can literally be replaced with an intrinsic notion of dimension related to the
structure of the data, and that such a reduction can be computed efficiently. Some surprising connections
to the theory of suprema of Gaussian processes are presented, and potential extensions discussed.

**Speaker Bio**: Martin Lotz is a Lecturer in Mathematics at the University of Manchester. Previous to his
appointment at Manchester he was a post-doctoral researcher at the City University of Hong Kong, at the University
of Oxford, and as a Research Fellow at the University of Edinburgh, supported by a Leverhulme Trust and Seggie
Brown Fellowship. He studied Mathematics at the ETH (Swiss Federal Institute of Technology) in Zurich, and the
University of Paderborn, where he finished his doctorate in Algebraic Complexity Theory. His interests span a
wide range of mathematics and its applications. His current work explores connections between probability,
optimization, and applications in signal processing, data analysis and inverse problems. Previous work has been
in algebraic complexity theory, computational algebra and geometry, foundations of numerical analysis, and
combinatorics.

**Room and Time**: Augusta, 14h-15h

## 03 Oct 2017: **He Sun**, University of Edinburgh

**Speaker**: He Sun, University of Edinburgh

**Title**: Heat kernels in graphs: A journey from random walks to geometry, and back

**Abstract**: Heat kernels are one of the most fundamental concepts in physics and mathematics.
In physics, the heat kernel is a fundamental solution of the heat equation and connects the Laplacian
operator to the rate of heat dissipation. In spectral geometry, many fundamental techniques are based
on heat kernels. In finite Markov chain theory, heat kernels correspond to continuous-time random walks
and constitute one of the most powerful techniques in estimating the mixing time.

In this talk, we will briefly discuss this line of research and its relation to heat kernels in graphs. In particular, we will see how heat kernels are used to design the first nearly-linear time algorithm for finding clusters in real-world graphs. Some interesting open questions will be addressed in the end.

**Speaker Bio**: He Sun is a senior lecturer in Algorithms and Complexity in the School of
Informatics, University of Edinburgh. Prior to this, he held positions at the Max Planck Institute
for Informatics from 2010 to 2015, and the University of Bristol from 2015 to 2017. His main research
interests range over the fields of spectral graph theory, applied probability and statistics, matrix
analysis, and machine learning.

**Room and Time**: Augusta, 14h-15h

## 29 Sep 2017: **Greg Ongie**, University of Michigan

**Speaker**: Greg Ongie, University of Michigan

**Title**: Learning non-linear models with missing data

**Abstract**: Linear models have had great success for inference problems with missing data.
A key example is low-rank matrix completion, where the missing entries of a matrix are imputed by
exploiting low-dimensional linear structure. However, many real-world datasets fail to have linear
structure but still have low-dimensional non-linear structure that can be exploited. We introduce a
systematic framework for learning low-dimensional non-linear models in problems in missing data.
The approach has close connections to subspace clustering, manifold learning, and kernel methods
in machine learning. As a special case, we focus on the problem of matrix completion where we
assume data belongs to a low-dimensional algebraic variety, i.e., each data point is a solution to
a system of polynomial equations, and study the sampling requirements of this model. We propose an
efficient matrix completion algorithm that is able to recover synthetically generated data up to
the predicted sampling complexity bounds and outperforms traditional low-rank matrix completion
and subspace clustering approaches on real datasets in computer vision and genomics.

**Room and Time**: Enigma, 14h-15h

This talk is part of Greg Ongie's visit via the "Research in small groups" grant for Turing host: Armin Eftekhari.

## 19 Sep 2017: **Richard Samworth**, University of Cambridge

** Speaker **: Richard Samworth, University of Cambridge

** Title **: High dimensional change point estimation via sparse projection

**Abstract**: Change points are a very common feature of big data that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called 'inspect' for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package 'InspectChangepoint'.

**Speaker Bio**: Richard Samworth obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a professor in 2013 and the Professor of Statistical Science in 2017. His main research interests are in nonparametric and high-dimensional statistics, including problems in shape-constrained inference, classification, variable selection and changepoint estimation, among others.

**Room and Time**: Augusta, 14h-15h.