**Theory and Algorithms in Data Science**

This seminar provides a forum for researchers working in foundational areas of Data Science (and related fields). All talks are held at the Alan Turing Institute.

The organizers are
Mihai Cucuringu,
Armin Eftekhari,
Vidit Nanda and
Hemant Tyagi.
We represent the
Low-dimensional structure as well as the
Topology and geometry research interest groups within the Turing Institute. **If you'd like to speak in this seminar,
please email Vidit at vnanda@turing.ac.uk**

# Upcoming Talks

## 26 Jul 2018: **Yves van Gennip**, University of Nottingham

** Speaker **: Yves
van Gennip, University of Nottingham

** Title **: TBA

**Abstract**: TBA

**Room and Time**: Augusta, 14h-15h

## 28 Aug 2018: **Laurent Jacques**, Université Catholique de Louvain

** Speaker **: Laurent
Jacques, Université Catholique de Louvain

** Title **: TBA

**Abstract**: TBA

**Room and Time**: Augusta, 14h-15h

## 18 Sep 2018: **Hamza Fawzi**, University of Cambridge

** Speaker **: Hamza Fawzi,
University of Cambridge

** Title **: TBA

**Abstract**: TBA

**Room and Time**: Augusta, 14h-15h

## 27 Sep 2018: **Patrick Rebescheni**, University of Oxford

** Speaker **: Patrick Rebeschini,
University of Oxford

** Title **: Multi-Agent Learning: Implicit Regularization and Order-Optimal Gossip

**Abstract**: In distributed machine learning data is stored and processed in multiple locations by different agents. Each agent is represented by a node in a graph, and communication is allowed between neighbours. In the decentralised setting typical of peer-to-peer networks, there is no central authority that can aggregate information from all the nodes. A typical setting involves agents cooperating with their peers to learn models that can perform better on new, unseen data.

In this talk, we present the first results on the generalisation capabilities of distributed stochastic gradient descent methods. Using algorithmic stability, we derive upper bounds for the test error and provide a principled approach for implicit regularization, tuning the learning rate and the stopping time as a function of the graph topology. We also present a new Gossip protocol for the aggregation step in distributed methods that can yield order-optimal communication complexity. Based on non-reversible Markov chains, our protocol is local and does not require global routing, hence improving on existing methods. (Joint work with Dominic Richards)

**Room and Time**: Augusta, 14h-15h

# Past Talks

## 19 Jul 2018: **Yosi Keller**, Bar-Ilan University

** Speaker **: Yosi Keller, Bar-Ilan University

** Title **: Deep Spectral Graph Matching

**Abstract**: In this work we present a Deep Learning based approach for visual
correspondence estimation, by deriving a Deep spectral graph matching network. We
formulate the state-of-the-art unsupervised Spectral Graph Matching (SGM) approach,
as part of an end-to-end supervised deep learning network. Thus allowing to utilize
backpropagation to learn optimal image features, as well as algorithm parameters. For
that we present a transformation layer that converts the learned image feature, within
a pair of images, to an affinity matrix used to solve the matching problem via a new
metric loss function. The proposed scheme is shown to compare favorably with
contemporary state-of-the-art matching schemes when applied to annotated data
obtained from the PASCAL, ILSVRC, KITTI and CUB-2011 datasets. Joint work with
Yoav Liberman.

**Room and Time**: Enigma, 11h-12h

## 17 Jul 2018: **Stephane Chretien**, National Physical Laboratory

** Speaker **: Stephane Chretien,
National Physical Laboratory

** Title **: Column/Feature extraction: old and new results

**Abstract**: A problem of paramount importance in both pure (Restricted Invertibility problem) and applied mathematics (Feature extraction)
is the one of selecting a set of columns from a given matrix, such that the resulting submatrix has its smallest singular value
lying above a pre-specified level. Such problems can be addressed using many different mathematical techniques and have recently
been a topic of renewed interest, due to their strong connections to machine learning, compressed sensing, high dimensional
statistics, etc, but also, quite surprisingly, quantum mechanics via the solution to the celebrated Kadison-Singer conjecture.
In this talk, we will present an overview of the different results available in the literature and propose a new elementary
perturbation approach to derive faster algorithms for column/feature extraction.

**Room and Time**: Augusta, 13h-14h

## 12 Jul 2018: **Paolo Barucca**, University College London

** Speaker **: Paolo Barucca,
University College London

** Title **: Dynamic network models for data mining and time series prediction

**Abstract**: The global financial system can be represented as a large complex
network in which banks, hedge funds and other financial institutions are interconnected
to each other through direct and indirect financial linkages. Recently, a lot of
attention has been paid to the understanding of the mechanisms that can lead to a
breakdown of this network. This can happen when the existing financial links turn
from being a means of risk diversification to channels for the propagation of risk
across financial institutions. In this presentation, I summarize recent developments
in the modeling of financial systems. I focus in particular on dynamic network
approaches, and I also report on recent findings on the empirical structure of
interbank networks. The presentation provides a landscape of the newly arising
interdisciplinary field lying at the intersection of several disciplines, such as
network science, statistical physics, and finance.

**Room and Time**: Augusta, 1330h-1430h

## 03 Jul 2018: **Roman Vershynin**, University of California Irvine

** Speaker **: Roman Vershynin,
University of California, Irvine

** Title **: From number theory to machine learning: hunting for smooth Boolean functions

**Abstract**: The most fundamental kind of functions studied in computer science
are Boolean functions. Most Boolean functions oscillate a lot, which is
analogous to the fact that "most" continuous functions on R are nowhere
differentiable. A "smooth" Boolean function can be generated by taking the
sign of some polynomial of low degree in n variables. Such functions are
called polynomial threshold functions, and they are widely used in machine
learning as classification devices. Surprisingly, we do not know how many
low-degree polynomial threshold functions there exist! An approximate answer
to this question has been known only for polynomials of degree 1, i.e. for
linear functions. In a recent joint work with Pierre Baldi, we found a way
to approximately count polynomial threshold functions of any fixed degree.
This solves a problem of M. Saks that goes back to 1993 and much earlier.
Our argument draws insights from analytical number theory, additive
combinatorics, enumerative combinatorics, probability and discrete geometry.
I will describe some of these connections, focusing particularly on a
beautiful interplay of zeta and Mobius functions in number theory, hyperplane
arrangements in enumerative combinatorics and random tensors in probability
theory.

**Room and Time**: Augusta, 14h - 15h

## 25 Jun 2018: **Stephen Wright**, University of Wisconsin-Madison

** Speaker **: Stephen J Wright,
University of Wisconsin-Madison

** Title **: Practical conditional gradient algorithms

**Abstract**: Interest in the conditional gradient algorithm, proposed by Frank and Wolfe in 1956, has revived in recent years because of its relevance to many data science paradigms. The basic algorithm for optimizing a smooth convex function over a closed convex compact set is appealing for its simplicity and its elementary convergence theory. However, the convergence is too slow for many applications. This talk describes enhancements of conditional gradient that improve its performance in several ways. A common feature of these enhancements is the maintenance of a “basis” of extreme points of the feasible set, being the solutions of the linear oracle from a subset of previous calls. We describe a “lazy” variant in which the linearized objective is minimized only over the convex hull of this basis on most iterations, a variant in which the objective is periodically reoptimized over the convex hull of this basis, and a sparsified variant in which this basis is sometimes reduced in size. (These techniques can be applied in tandem.) We show how convergence rates are affected by these
techniques, and present computational results on a variety of applications.
This talk describes joint work with Sebastian Pokutta, Gabor Braun, Dan Tu, Nikhil Rao, and Parikshit Shah.

**Room and Time**: Enigma, 14h-15h

## 25 Jun 2018: **Will Perkins**, University of Birmingham

** Speaker **: Will Perkins,
University of Birmingham

** Title **: Information Theoretic Thresholds from the Cavity Method

**Abstract**: The stochastic block model is a benchmark model for the computational problem of graph clustering: partitioning a graph of vertices and edges into clusters of “similar” vertices. In the model we divide a set of vertices into q clusters then add edges at random with probabilities that depend on whether the two vertices joined belong to the same cluster or not. The computational problem is to (approximately) determine the original clusters given the graph structure. I will present a result that determines the threshold edge density at which the problem of detecting the clusters becomes information theoretically possible. The proof vindicates the intricate but non-rigorous “Cavity Method” from statistical physics. Based on joint work with A. Coja-Oghlan, F. Krzakala, L. Zdeborova.

**Room and Time**: Augusta, 11h-12h

## 23 May 2018: **Paul Constantine**, University of Colorado Boulder

** Speaker **: Paul Constantine,
University of Colorado Boulder

** Title **: Tools and techniques for subspace-based parameter space dimension
reduction

**Abstract**:
Scientists and engineers use computer simulations to study relationships between a physical model's input parameters and its
output predictions. However, thorough parameter studies---e.g., constructing response surfaces, optimizing, or averaging---are
challenging, if not impossible, when the simulation is expensive and the model has several inputs. To enable parameter studies
in these instances, the engineer may attempt to reduce the dimension of the model's input parameter space using techniques such as
sensitivity analysis or variable screening to identify unimportant variables that can be fixed for model analysis. Generalizing
classical coordinate-based reduction, there are several emerging subspace-based dimension reduction tools that identify important
directions in the input parameter space with respect to a particular model output. I will motivate and provide an overview of
subspace-based dimension reduction techniques and discuss strategies for exploiting such low-dimensional structures---including
analysis and computation---to enable otherwise infeasible parameter studies. For more information,
see activesubspaces.org.

**Room and Time**: Augusta, 14h-15h

## 21 May 2018: **Volkhan Cevher**, École Polytechnique Fédérale de Lausanne

** Speaker **: Volkhan Cevher,
École Polytechnique Fédérale de Lausanne

** Title **: Trade-offs in resource constrained machine learning systems

**Abstract**: Massive data poses a fundamental challenge to learning algorithms,
which is captured by the following computational dogma: the running time of an
algorithm increases with the size of its input data. The available computational
power, however, is growing slowly relative to data sizes. Hence, large-scale machine
learning problems of interest require increasingly more time to solve.

Our research demonstrates that this dogma is false in general, and supports an emerging perspective in computation: data should be treated as a resource that can be traded off with other resources, such as running time. For data acquisition and communications, we have also shown related sampling, energy, and circuit area trade-offs.

This talk will summarize our work confronting these challenges by building on the new mathematical foundations on how we generate data via sampling, how we set up learning objectives that govern our fundamental goals, and how we optimize these goals to obtain solutions and to make optimal decisions. We then demonstrate task-specific, end-to-end trade-offs (e.g., samples, power, computation, storage, and statistical precision) in broad domains.

**Room and Time**: Augusta, 14h-15h

## 15 May 2018: **Heng Guo**, University of Edinburgh

** Speaker **: Heng Guo,
University of Edinburgh

** Title **: A polynomial-time approximation algorithm for all-terminal network reliability

**Abstract**: All-terminal network reliability is the probability that, in a
undirected graph, assuming each edge fails independently, the remaining graph is
still connected. We will present a fully polynomial-time randomised approximation
scheme (FPRAS) for this problem. Our main contribution is to confirm a conjecture
by Gorodezky and Pak (Random Struct. Algorithms, 2014), that the expected running
time of the "cluster-popping" algorithm in bi-directed graphs is bounded by a
polynomial in the size of the input.

Joint work with Mark Jerrum (QMUL)

**Room and Time**: Augusta, 15h-16h

## 30 Apr 2018: **Matteo Barigozzi**, London School of Economics

** Speaker **: Matteo Barigozzi,
London School of Economics

** Title **: Sequential testing for structural stability in approximate factor models

**Abstract**: We develop an on-line monitoring procedure to detect a change in a
large approximate factor model. Our statistics are based on a well-known property of the
(r + 1)-th eigenvalue of the sample covariance matrix of the data (having defined r as the
number of common factors): whilst under the null the (r + 1)-th eigenvalue is bounded,
under the alternative of a change (either in the loadings, or in the number of factors
itself) it becomes spiked. Given that the sample eigenvalue cannot be estimated consistently
under the null, we regularise the problem by randomising the test statistic in conjunction
with sample conditioning, obtaining a sequence of i.i.d., asymptotically chi-square statistics
which are then employed to build the monitoring scheme. Numerical evidence shows that our
procedure works very well in finite samples, with a very small probability of false detections
and tight detection times in presence of a genuine change-point.

**Room and Time**: Augusta, 14h-15h

## 27 Apr 2018: **Guy Nason**, University of Bristol

** Speaker **: Guy Nason, University of Bristol

** Title **: Network Time Series

**Abstract**:
A network time series is a multivariate time series where the individual series are known to be linked by some underlying network structure. Sometimes this network is known a priori, and sometimes the network has to be inferred, often from the multivariate series itself. Network time series are becoming increasingly common, long, and collected over a large number of variables. We are particularly interested in network time series whose network structure changes over time.

We describe some recent developments in the modeling and analysis of network time series via network autoregressive integrated moving average (NARIMA) process models. NARIMA models provide a network extension to a familiar environment that can be used to extract valuable information and aid prediction. As with classical ARIMA models, trend can impair the estimation of NARIMA parameters. The scope for trend removal is somewhat wider with NARIMA models and we exhibit some possibilities.

We will illustrate the operation of NARIMA modeling on data sets arising from human and veterinary epidemiology.

This is joint work with Kathryn Leeming (Bristol), Marina Knight (York) and Matt Nunes (Lancaster).

**Room and Time**: Augusta, 14h-15h

## 13 Apr 2018: **Hao Chen**, University of California, Davis

** Speaker **: Hao Chen, University
of California, Davis

** Title **: Change-point Detection for Multivariate and Object Data

**Abstract**:
After observing snapshots of a network, can we tell if there has been a change in dynamics? We developed a general
nonparametric framework for change-point detection utilizing similarity among observations. This new approach, which relies
on graph-based tests, can be applied to high-dimensional data, as well as data from non-Euclidean sample spaces,
such as network data. In addition, this framework is combined with a new way of permutation, circular block permutation with
a random starting point, to further deal with local dependence, which is common in such data sequences. Analytic approximations
to the significance of this graph-based scan statistics are derived for
both the independence and local dependence scenarios, facilitating their
applications to real datasets. We illustrate the methods through the
analysis of a phone-call network from the MIT Reality Mining.

**Room and Time**: Augusta, 14h-15h

## 06 Apr 2018: **François-Xavier Briol**, University of Warwick

** Speaker **: François-Xavier Briol, University of Warwick

** Title **: Bayesian Quadrature for Multiple Related Integrals.

**Abstract**: Bayesian probabilistic numerical methods are a set of tools
providing posterior distributions on the output of numerical methods. The use
of these methods is usually motivated by the fact that they can represent our
uncertainty due to incomplete/finite information about the continuous mathematical
problem being approximated. In this work, we demonstrate that this paradigm can
provide additional advantages, such as the possibility of transferring information
between several numerical methods. This allows users to represent uncertainty in
a more faithful manner and, as a by-product, provide increased numerical efficiency.
We propose the first such numerical method by extending the well-known Bayesian
quadrature algorithm to the case where we are interested in computing the integral of several related functions. We then prove convergence rates for the method in the well-specified and misspecified cases, and demonstrate its efficiency in the context of multi-fidelity models for complex engineering systems and a problem of global illumination in computer graphics.

**Room and Time**: Augusta, 14h-15h

## 27 Mar 2018: **Pranay Seshadri**, University of Cambridge

** Speaker **: Pranay Seshadri, University of Cambridge

** Title **: On Dimension Reduction Investigations in Turbomachinery

**Abstract**: Blades in modern jet engines are parameterized by 20-300 design variables. It is impossible to visualize, and more importantly, effectively explore such vast design spaces. Yet such an exploration may be required not solely for optimization and uncertainty quantification endeavors, but more importantly, for understanding the physics that underpins key designs characteristics. For instance, when carrying out an optimization, the designs that are often associated with high efficiency are likely to be different from those associated with high flow capacity and high pressure ratio. Likewise, designs that are insensitive to variations in operational boundary conditions are likely to be different from those that are sensitive under the very same conditions. So what is the physical rationale for different design characteristics and how do we identify them?

Our goal is to be able to identify key linear combinations of designs that influence certain outputs, especially when the total number of design variables is large. These salient concerns motivate an output-based—i.e., efficiency, flow capacity or pressure ratio—dimension reduction strategy. In this talk, a new algorithm for subspace-based dimension reduction is introduced. It combines manifold optimization with classical Gaussian process regression. The fundamental idea we exploit is that for many physics-driven problems, outputs can be expressed as a non-linear function of a few linear combinations of all the variables. There are parallels between this idea (known in the applied mathematics community under the moniker of `ridge functions'), neural networks, active subspaces and classical sufficient dimension reduction methods. We test our algorithm on a range of different data sets where the number of dimensions vary from 2 to 200. Our results demonstrate that if the problem does admit a ridge structure, our technique provides near optimal recovery. Finally, we describe how the computed dimension reducing subspace can be used for optimization, uncertainty quantification and more importantly for gaining physical insight into turbomachinery fluid mechanics.

**Room and Time**: Augusta, 15h-16h

## 27 Mar 2018: **Wei Dai**, Imperial College London

** Speaker **: Wei Dai, Imperial College London

** Title **: Blind Estimation of Origin-Destination Flows in Networks

**Abstract**: This work focuses on the inverse problem of estimating origin-destination (OD) flows from link flows. OD flow estimation is essential to many network analysis tasks, such as urban planning, as OD information specifies the travel demands in the network. Approaches in the literature for OD estimation typically require some prior information of the OD flows as well as the mapping from OD flows to link flows. In particular, a linear model is often used to map OD flows to link flows and the corresponding linear operator is commonly referred to as the traffic assignment. In practice, prior information of the traffic assignment can be inaccurate and incomplete. Furthermore, even with full information of the traffic assignment, the OD flow estimation problem is still ill-posed and therefore further prior information of OD flows is required.

The main contribution of this work is estimating OD flows with no or little prior information. Inspired by the terminology blind deconvolution in signal processing community, we refer to the OD estimation without knowledge of traffic assignment as blind estimation. Different from the literature where linear systems are built on OD flows, a new linear model is introduced of which the inquired flows are specified with only their origins but not their destinations. The new model preserves the OD information and at the same time reduces the dimension of the inquired flows in orders of magnitude. Due to this dimension reduction, the OD flow estimation problem becomes well-posed when the corresponding traffic assignment information is available. More importantly it is possible to estimate the OD flows and the traffic assignment simultaneously with no prior information of them. Detailed discussions are provided regarding the uniqueness of solutions for the simultaneous estimation. Simulations are presented for the demonstration purpose.

**Room and Time**: Augusta, 14h-15h

## 20 Mar 2018: **Jacek Brodzki**, University of Southampton

** Speaker **: Jacek Brodzki, University of Southampton

** Title **: Geometry, Topology, and the structure of data

**Abstract**: Modern data is astonishing in its variety, and is far removed from anything that
could be held in spreadsheet or analysed using traditional methods. Through intense
research effort, topology has emerged as a source of novel methodology to provide
insight into the structure of very complex, high dimensional data. It provided us
with tools like persistent homology, which are used to compute numerical topological
characteristics of the data. More recently, these methods have been augmented by
geometric insights, which are valuable in capturing the structure of and
relationships between complex shapes.

In this talk, I will provide an overview of new techniques from topology and geometry and illustrate them on particular examples. We will discuss applications of persistent homology to the study of the set of CT scans of human lungs. The geometric part of the talk will concern classification of three-dimensional shapes through synchronisation.

**Room and Time**: Augusta, 13h-14h

## 06 Mar 2018: **Coralia Cartis**, University of Oxford

** Speaker **: Coralia Cartis, University of Oxford

** Title **: Complexity of Nonconvex Optimization: The Story So Far

**Abstract**: Establishing the global rate of convergence of standard algorithms or their
global evaluation complexity for nonconvex smooth optimization problems is a natural but
challenging aspect of algorithm analysis. In the last decade, substantial progress has been
made and continues to be made, in this direction. We review some of the key developments,
illustrating the crucial role played by cubic regularisation methods, a credible alternative to
linesearch and trust-region.

We then focus on two recent results. Firstly, we consider a general/new class of adaptive regularization methods, that use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size. We investigate the worst-case evaluation complexity of these algorithms, when the level of sufficient smoothness of the objective and its derivatives may be unknown or may even be absent, and we find that the methods accurately reflect in their complexity the (unknown) degree of smoothness of the objective/derivatives and satisfy increasingly better bounds with the order of the derivatives available. We then focus on the antipodal context, when no derivatives of the objective are available, and the local models in adaptive cubic regularisation methods are constructed for example, by sampling of function values. We show that the evaluation complexity of the ensuing methods do not change in the order of the accuracy from their deterministic counterparts, increasing only by a constant factor which depends on the probability of the sampled models being occasionally, sufficiently accurate.

Time permitting, we will also discuss the stochastic case when even function evaluations may be inaccurate. This work is joint with Nick Gould (Rutherford Appleton Laboratory, UK), Philippe Toint (University of Namur, Belgium) and Katya Scheinberg (Lehigh University, USA).

**Room and Time**: Augusta, 12h-13h

## 06 Mar 2018: **Ruggerio Bellio**, University of Udine

** Speaker **: Ruggero Bellio, University of Udine

** Title **: Modern Likelihood-Frequentist Inference

**Abstract**: This talk illustrates some results of our recent paper Modern Likelihood-Frequentist
Inference, which is an exposition of the development of likelihood asymptotics obtained by many
workers. The exposition is intended to be accessible to a wide audience. It does this by charting a
narrow course through the developments over more than 30 years. This covers the likelihood ratio
approximation to the distribution of the parameter estimator, approximate ancillary conditioning,
transformation to a highly accurate approximation to the likelihood ratio test, and related topics.

The talk focuses in particular on the R package likelihoodAsy,
that accompanies the paper. The package implements some tools for likelihood inference, including in particular
the Skovgaard version of Barndorff-Nielsen's modified signed likelihood ratio test statistic r^{*}.
The main point of this is to approximate the *sample space derivatives*, that held up this
theory for a decade until the Skovgaard breakthrough. The user needs only to provide R code for the likelihood
function, for generating a sample from the model, and for the formulation of the interest parameter.
The latter is allowed to be a rather general function of the model parameter, and this is the most important
feature of the package.

This is a joint work with Donald A. Pierce, Oregon State University.

Pierce, D.A. and Bellio, R. (2017). Modern likelihood-frequentist inference. *International Statistical Review*, **85**, 519-541.

**Room and Time**: Augusta, 11h-12h

## 22 Feb 2018: **Yuji Nakatsukasa**, University of Oxford

** Speaker **: Yuji Nakatsukasa, University of Oxford

** Title **: Global optimization via eigenvalues

**Abstract**: While non-convex optimization problems are generally regarded as
difficult or computationally intractable, the Courant-Fischer theorem for symmetric
eigenvalue problems suggests that each eigenpair corresponds to a stationary point
of a non-convex optimization problem (in this case the Rayleigh quotient). More
generally, matrix eigenvalue problems can be regarded as an important class of
optimization problems that can be solved efficiently. This observation suggests
conversely that perhaps a global solution for some non-convex optimization problems
could be obtained by solving an eigenvalue problem. In this work we identify such
optimization problems, and show that they lead to a variety of eigenvalue problems,
including generalized, polynomial and multiparameter eigenvalue problems, all of
which can be solved reliably and in polynomial time, and essentially non-iteratively.
Specifically, we consider non-convex QCQP (quadratically constrained quadratic
programming) QCQP is known to be NP-hard in general (when the number of constraints
k is not fixed), and for k ≥ 2 fixed, it has been an open problem whether there are
polynomial-time solvable until recently. We show that (i) for QCQP with one
constraint, k = 1, it can be reduced to essentially a single generalized eigenvalue
problem, (ii) for QCQP with two constraints, k = 2, the problem can be reduced to
a two-parameter eigenvalue problem, and (iii) unconstrained optimization problems
with cubic regularization can be reduced to quadratic eigenvalue problems.

Based on joint work with Satoru Adachi, Satoru Iwata, Shinsaku Sakaue and Akiko Takeda (University of Tokyo).

**Room and Time**: Augusta, 14h-15h

## 20 Feb 2018: **Mu Niu**, U of Plymouth + **Pokman Cheung**, Goldman Sachs

** Speakers **: Mu Niu, University of Plymouth and Pokman Cheung, Goldman Sachs

** Title **: Intrinsic Gaussian processes on complex constrained domains

**Abstract**:
We propose a class of intrinsic Gaussian processes (in-GPs) for interpolation
and regression on manifolds with a primary focus on complex constrained domains
or irregular-shaped spaces arising as subsets or submanifolds of R^{2},
R^{3} and beyond.
For example, in-GPs can accommodate spatial domains arising as complex subsets of
Euclidean space. in-GPs respect the potentially complex boundary or interior
conditions as well as the intrinsic geometry of the spaces. The key novelty of the
proposed approach is to utilise the relationship between heat kernels and the
transition density of Brownian motion on manifolds for constructing and
approximating valid and computation- ally feasible covariance kernels.
This enables in-GPs to be practically applied in great generality, while
existing approaches for smoothing on constrained domains are limited to simple
special cases. The broad utilities of the in-GP approach is illustrated through
simulation studies and data examples.

**Room and Time**: Augusta, 14h-15h

## 09 Jan 2018: **Sebastian Stich**, EPFL

** Speaker **: Sebastian Stich, École Polytechnique Fédérale de Lausanne

** Title **: Adaptive importance sampling for convex optimization

**Abstract**: In the last years, randomized optimization algorithms, and especially
coordinate descent methods attracted more and more attention in the optimization
community. These methods often replace the efficient -- but often relatively expensive --
iterations of the classical methods with much cheaper -- but less efficient -- updates.
Consequently, such methods can be applied to problems of very big size. This application
is sometimes also theoretically justified by very attractive worst-case efficiency
guarantees.

In this talk, we will first provide a tutorial-like introduction to random decent methods and review their complexity. We will discuss variable metric or quasi-Newton type of methods [1] and accelerated schemes [2]. In the second part of the talk we further introduce the concept of (adaptive) importance sampling and present a scheme that computes the optimal sampling frequencies based on safe estimates of the gradient [3]. The proposed sampling is provably the best sampling with respect to the given bounds and always better than uniform sampling and fixed importance sampling. We conclude by discussing open problems in this field of research.

[1] Joint work with B. Gartner, C. Muller (Math. Prog. 2016)

[2] Joint work with Y. Nesterov (SIAM J. Opt. 2017)

[3] Joint work with A. Raj, M. Jaggi (NIPS 2017)

**Room and Time**: Augusta, 14h-15h

# (2017)

## 20 Dec 2017: **Nicolas Boumal**, Princeton University

** Speaker **: Nicolas Boumal, Princeton University

** Title **: Near optimal bounds for phase synchronization

**Abstract**: Phase synchronization consists in estimating n phases (unit-modulus complex numbers) given measurements of all relative phases. We assume the measurements are corrupted by i.i.d. Gaussian noise. The maximum likelihood estimator is the solution to a non-convex optimization problem, NP-hard in general. Yet, it has been observed some years ago that the maximum likelihood estimator can be computed in polynomial time through semidefinite relaxation, even in the face of tremendous amounts of noise.

In this talk, I will present proof techniques that allow to explain this observation. This is based on the generalized power method, replicas to reduce statistical dependence, and dual certificates.

Joint work with Yiqiao Zhong, Afonso Bandeira and Amit Singer.

**Room and Time**: Augusta, 11h-12h

## 20 Dec 2017: **Rajen Shah**, University of Cambridge

** Speaker **: Rajen Shah, University of Cambridge

** Title **: The xyz algorithm for fast interaction search in high-dimensional data

**Abstract**: When performing regression on a dataset with p variables, it is often of interest to go beyond using main effects and include interactions as products between individual variables. However, if the number of variables p is large, as is common in genomic datasets, the computational cost of searching through O(p^2) interactions can be prohibitive. In this talk I will introduce a new randomised algorithm called xyz that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in p. The underlying idea is to transform interaction search into a much simpler closest pair problem. We will see how strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires O(p^u) operations for 1 < u < 2 depending on their strength. An application of xyz to a genome-wide association study shows how more than 10^11 interactions can be screened in minutes using a standard laptop. This is joint work with Gian Thanei and Nicolai Meinshausen (ETH Zurich).

**Speaker Bio**: Rajen Shah is a lecturer in Statistics at the Statistical Laboratory, University of Cambridge, having completed his PhD under the supervision of Richard Samworth (also at Cambridge) in 2014. His main research interests are in high-dimensional statistics and large-scale data analysis. Rajen is currently an Associate Editor for the Journal of the Royal Statistical Society Series B, and a Turing Fellow.

**Room and Time**: Augusta, 14h-15h

## 28 Nov 2017: **Adilet Otemissov**, University of Oxford

** Speaker **: Adilet Otemissov, University of Oxford

** Title **: Dimensionality reduction techniques for global optimization.

**Abstract**: **Abstract**: (Joint work with Coralia Cartis) The problem of finding the most
extreme value of a function, also known as global optimization, is a challenging
task. The difficulty is associated with the exponential increase in the computational
time for a linear increase in the dimension. This is known as the ``curse of
dimensionality''. In this talk, we demonstrate that such challenges can be overcome
for functions with low effective dimensionality – functions which are constant along
certain linear subspaces. Such functions can often be found in applications, for
example, in hyper-parameter optimization for neural networks, heuristic algorithms
for combinatorial optimization problems and complex engineering simulations.

We propose the use of random subspace embeddings within a(ny) global minimisation algorithm, extending the approach in Wang et al. (2013). We introduce two techniques (REGO and AREGO) that transform the high-dimensional optimization problem into a low-dimensional problem. REGO is formulated by adding constraints in the low-dimensional embedded space and AREGO by including constraints in the original space. We derive probabilistic bounds on the success of the dimension-reduced problems and present encouraging numerical results.

**Speaker Bio**: Adilet is a second year DPhil Student at the Alan Turing Institute and University of Oxford. He graduated from Nazarbayev University with a BSc in Mathematics and then pursued a master's degree at the University of Manchester. His research interests include high-dimensional non-convex and global optimization.

**Room and Time**: Augusta, 13h-14h

## 21 Nov 2017: **Rodrigo Mendoza Smith**, University of Oxford

** Speaker **: Rodrigo Mendoza Smith, University of Oxford

** Title **: Embarrassingly parallel reduction of boundary matrices in Persistent Homology

**Abstract**: Topological Data Analysis is a relatively new paradigm in
data-science that models datasets as point-clouds sampled from a shape embedded
in an Euclidean space. The inference problem is to estimate the essential
topological features of the underlying shape from a point-cloud sampled from
it. This is done through a technique called persistent homology which is a
mathematical formalism based on algebraic topology. A necessary step in the
persistent homology pipeline is the reduction a so-called boundary matrix,
which is a process reminiscent to Gaussian elimination. In this talk, I present
a number of structural dependencies in boundary matrices and use them to design
a novel embarrassingly parallel algorithm for their reduction. Simulations on
synthetic examples show that the computational burden can be conducted in a
small fraction of the number of iterations needed by traditional methods. For
example, numerical experiments show that for a boundary matrix with 10^4
columns, the reduction completed to within 1% in about ten iterations as
opposed to nearly approximately eight thousand iterations for traditional
methods.

**Speaker Bio**: Rodrigo Mendoza Smith is a DPhil candidate in Applied Mathematics working with Prof. Jared Tanner. He is affiliated with the Numerical Analysis Group in Oxford and with the Alan Turing Institute in London. He is working on the design of provably fast numerical algorithms for sparse and low-rank data models. His research interests lie in numerical computing, compressed-sensing, optimisation, machine-learning, network theory and computational algebraic topology. His motivation and long term goal is to contribute to the understanding of Artificial Intelligence and its interplay with geometry, statistics and computability

**Room and Time**: Augusta, 14h-15h

## 16 Nov 2017: **Guillame Lecue**, CREST + ENSAE

** Speaker **: Guillame Lecue, Center for Research in Economics and Statistics

** Title **: Learning from MOM’s principles

**Abstract**: (Joint work with Matthieu Lerasle)
We obtain theoretical and practical performances for median of means estimators.

From a theoretical point of view, estimation and prediction error bounds achieved by the MOM estimators hold with exponentially large probability -- as in the gaussian framework with independent noise-- under only weak moments assumptions on the data and without assuming independence between the noise and the design. Moreover, MOM procedures are robust since a large part of the data may have nothing to do with the oracle we want to reconstruct. Our general risk bound is of order max(minimax rate of convergence in the i.i.d. setup, (number of outliers)/number of observations)). In particular, the number of outliers may be as large as (number of data)*(minimax rate) without affecting the statistical properties of the MOM estimator.

A regularization norm may also be used together with the MOM criterium. In that case, any norm can be used for regularization. When it has some sparsity inducing power we recover sparse rates of convergence and sparse oracle inequalities. For example, the minimax rate s log(d/s)/N of recovery of a s-sparse vector in R^d is achieved by a median-of-means version of the LASSO when the noise has q_0 moments for some q_0>2, the design matrix C_0\log(d) moments and the dataset is corrupted by s log(d/s) outliers. This result holds with exponentially large probability as if the noise and the design were i.i.d. Gaussian random variables.

On the practical side, MOM estimators (and their associated regularized versions) can easily be implemented. Actually, most gradient descent algorithms used to implement (non-robust) estimators like the LASSO can easily be transformed into a robust one by using a MOM approach.

**Room and Time**: Augusta, 15h-16h

## 14 Nov 2017: **Piotr Fryzlewicz**, London School of Economics

** Speaker **: Piotr Fryzlewicz,
London School of Economics

**Title**: Multiscale thinking in data analysis, recursive algorithms, and data-adaptive change-point detection

**Abstract**: The talk starts on a general note: we first attempt to define a "multiscale" method / algorithm as a recursive program acting on a dataset in a suitable way. Wavelet transformations, unbalanced wavelet transformations and binary segmentation are all examples of multiscale methods in this sense. Using the example of binary segmentation, we illustrate the benefits of the recursive formulation of multiscale algorithms from the software implementation and theoretical tractability viewpoints.

We then turn more specific and study the canonical problem of a-posteriori detection of multiple change-points in the mean of a piecewise-constant signal observed with noise. Even in this simple set-up, many publicly available state-of-the-art methods struggle for certain classes of signals. In particular, this misperformance is observed in methods that work by minimising a "fit to the data plus a penalty" criterion, the reason being that it is challenging to think of a penalty that works well over a wide range of signal classes. To overcome this issue, we propose a new approach whereby methods "learn" from the data as they proceed, and, as a result, operate differently for different signal classes. As an example of this approach, we revisit our earlier change-point detection algorithm, Wild Binary Segmentation, and make it data-adaptive by equipping it with a recursive mechanism for deciding "on the fly" how many sub-samples of the input data to draw, and where to draw them. This is in contrast to the original Wild Binary Segmentation, which is not recursive. We show that this significantly improves the algorithm particularly for signals with frequent change-points.

**Bio**: Piotr Fryzlewicz is a professor of statistics in the Department of Statistics at the London School of Economics. He has previously worked at Winton Capital Management, the University of Bristol and Imperial College London. His research interests are in multiscale modelling and estimation, time series, change-point detection, high-dimensional statistical inference and statistical learning. He currently serves as Joint Editor of the Journal of the Royal Statistical Society Series B.

**Room and Time**: Augusta, 14h-15h

## 07 Nov 2017: **Quentin Berthet**, University of Cambridge

** Speaker **: Quentin Berthet, University of Cambridge

** Title **: Bandit Optimization with Upper-Confidence Frank-Wolfe

**Abstract**: We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we introduce the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We show upper bounds on the optimization error of this algorithm over various classes of functions, and discuss the optimality of these results.
This is joint work with Vianney Perchet (ENS Paris-Saclay and Criteo Research)

**Speaker Bio**: Quentin Berthet is a Turing Fellow and a Lecturer in the Statslab, in the DPMMS at the University of Cambridge, since 2015. He is a former student of the Ecole Polytechnique, received a Ph.D. from Princeton University in 2014, and was a CMI postdoctoral fellow at Caltech. His research interests are in Statistics, Optimization and Computer Science.

**Room and Time**: Augusta, 14h-15h

## 01 Nov 2017: **Yiannis Koutis**, New Jersey Institute of Technology

** Speaker **: Yiannis Koutis,
New Jersey Institute of Technology

**Title**: A survey of provably fast linear system solvers

**Abstract**: The design of fast linear system solvers has been a very active research area for decades. Solvers based on various approaches, from multigrid methods to incomplete matrix factorization techniques have been quite successful in practical engineering problems, but lack fast convergence guarantees for broader classes of unstructured linear systems that are common in modern data science problems. Solvers that are provably fast for broad classes of unstructured linear systems are a fairly recent discovery. We will survey this recent progress, and discuss key enabling facts from the theory of concentration bounds for random sums of positive matrices. We will also discuss known obstacles that stand on the way to progress for general linear systems.

**Bio**: Yiannis Koutis is an associate professor in the Ying Wu College of Computing at the New Jersey Institute of Technology. He completed his Ph.D. in Compute Science at Carnegie Mellon University, and got his undergraduate Diploma in Computer Engineering and Informatics from the University of Patras in Greece. Before NJIT, he was an associate professor at the University of Puerto Rico, Rio Piedras. In the past he has also worked as a Systems Scientist at Carnegie Mellon University.

**Room and Time**: Augusta, 15h-16h

## 31 Oct 2017: **Dino Sejdinovic**, University of Oxford

** Speaker **: Dino Sejdinovic,
University of Oxford

**Title**: Inference with Approximate Kernel Embeddings

**Abstract**: Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric hypothesis testing and for learning on distributional inputs. I will give an overview of this framework and present some of the applications of the approximate kernel embeddings to Bayesian computation. Further, I will discuss a recent modification of MMD which aims to encode invariance to additive symmetric noise and leads to learning on distributions robust to the distributional covariate shift, e.g. where measurement noise on the training data differs from that on the testing data. ArXiv:1703.07596

** Bio**: Dino Sejdinovic is an Associate Professor at the Department of Statistics, University of Oxford, a Fellow of Mansfield College, Oxford, and a Turing Fellow of the Alan Turing Institute. He previously held postdoctoral positions at the Gatsby Computational Neuroscience Unit, University College London and at the Department of Mathematics, University of Bristol. He received a PhD in Electrical and Electronic Engineering from the University of Bristol (2009). His research is at the interface between machine learning and statistical methodology, with a focus on nonparametric and kernel methods.

**Room and Time**: Augusta, 14h-15h

## 17 Oct 2017: **Varun Kanade**, University of Oxford

** Speaker **: Varun Kanade, University of Oxford

** Title **: From which world is your graph?

**Abstract**: Discovering statistical structure from links is a fundamental problem
in the analysis of social networks. Choosing a misspecified model, or
equivalently, an incorrect inference algorithm will result in an invalid
analysis or even falsely uncover patterns that are in fact artifacts of the
model. This work focuses on unifying two of the most widely used link-formation
models: the stochastic blockmodel (SBM) and the small world (or latent space)
model (SWM). Integrating techniques from kernel learning, spectral graph
theory, and nonlinear dimensionality reduction, we develop the first
statistically sound polynomial-time algorithm to discover latent patterns in
sparse graphs for both models. When the network comes from an SBM, the
algorithm outputs a block structure. When it is from an SWM, the algorithm
outputs estimates of each node's latent position. We report experiments
performed on a twitter dataset collected between October and November 2016.
Based on joint work with Cheng Li, Zhenming Liu and Felix Wong.

**Speaker Bio**: Varun Kanade is a Research Lecturer in the Department of Computer Science, Oxford University. He has been a Simons Postdoctoral Fellow at the University of California, Berkeley and an FSMP postdoctoral fellow at ENS, Paris. He obtained his Ph.D. from Harvard University in 2012. His research interests are in machine learning and theoretical computer science and focuses on mathematical foundations of machine learning, in particular computational and statistical tradeoffs and improving our understanding of methods that have been successful in practice.

**Room and Time**: Augusta, 14h-15h

## 09 Oct 2017: **Martin Lotz**, University of Manchester

**Speaker**: Martin Lotz, University of Manchester

**Title**: Randomized dimensionality reduction in persistent homology

**Abstract**: We discuss a few ideas related to dimensionality reduction at the point-cloud level
for the computation of persistent homology. In particular, it is shown that in certain cases, the ambient
dimension of the problem can literally be replaced with an intrinsic notion of dimension related to the
structure of the data, and that such a reduction can be computed efficiently. Some surprising connections
to the theory of suprema of Gaussian processes are presented, and potential extensions discussed.

**Speaker Bio**: Martin Lotz is a Lecturer in Mathematics at the University of Manchester. Previous to his
appointment at Manchester he was a post-doctoral researcher at the City University of Hong Kong, at the University
of Oxford, and as a Research Fellow at the University of Edinburgh, supported by a Leverhulme Trust and Seggie
Brown Fellowship. He studied Mathematics at the ETH (Swiss Federal Institute of Technology) in Zurich, and the
University of Paderborn, where he finished his doctorate in Algebraic Complexity Theory. His interests span a
wide range of mathematics and its applications. His current work explores connections between probability,
optimization, and applications in signal processing, data analysis and inverse problems. Previous work has been
in algebraic complexity theory, computational algebra and geometry, foundations of numerical analysis, and
combinatorics.

**Room and Time**: Augusta, 14h-15h

## 03 Oct 2017: **He Sun**, University of Edinburgh

**Speaker**: He Sun, University of Edinburgh

**Title**: Heat kernels in graphs: A journey from random walks to geometry, and back

**Abstract**: Heat kernels are one of the most fundamental concepts in physics and mathematics.
In physics, the heat kernel is a fundamental solution of the heat equation and connects the Laplacian
operator to the rate of heat dissipation. In spectral geometry, many fundamental techniques are based
on heat kernels. In finite Markov chain theory, heat kernels correspond to continuous-time random walks
and constitute one of the most powerful techniques in estimating the mixing time.

In this talk, we will briefly discuss this line of research and its relation to heat kernels in graphs. In particular, we will see how heat kernels are used to design the first nearly-linear time algorithm for finding clusters in real-world graphs. Some interesting open questions will be addressed in the end.

**Speaker Bio**: He Sun is a senior lecturer in Algorithms and Complexity in the School of
Informatics, University of Edinburgh. Prior to this, he held positions at the Max Planck Institute
for Informatics from 2010 to 2015, and the University of Bristol from 2015 to 2017. His main research
interests range over the fields of spectral graph theory, applied probability and statistics, matrix
analysis, and machine learning.

**Room and Time**: Augusta, 14h-15h

## 29 Sep 2017: **Greg Ongie**, University of Michigan

**Speaker**: Greg Ongie, University of Michigan

**Title**: Learning non-linear models with missing data

**Abstract**: Linear models have had great success for inference problems with missing data.
A key example is low-rank matrix completion, where the missing entries of a matrix are imputed by
exploiting low-dimensional linear structure. However, many real-world datasets fail to have linear
structure but still have low-dimensional non-linear structure that can be exploited. We introduce a
systematic framework for learning low-dimensional non-linear models in problems in missing data.
The approach has close connections to subspace clustering, manifold learning, and kernel methods
in machine learning. As a special case, we focus on the problem of matrix completion where we
assume data belongs to a low-dimensional algebraic variety, i.e., each data point is a solution to
a system of polynomial equations, and study the sampling requirements of this model. We propose an
efficient matrix completion algorithm that is able to recover synthetically generated data up to
the predicted sampling complexity bounds and outperforms traditional low-rank matrix completion
and subspace clustering approaches on real datasets in computer vision and genomics.

**Room and Time**: Enigma, 14h-15h

This talk is part of Greg Ongie's visit via the "Research in small groups" grant for Turing host: Armin Eftekhari.

## 19 Sep 2017: **Richard Samworth**, University of Cambridge

** Speaker **: Richard Samworth, University of Cambridge

** Title **: High dimensional change point estimation via sparse projection

**Abstract**: Change points are a very common feature of big data that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called 'inspect' for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package 'InspectChangepoint'.

**Speaker Bio**: Richard Samworth obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a professor in 2013 and the Professor of Statistical Science in 2017. His main research interests are in nonparametric and high-dimensional statistics, including problems in shape-constrained inference, classification, variable selection and changepoint estimation, among others.

**Room and Time**: Augusta, 14h-15h.