Theory and Algorithms in Data Science

This seminar provides a forum for researchers working in foundational areas of Data Science (and related fields). All talks are held at the Alan Turing Institute.

The organizers are Mihai Cucuringu, Armin Eftekhari, Vidit Nanda and Hemant Tyagi. We represent the Low-dimensional structure and the Topology and geometry research interest groups within the Turing Institute. If you'd like to speak in this seminar, please email Vidit at vnanda@turing.ac.uk.

Upcoming Talks

06 Mar 2018: Ruggero Bellio, University of Udine

Speaker : Ruggero Bellio, University of Udine
Title : Modern Likelihood-Frequentist Inference

Abstract: This talk illustrates some results of our recent paper Modern Likelihood-Frequentist Inference, an exposition of the development of likelihood asymptotics due to many workers. The exposition is intended to be accessible to a wide audience, and achieves this by charting a narrow course through more than 30 years of developments. It covers the likelihood ratio approximation to the distribution of the parameter estimator, approximate ancillary conditioning, transformation to a highly accurate approximation to the likelihood ratio test, and related topics.

The talk focuses in particular on the R package likelihoodAsy, which accompanies the paper. The package implements tools for likelihood inference, including in particular the Skovgaard version of Barndorff-Nielsen's modified signed likelihood ratio test statistic r*. The key ingredient is the approximation of the sample-space derivatives that held up this theory for a decade, until Skovgaard's breakthrough. The user needs only to provide R code for the likelihood function, for generating a sample from the model, and for the formulation of the interest parameter. The latter is allowed to be a rather general function of the model parameter, and this is the most important feature of the package.
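
For intuition, here is a minimal Python sketch (not the package's R interface) of the unmodified signed likelihood root r for a toy exponential model parameterised by its mean; the r* statistic computed by likelihoodAsy adds Skovgaard's correction on top of this quantity.

```python
import numpy as np
from scipy import stats

def loglik(psi, x):
    # Exponential model parameterised by its mean psi
    return -len(x) * np.log(psi) - x.sum() / psi

def signed_root(psi0, x):
    # r(psi0) = sign(psi_hat - psi0) * sqrt(2 * (l(psi_hat) - l(psi0)))
    psi_hat = x.mean()                          # MLE of the mean
    r2 = 2.0 * (loglik(psi_hat, x) - loglik(psi0, x))
    return np.sign(psi_hat - psi0) * np.sqrt(max(r2, 0.0))

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=30)         # simulated data, true mean 2
r = signed_root(1.5, x)
print("r =", r, " two-sided p-value =", 2 * stats.norm.sf(abs(r)))
```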

This is a joint work with Donald A. Pierce, Oregon State University.
Pierce, D.A. and Bellio, R. (2017). Modern likelihood-frequentist inference. International Statistical Review, 85, 519-541.

Room and Time: Augusta, 11h-12h

06 Mar 2018: Coralia Cartis, University of Oxford

Speaker : Coralia Cartis, University of Oxford
Title : TBA

Abstract: TBA

Room and Time: Augusta, 12h-13h

20 Mar 2018: Jacek Brodzki, University of Southampton

Speaker : Jacek Brodzki, University of Southampton
Title : TBA

Abstract: TBA

27 Mar 2018: Wei Dai, Imperial College London

Speaker : Wei Dai, Imperial College London
Title : TBA

Abstract: TBA

Room and Time: Augusta, 14h-15h

27 Mar 2018: Pranay Seshadri, University of Cambridge

Speaker : Pranay Seshadri, University of Cambridge
Title : TBA

Abstract: TBA

Room and Time: Augusta, 15h-16h

03 Apr 2018: François-Xavier Briol, University of Warwick

Speaker : François-Xavier Briol, University of Warwick
Title : TBA

Abstract: TBA

27 Apr 2018: Guy Nason, University of Bristol

Speaker : Guy Nason, University of Bristol
Title : TBA

Abstract: TBA

Room and Time: Augusta, 14h-15h

Past Talks

22 Feb 2018: Yuji Nakatsukasa, University of Oxford

Speaker : Yuji Nakatsukasa, University of Oxford
Title : Global optimization via eigenvalues

Abstract: While non-convex optimization problems are generally regarded as difficult or computationally intractable, the Courant-Fischer theorem for symmetric eigenvalue problems suggests that each eigenpair corresponds to a stationary point of a non-convex optimization problem (in this case the Rayleigh quotient). More generally, matrix eigenvalue problems can be regarded as an important class of optimization problems that can be solved efficiently. This observation suggests conversely that perhaps a global solution for some non-convex optimization problems could be obtained by solving an eigenvalue problem. In this work we identify such optimization problems, and show that they lead to a variety of eigenvalue problems, including generalized, polynomial and multiparameter eigenvalue problems, all of which can be solved reliably and in polynomial time, and essentially non-iteratively. Specifically, we consider non-convex QCQP (quadratically constrained quadratic programming). QCQP is known to be NP-hard in general (when the number of constraints k is not fixed), and for fixed k ≥ 2 it had until recently been an open problem whether it is polynomial-time solvable. We show that (i) QCQP with one constraint, k = 1, can be reduced to essentially a single generalized eigenvalue problem, (ii) for QCQP with two constraints, k = 2, the problem can be reduced to a two-parameter eigenvalue problem, and (iii) unconstrained optimization problems with cubic regularization can be reduced to quadratic eigenvalue problems.
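
The opening observation is easy to check numerically. The sketch below (an illustration, not the paper's algorithm) minimises the non-convex Rayleigh quotient with a generic local solver: every eigenpair is a stationary point, and only the smallest eigenvalue yields a local minimum, so descent typically recovers it.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
A = (A + A.T) / 2                               # symmetric test matrix

rayleigh = lambda x: (x @ A @ x) / (x @ x)      # non-convex on R^8 \ {0}

res = minimize(rayleigh, rng.standard_normal(8))    # generic local descent
print("local solver found :", res.fun)
print("smallest eigenvalue:", np.linalg.eigvalsh(A)[0])
```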

Based on joint work with Satoru Adachi, Satoru Iwata, Shinsaku Sakaue and Akiko Takeda (University of Tokyo).

Room and Time: Augusta, 14h-15h

20 Feb 2018: Mu Niu, University of Plymouth + Pokman Cheung, Goldman Sachs

Speakers : Mu Niu, University of Plymouth and Pokman Cheung, Goldman Sachs
Title : Intrinsic Gaussian processes on complex constrained domains

Abstract: We propose a class of intrinsic Gaussian processes (in-GPs) for interpolation and regression on manifolds, with a primary focus on complex constrained domains or irregularly shaped spaces arising as subsets or submanifolds of R2, R3 and beyond. For example, in-GPs can accommodate spatial domains arising as complex subsets of Euclidean space. in-GPs respect the potentially complex boundary or interior conditions as well as the intrinsic geometry of the spaces. The key novelty of the proposed approach is to utilise the relationship between heat kernels and the transition density of Brownian motion on manifolds for constructing and approximating valid and computationally feasible covariance kernels. This enables in-GPs to be applied in great generality, while existing approaches for smoothing on constrained domains are limited to simple special cases. The broad utility of the in-GP approach is illustrated through simulation studies and data examples.
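
A minimal sketch of the underlying idea, assuming a toy notched grid domain and a graph-Laplacian approximation of the heat kernel (the paper works with Brownian motion on general constrained domains and manifolds):

```python
import numpy as np
from scipy.linalg import expm

# Toy domain: a 10x10 grid with a notch removed (points with 3<=i<=6, j<=5)
pts = np.array([(i, j) for i in range(10) for j in range(10)
                if not (3 <= i <= 6 and j <= 5)], dtype=float)
n = len(pts)

# Graph Laplacian of the 4-neighbour graph on the remaining grid points
D2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
W = (D2 <= 1.0 + 1e-9).astype(float) - np.eye(n)
L = np.diag(W.sum(1)) - W

K = expm(-5.0 * L)     # heat-kernel covariance at diffusion time t = 5

# (2,0) and (7,0) are close in R^2 but far apart within the domain, since any
# path must go around the notch; (2,0) and (2,5) are equally close in R^2
i = np.where((pts == [2, 0]).all(1))[0][0]
j = np.where((pts == [7, 0]).all(1))[0][0]
k = np.where((pts == [2, 5]).all(1))[0][0]
print("across the notch:", K[i, j], "  within one side:", K[i, k])
```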

Room and Time: Augusta, 14h-15h

09 Jan 2018: Sebastian Stich, EPFL

Speaker : Sebastian Stich, École Polytechnique Fédérale de Lausanne
Title : Adaptive importance sampling for convex optimization

Abstract: In recent years, randomized optimization algorithms, and especially coordinate descent methods, have attracted more and more attention in the optimization community. These methods often replace the efficient -- but relatively expensive -- iterations of classical methods with much cheaper -- but less efficient -- updates, and can consequently be applied to very large problems. This is sometimes also theoretically justified by very attractive worst-case efficiency guarantees.

In this talk, we will first provide a tutorial-like introduction to random descent methods and review their complexity. We will discuss variable-metric or quasi-Newton-type methods [1] and accelerated schemes [2]. In the second part of the talk we further introduce the concept of (adaptive) importance sampling and present a scheme that computes the optimal sampling frequencies based on safe estimates of the gradient [3]. The proposed sampling is provably the best sampling with respect to the given bounds and always better than uniform sampling and fixed importance sampling. We conclude by discussing open problems in this field of research.
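
A toy Python sketch of adaptive importance sampling for coordinate descent; for simplicity it samples coordinates in proportion to exact gradient magnitudes, whereas the scheme in [3] works with safe upper and lower estimates of them:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)                 # positive-definite quadratic
b = rng.standard_normal(n)

f = lambda z: 0.5 * z @ A @ z - b @ z
x, grad = np.zeros(n), -b               # gradient of f at x = 0
for _ in range(2000):
    p = np.abs(grad) / np.abs(grad).sum()    # adaptive sampling frequencies
    i = rng.choice(n, p=p)
    step = grad[i] / A[i, i]            # exact minimisation along coordinate i
    x[i] -= step
    grad -= step * A[:, i]              # cheap rank-one gradient update

xstar = np.linalg.solve(A, b)
print("suboptimality:", f(x) - f(xstar))
```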

[1] Joint work with B. Gärtner, C. Müller (Math. Prog. 2016)
[2] Joint work with Y. Nesterov (SIAM J. Opt. 2017)
[3] Joint work with A. Raj, M. Jaggi (NIPS 2017)

Room and Time: Augusta, 14h-15h

(2017)

20 Dec 2017: Nicolas Boumal, Princeton University

Speaker : Nicolas Boumal, Princeton University
Title : Near optimal bounds for phase synchronization

Abstract: Phase synchronization consists in estimating n phases (unit-modulus complex numbers) given measurements of all relative phases. We assume the measurements are corrupted by i.i.d. Gaussian noise. The maximum likelihood estimator is the solution to a non-convex optimization problem, NP-hard in general. Yet, it was observed some years ago that the maximum likelihood estimator can be computed in polynomial time through semidefinite relaxation, even in the face of tremendous amounts of noise.

In this talk, I will present proof techniques that allow us to explain this observation. This is based on the generalized power method, replicas to reduce statistical dependence, and dual certificates.
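
A short sketch of the generalized power method on synthetic data (random initialisation is used here for brevity; analyses typically initialise with a leading eigenvector):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 100, 0.5
z = np.exp(1j * rng.uniform(0, 2 * np.pi, n))        # ground-truth phases
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = np.outer(z, z.conj()) + sigma * (N + N.conj().T) / 2   # noisy relative phases

x = np.exp(1j * rng.uniform(0, 2 * np.pi, n))        # random initial phases
for _ in range(50):
    w = C @ x
    x = w / np.abs(w)                  # entrywise projection to unit modulus

print("alignment:", abs(np.vdot(x, z)) / n)          # 1.0 = perfect recovery
```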

Joint work with Yiqiao Zhong, Afonso Bandeira and Amit Singer.

Room and Time: Augusta, 11h-12h

20 Dec 2017: Rajen Shah, University of Cambridge

Speaker : Rajen Shah, University of Cambridge
Title : The xyz algorithm for fast interaction search in high-dimensional data

Abstract: When performing regression on a dataset with p variables, it is often of interest to go beyond using main effects and include interactions as products between individual variables. However, if the number of variables p is large, as is common in genomic datasets, the computational cost of searching through O(p^2) interactions can be prohibitive. In this talk I will introduce a new randomised algorithm called xyz that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in p. The underlying idea is to transform interaction search into a much simpler closest pair problem. We will see how strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires O(p^u) operations for 1 < u < 2 depending on their strength. An application of xyz to a genome-wide association study shows how more than 10^11 interactions can be screened in minutes using a standard laptop. This is joint work with Gian Thanei and Nicolai Meinshausen (ETH Zurich).
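
A toy sketch of the closest-pair idea, under the assumption of ±1-valued data with a single planted interaction (the actual xyz implementation is considerably more refined):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(4)
n, p = 500, 200
X = rng.choice([-1.0, 1.0], size=(n, p))
y = X[:, 3] * X[:, 17]                  # planted interaction between 3 and 17

# Interaction (j, k) is strong iff columns X[:, j] and Z[:, k] agree often,
# so strong interactions become close pairs in Hamming distance
Z = X * y[:, None]

candidates = set()
for _ in range(20):                     # repeat with fresh random row subsets
    rows = rng.choice(n, size=12, replace=False)
    buckets = defaultdict(lambda: ([], []))
    for j in range(p):
        buckets[X[rows, j].tobytes()][0].append(j)
        buckets[Z[rows, j].tobytes()][1].append(j)
    for js, ks in buckets.values():     # identical patterns -> candidate pair
        candidates.update((j, k) for j in js for k in ks if j != k)

# Verify candidates on the full sample; only true interactions survive
strong = [(j, k) for j, k in candidates
          if np.mean(X[:, j] * Z[:, k] > 0) > 0.9]
print(sorted(strong))                   # [(3, 17), (17, 3)]
```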

Speaker Bio: Rajen Shah is a lecturer in Statistics at the Statistical Laboratory, University of Cambridge, having completed his PhD under the supervision of Richard Samworth (also at Cambridge) in 2014. His main research interests are in high-dimensional statistics and large-scale data analysis. Rajen is currently an Associate Editor for the Journal of the Royal Statistical Society Series B, and a Turing Fellow.

Room and Time: Augusta, 14h-15h

28 Nov 2017: Adilet Otemissov, University of Oxford

Speaker : Adilet Otemissov, University of Oxford
Title : Dimensionality reduction techniques for global optimization

Abstract: (Joint work with Coralia Cartis) The problem of finding the most extreme value of a function, also known as global optimization, is a challenging task. The difficulty is associated with the exponential increase in computational time for a linear increase in dimension, known as the "curse of dimensionality". In this talk, we demonstrate that such challenges can be overcome for functions with low effective dimensionality – functions which are constant along certain linear subspaces. Such functions can often be found in applications, for example in hyper-parameter optimization for neural networks, heuristic algorithms for combinatorial optimization problems and complex engineering simulations.

We propose the use of random subspace embeddings within a(ny) global minimisation algorithm, extending the approach in Wang et al. (2013). We introduce two techniques (REGO and AREGO) that transform the high-dimensional optimization problem into a low-dimensional problem. REGO is formulated by adding constraints in the low-dimensional embedded space and AREGO by including constraints in the original space. We derive probabilistic bounds on the success of the dimension-reduced problems and present encouraging numerical results.
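
The embedding trick itself fits in a few lines. The sketch below is a REMBO-style reduction in the spirit of Wang et al. (2013); REGO and AREGO add the constraint handling described above:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
D, d = 100, 2                           # ambient vs effective dimension

# f is 100-dimensional but constant along 98 directions
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

A = rng.standard_normal((D, d))         # random subspace embedding
g = lambda y: f(A @ y)                  # reduced problem over y in R^d

res = minimize(g, np.zeros(d))          # plug in any (global) solver here
print("reduced-space minimum:", res.fun)   # ~0 with probability one
```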

Speaker Bio: Adilet is a second-year DPhil student at the Alan Turing Institute and the University of Oxford. He graduated from Nazarbayev University with a BSc in Mathematics and then pursued a master's degree at the University of Manchester. His research interests include high-dimensional non-convex and global optimization.

Room and Time: Augusta, 13h-14h

21 Nov 2017: Rodrigo Mendoza Smith, University of Oxford

Speaker : Rodrigo Mendoza Smith, University of Oxford
Title : Embarrassingly parallel reduction of boundary matrices in Persistent Homology

Abstract: Topological Data Analysis is a relatively new paradigm in data science that models datasets as point clouds sampled from a shape embedded in a Euclidean space. The inference problem is to estimate the essential topological features of the underlying shape from a point cloud sampled from it. This is done through a technique called persistent homology, a mathematical formalism based on algebraic topology. A necessary step in the persistent homology pipeline is the reduction of a so-called boundary matrix, a process reminiscent of Gaussian elimination. In this talk, I present a number of structural dependencies in boundary matrices and use them to design a novel embarrassingly parallel algorithm for their reduction. Simulations on synthetic examples show that the reduction can be completed in a small fraction of the number of iterations needed by traditional methods. For example, numerical experiments show that for a boundary matrix with 10^4 columns, the reduction completed to within 1% in about ten iterations, as opposed to approximately eight thousand iterations for traditional methods.
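
For context, here is a hedged sketch of the standard sequential column reduction that the talk's parallel algorithm accelerates, run on the boundary matrix of a hollow triangle:

```python
import numpy as np

def low(col):
    # Row index of the lowest nonzero entry of a column (-1 if the column is 0)
    nz = np.flatnonzero(col)
    return nz[-1] if len(nz) else -1

def reduce_boundary(R):
    # Standard left-to-right column reduction over Z/2: while a column shares
    # its lowest nonzero with an earlier reduced column, add that column to it
    R = R.copy() % 2
    lows = {}
    for j in range(R.shape[1]):
        while low(R[:, j]) in lows:
            R[:, j] = (R[:, j] + R[:, lows[low(R[:, j])]]) % 2
        if low(R[:, j]) != -1:
            lows[low(R[:, j])] = j
    return R

# Boundary matrix of a hollow triangle: simplices 0-2 are vertices,
# 3-5 are the edges {0,1}, {1,2}, {0,2}
D = np.zeros((6, 6), dtype=int)
D[0, 3] = D[1, 3] = 1
D[1, 4] = D[2, 4] = 1
D[0, 5] = D[2, 5] = 1
print(reduce_boundary(D))     # column 5 reduces to zero: a 1-cycle is born
```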

Speaker Bio: Rodrigo Mendoza Smith is a DPhil candidate in Applied Mathematics working with Prof. Jared Tanner. He is affiliated with the Numerical Analysis Group in Oxford and with the Alan Turing Institute in London. He works on the design of provably fast numerical algorithms for sparse and low-rank data models. His research interests lie in numerical computing, compressed sensing, optimisation, machine learning, network theory and computational algebraic topology. His motivation and long-term goal is to contribute to the understanding of Artificial Intelligence and its interplay with geometry, statistics and computability.

Room and Time: Augusta, 14h-15h

16 Nov 2017: Guillaume Lecué, CREST + ENSAE

Speaker : Guillaume Lecué, Center for Research in Economics and Statistics
Title : Learning from MOM’s principles

Abstract: (Joint work with Matthieu Lerasle) We obtain theoretical and practical performance guarantees for median-of-means (MOM) estimators.

From a theoretical point of view, estimation and prediction error bounds achieved by the MOM estimators hold with exponentially large probability -- as in the Gaussian framework with independent noise -- under only weak moment assumptions on the data and without assuming independence between the noise and the design. Moreover, MOM procedures are robust, since a large part of the data may have nothing to do with the oracle we want to reconstruct. Our general risk bound is of order max(minimax rate of convergence in the i.i.d. setup, (number of outliers)/(number of observations)). In particular, the number of outliers may be as large as (number of data)*(minimax rate) without affecting the statistical properties of the MOM estimator.

A regularization norm may also be used together with the MOM criterion. In that case, any norm can be used for regularization. When it has some sparsity-inducing power, we recover sparse rates of convergence and sparse oracle inequalities. For example, the minimax rate s log(d/s)/N of recovery of an s-sparse vector in R^d is achieved by a median-of-means version of the LASSO when the noise has q_0 moments for some q_0 > 2, the design matrix has C_0 log(d) moments, and the dataset is corrupted by s log(d/s) outliers. This result holds with exponentially large probability, as if the noise and the design were i.i.d. Gaussian random variables.

On the practical side, MOM estimators (and their associated regularized versions) can easily be implemented. In fact, most gradient descent algorithms used to implement (non-robust) estimators like the LASSO can easily be transformed into robust ones by using a MOM approach.
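
The basic estimator is simple to implement; here is a sketch for the mean (the regression and regularized variants follow the same block-median pattern):

```python
import numpy as np

def median_of_means(x, n_blocks):
    # Partition the data into blocks, average each block, take the median;
    # outliers can only corrupt the blocks they land in
    rng = np.random.default_rng(0)
    blocks = np.array_split(rng.permutation(x), n_blocks)
    return np.median([b.mean() for b in blocks])

rng = np.random.default_rng(6)
x = rng.standard_t(df=2.1, size=1000)   # heavy tails, true mean 0
x[:20] = 1e6                            # 20 gross outliers
print("empirical mean :", x.mean())
print("median of means:", median_of_means(x, n_blocks=51))
```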

Room and Time: Augusta, 15h-16h

14 Nov 2017: Piotr Fryzlewicz, London School of Economics

Speaker : Piotr Fryzlewicz, London School of Economics

Title: Multiscale thinking in data analysis, recursive algorithms, and data-adaptive change-point detection

Abstract: The talk starts on a general note: we first attempt to define a "multiscale" method / algorithm as a recursive program acting on a dataset in a suitable way. Wavelet transformations, unbalanced wavelet transformations and binary segmentation are all examples of multiscale methods in this sense. Using the example of binary segmentation, we illustrate the benefits of the recursive formulation of multiscale algorithms from the software implementation and theoretical tractability viewpoints.

We then turn more specific and study the canonical problem of a-posteriori detection of multiple change-points in the mean of a piecewise-constant signal observed with noise. Even in this simple set-up, many publicly available state-of-the-art methods struggle for certain classes of signals. In particular, this misperformance is observed in methods that work by minimising a "fit to the data plus a penalty" criterion, the reason being that it is challenging to think of a penalty that works well over a wide range of signal classes. To overcome this issue, we propose a new approach whereby methods "learn" from the data as they proceed, and, as a result, operate differently for different signal classes. As an example of this approach, we revisit our earlier change-point detection algorithm, Wild Binary Segmentation, and make it data-adaptive by equipping it with a recursive mechanism for deciding "on the fly" how many sub-samples of the input data to draw, and where to draw them. This is in contrast to the original Wild Binary Segmentation, which is not recursive. We show that this significantly improves the algorithm particularly for signals with frequent change-points.
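
For concreteness, a sketch of plain binary segmentation driven by the CUSUM statistic; Wild Binary Segmentation additionally draws random sub-intervals before locating each split, and the talk's data-adaptive variant decides those draws recursively:

```python
import numpy as np

def cusum(x):
    # |CUSUM| statistic at each candidate split point of x
    n, b = len(x), np.arange(1, len(x))
    left = np.cumsum(x)[:-1]
    return np.sqrt(b * (n - b) / n) * np.abs(left / b - (x.sum() - left) / (n - b))

def binseg(x, s, e, thresh, found):
    # Recurse: split at the CUSUM maximiser if it clears the threshold
    if e - s < 2:
        return
    stat = cusum(x[s:e])
    b = int(np.argmax(stat))
    if stat[b] > thresh:
        found.append(s + b + 1)
        binseg(x, s, s + b + 1, thresh, found)
        binseg(x, s + b + 1, e, thresh, found)

rng = np.random.default_rng(7)
x = np.concatenate([np.zeros(100), 2 * np.ones(100), -np.ones(100)])
x += rng.standard_normal(300)
cps = []
binseg(x, 0, len(x), 4.0, cps)
print(sorted(cps))                      # approximately [100, 200]
```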

Bio: Piotr Fryzlewicz is a professor of statistics in the Department of Statistics at the London School of Economics. He has previously worked at Winton Capital Management, the University of Bristol and Imperial College London. His research interests are in multiscale modelling and estimation, time series, change-point detection, high-dimensional statistical inference and statistical learning. He currently serves as Joint Editor of the Journal of the Royal Statistical Society Series B.

Room and Time: Augusta, 14h-15h

07 Nov 2017: Quentin Berthet, University of Cambridge

Speaker : Quentin Berthet, University of Cambridge
Title : Bandit Optimization with Upper-Confidence Frank-Wolfe

Abstract: We consider the problem of bandit optimization, inspired by stochastic optimization and online learning problems with bandit feedback. In this problem, the objective is to minimize a global loss function of all the actions, not necessarily a cumulative loss. This framework allows us to study a very general class of problems, with applications in statistics, machine learning, and other fields. To solve this problem, we introduce the Upper-Confidence Frank-Wolfe algorithm, inspired by techniques for bandits and convex optimization. We show upper bounds on the optimization error of this algorithm over various classes of functions, and discuss the optimality of these results. This is joint work with Vianney Perchet (ENS Paris-Saclay and Criteo Research).
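
A toy rendition of the algorithm's structure, assuming Bernoulli arm feedback and the global loss f(p) = ||p - mu||^2/2 over the simplex (a simplified setting chosen for illustration, not the paper's general framework):

```python
import numpy as np

rng = np.random.default_rng(8)
K, T = 5, 20000
mu = np.array([0.15, 0.30, 0.10, 0.25, 0.20])   # unknown arm means
f = lambda p: 0.5 * ((p - mu) ** 2).sum()       # global loss of action profile

counts = np.ones(K)                             # one initial pull per arm
means = rng.binomial(1, mu).astype(float)       # Bernoulli feedback
p = np.ones(K) / K
for t in range(1, T):
    lcb = (p - means) - np.sqrt(2 * np.log(t + 1) / counts)  # optimistic gradient
    i = int(np.argmin(lcb))                     # Frank-Wolfe vertex = one arm
    means[i] += (rng.binomial(1, mu[i]) - means[i]) / (counts[i] + 1)
    counts[i] += 1
    p += (np.eye(K)[i] - p) / (t + 1)           # FW update with step 1/(t+1)
print("final loss:", f(p))                      # optimum 0 at p = mu
```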

Speaker Bio: Quentin Berthet has been a Turing Fellow and a Lecturer in the Statslab (DPMMS) at the University of Cambridge since 2015. He is a former student of the École Polytechnique, received his Ph.D. from Princeton University in 2014, and was a CMI postdoctoral fellow at Caltech. His research interests are in statistics, optimization and computer science.

Room and Time: Augusta, 14h-15h

01 Nov 2017: Yiannis Koutis, New Jersey Institute of Technology

Speaker : Yiannis Koutis, New Jersey Institute of Technology
Title: A survey of provably fast linear system solvers

Abstract: The design of fast linear system solvers has been a very active research area for decades. Solvers based on various approaches, from multigrid methods to incomplete matrix factorization techniques, have been quite successful in practical engineering problems, but lack fast convergence guarantees for broader classes of unstructured linear systems that are common in modern data science problems. Solvers that are provably fast for broad classes of unstructured linear systems are a fairly recent discovery. We will survey this recent progress, and discuss key enabling facts from the theory of concentration bounds for random sums of positive matrices. We will also discuss known obstacles that stand in the way of progress for general linear systems.

Bio: Yiannis Koutis is an associate professor in the Ying Wu College of Computing at the New Jersey Institute of Technology. He completed his Ph.D. in Computer Science at Carnegie Mellon University, and received his undergraduate Diploma in Computer Engineering and Informatics from the University of Patras in Greece. Before NJIT, he was an associate professor at the University of Puerto Rico, Rio Piedras. In the past he has also worked as a Systems Scientist at Carnegie Mellon University.

Room and Time: Augusta, 15h-16h

31 Oct 2017: Dino Sejdinovic, University of Oxford

Speaker : Dino Sejdinovic, University of Oxford
Title: Inference with Approximate Kernel Embeddings

Abstract: Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric hypothesis testing and for learning on distributional inputs. I will give an overview of this framework and present some of the applications of the approximate kernel embeddings to Bayesian computation. Further, I will discuss a recent modification of MMD which aims to encode invariance to additive symmetric noise and leads to learning on distributions robust to the distributional covariate shift, e.g. where measurement noise on the training data differs from that on the testing data. arXiv:1703.07596
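
A compact sketch of the (biased) V-statistic estimate of squared MMD with a Gaussian kernel:

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    # Biased (V-statistic) estimate of squared MMD with the Gaussian kernel
    # k(a, b) = exp(-gamma * ||a - b||^2)
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(9)
X = rng.standard_normal((200, 2))
Y = rng.standard_normal((200, 2)) + 0.5          # mean-shifted distribution
print("MMD^2, different laws:", mmd2(X, Y))
print("MMD^2, same law      :", mmd2(X, rng.standard_normal((200, 2))))
```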

Bio: Dino Sejdinovic is an Associate Professor at the Department of Statistics, University of Oxford, a Fellow of Mansfield College, Oxford, and a Turing Fellow of the Alan Turing Institute. He previously held postdoctoral positions at the Gatsby Computational Neuroscience Unit, University College London and at the Department of Mathematics, University of Bristol. He received a PhD in Electrical and Electronic Engineering from the University of Bristol (2009). His research is at the interface between machine learning and statistical methodology, with a focus on nonparametric and kernel methods.

Room and Time: Augusta, 14h-15h

17 Oct 2017: Varun Kanade, University of Oxford

Speaker : Varun Kanade, University of Oxford
Title : From which world is your graph?

Abstract: Discovering statistical structure from links is a fundamental problem in the analysis of social networks. Choosing a misspecified model, or equivalently, an incorrect inference algorithm will result in an invalid analysis or even falsely uncover patterns that are in fact artifacts of the model. This work focuses on unifying two of the most widely used link-formation models: the stochastic blockmodel (SBM) and the small world (or latent space) model (SWM). Integrating techniques from kernel learning, spectral graph theory, and nonlinear dimensionality reduction, we develop the first statistically sound polynomial-time algorithm to discover latent patterns in sparse graphs for both models. When the network comes from an SBM, the algorithm outputs a block structure. When it is from an SWM, the algorithm outputs estimates of each node's latent position. We report experiments performed on a Twitter dataset collected between October and November 2016. Based on joint work with Cheng Li, Zhenming Liu and Felix Wong.
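
As a flavour of the spectral ingredients, here is a toy sketch recovering the blocks of a dense two-block SBM (the talk's algorithm handles sparse graphs and the small-world model as well):

```python
import numpy as np

rng = np.random.default_rng(14)
n = 400                                   # two equal blocks of 200
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], 0.10, 0.02)
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                               # symmetric simple graph

# For a two-block SBM the second-largest adjacency eigenvector splits blocks
vecs = np.linalg.eigh(A)[1]
guess = (vecs[:, -2] > 0).astype(int)
acc = max((guess == labels).mean(), (guess != labels).mean())
print("block recovery accuracy:", acc)    # close to 1
```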

Speaker Bio: Varun Kanade is a Research Lecturer in the Department of Computer Science, Oxford University. He has been a Simons Postdoctoral Fellow at the University of California, Berkeley and an FSMP postdoctoral fellow at ENS, Paris. He obtained his Ph.D. from Harvard University in 2012. His research interests are in machine learning and theoretical computer science, focusing on the mathematical foundations of machine learning, in particular computational and statistical tradeoffs, and on improving our understanding of methods that have been successful in practice.

Room and Time: Augusta, 14h-15h

09 Oct 2017: Martin Lotz, University of Manchester

Speaker: Martin Lotz, University of Manchester
Title: Randomized dimensionality reduction in persistent homology

Abstract: We discuss a few ideas related to dimensionality reduction at the point-cloud level for the computation of persistent homology. In particular, it is shown that in certain cases, the ambient dimension of the problem can literally be replaced with an intrinsic notion of dimension related to the structure of the data, and that such a reduction can be computed efficiently. Some surprising connections to the theory of suprema of Gaussian processes are presented, and potential extensions discussed.
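
The point-cloud-level reduction can be illustrated with a plain random projection, which approximately preserves the pairwise distances that Vietoris-Rips persistence is built from (a generic sketch; the talk's intrinsic-dimension results are sharper):

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(10)
n, D, d = 300, 1000, 20
X = rng.standard_normal((n, D))          # point cloud in ambient dimension D

# Random projection at the point-cloud level: pairwise distances (and hence
# the Vietoris-Rips filtration) are approximately preserved in dimension d
P = rng.standard_normal((D, d)) / np.sqrt(d)
Y = X @ P

ratios = pdist(Y) / pdist(X)
print("distance distortion range:", ratios.min(), ratios.max())
```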

Speaker Bio: Martin Lotz is a Lecturer in Mathematics at the University of Manchester. Prior to his appointment at Manchester, he was a postdoctoral researcher at the City University of Hong Kong and the University of Oxford, and a Research Fellow at the University of Edinburgh, supported by a Leverhulme Trust and Seggie Brown Fellowship. He studied Mathematics at the ETH (Swiss Federal Institute of Technology) in Zurich and the University of Paderborn, where he completed his doctorate in Algebraic Complexity Theory. His interests span a wide range of mathematics and its applications. His current work explores connections between probability, optimization, and applications in signal processing, data analysis and inverse problems. Previous work has been in algebraic complexity theory, computational algebra and geometry, foundations of numerical analysis, and combinatorics.

Room and Time: Augusta, 14h-15h

03 Oct 2017: He Sun, University of Edinburgh

Speaker: He Sun, University of Edinburgh
Title: Heat kernels in graphs: A journey from random walks to geometry, and back

Abstract: Heat kernels are one of the most fundamental concepts in physics and mathematics. In physics, the heat kernel is a fundamental solution of the heat equation and connects the Laplacian operator to the rate of heat dissipation. In spectral geometry, many fundamental techniques are based on heat kernels. In finite Markov chain theory, heat kernels correspond to continuous-time random walks and constitute one of the most powerful techniques in estimating the mixing time.

In this talk, we will briefly discuss this line of research and its relation to heat kernels in graphs. In particular, we will see how heat kernels are used to design the first nearly-linear time algorithm for finding clusters in real-world graphs. Some interesting open questions will be addressed in the end.
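
A small illustration of why heat kernels find clusters: on a toy two-block graph, heat started inside one block hardly crosses the bottleneck edge.

```python
import numpy as np
from scipy.linalg import expm

# Toy graph: two 10-cliques joined by a single bridge edge
n = 20
A = np.zeros((n, n))
A[:10, :10] = A[10:, 10:] = 1.0
np.fill_diagonal(A, 0.0)
A[9, 10] = A[10, 9] = 1.0

L = np.diag(A.sum(1)) - A
H = expm(-0.5 * L)                  # heat kernel H_t = exp(-tL) at t = 0.5

heat = H[0]                         # heat distribution started at vertex 0
print("mass in own cluster :", heat[:10].sum())
print("mass across bridge  :", heat[10:].sum())
```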

Speaker Bio: He Sun is a senior lecturer in Algorithms and Complexity in the School of Informatics, University of Edinburgh. Prior to this, he held positions at the Max Planck Institute for Informatics from 2010 to 2015, and the University of Bristol from 2015 to 2017. His main research interests range over the fields of spectral graph theory, applied probability and statistics, matrix analysis, and machine learning.

Room and Time: Augusta, 14h-15h

29 Sep 2017: Greg Ongie, University of Michigan

Speaker: Greg Ongie, University of Michigan
Title: Learning non-linear models with missing data

Abstract: Linear models have had great success for inference problems with missing data. A key example is low-rank matrix completion, where the missing entries of a matrix are imputed by exploiting low-dimensional linear structure. However, many real-world datasets fail to have linear structure but still have low-dimensional non-linear structure that can be exploited. We introduce a systematic framework for learning low-dimensional non-linear models in problems with missing data. The approach has close connections to subspace clustering, manifold learning, and kernel methods in machine learning. As a special case, we focus on the problem of matrix completion where we assume data belongs to a low-dimensional algebraic variety, i.e., each data point is a solution to a system of polynomial equations, and study the sampling requirements of this model. We propose an efficient matrix completion algorithm that is able to recover synthetically generated data up to the predicted sampling complexity bounds and outperforms traditional low-rank matrix completion and subspace clustering approaches on real datasets in computer vision and genomics.
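
The key structural fact is easy to see numerically: points on a variety give a rank-deficient matrix after a monomial lift. A sketch on the unit circle:

```python
import numpy as np

rng = np.random.default_rng(12)
theta = rng.uniform(0, 2 * np.pi, 50)
X = np.stack([np.cos(theta), np.sin(theta)])    # 2 x 50 points on x^2 + y^2 = 1

def lift(X):
    # Degree-2 monomial lift of each column: [1, x, y, x^2, xy, y^2]
    x, y = X
    return np.stack([np.ones_like(x), x, y, x ** 2, x * y, y ** 2])

# The defining equation x^2 + y^2 - 1 = 0 is a linear relation among the
# monomials, so the lifted matrix drops rank; this is the structure that
# variety-based matrix completion exploits on partially observed data
print("rank of X      :", np.linalg.matrix_rank(X))        # 2 (full)
print("rank of lift(X):", np.linalg.matrix_rank(lift(X)))  # 5 of 6
```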

Room and Time: Enigma, 14h-15h

This talk is part of Greg Ongie's visit via the "Research in small groups" grant; Turing host: Armin Eftekhari.

19 Sep 2017: Richard Samworth, University of Cambridge

Speaker : Richard Samworth, University of Cambridge
Title : High dimensional change point estimation via sparse projection

Abstract: Change points are a very common feature of big data that arrive in the form of a data stream. We study high dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the co-ordinates. The challenge is to borrow strength across the co-ordinates to detect smaller changes than could be observed in any individual component series. We propose a two-stage procedure called 'inspect' for estimation of the change points: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimization problem derived from the cumulative sum transformation of the time series. We then apply an existing univariate change point estimation algorithm to the projected series. Our theory provides strong guarantees on both the number of estimated change points and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data-generating mechanisms. Software implementing the methodology is available in the R package 'InspectChangepoint'.
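
A stripped-down sketch of the two-stage procedure, with the convex optimization step replaced by a plain SVD of the CUSUM matrix for brevity (the 'inspect' package uses the sparsity-inducing relaxation described above):

```python
import numpy as np

def cusum_matrix(X):
    # Column b: per-coordinate scaled difference between the mean of the
    # first b observations and the mean of the remaining n - b
    p, n = X.shape
    b = np.arange(1, n)
    left = np.cumsum(X, axis=1)[:, :-1]
    tot = X.sum(axis=1, keepdims=True)
    return np.sqrt(b * (n - b) / n) * (left / b - (tot - left) / (n - b))

rng = np.random.default_rng(13)
p, n = 100, 200
X = rng.standard_normal((p, n))
X[:5, 120:] += 1.0                    # sparse mean change at time 120

T = cusum_matrix(X)
u = np.linalg.svd(T)[0][:, 0]         # leading left singular vector as the
                                      # projection direction (plain SVD here)
z = u @ X                             # projected univariate series
print(np.argmax(np.abs(cusum_matrix(z[None, :]))) + 1)   # close to 120
```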

Speaker Bio: Richard Samworth obtained his PhD in Statistics from the University of Cambridge in 2004, and has remained in Cambridge since, becoming a professor in 2013 and the Professor of Statistical Science in 2017. His main research interests are in nonparametric and high-dimensional statistics, including problems in shape-constrained inference, classification, variable selection and changepoint estimation, among others.

Room and Time: Augusta, 14h-15h