I use stochastic gradient and stochastic approximation algorithms regularly, and I often find it surprisingly difficult to locate basic results on the convergence of these algorithms, despite multiple books written on the subject. For example, consider a convex function $f: \mathbb{R}^n \to \mathbb{R}$. Denote the set of minima of $f$ by $X^*$. If $X^*$ is not empty, then we know that it is a convex set, but we do not know a priori if $X^*$ consists of a single point. Denote the subdifferential of $f$ at $x$, i.e., the set of subgradients of $f$ at $x$, by $\partial f(x)$. A stochastic subgradient algorithm is an algorithm of the form

$$x_{k+1} = x_k - \gamma_k \tilde g_k, \qquad (1)$$

with $\mathbb{E}[\tilde g_k \,|\, \mathcal{F}_k] \in \partial f(x_k)$, for $\mathcal{F}_k = \sigma(x_0, \tilde g_0, \ldots, \tilde g_{k-1})$, and $\{\gamma_k\}_{k \ge 0}$ a sequence of nonnegative stepsizes. We can rewrite $\tilde g_k = g_k + w_k$, for $g_k := \mathbb{E}[\tilde g_k \,|\, \mathcal{F}_k] \in \partial f(x_k)$, where $\{w_k\}$ is a martingale difference sequence with respect to $\{\mathcal{F}_k\}$, i.e., $\mathbb{E}[w_k \,|\, \mathcal{F}_k] = 0$, a.s., for all $k$. We have the following theorem.
Theorem 1 Suppose that the set $X^*$ of minima of $f$ is nonempty, and that the stochastic subgradients satisfy

$$\mathbb{E}[\|g_k + w_k\|^2 \,|\, \mathcal{F}_k] \le C, \text{ a.s., for all } k \ge 0,$$

for some constant $C$. Moreover, assume that $\sum_{k=0}^{\infty} \gamma_k = \infty$, $\sum_{k=0}^{\infty} \gamma_k^2 < \infty$. Then the sequence of iterates $\{x_k\}$ following (1) converges almost surely to some minimum $x^* \in X^*$.
In various papers and books on stochastic approximations, e.g. [1-4], one can find very general theorems on the convergence of algorithms such as (1), including various additional error terms. Still, trying to apply these general theorems to the simple situation considered here, you might find that you need to make additional assumptions, for example that $f$ is continuously differentiable, that $X^*$ is a single point, and/or that you know somehow a priori that the sequence $\{x_k\}$ is bounded almost surely. The assumptions on $f$ and $X^*$ might not hold here (consider for example $f(x) = \max(\|x\| - 1, 0)$, which is not differentiable and whose set of minima is the whole unit ball), and the extra work required to prove that $\{x_k\}$ is bounded is essentially of the same order as proving the theorem from scratch in this case. This is not to say that Theorem 1 cannot be found in the literature. For example, the proof below adapts one found for a somewhat more specific context in [5]. But in my opinion a good book on stochastic approximation algorithms should reference such a basic and widely applicable theorem more clearly. The proof presented here uses the following useful version of the supermartingale convergence theorem, see [6].
Theorem 2 (Supermartingale Convergence Theorem) Let $\{Y_k\}, \{Z_k\}, \{W_k\}$ be three sequences of random variables and let $\{\mathcal{F}_k\}_{k \ge 0}$ be a filtration (i.e., sigma algebras such that $\mathcal{F}_k \subseteq \mathcal{F}_{k+1}$). Suppose that

- The random variables $Y_k$, $Z_k$ and $W_k$ are nonnegative and $\mathcal{F}_k$-measurable.
- For each $k$, we have $\mathbb{E}[Y_{k+1} \,|\, \mathcal{F}_k] \le Y_k - Z_k + W_k$.
- There holds $\sum_{k=0}^{\infty} W_k < \infty$, a.s.

Then, we have $\sum_{k=0}^{\infty} Z_k < \infty$, a.s., and the sequence $\{Y_k\}$ converges to a nonnegative random variable $Y$, with probability $1$.
Proof: (of the stochastic subgradient convergence theorem) For $y \in \mathbb{R}^n$ and $g_k \in \partial f(x_k)$, we have by definition of a subgradient

$$f(y) \ge f(x_k) + g_k^T (y - x_k).$$

Hence, expanding the square in (1),

$$\|x_{k+1} - y\|^2 = \|x_k - y\|^2 - 2 \gamma_k (g_k + w_k)^T (x_k - y) + \gamma_k^2 \|g_k + w_k\|^2,$$

and taking conditional expectations given $\mathcal{F}_k$ (recall $\mathbb{E}[w_k \,|\, \mathcal{F}_k] = 0$) together with the subgradient inequality,

$$\mathbb{E}[\|x_{k+1} - y\|^2 \,|\, \mathcal{F}_k] \le \|x_k - y\|^2 - 2 \gamma_k (f(x_k) - f(y)) + \gamma_k^2 C.$$

This inequality is fundamental to study the progress made by many subgradient algorithms. We use it first by taking $y = x^*$, for some $x^* \in X^*$, to get

$$\mathbb{E}[\|x_{k+1} - x^*\|^2 \,|\, \mathcal{F}_k] \le \|x_k - x^*\|^2 - 2 \gamma_k (f(x_k) - f(x^*)) + \gamma_k^2 C.$$

Now by the supermartingale convergence theorem, applied with $Y_k = \|x_k - x^*\|^2$, $Z_k = 2 \gamma_k (f(x_k) - f(x^*))$ and $W_k = \gamma_k^2 C$, we deduce that $\sum_{k=0}^{\infty} \gamma_k (f(x_k) - f(x^*)) < \infty$ almost surely, and the sequence $\{\|x_k - x^*\|^2\}$ converges almost surely. In particular, the sequence $\{x_k\}$ is bounded almost surely.
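To make the application of Theorem 2 concrete, the following small Monte Carlo experiment (my own illustration; the one-dimensional example $f(x) = |x|$, $x^* = 0$, and all constants are assumptions) checks empirically, by averaging over many independent runs, that the iterates satisfy $\mathbb{E}[Y_{k+1}] \le \mathbb{E}[Y_k - Z_k] + W_k$ with $Y_k = (x_k - x^*)^2$, $Z_k = 2\gamma_k (f(x_k) - f(x^*))$ and $W_k = \gamma_k^2 C$.

```python
import numpy as np

# One-dimensional example: f(x) = |x|, x* = 0, subgradient sign(x),
# noise w_k ~ N(0, sigma^2), so C = E[(g_k + w_k)^2 | F_k] <= 1 + sigma^2.
rng = np.random.default_rng(1)
sigma, n_runs, n_steps = 0.5, 200_000, 30
C = 1.0 + sigma**2
gamma = lambda k: 1.0 / (k + 1)
mc_tol = 0.05  # slack for Monte Carlo error in the sample averages

x = np.full(n_runs, 2.0)  # many independent runs, all started at x_0 = 2
ok = True
for k in range(n_steps):
    Y = x**2                      # Y_k = (x_k - x*)^2
    Z = 2 * gamma(k) * np.abs(x)  # Z_k = 2 gamma_k (f(x_k) - f(x*))
    W = gamma(k) ** 2 * C         # W_k = gamma_k^2 C
    x = x - gamma(k) * (np.sign(x) + sigma * rng.standard_normal(n_runs))
    # Law of large numbers: the sample mean of Y_{k+1} should not exceed
    # the mean of Y_k - Z_k plus W_k, up to Monte Carlo noise.
    ok = ok and float((x**2).mean()) <= float((Y - Z).mean()) + W + mc_tol
print(ok)
```

For $f(x) = |x|$ the fundamental inequality actually holds with equality whenever $x_k \ne 0$, which is why only a small Monte Carlo slack is needed for the check to pass.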
The first conclusion, combined with $\sum_{k} \gamma_k = \infty$, implies immediately that

$$\liminf_{k \to \infty} f(x_k) = f(x^*),$$

and it only remains to show that $\{x_k\}$ truly converges to one of the minima, rather than just to the set $X^*$. For this, take a countable set of points $\{v_i\}_{i \in \mathbb{N}}$ in $X^*$, dense in $X^*$. For each such point $v_i$, we have as above that $\{\|x_k - v_i\|\}$ converges. Hence with probability $1$, all sequences $\{\|x_k - v_i\|\}$ for $i \in \mathbb{N}$ converge. Since the sequence $\{x_k\}$ is bounded, it has at least one limit point $\bar x$, and by continuity of $f$ and the liminf property above, $\bar x$ can be chosen in $X^*$. This limit point must be unique in view of the convergence of all the sequences $\{\|x_k - v_i\|\}$: given $\epsilon > 0$, pick $v_i \in X^*$ with $\|v_i - \bar x\| \le \epsilon$; passing to the limit along a subsequence converging to $\bar x$ gives $\lim_{k} \|x_k - v_i\| = \|\bar x - v_i\| \le \epsilon$, so that $\limsup_{k} \|x_k - \bar x\| \le 2 \epsilon$. Hence $\{x_k\}$ converges to $\bar x \in X^*$.
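To see why this last step matters, i.e., why convergence to a single point rather than merely to the set $X^*$ is not automatic, one can run iteration (1) on a function with a non-singleton solution set, for instance $f(x) = \max(|x| - 1, 0)$ with $X^* = [-1, 1]$ (an example of my own; the stepsizes and noise level below are arbitrary choices): different noise realizations settle on different points of $X^*$.

```python
import numpy as np

def subgrad(x):
    # A subgradient of f(x) = max(|x| - 1, 0): 0 inside [-1, 1], sign(x) outside.
    return 0.0 if abs(x) <= 1.0 else (1.0 if x > 0 else -1.0)

def run(seed, n_iter=20_000, x0=4.0, noise_std=0.3):
    """One realization of x_{k+1} = x_k - gamma_k (g_k + w_k), gamma_k = 1/(k+1)."""
    rng = np.random.default_rng(seed)
    x = x0
    for k in range(n_iter):
        gamma = 1.0 / (k + 1)
        x = x - gamma * (subgrad(x) + noise_std * rng.standard_normal())
    return float(x)

limits = [run(seed) for seed in range(3)]
print(limits)  # three different limit points, all (approximately) in X* = [-1, 1]
```

Each seed yields a different limit inside $[-1, 1]$, consistent with Theorem 1: the iterates converge, but the limit is a random point of $X^*$.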
[1] A. Benveniste, M. Métivier, P. Priouret, Adaptive Algorithms and Stochastic Approximations. Springer, 1990.
[2] H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications. Springer, 2nd Edition, 2003.
[3] J. C. Spall, Introduction to Stochastic Search and Optimization. Wiley-Interscience, 2003.
[4] V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, 2008.
[5] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar, Convex Analysis and Optimization. Athena Scientific, 2003.
[6] J. Neveu, Discrete Parameter Martingales. North-Holland, 1975.