Lecture 11: Numerical optimization
You will learn to solve non-convex, multi-dimensional, dynamic optimization problems using numerical optimization with multistart and nesting (scipy.optimize). You will learn simple function approximation using linear interpolation (scipy.interpolate).
Useful note: Unconstrained Numerical Optimization: An Introduction for Econometricians (by Anders Munk-Nielsen)
from types import SimpleNamespace
import time
import numpy as np
import scipy as sp
from scipy import linalg
from scipy import optimize
from scipy import interpolate
import sympy as sm
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
All optimization problems are characterized by:
- A control vector (choices/decisions), $\boldsymbol{x} \in \mathbb{R}^k$
- An objective function (payoff/utility) to minimize, $f(\boldsymbol{x})$ (differentiable or not)
- Constraints, i.e. $\boldsymbol{x} \in C \subseteq \mathbb{R}^k$ (linear or non-linear interdependence)
Note that $f$ might also take other inputs (parameters or a dataset). But these are fixed and therefore not variables we optimize over.
Maximization is just minimization of $-f(\boldsymbol{x})$.
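As a tiny illustration (with a hypothetical objective g, not from the lecture code), a maximum can be found by minimizing the negative of the objective:
# maximize g by minimizing -g (hypothetical example)
g = lambda x: -(x[0]-2)**2 # concave, maximum at x = 2
result = optimize.minimize(lambda x: -g(x),x0=[0.0])
print(result.x) # close to 2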
All optimizers (minimizers) follow the structure:
- Make initial guess
- Evaluate the function (and perhaps gradients)
- Check for convergence
- Update guess and return to step 2
What is convergence? "Small" change in function value since last iteration (or "zero" gradient).
Characteristics of optimizers:
- Use gradients or not.
- Allow for specifying bounds.
- Allow for specifying general constraints.
Gradients provide useful information, but can be costly to compute (using analytical formula or numerically).
Penalty terms can (sometimes) instead be used to enforce bounds and constraints.
Optimizers that are good to know:
- Nelder-Mead:
    - Pro: Robust (e.g. to noise in the objective function) and does not require derivatives.
    - Con: Slow convergence. No bounds or constraints.
- Newton-CG:
    - Pro: Requires few iterations. Very precise with an analytical hessian for smooth functions.
    - Con: Costly computation of the hessian. No bounds or constraints.
- BFGS: (like Newton, but with a smart approximation of the hessian)
    - Pro: Requires few function evaluations.
    - Con: No bounds or constraints.
- L-BFGS-B: (like BFGS, but allows for bounds)
- SLSQP:
    - Pro: Bounds and constraints in multiple dimensions.
    - Con: Not as efficient as BFGS.
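As a quick preview (a sketch on a simple quadratic, not the lecture's example), all of the above are available through optimize.minimize via the method argument. Newton-CG is omitted here because it additionally requires a gradient:
# compare some of the methods on a simple quadratic with minimum at (1,2)
h = lambda x: (x[0]-1)**2 + (x[1]-2)**2

for method in ['Nelder-Mead','BFGS','L-BFGS-B','SLSQP']:
    result = optimize.minimize(h,x0=[0.0,0.0],method=method)
    print(f'{method:11s}: x = {result.x}, function evaluations = {result.nfev}')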
1.1 Gradient based optimizers
Let us look at the idea behind gradient based optimizers.
One dimensional intuition: Consider the second-order Taylor approximation of $f$ around $x_n$:

$$f_{\text{approx}}(x) = f(x_n) + f^{\prime}(x_n)(x-x_n) + \frac{1}{2}f^{\prime\prime}(x_n)(x-x_n)^2$$

Find the minimum wrt. $x$ by solving the FOC:

$$0 = f^{\prime}(x_n) + f^{\prime\prime}(x_n)(x-x_n) \Leftrightarrow x = x_n - \frac{f^{\prime}(x_n)}{f^{\prime\prime}(x_n)}$$
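A minimal one-dimensional sketch of this update rule (using the hypothetical function $f(x) = x^2 - 2\log(x)$ with minimum at $x = 1$; not part of the lecture code):
# one-dimensional Newton's method: x_new = x - f'(x)/f''(x)
fp = lambda x: 2*x - 2/x    # first derivative of f(x) = x**2 - 2*np.log(x)
fpp = lambda x: 2 + 2/x**2  # second derivative

x = 3.0 # initial guess
for it in range(50):
    x_new = x - fp(x)/fpp(x)
    if abs(x_new-x) < 1e-8: # converged
        break
    x = x_new

print(it,x) # converges to 1 in a handful of iterations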
Algorithm: minimize_newton()

1. Choose tolerance $\epsilon > 0$, guess on $\boldsymbol{x}_0$, compute $f(\boldsymbol{x}_0)$, and set $n = 1$.
2. Compute $\nabla f(\boldsymbol{x}_{n-1})$ (gradient/jacobian) and $\boldsymbol{H}f(\boldsymbol{x}_{n-1})$ (hessian).
3. Compute new guess $\boldsymbol{x}_n = \boldsymbol{x}_{n-1} - [\boldsymbol{H}f(\boldsymbol{x}_{n-1})]^{-1} \nabla f(\boldsymbol{x}_{n-1})$.
4. If $|f(\boldsymbol{x}_n) - f(\boldsymbol{x}_{n-1})| < \epsilon$ then stop.
5. Set $n = n + 1$ and return to step 2.
def minimize_newton(f,x0,jac,hess,max_iter=500,tol=1e-8):
""" minimize function with Newtons' algorithm
Args:
f (callable): function
x0 (np.ndarray): initial values
jac (callable): jacobian
hess (callable): hessian
max_iter (int): maximum number of iterations
tol (float): tolerance
Returns:
x (np.ndarray): minimum
nit (int): number of iterations used
"""
# step 1: initialize
x = x0
fx = f(x0)
nit = 1
# step 2-5: iteration
while nit < max_iter:
x_prev = x
fx_prev = fx
# step 2: evaluate gradient and hessian
jacx = jac(x_prev)
hessx = hess(x_prev)
# step 3: update x
inv_hessx = linalg.inv(hessx)
x = x_prev - inv_hessx@jacx
# step 4: check convergence
fx = f(x)
if abs(fx-fx_prev) < tol:
break
# step 5: increment n
nit += 1
return x,nit
Algorithm: minimize_gradient_descent()

1. Choose tolerance $\epsilon > 0$, potential step sizes $\boldsymbol{\alpha} = \{\alpha_0,\alpha_1,\dots\}$, guess on $\boldsymbol{x}_0$, compute $f(\boldsymbol{x}_0)$, and set $n = 1$.
2. Compute $\nabla f(\boldsymbol{x}_{n-1})$.
3. Find a good step size: $\alpha^{\ast} = \arg\min_{\alpha \in \boldsymbol{\alpha}} f(\boldsymbol{x}_{n-1} - \alpha \nabla f(\boldsymbol{x}_{n-1}))$.
4. Compute new guess: $\boldsymbol{x}_n = \boldsymbol{x}_{n-1} - \alpha^{\ast} \nabla f(\boldsymbol{x}_{n-1})$.
5. If $|f(\boldsymbol{x}_n) - f(\boldsymbol{x}_{n-1})| < \epsilon$ then stop.
6. Set $n = n + 1$ and return to step 2.
def minimize_gradient_descent(f,x0,jac,alphas=[0.01,0.05,0.1,0.25,0.5,1],max_iter=500,tol=1e-8):
""" minimize function with gradient descent
Args:
f (callable): function
x0 (np.ndarray): initial values
jac (callable): jacobian
alphas (list): potential step sizes
max_iter (int): maximum number of iterations
tol (float): tolerance
Returns:
x (np.ndarray): minimum
nit (int): number of iterations used
nfev (int): number of function evaluations used
njev (int): number of jacobian evaluations used
"""
# step 1: initialize
x = x0
fx = f(x0)
nit = 1
nfev = 1
njev = 0
# step 2-6: iteration
while nit < max_iter:
x_prev = x
fx_prev = fx
# step 2: evaluate gradient
jacx = jac(x)
njev += 1
# step 3: find good step size (line search)
fx_ast = np.inf
x_ast = np.nan
alpha_ast = np.nan
for alpha in alphas:
x = x_prev - alpha*jacx
fx = f(x)
nfev += 1
if fx < fx_ast:
alpha_ast = alpha
x_ast = x
fx_ast = fx
# step 4: update guess
x = x_ast # = x_prev - alpha_ast*jacx
# step 5: check convergence
fx = fx_ast # = f(x)
if abs(fx-fx_prev) < tol:
break
# step 6: increment nit
nit += 1
return x,nit,nfev,njev
Many generalizations:
- Use both the hessian and a line search
- Stop the line search as soon as an improvement is found
- Limit attention to a "trust-region"
- Etc.
Consider (a scaled variant of) the Rosenbrock function:

$$f(\boldsymbol{x}) = \frac{1}{2}(1-x_1)^2 + (x_2-x_1^2)^2$$

with jacobian (gradient)

$$\nabla f(\boldsymbol{x}) = \begin{bmatrix} -(1-x_1) - 4x_1(x_2-x_1^2) \\ 2(x_2-x_1^2) \end{bmatrix}$$

and hessian:

$$\boldsymbol{H}f(\boldsymbol{x}) = \begin{bmatrix} 1 - 4x_2 + 12x_1^2 & -4x_1 \\ -4x_1 & 2 \end{bmatrix}$$

The global minimum is at $(x_1,x_2) = (1,1)$ where $f(1,1) = 0$.
Check jacobian and hessian:
x1 = sm.symbols('x_1')
x2 = sm.symbols('x_2')
f = 0.5*(1.0-x1)**2 + (x2-x1**2)**2
Df = sm.Matrix([sm.diff(f,i) for i in [x1,x2]])
Df
Hf = sm.Matrix([[sm.diff(f,i,j) for j in [x1,x2]] for i in [x1,x2]])
Hf
Implementation:
def _rosen(x1,x2):
return 0.5*(1.0-x1)**2+(x2-x1**2)**2
def rosen(x):
return _rosen(x[0],x[1])
def rosen_jac(x):
return np.array([-(1.0-x[0])-4*x[0]*(x[1]-x[0]**2),2*(x[1]-x[0]**2)])
def rosen_hess(x):
return np.array([[1-4*x[1]+12*x[0]**2,-4*x[0]],[-4*x[0],2]])
3D Plot:
# a. grids
x1_vec = np.linspace(-2,2,500)
x2_vec = np.linspace(-2,2,500)
x1_grid,x2_grid = np.meshgrid(x1_vec,x2_vec,indexing='ij')
rosen_grid = _rosen(x1_grid,x2_grid)
# b. main
fig = plt.figure()
ax = fig.add_subplot(1,1,1,projection='3d')
cs = ax.plot_surface(x1_grid,x2_grid,rosen_grid,cmap=cm.jet)
# c. add labels
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.set_zlabel('$u$')
# d. invert xaxis
ax.invert_xaxis()
# e. add colorbar
fig.colorbar(cs);
Contour plot:
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
levels = [1e-6,5*1e-6,1e-5,5*1e-5,1e-4,5*1e-4,1e-3,5*1e-3,1e-2,5*1e-2,1,2,4,6,8,12,16,20]
cs = ax.contour(x1_grid,x2_grid,rosen_grid,levels=levels,cmap=cm.jet)
fig.colorbar(cs);
Newton:
x0 = np.array([5,4])
x,nit = minimize_newton(rosen,x0,rosen_jac,rosen_hess)
print(nit,x,rosen(x))
6 [1. 1.] 0.0
Gradient descent:
x0 = np.array([5,4])
x,nit,nfev,njev = minimize_gradient_descent(rosen,x0,rosen_jac,alphas=[0.01,0.05,0.1,0.25,0.5,1])
print(nit,nfev,njev,x,rosen(x))
173 1039 173 [1.00020519 1.00053964] 3.7750814497569406e-08
Task 1: Can you improve the algorithm by playing around with the alphas?
Task 2: What happens if the search for alpha_ast terminates as soon as an improvement in the function value has been found?
Task 3: Could you implement the algorithm without knowing the jacobian?
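One possible approach to Task 3 (a sketch, not the lecture's intended solution): approximate the jacobian with forward finite differences and pass it to minimize_gradient_descent():
def num_jac(f,x,h=1e-6):
    """ forward finite-difference approximation of the jacobian of f at x """
    fx = f(x)
    jac = np.empty(x.size)
    for i in range(x.size):
        x_h = x.astype(float) # astype makes a copy
        x_h[i] += h # perturb the i'th element
        jac[i] = (f(x_h)-fx)/h
    return jac

x0 = np.array([5,4])
x,nit,nfev,njev = minimize_gradient_descent(rosen,x0,lambda x: num_jac(rosen,x))
print(nit,nfev,njev,x,rosen(x))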
2.1 Scipy minimizers
Preparation I: Function for collecting information while the optimizer is running:
# complicated -> not necessary to understand it
def collect(x):
# globals used to keep track across iterations
global nit # set nit = 0 before calling optimizer
global x0
global x1s
global x2s
global fs
# a. initialize list
if nit == 0:
x1s = [x0[0]]
x2s = [x0[1]]
fs = [rosen(x0)]
# b. append trial values
x1s.append(x[0])
x2s.append(x[1])
fs.append(rosen(x))
# c. increment number of evaluations
nit += 1
Preparation II: Function for plotting the collected information:
# complicated -> not necessary to understand it
def contour():
global nit
global x1s
global x2s
global fs
# a. contour plot
fig = plt.figure(figsize=(10,4))
ax = fig.add_subplot(1,2,1)
levels = [1e-6,5*1e-6,1e-5,5*1e-5,1e-4,5*1e-4,1e-3,5*1e-3,1e-2,5*1e-2,1,2,4,6,8,12,16,20]
cs = ax.contour(x1_grid,x2_grid,rosen_grid,levels=levels,cmap=cm.jet)
fig.colorbar(cs)
ax.plot(x1s,x2s,'-o',ms=4,color='black')
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
# b. function value
ax = fig.add_subplot(1,2,2)
ax.plot(np.arange(nit+1),fs,'-o',ms=4,color='black')
ax.set_xlabel('iteration')
ax.set_ylabel('function value')
Nelder-Mead
nit = 0 # global used in "collect"
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,
method='Nelder-Mead',
callback=collect, # collect() is called after each iteration
options={'disp':True}) # display the results
contour()
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 57
Function evaluations: 105
Note: Does not require a gradient. Slow convergence close to target.
Iterations: How many steps the algorithm has taken.
Function evaluations: Will be higher than the number of iterations, because computing each step requires one or more function evaluations.
We can also print the information in the result object:
print(result)
final_simplex: (array([[0.9999932 , 0.99998507],
[1.00000809, 1.00003102],
[1.00003176, 1.00007406]]), array([2.48761443e-11, 2.53004854e-10, 6.15284951e-10]))
fun: 2.4876144250933352e-11
message: 'Optimization terminated successfully.'
nfev: 105
nit: 57
status: 0
success: True
x: array([0.9999932 , 0.99998507])
We can also access specific information from the result object:
result.nit
57
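Other attributes shown in the printout above can be accessed in the same way:
print(result.x)       # minimizer
print(result.fun)     # function value at the minimizer
print(result.nfev)    # number of function evaluations
print(result.success) # True if the optimizer reports convergence
print(result.message) # human-readable termination message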
Newton (with analytical hessian)
nit = 0 # global used in "collect"
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,jac=rosen_jac,hess=rosen_hess,
method='Newton-CG',
callback=collect,
options={'disp':True})
contour()
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 11
Function evaluations: 12
Gradient evaluations: 12
Hessian evaluations: 11
Note: Smoother and faster.
Newton (with numerical hessian computed by scipy)
nit = 0 # global used in "collect"
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,jac=rosen_jac,
method='Newton-CG',
callback=collect,
options={'disp':True})
contour()
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 11
Function evaluations: 12
Gradient evaluations: 36
Hessian evaluations: 0
Note: Same as above, but gradient evaluations instead of hessian evaluations.
BFGS (with analytical gradient)
nit = 0 # global used in "collect"
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,jac=rosen_jac,
method='BFGS',
callback=collect,
options={'disp':True})
contour()
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 13
Function evaluations: 14
Gradient evaluations: 14
Note: Non-smooth, but fast. Very low number of function evaluations.
BFGS (with numerical gradient computed by scipy)
nit = 0 # global used in "collect"
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0, # no jac= specified
method='BFGS',
callback=collect,
options={'disp':True})
contour()
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 13
Function evaluations: 42
Gradient evaluations: 14
Note: Same as above, but more function evaluations.
L-BFGS-B (with analytical gradient)
nit = 0 # global used in "collect"
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,jac=rosen_jac,
method='L-BFGS-B',
bounds=((-3,3),(-3,3)),
callback=collect,
options={'disp':True})
contour()
SLSQP
nit = 0 # global used in "collect"
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,jac=rosen_jac,
method='SLSQP',
bounds=((-2,2),(-2,2)),
callback=collect,
options={'disp':True})
contour()
Optimization terminated successfully (Exit mode 0)
Current function value: 4.7296908855910356e-09
Iterations: 10
Function evaluations: 13
Gradient evaluations: 10
2.2 Controlling the optimizers
Note: See the settings for each optimizer in the documentation.
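The available options for a given method can also be listed directly from Python with scipy's optimize.show_options convenience function:
# print the option names and their documentation for the BFGS method
optimize.show_options(solver='minimize',method='BFGS')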
We can lower the tolerance:
nit = 0
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,
method='BFGS',
callback=collect,
options={'disp':True,'gtol':1e-8}) # note this
contour()
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 14
Function evaluations: 45
Gradient evaluations: 15
We can change the maximum number of iterations:
nit = 0
x0 = [-1.5,-1]
result = optimize.minimize(rosen,x0,
method='BFGS',
callback=collect,
options={'disp':True,'maxiter':5}) # note this and warning
contour()
Warning: Maximum number of iterations has been exceeded.
Current function value: 0.486266
Iterations: 5
Function evaluations: 18
Gradient evaluations: 6
Question: Can we make the program stop if the optimizer is not sure it has found a minimum?
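One way to do this (a sketch): check the success flag on the result object and raise an error if the optimizer did not report convergence:
# stop the program if the optimizer did not converge
if not result.success:
    raise ValueError(f'optimization failed: {result.message}')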
Consider the Griewank function:

$$f(x_1,x_2) = \frac{x_1^2 + x_2^2}{4000} - \cos(x_1)\cos(x_2) + 1$$

The global minimum of this function is $f(0,0) = 0$. But the function also has a lot of local minima. How do we avoid these?
def griewank(x):
return griewank_(x[0],x[1])
def griewank_(x1,x2):
A = x1**2/4000 + x2**2/4000
B = np.cos(x1/np.sqrt(1))*np.cos(x2/np.sqrt(1))
return A-B+1
Test global minimum:
griewank(np.zeros(2))
0.0
3.1 3D plot
for bound in [600,5,1]:
print(f'In [{-bound},{bound}] x [{-bound},{bound}]:')
# a. grids
x1_vec = np.linspace(-bound,bound,1000)
x2_vec = np.linspace(-bound,bound,1000)
x1_grid_griewank,x2_grid_griewank = np.meshgrid(x1_vec,x2_vec,indexing='ij')
griewank_grid = griewank_(x1_grid_griewank,x2_grid_griewank)
# b. main
fig = plt.figure(figsize=(12,4))
# 3D
ax = fig.add_subplot(1,2,1,projection='3d')
cs = ax.plot_surface(x1_grid_griewank,x2_grid_griewank,griewank_grid,cmap=cm.jet)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
ax.invert_xaxis()
# contour
ax = fig.add_subplot(1,2,2)
cs = ax.contour(x1_vec,x2_vec,griewank_grid,levels=15,cmap=cm.jet)
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$')
plt.show()
In [-600,600] x [-600,600]:
In [-5,5] x [-5,5]:
In [-1,1] x [-1,1]:
3.2 Multi-start
Multi-start: Draw many random starting values:
np.random.seed(1986)
x0s = -600 + 1200*np.random.uniform(size=(5000,2)) # in [-600,600]
xs = np.empty((5000,2))
fs = np.empty(5000)
print(f'min(x0s) = {np.min(x0s):.2f}, max(x0s) = {np.max(x0s):.2f}')
min(x0s) = -599.82, max(x0s) = 599.99
Try to solve with BFGS starting from each of these:
fopt = np.inf
xopt = np.nan
for i,x0 in enumerate(x0s):
# a. optimize
result = optimize.minimize(griewank,x0,method='BFGS',tol=1e-8)
xs[i,:] = result.x
f = result.fun
# b. print first 10 or if better than seen yet
if i < 10 or f < fopt: # plot 10 first or if improving
if f < fopt:
fopt = f
xopt = xs[i,:]
print(f'{i:4d}: x0 = ({x0[0]:7.2f},{x0[1]:7.2f})',end='')
print(f' -> converged at ({xs[i][0]:7.2f},{xs[i][1]:7.2f}) with f = {f:12.8f}')
# best solution
print(f'\nbest solution:\n x = ({xopt[0]:7.2f},{xopt[1]:7.2f}) -> f = {fopt:12.8f}')
0: x0 = ( 82.65,-507.19) -> converged at ( 81.64,-508.68) with f = 66.38903667
1: x0 = ( 130.18, 477.00) -> converged at ( 131.88, 477.28) with f = 61.32846302
2: x0 = ( 53.89, 243.27) -> converged at ( 53.38, 241.78) with f = 15.33462113
3: x0 = (-136.64, 181.99) -> converged at (-138.16, 182.12) with f = 13.07067667
4: x0 = ( 228.06, 262.36) -> converged at ( 229.22, 260.62) with f = 30.13156447
5: x0 = ( 228.22, 368.17) -> converged at ( 229.22, 367.38) with f = 46.90141331
6: x0 = (-259.50, 309.21) -> converged at (-141.30, 279.46) with f = 24.52846525
7: x0 = (-230.91, -74.51) -> converged at (-232.36, -75.36) with f = 14.92523628
8: x0 = ( 63.24, -80.57) -> converged at ( 62.80, -81.64) with f = 2.65359622
9: x0 = ( 93.14, 470.65) -> converged at ( 94.20, 471.00) with f = 57.70816887
25: x0 = ( 56.57,-134.77) -> converged at ( 53.38, -3.14) with f = 0.71518870
57: x0 = ( -28.14, -4.62) -> converged at ( -28.26, -3.14) with f = 0.20222578
297: x0 = ( -3.72, -4.04) -> converged at ( -3.14, -3.14) with f = 0.00493234
2302: x0 = ( 0.28, -2.06) -> converged at ( -0.00, -0.00) with f = 0.00000000
best solution:
x = ( -0.00, -0.00) -> f = 0.00000000
The solver, wrongly, converges to many of the local minima:
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(xs[:,0],xs[:,1])
ax.set_xlabel('$x_1$')
ax.set_ylabel('$x_2$');
fig = plt.figure()
ax = fig.add_subplot(1,2,1)
ax.hist(xs[:,0],bins=50)
ax.set_xlabel('$x_1$');
ax = fig.add_subplot(1,2,2)
ax.hist(xs[:,1],bins=50)
ax.set_xlabel('$x_2$');
3.3 Is there a better solution than multi-start?
In short: No.
Potential improvement: Use information from previous runs to determine where to look next. Fundamental trade-off between:
- Exploitation. Focus on areas where previous evaluations returned low function values.
- Exploration. Focus on completely new areas.
Heuristic: If the same optimum is obtained for many starting values, this is a good sign for it being the global optimum.
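A sketch of one way to use this heuristic (an early-stopping rule, not from the lecture): keep drawing starting values, but stop once the best function value has not improved for K consecutive starts. The choice of K trades off robustness against computation time.
K = 100 # stop after K starts without improvement (illustrative choice)
best_f = np.inf
best_x = None
no_improvement = 0

for i,x0 in enumerate(x0s):
    result = optimize.minimize(griewank,x0,method='BFGS',tol=1e-8)
    if result.fun < best_f - 1e-10: # improvement found
        best_f = result.fun
        best_x = result.x
        no_improvement = 0
    else:
        no_improvement += 1
    if no_improvement >= K:
        break

print(f'stopped after {i+1} starts: f = {best_f:.8f} at x = ({best_x[0]:.2f},{best_x[1]:.2f})')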
Further discussion: Benchmarking Global Optimizers (code)
4.1 In general
Consider the constrained problem:

$$\min_{x_1,x_2,x_3,x_4} x_1 x_4 (x_1 + x_2 + x_3) + x_3$$

subject to

$$x_1 x_2 x_3 x_4 \geq 25$$
$$x_1^2 + x_2^2 + x_3^2 + x_4^2 = 40$$
$$1 \leq x_1, x_2, x_3, x_4 \leq 5$$
Define objective and constraints:
def _objective(x1,x2,x3,x4):
return x1*x4*(x1+x2+x3)+x3
def objective(x):
return _objective(x[0],x[1],x[2],x[3])
def ineq_constraint(x):
return x[0]*x[1]*x[2]*x[3]-25.0 # violated if negative
def eq_constraint(x):
return 40.0 - np.sum(x**2) # must equal zero
# a. setup
bound = (1.0,5.0)
bounds = (bound, bound, bound, bound)
ineq_con = {'type': 'ineq', 'fun': ineq_constraint}
eq_con = {'type': 'eq', 'fun': eq_constraint}
# b. call optimizer
x0 = (40**(1/8),40**(1/8),40**(1/8),40**(1/8)) # initial guess inside the bounds
result = optimize.minimize(objective,x0,
method='SLSQP',
bounds=bounds,
constraints=[ineq_con,eq_con],
options={'disp':True})
print('\nx = ',result.x)
Optimization terminated successfully (Exit mode 0)
Current function value: 17.014017289044375
Iterations: 9
Function evaluations: 46
Gradient evaluations: 9
x = [1. 4.74299968 3.82114992 1.3794083 ]
Alternative: Extend the objective function with a penalty term: guesses outside the allowed bounds and constraints are projected into the allowed region, and a (large) penalty term is added to discourage this. Then solve the resulting problem with an unconstrained solver.
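A minimal sketch of this penalty approach applied to the problem above (the specific penalty form and penalty factor are illustrative choices, not from the lecture):
def objective_with_penalty(x,penalty_factor=1_000):

    # a. project the guess into the box [1,5]^4 and penalize the projection distance
    x_clipped = np.clip(x,1.0,5.0)
    penalty = penalty_factor*np.sum((x-x_clipped)**2)

    # b. penalize violations of the inequality and equality constraints
    penalty += penalty_factor*np.fmax(-ineq_constraint(x_clipped),0.0)**2
    penalty += penalty_factor*eq_constraint(x_clipped)**2

    # c. evaluate the objective at the projected point
    return objective(x_clipped) + penalty

result = optimize.minimize(objective_with_penalty,x0,method='Nelder-Mead',
                           options={'maxiter':10_000,'xatol':1e-8,'fatol':1e-8})
print(result.x) # approximately the constrained solution (the penalty makes it inexact)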
4.2 Economic application
Consider the following consumption-saving problem:

$$V(a_0) = \max_{c_1,c_2,c_3} \sum_{t=1}^{3} \beta^{t-1} \frac{c_t^{1-\rho}}{1-\rho} + \beta^{2} \nu \frac{(a_3 + \kappa)^{1-\rho}}{1-\rho}$$

subject to

$$m_t = (1+r) a_{t-1} + y_t$$
$$c_t \in [0, m_t]$$
$$a_t = m_t - c_t \geq 0$$

where

- $m_t$ is cash-on-hand in period $t$
- $c_t$ is consumption in period $t$
- $a_t = m_t - c_t$ is end-of-period assets in period $t$ (with initial assets $a_0$)
- $y_t$ is income in period $t$
- $\beta \in (0,1)$ is the discount factor
- $r$ is the interest rate
- $\rho > 1$ is the CRRA coefficient
- $\nu > 0$ is the strength of the bequest motive
- $\kappa > 0$ is the degree of luxuriousness in the bequest motive
- $a_t \geq 0$ is a no-borrowing constraint

First order conditions:

- Period 1: If $c_1 < m_1$ then $c_1^{-\rho} = \beta (1+r) c_2^{-\rho}$.
- Period 2: If $c_2 < m_2$ then $c_2^{-\rho} = \beta (1+r) c_3^{-\rho}$.
- Period 3: If $c_3 < m_3$ then $c_3^{-\rho} = \nu (a_3 + \kappa)^{-\rho}$.
A guide to solving such a problem:
- Setup parameters
- Formulate objective function
- Determine how to handle constraints
- Call optimizer
Parameters:
par = SimpleNamespace()
par.a0 = 0.5
par.beta = 0.94
par.r = 0.04
par.rho = 8
par.kappa = 0.5
par.nu = 0.1
par.T = 3
par.y = np.arange(1,par.T+1) # = [1,2,3]
Objective function:
def evaluate(c,par,penalty_factor=10_000):
""" evaluate model and calculate utility and penalty if constraints are not satisfies """
# a. allocate
a = np.zeros(par.T) # end-of-period assets
m = np.zeros(par.T) # cash-on-hand
cb = np.zeros(par.T) # bounded consumption, defined below
# b. bound consumption and penalty
penalty = 0.0
for t in range(par.T): # period-by-period
# i. lagged assets
a_lag = a[t-1] if t > 0 else par.a0
# ii. cash-on-hand
m[t] = (1+par.r)*a_lag + par.y[t]
# iii. bound consumption and add penalty if violated
if c[t] < 0.0: # too low
cb[t] = 0.0
penalty += penalty_factor*np.abs(c[t]-0.0)
elif c[t] > m[t]: # too high
cb[t] = m[t]
penalty += penalty_factor*np.abs(c[t]-m[t])
else: # just fine
cb[t] = c[t]
# iv. end-of-period assets
a[t] = m[t] - cb[t]
# c. utility
total_utility = 0.0
# i. consumption
for t in range(par.T):
discounting = par.beta**t
per_period_utility = cb[t]**(1-par.rho)/(1-par.rho)
total_utility += discounting*per_period_utility
# ii. bequest
discounting = par.beta**(par.T-1)
bequest_utility = par.nu*(a[-1]+par.kappa)**(1-par.rho)/(1-par.rho)
total_utility += discounting*bequest_utility
# d. return
return total_utility,penalty,m,a
def obj(c,par,penalty_factor=10_000):
""" gateway to evaluate() for optimizer """
utility,penalty,_m,_a = evaluate(c,par,penalty_factor)
return -utility + penalty
Solve:
def solve(par,tol=1e-8):
# a. initial guess
x0 = par.a0/par.T*np.ones(par.T)
# b. solve
t0 = time.time()
results = optimize.minimize(obj,x0,args=(par,),method='Nelder-Mead',options={'xatol':tol,'fatol':tol,'maxiter':50_000})
if not results.success:
print(results)
raise ValueError
print(f'solved model in {time.time()-t0:.3f} secs [{results.nit} iterations, {results.nfev} function evaluations]\n')
# c. details
c = results.x
total_utility,penalty,m,a = evaluate(c,par)
assert np.isclose(penalty,0.0)
print(f't = 0: a = {par.a0:.4f}')
for t in range(par.T):
print(f't = {t+1:2d}: y = {par.y[t]:7.4f}, m = {m[t]:7.4f}, c = {c[t]:7.4f}, a = {a[t]:7.4f} ')
print(f'\ntotal utility = {total_utility:.8f} [penalty = {penalty:.4f}]\n')
for t in range(par.T):
if t < par.T-1:
foc_error = c[t]**(-par.rho) - par.beta*(1+par.r)*c[t+1]**(-par.rho)
else:
foc_error = c[t]**(-par.rho) - par.nu*(a[t]+par.kappa)**(-par.rho)
print(f'FOC error in period {t+1:2d}: {foc_error:12.8f}')
solve(par)
solved model in 0.027 secs [306 iterations, 561 function evaluations]
t = 0: a = 0.5000
t = 1: y = 1.0000, m = 1.5200, c = 1.5200, a = 0.0000
t = 2: y = 2.0000, m = 2.0000, c = 2.0000, a = 0.0000
t = 3: y = 3.0000, m = 3.0000, c = 2.0001, a = 0.9999
total utility = -0.01039479 [penalty = 0.0000]
FOC error in period 1: 0.03127676
FOC error in period 2: 0.00008879
FOC error in period 3: 0.00000134
What happens if the income path is reversed?
par.y = list(reversed(par.y))
solve(par)
solved model in 0.017 secs [193 iterations, 354 function evaluations]
t = 0: a = 0.5000
t = 1: y = 3.0000, m = 3.5200, c = 1.9145, a = 1.6055
t = 2: y = 2.0000, m = 3.6698, c = 1.9090, a = 1.7607
t = 3: y = 1.0000, m = 2.8312, c = 1.9036, a = 0.9275
total utility = -0.00540710 [penalty = 0.0000]
FOC error in period 1: 0.00000000
FOC error in period 2: -0.00000000
FOC error in period 3: 0.00000000
Question: Could we easily extend the problem to more periods?
Follow-up question: What is the problem when the number of periods $T$ becomes large?
Central limitation of the problem type considered: no uncertainty/risk.
Intermezzo: To consider dynamic optimization problems, we need to think about interpolation.
Inputs:

- A sorted vector of known points (grid vector), $G = (G_0, G_1, \dots, G_{n-1})$
- A vector of known values at these points, $F = (F_0, F_1, \dots, F_{n-1})$, where $F_i = f(G_i)$
- A new point, $x$

Algorithm: linear_interpolate()

1. Determine $i$ such that $G_i \leq x < G_{i+1}$.
2. Compute the interpolated value by

$$y = F_i + \frac{F_{i+1} - F_i}{G_{i+1} - G_i}(x - G_i)$$

Extrapolation:

- Below: if $x < G_0$, use $i = 0$ (the slope of the first segment).
- Above: if $x > G_{n-1}$, use $i = n-2$ (the slope of the last segment).
def linear_interpolate(G,F,x):
""" linear interpolation (and extrapolation)
Args:
G (np.ndarray): known points
F (np.ndarray): known values
x (float): point to be interpolated
Returns:
y (float): interpolated value
"""
assert len(G) == len(F)
n = len(G)
# a. find index in known points
if x < G[1]: # extrapolation below
i = 0
elif x > G[-2]: # extrapolation above
i = n-2
else: # true interpolation
# search
i = 0
while x >= G[i+1] and i < n-1:
i += 1
assert x >= G[i]
assert x < G[i+1]
# b. interpolate
diff_G = G[i+1]-G[i]
diff_F = F[i+1]-F[i]
slope = diff_F/diff_G
y = F[i] + slope*(x-G[i])
return y
5.1 Example
Consider the following function and known points:
f = lambda x: (x-3)**3 - 3*x**2 + 5*x
G = np.linspace(-5,10,6)
F = f(G)
Simple test:
for x in [-2.3,4.1,7.5,9.1]:
true = f(x)
y = linear_interpolate(G,F,x)
print(f'x = {x:4.1f} -> true = {true:6.1f}, interpolated = {y:6.1f}')
x = -2.3 -> true = -176.2, interpolated = -193.5
x = 4.1 -> true = -28.6, interpolated = -27.7
x = 7.5 -> true = -40.1, interpolated = -24.5
x = 9.1 -> true = 24.1, interpolated = 50.7
Scipy.interpolate: Use the RegularGridInterpolator
# a. construct interpolation function
interp_func = interpolate.RegularGridInterpolator([G],F,
bounds_error=False,
fill_value=None)
# bounds_error=False and fill_value=None allow for extrapolation
# b. interpolate
grid = np.linspace(-7,12,500)
interp_values = interp_func(grid)
# c. evaluate true values
true_values = f(grid)
# d. plot true and interpolated values
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.plot(G,F,'o',label='known points')
ax.plot(grid,true_values,'-',lw=1,label='true function')
ax.plot(grid,interp_values,'-',lw=1,label='interpolated values')
ax.legend(loc='lower right',facecolor='white',frameon=True);
Task: Increase the number of known points and see what happens.
Note:
- Linear interpolation works best when the function does not curve too much.
- Extrapolation is much worse than interpolation.
Multiple dimensions: Same principle, interpolate.RegularGridInterpolator([G1,G2,G3],F).
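A small two-dimensional sketch (with a hypothetical function, not from the lecture): the known values must be arranged on the regular grid, and the interpolator is then called with an array of $(x_1,x_2)$ points:
# a. hypothetical 2D function and regular grid
g2d = lambda x1,x2: (x1-1)**2 + (x2-2)**2
G1 = np.linspace(-5,5,20)
G2 = np.linspace(-5,5,20)
G1_mesh,G2_mesh = np.meshgrid(G1,G2,indexing='ij')
F2d = g2d(G1_mesh,G2_mesh) # shape (len(G1),len(G2))

# b. construct interpolator and evaluate at new points
interp_func_2d = interpolate.RegularGridInterpolator([G1,G2],F2d,
                                                     bounds_error=False,fill_value=None)
points = np.array([[0.5,1.5],[2.1,-0.3]])
print(interp_func_2d(points)) # interpolated values at the two points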
The following subject is hard. But also extremely useful. If you master this, you can solve (almost) all economic models you meet on your way in life.
6.1 Problem formulation
Consider a household living in two periods.
In the second period it gets utility from consuming and leaving a bequest (warm glow),

$$v_2(m_2) = \max_{c_2} \frac{c_2^{1-\rho}}{1-\rho} + \nu \frac{(m_2 - c_2 + \kappa)^{1-\rho}}{1-\rho}$$

subject to

$$a_2 = m_2 - c_2 \geq 0$$

where

- $m_2$ is cash-on-hand
- $c_2 \in [0, m_2]$ is consumption
- $a_2 = m_2 - c_2$ is end-of-period assets (the bequest)
- $\rho > 1$ is the risk aversion coefficient
- $\nu > 0$ is the strength of the bequest motive
- $\kappa > 0$ is the degree of luxuriousness in the bequest motive
- $a_2 \geq 0$ ensures the household cannot die in debt

The value function $v_2(m_2)$ measures the household's value of having cash-on-hand $m_2$ at the beginning of period 2.
def utility(c,par):
return c**(1-par.rho)/(1-par.rho)
def bequest(m,c,par):
return par.nu*(m-c+par.kappa)**(1-par.rho)/(1-par.rho)
def v_last_period(c,m,par):
return utility(c,par) + bequest(m,c,par)
In the first period, the household gets utility from consuming and takes into account that it will also live in the next period, where it receives a stochastic income,

$$v_1(m_1) = \max_{c_1} \frac{c_1^{1-\rho}}{1-\rho} + \beta \mathbb{E}_1\left[v_2(m_2)\right]$$

subject to

$$a_1 = m_1 - c_1 \geq 0$$
$$m_2 = (1+r) a_1 + y_2$$
$$y_2 = \begin{cases} 1 + \Delta & \text{with probability } p \\ 1 - \Delta & \text{with probability } 1-p \end{cases}$$

where

- $m_1$ is cash-on-hand in period 1
- $c_1 \in [0, m_1]$ is consumption in period 1
- $a_1 = m_1 - c_1$ is end-of-period assets in period 1
- $\beta \in (0,1)$ is the discount factor
- $\mathbb{E}_1$ is the expectation operator conditional on information in period 1
- $y_2$ is income in period 2
- $\Delta \in (0,1)$ is the level of income risk (mean-preserving when $p = 0.5$)
- $r$ is the interest rate
- $a_1 \geq 0$ ensures the household cannot borrow
def v(c,m,par,v_plus_interp):
# a. expected value
v_plus = 0.0
for p,y in [(par.p,1+par.Delta),(1.0-par.p,1-par.Delta)]:
# i. next period cash-on-hand
a = m-c
m_plus = (1+par.r)*a + y
# ii. next-period values
v_plus_now = v_plus_interp([m_plus])[0]
# iii. probability weighted sum
v_plus += p*v_plus_now
# b. total value
return utility(c,par) + par.beta*v_plus
Our goal: Find the value functions, $v_1(m_1)$ and $v_2(m_2)$, and the associated consumption functions, $c_1^{\ast}(m_1)$ and $c_2^{\ast}(m_2)$, on some grid for $m$.
6.2 Solve household problem
Choose parameters:
par = SimpleNamespace()
# preferences
par.rho = 8.0
par.nu = 0.1
par.kappa = 0.5
par.beta = 0.94
# return and income
par.r = 0.04
par.p = 0.5
par.Delta = 0.5
# grid
par.Nm = 500 # number of grid points for m
par.m_min = 1e-4 # minimum value for m
par.m_max = 5.0 # maximum value for m
Solve second period:
def solve_last_period(par):
# a. allocate
m_grid = np.linspace(par.m_min,par.m_max,par.Nm)
v_func = np.empty(par.Nm)
c_func = np.empty(par.Nm)
# b. solve
for i,m in enumerate(m_grid):
# i. objective
obj = lambda x: -v_last_period(x[0],m,par)
# ii. optimizer
x0 = m/2 # initial value
result = optimize.minimize(obj,[x0],method='L-BFGS-B',bounds=((1e-8,m),))
# iii. save
v_func[i] = -result.fun
c_func[i] = result.x[0]
return m_grid,v_func,c_func
Solve:
m2_grid,v2_func,c2_func = solve_last_period(par)
# illustration
fig = plt.figure(figsize=(10,4))
ax = fig.add_subplot(1,2,1)
ax.plot(m2_grid,c2_func)
ax.set_xlabel('$m_2$')
ax.set_ylabel('$c_2$')
ax.set_title('consumption function in period 2')
ax = fig.add_subplot(1,2,2)
ax.plot(m2_grid,v2_func)
ax.set_xlabel('$m_2$')
ax.set_ylabel('$v_2$')
ax.set_title('value function in period 2')
ax.set_ylim([-40,1]);
Note: We now solve for the whole consumption function (one optimum for each grid point of $m_2$), rather than a single optimum.
Question: Why is there a kink in the consumption function?
Construct interpolator:
v2_func_interp = interpolate.RegularGridInterpolator([m2_grid],v2_func,bounds_error=False,fill_value=None)
Solve first period:
def solve_single_period(par,v_plus_interp):
# a. allocate
m_grid = np.linspace(par.m_min,par.m_max,par.Nm)
v_func = np.empty(par.Nm)
c_func = np.empty(par.Nm)
# b. solve
for i,m in enumerate(m_grid):
# i. objective
obj = lambda x: -v(x[0],m,par,v_plus_interp)
# ii. solve
x0 = m/2 # initial guess
result = optimize.minimize(obj,[x0],method='L-BFGS-B',bounds=((1e-8,m),))
# iii. save
v_func[i] = -result.fun
c_func[i] = result.x[0]
return m_grid,v_func,c_func
Solve:
m1_grid,v1_func,c1_func = solve_single_period(par,v2_func_interp)
# illustrate
fig = plt.figure(figsize=(10,4))
ax = fig.add_subplot(1,2,1)
ax.plot(m1_grid,c1_func)
ax.set_xlabel('$m_1$')
ax.set_ylabel('$c_1$')
ax.set_title('consumption function in period 1')
ax = fig.add_subplot(1,2,2)
ax.plot(m1_grid,v1_func)
ax.set_xlabel('$m_1$')
ax.set_ylabel('$v_1$')
ax.set_title('value function in period 1')
ax.set_ylim([-40,1]);
Summary: We can collect what we have done in a single function that does the following:
- Solve period 2 (i.e. find $v_2(m_2)$ and $c_2^{\ast}(m_2)$)
- Construct an interpolator of $v_2(m_2)$
- Solve period 1 (i.e. find $v_1(m_1)$ and $c_1^{\ast}(m_1)$)
def solve(par):
# a. solve period 2
m2_grid,v2_func,c2_func = solve_last_period(par)
# b. construct interpolator
v2_func_interp = interpolate.RegularGridInterpolator([m2_grid], v2_func,
bounds_error=False,fill_value=None)
# c. solve period 1
m1_grid,v1_func,c1_func = solve_single_period(par,v2_func_interp)
return m1_grid,c1_func,m2_grid,c2_func
Plot the consumption function in period 1 for various levels of income risk, i.e. various $\Delta$:
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
_Delta = par.Delta
for Delta in [0.25,0.50,0.75]:
par.Delta = Delta
m1_grid,c1_func,m2_grid,c2_func = solve(par)
ax.plot(m1_grid,c1_func,label=rf'$\Delta = {Delta}$')
par.Delta = _Delta # reset
ax.legend(loc='lower right',facecolor='white',frameon=True)
ax.set_xlabel('$m_1$')
ax.set_ylabel('$c_1$')
ax.set_title('consumption function in period 1')
ax.set_xlim([0,2])
ax.set_ylim([0,1.5]);
Main takeaway: The household lowers its consumption when risk increases (such as in a recession). This is called precautionary saving.
Task: Discuss how to extend the consumer problem in some way or another.
6.3 Increasing the number of periods
def solve_many_periods(par):
t0 = time.time()
# a. allocate
m = np.zeros((par.T,par.Nm))
v = np.zeros((par.T,par.Nm))
c = np.zeros((par.T,par.Nm))
# b. iterate
for t in reversed(range(par.T)):
t0_ = time.time()
if t == par.T-1:
m[t,:],v[t,:],c[t,:] = solve_last_period(par)
else:
# i. construct interpolator
v_plus_interp = interpolate.RegularGridInterpolator([m[t+1,:]],v[t+1,:],
bounds_error=False,fill_value=None)
# ii. solve period
m[t,:],v[t,:],c[t,:] = solve_single_period(par,v_plus_interp)
print(f'period {t} solved in {time.time()-t0_:5.1f} secs')
print(f'model solved in {time.time()-t0:5.1f} secs')
return m,c
Solve:
par.T = 10
m_grids,c_funcs = solve_many_periods(par)
period 9 solved in 1.2 secs
period 8 solved in 7.6 secs
period 7 solved in 7.9 secs
period 6 solved in 5.5 secs
period 5 solved in 4.7 secs
period 4 solved in 4.0 secs
period 3 solved in 2.7 secs
period 2 solved in 2.5 secs
period 1 solved in 2.3 secs
period 0 solved in 2.4 secs
model solved in 40.9 secs
Plot:
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
for t in range(par.T):
ax.plot(m_grids[t,:],c_funcs[t,:],label=f'{t}')
ax.legend(loc='lower right',facecolor='white',ncol=3,frameon=True)
ax.set_xlabel('$m$')
ax.set_ylabel('$c$')
ax.set_title('consumption function over time')
ax.set_xlim([0,2])
ax.set_ylim([0,1.5]);
6.4 Simulation
Step 1: Construct interpolators from solution:
c_interps = []
for t in range(par.T):
c_interp = interpolate.RegularGridInterpolator([m_grids[t,:]],c_funcs[t,:],bounds_error=False,fill_value=None)
c_interps.append(c_interp)
Step 2: Draw an initial distribution of lagged assets and simulate forward:
# a. allocate
simN = 10_000
sim_m = np.zeros((par.T,simN))
sim_y = np.zeros((par.T,simN))
sim_c = np.zeros((par.T,simN))
sim_a = np.zeros((par.T,simN))
# b. simulate
a0 = np.random.exponential(size=simN) # arbitrary choice of distribution
for t in range(par.T):
# i. lagged assets
if t == 0:
a_lag = a0
else:
a_lag = sim_a[t-1,:]
# ii. income
p_vec = [par.p,1-par.p]
y_vec = [1+par.Delta,1-par.Delta]
sim_y[t,:] = np.random.choice(y_vec,p=p_vec,size=simN)
# iii. cash-on-hand
sim_m[t] = (1+par.r)*a_lag + sim_y[t,:]
# iv. consumption-saving
sim_c[t,:] = c_interps[t](sim_m[t,:])
sim_a[t,:] = sim_m[t,:]-sim_c[t,:]
Step 3: Plot distributions
for t in range(-1,par.T):
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
bins = np.linspace(0,5,100)
if t == -1:
ax.hist(np.fmin(a0,5),bins=bins)
else:
ax.hist(np.fmin(sim_a[t,:],5),bins=bins)
ax.set_title(f'assets, t = {t+1}')
ax.set_xlabel('$a_t$')
ax.set_ylabel('freq.')
ax.set_xlim([0,5])
Conclusion: You can now solve models with complex heterogeneity and uncertainty, and simulate the implied dynamics. By introducing various policies you can quantify their effect not just for the average, but for the full distribution.
This lecture:
- Solving multidimensional optimization problems with and without gradients (and hessians)
- Using multistart to alleviate problems with local minima (due to non-convexities)
- Using penalty terms to solve constrained optimization problems
- Linear interpolation between known points
- Solving dynamic optimization problems backwards period-by-period
Dynamic optimization: Extremely useful technique. Can handle multiple periods, multiple states and choices, more shocks etc. You can solve general equilibrium models where the households solve such problems.
Master courses:
- Dynamic Programming - Theory, Computation, and Empirical Applications by Bertel Schjerning
- Advanced Macroeconomics: Heterogenous Agent Models by Jeppe Druedahl
- Household Behavior over the Life Cycle by Thomas Høgholm Jørgensen
Next lecture: Canonical Economic Models.