[1]
import numpy as np

Linear regression

Consider the following linear equation:

yi=β0+β1x1,i+β2x2,i+ϵiy_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \epsilon_i

Assume you have access to data of the independent variables (x1,ix_{1,i}, x2,ix_{2,i}) and the dependent variable (yiy_i) for NN individuals, where ii indexes individuals. The variable ϵi\epsilon_i is a mean-zero stochastic shock.

Assume the data generating process is given by:

[2]
def DGP(N):
    """ data generating process
    
    Args:
        
        N (int): number of observations
        
    Returns:
    
        x1 (ndarray): independent variable x1
        x2 (ndarray): independent variable x2
        y (ndarray): dependent varialbe y
        
    """
    
    # a. independent variables
    x1 = np.random.normal(0,1,size=N)
    x2 = np.random.normal(0,1,size=N)
    
    # b. errors
    eps = np.random.normal(0,1,size=N)
    
    extreme = np.random.uniform(0,1,size=N)
    eps[extreme < 0.05] += np.random.normal(-5,1,size=N)[extreme < 0.05]
    eps[extreme > 0.95] += np.random.normal(5,1,size=N)[extreme > 0.95]
    
    # c. dependent variable
    y = 0.1 + 0.3*x1 + 0.5*x2 + eps
    
    return x1, x2, y

The data you have access to is:

[3]
np.random.seed(2020)
x1,x2,y = DGP(10000)

Question 1: Estimate the vector of coefficients β=(β0,β1,β2)\mathbf{\beta} = (\beta_0,\beta_1,\beta_2) using ordinary least squares (OLS) implemented with matrix algebra by

β^=(XX)1Xy\hat{\mathbf{\beta}} = (\mathbf{X}^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{y}

where X\mathbf{X}^{\prime} is the transpose of X\mathbf{X} and

KaTeX parse error: Expected '}', got '\cr' at position 15: \pmatrix{ y_1 \̲c̲r̲ y_2 \cr \vdot…

Question 2: Construct a 3D plot, where the data is plotted as scattered points, and the prediction of the model is given by the plane

y^i=β^0+β^1x1,i+β^2x2,i\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1,i} + \hat{\beta}_2 x_{2,i}

Question 3: Esimtate the vector of coefficients β=(β0,β1,β2)\mathbf{\beta} = (\beta_0,\beta_1,\beta_2) using a numerical solver to solve the ordinary least square problem, shown below, directly. Compare your results with the matrix algebra results.

minβi=1N(yi(β0+β1x1,i+β2x2,i))2\min_{\mathbf{\beta}} \sum^N_{i=1} (y_i - (\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i}) )^2

Question 4: Estimate the vector of coefficients β=(β0,β1,β2)\mathbf{\beta} = (\beta_0,\beta_1,\beta_2) using least absolute deviations (LAD) using a numerical solver to solve the following problem directly:

minβi=1Nyi(β0+β1x1,i+β2x2,i)\min_{\beta} \sum^N_{i=1} |y_i - (\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i}) |

where z|z| is the absolute value of zz.

Question 5: Set N=50N = 50. Repeat the estimation using the OLS and LAD methods K=5000K=5000 times, drawing a new random sample from the data generating process each time. Compare the estimates from each method using histograms. Which method do you prefer? Explain your choice.

Durable purchases

Consider a household living in two periods.

In the second period it gets utility from non-durable consumption, cc, and durable consumption, d+χxd+\chi x:

v2(m2,d)=maxc(cα(d+χx)1α)1ρ1ρs.t.x=m2cc[0,m2]\begin{aligned} v_{2}(m_{2},d)&= \max_{c}\frac{(c^{\alpha}(d+\chi x)^{1-\alpha})^{1-\rho}}{1-\rho}\\ \text{s.t.} \\ x &= m_{2}-c \\ c &\in [0,m_{2}] \end{aligned}

where

  • m2m_2 is cash-on-hand in the beginning of period 2
  • cc is non-durable consumption
  • dd is pre-commited durable consumption
  • x=m2cx = m_2 - c is extra durable consumption
  • ρ>1\rho > 1 is the risk aversion coefficient
  • α(0,1)\alpha \in (0,1) is the utility weight on non-durable consumption
  • χ(0,1)\chi \in (0,1) implies that extra durable consumption is less valuable than pre-comitted durable consumption
  • the second constraint ensures the household cannot die in debt

The value function v2(m2,d)v_2(m_2,d) measures the household's value of having m2m_2 at the beginning of period 2 with precomitted durable consumption of dd. The optimal choice of non-durable consumption is denoted c(m2,d)c^{\ast}(m_2,d). The optimal extra durable consumption function is x(m2,d)=m2c(m2,d)x^{\ast}(m_2,d) = m_2-c^{\ast}(m_2,d).

Define the so-called end-of-period 1 value function as:

w(a,d)βE1[v2(m2,d)]\begin{aligned} w(a,d)&\equiv\beta\mathbb{E}_{1}\left[v_2(m_2,d)\right] \end{aligned}

where

m2=(1+r)a+yy={1Δwith prob. 131with prob. 131+Δwith prob. 13\begin{aligned} m_2&= (1+r)a+y \\ y &= \begin{cases} 1-\Delta & \text{with prob. }\frac{1}{3}\\ 1 & \text{with prob. }\frac{1}{3}\\ 1+\Delta & \text{with prob. }\frac{1}{3} \end{cases}\\ \end{aligned}

and

  • aa is assets at the end of period 1
  • β>0\beta > 0 is the discount factor
  • E1\mathbb{E}_1 is the expectation operator conditional on information in period 1
  • yy is income in period 2
  • Δ(0,1)\Delta \in (0,1) is the level of income risk (mean-preserving)
  • rr is the return on savings

In the first period, the household chooses it's pre-comitted level of durable consumption for the next-period,

v1(m1)=maxdw(a,d)s.t.a=m1dd[0,m1]\begin{aligned} v_{1}(m_{1})&=\max_{d} w(a,d)\\&\text{s.t.}&\\ a&= m_{1}-d \\ d&\in [0,m_{1}]\\ \end{aligned}

where m1m_1 is cash-on-hand in period 1. The second constraint ensures the household cannot borrow. The value function v1(m1)v_1(m_1) measures the household's value of having m1m_1 at the beginning of period 1. The optimal choice of pre-committed durable consumption is denoted d(m1)d^{\ast}(m_1).

The parameters and grids for m1m_1, m2m_2 and dd should be:

[4]
# a. parameters
rho = 2
alpha = 0.8
chi = 0.9
beta = 0.96
r = 0.04
Delta = 0.25

# b. grids
m1_vec = np.linspace(1e-8,10,100)
m2_vec = np.linspace(1e-8,10,100)
d_vec = np.linspace(1e-8,5,100)

Question 1: Find and plot the functions v2(m2,d)v_{2}(m_{2},d), c(m2,d)c^{\ast}(m_2,d), and x(m2,d)x^{\ast}(m_2,d). Comment.

Question 2: Find and plot the functions v1(m1)v_{1}(m_{1}) and d(m1)d^{\ast}(m_1). Comment.

Hint: For interpolation of v2(m2,d)v_2(m_2,d) consider using interpolate.RegularGridInterpolator([GRID-VECTOR1,GRID-VECTOR2],VALUE-MATRIX,bounds_error=False,fill_value=None).

Next, consider an extension of the model, where there is also a period 0. In this period, the household makes a choice whether to stick with the level of durables it has, z=0z = 0, or adjust its stock of durables, z=1z = 1. If adjusting, the household loses a part of the value of its durable stock; more specificaly it incurs a proportional loss of Λ(0,1)\Lambda \in (0,1).

Mathematically, the household problem in period 0 is:

v0(m0,d0)=maxz{0,1}{w(m0,d0)if z=0v1(m0+(1Λ)d0)if z=1\begin{aligned} v_{0}(m_{0},d_{0}) &= \max_{z\in\{0,1\}} \begin{cases} w(m_{0},d_{0}) & \text{if } z = 0\\ v_1(m_0+(1-\Lambda) d_{0}) & \text{if } z = 1\\ \end{cases}\\ \end{aligned}

The parameters and grids for m0m_0 and d0d_0 should be:

[5]
Lambda = 0.2
m0_vec = np.linspace(1e-8,6,100)
d0_vec = np.linspace(1e-8,3,100)

Question 3: For which values of m0m_0 and d0d_0 is the optimal choice not to adjust, i.e. z=0z = 0? Show this in a plot. Give an interpretion of your results.

Gradient descent

Let x=[x1x2]\boldsymbol{x} = \left[\begin{array}{c} x_1 \\ x_2\\ \end{array}\right] be a two-dimensional vector. Consider the following algorithm:

Algorithm: gradient_descent()

Goal: Minimize the function f(x)f(\boldsymbol{x}).

  1. Choose a tolerance ϵ>0\epsilon>0, a scale factor Θ>0\Theta > 0, and a small number Δ>0\Delta > 0
  2. Guess on x0\boldsymbol{x}_0 and set n=1n=1
  3. Compute a numerical approximation of the jacobian for ff by

    f(xn1)1Δ[f(xn1+[Δ0])f(xn1)f(xn1+[0Δ])f(xn1)]\nabla f(\boldsymbol{x}_{n-1}) \approx \frac{1}{\Delta}\left[\begin{array}{c} f\left(\boldsymbol{x}_{n-1}+\left[\begin{array}{c} \Delta\\ 0 \end{array}\right]\right)-f(\boldsymbol{x}_{n-1})\\ f\left(\boldsymbol{x}_{n-1}+\left[\begin{array}{c} 0\\ \Delta \end{array}\right]\right)-f(\boldsymbol{x}_{n-1}) \end{array}\right]
  4. Stop if the maximum element in f(xn1)|\nabla f(\boldsymbol{x}_{n-1})| is less than ϵ\epsilon

  5. Set θ=Θ\theta = \Theta
  6. Compute fnθ=f(xn1θf(xn1))f^{\theta}_{n} = f(\boldsymbol{x}_{n-1} - \theta \nabla f(\boldsymbol{x}_{n-1}))
  7. If fnθ<f(xn1)f^{\theta}_{n} < f(\boldsymbol{x}_{n-1}) continue to step 9
  8. Set θ=θ2\theta = \frac{\theta}{2} and return to step 6
  9. Set xn=xn1θf(xn1)x_{n} = x_{n-1} - \theta \nabla f(\boldsymbol{x}_{n-1})
  10. Set n=n+1n = n + 1 and return to step 3

Question: Implement the algorithm above such that the code below can run.

Optimizer function:

[6]
def gradient_descent(f,x0,epsilon=1e-6,Theta=0.1,Delta=1e-8,max_iter=10_000):
    pass

Test case:

[7]
def rosen(x):
    return (1.0-x[0])**2+2*(x[1]-x[0]**2)**2

x0 = np.array([1.1,1.1])
try:
    x,it = gradient_descent(rosen,x0)
    print(f'minimum found at ({x[0]:.4f},{x[1]:.4f}) after {it} iterations')
    assert np.allclose(x,[1,1])
except:
    print('not implemented yet')
not implemented yet