Equations in MFMod and ModelFlow

# Prepare the notebook for use of ModelFlow 

# Jupyter magic command to improve the display of charts in the Notebook
%matplotlib inline

# Import pandas 
import pandas as pd

# Import the model class from the modelclass module 
from modelclass import model 

# functions that improve rendering of ModelFlow outputs
model.widescreen()
model.scroll_off();

#Replace the path below with the location of the pak.pcim file (or some other world bank model file) on your computer
mpak,bline = model.modelload('../models/pak.pcim', \
                                alfa=0.7,run=1,keep= 'Baseline')

3. Equations in MFMod and `ModelFlow`#

As noted above, an MFMod model comprises two types of equations: identities and behavioral equations. Identities are mathematical or accounting relationships that always hold. The GDP accounting identity is a well-known one:

\[Y_t=C_t+I_t+G_t+X_t-M_t\]

The general government balance (revenues less spending) is another.

Behavioral equations also determine endogenous variables, but in a macrostructural model they describe an economic rather than an accounting relationship. Typically these relationships are estimated econometrically and do not hold exactly.

In this chapter - Equations in MFMod and ModelFlow

This chapter provides a very brief overview of the types of equations in MFMod models (identities and behavioral equations), and a deep dive into the behavioral, or econometrically estimated, equations in these models.

Three main issues are discussed:

The importance of Addfactors (_A variables) in behavioral equations.
The mechanism by which behavioral equations can be exogenized or de-activated in the ModelFlow environment using the _D and _X variables.
An explanation of the Error Correction Model used to estimate many of the econometric relations in World Bank MFMod models.

3.1. A behavioral equation#

Normally a behavioral equation is comprised of a left-hand-side variable (the regressand or dependent variable), right-hand side variables (the regressors in the econometric relationship, or explanatory variables), estimated parameters, perhaps some imposed parameters, and an error term.

Assume \(y_t\) is the dependent variable, \(X_t\) a vector of explanatory variables and \(\eta_t\) the error term, then a simple regression can be written as:

\[y_t = \alpha + \beta X_t + \eta_t\]

where \(\alpha\) and \(\beta\) are parameters to be estimated; in most cases \(\beta\) will be a vector of estimated parameters.

Once the estimation has occurred, \(\alpha, \beta\) and \(\eta_t\) take on precise values and the equation is rewritten as:

\[y_t = \hat{\alpha} + \hat{\beta} X_t + \hat{\eta_t}\]

where the hats “^” signify the specific value for the parameter that emerged from the estimation process.

We can also write an expression for \(\hat{y}_t\) the fitted value from the regression as:

\[\hat{y}_t = \hat{\alpha} + \hat{\beta} X_t \]

Substituting this expression into the previous expression and re-arranging gives us

\[y_t-\hat{y}_t= \hat{\eta_t}\]

All of which are fairly elementary results from econometrics.
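These relationships can be checked numerically. The sketch below fits a one-regressor least-squares line in plain Python on made-up data (the numbers and the helper name `ols_fit` are illustrative assumptions, not part of any MFMod model) and confirms that the residuals are simply \(y_t-\hat{y}_t\) and that they average to zero in sample.

```python
# Illustrative data only: not taken from any World Bank model.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for y = alpha + beta*x + eta by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                / sum((xi - x_bar) ** 2 for xi in x))
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

alpha_hat, beta_hat = ols_fit(x, y)
y_hat = [alpha_hat + beta_hat * xi for xi in x]    # fitted values
eta_hat = [yi - fi for yi, fi in zip(y, y_hat)]    # residuals: y - y_hat

print(f"alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
print("sum of residuals is numerically zero:", abs(sum(eta_hat)) < 1e-9)
```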

3.2. The add factor in behavioral equations#

The econometrics used to estimate the equation ensure that the expected value of \(\eta_t\) is zero. So the expectation of the above equation during the forecast period is

\[\begin{align*} E(y_t-\hat{y}_t) &= E (\hat{\eta}_t) \\ E(y_t-\hat{y}_t) &= 0 \end{align*}\]

In macrostructural models the residual \(\hat{\eta}_t\) in \(y_t-\hat{y}_t=\hat{\eta}_t\) is replaced by an add factor \(AF_t\), so that:

\[y_t= \hat{y}_t + AF_t\]

By imposing a nonzero value on \(AF_t\), the modeller can add her judgment to the model’s fitted value, either to reflect a view that the forecast value of y will deviate from the fitted value, or because some change in circumstances (say a policy change) will cause the underlying equation to be different in the future than it was when the parameters were estimated (regime change or structural break).

In World Bank models using ModelFlow the addfactor of an equation is given the same mnemonic as the dependent variable with an _A appended to it. Thus, in the above simplified version, the equation would be written as

\[y_t = \hat{\alpha} + \hat{\beta} X_t + y\_A_t\]
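A minimal sketch of how an add factor enters such an equation, using made-up coefficients (`alpha_hat`, `beta_hat`) and a hypothetical helper `behavioral`; it is not ModelFlow code. With a zero add factor the equation returns the fitted value, and a nonzero add factor shifts the result by exactly that amount.

```python
alpha_hat, beta_hat = 0.05, 1.99   # assumed estimates, for illustration only

def behavioral(x, y_a=0.0):
    """Fitted value plus add factor: y = alpha_hat + beta_hat*x + y_A."""
    return alpha_hat + beta_hat * x + y_a

x_forecast = 6.0
fitted = behavioral(x_forecast)             # no judgment applied
judged = behavioral(x_forecast, y_a=0.5)    # modeller adds 0.5 to the forecast

print(f"fitted={fitted:.2f}, with add factor={judged:.2f}")
```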

3.3. Excluding behavioral equations#

In ModelFlow behavioral equations can be excluded (“de-activated”) or included (“activated”). This is achieved by adding two additional variables to each equation. The first is given the name of the dependent variable with _D appended. The second is given the name of the dependent variable with _X appended.

The preceding equation is then re-written as below

\[\begin{equation*} y_t = (1-y\_D_t)\cdot\underbrace{\biggl[\hat{\alpha} + \hat{\beta} X_t + y\_A_t\biggr]}_{\begin{array}{c} \text{Econometric equation}\end{array}} + y\_D_t\cdot \underbrace{y\_X_t}_{\begin{array}{c} \text{Exogenized} \\ \text{value} \end{array}} \end{equation*}\]

When \(y\_D_t\) = 0, the second part of the equation \(y\_D_t*y\_X_t\) evaluates to zero and drops out, while the expression \((1-y\_D_t)\) evaluates to one. Thus the whole equation simplifies to the standard behavioral equation.

\[\begin{align*} y_t &= 1\cdot\biggl[\hat{\alpha} + \hat{\beta} X_t + y\_A_t\biggr]+ 0\\ y_t &= \hat{\alpha} + \hat{\beta} X_t + y\_A_t \end{align*}\]

When \(y\_D_t\) = 1, the \((1-y\_D_t)\) evaluates to zero so the first part of the equation drops out, and the equation simplifies to:

\[\begin{align*} y_t &= 0\cdot\biggl[\hat{\alpha} + \hat{\beta} X_t + y\_A_t\biggr]+ 1\cdot y\_X_t\\ y_t &= y\_X_t\\ \end{align*}\]

Thus, when \(y\_D_t\) = 1 the whole equation simply sets the endogenous variable \(y_t\) equal to the exogenous variable \(y\_X_t\).
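The switching logic can be sketched in a few lines of plain Python. The function name `solve_equation` and the coefficient values are assumptions made for illustration; only the structure of the formula comes from the text above.

```python
def solve_equation(x, y_a, y_d, y_x, alpha_hat=0.05, beta_hat=1.99):
    """y = (1 - y_D) * (alpha_hat + beta_hat*x + y_A) + y_D * y_X."""
    econometric = alpha_hat + beta_hat * x + y_a
    return (1 - y_d) * econometric + y_d * y_x

# y_D = 0: the econometric equation (plus add factor) determines y
active = solve_equation(x=6.0, y_a=0.0, y_d=0, y_x=10.0)

# y_D = 1: the equation is switched off and y is set to y_X
exogenized = solve_equation(x=6.0, y_a=0.0, y_d=1, y_x=10.0)

print(f"active={active:.2f}, exogenized={exogenized:.2f}")
```

Because the dummy carries a time subscript, the same equation can be exogenized in some periods and left active in others.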

3.4. Behavioral equations in ModelFlow#

It follows therefore that equations in ModelFlow have three special variables associated with them.

Special variables in ModelFlow behavioral equations

Terminator	Meaning	Role
_A	Add factor:	Special variable to allow judgment to be added to an equation
_X	Exogenized value:	Special variable that stores the value that the equation should return if exogenized
_D	Exogenous dummy:	Dummy variable. When set to one, the equation will return the value of the \(\_X\) variable, if zero, it returns the fitted value of the equation plus the Add factor.

Below, the EViews and ModelFlow representations of the household consumption equation are extracted from the model object using the .eviews and .frml properties discussed in the previous chapter.

In the EViews representation we do not see the special variables, but in the frml representation (which is the one actually used by ModelFlow) they are visible.

mpak.PAKNECONPRVTKN.eviews

DLOG(PAKNECONPRVTKN) =- 0.2*(LOG(PAKNECONPRVTKN( - 1)) - LOG(1.21203101101442) - LOG((((PAKBXFSTREMTCD( - 1) - PAKBMFSTREMTCD( - 1))*PAKPANUSATLS( - 1)) + PAKGGEXPTRNSCN( - 1) + PAKNYYWBTOTLCN( - 1)*(1 - PAKGGREVDRCTXN( - 1)/100))/PAKNECONPRVTXN( - 1))) + 0.763938860758873*DLOG((((PAKBXFSTREMTCD - PAKBMFSTREMTCD)*PAKPANUSATLS) + PAKGGEXPTRNSCN + PAKNYYWBTOTLCN*(1 - PAKGGREVDRCTXN/100))/PAKNECONPRVTXN) - 0.0634474791568939*@DURING("2009") - 0.3*(PAKFMLBLPOLYXN/100 - DLOG(PAKNECONPRVTXN))

mpak.PAKNECONPRVTKN.frml

Endogeneous: PAKNECONPRVTKN: HH. Cons Real
Formular: FRML <DAMP,STOC> PAKNECONPRVTKN = (PAKNECONPRVTKN(-1)*EXP(PAKNECONPRVTKN_A+ (-0.2*(LOG(PAKNECONPRVTKN(-1))-LOG(1.21203101101442)-LOG((((PAKBXFSTREMTCD(-1)-PAKBMFSTREMTCD(-1))*PAKPANUSATLS(-1))+PAKGGEXPTRNSCN(-1)+PAKNYYWBTOTLCN(-1)*(1-PAKGGREVDRCTXN(-1)/100))/PAKNECONPRVTXN(-1)))+0.763938860758873*((LOG((((PAKBXFSTREMTCD-PAKBMFSTREMTCD)*PAKPANUSATLS)+PAKGGEXPTRNSCN+PAKNYYWBTOTLCN*(1-PAKGGREVDRCTXN/100))/PAKNECONPRVTXN))-(LOG((((PAKBXFSTREMTCD(-1)-PAKBMFSTREMTCD(-1))*PAKPANUSATLS(-1))+PAKGGEXPTRNSCN(-1)+PAKNYYWBTOTLCN(-1)*(1-PAKGGREVDRCTXN(-1)/100))/PAKNECONPRVTXN(-1))))-0.0634474791568939*DURING_2009-0.3*(PAKFMLBLPOLYXN/100-((LOG(PAKNECONPRVTXN))-(LOG(PAKNECONPRVTXN(-1)))))) )) * (1-PAKNECONPRVTKN_D)+ PAKNECONPRVTKN_X*PAKNECONPRVTKN_D  $

PAKNECONPRVTKN  : HH. Cons Real
DURING_2009     : 
PAKBMFSTREMTCD  : Imp., Remittances (BOP), US$ mn
PAKBXFSTREMTCD  : Exp., Remittances (BOP), US$ mn
PAKFMLBLPOLYXN  : Key Policy Interest Rate
PAKGGEXPTRNSCN  : Current Transfers
PAKGGREVDRCTXN  : Direct Revenue Tax Rate
PAKNECONPRVTKN_A: Add factor:HH. Cons Real
PAKNECONPRVTKN_D: Fix dummy:HH. Cons Real
PAKNECONPRVTKN_X: Fix value:HH. Cons Real
PAKNECONPRVTXN  : Implicit LCU defl., Pvt. Cons., 2000 = 1
PAKNYYWBTOTLCN  : Total Wage Bill
PAKPANUSATLS    : Exchange rate LCU / US$ - Pakistan

Careful inspection of the .frml and .eviews output reveals that the .frml specification contains three special variables that are not part of the EViews output. Each has the same root mnemonic as the dependent variable PAKNECONPRVTKN, with one of the special terminators _A, _X or _D appended.

To exclude an equation, the _D variable is set to 1 and the equation simplifies to PAKNECONPRVTKN = PAKNECONPRVTKN_X. If _D=0, the econometric relationship and the add factor jointly determine the value of PAKNECONPRVTKN.
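The frml listing also shows how an equation estimated as `DLOG(...)` is normalised: the level equals the lagged level times the exponential of the add factor plus the estimated right-hand side, and the _D and _X terms are then applied to that level. A simplified sketch of this structure follows; `rhs` stands in for the whole estimated expression, and the function name and numbers are hypothetical.

```python
import math

def level_from_dlog(y_lag, rhs, y_a=0.0, y_d=0, y_x=0.0):
    """Mimic Y = Y(-1)*EXP(Y_A + rhs)*(1 - Y_D) + Y_X*Y_D."""
    return y_lag * math.exp(y_a + rhs) * (1 - y_d) + y_x * y_d

y_lag = 100.0
rhs = 0.03   # assumed value of the estimated expression (roughly 3% growth)

base = level_from_dlog(y_lag, rhs)                      # equation active, no judgment
judged = level_from_dlog(y_lag, rhs, y_a=0.01)          # add factor inside the exponential
fixed = level_from_dlog(y_lag, rhs, y_d=1, y_x=105.0)   # exogenized: returns 105.0

print(f"base={base:.3f}, judged={judged:.3f}, fixed={fixed:.1f}")
```

In this form the add factor sits inside the exponential, so it acts on the log growth rate: an add factor of 0.01 raises the level by roughly one percent relative to the fitted value.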

3.5. The ECM specification#

Many of the behavioral equations in World Bank models are written as Error Correction Models (ECMs).

The Error correction specification was developed to deal with two important problems in econometric equations.

Many time-series data tend to increase over time. As a result, a regression of one series on another series tends to have good fit even if the two variables are not really connected economically. For example, the price of cookies tends to rise over time because of inflation. Similarly, the quantity of screws produced in the manufacturing sector tends to rise over time because of increased population and, therefore, demand for manufactured goods. Regressing screw production on cookie prices will show a strong but spurious correlation.
Purely short-run models focus on growth rates or differences, and so get around the spurious correlation that arises when two unrelated trending series are regressed on each other. While these models explain short-run deviations, stringing the estimated growth rates together can produce implied levels that are unstable, because they are not anchored to the long-run relationship between variables dictated by underlying economic theory (or empirical behavior).

The co-integration approach to econometrics (Engle and Granger [1987]), combined with the closely related ECM approach, provided a solution to the above problems by providing a mechanism for modeling both the long-run and the short-run relationships between variables.

The ECM specification used in World Bank models is a single equation approach that follows (Wickens and Breusch [1988]) and is comprised of two parts (the long run relationship, and the short-run relationship), which are estimated simultaneously.

Consider as an example two variables, say consumption and disposable income. Both have an underlying trend or, in the parlance, are integrated of order 1. For simplicity we call them y and x.

3.5.1. The short run relationship#

In its simplest form, a short run relationship between the growth rates of two variables could be written as:

\[\Delta ln(Y_t) = \alpha + \beta \Delta ln(X_t) +\epsilon_t\]

or substituting lower case letters for the logged values.

\[\Delta y_t = \alpha + \beta \Delta x_t +\epsilon_t\]

3.5.2. The long run equation#

The long run relates the level of two (or more) variables. A simplified version of that equation can be written as:

\[Y_t=\alpha X_t^{\beta} e^{\eta_t}\]

Rewriting this (in logarithms) it can be expressed as:

\[y_t = \ln(\alpha) + \beta x_t + \eta_t\]

3.5.2.1. The long run equation in the steady state#

Note that in the steady state the expected value of the error term in the long run equation is zero (\(\eta_t=0 \)) so in those conditions the long run relationship can be simplified to:

\[y_t=\ln(\alpha)+\beta x_t + 0\]

or equivalently (substituting A for the log of \(\alpha\)).

\[y_t-A-\beta x_t=0\]

Moreover if this expression is multiplied by some arbitrary constant, say \(-\lambda\), it would still equal zero.

\[-\lambda(y_t -A-\beta x_t)=0\]

and in the steady state this will also be true for the lagged variables

\[-\lambda(y_{t-1}- A - \beta x_{t-1})=0\]

The expression between the parentheses is equal to the lagged error term of the long-run equation (\(\eta_{t-1}\)). In the long run its expected value is zero, but at any given instant it can differ from zero. Its distance from zero at any point in time reflects how far the dependent variable is from its long-run equilibrium value at that moment.

3.5.3. Putting it together#

From before we have the short run equation:

\[\Delta y_t = \alpha + \beta \Delta x_t +\epsilon_t\]

Inserting the steady state expression for the long-run into the short run equation makes no difference (in the long run) because in the long run it is equal to zero.

\[\Delta y_t = -\lambda(y_{t-1}-A-\beta x_{t-1}) + \alpha + \beta \Delta x_t +\epsilon_t\]

When the model is not in the steady state, the expression \(y_{t-1}-A-\beta x_{t-1}\) is of course the error term from the long run equation from the previous period (a measure of how far the dependent variable was from equilibrium).

3.5.4. Lambda, the speed of adjustment#

The parameter \(\lambda\) can then be interpreted as the speed of adjustment. It determines what share of the previous period's error (distance from equilibrium) is absorbed in the following period. As long as \(\lambda\) is greater than zero and less than or equal to one, and there are no further disturbances (\(\epsilon_t=0\)), the expression multiplied by \(\lambda\) will decline toward zero. How fast depends on the size of \(\lambda\).

Intuitively, the lagged long-run error term measures how far the model was from equilibrium one period earlier (at t-1). The ECM term (multiplied by \(\lambda\)) ensures the model will converge to equilibrium, the point at which the long-run equation holds exactly, if \(\lambda\) is greater than zero but less than or equal to one. In these conditions a share \(\lambda\) of the previous period's disequilibrium is absorbed in each period.

An ECM equation can therefore be broken into two component parts. For the consumption function it will look something like this:

\[\Delta c_t = -\lambda \underbrace{\bigl(\log(C_{t-1})-\log(Wages_{t-1}-Taxes_{t-1}+Transfers_{t-1}) -\log(\alpha)\bigr)}_{\text{Long run}} +\beta \underbrace{\Delta x_t}_{\text{Short run}}\]

More precisely, to be convergent \(\lambda\) must be between 0 and 2. If \(\lambda\) is greater than 1 but less than 2, the error term will oscillate between positive and negative values but will still converge to zero.

If \(\lambda\) is greater than 2 (or less than zero), the long-run portion of the equation will cause the disequilibrium to grow each period rather than diminish.

If \(\lambda\) is greater than zero but less than one, the equation will converge monotonically at a speed determined by the value of \(\lambda\).
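These ranges follow from the combined equation. Assuming no further shocks, a constant \(x\) and a zero short-run constant (simplifications made here only for illustration), the disequilibrium \(g_t = y_t-A-\beta x_t\) evolves as

\[g_t = (1-\lambda)\, g_{t-1} \quad\Rightarrow\quad g_t = (1-\lambda)^t g_0\]

which shrinks only if \(|1-\lambda|<1\), that is \(0<\lambda<2\). The short sketch below (the helper name `gap_path` is hypothetical) traces the gap for one value of \(\lambda\) in each range.

```python
def gap_path(lam, gap0=1.0, periods=6):
    """Path of the disequilibrium when gap_t = (1 - lam) * gap_{t-1}."""
    path = [gap0]
    for _ in range(periods):
        path.append((1 - lam) * path[-1])
    return path

# 0.5: direct convergence, 1.5: damped oscillation, 2.5: explosive oscillation
for lam in (0.5, 1.5, 2.5):
    print(lam, [round(g, 3) for g in gap_path(lam)])
```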

3.5.4.1. An illustrative example#

Below three ECMs are written out, each with an equilibrium value of 50 and a different speed of adjustment (0.3, 0.5 and 0.9). The figure and table below illustrate the adjustment to the equilibrium value starting from an initial value of 100 (an initial error of 50) under the three speeds of adjustment.

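import pandas as pd

# .upd() and .mfcalc() are ModelFlow extensions to pandas DataFrames;
# they are available because modelclass was imported at the top of the notebook.
# Create a DataFrame covering 2020-2050 with a column E set to 100
ECMdf = pd.DataFrame({'E': 100}, index=[v for v in range(2020, 2051)])

# Add three series, each starting from 100 (ModelFlow upper-cases the names)
ECMdf = ECMdf.upd('lbda3 lbda9 lbda5 = 100')

# From 2021 onward, let each series error-correct toward 50 at its own speed
ECMdf = ECMdf.mfcalc('''
<2021 2050> dlog(Lbda3) = -.3 * (log(Lbda3(-1))-log(50))
<2021 2050> dlog(Lbda9) = -.9 * (log(Lbda9(-1))-log(50))
<2021 2050> dlog(Lbda5) = -.5 * (log(Lbda5(-1))-log(50))
''')

ECMdf.plot(title="Error correction process for different speeds of adjustment");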

[Figure: Error correction process for different speeds of adjustment]

With a slow speed of adjustment (\(\lambda\)=0.3), the level does not fall below 51 until 2030 and reaches about 50.5 only in 2032. With \(\lambda\)=0.5 the gap is essentially closed after about six years (about 50.5 in 2026), while with \(\lambda\)=0.9 it takes just two years (about 50.3 in 2022).
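The recursion can also be reproduced in plain Python, without ModelFlow, to check how long each speed of adjustment needs to bring the level close to 50. This is a sketch: the helper names and the tolerance of 0.6 are arbitrary assumptions chosen for illustration.

```python
import math

def ecm_levels(lam, start=100.0, target=50.0, first_year=2020, last_year=2050):
    """Iterate dlog(Y) = -lam * (log(Y(-1)) - log(target)) from the start value."""
    levels = {first_year: start}
    for year in range(first_year + 1, last_year + 1):
        prev = levels[year - 1]
        levels[year] = prev * math.exp(-lam * (math.log(prev) - math.log(target)))
    return levels

def first_year_within(levels, target=50.0, tol=0.6):
    """First year in which the level is within tol of the target."""
    return next(year for year, value in levels.items() if abs(value - target) <= tol)

for lam in (0.3, 0.5, 0.9):
    levels = ecm_levels(lam)
    year = first_year_within(levels)
    print(f"lambda={lam}: within 0.6 of 50 by {year} (level {levels[year]:.2f})")
```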

Note

Advanced formatting of tables

The table below introduces some advanced formatting routines, using the pandas style property. For more, see the pandas documentation on table styling.

Info on python named colors can be found here: https://matplotlib.org/stable/gallery/color/named_colors.html.

    
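def color_proximity(val):
    """Return a CSS background color: the further val is above 50, the stronger the red."""
    if val > 80:
        color = "red"
    elif val > 70:
        color = "orangered"
    elif val > 55:
        color = "coral"
    elif val > 51:
        color = "lightsalmon"
    elif val > 50.5:
        color = "peachpuff"
    else:
        color = "white"
    return f'background-color: {color}'


# Show 2020-2035 for the three series, shading each cell by its distance from 50.
# Styler.map requires pandas 2.1 or later (earlier versions use applymap).
ECMdf.loc[2020:2035,['LBDA3','LBDA5','LBDA9']].style.map(color_proximity) \
.format(precision=2).set_table_attributes('style="font-size: 10px"')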