1.5. The .mfcalc() method. Return a dataframe with transformed variables.

Like .upd(), the .mfcalc() method of ModelFlow extends the functionality of standard pandas. It is considerably more powerful than .upd(): it can be used to solve models or mini-models, or to see how ModelFlow normalizes equations. It is particularly useful when creating scenarios, a use that is presented later in this volume.

The purpose of .mfcalc() is to perform quick-and-dirty calculations and to return modified versions of DataFrames.

1.5.1. Workspace initialization

Set up the Python session to use pandas and ModelFlow by importing their packages. Modelmf is an extension of pandas DataFrames that is part of the ModelFlow installation package (and is also used by ModelFlow itself).

Create a simple dataframe

Create a pandas DataFrame with one column named A and six rows.

Set the index to the years 2020 through 2025 and set the value of every cell to 100.

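import pandas as pd
import modelmf   # ModelFlow's DataFrame extension, which provides .upd() and .mfcalc()

df = pd.DataFrame(                    # call the DataFrame constructor
    100.0,                            # the value placed in every cell
    index=list(range(2020, 2026)),    # index: 2020 through 2025 (the end value 2026 is excluded)
    columns=['A']                     # the column name
)
df   # the result of the last statement is displayed in the output cell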
A
2020 100.0
2021 100.0
2022 100.0
2023 100.0
2024 100.0
2025 100.0

1.5.2. Create a new series from an existing series

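Use .mfcalc() to calculate a new column (series) as a function of the existing series A.

The call below creates a new column, X. As the output shows, the lower-case names in the equation refer to upper-case columns: a refers to A, and the new column is named X.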

df.mfcalc('x = x(-1) + a')
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
A X
2020 100.0 0.0
2021 100.0 100.0
2022 100.0 200.0
2023 100.0 300.0
2024 100.0 400.0
2025 100.0 500.0

Warning

By default .mfcalc will initialize a new variable with zeroes.

Moreover, if a formula passed to .mfcalc contains a lag, the result of the operation will be calculated for a row only if there is data in the series for the preceding row.

These two behaviors affect how .mfcalc() evaluates equations and can sometimes produce unexpected results.

Together, zero initialization and the treatment of lags mean that when df.mfcalc('x = x(-1) + a') is executed, the value of X in 2020 is zero (not NaN). ModelFlow first initializes every value of X with zero. The equation refers to X in the previous year, and there is no 2019 row, so X cannot be calculated for 2020. The calculation therefore starts in 2021. X in 2021 equals 0 (the value of X in 2020) plus the value of A in 2021. X in 2022 equals X in 2021 plus A in 2022, and so on.
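The arithmetic described above can be reproduced in a few lines of plain Python. This is a minimal sketch that mimics the two rules: zero initialization, and skipping the first year because its lag is missing. It is not how ModelFlow implements .mfcalc(). The dictionaries `a` and `x` are illustrative stand-ins for the DataFrame columns.

```python
# Plain-Python sketch of the two rules described above; an illustration,
# not ModelFlow's implementation of .mfcalc().
years = list(range(2020, 2026))          # 2020 through 2025
a = {year: 100.0 for year in years}      # stands in for column A

# Rule 1: a new variable is initialized with zeros in every year.
x = {year: 0.0 for year in years}

# Rule 2: x(-1) has no value for 2020 (there is no 2019 row),
# so the equation x = x(-1) + a is evaluated from 2021 onward.
for year in years[1:]:
    x[year] = x[year - 1] + a[year]

print(x)
# {2020: 0.0, 2021: 100.0, 2022: 200.0, 2023: 300.0, 2024: 400.0, 2025: 500.0}
```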

As with .upd(), .mfcalc() returns a new DataFrame and leaves the original unchanged. Unless the result is assigned to a variable, it is lost. The call above did not change df:

df
A
2020 100.0
2021 100.0
2022 100.0
2023 100.0
2024 100.0
2025 100.0

1.5.3. Storing the result of an .mfcalc() call

Above, the result of the .mfcalc() operation was not assigned to an object, so the DataFrame df itself was not changed.

Below, the result of the same operation is assigned to the variable df2 and is therefore stored.

df2 = df.mfcalc('x = x(-1) + a')  # assign the result to df2
df2
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
A X
2020 100.0 0.0
2021 100.0 100.0
2022 100.0 200.0
2023 100.0 300.0
2024 100.0 400.0
2025 100.0 500.0

Note

As discussed above, .mfcalc() initializes new variables with zeros. The lagged value of X for 2020 (X in 2019) is not defined, so the calculation actually runs from 2021 to 2025. As a result, X is zero in 2020 and then increases by 100 in each following year.
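The point of this section is that a method returning a new object leaves its input untouched unless the result is assigned. This can be illustrated without ModelFlow. The helper `add_running_total` below is hypothetical. It only mimics the copy-and-return behavior of .mfcalc() and .upd(), using a dictionary of series in place of a DataFrame.

```python
from itertools import accumulate


def add_running_total(table):
    """Hypothetical helper: return a copy of `table` with X = X(-1) + A added.

    Like .mfcalc(), it builds and returns a new object; `table` itself is not modified.
    """
    new_table = {name: dict(series) for name, series in table.items()}  # copy every series
    years = sorted(new_table['A'])
    # X is 0 in the first year (no lag available), then adds A in each later year.
    running_total = accumulate((new_table['A'][year] for year in years[1:]), initial=0.0)
    new_table['X'] = dict(zip(years, running_total))
    return new_table


data = {'A': {year: 100.0 for year in range(2020, 2026)}}

add_running_total(data)            # result not assigned: the new table is discarded
data2 = add_running_total(data)    # result assigned: the new table is kept in data2

print('X' in data)       # False: the original table was never changed
print(data2['X'][2025])  # 500.0
```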

1.5.4. Recalculate A so it grows by 2 percent

.mfcalc() understands lagged variables and can do recursive calculations. Recall that if a lagged value does not exist, the calculation is not made for that period. In the example below, this results in the warning:

* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025

res = df.mfcalc('a =  1.02 *  a(-1)')
res
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
A
2020 100.000000
2021 102.000000
2022 104.040000
2023 106.120800
2024 108.243216
2025 110.408080
res.pct_change()*100 # to display the percent changes
A
2020 NaN
2021 2.0
2022 2.0
2023 2.0
2024 2.0
2025 2.0

In this example, .mfcalc() knows that it cannot start the calculation in 2020, because A (the lagged variable) has no value in 2019.

.mfcalc() therefore begins its calculation in 2021, and the existing value for 2020 is preserved. This behavior differs from other programs or Python methods (such as .pct_change() above) that return a missing value (NaN) for the 2020 observation.
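A plain-Python sketch of the same recursion makes both points visible. It is an illustration, not ModelFlow code. The 2020 value is kept because there is no lagged value to grow from, and the percent change is undefined only for the first year.

```python
years = list(range(2020, 2026))
a = {year: 100.0 for year in years}      # column A before the call

# a = 1.02 * a(-1) is evaluated from 2021 onward, so a[2020] keeps its value.
for year in years[1:]:
    a[year] = 1.02 * a[year - 1]

# Percent changes, as res.pct_change()*100 reports them. The first year has
# no previous value, so it is left undefined (None here, NaN in pandas).
pct = {years[0]: None}
for year in years[1:]:
    pct[year] = (a[year] / a[year - 1] - 1) * 100

for year in years:
    shown_pct = None if pct[year] is None else round(pct[year], 6)
    print(year, round(a[year], 6), shown_pct)
```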

1.5.5. Display the normalization of an equation (the showeq option)

The showeq option defaults to False.

When it is set to True, .mfcalc() also prints the normalized form of the equation it is given.

df.mfcalc('dlog( a) =  0.02',showeq=True);
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
FRML <> A=EXP(LOG(A(-1))+0.02)$

Note

In ModelFlow, the expression dlog(a) refers to the difference in natural logarithms, \(\mathrm{dlog}(x_t) \equiv \ln(x_t)-\ln(x_{t-1})\), which is approximately equal to the growth rate of the variable. The dlog() syntax is borrowed from EViews.

.mfcalc() normalizes the equation so that it solves for a, as follows:

\[\begin{align*} \mathrm{dlog}(a_t) &= 0.02\\ \ln(a_t)-\ln(a_{t-1}) &= 0.02\\ \ln(a_t) &= \ln(a_{t-1})+0.02\\ a_t &= e^{\ln(a_{t-1})+0.02}\\ a_t &= a_{t-1}\, e^{0.02} \end{align*}\]

which, expressed in ModelFlow's business logic language, is:

A=EXP(LOG(A(-1))+0.02)
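The last two lines of the derivation say that `EXP(LOG(A(-1))+0.02)` is the same as multiplying the previous value by \(e^{0.02}\). A short numerical check in plain Python confirms the equivalence. It is an illustration only, and `a_prev` is an arbitrary example value. The check also shows that the exact growth rate is \(e^{0.02}-1\), slightly above 2 percent.

```python
import math

a_prev = 100.0                                   # example value of A(-1)

# Normalized form printed by .mfcalc(): A = EXP(LOG(A(-1)) + 0.02)
a_normalized = math.exp(math.log(a_prev) + 0.02)

# Equivalent form from the last line of the derivation: a_t = a_{t-1} * e^0.02
a_closed_form = a_prev * math.exp(0.02)

print(a_normalized, a_closed_form)
print(math.log(a_normalized) - math.log(a_prev))  # dlog(a): 0.02 up to rounding
print((a_normalized / a_prev - 1) * 100)          # exact growth in percent, about 2.0201
```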

1.5.6. The diff() operator with mfcalc

The diff() operator normalizes to an equation that adds the value on the right-hand side of the equals sign to the lagged value of the variable passed to diff(). Thus, diff(a)=x normalizes to a=a(-1)+x.

df.mfcalc('diff(a) =  2',showeq=True)
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
FRML <> A=A(-1)+(2)$
A
2020 100.0
2021 102.0
2022 104.0
2023 106.0
2024 108.0
2025 110.0
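Written out as a recursion, `diff(a) = 2` is simply `a = a(-1) + 2`. The plain-Python sketch below is an illustration, not ModelFlow code. It applies that normalized form from 2021 onward and then checks that the first differences are indeed 2.

```python
years = list(range(2020, 2026))
a = {year: 100.0 for year in years}

# Normalized form A = A(-1) + (2), evaluated from 2021 because A(-1)
# does not exist for 2020.
for year in years[1:]:
    a[year] = a[year - 1] + 2

first_differences = [a[year] - a[year - 1] for year in years[1:]]
print(a)                  # same values as the A column above
print(first_differences)  # [2.0, 2.0, 2.0, 2.0, 2.0]
```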

1.5.7. mfcalc with several equations and arguments

Instead of a single equation, several equations can be passed in one call to .mfcalc().

However, be careful: the equations are executed simultaneously. Combined with the treatment of lags, this means that results may differ from what they would be if the equations were run sequentially.

For example:

res = df.mfcalc('''
diff(a) =  2
x = a + 42 
''')

res

# use res.diff() to see the difference
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
A X
2020 100.0 0.0
2021 102.0 144.0
2022 104.0 146.0
2023 106.0 148.0
2024 108.0 150.0
2025 110.0 152.0

In this example the variable A in the DataFrame df was initialized to 100 for the period 2020 through 2025.

The first equation produces results only for the period 2021 to 2025, because there is no value for A in 2019. The value of A in 2020 is unchanged, and the following values rise by 2 in each period.

Because the equations are executed simultaneously, the second equation is evaluated over the same shortened period, 2021 to 2025, rather than over the full sample. In effect, only the values of A for 2021 through 2025 are passed to the second equation, which adds 42 to them.

X in 2020 is therefore not 142, as one might have expected, but zero, which is the default value for a newly created variable.

Compare the results above with the results (below) when the same calculations are undertaken in two separate calls to .mfcalc().

res1 = df.mfcalc('''
diff(a) =  2
''')

res2 = res1.mfcalc('''
x = a + 42 
''')
res2
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
A X
2020 100.0 142.0
2021 102.0 144.0
2022 104.0 146.0
2023 106.0 148.0
2024 108.0 150.0
2025 110.0 152.0

Danger

In .mfcalc(), when a single call contains several equations, they are executed simultaneously. Combined with mfcalc's treatment of lags, this means that only the intermediate results of a lagged calculation (the periods for which it could be computed) are passed to the other equations in the same call. As a consequence, results may differ from what would be expected, and from what would be obtained if the equations were run sequentially in separate calls.
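One way to picture the difference is the plain-Python sketch below. It assumes, consistent with the warning printed by the single call, that all equations in one call are evaluated over one common period: 2021 to 2025 here, because of the lag in `diff(a)`. It also assumes that the second of two separate calls, which contains no lag, runs over the whole sample. This is an illustration of the outputs shown above, not ModelFlow's implementation.

```python
years = list(range(2020, 2026))
lag_period = years[1:]          # 2021-2025: the only years where a(-1) exists

# One call with both equations (assumed): both run over 2021-2025 only.
a1 = {year: 100.0 for year in years}
x1 = {year: 0.0 for year in years}          # new variable starts at zero
for year in lag_period:
    a1[year] = a1[year - 1] + 2             # diff(a) = 2
    x1[year] = a1[year] + 42                # x = a + 42

# Two calls: the second equation has no lag, so it covers every year.
a2 = {year: 100.0 for year in years}
for year in lag_period:
    a2[year] = a2[year - 1] + 2             # first call: diff(a) = 2
x2 = {year: a2[year] + 42 for year in years}  # second call: x = a + 42

print(x1[2020], x2[2020])   # 0.0 142.0
```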

1.5.8. Setting a time frame with mfcalc

In some circumstances, it can be useful to limit the time frame over which the calculations are performed. A line containing a start date and an end date enclosed in <> (for example <2023 2025>) restricts the time period over which the subsequent equations are calculated.

In the example below, X remains zero (its initial value as a new variable) before 2023, and A keeps its original values in those years.

Note

As with .upd(), a time frame set in one line applies to all subsequent lines until it is explicitly reset.

res = df.mfcalc('''
<2023 2025>
diff(a) =  2
x = a + 42 
''')

res
A X
2020 100.0 0.0
2021 100.0 0.0
2022 100.0 0.0
2023 102.0 144.0
2024 104.0 146.0
2025 106.0 148.0