1.5. The .mfcalc() method. Return a dataframe with transformed variables
Like .upd(), the .mfcalc() method of ModelFlow extends the functionality of standard pandas. It is actually a much more powerful method that can be used to solve models or mini-models or see how ModelFlow normalizes equations. It can be particularly useful when creating scenarios – uses that are presented later in this volume.
The purpose of mfcalc() is to perform quick and dirty calculations and to modify dataframes.
1.5.1. Workspace initialization
Set up the Python session to use pandas and ModelFlow by importing their packages. Modelmf is an extension of dataframes that is part of the ModelFlow installation package (and is also used by ModelFlow itself).
Create a simple dataframe
Create a pandas dataframe with one column named A and 6 rows.
Set the index to 2020 through 2025 and set the values of all the cells to 100.
pd.DataFrame creates a dataframe. For more, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame
The expression [v for v in range(2020,2026)] dynamically creates a Python list and fills it with the integers from 2020 through 2025.
df = pd.DataFrame( # call the dataframe constructor
100.000, # the values
index=[v for v in range(2020,2026)], #index
columns=['A'] # the column name
)
df # the result of the last statement is displayed in the output cell
| | A |
|---|---|
| 2020 | 100.0 |
| 2021 | 100.0 |
| 2022 | 100.0 |
| 2023 | 100.0 |
| 2024 | 100.0 |
| 2025 | 100.0 |
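The index has six entries because Python's range excludes its stop value. The short check below uses only the standard library (no pandas) and the variable name years is chosen purely for illustration.

```python
# range(start, stop) is half-open: it includes start but excludes stop.
years = [v for v in range(2020, 2026)]

print(years)       # [2020, 2021, 2022, 2023, 2024, 2025]
print(len(years))  # 6

# The comprehension is equivalent to the shorter list(range(...)) form.
same_as_list_range = years == list(range(2020, 2026))
print(same_as_list_range)  # True
```

To extend the sample through 2026, the stop value would have to be 2027.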
1.5.2. Create a new series from an existing series
Use mfcalc to calculate a new column (series) as a function of the existing column A.
The call below creates a new column X.
df.mfcalc('x = x(-1) + a')
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
| | A | X |
|---|---|---|
| 2020 | 100.0 | 0.0 |
| 2021 | 100.0 | 100.0 |
| 2022 | 100.0 | 200.0 |
| 2023 | 100.0 | 300.0 |
| 2024 | 100.0 | 400.0 |
| 2025 | 100.0 | 500.0 |
Warning
By default .mfcalc will initialize a new variable with zeroes.
Moreover, if a formula passed to .mfcalc contains a lag, the result of the operation will be calculated for a row only if there is data in the series for the preceding row.
These two behaviors affect how calculations generated with .mfcalc are executed and can generate results that may sometimes be unexpected.
Because new variables are initialized with zero and lagged formulas are evaluated only where the preceding row exists, executing df.mfcalc('x = x(-1) + a') leaves X in 2020 at zero (not n/a). There is no row for 2019, so X(-1) is undefined in 2020. ModelFlow first initializes all values of X with zero. It then skips 2020 and starts in 2021, where X equals 0 (the value of X in 2020) plus the value of A in 2021, and so on for the following years.
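The two behaviors described in this warning can be imitated in a few lines of plain Python. This is only a sketch of the logic as the text describes it, not ModelFlow's implementation. The dictionaries a and x and the names max_lag and calc_years are illustrative assumptions.

```python
years = list(range(2020, 2026))
a = {year: 100.0 for year in years}

# Step 1: a new variable starts out as zero in every row.
x = {year: 0.0 for year in years}

# Step 2: with a one-period lag, the first row that has a predecessor is 2021.
max_lag = 1
calc_years = years[max_lag:]

# Step 3: evaluate x = x(-1) + a row by row over the shortened period.
for year in calc_years:
    x[year] = x[year - 1] + a[year]

print(calc_years[0], calc_years[-1])  # 2021 2025
print([x[year] for year in years])    # [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
```

The zero in 2020 is the initial value that was never overwritten, not the result of a calculation.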
As with .upd(), unless we assign the result of .mfcalc() to a variable, the resulting dataframe is lost. The call above did not change df.
df
| | A |
|---|---|
| 2020 | 100.0 |
| 2021 | 100.0 |
| 2022 | 100.0 |
| 2023 | 100.0 |
| 2024 | 100.0 |
| 2025 | 100.0 |
1.5.3. Storing the result of an .mfcalc() call
Above, the result of the .mfcalc() operation was not assigned to an object, so the DataFrame object df itself was not changed.
Below, the result of the same operation is assigned to the variable df2 and is therefore stored.
df2=df.mfcalc('x = x(-1) + a') # Assign the result to df2
df2
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
| | A | X |
|---|---|---|
| 2020 | 100.0 | 0.0 |
| 2021 | 100.0 | 100.0 |
| 2022 | 100.0 | 200.0 |
| 2023 | 100.0 | 300.0 |
| 2024 | 100.0 | 400.0 |
| 2025 | 100.0 | 500.0 |
Note
As discussed before, mfcalc initializes a new variable with zeroes. The lag of X in 2020 is not defined, so the calculation is actually run for 2021 to 2025. As a result, we have a zero in 2020, and this number then increases by 100 in each following year.
1.5.4. Recalculate A so it grows by 2 percent
mfcalc() understands lagged variables and can do recursive calculations. Recall that if a lagged value does not exist, the calculation will not be made for that period. Below, this results in the warning:
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
res = df.mfcalc('a = 1.02 * a(-1)')
res
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
| | A |
|---|---|
| 2020 | 100.000000 |
| 2021 | 102.000000 |
| 2022 | 104.040000 |
| 2023 | 106.120800 |
| 2024 | 108.243216 |
| 2025 | 110.408080 |
res.pct_change()*100 # to display the percent changes
| | A |
|---|---|
| 2020 | NaN |
| 2021 | 2.0 |
| 2022 | 2.0 |
| 2023 | 2.0 |
| 2024 | 2.0 |
| 2025 | 2.0 |
In this example, mfcalc() knows that it cannot start to calculate in 2020, because A (the lagged variable) has no value in 2019.
.mfcalc() therefore begins its calculation in 2021. Note that the existing value for 2020 is preserved. This behavior differs from other programs or Python methods (such as .pct_change() above) that might return an n/a value for the 2020 observation.
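The recursion can be checked by hand with a plain-Python sketch. It assumes, as described above, that the 2020 value is left untouched and that the formula is applied from 2021 onward. The dictionaries a and growth are illustrative and are not part of ModelFlow.

```python
import math

years = list(range(2020, 2026))
a = {year: 100.0 for year in years}

# Apply a = 1.02 * a(-1) from 2021 onward; 2020 keeps its original value.
for year in years[1:]:
    a[year] = 1.02 * a[year - 1]

# Percent change from the previous year, defined only from 2021 onward.
growth = {year: (a[year] / a[year - 1] - 1) * 100 for year in years[1:]}

print(f"{a[2020]:.6f}")  # 100.000000
print(f"{a[2025]:.6f}")  # 110.408080
print(all(math.isclose(g, 2.0) for g in growth.values()))  # True
```

The 2025 value equals 100 multiplied by 1.02 five times, which matches the last row of the table above.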
1.5.5. Display the normalization of an equation (the showeq option)
The showeq option defaults to False.
When it is set to True, mfcalc displays the normalized form of the entered equation.
df.mfcalc('dlog( a) = 0.02',showeq=True);
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
FRML <> A=EXP(LOG(A(-1))+0.02)$
Note
In ModelFlow, the expression dlog(a) refers to the difference in the natural logarithm, \(dlog(x_t) \equiv \ln(x_t)-\ln(x_{t-1})\), which is approximately equal to the growth rate of the variable. The dlog() syntax is borrowed from EViews.
.mfcalc() normalizes the equation such that the system solves for a as follows:
\[a_t = e^{\ln(a_{t-1})+0.02}\]
which, expressed in the business logic language of ModelFlow, is:
A=EXP(LOG(A(-1))+0.02)
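A quick numerical check of the normalized formula, using only Python's math module. It illustrates that a log difference of 0.02 is close to, but not identical to, a 2 percent growth rate. The variable names are illustrative.

```python
import math

a_prev = 100.0

# The normalized equation: A = EXP(LOG(A(-1)) + 0.02)
a_now = math.exp(math.log(a_prev) + 0.02)

# Algebraically the same as multiplying the lagged value by exp(0.02).
same_as_scaling = math.isclose(a_now, a_prev * math.exp(0.02))

dlog = math.log(a_now) - math.log(a_prev)
pct_growth = (a_now / a_prev - 1) * 100

print(same_as_scaling)      # True
print(f"{dlog:.4f}")        # 0.0200
print(f"{pct_growth:.4f}")  # 2.0201
```

The gap between 2.0201 and 2.0 is small here but widens for larger growth rates, which is worth keeping in mind when setting targets with dlog().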
1.5.6. The .diff() operator with mfcalc
The diff() operator effectively normalizes to an equation that adds the value to the right of the equals sign to the lagged value of the variable passed to the diff operator. Thus, diff(a)=x normalizes to a=a(-1)+x.
df.mfcalc('diff(a) = 2',showeq=True)
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
FRML <> A=A(-1)+(2)$
| | A |
|---|---|
| 2020 | 100.0 |
| 2021 | 102.0 |
| 2022 | 104.0 |
| 2023 | 106.0 |
| 2024 | 108.0 |
| 2025 | 110.0 |
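The normalized form A=A(-1)+(2) can be reproduced in plain Python to confirm that the first differences of the result equal the right-hand side. This is a sketch under the assumption that 2020 is left unchanged because it has no lagged value. The names step and first_differences are illustrative.

```python
years = list(range(2020, 2026))
a = {year: 100.0 for year in years}
step = 2.0

# diff(a) = 2 normalizes to a = a(-1) + 2, applied from 2021 onward.
for year in years[1:]:
    a[year] = a[year - 1] + step

# Taking first differences of the result recovers the right-hand side.
first_differences = [a[year] - a[year - 1] for year in years[1:]]

print([a[year] for year in years])  # [100.0, 102.0, 104.0, 106.0, 108.0, 110.0]
print(first_differences)            # [2.0, 2.0, 2.0, 2.0, 2.0]
```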
1.5.7. mfcalc with several equations and arguments
A single call to .mfcalc() can also contain several equations.
However, be careful, because the equations are executed simultaneously, which, combined with the treatment of lags, means that results may differ from what they would be if the equations were run sequentially.
For example:
res = df.mfcalc('''
diff(a) = 2
x = a + 42
''')
res
# use res.diff() to see the difference
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
| | A | X |
|---|---|---|
| 2020 | 100.0 | 0.0 |
| 2021 | 102.0 | 144.0 |
| 2022 | 104.0 | 146.0 |
| 2023 | 106.0 | 148.0 |
| 2024 | 108.0 | 150.0 |
| 2025 | 110.0 | 152.0 |
In this example the variable A in the DataFrame df was initialized to 100 for the period 2020 through 2025.
The first line of the .mfcalc() call produces results only for the period 2021 to 2025 because there is no value for A in 2019. The value of A in 2020 is unchanged, and the following values rise by 2 in each period.
When calculating X, however, .mfcalc does not use the final result of the calculation of A, but the intermediate result (the values for 2021 through 2025).
As a result, it is this shorter series that is passed to the second equation, which adds 42 to that result.
X in 2020 is not 142, as one might have expected, but zero, the value to which the newly created variable defaults.
Compare the results above with the results (below) when the same calculations are undertaken in two separate calls to .mfcalc().
res1 = df.mfcalc('''
diff(a) = 2
''')
res2 = res1.mfcalc('''
x = a + 42
''')
res2
* Take care. Lags or leads in the equations, mfcalc run for 2021 to 2025
| | A | X |
|---|---|---|
| 2020 | 100.0 | 142.0 |
| 2021 | 102.0 | 144.0 |
| 2022 | 104.0 | 146.0 |
| 2023 | 106.0 | 148.0 |
| 2024 | 108.0 | 150.0 |
| 2025 | 110.0 | 152.0 |
Danger
In .mfcalc(), when there are multiple equations in a single call, they are executed simultaneously. This, combined with mfcalc's treatment of lags, means that only the intermediate results of the lagged calculation are passed to the other equations defined in the same call to .mfcalc. As a consequence, results may differ from what would be expected and from what would be seen if the equations were run sequentially in separate calls.
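The difference between one call and two calls can be imitated in plain Python. The sketch assumes, in line with the description above, that every equation in a call is evaluated over the same period and that a lag anywhere in the call shortens that period for all of its equations. The helper run and the tuples grow_a and make_x are hypothetical and are not part of the ModelFlow API.

```python
years = list(range(2020, 2026))

def run(frame, equations, period):
    """Evaluate every (name, function) pair for each year in period.

    New variables are zero-filled first. Hypothetical helper, not ModelFlow.
    """
    result = {name: dict(values) for name, values in frame.items()}
    for name, _ in equations:
        result.setdefault(name, {year: 0.0 for year in years})
    for year in period:
        for name, func in equations:
            result[name][year] = func(result, year)
    return result

df = {"A": {year: 100.0 for year in years}}
grow_a = ("A", lambda f, t: f["A"][t - 1] + 2)  # diff(a) = 2
make_x = ("X", lambda f, t: f["A"][t] + 42)     # x = a + 42

# One call: the lag in the first equation shortens the period for both.
single = run(df, [grow_a, make_x], years[1:])

# Two calls: the second call has no lag, so it covers every year.
first = run(df, [grow_a], years[1:])
sequential = run(first, [make_x], years)

print(single["X"][2020], sequential["X"][2020])  # 0.0 142.0
print(single["X"][2021], sequential["X"][2021])  # 144.0 144.0
```

Only the 2020 row differs between the two approaches. From 2021 onward the results are identical.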
1.5.8. Setting a time frame with mfcalc
It can be useful in some circumstances to limit the time frame for which the calculations are performed. Specifying a start date and an end date enclosed in <> on a line restricts the time period over which subsequent calculations are performed.
In the example below, the calculations run only for 2023 through 2025, so in the earlier rows A keeps its original value and the new variable X keeps its initial zero.
Note
As with .upd(), time frames set in one line are inherited by subsequent lines unless reset explicitly.
res = df.mfcalc('''
<2023 2025>
diff(a) = 2
x = a + 42
''')
res
| | A | X |
|---|---|---|
| 2020 | 100.0 | 0.0 |
| 2021 | 100.0 | 0.0 |
| 2022 | 100.0 | 0.0 |
| 2023 | 102.0 | 144.0 |
| 2024 | 104.0 | 146.0 |
| 2025 | 106.0 | 148.0 |
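The effect of the <2023 2025> time frame can be reproduced with a plain-Python sketch in which both equations are evaluated only for the years inside the frame. The names start and end are illustrative, and the loop is an assumption about the behavior shown in the table, not ModelFlow's own code.

```python
years = list(range(2020, 2026))
a = {year: 100.0 for year in years}
x = {year: 0.0 for year in years}  # new variable, zero-filled

start, end = 2023, 2025  # the <2023 2025> time frame

# Both equations are evaluated only inside the time frame.
for year in range(start, end + 1):
    a[year] = a[year - 1] + 2  # diff(a) = 2
    x[year] = a[year] + 42     # x = a + 42

print([a[year] for year in years])  # [100.0, 100.0, 100.0, 102.0, 104.0, 106.0]
print([x[year] for year in years])  # [0.0, 0.0, 0.0, 144.0, 146.0, 148.0]
```

Because 2022 lies inside the dataframe, the lagged value needed for 2023 exists, which is consistent with no lag warning appearing in the output above.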