Economic Analysis with News Sources


New analytical techniques have expanded the role of non-traditional data sources, including text-based data, in economic analysis. This research uses natural language processing (NLP) on news articles to fill key data gaps on economic sentiment and prices, offering insights into relevant economic trends across the East Asia and Pacific region.

Data Sources

The East Asia and Pacific region hosts a substantial corpus of accessible English-language content from newspapers and international news platforms, providing an opportunity to generate timely, comprehensive indicators of economic and political trends. Specifically, we selected local news outlets from East Asia and Pacific countries, complemented by regional sources such as the Pacific Islands News Association (PINA), ABC Australia (ABC AU), and Radio New Zealand (RNZ), for their robust coverage and reliability. We used web scraping to extract articles from these sources and then organized the content into structured datasets.
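As an illustration of what a structured dataset can look like here, the sketch below stores each scraped article as a record and counts articles per source and month, which is the denominator the EPU share needs later. The `Article` schema and its field names are hypothetical, and the scraping step itself is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """One scraped article (hypothetical schema, not the project's actual one)."""
    source: str      # outlet name, e.g. "Samoa Observer"
    country: str     # country or region label, as in Table 1
    published: date  # publication date parsed from the page
    text: str        # article body with HTML removed


def month_key(d: date) -> str:
    """Return a 'YYYY-MM' key used to aggregate articles by month."""
    return f"{d.year:04d}-{d.month:02d}"


def monthly_counts(articles):
    """Count articles per (source, month) pair."""
    return Counter((a.source, month_key(a.published)) for a in articles)


articles = [
    Article("Samoa Observer", "Samoa", date(2020, 1, 5), "..."),
    Article("Samoa Observer", "Samoa", date(2020, 1, 20), "..."),
    Article("PINA", "Pacific", date(2020, 2, 1), "..."),
]
counts = monthly_counts(articles)
```

Keying by source rather than by country keeps each outlet's series separate, which matches the per-newspaper shares used in the EPU construction.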

Table 1: News Sources by Country

Country	Newspaper/Media Source	Number of Articles	Earliest Article Date
Cambodia	Khmer Times	69,680	1970-01-01
China	China Daily	10,512	2014-03-28
	People’s Daily Online	3,442	2024-09-13
Fiji	Fiji Sun	63,880	2008-05-27
Indonesia	Antara	10,886	2025-09-23
	Jakarta Post	1,635	2025-02-24
	Tempo	77,615	2003-07-21
Japan	Japan News	51,555	2022-04-29
	Japan Today	4,500	2012-09-27
	The Asahi Shimbun	11,399	2020-04-16
Lao	The Laotian Times	8,687	2016-06-03
Malaysia	Malay Mail	225,506	2013-06-18
Marshall Islands	MI Journal	1,620	2015-01-02
Mongolia	UB Post	462	2016-10-08
New Zealand	New Zealand Herald	16,802	2025-06-10
Pacific	Australian Broadcasting Corporation (ABC AU)	25,468	2003-02-19
	PINA	39,176	2011-11-19
	Radio New Zealand (RNZ)	53,118	2007-06-17
Palau	Island Times	10,094	2016-06-03
Papua New Guinea	PNG Business News	3,498	2019-05-24
	Post Courier	52,768	2015-12-16
Philippines	Asia News Network	3,067	2018-04-03
	Inquirer	50,685	1998-10-07
	Philippine Star	220	2025-10-11
Samoa	Samoa Observer	77,557	2012-01-01
Singapore	The Independent	1,885	2022-10-17
	The Straits Times	9,789	2024-09-15
	Today Online	616	2024-04-13
Solomon Islands	SIBC	10,916	2013-12-14
	Solomon Star	34,109	2014-04-10
	Solomon Times	22,976	2007-04-14
	The Island Sun	10,301	2017-09-01
South Korea	The Korea Herald	12,431	2025-05-05
	The Korea Times	94,323	2006-12-07
Thailand	Nation Thailand	13,854	2024-04-22
Tonga	Matangi Tonga Online	40,481	1997-11-04
Vanuatu	Vanuatu Daily Post	35,333	2014-04-08
	Vanuatu Business Review (VBR)	577	2020-04-27
Vietnam	Tuoi Tre	36,564	1970-01-01
	Vietnam News	38,577	2004-06-21
Total		1,292,472

Methods

Economic Policy Uncertainty (EPU) Index

One of the most influential applications of text data in economics is the Economic Policy Uncertainty (EPU) index, first developed by Baker et al. [2016]. In the original application, an index of policy uncertainty was constructed from the frequency of news articles containing keywords related to the economy, policy, and uncertainty. The authors found that periods of elevated policy uncertainty were strongly associated with declines in investment and employment, highlighting the negative impact of uncertainty on economic decision-making.

The EPU index is built systematically: an article counts as EPU news only if it contains at least one keyword from each of the economic, policy, and uncertainty categories listed in Table 2. Once the relevant articles are identified, the EPU index is constructed through the steps listed below.

Table 2: EPU Index Keywords

Category	Words
Economic	“economy”, “economic”, “economics”, “business”, “finance”, “financial”
Policy	“government”, “governmental”, “authorities”, “minister”, “ministry”, “parliament”, “parliamentary”, “tax”, “regulation”, “legislation”, “central bank”, “imf”, “international monetary fund”, “world bank”
Uncertainty	“uncertain”, “uncertainty”, “uncertainties”, “unknown”, “unstable”, “unsure”, “undetermined”, “risky”, “risk”, “not certain”, “non-reliable”, “fluctuations”, “unpredictable”
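The three-category filter can be sketched with regular expressions over the Table 2 keywords. This is a minimal illustration that assumes case-insensitive, whole-word matching (so "tax" does not match "taxes"); the text does not say how inflections or punctuation are handled.

```python
import re

# Keyword lists from Table 2.
ECONOMIC = ["economy", "economic", "economics", "business", "finance", "financial"]
POLICY = [
    "government", "governmental", "authorities", "minister", "ministry",
    "parliament", "parliamentary", "tax", "regulation", "legislation",
    "central bank", "imf", "international monetary fund", "world bank",
]
UNCERTAINTY = [
    "uncertain", "uncertainty", "uncertainties", "unknown", "unstable",
    "unsure", "undetermined", "risky", "risk", "not certain",
    "non-reliable", "fluctuations", "unpredictable",
]


def compile_terms(terms):
    """Case-insensitive matcher for any term, as a whole word or phrase."""
    # Longest terms first so a longer term is tried before its shorter prefixes.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


CATEGORIES = [compile_terms(ECONOMIC), compile_terms(POLICY), compile_terms(UNCERTAINTY)]


def is_epu_article(text: str) -> bool:
    """True if the text contains at least one keyword from every category."""
    return all(pattern.search(text) for pattern in CATEGORIES)
```

For example, "The economy faces uncertainty as the government delays the budget." passes the filter, while the same sentence without "uncertainty" does not.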

1. Let \( X_{it} = \frac{\text{number of EPU articles in newspaper } i \text{ in month } t}{\text{number of all scraped articles in newspaper } i \text{ in month } t} \), and fix a base period \(T_1\) used for both standardization and normalization.
2. Compute the standard deviation \(\sigma_i\) of \(X_{it}\) for each newspaper \(i\) over \(T_1\).
3. Standardize each series by dividing by \(\sigma_i\) for all months \(t\): \( Y_{it} = \frac{X_{it}}{\sigma_i} \).
4. Average the standardized series across newspapers in each month: \( Z_t = \operatorname{mean}_i \left( Y_{it} \right) \).
5. Compute \( M = \operatorname{mean}_{t \in T_1} \left( Z_t \right) \), the mean of \(Z_t\) over \(T_1\).
6. Normalize by multiplying \(Z_t\) by \( \frac{100}{M} \) for all months \(t\), giving \( \text{EPU}_t = \frac{100}{M} Z_t \), an index with a mean of 100 over \(T_1\).
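The six steps translate directly into code. The sketch below assumes every newspaper's monthly shares are complete and aligned on the same months, and uses the population standard deviation; both are assumptions, since the text does not specify how missing months or the variance estimator are handled.

```python
from statistics import mean, pstdev


def epu_index(shares, base_period):
    """Normalized EPU index from per-newspaper monthly EPU shares.

    shares: dict mapping newspaper name -> list of X_it values, one per month,
            with all lists aligned on the same months (assumed complete).
    base_period: list of month positions that make up T_1.
    A newspaper whose share is constant over T_1 has sigma = 0 and raises ZeroDivisionError.
    """
    n_months = len(next(iter(shares.values())))
    # Steps 2-3: divide each newspaper's series by its standard deviation over T_1.
    standardized = []
    for x in shares.values():
        sigma = pstdev([x[t] for t in base_period])
        standardized.append([value / sigma for value in x])
    # Step 4: average the standardized series across newspapers, month by month.
    z = [mean(y[t] for y in standardized) for t in range(n_months)]
    # Step 5: mean of Z_t over the base period.
    m = mean(z[t] for t in base_period)
    # Step 6: rescale so the index averages 100 over T_1.
    return [100 * zt / m for zt in z]


shares = {
    "Paper A": [0.010, 0.020, 0.030, 0.020, 0.040],
    "Paper B": [0.050, 0.040, 0.060, 0.050, 0.070],
}
index = epu_index(shares, base_period=[0, 1, 2, 3])
```

Because step 6 divides by \(M\), any factor common to all \(\sigma_i\) cancels: switching to the sample standard deviation leaves the index unchanged as long as every newspaper has the same number of months in \(T_1\). Likewise, rescaling one newspaper's shares leaves the index unchanged, since its \(\sigma_i\) rescales with it.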

Topic-based EPU

The EPU index can also be computed for articles on specific policy topics. To qualify, an article must contain at least one keyword from each of four categories: (1) economy, (2) uncertainty, (3) policy, and (4) policy topic, a list of terms for a specific theme (labor, inflation, climate, or food security). Because fewer articles meet these stricter criteria, the topic-based EPU is constructed at quarterly frequency. The graphs below display quarterly EPU for jobs and inflation.
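A sketch of the four-category filter with quarterly aggregation follows. The keyword lists are abbreviated subsets of Table 2, and the inflation topic terms are hypothetical, since the topic lists are not given in the text. The resulting quarterly shares would then go through the same standardization and normalization steps as the monthly index.

```python
import re
from collections import defaultdict
from datetime import date


def compile_terms(terms):
    """Case-insensitive matcher for any term, as a whole word or phrase."""
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


# Abbreviated subsets of the Table 2 lists, kept short for the example.
ECONOMY = compile_terms(["economy", "economic", "business", "financial"])
UNCERTAINTY = compile_terms(["uncertain", "uncertainty", "risk", "unpredictable"])
POLICY = compile_terms(["government", "minister", "central bank", "tax"])
# Hypothetical topic terms: the actual topic keyword lists are not given in the text.
INFLATION_TOPIC = compile_terms(["inflation", "consumer prices", "cost of living"])


def quarter_key(d: date) -> str:
    """Return a 'YYYY-Qn' key for quarterly aggregation."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def topic_epu_shares(articles, topic):
    """Quarterly share of one newspaper's articles that meet all four criteria.

    articles: iterable of (publication_date, text) pairs.
    topic: compiled pattern for the policy-topic terms.
    """
    totals, hits = defaultdict(int), defaultdict(int)
    for published, text in articles:
        quarter = quarter_key(published)
        totals[quarter] += 1
        if all(p.search(text) for p in (ECONOMY, UNCERTAINTY, POLICY, topic)):
            hits[quarter] += 1
    return {q: hits[q] / totals[q] for q in sorted(totals)}


sample = [
    (date(2024, 1, 10), "Economic uncertainty grows as the government weighs inflation."),
    (date(2024, 2, 3), "Local team wins the regional final."),
    (date(2024, 4, 2), "Business risk rises; the minister blames the cost of living."),
]
shares = topic_epu_shares(sample, INFLATION_TOPIC)
```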

Economic Policy Sentiment

We use the EPU economic and policy keyword filters to select news articles for targeted sentiment analysis. The sentiment analysis uses VADER (Valence Aware Dictionary and sEntiment Reasoner), a rule-based model designed for social media text that also works on news text (Hutto and Gilbert, 2014). VADER sums the valence scores of the lexicon words in a text (neutral words contribute zero), adjusts them with rules for features such as negation and intensifiers, and normalizes the result into a compound sentiment score S that ranges from -1 (most negative) to +1 (most positive), with neutral text scoring around 0.
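The step that bounds VADER's compound score maps an unbounded sum of valences \(x\) into \([-1, 1]\) via \( S = x / \sqrt{x^2 + \alpha} \) with \( \alpha = 15 \), clipped to that interval. The sketch below reproduces only that step on a hypothetical four-word lexicon and skips VADER's rules for negation, intensifiers, capitalization, and punctuation. In practice one would call the `vaderSentiment` package, e.g. `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
import math

# Hypothetical toy lexicon; VADER's real lexicon is far larger and rates tokens from -4 to +4.
TOY_LEXICON = {"growth": 2.0, "strong": 1.9, "crisis": -3.1, "weak": -1.7}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1], as VADER does for its compound score."""
    return max(-1.0, min(1.0, score / math.sqrt(score * score + alpha)))


def toy_compound(text: str) -> float:
    """Sum word valences from the toy lexicon (no VADER rules applied) and normalize."""
    total = sum(TOY_LEXICON.get(tok.strip(".,!?").lower(), 0.0) for tok in text.split())
    return normalize(total)
```

Here "Strong growth." sums to 3.9 and normalizes to about 0.71, while "Debt crisis!" yields a negative score; words outside the lexicon contribute nothing.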

Consumer Price Index (CPI) and Inflation

Once the EPU index has been computed for each country and period, we use it as an input to analyze and predict price movements. We obtain Consumer Price Index (CPI) data from the International Monetary Fund (IMF) and apply a three-month moving average (MA3) to smooth the volatile inflation series computed directly from the CPI. We then run a regression on variables selected by cross-validated LASSO, which retains relevant predictors while reducing the risk of overfitting. To further limit overfitting from a high-order lag polynomial, we cap the lag order at two, so each one-step-ahead prediction uses at most the past three months of inflation (the current month and two lags).
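The pipeline can be sketched as follows: smooth inflation with a trailing MA3, build features from the current month and up to two lags of inflation and the EPU index, and let cross-validated LASSO select among them to predict next month's inflation. The use of scikit-learn's `LassoCV` with forward-chaining `TimeSeriesSplit` folds, the feature standardization, the EPU lags as the only extra regressors, and the synthetic data are assumptions for illustration, not the study's exact specification.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def ma3(x):
    """Trailing MA3: output[k] is the mean of months k..k+2, i.e. the MA3 value for month k+2."""
    return np.convolve(np.asarray(x, dtype=float), np.ones(3) / 3, mode="valid")


def lagged_design(inflation, epu, max_lag=2):
    """Features from months t, t-1, ..., t-max_lag; the target is inflation at t+1."""
    inflation = np.asarray(inflation, dtype=float)
    epu = np.asarray(epu, dtype=float)
    rows, targets = [], []
    for t in range(max_lag, len(inflation) - 1):
        rows.append([inflation[t - k] for k in range(max_lag + 1)]
                    + [epu[t - k] for k in range(max_lag + 1)])
        targets.append(inflation[t + 1])
    return np.array(rows), np.array(targets)


def fit_lasso(X, y, n_splits=5):
    """Standardize features, then choose the LASSO penalty with time-ordered CV folds."""
    model = make_pipeline(
        StandardScaler(),
        LassoCV(cv=TimeSeriesSplit(n_splits=n_splits), max_iter=10_000),
    )
    return model.fit(X, y)


# Synthetic example: 120 months of inflation and EPU (illustrative data only).
rng = np.random.default_rng(0)
monthly_inflation = 0.3 + 0.2 * np.sin(np.arange(120) / 6) + rng.normal(0, 0.05, 120)
epu_series = 100 + rng.normal(0, 10, 120)

smoothed = ma3(monthly_inflation)          # length 118, aligned with months 2..119
X, y = lagged_design(smoothed, epu_series[2:])
model = fit_lasso(X, y)
selected = model[-1].coef_ != 0            # which lags LASSO kept
```

`TimeSeriesSplit` is used so that each validation fold always lies after its training data, which avoids tuning the penalty on information from the future.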

Results

Country-Specific Models

We evaluate the country-specific models on a training set of eight countries: China, Fiji, Indonesia, Japan, Lao, Samoa, Solomon Islands, and Tonga. Japan achieves the lowest RMSE, at 0.11, meaning that its predictions deviate from actual inflation by about 0.11 percentage points in root-mean-square terms. The countries with the highest accuracy are Lao, Indonesia, and Samoa, at 95, 88, and 84 percent, respectively. Accuracy is lower in countries where inflation is volatile or alternates rapidly between inflation and deflation.

Pooled Model

The pooled model using MA3 achieves an accuracy of approximately 83.1 percent, with predictions deviating from actual inflation by around 0.83 percentage points. In other words, based on historical data and the constructed EPU indexes, the pooled model correctly predicted inflationary or deflationary trends more than four out of five times.

For out-of-sample validation of the pooled model, we use three countries: the Philippines, South Korea, and Vietnam. The Philippines achieves an RMSE of 0.14 and an accuracy of 92.91%, South Korea an RMSE of 0.15 and an accuracy of 84.25%, and Vietnam an RMSE of 0.17 and an accuracy of 88.43%.
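For reference, the two metrics reported in this section can be computed as below. RMSE follows its standard definition. For accuracy, the sketch assumes the "inflationary or deflationary trends" reading, i.e. the share of months in which predicted and actual inflation have the same sign; the text does not spell out its accuracy definition, so `sign_accuracy` is a hypothetical interpretation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the inputs (percentage points here)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def sign(x):
    """Return 1 for inflation, -1 for deflation, 0 for exactly zero."""
    return 1 if x > 0 else (-1 if x < 0 else 0)


def sign_accuracy(actual, predicted):
    """Share of months where the predicted sign matches the actual sign (assumed definition)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(sign(a) == sign(p) for a, p in zip(actual, predicted))
    return hits / len(actual)
```

Under this reading, an accuracy of 83.1 percent means the predicted sign matched the actual sign in about 83 of every 100 months.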

Future Work

Future work will develop a methodology to interpolate quarterly CPI data to monthly values, bring lagged CPI data to the same time frequency as the EPU index, and generate inflation predictions for countries with no inflation data.

Table 3: IMF CPI Data Availability by Country

Country Name	ISO3	Frequency	Last Reported
American Samoa	ASM	No Data	No Data
Guam	GUM	No Data	No Data
Marshall Islands	MHL	No Data	No Data
New Zealand	NZL	Quarterly	2025-Q3
Palau	PLW	Quarterly	2025-Q2
Papua New Guinea	PNG	Quarterly	2025-Q2
Thailand	THA	Monthly	2025-M03
Tonga	TON	Monthly	2025-M01
Tuvalu	TUV	Quarterly	2012-Q2
Vanuatu	VUT	Quarterly	2023-Q4
Vietnam	VNM	Monthly	2025-M03
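One simple baseline for the quarterly-to-monthly step mentioned under Future Work is linear interpolation, sketched below for a quarterly series such as those in Table 3. Placing each quarterly value at the middle month of its quarter is an assumption, and this is not the planned methodology. Unlike temporal disaggregation methods such as Denton or Chow-Lin, it does not guarantee that the monthly values average back to the published quarterly figures.

```python
import numpy as np


def quarterly_to_monthly(q_values):
    """Linearly interpolate quarterly CPI levels onto a monthly grid.

    Each quarterly value sits at the middle month of its quarter
    (positions 1, 4, 7, ... on a grid that starts at the first quarter's first month).
    np.interp holds months before the first midpoint and after the last one flat.
    """
    q = np.asarray(q_values, dtype=float)
    quarter_midpoints = 3 * np.arange(len(q)) + 1
    months = np.arange(3 * len(q))
    return np.interp(months, quarter_midpoints, q)


monthly_cpi = quarterly_to_monthly([100.0, 101.5, 102.4, 103.0])
```

With two quarters at 100 and 103, the monthly values are 100, 100, 101, 102, 103, 103; the first quarter's months then average about 100.33 rather than 100, which is the gap that disaggregation methods close.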