Using information in newspaper articles as an indicator of real economic activity
Published as part of the ECB Economic Bulletin, Issue 2/2020.
Text analysis methods have been used extensively in the economic literature to measure macroeconomic risk and uncertainty. However, there is limited evidence regarding the amount of information on real economic activity that can be extracted from such indices. This box presents an indicator for real economic activity in the United States based on the textual analysis of newspaper articles. The indicator is constructed using data from the Factiva database, which collects all articles published by major newspapers for a large set of countries. Newspaper articles published in the United States are extracted from the database, and for each day since January 1970 the indicator measures the number of articles that discuss a slowdown (or recession) in the US economy relative to the total number of articles published in the United States. Intuitively, the constructed index should co-move with the business cycle, as newspapers devote more space to the subject of an economic slowdown when activity weakens. Moreover, the indicator should react faster to developments in the economic cycle that take time to become visible in aggregate macro variables, which are often published with a lag. Finally, this index can be updated easily at a high frequency (on a daily basis) and can be applied to a large number of advanced and emerging market economies.
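The construction of such an index can be illustrated with a minimal sketch. It assumes that articles are available as (date, text) pairs and that a slowdown article is identified by simple keyword matching. The keyword list, the function names and the sample articles are hypothetical, as the box does not publish its search terms or its Factiva queries.

```python
from collections import defaultdict
from datetime import date

# Hypothetical keyword list; the box does not publish its search terms.
SLOWDOWN_TERMS = ("recession", "slowdown", "economic downturn")


def mentions_slowdown(text, terms=SLOWDOWN_TERMS):
    """Return True if the article text contains any slowdown-related term."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def daily_slowdown_share(articles):
    """Map each publication date to the share of that day's articles
    that mention a slowdown. `articles` is an iterable of (date, text)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for day, text in articles:
        totals[day] += 1
        if mentions_slowdown(text):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Made-up sample articles, for illustration only.
sample = [
    (date(2008, 10, 1), "Fears of a deep recession grip markets"),
    (date(2008, 10, 1), "Local team wins the championship"),
    (date(2008, 10, 2), "Factory output points to a slowdown"),
]
index = daily_slowdown_share(sample)
```

Dividing by the total number of articles published each day keeps the index comparable over time even if the volume of coverage in the database changes.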
The constructed index can be used as a real-time tracker of US real economic activity. Chart A shows that the indicator correlates well with periods of economic slowdown in the United States when these are measured in terms of declines in industrial production or the recession dates established by the National Bureau of Economic Research (NBER). This correlation suggests that the text-based index can be used as a real-time indicator to track economic developments at a high frequency, as it contains relevant information on the business cycle.
Chart A: Text-based index and US recessions
The index also has predictive content for future economic activity. This can be formally tested by adding the text-based indicator to a standard recession probability model. The following equation is estimated:
where the probability of a recession at a future horizon is forecast by the slope of the yield curve observed in the current period (the difference between short-term and long-term yields), which is a standard predictor of recession, and by the text-based indicator. The index thus provides information in addition to that contained in the slope of the US yield curve. The goodness of fit of recession probability models is summarised by the so-called receiver operating characteristic (ROC) curve, which can be seen as a measure of the accuracy of the predictions made using the model. The ROC statistic is reported in Chart B and shows that the specification including the newspaper article-based index outperforms the model based on the yield curve alone at short horizons. The inclusion of newspaper article data in the estimation significantly improves the performance of the model. This assessment is robust to replacing the NBER recession dates with an alternative definition of recession (eight consecutive months of contraction in industrial production) and to excluding the global financial crisis period.
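As an illustration of how such a model turns its two predictors into a probability, the sketch below assumes a probit specification in the spirit of Wright (2006), in which the recession probability is the standard normal distribution function applied to a linear combination of the yield curve slope and the text-based index. The functional form, the sign convention for the slope (taken here as long-term minus short-term yields) and all coefficient values are assumptions made for illustration; they are not the estimates underlying Chart B.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def recession_probability(slope, text_index, alpha, beta, gamma):
    """Probit-style probability of a recession at a future horizon.

    slope      -- yield curve slope today (assumed: long minus short yield)
    text_index -- share of newspaper articles discussing a slowdown today
    alpha, beta, gamma -- model coefficients (hypothetical values below)
    """
    return normal_cdf(alpha + beta * slope + gamma * text_index)


# Hypothetical coefficients, chosen only to illustrate plausible signs:
# a flatter or inverted curve and more slowdown coverage raise the probability.
ALPHA, BETA, GAMMA = -1.0, -0.5, 8.0

calm = recession_probability(2.0, 0.01, ALPHA, BETA, GAMMA)
stressed = recession_probability(-0.5, 0.10, ALPHA, BETA, GAMMA)
```

Setting `gamma` to zero in this sketch gives the yield-curve-only benchmark against which the text-augmented specification is compared.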
The evidence presented in this box shows that information extracted from newspaper articles is useful for monitoring economic developments and complements macroeconomic data. Newspaper articles collect a large set of information on the business cycle that does not appear immediately in macroeconomic time series. The fact that this type of text-based indicator is available and can be updated on a daily basis makes it useful and relevant for monitoring and predicting economic developments, particularly at short horizons.
Chart B: Goodness of fit statistics for the recession probability models at different month-ahead forecast horizons
- Recent examples include Caldara, D. and Iacoviello, M., “Measuring Geopolitical Risk”, International Finance Discussion Papers, No 1222, Board of Governors of the Federal Reserve System (United States), 2018; Baker, S.R., Bloom, N. and Davis, S.J., “Measuring Economic Policy Uncertainty”, The Quarterly Journal of Economics, Vol. 131(4), Oxford University Press, 2016, pp. 1593-1636; for a recent text analysis approach applied to the euro area, see Azqueta-Gavaldón, A., Hirschbühl, D., Onorante, L. and Saiz, L., “Sources of economic policy uncertainty in the euro area: a machine learning approach”, Economic Bulletin, Issue 5, ECB, Frankfurt am Main, November 2019.
- Using the same methodology as that adopted by Caldara, D. and Iacoviello, M., “Measuring Geopolitical Risk”, International Finance Discussion Papers, No 1222, Board of Governors of the Federal Reserve System (United States), 2018.
- Based on Wright, J.H., “The yield curve and predicting recessions”, Finance and Economics Discussion Series, 2006-07, Divisions of Research and Statistics and Monetary Affairs, Federal Reserve Board, Washington D.C., February 2006.
- Yield-curve models have been revised recently in the context of the asset purchase programmes of major central banks. See the box entitled “US yield curve inversion and financial market signals of recession”, Economic Bulletin, Issue 1, ECB, Frankfurt am Main, 2020.
- The ROC curve plots the true positive rate, i.e. the share of actual recessions that the model correctly signals, against the false positive rate, i.e. the share of non-recession periods in which the model wrongly signals a recession. The closer the estimated ROC curve lies to the vertical axis (and hence to the top-left corner of the graph), the higher the predictive power of the model. Additionally, it is possible to summarise the ROC graph by computing the area that is below the ROC curve but above the 45-degree line (which corresponds to random assignment). The larger this area, the more accurate the model is.
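The ROC calculation described in the last note can be sketched as follows. This is a minimal illustration that assumes binary recession labels (1 for a recession period, 0 otherwise), model-implied probabilities as scores, and at least one observation of each class. The function names and the sample data are hypothetical and are not taken from the estimates reported in Chart B.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate),
    from (0, 0) to (1, 1), as the signalling threshold is lowered."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank observations from the highest to the lowest predicted probability.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last observation sharing this score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up recession labels and predicted probabilities, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = area_under_curve(roc_points(labels, scores))
# Area between the ROC curve and the 45-degree line of random assignment.
area_above_diagonal = auc - 0.5
```

A model that ranks every recession period above every non-recession period has an area under the curve of 1, while random assignment gives 0.5, so the area above the 45-degree line ranges from 0 to 0.5.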