New features in the Harmonised Index of Consumer Prices: analytical groups, scanner data and web-scraping
Published as part of the ECB Economic Bulletin, Issue 2/2019.
Harmonised indices of consumer prices (HICPs) for food, industrial goods, services and energy are measures that the ECB uses for its more detailed analysis of inflation in the euro area. With the release of the HICPs for January 2019, these analytical groups (known as special aggregates) are based on a more exact allocation of products. As a result, the distinction between goods and services and between unprocessed and processed food is now more precise. This improvement has been achieved by deriving special aggregates from the HICP’s generic classification, the “European Classification of Individual Consumption according to Purpose” (ECOICOP), which provides a more detailed level of breakdown than the product classification used thus far. Another recent enhancement is the extended use of supermarket scanner data. “Web-scraping”, an automated approach to collecting mass data from websites, is also being more broadly applied. Overall, these changes better reflect actual consumer price developments in the economy, especially since they increase the coverage of sales (i.e. discounted) prices.
The ECB monitors and analyses inflation using the HICP grouped into unprocessed food, processed food, industrial goods, services and energy. These special aggregates often exhibit distinct properties, such as the greater volatility of the HICPs for unprocessed food and for energy. Some measures of underlying inflation are derived by excluding some of these special aggregates. In general, special aggregates are used to better analyse and understand the drivers of inflation.
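The idea of deriving an underlying inflation measure by excluding special aggregates can be illustrated with a minimal sketch. It approximates the annual rate of an aggregate as the weighted mean of the rates of its components, with weights rescaled over the groups that are kept. The function name, the weights and the rates below are hypothetical and purely illustrative. The actual HICP is compiled as a chain-linked index, so this is only an approximation of the principle.

```python
def rate_excluding(rates, weights, excluded):
    """Weighted mean of component rates, leaving out the excluded groups.

    Weights are rescaled so that the remaining groups sum to one.
    This approximates the principle only; it is not the official method.
    """
    kept = [group for group in rates if group not in excluded]
    total_weight = sum(weights[group] for group in kept)
    return sum(weights[group] * rates[group] for group in kept) / total_weight


# Hypothetical expenditure weights (per mille) and annual rates (per cent).
weights = {
    "unprocessed food": 50,
    "processed food": 150,
    "industrial goods": 260,
    "services": 440,
    "energy": 100,
}
rates = {
    "unprocessed food": 3.0,
    "processed food": 1.5,
    "industrial goods": 0.5,
    "services": 1.5,
    "energy": 5.0,
}

headline = rate_excluding(rates, weights, excluded=set())
underlying = rate_excluding(rates, weights, excluded={"unprocessed food", "energy"})
print(f"all items: {headline:.2f}%, excluding volatile groups: {underlying:.2f}%")
```

In this made-up example the two volatile groups have the highest rates, so excluding them lowers the measured rate.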
Statistical offices in the European Union have introduced a further level of detail into the HICP classification by consumption purpose. At its most detailed level, ECOICOP includes around 300 sub-categories. For example, “Mobile telephone equipment” is now a category of its own, whereas the most detailed level of breakdown formerly grouped all telephone and telefax equipment in one category. “Repair of telephone or telefax equipment” has also become a separate category. Statistical offices are providing breakdowns of their national HICPs in accordance with ECOICOP for different time spans. While France and Lithuania have back-calculated the entire time series, Ireland and Finland have only published data from 2017 onwards.
Price indices for analytical groups based on the more detailed classification of products by consumption purpose have been introduced with the publication of the euro area HICP for January 2019. Eurostat, the statistical office of the European Union, has calculated these new special aggregates back to January 2017 for the euro area and the European Union as a whole, as well as for all individual EU Member States. The old HICP special aggregates have been replaced. Up to December 2016, data for HICP special aggregates remain based on the less detailed breakdown, implying a statistical break in the respective time series. Chart A illustrates how the more detailed level of product breakdown affects the HICPs for unprocessed food and for industrial goods excluding energy. Apart from the split into unprocessed and processed food, the effects of the more detailed data on special aggregates are relatively minor. Nevertheless, the break may have some implications for the forecasting and seasonal adjustment of HICP special aggregates.
Chart A: Effect of the more detailed classification on euro area HICPs for unprocessed food and for industrial goods excluding energy (index: 2015 = 100)
Having HICPs for food, goods, services and energy derived from a more detailed classification of products by consumption purpose is an important improvement. It helps to better identify drivers of inflation, such as wage increases for services activities. Econometric modelling of inflation by analytical groups can also be expected to benefit from the more precise allocation.
With the publication of HICPs for January 2019, the use of web-scraped data has expanded further; supermarket scanner data are already used by several statistical offices. Traditionally, prices in bricks-and-mortar shops are collected by price observers, who focus on the prices of the best-selling product variants and visit outlets at least once a month, with more frequent visits for more volatile prices. While price collection in shops is still central to HICP data sampling in many EU Member States, many statistical offices have started to use scanner and web-scraped data or are intensifying their use.
These new data collection methods provide considerably more price data, reflecting product variability, and they also cover a greater number of shopping days sampled within a month. In contrast to the standard survey-based price collection in bricks-and-mortar stores, index calculation with scanner data relies on turnover by product bar code (Global Trade Item Number, GTIN) or another identification code. Prices are derived by dividing the turnover of a product, identified by its item code, by the quantity sold. Scanner data ensure that many more products are included over a longer time period. Prices derived from scanner data are closer to the average for the month, compared with point-in-time price collection.
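The derivation of prices from turnover and quantities can be sketched as follows. This is a minimal illustration, assuming transaction records of the form (item code, month, turnover, quantity); the function name, the record layout and the figures are hypothetical and do not represent any statistical office's actual data format.

```python
from collections import defaultdict


def unit_values(records):
    """Return {(item_code, month): turnover / quantity} for each item and month."""
    turnover = defaultdict(float)
    quantity = defaultdict(float)
    for item_code, month, value, units in records:
        turnover[(item_code, month)] += value
        quantity[(item_code, month)] += units
    # Skip item-months without any units sold to avoid dividing by zero.
    return {
        key: turnover[key] / quantity[key]
        for key in turnover
        if quantity[key] > 0
    }


# Hypothetical weekly records for one item in one month.
records = [
    ("4001234567890", "2019-01", 200.0, 100),  # regular price of 2.00
    ("4001234567890", "2019-01", 450.0, 300),  # discount week at 1.50
]
prices = unit_values(records)
print(prices[("4001234567890", "2019-01")])
```

Because more units are sold in the discount week, the resulting monthly price lies closer to the discounted price than a simple average of the two weekly prices would.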
New data collection methods require new statistical approaches. The significantly larger volume of data requires statistical offices to treat the data in an automated manner. The compilation of product-specific price indices from scanner data poses several challenges, in particular the treatment of discount prices and the greater purchase volumes triggered by discounts. Low turnover in post-discount periods implies that price indices, weighted by sales volumes, tend to be prone to a downward drift when established price index formulae are applied. In most cases, statistical offices that use scanner data currently compile drift-free indices by not incorporating index weights derived from concurrent turnover. Statistical researchers are currently developing methods to take account of turnover by means of expenditure weights while avoiding downward biases.
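The drift problem can be reproduced with a stylised example. The sketch below assumes two products and four periods with made-up prices and quantities: one product is discounted in the second period, sells in large volumes, and then sells very little in the following period once its price is back to normal. A chained Törnqvist index is used here as one example of an established weighted formula, and an unweighted Jevons index as an example of a formula without concurrent turnover weights. Neither choice is taken from the text, and all function names are illustrative.

```python
import math


def tornqvist_link(p0, p1, q0, q1):
    """Törnqvist index between two periods, using average expenditure shares."""
    items = p0.keys() & p1.keys()
    e0 = sum(p0[i] * q0[i] for i in items)
    e1 = sum(p1[i] * q1[i] for i in items)
    log_index = 0.0
    for i in items:
        share = 0.5 * (p0[i] * q0[i] / e0 + p1[i] * q1[i] / e1)
        log_index += share * math.log(p1[i] / p0[i])
    return math.exp(log_index)


def jevons_link(p0, p1):
    """Unweighted geometric mean of price relatives between two periods."""
    items = p0.keys() & p1.keys()
    return math.exp(sum(math.log(p1[i] / p0[i]) for i in items) / len(items))


# Hypothetical data: product B is discounted in period 1 and sells little in period 2.
prices = [{"A": 1.0, "B": 1.0}, {"A": 1.0, "B": 0.5}, {"A": 1.0, "B": 1.0}, {"A": 1.0, "B": 1.0}]
quantities = [{"A": 10, "B": 10}, {"A": 10, "B": 50}, {"A": 10, "B": 2}, {"A": 10, "B": 10}]

chained_tornqvist = 1.0
chained_jevons = 1.0
for t in range(1, len(prices)):
    chained_tornqvist *= tornqvist_link(prices[t - 1], prices[t], quantities[t - 1], quantities[t])
    chained_jevons *= jevons_link(prices[t - 1], prices[t])

# Direct comparison of the last period with the first: nothing has changed.
direct_tornqvist = tornqvist_link(prices[0], prices[-1], quantities[0], quantities[-1])

print(f"chained Törnqvist: {chained_tornqvist:.3f}")
print(f"chained Jevons:    {chained_jevons:.3f}")
print(f"direct Törnqvist:  {direct_tornqvist:.3f}")
```

Prices and quantities in the last period equal those in the first, so an index without drift should return to 1. The chained weighted index ends below 1 because the price decrease receives a large weight and the subsequent increase a small one, whereas the unweighted chained index returns to its starting level.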
Relaunched products may also cause compilation issues when scanner data are used. While maintaining their essential product features, relaunched products may change their item code and sell at a higher price. Compiling price indices at the level of item codes would not capture such price increases. It is therefore necessary to develop methods that identify relaunches even when item codes have changed.
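One conceivable way to approach this is to link item codes that disappear with newly appearing codes that share the same product attributes. The sketch below is only an illustration under strong assumptions: it presumes that attributes such as brand, product group and package content are available for each item, and it links only unambiguous matches. The field names, the matching rule and the data are hypothetical and are not a method described in the text.

```python
def match_relaunches(disappeared, appeared):
    """Pair discontinued item codes with new codes that share brand and product group.

    Returns (old_code, new_code, price_relative) tuples, where the price
    relative compares prices per unit of content.
    """
    by_attributes = {}
    for item in appeared:
        by_attributes.setdefault((item["brand"], item["group"]), []).append(item)

    links = []
    for old in disappeared:
        candidates = by_attributes.get((old["brand"], old["group"]), [])
        if len(candidates) == 1:  # link only when the match is unambiguous
            new = candidates[0]
            old_unit_price = old["price"] / old["content"]
            new_unit_price = new["price"] / new["content"]
            links.append((old["code"], new["code"], new_unit_price / old_unit_price))
    return links


# Hypothetical relaunch: new item code, smaller package, higher price.
disappeared = [{"code": "111", "brand": "X", "group": "shampoo", "price": 2.00, "content": 500}]
appeared = [{"code": "222", "brand": "X", "group": "shampoo", "price": 2.20, "content": 450}]

links = match_relaunches(disappeared, appeared)
print(links)
```

In this made-up case the price per unit of content rises by roughly 22%, an increase that an index compiled strictly by item code would miss because neither code is observed in both periods.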
A larger range of product variants, a greater frequency of recording and higher coverage of the reporting month are the three main ways in which scanner and web-scraped data affect the HICP. Scanner data typically refer to a period of two to three weeks of a month. HICP flash estimates may cover less than this. Therefore, the use of scanner data may occasionally lead to larger and/or more frequent revisions to flash estimates. Overall, the larger amount of data implies that monthly price indices are more affected by the price setting of supermarkets and internet retailers. For example, weekend days, as well as the shopping days before Easter and Christmas, are covered better using these new methods.
Scanner data better indicate sharp changes in prices related to discounts. Sales prices around Christmas may have an impact, in particular when scanner data are incorporated for the first time, since the HICP formula requires chain-linking over December. Generally, with the use of scanner data, sales prices are covered more comprehensively, both across time and across products, implying that scanner data-based price indices may be significantly more volatile.
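The role of December in chain-linking can be illustrated with a minimal sketch. It assumes that within-year indices are expressed relative to the previous December (= 100) and are scaled onto the chained series using the December index level. The function name and all figures are hypothetical.

```python
def chain_link(december_level, within_year_indices):
    """Scale within-year indices (previous December = 100) onto the chained series."""
    return [december_level * index / 100.0 for index in within_year_indices]


# Hypothetical figures: chained index level in December 2018 and
# the first three months of 2019 relative to December 2018 = 100.
level_dec_2018 = 104.0
within_2019 = [99.0, 99.5, 100.5]

chained_2019 = chain_link(level_dec_2018, within_2019)
print(chained_2019)
```

Since every month of the following year is measured against December, the prices captured in that month, including any Christmas discounts, carry over into all subsequent index levels.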