PyData Vermont 2024

Backtesting Time Series Forecasting Algorithms in SKTime and SKForecast
07-29, 11:15–11:45 (US/Eastern), Filmhouse

Backtesting refers to the process of evaluating a time series forecasting algorithm on historical data by replicating the corresponding real-world scenario. In parallel, parameters such as the model update and retraining frequencies are tuned to fit the use case and its computational constraints.

In this talk, we will review the backtesting of time series algorithms using sktime and skforecast, two popular open-source machine learning libraries for developing and deploying forecasting models. Specifically, the following aspects will be covered:

• Comparing and contrasting backtesting-related features of the two libraries

• An overview of the different types of cross-validation schemes for time series forecasting, including expanding and fixed windows

• Model update and retraining for both direct and recursive multistep forecasts
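The expanding- and fixed-window schemes above can be sketched in a few lines of plain Python. This is an illustrative sketch only; the function name and parameters are hypothetical, and library splitters such as sktime's ExpandingWindowSplitter and SlidingWindowSplitter implement the same idea with a richer interface.

```python
def time_series_splits(n_obs, initial_train_size, horizon, step, expanding=True):
    """Return (train_indices, test_indices) pairs in temporal order.

    expanding=True  -> expanding window: the training set grows each fold.
    expanding=False -> fixed (sliding) window: the training set keeps a
                       constant size and slides forward.
    """
    splits = []
    train_end = initial_train_size
    while train_end + horizon <= n_obs:
        train_start = 0 if expanding else train_end - initial_train_size
        train = list(range(train_start, train_end))
        test = list(range(train_end, train_end + horizon))
        splits.append((train, test))
        train_end += step
    return splits

# Expanding window over 10 observations: training sets [0..3], [0..5], [0..7].
expanding = time_series_splits(10, initial_train_size=4, horizon=2, step=2)

# Fixed window: training sets [0..3], [2..5], [4..7], each of length 4.
fixed = time_series_splits(10, 4, 2, 2, expanding=False)
```

In both schemes the test fold always lies strictly after the training window, which is what distinguishes temporal cross-validation from the shuffled splits used for i.i.d. data.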

The talk is geared toward data scientists who want to systematically evaluate time series forecasting models in varied settings. In addition to gaining an overview of the various aspects, the audience will also learn about the implementation options supported by the two libraries. No prior knowledge of machine learning algorithms for forecasting is needed to attend the talk.


Forecasting algorithms, which predict the future values of a time series based on historical patterns, are often deployed to make rolling forecasts over multiple time steps. They also need to be retrained often to keep up with recent observations. Deploying such machine learning models and pipelines raises several questions.

  1. What is the forecasting horizon?

  2. How often should a rolling forecast be made?

  3. How often should we retrain the model?

  4. How much historical data should we retain after every rolling forecast?

These questions must be answered in the context of the use case's requirements and the computational constraints of deploying the solution. For example, the retraining frequency must be high enough to keep the model's predictions accurate, while respecting any limits on computational resources and costs.

Backtesting is the process of systematically evaluating the forecasting algorithm's performance, and concomitantly tuning the parameters associated with the questions above. Establishing a backtesting framework that faithfully replicates the real-world scenario is critical for accurate benchmarking and deployment of the forecasting solution.

In the talk, we will explore backtesting using sktime and skforecast, two popular open-source libraries for developing machine learning algorithms for temporal data. The use of the evaluate() function in sktime will be discussed in detail. We will focus on combining the backtesting utility with different cross-validation methods for temporal data and model update/retraining strategies. In skforecast, the backtesting_forecaster() function provides the corresponding functionality and can be combined with different windows for refitting. We will compare and contrast the two options for both direct and recursive multistep forecasting of an energy management use case.
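The rolling-forecast loop that utilities such as sktime's evaluate() and skforecast's backtesting_forecaster() encapsulate can be sketched in plain Python. This is a minimal illustration under stated assumptions, not either library's implementation: the "model" is a naive last-value forecaster so the sketch is self-contained, and refit_every stands in for the retraining-frequency parameter discussed above.

```python
def backtest(y, initial_train_size, horizon, refit_every):
    """Roll forward over y, refitting every `refit_every` folds.

    Returns the mean absolute error of each rolling forecast. The
    forecaster is naive (it repeats the last value seen at fit time),
    chosen only to keep the sketch dependency-free.
    """
    errors, history, fold = [], None, 0
    for train_end in range(initial_train_size, len(y) - horizon + 1, horizon):
        if fold % refit_every == 0:
            history = y[:train_end]            # "refit" on all data so far
        forecast = history[-1]                 # naive multistep forecast
        actual = y[train_end:train_end + horizon]
        errors.append(sum(abs(a - forecast) for a in actual) / horizon)
        fold += 1
    return errors

series = [1, 2, 3, 4, 5, 6, 7, 8]

# Refit before every fold: the forecast always uses the freshest history.
fresh = backtest(series, initial_train_size=4, horizon=2, refit_every=1)

# Refit every other fold: the second fold forecasts from stale history,
# so its error grows, illustrating the retraining-frequency trade-off.
stale = backtest(series, initial_train_size=4, horizon=2, refit_every=2)
```

The trade-off is visible directly: for this trending series, fresh yields per-fold errors of [1.5, 1.5], while stale yields [1.5, 3.5], since skipping the refit leaves the second forecast anchored to an outdated last value.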

Abhishek Murthy is currently a Senior ML/AI Architect at Schneider Electric (SE) in Boston, Massachusetts USA. He is passionate about sustainability, with a focus on climate change. To that end, he develops Machine Learning (ML) algorithms on sensor data that are critical for the sustainability commitments of the Industrial Automation and Energy Management businesses of SE. He is also a lecturer at Northeastern University and teaches machine learning algorithms for the Internet of Things.

Abhishek received his PhD in Computer Science from Stony Brook University, State University of New York and MS in Computer Science from University at Buffalo. His doctoral research, which was part of a National Science Foundation Expedition in Computing, entailed developing algorithms for automatically establishing the input-to-output stability of dynamical systems.

He led the Data Science Algorithms team at WHOOP before joining SE. He also worked at Signify, formerly Philips Lighting, as a Senior Data Scientist, where he led research on IoT applications for smart buildings. Abhishek has served on several conference review committees and NSF panels. He has authored several publications and research articles with more than 190 citations, has been awarded 15 patents, and has more than 50 patent applications pending.