Interview with Evan Shellshear, Head Of Analytics at Biarri

Evan Shellshear is the Head Of Analytics at Biarri, a company that applies advanced mathematical approaches to solve a wide variety of business problems.

In this interview, Mr. Shellshear deconstructs how companies can leverage AI and predictive analytics to aid decision makers, while providing an overview of several technologies and best practices required in the process.

What is your background and how did you get involved with Biarri?

I grew up in sunny Brisbane in Queensland, Australia and discovered my passion for mathematics at a young age. Letting the abstract beauty of mathematics guide me, I went on to major in mathematics at university and then do a PhD at the Nobel-prize-winning Institute of Mathematical Economics in Bielefeld, Germany. After further adventures abroad I returned to Australia, where I am now the Head of Analytics at Biarri, a company which develops SaaS solutions to help companies in all industry verticals make better decisions with the power of mathematics. I lead a team of top data scientists and optimisation experts solving some of the world's toughest challenges. It is a dream job where I get to take abstract, deep mathematics and turn it into simple web-based apps that people without any technical training can easily use to solve some of their biggest pain points.

Who is Biarri and what does the company do?

Established in 2008, Biarri is an Australian company which operates globally. Biarri was founded to revolutionise how businesses can access mathematics across their operations. Biarri operates in Australia, North America, Europe and Africa with over 170 staff.

Biarri is one of only three Australian companies ever nominated for the Franz Edelman Award (aka the Nobel Prize for Industrial Mathematics), for the mathematics behind our FOND tool, which is used by the world's largest internet providers. Biarri has spun out or joint-ventured around 10 companies applying advanced mathematics to a variety of industries. We do all aspects of analytics, from data science and predictive modelling to end-to-end optimisation.

What is an example of an AI technology that every company can use?

Time series forecasting - it is ubiquitous. Every company has troves of valuable time series data - whether that’s demand, supply, prices, costs, or similar - and everyone wants to be able to predict the future with some certainty. This can all be modelled via appropriate time series methods. Once you’ve mastered time series, you will have an extremely powerful tool in your AI toolkit.

What does this AI tool look like?

It comes in several main flavours:

  • Statistical methods: these models generally decompose a time series into its components (e.g. trend, seasonality, cyclicity) and include exponential smoothing (ETS methods) and autoregressive models (e.g. ARIMA).

  • Machine learning methods: this includes generalised linear models (GLMs), quantile random forests, and deep learning (e.g. RNN, LSTM, GRU).

  • Hybrid methods: a dynamic combination of machine learning and statistical methods.
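
As an illustration of the statistical flavour, simple exponential smoothing can be written in a few lines of plain Python. This is a minimal sketch: production code would use a library such as statsmodels, and the smoothing factor and demand numbers below are illustrative choices, not fitted or real values.

```python
def simple_exp_smoothing(series, alpha=0.3):
    """One-step-ahead forecast via simple exponential smoothing.

    The smoothed level is updated as
        level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    and the forecast for the next period is the latest level.
    """
    level = series[0]                  # initialise the level with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level                       # flat forecast for the next period

demand = [100, 102, 101, 105, 107, 106, 110]
print(round(simple_exp_smoothing(demand), 2))  # → 105.98
```

A small alpha smooths aggressively (slow to react); an alpha near 1 approaches the naive "tomorrow equals today" forecast.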

What are the pros and cons of each of the different types?

Statistical models are highly interpretable and can implicitly model autocorrelation, though they struggle to leverage exogenous information and cannot predict shocks or “black swan” events if the explicit features don’t exist.

On the other hand, machine learning models can predict more complicated time series with complex interactions, though they require considerable effort in data collection, pre-processing, and feature (i.e. input) engineering. Models can be developed to provide a point estimate and/or a distribution - which can be used to derive confidence intervals. More complex models can also be prone to overfitting on the training data - where the model “learns” the training data but is unable to perform sufficiently in “the real world”. While some machine learning models are highly interpretable (e.g. GLMs), others such as deep learning are known as “black-box” models due to the difficulty of explaining their inner workings.
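
To sketch the point-estimate-versus-distribution idea: one simple, assumption-laden approach is to widen a point forecast into an interval using the spread of historical one-step errors. The naive forecast, the normality assumption, and the numbers below are all illustrative, not a recommended production method.

```python
import statistics

def naive_interval(series, z=1.96):
    """Point forecast plus an approximate 95% interval.

    Uses a naive one-step forecast (tomorrow = today) and assumes the
    historical one-step errors are roughly normal around zero.
    """
    errors = [b - a for a, b in zip(series, series[1:])]  # one-step forecast errors
    sigma = statistics.stdev(errors)                      # spread of those errors
    point = series[-1]                                    # naive point forecast
    return point - z * sigma, point, point + z * sigma

lo, point, hi = naive_interval([20, 22, 21, 23, 25, 24, 26])
print(lo, point, hi)
```

The interval communicates uncertainty that a bare point estimate hides, which often matters more to a decision maker than the point itself.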

Hybrid models attempt to capture the best of both statistical and machine learning models.

What is the most important thing to pay attention to when doing time series forecasting?

Truly **understand the problem** and the environment you are working in, beyond just the data.

For example, we had a retail client that required sales forecasts for its products. It wasn't until we understood the bigger picture - the product was typically bought in conjunction with major sporting events - that we were able to accurately predict sales. Once we had collected data on the occurrences of major sporting events and added them as additional features, the accuracy of the model improved dramatically.
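
The idea of enriching a series with an event indicator can be sketched in plain Python. The function name, the sales figures, and the group-mean "model" below are hypothetical - a real project would feed the indicator into a regression or ML library as an exogenous feature.

```python
def event_adjusted_forecast(sales, event_flags, next_is_event):
    """Forecast sales as the historical mean of the matching regime.

    sales         : historical weekly sales figures
    event_flags   : True where a major sporting event occurred that week
    next_is_event : whether next week has an event
    """
    event_weeks  = [s for s, e in zip(sales, event_flags) if e]
    normal_weeks = [s for s, e in zip(sales, event_flags) if not e]
    group = event_weeks if next_is_event else normal_weeks
    return sum(group) / len(group)   # mean of the chosen regime

sales  = [50, 52, 120, 51, 49, 130, 53]
events = [False, False, True, False, False, True, False]
print(event_adjusted_forecast(sales, events, next_is_event=True))   # → 125.0
```

Even this crude split shows why the feature matters: without the event flag, a single average would badly underpredict event weeks and overpredict normal ones.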

What are the Gotchas?

  • *Autocorrelation*: the similarity between observations as a function of the time lag between them. For instance, tomorrow’s closing stock price is highly correlated with today’s closing price. Failing to account for autocorrelation can lead to a poorly performing model.

  • *Interactions*: some of the most important, easily explainable causes of variation can be attributed to the interaction between two or more features, rather than to individual features themselves.

  • *Normalisation of variables*: this can cause grief, especially when interpreting the importance of model coefficients.

  • *Lack of domain knowledge*: probably the most important. People often jump head first into the technical side of a problem without properly understanding the background and why it matters.
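
Checking for autocorrelation before modelling is cheap. The lag-k sample autocorrelation can be computed directly in a few lines - a minimal sketch; in practice you would plot the full ACF with a package such as statsmodels:

```python
def autocorr(series, lag=1):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)            # total variation (denominator)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))                    # lagged covariance (numerator)
    return cov / var

trending = [1, 2, 3, 4, 5, 6, 7, 8]
print(round(autocorr(trending, lag=1), 3))  # → 0.625
```

A value well away from zero at low lags is a strong hint that yesterday's value belongs in your feature set (or that you should reach for an autoregressive model).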

How can I do something quickly to see if it is worthwhile doing deeper analytics?

Run an ARIMA or GLM model; this should be your baseline. These models exist out-of-the-box in most packages (R, Python, etc.). If you want something even more basic, you can try a naive forecast, which uses the last observed value as the forecast - or the value from the same time last week, last month, or last year.
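
Both naive baselines fit in a few lines of plain Python (the daily series below is made up for illustration):

```python
def naive_forecast(series):
    """Forecast the next value as the last observed value."""
    return series[-1]

def seasonal_naive_forecast(series, season=7):
    """Forecast the next value as the value one season ago (e.g. same day last week)."""
    return series[-season]

def mse(actuals, forecasts):
    """Mean squared error between paired actuals and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals)

daily = [10, 12, 11, 13, 30, 31, 9, 10, 13, 11, 12, 29, 32, 10]
print(naive_forecast(daily))              # → 10
print(seasonal_naive_forecast(daily, 7))  # same weekday last week
```

Any fancier model should beat these baselines on a held-out period (compare via `mse`); if it can't, the extra complexity isn't earning its keep.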

In Excel, there are also some basic tools, including the built-in FORECAST.ETS function. It is not as sophisticated as the out-of-the-box R and Python routines, but it is an easy start if your data is manageable in a CSV file or an Excel sheet (and not in a database).

In Python, one can use additional tools such as tsfresh to enhance your forecasting model. It can easily extract features from raw time series data which can be used for more sophisticated ML models.

If your model is able to predict with a reasonable level of accuracy (check metrics like mean squared error), then it is probably worth digging deeper with a more advanced ARIMA or ML model. If it isn't, and you have no intuition as to what drives the values, then go back to the drawing board :).

What do I do after the basic model?

You then need to understand the complexity of your data and business problem, and answer questions like:

  • Will you need multiple time series models for different subproblems, or does one larger/general model suffice? If you have high dimensionality in your data and require a general model, maybe you need to look at something like deep learning.

  • How do we automate the ingestion of the data sources into the predictive model as it goes into production? Cleaning, ETL, etc.

  • How good does the model need to be, to be practically useful? Does 80% suffice? 90%? Then target your efforts to hit that requirement step-by-step.

  • Do you provide a point estimate or will you provide a probability distribution?

How does Biarri do it differently?

We have developed a pipeline that provides us with an extremely quick path to value. In the beginning we follow the CRISP-DM steps: first gain an understanding of the data and set it up for modelling. The key, though, is our pipeline, which is able to quickly and effectively engineer features and test thousands of model combinations to find the best model.

In addition, over the years we have developed deep expertise in integrating exogenous factors such as weather, sports, and demographic data to **enrich existing time series data sets and get deeper insights** from the analysis.

In particular, the main thing we do differently at Biarri is to ensure that our predictive analytics (i.e. predictions) lead to prescriptive analytics (i.e. actions). If the work simply lands on a shelf, then it doesn't provide value and it doesn't help anyone. So if needed, we'll build a web-based tool and add an optimisation engine on top of the predictions to help the decision-maker with the *‘What if?’* question: so I have a prediction, now what do I do? We help build engines that answer this key question.
