Earlier this week I tweeted the below as part of another conversation:

My hypothesis with @ForecastForge is that people can make better forecasts using a fairly simple algorithm + their own domain knowledge rather than getting someone else who can make a fancy algorithm but doesn't know the domain

— Richard Fergie (@RichardFergie) March 2, 2021

I thought this would be a good opportunity to unpack my hypotheses about why Forecast Forge is a good/interesting idea. And also a good time to do a quick business update since it has been just over six months since launch.


One of the most requested features since Forecast Forge launched has been the ability to forecast from monthly data and not just daily data.

Daily data is really easy to export from Google Analytics and other tools but at a more strategic level no one cares very much about what performance on 14th July looks like as long as July as a whole is doing OK. When planning for the year or months ahead it is much more normal to set a monthly target rather than look at things day by day.

My personal opinion is that working with daily data will give better results when you start adding helper columns - particularly helper columns that have an effect on a particular day (this is most of them in my experience). But it is also tedious to make a quick daily forecast and then have to aggregate the results to a monthly level before presenting them to a client or boss, and I want to remove as much tedium as possible for my users.
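To make the tedious step concrete, here is a minimal sketch of rolling a daily forecast up to monthly totals. It is only an illustration: the function name `monthly_totals` and the flat 100-per-day forecast are made up for the example, and this is not how Forecast Forge does it internally.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_totals(daily):
    """Sum a {date: value} daily forecast into {(year, month): total}."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[(day.year, day.month)] += value
    return dict(totals)


# Made-up daily forecast: 100 per day for July and August 2021 (31 + 31 days).
start = date(2021, 7, 1)
daily_forecast = {start + timedelta(days=i): 100.0 for i in range(62)}

monthly = monthly_totals(daily_forecast)
print(monthly)
```

This only works for the central forecast. The upper and lower bounds of daily predictive intervals cannot, in general, simply be summed to get the bounds of a monthly interval.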


Earlier today I was enjoying Andrew Charlton’s Probabilistic Thinking in SEO post when I came across something that puzzled me.

Andrew is describing how to combine a probability range (e.g. “we think this will be somewhere between 5% and 60%”) into a single probability. He uses the geometric mean for this:

You can convert your bounds into a point estimate. Taking ‘the average’ will likely be well off, but you can use something called the geometric mean instead. The geometric mean is better suited to data with outliers or extreme variance.

I couldn’t figure out why it was best to use the geometric mean rather than the arithmetic (normal) mean. So I pinged him a direct message to ask.

Andrew kindly explained to me that the geometric mean is useful when you are more interested in ending up in the middle of something that spans several orders of magnitude. For example, the arithmetic mean of 1 and 1 million is close enough to 500,000, which is less than one order of magnitude smaller than 1 million but more than five orders of magnitude bigger than 1. So in one sense this average is “closer” to 1 million than it is to 1.

The geometric mean of these two numbers is `sqrt(1*1000000)=1000`, which is in the middle in terms of orders of magnitude. And since Andrew is using these numbers to make a Fermi estimate where he aims to be within an order of magnitude of the true value this approach makes perfect sense.

This tweet is where it finally clicked for me:

I see the order of magnitude estimate as “really” an estimate of the log, so GM of estimates makes sense as it corresponds to AM of logs. Still trying to justify the AGM though :-)

— Robert Low (@RobJLow) November 13, 2017
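The "GM of estimates is AM of logs" idea can be checked numerically. This is a small sketch using the 1 and 1 million example from above; the function names are just ones chosen for the illustration.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Take logs, average them, then undo the log.
    return math.exp(arithmetic_mean([math.log(x) for x in xs]))


bounds = [1, 1_000_000]
am = arithmetic_mean(bounds)
gm = geometric_mean(bounds)

# Compare the two averages on a log10 scale (orders of magnitude).
print(f"arithmetic mean: {am:,.1f} (log10 = {math.log10(am):.2f})")
print(f"geometric mean:  {gm:,.1f} (log10 = {math.log10(gm):.2f})")
```

On the log10 scale the inputs sit at 0 and 6, so the geometric mean lands at 3, exactly halfway, while the arithmetic mean lands much nearer the top of the range.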

But this got me thinking about probabilities and how we average them. Orders of magnitude are a bit different for probabilities because all probabilities must be between zero and one. It is very similar at the low end where 0.1, 0.01, 0.001 and 0.0001 are all different orders of magnitude but the same is true at the high end too when you look at 0.99, 0.999 and 0.9999.

Consider a service with 99% uptime; it can be down for nearly four days per year.

With 99.9% there must be less than 9 hours downtime per year.

99.99% requires less than 1 hour and a 99.999% SLA requires a maximum of slightly more than 5 minutes of downtime per year.
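These downtime figures are easy to reproduce. The sketch below assumes a 365 day year and ignores leap years; `downtime_hours` is just a name picked for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours(uptime):
    """Maximum hours of downtime per year allowed by an uptime fraction."""
    return (1 - uptime) * HOURS_PER_YEAR


for uptime in (0.99, 0.999, 0.9999, 0.99999):
    hours = downtime_hours(uptime)
    print(f"{uptime:.3%} uptime -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Each extra "nine" divides the allowed downtime by ten, which is why these probabilities are best thought of as being orders of magnitude apart.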

Normally when dealing with probabilities this kind of problem is dealt with by using a logit transform. We can use this to calculate the “logistic mean” in the same way that a log transform gives us the geometric mean.

The calculation looks like this:

- Calculate the logit of each probability
- Find the arithmetic mean of all these
- Take the inverse logit of this.

Here is the “logistic mean” of 0.9 and 0.999 calculated on Wolfram Alpha. The answer is 0.9896 which is almost exactly 0.99; the middle “order of magnitude” between the inputs.
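For anyone who would rather check this in code than on Wolfram Alpha, here is a minimal sketch of the three steps. The names `logit`, `inv_logit` and `logistic_mean` are my own labels for the illustration, and the inputs must be strictly between 0 and 1 or the logit is undefined.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def logistic_mean(ps):
    """Average probabilities on the logit scale, then transform back."""
    logits = [logit(p) for p in ps]
    return inv_logit(sum(logits) / len(logits))


result = logistic_mean([0.9, 0.999])
print(round(result, 4))  # about 0.9896
```

One property that falls out of this is symmetry: the logistic mean of `p` and `1 - p` is always 0.5, which is not true of the geometric mean.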

I haven’t seen this written about anywhere else; probably because I’m not putting the right words for it into Google. But just in case this is something new I thought I’d document it here.

[Andrew has a cool looking SEO Forecasting course that you should have a look at]


Last month was an unusually social one for me with two virtual events.

First was a long term favourite of mine; MeasureCamp. And then a new one (for me); Bhav Patel’s CRAP Talks.

This was my first time attending a virtual MeasureCamp although I knew a bit of what to expect because I had been involved in a test event in lockdown number one.

I presented the following slides to introduce a discussion on “MLUX”; the user experience of working with machine learning.

In summary:

- I am not talking about the *consumer* UX of products such as newsfeeds, Spotify/Netflix recommendations etc.
- Instead, I am interested in analysts and practitioners as users of machine learning.
- At the moment this splits into two very different groups of users. There are people who can program custom loss functions in Tensorflow and train new algorithms on their own data, and there are those who try to use the results of these algorithms without being able to change them.
- I don’t like this. I think it alienates people in the latter group from their work (often for no good reason) and more and more people are ending up on the wrong side of the divide.
- With new developments like mega-large neural networks it is possible that only a small handful of people will actually understand what is going on with these models and they will all work for three or four companies in Silicon Valley.
- As far as I know there isn’t a perfect solution that fixes all of this at once. In some cases there will be a business imperative to use a third party algorithm that you don’t understand simply because it performs better than any of the alternatives.
- But **I hope** that there is a gap where more specific domain knowledge of the problem combined with a bit of algorithmic help can lead to a better solution. This is where Forecast Forge sits and, based on what I’ve seen so far I am optimistic!

The discussion that followed didn’t really talk about forecasting at all (which is fine - this is a bigger problem than one area). People expressed frustration with things like product recommendations that lag six months behind the current fashions and the difficulty of comparing the uplift from different ways of doing things.

From one angle it is easy to say that the problem lies in a lack of skills/training for the relevant people. But this will never be a complete solution because there are so few places where one can learn about the development and training of world class models.

In my (biased - look where you are reading this!) opinion improvements in the tooling are more likely to be the best way forward. Libraries like Keras or Huggingface’s transformers allow someone with my level of skill to get started in this area - I would not be able to code or train stuff like this from scratch. And I think similar tooling can and should exist for other people too.

In this case CRAP stands for Conversion Rate, Analytics and Product. I have been aware of it as an offline meetup for a few years now but because it is mainly in London and I don’t have a focus on conversion rate optimisation or product analytics I have never made the effort to attend. This is my loss as I thought the virtual event was excellent.

The main part of the event was a presentation from Facebook’s marketing science team about Robyn which is a new open source project they have launched to help with Marketing Mix Modelling.

I have done a small amount of MMM in the past and I wish Robyn had been available then. I was learning as I went along without the support of someone with more experience so I spent a lot of time thinking and worrying about the details of what I was doing and whether or not it was the right approach. A page like this would have saved me so much stress even if it didn’t also come with code that does all the calculations for you.

Following this presentation there was a discussion on forecasting which Bhav was kind enough to invite me to be a part of. I think he was hoping for some juicy arguments but (boringly!) I mostly agreed with everyone else. A quick summary of some of the points covered:

- Pick an error metric that matches your business needs and then work to find a forecasting model that minimises it. [I also believe that you should look for a model that is *well calibrated* in that close to 80% of the data lies within the 80% interval, 90% within the 90% interval etc. But I don’t have strong opinions on how to balance this with the error metric. What to do if one forecast has worse errors but better calibrations?]
- Once you have a model you should stick with it for a bit - how long exactly will depend on your business tempo. You can update the forecasts as frequently as you like but the underlying model should stay the same.
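The calibration check mentioned in the first point can be sketched in a few lines. Everything here is illustrative: `interval_coverage` is a hypothetical helper and the numbers are made up, with a constant 80% interval of 90 to 115 to keep things short.

```python
def interval_coverage(actuals, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lower, upper))
    return inside / len(actuals)


# Made-up actuals compared against a made-up 80% predictive interval.
actuals = [102, 95, 130, 88, 110, 97, 105, 120, 99, 101]
lower80 = [90] * len(actuals)
upper80 = [115] * len(actuals)

coverage = interval_coverage(actuals, lower80, upper80)
print(f"{coverage:.0%} of actuals fall inside the 80% interval")
```

A well calibrated model would have coverage close to 0.8 for the 80% interval. With only ten data points, as here, a gap between the two numbers does not tell you very much either way.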

I think I went a bit off piste talking about how your forecast is your “model of the world” but people seemed very tolerant of my ramblings ;-)

I look forward to seeing what else comes out of CRAP Talks in the months and years to come.


Christmas 2020 was on a Friday which meant that Boxing Day was on a Saturday. In the UK Boxing Day is a public holiday but because it fell on a weekend the following Monday (the 28th December) was a “substitute day” public holiday.

The database of country holidays used by Forecast Forge treats these two things as different holidays. i.e. in 2020 there was a Boxing Day holiday *and* an extra substitute day Boxing Day on the 28th.

The last year with a substitute Boxing Day was 2015, so unless the historical data used in the forecast includes Christmas 2015 the Forecast Forge algorithm will not have any relevant training data to make a prediction for the holiday effect in 2020.
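A quick way to see how rare this is: the sketch below lists the years from 2010 to 2020 in which 26th December fell on a weekend. It assumes that a weekend Boxing Day always triggers a substitute day, which is a simplification of the UK rules, and the holiday database may label these days differently.

```python
from datetime import date


def boxing_day_on_weekend(year):
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    return date(year, 12, 26).weekday() >= 5


substitute_years = [y for y in range(2010, 2021) if boxing_day_on_weekend(y)]
print(substitute_years)
```

Unless the historical data goes back at least five years before Christmas 2020, there is no earlier substitute Boxing Day in it to learn from.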

I tweeted this observation last week and Peter O’Neill replied with this:

Wait - are you saying that machine learning models will all just break on unexpected days?

— Peter O'Neill (@peter_oneill) December 28, 2020

Which I think is a very interesting question.

The algorithm is behaving exactly as it is designed to, so in one (important!) respect it is not broken at all. But on the other hand, if it isn’t doing what the user expects then even if it isn’t fair to describe the procedure as “broken” it is certainly far from ideal.

And this gets to the heart of why I think it is an interesting question; it is about how users interact with machine learning systems and what expectations are reasonable for such an interaction. And this is exactly the area where the user interface for Forecast Forge sits!

Let’s talk specifically about the substitute day bank holiday problem even though there must be about a million other similar issues.

Off the top of my head I can think of six different ways of modelling this:

- Ignore it (i.e. treat the 26th as the bank holiday and the substitute day as a normal day)
- Move the entire holiday effect over onto the substitute day
- Apply the same holiday effect to both days (i.e. having a substitute day holiday gives you much more uplift than not)
- Split the holiday effect evenly between the two days
- Split the holiday effect unevenly between the two days. This sounds like a great solution, especially if you can use machine learning to estimate the optimal split. But in practice you will need data from previous substitute bank holidays to do this.
- Treat the substitute holiday as a separate holiday

It isn’t obvious to me that any one of these is always going to be better than all the others.

For a bricks and mortar store I can imagine that, in a non-Covid year, they would get a lot more footfall with a substitute day than in a normal year. For an online brand I can see it not making very much difference - particularly for Boxing Day which lies in the dead period between Christmas and New Year.

The good news is that Forecast Forge allows you to use any of these options by setting up appropriate helper columns. The bad news is that it doesn’t tell you which of these models you are using when you pick the default “country holidays” model.
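As a rough illustration of what those helper columns could look like, here is a sketch that builds one column of regressor values per modelling option for the week around Boxing Day 2020. It is written in Python rather than a spreadsheet and everything in it is an assumption for the example: the `helper_column` function, the option names, the 0.7/0.3 split and the idea that a fractional value scales a single shared holiday effect. It does not show Forecast Forge internals.

```python
from datetime import date, timedelta

BOXING_DAY = date(2020, 12, 26)
SUBSTITUTE = date(2020, 12, 28)


def helper_column(dates, weights):
    """One regressor value per date; dates missing from `weights` get 0."""
    return [weights.get(d, 0.0) for d in dates]


# 24th to 30th December 2020.
dates = [date(2020, 12, 24) + timedelta(days=i) for i in range(7)]

# Options that share a single holiday effect across the two days.
options = {
    "ignore": {BOXING_DAY: 1.0},
    "move": {SUBSTITUTE: 1.0},
    "same_on_both": {BOXING_DAY: 1.0, SUBSTITUTE: 1.0},
    "split_evenly": {BOXING_DAY: 0.5, SUBSTITUTE: 0.5},
    "split_unevenly": {BOXING_DAY: 0.7, SUBSTITUTE: 0.3},  # the split is a guess
}
columns = {name: helper_column(dates, w) for name, w in options.items()}

# Treating the substitute as a separate holiday needs two columns,
# each of which gets its own estimated effect.
columns["separate_boxing_day"] = helper_column(dates, {BOXING_DAY: 1.0})
columns["separate_substitute"] = helper_column(dates, {SUBSTITUTE: 1.0})

for name, values in columns.items():
    print(f"{name:>20}: {values}")
```

The first five options differ only in the numbers placed on the two days. The last one differs in structure, which is why it needs historical substitute days in the training data to estimate the second effect.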

It is a huge challenge for me to build a user interface that knows when a modelling choice has other options that the user might want to explore. People who use Forecast Forge don’t want to have to think about this for **every** possible modelling decision; people who do want that need to code their own models. But, just as importantly, people do use Forecast Forge because they want a bit more control and customisation rather than a forecasting service where they get the result and can’t do anything about what it says even if it is obviously wrong.

And not knowing what modelling decisions have been made behind the scenes and what other options are possible makes this much harder than I would like for users.

As Peter says:

My argument has always been for a tool that can learn from past behaviour but allow for a human to adjust levers based on known upcoming events using a human estimated weighting

— Peter O'Neill (@peter_oneill) December 28, 2020

And this would be the holy grail for human/machine learning interaction. It already exists for those who are confident with both coding and converting whatever levers a business might want to pull into mathematics. But these people are a rare (and expensive!) breed; I wouldn’t class myself as top tier at either discipline.

I have made design decisions with Forecast Forge that do restrict the levers that are available for pulling. My hope is that I have left the most important levers and that by building it into a spreadsheet all my users have more flexibility in what they can do with it than in other tools.

But it is still short of what I wish it could be. Is there a way to hide the unnecessary complexity whilst making all the necessary complexity visible?
