Online Ad selection: Upper Confidence Bound method and Thompson Sampling method
The post is devoted to select the most popular ad to display on a webpage to gather the most clicks. The rate at which the webpage visitors click on an ad is called a conversion rate for the add.
Assume that we have several ads and a place on a webpage to show one of them. We can display them one by one, record all the clicks, analyse the results afterwards and figure the most popular. But an ad display may be pricey. It would be more efficient to estimate rates in real time and to display the most popular one as soon as rates can be compared. Especially if an ad leads to a page for a visitor to buy something. There are couple of method for such estimations: Upper Confidence Bound method and Thompson Sampling method.
The first one is based on an confidence interval concept which is studied in a Statistics course and has a good intuitive explanation. Roughly speaking a confidence interval is a numeric interval were our value is supposed to lie with some probability, usually 95%. (The real statistical definition is more technical and means not quite this, but it practice the above explanation is close enough.)During our ad displays we can compute average rates at each step with corresponding confidence intervals and pick up for next display an ad with a highest upper confidence bound You can see how it happens in the video below.
The method has some drawbacks. It does not take into account that our rates must be between 0 and 1, so initial confidence intervals usually are much greater. It means that we loose some time on getting realistic values for our intervals. The worse thing is that if we throw in an additional ad then the process takes a lot of time to recover. Here is another method which is more efficient, Thompson Sampling Method. It constructs Beta distributions for each ad rate and instead of computing averages draws a random number in accordance with the distribution. There is a picture how it goes for one ad, with a blue vertical line marking a mean and the red line for a random value:
As you see since a random value has more probability to appear were our line is higher, then it get closer and closer to the mean at each step. You might view the area where a curve appears higher than horizontal axis as a confidence interval analogue.
Here how it works for a few ads (I dropped means to make picture more clear):
In addition it accommodates an additional ad in the middle of the process more easily.
Do not hesitate to ask questions and point out mistakes!
(the post originally appeared here: Mya Bakhoava’s blog
Link: Online Ad selection: Upper Confidence Bound method and Thompson Sampling method