A Multi-Armed Bandit Framework for Recommendations at Netflix
World Class CX
When: Monday at 12:00 -
Presenting a general multi-armed bandit framework for recommending titles to our 117M+ members on the Netflix homepage. A key aspect of our framework is closed-loop attribution, which links each recommendation to how members respond to it. The framework frequently updates its policies using member feedback collected over a recent time window.
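The batched update cycle described above can be sketched roughly as follows. This is a minimal illustration under assumed names and data structures, not Netflix's actual implementation: attributed feedback from the last time window is replayed to refresh per-title play statistics.

```python
from collections import defaultdict

def update_policy(counts, feedback_window):
    """Refresh per-title statistics from closed-loop-attributed feedback.

    counts: title -> [plays, impressions] (hypothetical structure)
    feedback_window: list of (title, played) pairs attributed to
        recommendations shown during the last time window
    """
    for title, played in feedback_window:
        counts[title][1] += 1            # one more impression
        counts[title][0] += int(played)  # one more play if the member played
    return counts

# Illustrative feedback from one window.
counts = defaultdict(lambda: [0, 0])
window = [("titleA", True), ("titleA", False), ("titleB", True)]
update_policy(counts, window)
print(dict(counts))  # {'titleA': [1, 2], 'titleB': [1, 1]}
```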
We will take a deeper look by focusing on two example policies: a greedy exploit policy, which maximizes the probability of a user playing a title, and an incrementality-based policy. The latter is a novel online learning approach that takes the causal effect of a recommendation into account. An incrementality-based policy recommends titles that bring about the maximum increase in a specific quantity of interest, such as engagement. This discounts the effect of a recommendation when the user would have played the title anyway. We describe offline experiments and online A/B test results for both of these example policies.
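The contrast between the two example policies can be illustrated with a toy sketch. All names, counts, and the simple difference-in-play-rates estimator below are assumptions for illustration, not the talk's actual method: the greedy policy ranks titles by estimated play probability, while the incrementality-based policy ranks by the lift a recommendation causes, which penalizes titles the member would likely have played anyway.

```python
import numpy as np

titles = ["titleA", "titleB", "titleC"]

# Per-title counts from closed-loop attribution (illustrative numbers):
# plays/impressions when the title WAS recommended, and when it was NOT.
rec_plays   = np.array([30.0, 50.0, 10.0])
rec_imps    = np.array([100.0, 100.0, 100.0])
norec_plays = np.array([5.0, 45.0, 5.0])
norec_imps  = np.array([100.0, 100.0, 100.0])

def greedy_exploit(rec_plays, rec_imps):
    """Pick the title with the highest estimated play probability."""
    p_play = rec_plays / rec_imps
    return int(np.argmax(p_play))

def incrementality_based(rec_plays, rec_imps, norec_plays, norec_imps):
    """Pick the title whose recommendation causes the largest lift:
    P(play | recommended) - P(play | not recommended)."""
    lift = rec_plays / rec_imps - norec_plays / norec_imps
    return int(np.argmax(lift))

# titleB has the highest raw play rate (0.5), so greedy picks it...
print(titles[greedy_exploit(rec_plays, rec_imps)])  # titleB
# ...but members play titleB almost as often without the recommendation
# (0.45), so its lift is small; titleA's recommendation adds the most.
print(titles[incrementality_based(rec_plays, rec_imps,
                                  norec_plays, norec_imps)])  # titleA
```

In this toy data, the greedy policy spends a homepage slot on a title that would have been played anyway, while the incrementality-based policy directs it where the recommendation makes the largest causal difference.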