We consider the dynamic assortment optimization problem under the multinomial logit model with unknown utility parameters. The main question investigated in this paper is model misspecification under the ε-contamination model, which is a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length T, we assume that customers make purchases according to a well-specified underlying multinomial logit choice model in a (1−ε)-fraction of the time periods and make arbitrary purchasing decisions in the remaining ε-fraction of the time periods. Under this model, we develop a new robust online assortment optimization policy via an active-elimination strategy. We establish both upper and lower bounds on the regret, and we show that our policy is optimal up to a logarithmic factor in T when the assortment capacity is constant. We further develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter ε. When a suboptimality gap exists between optimal and suboptimal products, we also establish gap-dependent logarithmic upper and lower bounds on the regret in both the known-ε and unknown-ε cases. Our simulation study shows that our policy outperforms the existing policies based on upper confidence bounds and Thompson sampling.
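To make the ε-contamination setting concrete, the following is a minimal sketch of how contaminated purchase data could be simulated. It is not the paper's policy or experimental setup. The function names (`mnl_probs`, `mnl_choice`, `simulate`), the preference weights `v`, the random placement of corrupted periods, and the fixed "always buy the first item" corruption are all illustrative assumptions; the model itself allows arbitrary purchasing decisions in the contaminated periods. The sketch uses the standard multinomial logit normalization in which the no-purchase option has weight 1.

```python
import random


def mnl_probs(assortment, v):
    """Return MNL purchase probabilities for an assortment.

    v maps item -> positive preference weight (hypothetical values).
    The no-purchase option has weight 1 and is keyed by None.
    """
    total = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / total for i in assortment}
    probs[None] = 1.0 / total
    return probs


def mnl_choice(assortment, v, rng):
    """Sample one purchase from the MNL model (None means no purchase)."""
    total = 1.0 + sum(v[i] for i in assortment)
    u = rng.random() * total
    acc = 0.0
    for i in assortment:
        acc += v[i]
        if u < acc:
            return i
    return None  # the remaining mass belongs to the no-purchase option


def simulate(assortment, v, T, eps, seed=0):
    """Simulate T periods in which int(eps * T) periods are corrupted.

    Assumption: corrupted periods are placed uniformly at random and the
    corrupted customer always buys the first offered item. This is only
    one of many possible "arbitrary" behaviors.
    """
    rng = random.Random(seed)
    corrupted = set(rng.sample(range(T), int(eps * T)))
    choices = []
    for t in range(T):
        if t in corrupted:
            choices.append(assortment[0])
        else:
            choices.append(mnl_choice(assortment, v, rng))
    return choices, corrupted


if __name__ == "__main__":
    weights = {0: 1.0, 1: 2.0, 2: 0.5}  # hypothetical preference weights
    offered = [0, 1]
    print(mnl_probs(offered, weights))
    choices, corrupted = simulate(offered, weights, T=1000, eps=0.1)
    print(len(corrupted), "of", len(choices), "periods are corrupted")
```

A learner that estimates the weights from `choices` alone cannot tell which periods were corrupted, which is the difficulty a robust policy has to handle.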