Summary: There is an urgent need to support high data rates among connected vehicles for both safety and infotainment purposes. However, achieving high rates in vehicular communications is hindered by several challenges, such as optimal beam tracking for multiple simultaneously transmitting vehicles in highly dynamic environments. In this paper, we aim to maximize the overall network throughput of multi-vehicular communications. We propose a reinforcement learning (RL) approach based on the combinatorial multi-armed bandit (CMAB) framework, in which multiple arms (i.e., actions) form a super arm and can be played together, to handle the beam selection problem in a vehicular network. More specifically, we propose an adaptive combinatorial Thompson sampling algorithm, namely adaptive CTS, within the CMAB framework for the appropriate selection of simultaneous beams in a high-mobility vehicular environment. The proposed approach applies a smart exploration-exploitation trade-off for fast beam selection. Since the adaptive CTS scheme incurs high complexity, with a search space that grows exponentially with the number of users, we also propose a sequential Thompson sampling (TS) algorithm in which beams are selected one by one. We analyze the regret bounds of the proposed beam tracking algorithms. The performance of the proposed strategies is evaluated in multi-vehicle millimeter-wave (mmWave) simulation environments. Our results suggest that the proposed sequential approach performs almost as well as the simultaneous adaptive CTS scheme for tracking optimal beams in a multi-vehicular network, at much lower complexity. Simulation results also show that both proposed strategies approach the optimal rate achieved by the genie-aided solution.
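The combinatorial Thompson sampling idea described above can be illustrated with a minimal sketch. This is a generic CTS loop, not the paper's adaptive variant: each beam is modeled as a hypothetical Bernoulli arm (reward 1 if the beam aligns with its vehicle), a Beta posterior is kept per beam, and each round the top-k posterior samples form the super arm that is played simultaneously. The reward model and parameter values are illustrative assumptions.

```python
import random

def combinatorial_ts(true_probs, k, rounds, seed=0):
    """Generic combinatorial Thompson sampling sketch.

    true_probs : hypothetical per-beam alignment-success probabilities
                 (unknown to the learner; used only to simulate feedback)
    k          : number of beams played together as one super arm
    Returns the pull count of each beam after `rounds` rounds.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    alpha = [1.0] * n  # Beta(1, 1) uniform prior per beam
    beta = [1.0] * n
    pulls = [0] * n
    for _ in range(rounds):
        # Sample a plausible success rate for every beam from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # The super arm is the k beams with the largest sampled values.
        super_arm = sorted(range(n), key=lambda i: samples[i], reverse=True)[:k]
        for i in super_arm:
            # Simulated Bernoulli feedback; update the beam's posterior.
            reward = 1 if rng.random() < true_probs[i] else 0
            alpha[i] += reward
            beta[i] += 1 - reward
            pulls[i] += 1
    return pulls

# Toy run: 5 candidate beams, 2 played per round.
pulls = combinatorial_ts([0.9, 0.8, 0.2, 0.1, 0.05], k=2, rounds=2000)
```

After enough rounds, exploitation concentrates the pulls on the two best beams. The sequential TS variant in the paper would instead pick one beam at a time rather than a whole super arm per round, trading the exponential super-arm search for a one-by-one selection.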