Undiscounted Semi-Markov Decision Processes with Countably Infinite Action Spaces
Kushal Guha Bakshi, Sagnik Sinha, Ramakant Bhardwaj, Purvee Bhardwaj, Satyendra NarayanIn this article, we study semi-Markov decision processes (SMDPs) under the limiting ratio average (undiscounted) pay-off criterion, where the state space is finite and the action space of the decision maker is possibly countably infinite. We impose no restriction on the reward function. We prove that the value of such an SMDP exists and that the decision maker possesses a near-optimal (ϵN-optimal) deterministic (pure) semi-stationary strategy, and we give a truncation-based algorithm that computes the value and such a strategy in finite time by solving a finite linear program at each truncation level. We analyze the algorithm’s computational complexity, showing that each truncation level is solved in time polynomial in the number of state–action pairs, while the number of levels required for termination depends on the prescribed accuracy and on the model. We further establish a certified, computable bound on the truncation error under bounded rewards together with standard ergodicity and tail-regularity conditions, yielding a stopping rule that guarantees a prescribed accuracy; we delimit precisely the regime in which such a certificate exists. Numerical experiments, including randomly generated instances, illustrate the algorithm and its computational behavior. Finally, under standard ergodicity conditions, we derive an optimality equation for these SMDP models.