Cloud Resource Allocation via Multi-Agent Reinforcement Learning and Amortized Winner Determination

Muhammad Adnan, Khan and Zeshan, Iqbal and Saba, Iqbal (2026) Cloud Resource Allocation via Multi-Agent Reinforcement Learning and Amortized Winner Determination. Journal of Innovation and Technology, 2026 (04). pp. 33-39. ISSN 2805-5179

	Text joit2026_04.pdf - Published Version Available under License Creative Commons Attribution. Download (148kB)
	Text 868 - Published Version Available under License Creative Commons Attribution. Download (30kB)

Official URL: http://ipublishing.intimal.edu.my/joint.html

Abstract

Dynamic cloud resource allocation, particularly when it involves heterogeneous resource bundles (combinatorial requests), is constrained by the computational intractability of the Winner Determination Problem (WDP), which is NP-hard. This paper introduces a unified framework that integrates Multi-Agent Deep Reinforcement Learning (MADRL) with an Amortized Winner Determination policy to achieve real-time, equitable, and cost-efficient cloud orchestration. Cloud brokers are modeled as decentralized Proximal Policy Optimization (PPO) agents that learn bidding strategies, while a central Auctioneer Agent uses a neural network (a learned WDP solver) to quickly approximate the combinatorial matching task. Learning is guided by a multi-objective reward function that explicitly balances cost minimization, social welfare, and equitable resource distribution, the last quantified using Jain’s Fairness Index. Empirical evaluation in the CloudSim simulation environment demonstrates significant advantages over traditional heuristic and exact solvers. The MADRL framework achieved the lowest total cost (65.57) and the highest fairness (Jain’s Index 0.929) among the methods compared, including static baselines. Furthermore, the amortized solver maintained high social welfare (averaging 1285), close to the optimum obtained by Integer Linear Programming (ILP) (averaging 1310), with a computational runtime of 40–150 milliseconds that is orders of magnitude faster than the ILP solver, enabling the system to operate effectively in dynamic, near real-time cloud marketplaces. These results support amortized combinatorial optimization as a promising pathway to scalable, autonomous, and economically sound resource management.
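
The fairness figure quoted above is Jain's Fairness Index, which for allocations x_1, ..., x_n is conventionally defined as (sum of x_i)^2 / (n * sum of x_i^2) and ranges from 1/n (one participant receives everything) to 1 (all participants receive the same amount). The following is a minimal sketch of that standard formula, not code from the paper; the function name, the example allocations, and the convention of returning 1.0 when every allocation is zero are assumptions made for illustration.

```python
from typing import Sequence


def jain_fairness_index(allocations: Sequence[float]) -> float:
    """Return Jain's Fairness Index for a list of non-negative allocations.

    Standard definition: (sum x_i)^2 / (n * sum x_i^2).
    The name and signature are hypothetical, chosen for this sketch.
    """
    n = len(allocations)
    if n == 0:
        raise ValueError("allocations must be non-empty")
    total = sum(allocations)
    sum_of_squares = sum(x * x for x in allocations)
    if sum_of_squares == 0:
        # Assumption: an all-zero allocation is treated as perfectly equal.
        return 1.0
    return (total * total) / (n * sum_of_squares)


# Hypothetical allocations across four brokers (not data from the paper).
equal = jain_fairness_index([10.0, 10.0, 10.0, 10.0])   # everyone gets the same
skewed = jain_fairness_index([40.0, 0.0, 0.0, 0.0])     # one broker gets everything
```

With four participants, the equal split evaluates to 1.0 and the fully skewed split to the lower bound 1/4, so a reported value of 0.929 sits close to the equal-share end of the scale.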

Item Type:	Article
Uncontrolled Keywords:	Cloud Computing, Resource Allocation, Multi-Agent Reinforcement Learning, Combinatorial Auctions, Winner Determination Problem
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science Q Science > QA Mathematics > QA76 Computer software T Technology > T Technology (General)
Depositing User:	Unnamed user with email masilah.mansor@newinti.edu.my
Date Deposited:	12 Mar 2026 03:04
Last Modified:	12 Mar 2026 03:04
URI:	http://eprints.intimal.edu.my/id/eprint/2314
