METHODOLOGICAL EVOLUTION AND ARCHITECTURAL PARADIGMS OF MODERN MULTI-AGENT REINFORCEMENT LEARNING

Н.Н. Шаповалова; І.О. Доценко; А.М. Стрюк

doi:10.31319/2519-2884.48.2026.15

Author(s)

Н.Н. Шаповалова, Kryvyi Rih National University, Kryvyi Rih, Ukraine, https://orcid.org/0000-0001-9146-1205
І.О. Доценко, Kryvyi Rih National University, Kryvyi Rih, Ukraine, https://orcid.org/0000-0001-7912-2497
А.М. Стрюк, Kryvyi Rih National University, Kryvyi Rih, Ukraine, https://orcid.org/0000-0001-9240-1976

DOI:

https://doi.org/10.31319/2519-2884.48.2026.15

Keywords:

multi-agent reinforcement learning, sequence modeling, agent heterogeneity, collaborative language model, methodological reproducibility

Abstract

This paper presents a systematic analysis of the current state of multi-agent reinforcement learning (2024-2025), focusing on the combination of classical algorithmic approaches with transformer architectures and large language models. It traces the theoretical evolution from Markov games to decentralized partially observable Markov decision processes (Dec-POMDPs), compares the effectiveness of the CTDE, Multi-Agent Transformer, and HetGPPO approaches, and analyzes integration with LLMs through the MAGRPO algorithm. Particular attention is paid to the reproducibility of results, and a transition to probabilistic evaluation and new open-world benchmarks is proposed. The scientific novelty of the work lies in developing and systematizing an integrated approach to resolving the "action semantics conflict" in heterogeneous environments by combining sequential autoregressive modeling with group relative optimization algorithms for large language models. For the first time, the advantage of decentralized graph-based and transformer architectures over the classical CTDE paradigm in open-world systems is comprehensively demonstrated and statistically confirmed using probabilistic performance profiles.
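The probabilistic performance profiles mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common reading of a performance profile as the fraction of runs whose normalized score exceeds a threshold tau; this is not taken from the paper itself. The function name `performance_profile`, the scores, and the thresholds are hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def performance_profile(scores: Sequence[float],
                        thresholds: Sequence[float]) -> List[float]:
    """Return, for each threshold tau, the fraction of runs with score > tau."""
    if not scores:
        raise ValueError("scores must be non-empty")
    n = len(scores)
    return [sum(1 for s in scores if s > tau) / n for tau in thresholds]


# Hypothetical normalized scores from four independent runs of two algorithms.
runs_a = [0.2, 0.5, 0.7, 0.9]
runs_b = [0.4, 0.45, 0.5, 0.55]
taus = [0.0, 0.25, 0.5, 0.75]

profile_a = performance_profile(runs_a, taus)
profile_b = performance_profile(runs_b, taus)

for tau, pa, pb in zip(taus, profile_a, profile_b):
    print(f"tau={tau:.2f}  A: {pa:.2f}  B: {pb:.2f}")
```

Comparing whole curves rather than a single mean shows, for instance, that one algorithm may be more consistent at low thresholds while another reaches high scores more often.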

References

Albrecht S. V., Christianos F., Schäfer L. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. Cambridge, MA: MIT Press, 2024. 650 p. ISBN 978-0262049375.

Wen M., Kuba J. G., Lin R., et al. Multi-Agent Reinforcement Learning is a Sequence Modeling Problem // Advances in Neural Information Processing Systems (NeurIPS 2022). 2022.

Guo D., Yang D., Zhang H., et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning [Electronic resource] // arXiv preprint. 2025. Available at: https://arxiv.org/abs/2501.12948. doi: 10.48550/arXiv.2501.12948.

Yuan L., et al. MAGRPO: LLM Collaboration with Multi-Agent Reinforcement Learning [Electronic resource] // arXiv preprint arXiv:2508.04652. 2025. Available at: https://arxiv.org/abs/2508.04652.

Bettini M., Shankar A., Prorok A. Heterogeneous Multi-Robot Reinforcement Learning // Proceedings of the 22nd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023). 2023.

Zhang K., Yang Z., Başar T. Convergence of Actor-Critic Algorithms in Multi-Agent RL // Advances in Neural Information Processing Systems (NeurIPS 2025). 2025.

Da Silva F. L., et al. Context-Aware Multi-Agent Systems: A Survey [Electronic resource] // arXiv preprint. 2024. Available at: https://arxiv.org.

Dansereau C., et al. The Heterogeneous Multi-Agent Challenge (HeMAC) // Proceedings of the 28th European Conference on Artificial Intelligence (ECAI 2025). 2025.

Al Omari B., Matthews M., Rutherford A., Foerster J. N. Multi-Agent Craftax: Benchmarking Open-Ended MARL [Electronic resource] // arXiv preprint arXiv:2511.04904. 2025. Available at: https://arxiv.org/abs/2511.04904. doi: 10.48550/arXiv.2511.04904.

Yu C., et al. The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games // Advances in Neural Information Processing Systems (NeurIPS 2022). 2022.

Rashid T., et al. QMIX: Monotonic Value Function Factorisation // Proceedings of the 35th International Conference on Machine Learning (ICML 2018). 2018.

Papoudakis G., et al. Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms // Advances in Neural Information Processing Systems (NeurIPS 2021). 2021.

Kim W., et al. Agent Order of Action Decisions-MAT (AOAD-MAT) [Electronic resource] // arXiv preprint. 2025. Available at: https://arxiv.org.

Wen M., et al. Multi-Agent Transformer // Advances in Neural Information Processing Systems (NeurIPS 2022). 2022.

Zhong Y., et al. Heterogeneity in MARL: Taxonomy and Quantification [Electronic resource] // arXiv preprint. 2025. Available at: https://arxiv.org.

Bettini M., et al. BenchMARL: A Benchmark for MARL // Journal of Machine Learning Research (JMLR). 2024.

Zhang W., et al. Strategic LLM Decoding through Bayesian Games // ICLR 2025 Workshop. 2025.

Park C. MAPoRL: Multi-Agent Post-Co-Training for Collaborative LLMs // ICML 2025 Workshop. 2025.

Wan Z., et al. ReMA: Learning to Meta-think for LLMs with MARL // Advances in Neural Information Processing Systems (NeurIPS 2025). 2025.

Lin Y. C., et al. Creativity in LLM-based Multi-Agent Systems: A Survey // Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025). 2025.

Gorsane R., et al. Towards a Standardised Performance Evaluation Protocol for Cooperative MARL // Advances in Neural Information Processing Systems (NeurIPS 2022). 2022.

Formanek C., et al. Off-the-Grid MARL: A Framework for Dataset Generation [Electronic resource] // arXiv preprint. 2024. Available at: https://arxiv.org.

Patterson M., et al. RLiable: Tools for rigorous evaluation [Electronic resource]. GitHub repository, 2025. Available at: https://github.com/google-research/rliable.

InstaDeep. MARL-eval: Standardised experiment data aggregation [Electronic resource]. GitHub repository, 2024.

