Learning first-order probabilistic models with combining rules
Sriraam Natarajan
Backward citations:
Pieter Abbeel , Andrew Y. Ng, Apprenticeship learning via inverse reinforcement learning, Proceedings of the twenty-first international conference on Machine learning, p.1, July 04-08, 2004, Banff, Alberta, Canada [doi>10.1145/1015330.1015430]
Altman, E. (1999). Constrained Markov decision processes. Chapman and Hall. First edition.
Craig Boutilier, A POMDP formulation of preference elicitation problems, Eighteenth national conference on Artificial intelligence, p.239-246, July 28-August 01, 2002, Edmonton, Alberta, Canada
Urszula Chajewska , Daphne Koller , Dirk Ormoneit, Learning an Agent’s Utility Function by Observing Behavior, Proceedings of the Eighteenth International Conference on Machine Learning, p.35-42, June 28-July 01, 2001
Eugene A. Feinberg , Adam Shwartz, Constrained Markov decision models with weighted discounted rewards, Mathematics of Operations Research, v.20 n.2, p.302-320, May 1995 [doi>10.1287/moor.20.2.302]
Zoltán Gábor , Zsolt Kalmár , Csaba Szepesvári, Multi-criteria Reinforcement Learning, Proceedings of the Fifteenth International Conference on Machine Learning, p.197-205, July 24-27, 1998
Guestrin, C., Koller, D., & Parr, R. (2001). Multiagent planning with factored MDPs. In Proceedings NIPS-01.
Leslie Pack Kaelbling , Michael L. Littman , Anthony R. Cassandra, Planning and acting in partially observable stochastic domains, Artificial Intelligence, v.101 n.1-2, p.99-134, May 1998 [doi>10.1016/S0004-3702(98)00023-X]
James F. Kurose , Keith Ross, Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2002
Yi Lu , Weichao Wang , Yuhui Zhong , Bharat Bhargava, Study of Distance Vector Routing Protocols for Mobile Ad Hoc Networks, Proceedings of the First IEEE International Conference on Pervasive Computing and Communications, p.187, March 23-26, 2003
Sridhar Mahadevan, Average reward reinforcement learning: foundations, algorithms, and empirical results, Machine Learning, v.22 n.1-3, p.159-195, Jan./Feb./March 1996 [doi>10.1007/BF00114727]
Shie Mannor , Nahum Shimkin, A Geometric Approach to Multi-Criterion Reinforcement Learning, The Journal of Machine Learning Research, 5, p.325-360, 12/1/2004
Andrew Y. Ng , Stuart J. Russell, Algorithms for Inverse Reinforcement Learning, Proceedings of the Seventeenth International Conference on Machine Learning, p.663-670, June 29-July 02, 2000
Ronald Parr, Flexible decomposition algorithms for weakly coupled Markov decision problems, Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, p.422-430, July 24-26, 1998, Madison, Wisconsin
Puterman, M. L. (1994). Markov decision processes. J. Wiley and Sons.
Russell, S., & Zimdars, A. L. (2003). Q-decomposition for reinforcement learning agents. In Proceedings of ICML-03.
Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of ICML-93.
Peter Stone, TPOT-RL Applied to Network Routing, Proceedings of the Seventeenth International Conference on Machine Learning, p.935-942, June 29-July 02, 2000
Prasad Tadepalli , DoKyeong Ok, Model-based average reward reinforcement learning, Artificial Intelligence, v.100 n.1-2, p.177-224, April 1998 [doi>10.1016/S0004-3702(98)00002-2]
Nigel Tao , Jonathan Baxter , Lex Weaver, A Multi-Agent Policy-Gradient Approach to Network Routing, Proceedings of the Eighteenth International Conference on Machine Learning, p.553-560, June 28-July 01, 2001
White, D. (1982). Multi-objective infinite-horizon discounted Markov decision processes. Journal of Mathematical Analysis and Applications, 89, 639–647.
Forward citations:
Peter Vamplew , John Yearwood , Richard Dazeley , Adam Berry, On the Limitations of Scalarisation for Multi-objective Reinforcement Learning of Pareto Fronts, Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence, December 01-05, 2008, Auckland, New Zealand
Peter Vamplew , Richard Dazeley , Ewan Barker , Andrei Kelarev, Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks, Proceedings of the 22nd Australasian Joint Conference on Advances in Artificial Intelligence, December 01-04, 2009, Melbourne, Australia
Kazuyuki Hiraoka , Manabu Yoshida , Taketoshi Mishima, Parallel Reinforcement Learning for Weighted Multi-criteria Model with Adaptive Margin, Neural Information Processing: 14th International Conference, ICONIP 2007, Kitakyushu, Japan, November 13-16, 2007, Revised Selected Papers, Part I, Springer-Verlag, Berlin, Heidelberg, 2007
Rustam Issabekov , Peter Vamplew, An empirical comparison of two common multiobjective reinforcement learning algorithms, Proceedings of the 25th Australasian joint conference on Advances in Artificial Intelligence, December 04-07, 2012, Sydney, Australia
Leon Barrett , Srini Narayanan, Learning all optimal policies with multiple criteria, Proceedings of the 25th international conference on Machine learning, p.41-47, July 05-09, 2008, Helsinki, Finland
Ivana Dusparic , Vinny Cahill, Using Reinforcement Learning for Multi-policy Optimization in Decentralized Autonomic Systems — An Experimental Evaluation, Proceedings of the 6th International Conference on Autonomic and Trusted Computing, July 07-09, 2007, Brisbane, Australia
Neville Mehta , Sriraam Natarajan , Prasad Tadepalli , Alan Fern, Transfer in variable-reward hierarchical reinforcement learning, Machine Learning, v.73 n.3, p.289-312, December 2008
Kyle Hollins Wray , Shlomo Zilberstein , Abdel-Illah Mouaddib, Multi-objective MDPs with conditional lexicographic reward preferences, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, p.3418-3424, January 25-30, 2015, Austin, Texas
Mohamed A. Khamis , Walid Gomaa, Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework, Engineering Applications of Artificial Intelligence, 29, p.134-151, March, 2014
Umair Ali Khan , Bernhard Rinner, Online learning of timeout policies for dynamic power management, ACM Transactions on Embedded Computing Systems (TECS), v.13 n.4, p.1-25, February 2014
Daniel J. Lizotte , Michael Bowling , Susan A. Murphy, Linear fitted-Q iteration with multiple reward functions, The Journal of Machine Learning Research, v.13 n.1, p.3253-3295, January 2012
Ivana Dusparic , Vinny Cahill, Multi-policy optimization in self-organizing systems, Proceedings of the First international conference on Self-organizing architectures, September 14, 2009, Cambridge, UK
Uri Kartoun , Helman Stern , Yael Edan, A Human-Robot Collaborative Reinforcement Learning Algorithm, Journal of Intelligent and Robotic Systems, v.60 n.2, p.217-239, November 2010
Diederik M. Roijers , Peter Vamplew , Shimon Whiteson , Richard Dazeley, A survey of multi-objective sequential decision-making, Journal of Artificial Intelligence Research, v.48 n.1, p.67-113, October 2013