Citation Tree

Learning first-order probabilistic models with combining rules

Sriraam Natarajan

Backward citations:

Pieter Abbeel, Andrew Y. Ng, Apprenticeship learning via inverse reinforcement learning, Proceedings of the twenty-first international conference on Machine learning, p.1, July 04-08, 2004, Banff, Alberta, Canada. doi:10.1145/1015330.1015430

Altman, E. (1999). Constrained Markov decision processes. Chapman and Hall. First edition.

Craig Boutilier, A POMDP formulation of preference elicitation problems, Eighteenth national conference on Artificial intelligence, p.239-246, July 28-August 01, 2002, Edmonton, Alberta, Canada

Urszula Chajewska, Daphne Koller, Dirk Ormoneit, Learning an Agent's Utility Function by Observing Behavior, Proceedings of the Eighteenth International Conference on Machine Learning, p.35-42, June 28-July 01, 2001

Eugene A. Feinberg, Adam Shwartz, Constrained Markov decision models with weighted discounted rewards, Mathematics of Operations Research, v.20 n.2, p.302-320, May 1995. doi:10.1287/moor.20.2.302

Zoltán Gábor, Zsolt Kalmár, Csaba Szepesvári, Multi-criteria Reinforcement Learning, Proceedings of the Fifteenth International Conference on Machine Learning, p.197-205, July 24-27, 1998

Guestrin, C., Koller, D., & Parr, R. (2001). Multiagent planning with factored MDPs. In Proceedings NIPS-01.

Leslie Pack Kaelbling, Michael L. Littman, Anthony R. Cassandra, Planning and acting in partially observable stochastic domains, Artificial Intelligence, v.101 n.1-2, p.99-134, May 1998. doi:10.1016/S0004-3702(98)00023-X

James F. Kurose, Keith Ross, Computer Networking: A Top-Down Approach Featuring the Internet, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 2002

Yi Lu, Weichao Wang, Yuhui Zhong, Bharat Bhargava, Study of Distance Vector Routing Protocols for Mobile Ad Hoc Networks, Proceedings of the First IEEE International Conference on Pervasive Computing and Communications, p.187, March 23-26, 2003

Sridhar Mahadevan, Average reward reinforcement learning: foundations, algorithms, and empirical results, Machine Learning, v.22 n.1-3, p.159-195, Jan./Feb./March 1996. doi:10.1007/BF00114727

Shie Mannor, Nahum Shimkin, A Geometric Approach to Multi-Criterion Reinforcement Learning, The Journal of Machine Learning Research, v.5, p.325-360, December 2004

Andrew Y. Ng, Stuart J. Russell, Algorithms for Inverse Reinforcement Learning, Proceedings of the Seventeenth International Conference on Machine Learning, p.663-670, June 29-July 02, 2000

Ronald Parr, Flexible decomposition algorithms for weakly coupled Markov decision problems, Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, p.422-430, July 24-26, 1998, Madison, Wisconsin

Puterman, M. L. (1994). Markov decision processes. J. Wiley and Sons.

Russell, S., & Zimdars, A. L. (2003). Q-decomposition for reinforcement learning agents. In Proceedings of ICML-03.

Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of ICML-93.

Peter Stone, TPOT-RL Applied to Network Routing, Proceedings of the Seventeenth International Conference on Machine Learning, p.935-942, June 29-July 02, 2000

Prasad Tadepalli, DoKyeong Ok, Model-based average reward reinforcement learning, Artificial Intelligence, v.100 n.1-2, p.177-224, April 1998. doi:10.1016/S0004-3702(98)00002-2

Nigel Tao, Jonathan Baxter, Lex Weaver, A Multi-Agent Policy-Gradient Approach to Network Routing, Proceedings of the Eighteenth International Conference on Machine Learning, p.553-560, June 28-July 01, 2001

White, D. (1982). Multi-objective infinite-horizon discounted Markov decision processes. Journal of Mathematical Analysis and Applications, 89, 639–647.

Forward citations:

Peter Vamplew, John Yearwood, Richard Dazeley, Adam Berry, On the Limitations of Scalarisation for Multi-objective Reinforcement Learning of Pareto Fronts, Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence, December 01-05, 2008, Auckland, New Zealand

Peter Vamplew, Richard Dazeley, Ewan Barker, Andrei Kelarev, Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks, Proceedings of the 22nd Australasian Joint Conference on Advances in Artificial Intelligence, December 01-04, 2009, Melbourne, Australia

Kazuyuki Hiraoka, Manabu Yoshida, Taketoshi Mishima, Parallel Reinforcement Learning for Weighted Multi-criteria Model with Adaptive Margin, Neural Information Processing: 14th International Conference, ICONIP 2007, Kitakyushu, Japan, November 13-16, 2007, Revised Selected Papers, Part I, Springer-Verlag, Berlin, Heidelberg, 2007

Rustam Issabekov, Peter Vamplew, An empirical comparison of two common multiobjective reinforcement learning algorithms, Proceedings of the 25th Australasian Joint Conference on Advances in Artificial Intelligence, December 04-07, 2012, Sydney, Australia

Leon Barrett, Srini Narayanan, Learning all optimal policies with multiple criteria, Proceedings of the 25th international conference on Machine learning, p.41-47, July 05-09, 2008, Helsinki, Finland

Ivana Dusparic, Vinny Cahill, Using Reinforcement Learning for Multi-policy Optimization in Decentralized Autonomic Systems — An Experimental Evaluation, Proceedings of the 6th International Conference on Autonomic and Trusted Computing, July 07-09, 2009, Brisbane, Australia

Neville Mehta, Sriraam Natarajan, Prasad Tadepalli, Alan Fern, Transfer in variable-reward hierarchical reinforcement learning, Machine Learning, v.73 n.3, p.289-312, December 2008

Kyle Hollins Wray, Shlomo Zilberstein, Abdel-Illah Mouaddib, Multi-objective MDPs with conditional lexicographic reward preferences, Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, p.3418-3424, January 25-30, 2015, Austin, Texas

Mohamed A. Khamis, Walid Gomaa, Adaptive multi-objective reinforcement learning with hybrid exploration for traffic signal control based on cooperative multi-agent framework, Engineering Applications of Artificial Intelligence, 29, p.134-151, March 2014

Umair Ali Khan, Bernhard Rinner, Online learning of timeout policies for dynamic power management, ACM Transactions on Embedded Computing Systems (TECS), v.13 n.4, p.1-25, February 2014

Daniel J. Lizotte, Michael Bowling, Susan A. Murphy, Linear fitted-Q iteration with multiple reward functions, The Journal of Machine Learning Research, v.13 n.1, p.3253-3295, January 2012

Ivana Dusparic, Vinny Cahill, Multi-policy optimization in self-organizing systems, Proceedings of the First international conference on Self-organizing architectures, September 14, 2009, Cambridge, UK

Uri Kartoun, Helman Stern, Yael Edan, A Human-Robot Collaborative Reinforcement Learning Algorithm, Journal of Intelligent and Robotic Systems, v.60 n.2, p.217-239, November 2010

Diederik M. Roijers, Peter Vamplew, Shimon Whiteson, Richard Dazeley, A survey of multi-objective sequential decision-making, Journal of Artificial Intelligence Research, v.48 n.1, p.67-113, October 2013