Oversubscription planning is the problem of choosing an action sequence which reaches a state with an as high as possible utility, given a budget for total action cost. Most previous work on oversubscription planning was restricted to non-negative utility functions and 0-binary utility functions. While this restriction allows using techniques similar to partial satisfaction planning, it limits the expressivity of the formalism. In this work, we address oversubscription planning generalized to arbitrary additive utility functions over a finite-domain representation. We introduce the notions of the net and gross utility of an action which provide a more comprehensive context of achievement evaluation and allow dealing with negative utility values. Focused on the net utility, we capture relationships between sets of state variables and handle negative interactions that occur in the non-negative utility setting as well. We use the notions of gross and net utility to examine the structure of plans on the level of action and sequences of actions and prove several properties about the structure of optimal plans.
To leverage these properties, we introduce a compilation of an OSP task to classical planning, which forces a plan to conform to these properties. The landmarks of this classical planning task constitute value driven landmarks, that is, landmarks which must occur in every plan that improves over the utility of the initial state.
We also introduce a modified version of the inc-compile-and-BFBB planer (Mirkis & Domshlak 2013,2014; Domshlak & Mirkis 2015), which leverages our observations on additive utility functions and optimal plans. Our empirical evaluation shows that our approach allows for a substantial reduction in search effort in problems with additive utility functions. The value-driven landmarks are more informative than previous state-of-the-art methods for landmark discovery for oversubscription planning and lead to better planning performance with non-negative utilities as well.
Oversubscription planning (D. Smith, 2004), also referred to as OSP, deals with achieving a state with a high utility value, given a budget on total action cost. This formulation allows us to handle situations with under-constrained resources, which do not allow us to achieve all possible goal facts. OSP objective differs from classical planning objective, in which the objective is to find a cheap plan which achieves all goal facts. In optimal OSP and optimal classical planning, the tasks are further constrained to finding a path which achieves a state with maximal utility, and, to finding the cheapest cost path which achieves the goal, respectively.
Over the years, the theory and practice of classical planning have been studied and advanced much more intensively compared to OSP. Recent work (Mirkis & Domshlak 2013,2014; Domshlak & Mirkis 2015) made several contributions aiming at improving the scalability of OSP solvers. In particular, they developed a planner which exploits standard landmark discovery tools of classical planning, as well as abstractions for solving OSP problems. They showed how standard goal-reachability landmarks of certain classical planning tasks could be generated to represent achievements of valuable facts of the original OSP task. These landmarks compiled into the original OSP task to obtain an equivalent OSP task with a lower cost allowance, and thus with a smaller effective search space. In this research, we investigate approximation methods for OSP, aiming at extending the scope of the landmark-based approximations for OSP, as well as on improving the scalability of the state-space search driven by these approximations. We propose techniques which allow us to discover more informative landmarks than previous methods. Furthermore, our techniques are applicable for OSP tasks generalized to additive utility setting.
Starting with the most basic question of what additive utility function in OSP problem is, we point out the challenges in multi-valued planning tasks with additive utility setting. We discuss the relationships between state variables and different value assignments to a variable in successive states along a plan. We closely consider negative interactions between state variables with multi-valued (non-zero binary) utility setting, and we show how these negative interactions could occur in tasks with non-negative utility setting. We introduce a novel heuristic search approach to address additive utility OSP tasks which differs from the traditional automated action planning approach in the sense of interpretation of the objective of planning for valuable facts. With our focus on improving state-variables rather than on collecting valuable facts, we define the utility of actions. Investigation of planning tasks and their utility structure with the utility of action definition in hand allowed us to discover properties of optimal plans on the level of actions and sequences of actions. A synergistic combination of these properties allows handling OSP additive utility functions over multi-valued variables.
We introduce the notion of net utility of an action, which allows for more detailed reasoning about valuable facts by taking into account the A broader context of achievements. The notion of the net utility of action takes into account intra-state dependencies between facts to consider not only the positive but also the negative achievements of the actions. Furthermore, this notion allows for reasoning in the context of inter-state dependencies of facts, which allow us to evaluate the effectiveness of action by considering facts that are a given away to achieve other valuable facts.
In particular, the notion of action’s net utility will help us to set a more accurate estimation of likely goal states in an OSP task, and reduce the search space. We prove that an optimal plan must end with an action which has positive net utility.
Eliminating the assumption of zero utility value in the initial state, we define a reference point, which serves as a baseline for improvement in utility.
This reference point is a set of propositions with improved utility over propositions in a given state, for each state variable. We introduce the notion of a gross positive action, which improves the utility for at least one of the state variables, although possibly at the cost of a lower utility for other variables. We prove that an optimal plan must contain a gross positive action (which might have negative net utility), which achieves some valuable fact that is maintained until the end of the plan (which must be an action with positive net utility).
This approach of improving differs from previous approaches which attempted to collect valuable facts and allow us to handle any additive utility function, not just non-negative additive utility functions. Based on these observations we propose a technique to discover more informative landmarks than previous methods.
To leverage these properties, we introduce a compilation of an OSP task to classical planning, which forces a plan to conform to these properties. The landmarks of this classical planning task constitute value driven landmarks, that is, landmarks which must occur in every plan that improves over the utility of the initial state. We also introduce a modification to the inc-compile-and-BFBB planer (Mirkis & Domshlak 2013,2014; Domshlak & Mirkis 2015) planner, which leverages our definitions of net positive action and gross positive actions and our observations on these definitions.
We introduce an efficient way to define the net utility of an action which is based on off-line analysis of problem structure and its utility setting. Instead of calculating the net utility of action in each state during the search, we perform a pre-processing step which includes identification and marking only the net positive actions which are candidates to terminate the search. A framework for action split is presented which is essential for defining the net utility for actions in SAS+ representation with redundant precondition list. We show how this split actions allow for further reducing the search space.
The ideas developed in this work have been implemented in a new optimal OSP planner that extend the planner developed by Mirkis and Domshlak on the basis of the Fast Downward system (Helmert, 2006). An empirical evaluation shows that the landmarks we discover are more informative and that our modified planner outperforms state-of-the-art OSP planners in the non-negative utility setting. Also, to our knowledge, our implementation constitutes the first domain independent planner for optimal OSP with arbitrary additive utility functions, and we hope that more advances in this important computational problem will follow.