* Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, Xinyun Chen
* Google DeepMind
* arXiv Sep 2023 (ICLR 2024)

# Introduction

TL;DR: Best zero-shot prompt template: **Take a deep breath and work on this problem step-by-step.**

While derivative-based algorithms have been powerful tools for various problems, the absence of gradients poses challenges for many real-world applications. We propose **Optimization by PROmpting (OPRO)**, a simple and effective approach that leverages large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from a prompt that contains previously generated solutions with their values; the new solutions are then evaluated and added to the prompt for the next optimization step.

# Methodology

![](OPRO%20Framework.png)

We denote the LLM used for objective function evaluation as the scorer LLM, and the LLM used for optimization as the optimizer LLM.

![](OPRO%20Prompting.png)

* Optimization problem description. The first part of the meta-prompt is the text description of the optimization problem, including the objective function and solution constraints. For example, for prompt optimization, the LLM can be instructed to "generate a new instruction that achieves a higher accuracy", and we denote such instructions in the meta-prompt as meta-instructions. We can also provide customized meta-instructions as an informal regularization of the generated solutions, such as "the instruction should be concise and generally applicable".
* Exploration-exploitation trade-off. We tune the LLM sampling temperature to balance exploration and exploitation. A lower temperature encourages the LLM to exploit the solution space around previously found solutions and make small adaptations, while a higher temperature allows the LLM to more aggressively explore solutions that can be notably different.
* The optimization trajectory includes past solutions paired with their optimization scores, sorted in ascending order. Including the optimization trajectory in the meta-prompt allows the LLM to identify similarities among high-scoring solutions, encouraging the LLM to build upon existing good solutions to construct potentially better ones without the need to explicitly define how a solution should be updated.
* More exemplars do not necessarily improve performance, as a few exemplars are usually sufficient to describe the task. In addition, including more exemplars results in a longer meta-prompt dominated by the exemplar part, which may distract the optimizer LLM from other important components such as the optimization trajectory.

To improve stability, we prompt the LLM to generate multiple solutions at each optimization step, allowing the LLM to simultaneously explore multiple possibilities and quickly discover promising directions to move forward (i.e., **Self-consistency Prompting**).

* We keep only the highest-scoring instructions in the meta-prompt, in consideration of the LLM context length limit.
* The output of the optimizer LLM is an instruction, which is concatenated to the question part of every exemplar and used to prompt the scorer LLM.
* We consider the following positions for inserting the instruction:
    * `Q_begin`: the instruction is added before the original question.
    * `Q_end`: the instruction is added after the original question.
    * `A_begin`: the instruction is added to the beginning of the scorer LLM's output.
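To make the loop concrete, here is a minimal sketch of a single OPRO step under stated assumptions: the `optimizer_llm` and `scorer_llm` callables, the meta-prompt wording, the exact-match scoring rule, and all helper names are illustrative choices, not the paper's actual implementation.

```python
# Illustrative sketch of one OPRO step. The optimizer_llm / scorer_llm callables,
# meta-prompt wording, and helper names are assumptions for this example.
from typing import Callable, List, Tuple

def build_meta_prompt(trajectory: List[Tuple[str, float]], exemplars: List[dict],
                      top_k: int = 20) -> str:
    """Compose the meta-prompt: task description, the top-k (instruction, score)
    pairs sorted in ascending order of score, and a few task exemplars."""
    kept = sorted(trajectory, key=lambda p: p[1])[-top_k:]  # keep the best, ascending
    trajectory_text = "\n".join(f"text: {ins}\nscore: {score:.1f}" for ins, score in kept)
    exemplar_text = "\n".join(f"Q: {e['question']}\nA: {e['answer']}" for e in exemplars[:3])
    return (
        "Your task is to generate a new instruction that achieves a higher accuracy.\n"
        "Below are previous instructions with their scores, in ascending order.\n\n"
        f"{trajectory_text}\n\nProblem examples:\n{exemplar_text}\n\n"
        "The instruction should be concise and generally applicable."
    )

def insert_instruction(instruction: str, question: str, position: str = "Q_begin") -> str:
    """Insert the candidate instruction at one of the three positions described above."""
    if position == "Q_begin":
        return f"{instruction}\nQ: {question}\nA:"
    if position == "Q_end":
        return f"Q: {question}\n{instruction}\nA:"
    if position == "A_begin":  # the scorer LLM's output is forced to start with it
        return f"Q: {question}\nA: {instruction}"
    raise ValueError(position)

def opro_step(trajectory: List[Tuple[str, float]], train_set: List[dict],
              optimizer_llm: Callable[[str, float], str],
              scorer_llm: Callable[[str], str],
              n_candidates: int = 8, temperature: float = 1.0,
              position: str = "Q_begin") -> List[Tuple[str, float]]:
    """Sample several candidate instructions per step, score each on the small
    training subset, and append the results to the optimization trajectory."""
    meta_prompt = build_meta_prompt(trajectory, train_set)
    for _ in range(n_candidates):
        candidate = optimizer_llm(meta_prompt, temperature)  # higher T -> more exploration
        correct = sum(
            scorer_llm(insert_instruction(candidate, ex["question"], position)).strip()
            == ex["answer"]
            for ex in train_set
        )
        trajectory.append((candidate, 100.0 * correct / len(train_set)))
    return trajectory
```

Running `opro_step` repeatedly, with the trajectory carried over between steps, reproduces the overall loop in the framework figure: generate, score, append, and re-prompt.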
The `A_begin` position is applicable to pretrained LLMs without instruction tuning, where the prompt is formatted as a sequence of QA pairs (a small usage sketch appears at the end of this note).

The optimization curves also generally show a decrease in the variance among the accuracies of the instructions generated at each step, indicating that the optimizer LLM produces distributionally better instructions over the course of the optimization.

# Evaluation

While traditional optimization often requires a decently large training set, our experiments show that a small fraction of the training samples (e.g., 3.5% of the training set for GSM8K (Cobbe et al., 2021), 20% for Big-Bench Hard (Suzgun et al., 2022)) is sufficient.

![](Evaluation%20of%20OPRO%20and%200-shot%20CoT.png)

With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts.

# Discussion

The main advantage of LLMs as optimizers is their ability to understand natural language, which allows people to describe their optimization tasks without formal specifications. For instance, in prompt optimization, where the goal is to find a prompt that maximizes task accuracy, the task can be described with a high-level text summary along with input-output examples.

# Mindmap

![](LARGE%20LANGUAGE%20MODELS%20AS%20OPTIMIZERS_withMarginNotes.pdf)
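Returning to the `A_begin` format referenced above, here is a minimal usage sketch of how the best instruction from the TL;DR would be prepended to the scorer LLM's answer; the question text is only a GSM8K-style illustration, and the downstream scorer call is left abstract.

```python
# Illustrative usage sketch: apply the best OPRO-found instruction in the A_begin
# position, i.e. as the forced start of the scorer LLM's answer.
BEST_INSTRUCTION = "Take a deep breath and work on this problem step-by-step."

def format_a_begin(question: str, instruction: str = BEST_INSTRUCTION) -> str:
    """Format a QA-style prompt whose answer begins with the instruction."""
    return f"Q: {question}\nA: {instruction}"

# Example with a GSM8K-style question; the resulting string would be sent to the
# scorer LLM, and its completion parsed for the final numeric answer.
print(format_a_begin(
    "Natalia sold clips to 48 of her friends in April, and then she sold half as "
    "many clips in May. How many clips did Natalia sell altogether in April and May?"
))
```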