# Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou - Google Research - NeurIPS 2022

## Chain-of-thought Prompting

![](Chain-of-thought%20prompting.png)

* The proposed approach augments each exemplar in few-shot prompting with a chain of thought for the associated answer, as illustrated in Figure 1 (right); a worked prompt sketch using this exemplar appears at the end of these notes.
* Chain-of-thought reasoning can be readily elicited in sufficiently large off-the-shelf language models simply by including chain-of-thought sequences in the few-shot exemplars.
* Chain-of-thought prompting enables large language models to tackle complex arithmetic, commonsense, and symbolic reasoning tasks. In the figure, the chain-of-thought reasoning processes are highlighted.

## Experiments

### Arithmetic Reasoning

![arithmetic|300](CoT%20Arithmetic%20Reasoning.png)

* the ASDiv dataset of diverse math word problems (Miao et al., 2020)
* the SVAMP dataset of math word problems with varying structures (Patel et al., 2021)
* the MAWPS benchmark (Koncel-Kedziorski et al., 2016)

#### Robustness

![robustness|300](CoT%20Robustness.png)

### Commonsense Reasoning

![commonsense](CoT%20Commonsense%20Reasoning.png)

* CSQA (Talmor et al., 2019) asks commonsense questions about the world that involve complex semantics and often require prior knowledge.
* StrategyQA (Geva et al., 2021) requires models to infer a multi-hop strategy to answer questions.
* Date Understanding, a BIG-bench task, involves inferring a date from a given context.

### Symbolic Reasoning

![symbolic|250](CoT%20Symbolic%20Reasoning.png)

* Last letter concatenation. This task asks the model to concatenate the last letters of the words in a name (e.g., "Amy Brown" → "yn"). It is a more challenging version of first-letter concatenation, which language models can already perform without a chain of thought. Full names are generated by randomly combining names drawn from the top one thousand first and last names in name census data (https://namecensus.com/).
* Coin flip. This task asks the model to answer whether a coin is still heads up after people either flip or do not flip it (e.g., "A coin is heads up. Phoebe flips the coin. Osvaldo does not flip the coin. Is the coin still heads up?" → "no").

## Conclusion

Generating a chain of thought, a series of intermediate reasoning steps, significantly improves the ability of large language models to perform complex reasoning.

## Mindmap

![](Chain-of-Thought%20Prompting%20Elicits%20Reasoning%20in%20Large%20Language%20Models.pdf)
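To make the prompt format concrete, here is a minimal Python sketch of few-shot chain-of-thought prompting, built around the worked arithmetic exemplar from Figure 1 of the paper. The helper names (`build_cot_prompt`, `extract_answer`) are illustrative, not from the paper, and the model call itself is left abstract, since any text-completion API can consume the resulting prompt string.

```python
# Few-shot chain-of-thought prompting, per Figure 1 (right):
# each exemplar pairs a question with a chain of thought that
# ends in the final answer.
COT_EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11. The answer is 11.",
    ),
]


def build_cot_prompt(question: str) -> str:
    """Prepend the chain-of-thought exemplars to the target question."""
    blocks = [f"Q: {q}\nA: {cot}" for q, cot in COT_EXEMPLARS]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> str:
    """Read off the text after the final 'The answer is' marker."""
    return completion.rsplit("The answer is", 1)[-1].strip().rstrip(".")


prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)
print(prompt)  # send this string to any text-completion model
# A correct completion ends with "The answer is 9.", which
# extract_answer(...) reduces to "9".
```

The same format carries over to the symbolic tasks: swapping the arithmetic exemplar for a coin-flip question paired with step-by-step reasoning about each flip yields the coin flip prompt.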