# Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
* Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister
* UWash and Google Research
* Findings of ACL 2023
## Method
Our paradigm has two simple steps:

* First, given an LLM and an unlabeled dataset, we prompt the LLM to generate output labels along with rationales to justify the labels.
* Each prompt p is a triplet (x_p, r_p, y_p), where x_p is an example input, y_p is its corresponding label, and r_p is a user-provided rationale explaining why x_p should be categorized as y_p.
* Conditioned on the demonstrations in p, the LLM mimics the triplet format to generate a rationale r̂_i and a label ŷ_i for each new input x_i.
* Users only need to provide a few demonstration examples (~10 shots per task) to enable few-shot CoT prompting; a minimal prompting sketch follows below.
* We utilize Chain-of-Thought (CoT) prompting (Wei et al., 2022) to elicit and extract rationales from LLMs
* Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. 2022. Chain of thought prompting elicits reasoning in large language models.
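A minimal sketch of this rationale-extraction step, assuming a generic `llm_generate(prompt)` completion function as a stand-in for the teacher LLM; the demonstration template and the "So the answer is" delimiter are illustrative assumptions, not the paper's exact prompt format:

```python
# Sketch: few-shot CoT prompting to elicit (rationale, label) pairs from an LLM.
# The demonstration template and answer delimiter below are illustrative
# assumptions, not the paper's exact prompt format.

def build_cot_prompt(demonstrations, x_i):
    """demonstrations: list of (x_p, r_p, y_p) triplets (~10 per task)."""
    parts = [f"Q: {x_p}\nA: {r_p} So the answer is {y_p}."
             for x_p, r_p, y_p in demonstrations]
    parts.append(f"Q: {x_i}\nA:")
    return "\n\n".join(parts)

def parse_completion(completion):
    """Split an LLM completion into (rationale_hat, label_hat)."""
    rationale, _, label = completion.rpartition("So the answer is")
    return rationale.strip(), label.strip().rstrip(".")

def pseudo_label_dataset(demonstrations, unlabeled_inputs, llm_generate):
    """Run the teacher LLM over the unlabeled set to collect (x_i, r̂_i, ŷ_i)."""
    dataset = []
    for x_i in unlabeled_inputs:
        completion = llm_generate(build_cot_prompt(demonstrations, x_i))
        r_hat, y_hat = parse_completion(completion)
        dataset.append({"input": x_i, "rationale": r_hat, "label": y_hat})
    return dataset
```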
* Second, we leverage these rationales in addition to the task labels to train smaller downstream models.
* We first describe the current framework for learning task-specific models. With this framework in place, we extend it to incorporate rationales into the training process.
* Multi-task learning with rationales.
* To create a more explicit connection between the x_i's and the ŷ_i's, we use the extracted rationales r̂_i as additional supervision.
* Instead of using rationales as additional model inputs (e.g., Wang et al., 2022a), we frame learning with rationales as a multi-task problem.
* We prepend task prefixes ([label], [rationale]) to the input examples and train the smaller model to output ŷ_i when [label] is provided and to generate r̂_i when [rationale] is provided (Raffel et al., 2020); the combined objective is written out below.
* Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67
* The proposed multi-task training framework consistently leads to better performance than treating rationale and label prediction as a single task; single-task training can at times perform worse than standard fine-tuning. A minimal training sketch is given at the end of this section.
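Written out, the multi-task objective is a weighted sum of the label-prediction and rationale-generation losses (a reconstruction from the bullets above, with ℓ the token-level cross-entropy loss and λ the rationale-loss weight):

$$
\mathcal{L} = \mathcal{L}_{\text{label}} + \lambda\,\mathcal{L}_{\text{rationale}}
= \frac{1}{N}\sum_{i=1}^{N} \ell\big(f([\text{label}]\,x_i),\, \hat{y}_i\big)
+ \frac{\lambda}{N}\sum_{i=1}^{N} \ell\big(f([\text{rationale}]\,x_i),\, \hat{r}_i\big)
$$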
* Task-specific distillation (Hinton et al., 2015; Tang et al., 2019) uses the LLM teacher to generate noisy pseudo labels ŷ_i in place of ground-truth labels y_i (Wang et al., 2021).
* Peifeng Wang, Aaron Chan, Filip Ilievski, Muhao Chen, and Xiang Ren. 2022a. PINTO: Faithful language reasoning using prompt-generated rationales.
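A minimal multi-task training sketch for the student model, assuming a T5-style seq2seq model loaded via Hugging Face `transformers`; the model size, prefix strings, and `lambda_rationale` weight are illustrative assumptions rather than the paper's exact configuration:

```python
# Sketch: multi-task fine-tuning of a small seq2seq student (e.g. T5) on
# LLM-generated labels and rationales. Model choice, prefixes, and the
# rationale-loss weight are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
lambda_rationale = 1.0  # weight on the rationale-generation loss (assumed value)

def step_loss(batch):
    """batch: list of dicts with 'input', 'label', 'rationale' from the extraction step."""
    total = 0.0
    for prefix, target_key, weight in (
        ("[label] ", "label", 1.0),
        ("[rationale] ", "rationale", lambda_rationale),
    ):
        enc = tokenizer([prefix + ex["input"] for ex in batch],
                        return_tensors="pt", padding=True, truncation=True)
        tgt = tokenizer([ex[target_key] for ex in batch],
                        return_tensors="pt", padding=True, truncation=True)
        # Replace padding with -100 so it is ignored by the cross-entropy loss.
        labels = tgt.input_ids.masked_fill(tgt.input_ids == tokenizer.pad_token_id, -100)
        out = model(input_ids=enc.input_ids,
                    attention_mask=enc.attention_mask,
                    labels=labels)
        total = total + weight * out.loss
    return total

# One optimization step over a pseudo-labeled batch:
#   loss = step_loss(batch); loss.backward(); optimizer.step(); optimizer.zero_grad()
```

At test time only the [label] prefix is used, so training the student to also generate rationales adds no inference-time cost.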
## Experiments


# Mindmap
