If you have a dataframe in wide record format, melting or stacking it can convert it to a narrower, longer table.
And if you have a narrow dataframe, pivoting and unstacking it can convert it to a wider, shorter table.
But when should you use stack versus melt and unstack versus pivot?
- **Stack/unstack**: When you already have defined all proper (row) indexes and columns, you can move the lowest level of the columns to the lowest level row index via `stack(...)` (making tables longer & narrower) and vice versa via `unstack(...)` (making tables shorter & wider) while making sure that the values stay correctly attached to their index position.
- **Pivot**: When you have loaded the raw dataframe, but still need to index and unstack the table to a wider, record-level format, use `df.pivot(columns=..., index=..., [values=...])`
- **Melt**: You should rarely need this method. However, if you want to reshape the table before you set proper indices on your dataframe, `df.melt(id_vars=..., var_name='...', value_name='...')` can come in handy.
Once you unstack or stack a dataframe to the point where you remove the last remaining level in the column or row index, the dataframe will "degrade" to a multi-index *series*.
You can "recover" the dataframe nature by transforming the series back to a dataframe and giving the column that will represent the series values a name, by applying `.to_frame(name='...')` to the series.
## Example
This is a narrow & fully indexed dataframe; Notice that is has two row index levels, *Item* and *Metal*:
| | | EUR |
| ----- | ---------- | ---- |
| Item | Metal | |
| **0** | **Bronze** | 0.31 |
| | **Gold** | 2.20 |
| **1** | **Gold** | 3.20 |
| | **Silver** | 1.10 |
And this would be an wide & indexed version of the same; It has two column index levels, *EUR* and *Metal* and introduces `NaN` values where rows were "missing" in the narrow table (e.g., a `pivot(...)` or `unstack()`-ing of the narrow table would introduce those `NaN` values):
| | EUR | | |
| ----- | ---------- | -------- | ---------- |
| Metal | **Bronze** | **Gold** | **Silver** |
| Item | | | |
| **0** | 0.31 | 2.2 | NaN |
| **1** | NaN | 3.2 | 1.1 |
Notice how the *Metal* index level moves between the row and column index.
If you pivot a dataframe that only has a single value column (like in the above example), and you are not planning on stacking the dataframe to a narrow table, then you should also explicitly set the `values` parameter in the `pivot(...)` call: If you do not, you get a top level column index with only the name of that value column (see the *EUR* top row in the example table above). That happens, because `pivot` by default prepends all value columns in the column index, just above the `columns` that you pick when calling `pivot(...)` (see the two-values example further below, too).
For example, if you have one value column, pivoting without a `values` parameter gives you the above multi-indexed, wide table:
```python
raw = {
'Item': [0, 0, 1, 1],
'Metal': ['Gold', 'Bronze', 'Gold', 'Silver'],
'EUR': [2.2, 0.31, 3.2, 1.1],
}
pd.DataFrame(raw).pivot(index='Item', columns='Metal')
```
(See the last table above for the shape of this dataframe.)
To get rid of the *EUR* multi-index level (i.e., the top row in the above table), use `values` as follows:
```python
pd.DataFrame(raw).pivot(index='Item', columns='Metal', values='EUR')
```
| Metal | **Bronze** | **Gold** | **Silver** |
| ----- | ---------- | -------- | ---------- |
| Item | | | |
| **0** | 0.31 | 2.2 | NaN |
| **1** | NaN | 3.2 | 1.1 |
If you now `.stack()` this pivoted dataframe, you get the following series back, because there are no more column index levels left to move to the row index:
```plain
Item Metal
0 Bronze 0.31
Gold 2.20
1 Gold 3.20
Silver 1.10
dtype: float64
```
(You would end with a similar series if you were to `.unstack()` this dataframe, as there is only one row index level available to move.)
To elevate such a series back to a dataframe again, you can use `.to_frame(name='...')`:
```python
pd.DataFrame(raw).pivot(
index='Item', columns='Metal', values='EUR'
).stack().to_frame(name='EUR')
```
| | | EUR |
| ----- | ---------- | ---- |
| Item | Metal | |
| **0** | **Bronze** | 0.31 |
| | **Gold** | 2.20 |
| **1** | **Gold** | 3.20 |
| | **Silver** | 1.10 |
Notice that the top-level *EUR* column index has "snuck back in".
If we instead look at an example where you have multiple value columns in the narrow shape, the `.pivot(..., values=...)` behavior becomes more obvious:
```python
# "narrow" raw data with two value columns, EUR and USD:
raw = {
'Item': [0, 0, 1, 1],
'Metal': ['Gold', 'Bronze', 'Gold', 'Silver'],
'EUR': [2.2, 0.31, 3.2, 1.1],
'USD': [2.0, 0.27, 2.7, 1.0],
}
pd.DataFrame(raw).pivot(index='Item', columns='Metal')
```
This gets you the following wide shape, by default tracking any remaining columns not listed in `index` or `columns` as **multi-indexed value columns** (here, *EUR* and *USD*):
| | | | EUR | | | USD |
| ----- | ---------- | -------- | ---------- | ---------- | -------- | ---------- |
| Metal | **Bronze** | **Gold** | **Silver** | **Bronze** | **Gold** | **Silver** |
| Item | | | | | | |
| **0** | 0.31 | 2.2 | NaN | 0.27 | 2.0 | NaN |
| **1** | NaN | 3.2 | 1.1 | NaN | 2.7 | 1.0 |
And for completeness, here is what happens if you `stack()` that above table:
```python
raw = {
'Item': [0, 0, 1, 1],
'Metal': ['Gold', 'Bronze', 'Gold', 'Silver'],
'EUR': [2.2, 0.31, 3.2, 1.1],
'USD': [2.0, 0.27, 2.7, 1.0],
}
pd.DataFrame(raw).pivot(index='Item', columns='Metal').stack()
```
| | | **EUR** | **USD** |
| ----- | ---------- | ------- | ------- |
| Item | Metal | | |
| **0** | **Bronze** | 0.31 | 0.27 |
| | **Gold** | 2.20 | 2.00 |
| **1** | **Gold** | 3.20 | 2.70 |
| | **Silver** | 1.10 | 1.00 |
Which is a bit of a contrived example, as in this particular situation, you could directly define the relevant index to get the dataframe set up as above:
```python
pd.DataFrame(raw).set_index(['Item', 'Metal'])
```
(This produces the exact same dataframe as shown in the table just above.)
The key thing to take home is how `stack()` and `unstack()` can move the respective last index level (*Metal*, in this case) from the columns to the rows and vice versa:
```python
pd.DataFrame(raw).set_index(['Item', 'Metal']).unstack()
```
| | | | EUR | | | USD |
| ----- | ---------- | -------- | ---------- | ---------- | -------- | ---------- |
| Metal | **Bronze** | **Gold** | **Silver** | **Bronze** | **Gold** | **Silver** |
| Item | | | | | | |
| **0** | 0.31 | 2.2 | NaN | 0.27 | 2.0 | NaN |
| **1** | NaN | 3.2 | 1.1 | NaN | 2.7 | 1.0 |
Therefore, after you either `set_index(...)` on your dataframes or, if necssary, `pivot(...)`-ed them to the right shape **and** index, you should only need `stack()` and `unstack()` to go between narrow and wide versions of your dataframe.