## Preamble
```python
import pyspark.sql.functions as F
```
## Aggregate array columns into per-key set unions
```python
sdf.groupBy(
    sdf.key
).agg(
    F.array_distinct(
        F.flatten(
            F.collect_list(
                sdf.array_col
            )
        )
    ).alias('union')
)
```
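## Resolve an array of reference IDs to reference names
`src_df` has a column `id_array` holding arrays of IDs. Each ID refers to a row in `ref_df`, which stores the ID in `id_col` and the corresponding name in `name_col`. The join matches every ID against `ref_df.id_col`, and the aggregation collects the matched names into one array per `primary_key`: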
```python
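src_df.join(
    ref_df,
    # Match a reference row whenever its ID appears in the source array
    F.array_contains(
        src_df.id_array,
        ref_df.id_col
    )
).groupBy(
    src_df.primary_key
).agg(
    # Gather the matched names back into one array per source row
    F.collect_list(
        ref_df.name_col
    ).alias("name_array")
)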
```