## Preamble

```python
import pyspark.sql.functions as F
```

## Aggregate arrays of sets as unions

```python
sdf.groupBy(
  sdf.key
).agg(
  F.array_distinct(      # drop duplicates, leaving the union
    F.flatten(           # merge the collected arrays into one
      F.collect_list(    # gather each group's arrays
        sdf.array_col
      )
    )
  ).alias('union')
)
```

## Resolve array of reference IDs to reference names

`src_df` has a column of arrays (`id_array`) whose elements are IDs referencing concepts that `ref_df` stores by name:

```python
src_df.join(
  ref_df,
  # one joined row per (source row, referenced ID) pair
  F.array_contains(
    src_df.id_array,
    ref_df.id_col
  )
).groupBy(
  src_df.primary_key
).agg(
  # collect the matched names back into one array per source row
  F.collect_list(
    ref_df.name_col
  ).alias("name_array")
)
```
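
To make the expected result concrete, here is a plain-Python model of what the join and aggregation above compute. It is only a sketch, not Spark code: the toy rows and the `resolve_names` helper are hypothetical, and only the column names come from the snippet above.

```python
from collections import defaultdict

# Hypothetical toy data standing in for src_df and ref_df.
src_rows = [
    {"primary_key": "a", "id_array": [1, 2]},
    {"primary_key": "b", "id_array": [3]},
    {"primary_key": "c", "id_array": []},  # references nothing
]
ref_rows = [
    {"id_col": 1, "name_col": "alpha"},
    {"id_col": 2, "name_col": "beta"},
    {"id_col": 3, "name_col": "gamma"},
]


def resolve_names(src_rows, ref_rows):
    """Model of join(array_contains) -> groupBy -> collect_list."""
    names_by_key = defaultdict(list)
    for src in src_rows:
        for ref in ref_rows:
            # Join condition: array_contains(src.id_array, ref.id_col)
            if ref["id_col"] in src["id_array"]:
                # groupBy(primary_key) + collect_list(name_col)
                names_by_key[src["primary_key"]].append(ref["name_col"])
    return dict(names_by_key)


resolved = resolve_names(src_rows, ref_rows)
```

Two things are worth noticing in this model. Key `"c"` does not appear in `resolved`, because the default inner join drops source rows whose array matches no reference ID. And while the names here come out in `ref_rows` order, Spark does not guarantee the order of elements produced by `collect_list` after a shuffle, so `name_array` should not be assumed to line up positionally with `id_array`.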