Merge dataframe with aggregation

by Max Segal   Last Updated January 14, 2018 12:26 PM

I want to aggregate a dataframe: get the first row of every group and, at the same time, concatenate the values in column 'upc':

import pandas as pd

df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'id2': [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    'value1': ["1first", "1second", "1third",
               "2first", "2second",
               "3first", "3second", "3third", "3fourth",
               "4first", "4second",
               "5first",
               "6first", "6second", "6third",
               "7first", "7second"],
    'upc': [str(x) for x in range(100, 117)]
})
firsts_df = df.groupby(['id1', 'id2']).first()
concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc))
firsts_df.merge(concat_upcs_df, how='inner',left_on=['id1', 'id2'], right_on=['id1', 'id2'])

This results in this error:

ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>
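The merge fails because a `groupby(...).apply(...)` that returns one string per group produces a Series, not a DataFrame, and the pandas version that raised this message only merges DataFrames (pandas 0.24 and later accept a Series, but only if it has a name). A minimal sketch of one workaround, assuming the same `df`: turn the concatenated Series into a named one-column DataFrame with `to_frame`, then merge on the `(id1, id2)` index that both sides share. The column name `upc_concat` is chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'id2': [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    'value1': ["1first", "1second", "1third",
               "2first", "2second",
               "3first", "3second", "3third", "3fourth",
               "4first", "4second",
               "5first",
               "6first", "6second", "6third",
               "7first", "7second"],
    'upc': [str(x) for x in range(100, 117)]
})

# First row of each (id1, id2) group; the group keys become a MultiIndex.
firsts_df = df.groupby(['id1', 'id2']).first()

# Join the UPC strings per group; the result is a Series indexed by (id1, id2).
concat_upcs = df.groupby(['id1', 'id2'])['upc'].agg('|'.join)

# Name the Series and make it a one-column DataFrame so merge accepts it.
# 'upc_concat' is a hypothetical name that avoids clashing with firsts_df's 'upc'.
concat_upcs_df = concat_upcs.to_frame('upc_concat')

# Both frames share the same MultiIndex, so merge on the index.
result = firsts_df.merge(concat_upcs_df, how='inner',
                         left_index=True, right_index=True)
print(result)
```

Because both sides are indexed the same way, `firsts_df.join(concat_upcs_df)` would give the same rows here.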

How can I merge an aggregation result with a dataframe? Could I get the same result with a less costly operation?
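For the second question, a single `groupby` with named aggregation (this assumes pandas 0.25 or later) can take the first `value1` and join the `upc` strings in one pass, with no separate `apply` or `merge`. This is a sketch, not a benchmark: whether it is measurably cheaper depends on the data size and pandas version. Here the `upc` column holds the joined string rather than the first UPC, and any further columns would need their own `'first'` entries.

```python
import pandas as pd

df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'id2': [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    'value1': ["1first", "1second", "1third",
               "2first", "2second",
               "3first", "3second", "3third", "3fourth",
               "4first", "4second",
               "5first",
               "6first", "6second", "6third",
               "7first", "7second"],
    'upc': [str(x) for x in range(100, 117)]
})

# One groupby: 'first' for value1, a string join for upc.
# reset_index() turns the (id1, id2) group keys back into ordinary columns.
result = (
    df.groupby(['id1', 'id2'])
      .agg(value1=('value1', 'first'),
           upc=('upc', '|'.join))
      .reset_index()
)
print(result)
```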

Tags : python pandas

