pandas - restructuring dataframe using column names

by Évariste Galois   Last Updated September 14, 2018 21:26 PM

I read in the following DataFrame, where I have many columns similar to S&PCOMP with a (PO) and (PI) tail at the end of the column name.

                      Date         S&PCOMP(PO)         S&PCOMP(PI)  NASA100(PO)    NASA100(PI)
0      1978-09-13 00:00:00                 nan              106.34  someValue      someValue 
1      1978-09-14 00:00:00                 nan  105.10000000000001  someValue      someValue
2      1978-09-15 00:00:00                 nan              104.12   
3      1978-09-18 00:00:00                 nan  103.21000000000001   
4      1978-09-19 00:00:00                 nan              102.53   
5      1978-09-20 00:00:00                 nan              101.73   
6      1978-09-21 00:00:00                 nan               101.9   
7      1978-09-22 00:00:00                 nan              101.84   
8      1978-09-25 00:00:00                 nan              101.86   
9      1978-09-26 00:00:00                 nan              102.62   
10     1978-09-27 00:00:00                 nan              101.66

I want to restructure this using regex and multi-indexing, into the following DataFrame, for every single column name. Essentially, I use the PO and PI values as 2 columns and expand my DF vertically using the column name base as an index. As you can see, the dates roll... meaning that for every unique column name base I'd have the same set of days.

                      Date                      Open               Close
S&PCOMP      1978-09-13 00:00:00                 nan              106.34   
S&PCOMP      1978-09-14 00:00:00                 nan  105.10000000000001   
S&PCOMP      1978-09-15 00:00:00                 nan              104.12   
S&PCOMP      1978-09-18 00:00:00                 nan  103.21000000000001   
S&PCOMP      1978-09-19 00:00:00                 nan              102.53   
S&PCOMP      1978-09-20 00:00:00                 nan              101.73   
S&PCOMP      1978-09-21 00:00:00                 nan               101.9   
S&PCOMP      1978-09-22 00:00:00                 nan              101.84   
S&PCOMP      1978-09-25 00:00:00                 nan              101.86   
S&PCOMP      1978-09-26 00:00:00                 nan              102.62   
S&PCOMP      1978-09-27 00:00:00                 nan              101.66
NASA100      1978-09-13 00:00:00                 someValue        someValue   
NASA100      1978-09-14 00:00:00                 someValue        someValue  

What would be the easiest way of accomplishing this with pandas? Could I use regex?



Answers 1


if you can change the headers:

df = df.rename(index=str, columns={"S&PCOMP(PO)": "(PO)S&PCOMP", "S&PCOMP(PI)": "(PI)S&PCOMP", "NASA100(PO)": "(PO)NASA100", "NASA100(PI)": "(PI)NASA100"})

then you can:

df_long = pd.wide_to_long(df, ['(PO)', '(PI)'], i='Date', j='stock', suffix=r'(?<=\))(.*)')

Please don't forget to check the accepted solution. Enjoy!

Trenton_M
Trenton_M
September 14, 2018 21:25 PM

Related Questions


Do not map item to any output using apply()

Updated July 30, 2018 21:26 PM