What's the fastest and most memory-efficient way to concat N-column by 1-row DataFrames column-wise and then row-wise?

by aday   Last Updated September 13, 2018 23:26 PM

This is essentially what my code does. It works, but it's slow:

import numpy as np
import pandas as pd

results = []
for x1 in range(0, 10):
    toConcat = []
    for x2 in range(0, 10):
        df = pd.DataFrame(np.array([[x2+x1+1, x2+x1+2, x2+x1+3, x2+x1+4, x2+x1+5]]), columns=["a", "b", "c", "d", "e"])
        toConcat.append(df)
    df = pd.concat(toConcat, axis=1)
    results.append(df)
df = pd.concat(results, axis=0)
df

I am trying to take a list of N-column by 1-row data frames, concat them on axis 1 (column-wise), then take a list of those results and concat them on axis 0 (row-wise). I think I need to remove the line:

df = pd.concat(toConcat, axis=1) 

from inside the loop, because I know that calling pd.concat inside a loop slows things down a lot. I'm hoping to do something like this:

results = []
for x1 in range(0, 10):
    toConcat = []
    for x2 in range(0, 10):
        df = pd.DataFrame(np.array([[x2+x1+1, x2+x1+2, x2+x1+3, x2+x1+4, x2+x1+5]]), columns=["a", "b", "c", "d", "e"])
        toConcat.append(df)
    results.append(toConcat)
*magic concatenation*
df

I want to concat along two different axes at the same time. Is this possible, or can someone think of a better way to obtain the result I want? Thanks!

EDIT: Figured it out! Although I'm not sure if it's the most efficient. Here it is:

results = []
for x1 in range(0, 10):
    toConcat = []
    for x2 in range(0, 10):
        df = pd.DataFrame(np.array([[x2+x1+1, x2+x1+2, x2+x1+3, x2+x1+4, x2+x1+5]]), columns=["a", "b", "c", "d", "e"])
        toConcat.append(df)
    results.append(toConcat)
df = pd.concat([pd.concat(x, axis=1) for x in results], axis=0)
df
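One possible variation on the loop above, offered as a sketch rather than a benchmarked answer: skip the 100 intermediate one-row DataFrames, collect the raw values in plain Python lists, and build a single DataFrame at the end. The names rows and row are made up for this illustration, and the sketch assumes the values can be computed without needing a DataFrame at each step.

```python
import pandas as pd

rows = []
for x1 in range(0, 10):
    row = []
    for x2 in range(0, 10):
        # Same five values the original one-row DataFrame held.
        row.extend([x2 + x1 + 1, x2 + x1 + 2, x2 + x1 + 3, x2 + x1 + 4, x2 + x1 + 5])
    rows.append(row)

# One constructor call instead of 100 small DataFrames and 11 concats.
# The column labels repeat, as they do after pd.concat(..., axis=1).
df = pd.DataFrame(rows, columns=["a", "b", "c", "d", "e"] * 10)
```

This keeps the nested-loop structure, so it should drop into code where each block of values comes from some per-iteration computation.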


Answers 1


Let's (cold)speed this up. You can use itertools.product to cut out the loops. Just create an array of data and reshape it.

from itertools import product

import numpy as np
import pandas as pd

r, c = 10, 10
data = list(map(sum, product(range(r), range(c), range(1, 6))))
df = pd.DataFrame(
    np.array(data).reshape(r, -1), columns=list('abcde') * c)
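
The same reshape idea can probably also be expressed with NumPy broadcasting, which avoids building the intermediate Python list. The sketch below is an illustration, not part of the original answer; the names offsets, values, df_broadcast, df_product and same are chosen here for the example. It also checks that both versions hold the same values.

```python
from itertools import product

import numpy as np
import pandas as pd

r, c = 10, 10

# Broadcasting: shapes (r, 1, 1) + (1, c, 1) + (1, 1, 5) -> (r, c, 5),
# where entry [i, j, k] is i + j + (k + 1).
offsets = np.arange(1, 6)
values = (np.arange(r)[:, None, None]
          + np.arange(c)[None, :, None]
          + offsets[None, None, :])

# Flatten the last two axes so each row holds c blocks of 5 values.
df_broadcast = pd.DataFrame(values.reshape(r, c * 5), columns=list("abcde") * c)

# The itertools.product version, for comparison.
data = list(map(sum, product(range(r), range(c), range(1, 6))))
df_product = pd.DataFrame(np.array(data).reshape(r, -1), columns=list("abcde") * c)

# True when both approaches produce identical values.
same = np.array_equal(df_broadcast.to_numpy(), df_product.to_numpy())
```

Both versions get a default 0..r-1 row index, whereas the nested pd.concat result repeats the label 0 on every row. If the labels matter, passing ignore_index=True to the outer pd.concat or calling reset_index(drop=True) makes them line up.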
coldspeed
September 13, 2018 23:24 PM
