Function to apply log transformation over multiple variables in Python

by James Pietro Zanzarelli   Last Updated September 19, 2018 13:26 PM

I'm trying to apply a log transformation to multiple columns of a DataFrame in Python with this function.

def log(x):
       if type(x) is float64 or int64:
              apply(np.log(x+1))
       else:
              return x

df2.apply(log)

I'm getting the following error:

NameError: ("name 'float64' is not defined", 'occurred at index CUSTID')

CUSTID is the first categorical column in the DataFrame



Answers 3


The NameError is probably due to how NumPy is imported: the bare names float64 and int64 are not defined in your namespace. If you have

import numpy as np

you need to refer to the data types as np.float64 and np.int64. Otherwise, your import should be

from numpy import float64, int64

Also, it is more common style to write

if isinstance(x, np.float64) or isinstance(x, np.int64):

or (equivalent but simpler)

if isinstance(x, (np.float64, np.int64)):

which also works for subclasses (though I doubt this will make any difference in this case).
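
To illustrate the isinstance check, here is a minimal sketch. The helper name is_np_number and the sample values are made up for illustration. It also shows why the condition in the question, `type(x) is float64 or int64`, would be true for every input even once the names are imported: Python reads it as `(type(x) is float64) or int64`, and the class int64 is itself truthy.

```python
import numpy as np

def is_np_number(value):
    # Hypothetical helper: True for NumPy float64/int64 scalars (and subclasses)
    return isinstance(value, (np.float64, np.int64))

print(is_np_number(np.float64(2.5)))  # True
print(is_np_number(np.int64(3)))      # True
print(is_np_number("CUSTID"))         # False

# The question-style condition is truthy even for a plain string,
# because "or np.int64" evaluates the class object itself.
always_truthy = bool(type("CUSTID") is np.float64 or np.int64)
print(always_truthy)                  # True
```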

blue_note
September 19, 2018 12:58 PM

Thanks for your help! I tried this:

from numpy import float64, int64

def log(x):
    if isinstance(x, (np.float64, np.int64)):
        apply(np.log(x+1))
    else:
        return x

df2.apply(log)

It's not giving me any error, but it's not applying any log transformation.

James Pietro Zanzarelli
September 19, 2018 13:22 PM
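
A likely reason nothing changes: by default DataFrame.apply passes each whole column to the function as a pandas Series, not one value at a time, so a check against np.float64 or np.int64 scalars never matches and every column is returned untouched. The branch that should transform the data also lacks a return. The sketch below assumes a small made-up df2 and hypothetical helper names (record_type, log1p_if_numeric), and checks the column dtype instead of the type of a single value.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's df2
df2 = pd.DataFrame({"CUSTID": ["a", "b"], "amount": [1.0, 3.0]})

seen_types = []

def record_type(col):
    # Record what apply actually hands to the function
    seen_types.append(type(col).__name__)
    return col

df2.apply(record_type)
print(set(seen_types))  # every call received a Series

def log1p_if_numeric(col):
    # Transform numeric columns with log(x + 1); return others unchanged
    if pd.api.types.is_numeric_dtype(col):
        return np.log1p(col)
    return col

result = df2.apply(log1p_if_numeric)
print(result)
```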

I think this will do the job, since Python 3 no longer has the built-in apply function:

for c in [c for c in df.columns if np.issubdtype(df[c].dtype, np.number)]:
    df[c] = np.log(df[c])

Example code:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [
        [2, 4, "A"],
        [4, 5, "C"],
        [5, 4, "B"],
        [10, 4.2, "A"],
        [9, 3, "B"],
        [3, 3, "C"]
    ], columns=['data1', 'data2', 'Categories'])

# Log-transform only the numeric columns; 'Categories' is left as-is
for c in [c for c in df.columns if np.issubdtype(df[c].dtype, np.number)]:
    df[c] = np.log(df[c])

print(df)

Output:

      data1     data2 Categories
0  0.693147  1.386294          A
1  1.386294  1.609438          C
2  1.609438  1.386294          B
3  2.302585  1.435085          A
4  2.197225  1.098612          B
5  1.098612  1.098612          C
krishnaa208
September 19, 2018 13:26 PM
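
Note that the loop above computes np.log(x), while the question asked for log(x + 1). A possible variant, sketched here under the assumption that a vectorized approach is acceptable, selects the numeric columns with select_dtypes and applies np.log1p, which computes log(1 + x) and so handles zeros. The shortened sample frame and the name numeric_cols are illustrative only.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data, for illustration
df = pd.DataFrame(
    [[2, 4, "A"], [4, 5, "C"], [5, 4, "B"]],
    columns=["data1", "data2", "Categories"],
)

# Names of the numeric columns only
numeric_cols = df.select_dtypes(include="number").columns

# log(1 + x) on all numeric columns at once; 'Categories' is untouched
df[numeric_cols] = np.log1p(df[numeric_cols])

print(df)
```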
