Extracting the values belonging to each day and computing the daily average

by bgarcial   Last Updated September 16, 2018 03:26 AM

I have a dataset covering the date range from January 12th to August 3rd, 2018, with a luminosity value for each timestamp.

The shape of the df_luminosidad DataFrame is:

df_luminosidad.shape
(9752, 2)

The rows are spaced roughly half an hour apart. The first row is on 2018-01-12:

df_luminosidad.iloc[0]

Fecha:       2018-01-12 00:17:28
Luz (lux)                      1
Name: 0, dtype: object

And the last row is on 2018-08-03:

df_luminosidad.tail(1)

                  Fecha:  Luz (lux)
9751 2018-08-03 23:44:59          1

Many rows correspond to the same day, as shown above.

I want to create a new DataFrame with one row per day in the Fecha: column (no repeated dates), where the Luz (lux) column holds the average of all existing values for that date.

This would look like this:

|  Fecha:     |  Luz(lux) - Average each day values |
|  2018-01-12 |  9183.479167                        |
|  2018-01-13 |  7431.8125                          |
|  2018-01-14 |  11073.1875                         |
|  2018-01-15 |  11456.25                           |
|      .             .                              |
|      .             .                              |
|      .             .                              |
|  2018-08-03 |  11331.229167                       |

I am using the pd.date_range() function to create a DataFrame from the values of the previous df_luminosidad DataFrame at a specified frequency, as suggested by @piRSquared in this answer.

From January 12 to August 3 there are 203 days (29 weeks), so I chose 204 for the periods argument to include both endpoints, and I am using the calendar-day frequency D for the freq argument.

df = pd.DataFrame(dict(
    Date=pd.date_range('2018-01-12', periods=204, freq='D'),
    Value=df_luminosidad['Luz (lux)'].mean()   
))
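
The day count can be double-checked with pandas itself. A minimal sketch, using only the two endpoint dates from the question (the variable names are illustrative):

```python
import pandas as pd

start = pd.Timestamp('2018-01-12')
end = pd.Timestamp('2018-08-03')

# Difference between the two endpoints, in whole days
elapsed_days = (end - start).days

# Calendar days in the range when both endpoints are included
days = pd.date_range(start, end, freq='D')

print(elapsed_days)  # 203
print(len(days))     # 204
```

Passing end= to pd.date_range instead of periods= avoids counting the days by hand.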

So far this works, but I am unsure how to compute the mean of the Luz (lux) values for each day generated by pd.date_range. At the moment I get the same average value for every day from 2018-01-12 to 2018-08-03, as follows:

|  Date       |  Value        |
|  2018-01-12 |  11228.888331 |
|  2018-01-13 |  11228.888331 |
|  2018-01-14 |  11228.888331 |
|  2018-01-15 |  11228.888331 |
|      .             .        |
|      .             .        |
|      .             .        |
|  2018-08-03 |  11228.888331 |
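
The repeated value is what pandas is expected to produce here: df_luminosidad['Luz (lux)'].mean() is a single number, and a scalar passed to the DataFrame constructor is broadcast to every row. A small sketch with made-up readings (values and demo are hypothetical names, not part of the original data):

```python
import pandas as pd

values = pd.Series([10, 20, 30, 40])  # hypothetical readings

demo = pd.DataFrame(dict(
    Date=pd.date_range('2018-01-12', periods=2, freq='D'),
    Value=values.mean(),   # one float for the whole column, not one per day
))

print(demo)
```

Both rows of demo carry the overall mean of 25.0, regardless of the date.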

I have already generated a separate DataFrame for each date in the Fecha: column and could compute each average individually, but this forces me to read each day's file one at a time.

How can I group all the values of each day into a single average per day, in a loop or otherwise, and put the results in one DataFrame?



Answers 2


I believe you need resample with mean, or groupby with Grouper and mean:

df_luminosidad['Fecha:'] = pd.to_datetime(df_luminosidad['Fecha:'])

df = df_luminosidad.resample('D', on='Fecha:')['Luz (lux)'].mean().reset_index()

Or:

df = (df_luminosidad.groupby(pd.Grouper(key='Fecha:', freq='D'))['Luz (lux)']
                    .mean()
                    .reset_index())
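
To see what these two calls return, here is a self-contained sketch on a small made-up frame. The name sample and its values are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample: three readings on one day, two on the next
sample = pd.DataFrame({
    'Fecha:': ['2018-01-12 00:17:28', '2018-01-12 12:17:28',
               '2018-01-12 23:47:28', '2018-01-13 00:17:28',
               '2018-01-13 12:17:28'],
    'Luz (lux)': [1, 20000, 3, 5, 15000],
})
sample['Fecha:'] = pd.to_datetime(sample['Fecha:'])

# Daily mean via resample on the datetime column
by_resample = sample.resample('D', on='Fecha:')['Luz (lux)'].mean().reset_index()

# Same daily mean via groupby with a Grouper
by_grouper = (sample.groupby(pd.Grouper(key='Fecha:', freq='D'))['Luz (lux)']
                    .mean()
                    .reset_index())

print(by_resample)
print(by_grouper)
```

Both results should have one row per calendar day, with means of 6668.0 for 2018-01-12 and 7502.5 for 2018-01-13. Note that a day with no readings inside the range still gets a row, with NaN as its mean.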

Alternative solutions with a DatetimeIndex:

df_luminosidad['Fecha:'] = pd.to_datetime(df_luminosidad['Fecha:'])
df_luminosidad = df_luminosidad.set_index('Fecha:')


df = df_luminosidad.resample('D')['Luz (lux)'].mean().reset_index()
# or, equivalently:
df = df_luminosidad.groupby(pd.Grouper(freq='D'))['Luz (lux)'].mean().reset_index()
jezrael
September 11, 2018 06:31 AM

For time series analysis, first set your date column as the index if possible; after that it is convenient to work with dates.

df = pd.read_csv('file_name.csv', parse_dates=['Fecha:'], index_col='Fecha:')

Then you can pull the rows for any date into a temporary DataFrame.

dates = pd.date_range(start='2018-01-12', end='2018-08-03')

Now, using a for loop, iterate over each date in 'dates' and select the matching portion of your dataset into another DataFrame.

for temp_date in dates:
    required_date = str(temp_date)[:10]     # keep only the date part of the timestamp
    new_df = df.loc[required_date]          # rows recorded on that day (partial string indexing)
    # new_df now holds one day's data, so e.g. new_df['Luz (lux)'].mean() is that day's average
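
If the loop is kept, the per-day averages can be collected as it runs. The following is only a sketch on made-up data: sample, daily_means and result are hypothetical names, and the day is selected with a boolean mask on the normalized index rather than the string lookup used above.

```python
import pandas as pd

# Hypothetical data indexed by timestamp
sample = pd.DataFrame(
    {'Luz (lux)': [1, 20000, 3, 5, 15000]},
    index=pd.to_datetime(['2018-01-12 00:17:28', '2018-01-12 12:17:28',
                          '2018-01-12 23:47:28', '2018-01-13 00:17:28',
                          '2018-01-13 12:17:28']),
)

dates = pd.date_range(start='2018-01-12', end='2018-01-13')

daily_means = {}
for temp_date in dates:
    # Rows whose timestamp falls on this calendar day
    day_rows = sample[sample.index.normalize() == temp_date]
    daily_means[temp_date] = day_rows['Luz (lux)'].mean()

# One row per day: the date and that day's average
result = (pd.Series(daily_means, name='Luz (lux)')
            .rename_axis('Fecha:')
            .reset_index())
print(result)
```

This gives the same daily averages as resample, but the explicit loop is likely to be slower on a frame with thousands of rows.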

This may be a naive method, but it is what I can suggest right now. Hope it works.

debaonline4u
September 11, 2018 06:57 AM
