Prepared with Ceren Demirkol, Okan Güven, Sevgican Varol
library(data.table)
set.seed(123)
consumption=fread("C:/Users/ceren.orhan/Desktop/ETM 58D/HW2-3/GercekZamanliTuketim-01012016-19052020.csv")
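# Format manipulation: rename the third column, parse date and hour, keep only the needed columns
setnames(consumption,names(consumption)[3],'value')
consumption[,date:=as.Date(Tarih,'%d.%m.%Y')]
consumption[,hour:=as.numeric(substr(Saat,1,2))]
consumption=consumption[,list(date,hour,value)]
# Remove the "." thousands separators, then turn the decimal comma into a dot before converting to numeric
consumption[,value:=gsub(".", "",value, fixed = TRUE)]
consumption[,value:=as.numeric(gsub(",", ".",value, fixed = TRUE))]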
head(consumption)
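The two gsub() calls above can be checked on a few strings. This is a minimal sketch, assuming the raw values use a dot as thousands separator and a comma as decimal mark (which is what the calls imply); the sample strings are made up for illustration and are not taken from the file.

```r
# Hypothetical raw strings in "1.234,56" style (not real consumption values)
raw <- c("27.412,81", "1.234,50", "987,00")

# Step 1: drop the thousands separators; fixed = TRUE treats "." literally, not as a regex wildcard
no_thousands <- gsub(".", "", raw, fixed = TRUE)

# Step 2: replace the decimal comma with a dot and convert to numeric
parsed <- as.numeric(gsub(",", ".", no_thousands, fixed = TRUE))
print(parsed)
```

Without fixed = TRUE the first call would treat "." as "any character" and erase the whole string, so the order and the literal matching both matter.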
The data is shifted to create lags of 1 week (168 hours) and 2 days (48 hours). Rows containing NA values are then removed.
consumption[,lag_168:=shift(value,168)]
consumption[,lag_48:=shift(value,48)]
full_data=consumption[complete.cases(consumption)]
head(full_data)
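To see what the lagging step does, here is a small sketch on a toy table. The column names hour_index and lag_2 and all values are hypothetical; a lag of 2 rows stands in for the 168 and 48 hour lags used on the real data.

```r
library(data.table)

# Toy hourly series (hypothetical values, not the real consumption data)
toy <- data.table(hour_index = 1:6, value = c(10, 12, 11, 15, 14, 13))

# shift() with its default type = "lag" moves values down by n rows,
# so row i receives the value from row i - n and the first n rows become NA
toy[, lag_2 := shift(value, 2)]

# Keep only the rows where every column is observed
toy_complete <- toy[complete.cases(toy)]
print(toy_complete)
```

Note that shift() works on row positions, so it only equals a true 168 hour lag if the series has exactly one row per hour with no gaps or duplicates.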
Absolute percentage error values are calculated.
full_data[,ape_168:=abs(value-lag_168)/value*100] #absolute percentage error
full_data[,ape_48:=abs(value-lag_48)/value*100]
tail(full_data)
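The error measure used above is |actual - forecast| / actual * 100. A short worked example with made-up numbers (the vectors actual and forecast are hypothetical) shows the arithmetic; the mean of these values is commonly reported as MAPE.

```r
# Hypothetical actual values and naive forecasts, for illustration only
actual   <- c(100, 200, 50)
forecast <- c(110, 180, 50)

# Absolute percentage error per observation
ape <- abs(actual - forecast) / actual * 100
print(ape)

# Mean absolute percentage error over all observations
mape <- mean(ape)
print(mape)
```

The measure is undefined when the actual value is 0, since the formula divides by it.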
test=full_data[date >= '2020-03-01']
boxplot(test$ape_168,test$ape_48,names=c("Lag 168","Lag 48"))
title("Absolute Percentage Error")
summary(full_data[,6:7])
quantile_lag_168=quantile(test$ape_168, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
quantile_lag_48=quantile(test$ape_48, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
q_all=cbind(quantile_lag_168,quantile_lag_48)
q_all
From the boxplot, we can say that predicting consumption from last week's value (lag 168) gives better results than predicting from the value 2 days earlier (lag 48), since the Lag 168 box is smaller and its median line is lower. That means its errors have a smaller spread (interquartile range) and a lower median.
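The boxplot reading can be backed by numbers: the line inside a box is the median and the box height is approximately the interquartile range. A minimal sketch with two hypothetical error samples (ape_a and ape_b stand in for test$ape_168 and test$ape_48):

```r
# Hypothetical APE samples, for illustration only
ape_a <- c(1, 2, 3, 4, 5)
ape_b <- c(1, 3, 6, 9, 20)

# Median: the line drawn inside each box
med <- c(a = median(ape_a), b = median(ape_b))

# Interquartile range: 75th percentile minus 25th percentile
iqr <- c(a = IQR(ape_a), b = IQR(ape_b))

print(rbind(median = med, IQR = iqr))
```

boxplot() draws the box from Tukey's hinges, which can differ slightly from the quantile()-based IQR on small samples, so the two should be read as close rather than identical.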
The summary statistics lead to the same conclusion: predicting consumption with last week's data is more accurate than using the data from 2 days earlier. However, there are still many outliers, which is plausible. Religious holidays or special days such as Christmas shift electricity consumption, so consumption on those days cannot easily be predicted from last week's data.
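One way to check whether the outliers cluster on particular days is to average the hourly errors per date and sort. This is only a sketch on a toy table: the dates and error values are hypothetical, and the column name mean_ape is introduced here for illustration.

```r
library(data.table)

# Hypothetical hourly errors for three days (illustrative values only)
toy <- data.table(
  date    = as.Date(rep(c("2020-04-01", "2020-04-02", "2020-04-03"), each = 2)),
  ape_168 = c(2, 4, 3, 5, 30, 40)
)

# Average the hourly errors per day, then sort so the worst days come first
daily <- toy[, .(mean_ape = mean(ape_168)), by = date][order(-mean_ape)]
print(daily)
```

Applied to the real test set, the dates at the top of such a table could then be compared against a holiday calendar.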