diff --git a/README.md b/README.md old mode 100644 new mode 100755 diff --git a/RTE - clean data 2012-2014 15-1-16.csv b/RTE - clean data 2012-2014 15-1-16.csv old mode 100644 new mode 100755 diff --git a/RTEAnalytics-Alex-01.Rmd b/RTEAnalytics-Alex-01.Rmd new file mode 100755 index 0000000..d3dbe4a --- /dev/null +++ b/RTEAnalytics-Alex-01.Rmd @@ -0,0 +1,74 @@ +--- +title: "Exercise Set 2: A $300 Billion Strategy" +author: "Alex, Goajun, Sergey, Bastien" +output: html_document +--- + +
+ +```{r echo=FALSE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE} +#load packages from helpers.R + source("helpers.R") +``` + +###Background +We have performed analysis of electricity data provided by the French Distribution Network (RTE) + +####Analysis +1. Comparison between supply and demand +2. Evolution of energy mix +3. Correlation between time of the day and solar energy production +4. Correlation between time of the day and wind energy +5. Correlation between supply and demand vs. import/export +6. Correlation between consumption and weather + +####Source + +Open data: https://www.data.gouv.fr/fr/datasets/electricite-consommation-production-co2-et-echanges/ + +```{r eval=TRUE, echo=FALSE, comment=NA, warning=FALSE, message=FALSE,results='asis',fig.align='center', fig=TRUE} +DataSet<-read.csv("RTE - clean data 2012-2014 15-1-16.csv", header=TRUE) +# create vectors with year, month, and day +DataSet$Year = Year(as.Date(DataSet$Date,format='%m/%d/%Y')) +DataSet$Month = Month(as.Date(DataSet$Date,format='%m/%d/%Y')) +DataSet$Day = Day(as.Date(DataSet$Date,format='%m/%d/%Y')) +#Hour = Hour(as.Date(DataSet$Date,format='%m/%d/%Y')) +``` +The file has the following structure. It has `r nrow(DataSet)` rows and `r ncol(DataSet)` columns. 
+ +###Part I: Comparison between supply and demand +`r n<-16` +I am a `r n`J + + + +###Part II: Evolution of energy mix + + +```{r, echo=FALSE, message=FALSE, prompt=FALSE, results='asis'} +energyMixYear <- group_by(DataSet, Year) %>% summarise(Fuel = sum(Fuel/2/1000), Coal = sum(Coal/2/1000), Gas = sum(Gas/2/1000),Nuclear = sum(Nuclear/2/1000), Wind = sum(Wind/2/1000), Solar = sum(Solar/2/1000), Hydro = sum(Hydro/2/1000), Pumping = sum(Pumping/2/1000), Bioenergy = sum(Bioenergy/2/1000)) + +#conversion to String required to use vector as x-axis in Google Charts +energyMixYear$Year=as.character(energyMixYear$Year) + +#excluded pumping for now +print(gvisSteppedAreaChart(energyMixYear, xvar = "Year", yvar = c("Fuel", "Coal", "Gas","Nuclear", "Wind", "Solar", "Hydro", "Bioenergy"), options=list(isStacked=TRUE,width = 1000, height = 500, vAxis="{format:'#,###GWh'}")), 'chart') +``` + +####?? + +###Part III: Correlation between time of the day and solar energy production + +####?? + +###Part IV: Correlation between time of the day and wind energy (test Sergey) + +####?? + +###Part V: Correlation between supply and demand vs. import/export (test Gaojun) + +####?? + +###Part VI: Correlation between consumption and weather + + diff --git a/RTEAnalytics.Rmd b/RTEAnalytics-Bastien-01.Rmd old mode 100644 new mode 100755 similarity index 63% rename from RTEAnalytics.Rmd rename to RTEAnalytics-Bastien-01.Rmd index c6bbad5..cd45eea --- a/RTEAnalytics.Rmd +++ b/RTEAnalytics-Bastien-01.Rmd @@ -1,9 +1,16 @@ --- -Author: "Alex, Goajun, Sergey, Bastien" +title: "Exercise Set 2: A $300 Billion Strategy" +author: "Alex, Goajun, Sergey, Bastien" +output: html_document ---
+```{r echo=FALSE, eval=TRUE, comment=NA, warning=FALSE,error=FALSE, message=FALSE, prompt=FALSE} +#load packages from helpers.R + source("helpers.R") +``` + ###Background We have performed analysis of electricity data provided by the French Distribution Network (RTE) @@ -21,7 +28,15 @@ Open data: https://www.data.gouv.fr/fr/datasets/electricite-consommation-product ```{r eval=TRUE, echo=FALSE, comment=NA, warning=FALSE, message=FALSE,results='asis',fig.align='center', fig=TRUE} DataSet<-read.csv("RTE - clean data 2012-2014 15-1-16.csv", header=TRUE) +<<<<<<< HEAD Consumption<-DataSet$Consumption +======= +# create vectors with year, month, and day +Year = Year(as.Date(DataSet$Date,format='%m/%d/%Y')) +Month = Month(as.Date(DataSet$Date,format='%m/%d/%Y')) +Day = Day(as.Date(DataSet$Date,format='%m/%d/%Y')) +#Hour = Hour(as.Date(DataSet$Date,format='%m/%d/%Y')) +>>>>>>> 1fa45162ed8f37f96c10ebb5f95a527bbf849b5d ``` The file has the following structure. It has `r nrow(DataSet)` rows and `r ncol(DataSet)` columns.
@@ -35,6 +50,16 @@ I am a `r n`J ###Part II: Evolution of energy mix + +```{r, echo=FALSE, message=FALSE, prompt=FALSE, results='asis'} +list_of_sources=colnames(DataSet[9:16]) +#will try to use list_of_sources as argument to generate graphs + +energyMixYear <- group_by(DataSet, Year) %>% summarise(Wind = sum(Wind), Coal = sum(Coal)) + +#plot(gvisSteppedAreaChart(energyMixYear, yvar = c("Wind", "Coal"), options=list(isStacked=TRUE))) +``` + ####?? ###Part III: Correlation between time of the day and solar energy production @@ -51,8 +76,4 @@ I am a `r n`J ###Part VI: Correlation between consumption and weather -Test for Alex -======= -#test de push by ALex -Test for Bastien 4:35pm diff --git a/RTEAnalytics-Sergey-01.Rmd b/RTEAnalytics-Sergey-01.Rmd new file mode 100755 index 0000000..5b30689 --- /dev/null +++ b/RTEAnalytics-Sergey-01.Rmd @@ -0,0 +1,133 @@ +--- +title: "RTEAnalytics-Sergey-01.Rmd" +author: "Sergey Efimenko" +date: "29 Jan 2016" +output: html_document +--- + +This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see . + +When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this: + + + + + + + + +################## + +Seasonality test for wind power generation. 
+ + +```{r, echo=FALSE} +library("stringr") +library("googleVis") +DataSet<-read.csv("RTE - clean data 2012-2014 15-1-16.csv", sep =",", header=TRUE) +month_data = sapply(1:length(DataSet$Date), function(i) ifelse(str_length(DataSet$Date[i]) > 6, as.numeric(str_split(DataSet$Date[i], "/")[[1]][1]), NA)) +DataSet$month_data = month_data + +season_data = sapply(month_data, function(i){ + if (i %in% c(11,12,1,2)) res = 1 #"Winter" + if (i %in% c(3,4,5)) res = 2 #"Spring" + if (i %in% c(6,7,8)) res = 3 #"Summer" + if (i %in% c(9,10)) res = 4 #"Fall" + res +}) +DataSet$season_data = season_data # CAN CREATE DUMMIES IF NEEDED! JUST ADD NEW COLUMNS +``` + + +# check this +table(DataSet$season_data) + +# Make sure the regression data only has numeric variables (and dummies). the lm input is a data.frame +# Make sure the regression data only has numeric variables (and dummies). the lm input is a data.frame +```{r, echo=FALSE} +regression_data = data.frame( + Cons = as.numeric(DataSet$Consumption), + Fuel = suppressWarnings(as.numeric(DataSet$Fuel)), # I folled google advice + Coal = as.numeric(DataSet$Coal), + Gas = as.numeric(DataSet$Gas), + Nuclear = as.numeric(DataSet$Nuclear), + Wind = as.numeric(DataSet$Wind), + Solar = as.numeric(DataSet$Solar), + Hydro = as.numeric(DataSet$Hydro), + Pumping = as.numeric(DataSet$Pumping), + Bio = as.numeric(DataSet$Bioenergy), + Phys = as.numeric(DataSet$Physical.delivery), + CO2 = as.numeric(DataSet$CO2.emission), + Trade.UK = as.numeric(DataSet$Trade.with.UK), + Trade.ES = as.numeric(DataSet$Trade.with.Spain), + Trade.IT = as.numeric(DataSet$Trade.with.Italy), + Trade.SW = as.numeric(DataSet$Trade.with.Switzerland), + Trade.DE_BG = as.numeric(DataSet$Trade.with.Germany...Belgium), + Winter.d = as.numeric(ifelse(DataSet$season_data == 1, 1, 0)), + Spring.d = as.numeric(ifelse(DataSet$season_data == 2, 1, 0)), + Summer.d = as.numeric(ifelse(DataSet$season_data == 3, 1, 0)), + Fall.d = as.numeric(ifelse(DataSet$season_data == 4, 1, 
0)), + Morning.d = as.numeric(ifelse(DataSet$Time %in% c("6:00", "12:00"), 1, 0)), + Noon.d = as.numeric(ifelse(DataSet$Time %in% c("12:00", "18:00"), 1, 0)), + Evening.d = as.numeric(ifelse(DataSet$Time %in% c("18:00", "0:00"), 1, 0)), + Night.d = as.numeric(ifelse(DataSet$Time %in% c("00:30", "6:00"), 1, 0)), + #We don't need the following parameters for correlation tables + Monthdata = as.numeric(DataSet$month_data), + Time = as.numeric(DataSet$Time), + season_data = DataSet$season_data + +) +``` + + +m1<-gvisTable(regression_data,options=list(showRowNumber=TRUE,width=1920, height=min(400,27*(nrow(regression_data)+1)),allowHTML=TRUE,page='disable')) +print(m1,'chart') + +``` + +#Correlation matrix plot +library(corrplot) +corrplot(cor(regression_data[1:25]), method = "color", type="upper", order="original", tl.col="black", tl.srt=70) + + +#Table correlation matrix +View(cor(regression_data[1:25])) + + +#Regression1: consumption vs. season +regression_formula_Cons.Seas = as.formula("Cons ~ Winter.d + Spring.d + Summer.d") +Regression.Cons.Seas = lm(regression_formula_Cons.Seas, regression_data) +Regression.Cons.Seas$coefficients + + +#Regression2: CO2 vs. Gen.Mix +regression_formula_CO2 = as.formula("CO2 ~ Fuel + Coal + Gas + Nuclear -1") +Regression.CO2 = lm(regression_formula_CO2, regression_data) +Regression.CO2$coefficients +summary(Regression.CO2) + + +#Regression3: Wind vs. Season +According to the correlation table, there is a high correlation between wind farm output and season. 
+Here are the results of the regression analysis with dummy variables (Fall is the baseline season): +```{r, echo=FALSE} +regression_formula_Wind = as.formula("Wind ~ Winter.d + Spring.d + Summer.d") +Regression.Wind = lm(regression_formula_Wind, regression_data) +Summary.3 <- summary(Regression.Wind) +Fall_coef.3 <-round(coef(summary(Regression.Wind))["(Intercept)","Estimate"]) +Summer_coef.3 <-round(coef(summary(Regression.Wind))["Summer.d","Estimate"]) +Spring_coef.3 <-round(coef(summary(Regression.Wind))["Spring.d","Estimate"]) +Winter_coef.3 <-round(coef(summary(Regression.Wind))["Winter.d","Estimate"]) +``` +$Wind = `r Fall_coef.3` + `r Summer_coef.3`*Summer + `r Spring_coef.3`*Spring + `r Winter_coef.3`*Winter$ + + + + +#Regression4: Solar vs. Season +regression_formula_Solar = as.formula("Solar ~ Winter.d + Spring.d + Summer.d") +Regression.Solar = lm(regression_formula_Solar, regression_data) +Regression.Solar$coefficients +(Regression.Solar) + + diff --git a/RTEAnalytics.html b/RTEAnalytics-Sergey-01.html similarity index 99% rename from RTEAnalytics.html rename to RTEAnalytics-Sergey-01.html index 5c32053..b1e6a4c 100644 --- a/RTEAnalytics.html +++ b/RTEAnalytics-Sergey-01.html @@ -8,9 +8,11 @@ + + - +RTEAnalytics-Sergey-01.Rmd @@ -59,70 +61,70 @@
- - -


-
-

Background

-

We analyzed electricity data provided by RTE (Réseau de Transport d'Électricité), the French electricity transmission system operator.

-
-

Analysis

-
    -
  1. Comparison between supply and demand
  2. -
  3. Evolution of energy mix
  4. -
  5. Correlation between time of the day and solar energy production
  6. -
  7. Correlation between time of the day and wind energy
  8. -
  9. Correlation between supply and demand vs. import/export
  10. -
  11. Correlation between consumption and weather
  12. -
-
-
-

Source

-

Open data: https://www.data.gouv.fr/fr/datasets/electricite-consommation-production-co2-et-echanges/

-

The file has the following structure. It has 52608 rows and 23 columns.
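
The row count is consistent with half-hourly readings: 2012 through 2014 spans 1096 days, and 1096 × 48 = 52608. A minimal sketch of that check, and of the MW-to-GWh conversion it would imply, is given here. The half-hourly interpretation is an assumption inferred from the row count, and `mw_to_gwh` is a hypothetical helper that does not appear in the original code.

```r
# Number of days covered by the 2012-2014 data (2012 is a leap year).
days <- as.integer(as.Date("2015-01-01") - as.Date("2012-01-01"))
readings_per_day <- 52608 / days
hours_per_reading <- 24 / readings_per_day

# Hypothetical helper: energy in GWh from a vector of power readings in MW,
# assuming each reading holds for hours_per_reading hours.
mw_to_gwh <- function(power_mw) {
  sum(power_mw) * hours_per_reading / 1000
}

# A constant 1000 MW held over one full day of readings.
one_day_gwh <- mw_to_gwh(rep(1000, readings_per_day))
print(c(days = days, readings_per_day = readings_per_day, one_day_gwh = one_day_gwh))
```

Under this assumption, summed MW values are divided by 2 (not by the number of readings per hour squared or any other factor) before scaling to GWh.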

-
-
-
-

Part X: High level comments on electricity in France

-

Over the last three years, consumption peaked at 102098 megawatts at a single point in time; the lowest point was 29477. The peak occurred on

-
-
-

Part X: Comparison between supply and demand

-

I am a 16J

-
-
-

Part II: Evolution of energy mix

-
-

??

-
-
-
-

Part III: Correlation between time of the day and solar energy production

-
-

??

-
-
-
-

Part IV: Correlation between time of the day and wind energy (test Sergey)

-
-

??

-
-
-
-

Part V: Correlation between supply and demand vs. import/export (test Gaojun)

-
-

??

-
-
-
-

Part VI: Correlation between consumption and weather

-
-
-

Test for Alex

-
-
-

Push test by Alex

-

Test for Bastien 4:35pm

+ + + +

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

+

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

+
+

+

Seasonality test for wind power generation.

+
## 
+## Welcome to googleVis version 0.5.10
+## 
+## Please read the Google API Terms of Use
+## before you start using the package:
+## https://developers.google.com/terms/
+## 
+## Note, the plot method of googleVis will by default use
+## the standard browser to display its output.
+## 
+## See the googleVis package vignettes for more details,
+## or visit http://github.com/mages/googleVis.
+## 
+## To suppress this message use:
+## suppressPackageStartupMessages(library(googleVis))
+
+
+

check this

+

table(DataSet$season_data)
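
The month-to-season coding that this table checks can be sketched in vectorized form. This is an illustrative alternative, not the original `sapply()` code: `month_to_season` is a hypothetical name, the codes (1 = Winter, 2 = Spring, 3 = Summer, 4 = Fall) and the choice to count November as Winter follow the Rmd source, and unlike an `if` chain it leaves missing months as `NA` rather than failing.

```r
# Hypothetical helper: map month numbers (1-12) to season codes
# 1 = Winter, 2 = Spring, 3 = Summer, 4 = Fall.
# Assumption carried over from the source: November counts as Winter.
month_to_season <- function(month) {
  season <- rep(NA_integer_, length(month))
  season[month %in% c(11, 12, 1, 2)] <- 1L
  season[month %in% 3:5] <- 2L
  season[month %in% 6:8] <- 3L
  season[month %in% 9:10] <- 4L
  season
}

# Small made-up input, including a missing month.
months <- c(1, 4, 7, 10, 11, NA)
seasons <- month_to_season(months)
print(table(seasons, useNA = "ifany"))
```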

+
+
+

Make sure the regression data only has numeric variables (and dummies). The lm input is a data.frame.
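
As one possible way to build time-of-day dummies that cover whole ranges of the day, the sketch below parses "H:MM" strings into hours and bins them. It rests on assumptions: the `Time` column is taken to hold strings such as "6:00" or "00:30", the bin edges (0, 6, 12, 18, 24) are chosen for illustration, and `time_to_hour` is a hypothetical helper.

```r
# Hypothetical helper: convert "H:MM" strings into fractional hours.
time_to_hour <- function(time_string) {
  parts <- strsplit(time_string, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60, numeric(1))
}

# Made-up sample of readings.
times <- c("0:00", "6:00", "11:30", "12:00", "18:00", "23:30")
hour <- time_to_hour(times)

# Half-open bins [0,6), [6,12), [12,18), [18,24), so every reading
# falls into exactly one period. The edges are an assumption.
period <- cut(hour, breaks = c(0, 6, 12, 18, 24),
              labels = c("Night", "Morning", "Noon", "Evening"),
              right = FALSE)

# Numeric 0/1 dummies, as lm() expects.
dummies <- data.frame(
  Night.d   = as.numeric(period == "Night"),
  Morning.d = as.numeric(period == "Morning"),
  Noon.d    = as.numeric(period == "Noon"),
  Evening.d = as.numeric(period == "Evening")
)
print(cbind(times, dummies))
```

With non-overlapping bins, each row has exactly one dummy set to 1, so one of the four can serve as the baseline in a regression.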

+

m1<-gvisTable(regression_data,options=list(showRowNumber=TRUE,width=1920, height=min(400,27*(nrow(regression_data)+1)),allowHTML=TRUE,page='disable'))
print(m1,'chart')

+


+
+
+

Correlation matrix plot

+

library(corrplot)
corrplot(cor(regression_data[1:25]), method = "color", type="upper", order="original", tl.col="black", tl.srt=70)

+
+
+

Table correlation matrix

+

View(cor(regression_data[1:25]))
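
A hedged sketch of computing a correlation table that can be printed in a knitted document (`View()` is intended for interactive sessions). The data frame below is made up for illustration; selecting columns with `is.numeric` and using `use = "pairwise.complete.obs"` are choices of this sketch, not something the original code does.

```r
# Made-up data: two perfectly related series, one with a gap, one text column.
toy <- data.frame(
  Wind  = c(1, 2, 3, 4),
  Solar = c(2, 4, 6, 8),
  Cons  = c(10, 8, NA, 4),
  label = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Keep only numeric columns, since cor() rejects character data.
numeric_cols <- toy[sapply(toy, is.numeric)]

# Pairwise deletion: each pair of columns uses the rows where both are present.
corr <- cor(numeric_cols, use = "pairwise.complete.obs")
print(round(corr, 2))
```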

+
+
+

Regression1: consumption vs. season

+

regression_formula_Cons.Seas = as.formula("Cons ~ Winter.d + Spring.d + Summer.d")
Regression.Cons.Seas = lm(regression_formula_Cons.Seas, regression_data)
Regression.Cons.Seas$coefficients
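
To illustrate how the coefficients of such a dummy-variable regression read, here is a minimal sketch on synthetic data (the numbers are invented and are not RTE figures). With Fall left out of the formula, the intercept is the Fall mean and each coefficient is that season's mean minus the Fall mean.

```r
# Synthetic data invented for illustration; these are not RTE figures.
set.seed(42)
toy <- data.frame(
  season = rep(c("Winter", "Spring", "Summer", "Fall"), each = 50),
  stringsAsFactors = FALSE
)
assumed_mean <- c(Winter = 70, Spring = 55, Summer = 45, Fall = 52)
toy$Cons <- unname(assumed_mean[toy$season]) + rnorm(nrow(toy), sd = 2)

# One 0/1 dummy per season except Fall, which becomes the baseline.
toy$Winter.d <- as.numeric(toy$season == "Winter")
toy$Spring.d <- as.numeric(toy$season == "Spring")
toy$Summer.d <- as.numeric(toy$season == "Summer")

fit <- lm(Cons ~ Winter.d + Spring.d + Summer.d, data = toy)

# Compare the fitted coefficients with the raw group means.
fall_mean   <- mean(toy$Cons[toy$season == "Fall"])
winter_mean <- mean(toy$Cons[toy$season == "Winter"])
print(coef(fit))
print(c(fall_mean = fall_mean, winter_minus_fall = winter_mean - fall_mean))
```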

+
+
+

Regression2: CO2 vs. Gen.Mix

+

regression_formula_CO2 = as.formula("CO2 ~ Fuel + Coal + Gas + Nuclear -1")
Regression.CO2 = lm(regression_formula_CO2, regression_data)
Regression.CO2$coefficients
summary(Regression.CO2)

+
+
+

Regression3: Wind vs. Season

+

According to the correlation table, there is a high correlation between wind farm output and season. Here are the results of the regression analysis with dummy variables:

+

\(Wind = 1555 - 277*Summer + 163*Spring + 891*Winter\)
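
Because the regression formula omits the Fall dummy, the intercept (1555) is the Fall average and the other coefficients are offsets from it. Using the rounded estimates reported here, the implied seasonal averages of wind output work out as follows (in the dataset's units, presumably MW, which is an assumption):

$$
\begin{aligned}
\text{Fall}   &= 1555 \\
\text{Summer} &= 1555 - 277 = 1278 \\
\text{Spring} &= 1555 + 163 = 1718 \\
\text{Winter} &= 1555 + 891 = 2446
\end{aligned}
$$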

+
+
+

Regression4: Solar vs. Season

+

regression_formula_Solar = as.formula("Solar ~ Winter.d + Spring.d + Summer.d")
Regression.Solar = lm(regression_formula_Solar, regression_data)
Regression.Solar$coefficients
(Regression.Solar)

diff --git a/RTEAnalytics.Rproj b/RTEAnalytics.Rproj deleted file mode 100644 index 8e3c2eb..0000000 --- a/RTEAnalytics.Rproj +++ /dev/null @@ -1,13 +0,0 @@ -Version: 1.0 - -RestoreWorkspace: Default -SaveWorkspace: Default -AlwaysSaveHistory: Default - -EnableCodeIndexing: Yes -UseSpacesForTab: Yes -NumSpacesForTab: 2 -Encoding: UTF-8 - -RnwWeave: Sweave -LaTeX: pdfLaTeX diff --git a/helpers.R b/helpers.R new file mode 100755 index 0000000..d63cfd2 --- /dev/null +++ b/helpers.R @@ -0,0 +1,18 @@ +# list of packages required to run RTEAnalytics.Rmd + +get_libraries <- function(filenames_list) { + lapply(filenames_list,function(thelibrary){ + if (do.call(require,list(thelibrary)) == FALSE) + do.call(install.packages,list(thelibrary)) + do.call(library,list(thelibrary)) + }) +} + +libraries_used=c("corrplot","stringr","gtools","foreign","reshape2","digest","timeDate","devtools","knitr","graphics", + "grDevices","xtable","sqldf","stargazer","TTR","quantmod","shiny", + "Hmisc","vegan","fpc","GPArotation","FactoMineR","cluster", + "psych","stringr","googleVis", "png","ggplot2","googleVis", "gridExtra","RcppArmadillo","xts","DescTools", "dplyr") + +get_libraries(libraries_used) + +options(stringsAsFactors=FALSE) \ No newline at end of file