
Comparing Website Generators Over Time

Introduction

This article is a follow-up to Retrieving And Scrapping Archived Data With The Wayback Machine. Here I display some results scraped from archived versions of the website https://www.staticgen.com/ at seven dates between May 2014 and August 2019.

A remark on my own account

This blog entry is the first time I am using RMarkdown to write and display R code, and I am not yet proficient with it. For instance, I have included all code chunks, together with my comments and some abandoned coding attempts. This works well for me as a reminder and learning experience. I am not sure whether it is valuable for other people as well or whether it is distracting because of information overload. Comments on this issue are welcome.

The data for this article come from the previous article; I load them with the following code chunk:

### Load dataset
sg_crawllist <- readRDS("../../../data/sg_crawllist.rds")
sg_data_collection <- readRDS("../../../data/sg_data_collection.rds")
sg_names <- readRDS("../../../data/sg_names.rds")
sg_data <- readRDS("../../../data/sg_data.rds")

Setup

knitr::opts_chunk$set(
        message = FALSE,
        error = FALSE,
        warning = FALSE,
        comment = NA,
        highlight = TRUE,
        prompt = TRUE
        )
### Set the global option options(stringsAsFactors = FALSE) 
### inside a parent function and restore the option after the parent function exits
if (!require("xfun"))
        {install.packages("xfun", repos = 'http://cran.wu.ac.at/')
        library(xfun)}
## Loading required package: xfun
## 
## Attaching package: 'xfun'
## The following objects are masked from 'package:base':
## 
##     attr, isFALSE
xfun::stringsAsStrings()

### install and load some important packages
### https://github.com/tidyverse/tidyverse
if (!require("tidyverse"))
        {install.packages("tidyverse", repos = 'http://cran.wu.ac.at/')
        library(tidyverse)}
## Loading required package: tidyverse
## ── Attaching packages ──────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.0     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   0.8.3     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
# lubridate, for date/times.
if (!require("lubridate"))
        {install.packages("lubridate", repos = 'http://cran.wu.ac.at/')
        library(lubridate)}
## Loading required package: lubridate
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
# reshape2, restructure and aggregate data using melt and dcast
if (!require("reshape2"))
        {install.packages("reshape2", repos = 'http://cran.wu.ac.at/')
        library(reshape2)}
## Loading required package: reshape2
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
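
The install-if-missing pattern above is repeated for every package. As a minimal sketch, it could be wrapped in a small helper. Note that `load_or_install` is a hypothetical name that does not appear in the original code, and the CRAN mirror URL is only an assumed default.

```r
# Hypothetical helper (not part of the original post): attach a package,
# installing it first only when it is not yet available.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "stats" ships with R, so this call never triggers an installation.
load_or_install("stats")
```

With such a helper, each of the blocks above would shrink to a single call, for example `load_or_install("lubridate")`.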

Introduction

My analysis will concentrate on three issues:

  1. Development of the number of static website generators listed on the website https://www.staticgen.com/.
  2. Names of static website generators ranked by the number of stars of their repositories, as a proxy for their popularity.
  3. Relative rankings of 18 website generators against each other at seven different dates.

Other data (the number of forks, open issues, and followers on Twitter) are not analyzed. They have, in my opinion, only a weak relationship with the diffusion of a website generator. Perhaps the exclusion of Twitter followers needs more reflection:

  • The display of the number of Twitter followers started only around 2019.
  • As of today (2019-07-31), only 37 static website generators have a Twitter account.
  • Even leading static website frameworks (e.g., Next.js) have no Twitter account.
  • The number of followers results not only from the popularity of the generator but also from exciting and well-written tweets.

Growth of the Popularity of Static Website Generators

> sg_count = NULL
> for (i in 1:length(sg_data_collection)) {
+   sg_count[i] <- nrow(sg_data_collection[[i]])
+ }
> sg_quantity <- data.frame(cbind(sg_crawllist[2], sg_count))
> sg_quantity$datetime <- as_date(as.POSIXct(sg_quantity$datetime))
> names(sg_quantity) <- c("Date of Archived Websites", "Number of Static Generators")
> ggplot(sg_quantity, aes(x = `Date of Archived Websites`, y = `Number of Static Generators`)) + 
+   geom_line() +
+   labs(title = "Growth of the Number of Static Website Generators")

I started scraping the archived webpages of https://www.staticgen.com/ with the snapshot from May 2014. At that time the website listed only about 50 generators. Currently (August 2019) the website features 260 static site generators. The plot shows a steep and continuous growth in the number of these applications.
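
The counting step in the chunk above can also be written without an explicit loop. The following is a minimal sketch on toy data: `snapshots` and `sg_count_demo` are made-up stand-ins for `sg_data_collection` and `sg_count`, under the assumption that the collection is a list with one data frame per archive date.

```r
# Toy stand-in for sg_data_collection: a list with one data frame per snapshot.
snapshots <- list(
  data.frame(name = c("Jekyll", "Octopress", "Pelican")),
  data.frame(name = c("Jekyll", "Hugo", "Hexo", "Gatsby"))
)

# One row count per snapshot; vapply() guarantees an integer vector.
sg_count_demo <- vapply(snapshots, nrow, integer(1))
sg_count_demo
```

Compared with growing a vector inside `for`, `vapply()` also behaves sensibly for an empty list, where `1:length(x)` would iterate over `c(1, 0)`.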

Taking the first ten generators at every selected date

I am interested in the development of the leading group of website generators, measured by their number of repository stars as a proxy for their popularity. Taking the first ten generators at each date results in a list of 18 generators that were part of the leading group on at least one date during the observation period.

> get_first10 <- function(l) {
+   names_first10 = NULL
+   for (i in 1:length(l)) {
+     names_first10 <- c(names_first10, l[[i]]$name[1:10])
+   }
+   names_first10 <- dplyr::distinct(data.frame(names_first10), names_first10)
+   # NOTE: the result of rename() is not assigned, so the column keeps the name "names_first10"
+   dplyr::rename(names_first10, Name = names_first10)
+   return(names_first10)
+ }
> 
> sg_names <- get_first10(sg_data_collection)
> as.list(sg_names)
$names_first10
 [1] Jekyll      Octopress   Pelican     Middleman   Docpad     
 [6] Hexo        Metalsmith  Harp        Wintersmith Assemble   
[11] Brunch      Hugo        GitBook     Gatsby      Nuxt       
[16] Next.js     VuePress    Docusaurus 
18 Levels: Assemble Brunch Docpad Docusaurus Gatsby GitBook Harp ... Wintersmith
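
The idea behind `get_first10()` can be sketched in base R as follows. This is only an illustration on toy data: `top_n_names` and `snapshots` are hypothetical names, and the sketch assumes, as the original function does, that every data frame is already sorted by stars in descending order.

```r
# Toy snapshots, each assumed to be sorted by repository stars (descending).
snapshots <- list(
  data.frame(name = c("Jekyll", "Octopress", "Pelican"), stringsAsFactors = FALSE),
  data.frame(name = c("Jekyll", "Hugo", "Octopress"), stringsAsFactors = FALSE)
)

# Collect the first n names of every snapshot and drop duplicates,
# keeping the order of first appearance.
top_n_names <- function(snapshots, n = 10) {
  firsts <- lapply(snapshots, function(df) head(df$name, n))
  unique(unlist(firsts))
}

top_n_names(snapshots, n = 2)
```

Using `head()` instead of `[1:10]` avoids `NA` entries when a snapshot holds fewer than `n` rows.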

Website generators ranked by their repository stars

Get ranked data

I have also stored the number of forks but will not display the corresponding plots here, as they give no valuable insights.

> get_sg_data <- function(df, l) {
+   sg_df <- data.frame()
+   for (i in 1:nrow(df)) {
+     row_content = NULL
+     sg_vec = NULL
+     my_name <- df[i,]
+     for (j in 1:length(l)) {
+       my_rank <-  which(l[[j]]$name == my_name)
+       if (!purrr::is_empty(my_rank)) {
+         row_content <- append(row_content, list(Rank = my_rank, 
+                                                 Stars = as.integer(l[[j]]$repo_stars[my_rank]), 
+                                                 Forks = as.integer(l[[j]]$repo_forks[my_rank])))
+       } else {
+         row_content <- append(row_content, list(Rank = NA, Stars = NA, Forks = NA))
+       }
+     }
+   sg_vec <- append(list(my_name), row_content)
+   sg_df <- data.frame(force_bind(sg_df, data.frame(sg_vec)))
+   }
+   
+   names(sg_df) <- c("Name", "Rank.Stars.Start", "Stars.Start", "Forks.Start",
+                             "Rank.Stars.2015", "Stars.2015", "Forks.2015",
+                             "Rank.Stars.2016", "Stars.2016", "Forks.2016",
+                             "Rank.Stars.2017", "Stars.2017", "Forks.2017",
+                             "Rank.Stars.2018", "Stars.2018", "Forks.2018",
+                             "Rank.Stars.2019", "Stars.2019", "Forks.2019",
+                             "Rank.Stars.End", "Stars.End", "Forks.End")
+   
+   return(sg_df)
+ }
> 
> # bit.ly/SO-rbind-colnames
> force_bind = function(df1, df2) {
+     colnames(df2) = colnames(df1)
+     dplyr::bind_rows(df1, df2)
+ }
> 
> 
> sg_data <- get_sg_data(sg_names, sg_data_collection)
> sg_data
          Name Rank.Stars.Start Stars.Start Forks.Start Rank.Stars.2015
1       Jekyll                1       15422         115               1
2    Octopress                2        7947         209               2
3      Pelican                3        3468         116               3
4    Middleman                4        3207          60               5
5       Docpad                5        2194         183              10
6         Hexo                6        2143          79               6
7   Metalsmith                7        2129           8               7
8         Harp                8        1982         103               9
9  Wintersmith                9        1713          47              12
10    Assemble               10        1502          29              11
11      Brunch               NA          NA          NA               4
12        Hugo               13        1241          65               8
13     GitBook               NA          NA          NA              NA
14      Gatsby               NA          NA          NA              NA
15        Nuxt               NA          NA          NA              NA
16     Next.js               NA          NA          NA              NA
17    VuePress               NA          NA          NA              NA
18  Docusaurus               NA          NA          NA              NA
   Stars.2015 Forks.2015 Rank.Stars.2016 Stars.2016 Forks.2016
1       17917         79               1      23038        132
2        8600        261               3       9285        274
3        4150        152               6       5343        165
4        3713         46               8       4866         85
5        2433        214              13       2737        199
6        3619        180               4       7972        187
7        2870         28               9       4280         33
8        2530        136              10       3535        183
9        2074         43              14       2642         33
10       2143         61              12       2778         60
11       3828         27               7       4893         41
12       2768        154               5       7837        238
13         NA         NA               2      10904        354
14         NA         NA              18       1989         36
15         NA         NA              NA         NA         NA
16         NA         NA              NA         NA         NA
17         NA         NA              NA         NA         NA
18         NA         NA              NA         NA         NA
   Rank.Stars.2017 Stars.2017 Forks.2017 Rank.Stars.2018 Stars.2018
1                1      28075        150               1      32993
2                5       9533        278              24       1616
3                7       6422        123               7       7620
4                9       5525         79              10       6117
5               17       2870        182              18       2950
6                4      13735        332               3      20452
7               10       5451         45               8       6527
8               11       4175        217              12       4594
9               15       3036         32              15       3295
10              14       3214         23              14       3484
11               8       5701         86               9       6342
12               2      14023        366               2      22954
13               3      13896        625               5      17385
14               6       6867         97               4      18226
15              NA         NA         NA               6       9481
16              NA         NA         NA              NA         NA
17              NA         NA         NA              NA         NA
18              NA         NA         NA              NA         NA
   Forks.2018 Rank.Stars.2019 Stars.2019 Forks.2019 Rank.Stars.End
1         130               1      36464       7993              2
2          54              28       1678        171             36
3         195              10       8462       1534             10
4         118              14       6384        687             15
5         188              22       2990        250             24
6         220               5      24795       3376              5
7          52              11       7090        621             13
8         238              16       4740        308             16
9          36              19       3407        344             20
10         22              18       3637        249             18
11        123              13       6546        455             14
12        208               3      31527       3638              3
13        842               6      19970       2785              7
14        450               4      29684       3898              4
15        265               7      17253       1410              6
16         NA               2      33154       3610              1
17         NA               8      10625       1346              8
18         NA               9      10128        785              9
   Stars.End Forks.End
1      38245      8336
2       1705       175
3       8929      1583
4       6498       696
5       2992       251
6      27387      3627
7       7281       640
8       4791       318
9       3479       346
10      3698       256
11      6589       462
12     36867      4118
13     21030      3031
14     36546      5517
15     21347      1831
16     39303      4811
17     13370      1948
18     12876      1142
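
The core of `get_sg_data()` is looking up the row position of a generator in each snapshot and recording `NA` when it is not listed yet. A minimal sketch of that lookup with `match()`, on toy data; `rank_of` and `snapshots` are hypothetical names, and the row position is assumed to equal the rank because each snapshot is sorted by stars.

```r
# Toy snapshots, each assumed to be sorted by repository stars (descending).
snapshots <- list(
  data.frame(name = c("Jekyll", "Octopress", "Pelican"), stringsAsFactors = FALSE),
  data.frame(name = c("Jekyll", "Hugo", "Octopress"), stringsAsFactors = FALSE)
)

# match() returns the row position of the generator, or NA if it is absent.
rank_of <- function(snapshot, generator) match(generator, snapshot$name)

# Rank of one generator across all snapshots.
ranks <- vapply(snapshots, rank_of, integer(1), generator = "Hugo")
ranks
```

Because `match()` yields `NA` by itself, no separate check for an empty result is needed.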

Facet plot of all 18 generators over time

> sg_temp <- select(sg_data, c("Name", starts_with("Stars")))
> order_names <- order(sg_temp$Name)
> sg_temp <-  sg_temp[order_names, ]
> 
> # SEE: bit.ly/SO-flip-row-col
> sg_stars <- data.frame(t(sg_temp[-1]))
> colnames(sg_stars) <- sg_temp[, 1]
> rownames(sg_stars) <- sg_quantity[, 1]
> sg_stars <- as_tibble(rownames_to_column(sg_stars, var = "Date"))
> sg_stars$Date <- as.Date(sg_stars$Date)
> sg_stars_long  <- melt(sg_stars, id.vars = "Date", 
+                  variable.name = "Staticgen", value.name = "Stars")
> 
> p <- ggplot(sg_stars_long, aes(x = Date, y = Stars)) + 
+   geom_line(aes(group = Staticgen)) +  
+   labs(x = "Date",
+      y = "Number of Repository Stars",
+      title = "Comparison of Static Website Generators",
+      subtitle = "Measured by number of repository stars") +
+   facet_wrap(~Staticgen, ncol = 3)
> p

One can see that Gatsby, Hexo, Hugo, and Jekyll have a long and ongoing growth curve. There are also two newcomers with very positive developments: Next.js and Nuxt.
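
The reshaping from one column per generator to one row per date and generator, which the chunk above does with `melt()`, can be sketched in base R. The star counts are taken from the table above; the exact dates are placeholders, since the precise archive days are not shown in this article, and the names `wide`, `long`, and `value_cols` are illustrative only.

```r
# Wide format: one row per date, one column per generator.
# Star counts come from the table above; the dates are placeholders.
wide <- data.frame(
  Date = as.Date(c("2018-01-01", "2019-01-01")),
  Hugo = c(22954L, 31527L),
  Jekyll = c(32993L, 36464L)
)

# Long format: one row per (Date, Staticgen) pair.
value_cols <- setdiff(names(wide), "Date")
long <- data.frame(
  Date = rep(wide$Date, times = length(value_cols)),
  Staticgen = rep(value_cols, each = nrow(wide)),
  Stars = unlist(wide[value_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
long
```

The long format is what `facet_wrap(~Staticgen)` needs, because ggplot2 maps a single column to each aesthetic.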

Bump Chart: Rank changes over time

With the facet plot, it is difficult to distinguish the relative position of these generators to each other. Instead of absolute values, it is better to compare ranking positions. This type of plot is called a bump chart. For the following code, I have relied heavily on explanations and code snippets from various websites.

ggplot2 theme for bump charts

For a better display, these sources suggest creating a custom theme for ggplot2.

> my_theme <- function() {
+ 
+   # Colors
+   color.background = "white"
+   color.text = "#22211d"
+ 
+   # Begin construction of chart
+   theme_bw(base_size=15) +
+ 
+     # Format background colors
+     theme(panel.background = element_rect(fill=color.background, color=color.background)) +
+     theme(plot.background  = element_rect(fill=color.background, color=color.background)) +
+     theme(panel.border     = element_rect(color=color.background)) +
+     theme(strip.background = element_rect(fill=color.background, color=color.background)) +
+ 
+     # Format the grid
+     theme(panel.grid.major.y = element_blank()) +
+     theme(panel.grid.minor.y = element_blank()) +
+     theme(axis.ticks       = element_blank()) +
+ 
+     # Format the legend
+     theme(legend.position = "none") +
+ 
+     # Format title and axis labels
+     theme(plot.title       = element_text(color=color.text, size=20, face = "bold")) +
+     theme(axis.title.x     = element_text(size=14, color="black", face = "bold")) +
+     theme(axis.title.y     = element_text(size=14, color="black", face = "bold", vjust=1.25)) +
+     theme(axis.text.x      = element_text(size=10, vjust=0.5, hjust=0.5, color = color.text)) +
+     theme(axis.text.y      = element_text(size=10, color = color.text)) +
+     theme(strip.text       = element_text(face = "bold")) +
+ 
+     # Plot margins
+     theme(plot.margin = unit(c(0.35, 0.2, 0.3, 0.35), "cm"))
+ }

Bump Chart for 18 Website Generators

> sg_temp <- select(sg_data, c("Name", starts_with("Rank.Stars")))
> order_names <- order(sg_temp$Name)
> sg_temp <-  sg_temp[order_names, ]
> 
> # SEE: bit.ly/SO-flip-row-col
> sg_star_rank <- data.frame(t(sg_temp[-1]))
> colnames(sg_star_rank) <- sg_temp[, 1]
> rownames(sg_star_rank) <- sg_quantity[, 1]
> sg_star_rank <- as_tibble(rownames_to_column(sg_star_rank, var = "Date"))
> sg_star_rank$Date <- as.Date(sg_star_rank$Date)
> sg_star_rank_long  <- melt(sg_star_rank, id.vars = "Date", 
+                  variable.name = "Staticgen", value.name = "Rank")
> Archive.Nr <- rep(1:7, 18)
> sg_star_rank_long <- data.frame(cbind(sg_star_rank_long, Archive.Nr))
> 
> 
> 
> # SEE: https://www.statology.org/how-to-easily-create-a-bump-chart-in-r-using-ggplot2/
> ggplot(sg_star_rank_long, aes(x = as.factor(Archive.Nr), y = Rank, group = Staticgen)) +
+   geom_line(aes(color = Staticgen, alpha = 1), size = 1) +
+   geom_point(aes(color = Staticgen, alpha = 1), size = 2) +
+   geom_point(color = "#FFFFFF", size = 1) +
+   scale_y_reverse(breaks = 1:nrow(sg_star_rank_long)) + 
+   scale_x_discrete(breaks = 1:7) +
+   theme(legend.position = 'none') +
+   geom_text(data = sg_star_rank_long %>% filter(Archive.Nr == "1"),
+             aes(label = Staticgen, x = 0.7) , hjust = .5,
+             fontface = "bold",  size = 3) +
+   geom_text(data = sg_star_rank_long %>% filter(Archive.Nr == "7"),
+             aes(label = Staticgen, x = 7.3) , hjust = 0.5,
+             fontface = "bold",  size = 3) +
+   labs(x = "1:Jun 2014, 7:Aug 2019, 2-6: Jan (2015-2019)",
+        y = "Rank",
+        title = "Comparison of Static Website Generators",
+        subtitle = "Ranked by number of repository stars") +
+   my_theme()

With this bump chart, one can see which generators are rising in popularity relative to the others. These relative developments are hidden in the facet plot by the overall positive trend of static website generators.
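
To illustrate why ranks reveal what absolute numbers hide, here is a minimal sketch using four star counts from the table above. All four generators gained stars between the last two dates, yet their order changed. The ranks are computed within this subset only, which happens to coincide with the overall ranks 1 to 4 in the table; the data frame name `stars` is illustrative.

```r
# Star counts for four generators at the last two dates (from the table above).
stars <- data.frame(
  Name = c("Jekyll", "Hugo", "Gatsby", "Next.js"),
  Stars.2019 = c(36464, 31527, 29684, 33154),
  Stars.End = c(38245, 36867, 36546, 39303),
  stringsAsFactors = FALSE
)

# rank() orders ascending, so negate the counts to give rank 1 to the most stars.
stars$Rank.2019 <- rank(-stars$Stars.2019, ties.method = "min")
stars$Rank.End <- rank(-stars$Stars.End, ties.method = "min")
stars
```

Every generator grew in absolute terms, but Next.js overtook Jekyll for first place, which is exactly the kind of movement a bump chart makes visible.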

Page created: 2019-08-01 | Last modified: 2019-08-02