Scroll Depth Tracking Analysis with Google Analytics R

[Image: scroll depth report generated in R, showing percent of page viewed per page]

71% of pageviews to my post on automated Google Analytics cost import scroll through at least 50% of the page, and 41% of pageviews reach the comments section. This is a tutorial on how to make the scroll depth tracking report above using the googleAnalyticsR package.

Scroll depth reporting gives you insight into how users are engaging with your content. How far down the page do visitors scroll? With the out-of-the-box Google Analytics implementation there is no way to know.

Scroll Depth Tracking Implementation

One plug-in that you can implement to measure how far a user scrolls down your page is Scroll Depth from parsnip.io. I recommend implementing the scroll depth plug-in via Google Tag Manager. Here is a recent post on how to implement scroll tracking via GTM. On my blog I track when 25%, 50%, 75%, and 100% of the page has been viewed. I also track when the user reaches the comments section of the page. If you need help with implementing the plug-in let me know. The rest of the post is about how to best report on and analyze the page scroll depth data.

Scroll Depth Reporting

I found a lot written on how to implement scroll depth tracking, but very little written about how to report on and analyze scroll depth data. The scroll data is tracked using Google Analytics events. When you use events in Google Analytics to measure multiple site actions, they can be challenging to report on. The Google Analytics event category, event action, and event label are report “dimensions” (not to be confused with Custom Dimensions), and total events and unique events are report “metrics” (not to be confused with Custom Metrics). For page scroll depth reporting, the values 25%, 50%, 75%, and 100% are captured in the event label “dimension” (the eventLabel field that the queries in this post use), and the count of occurrences of each percentage is captured in the total events “metric”.

But the question I really want to answer is: how often is each scroll depth (25%, 50%, 75%, 100%) reached on each page? I want page to be my report dimension and I want the percentage of pageviews reaching 25%, 50%, 75%, and 100% to be my report metrics.

Here is how the event reporting looks in the Google Analytics interface:

[Image: scroll depth events report in the Google Analytics interface]

Here is how the same scroll depth reporting looks after transforming the data in R:

[Image: scroll depth report generated in R, showing percent of page viewed per page]

Looking at the scroll depth report generated in R above, you can see that the data is sorted by pageviews for the date range. The top page had 2569 pageviews. 90% of those pageviews scrolled down 25% of the page. 27% of those pageviews continued scrolling and reached the comments (shown as percent_disqus in the report). The following section describes how to run this R script. No previous R experience is necessary. I’ll walk you through each step. Let me know if you have any questions.

Scroll Depth Report R Script Tutorial

1) Download and Install R.

2) Install RStudio.

3) Launch RStudio and install the R packages googleAnalyticsR and tidyr. In the Console pane in RStudio (the bottom left pane), run the code below.

install.packages("googleAnalyticsR")

install.packages("tidyr")

When the packages install you’ll see a message in the console that says “package successfully installed…”

4) Save or copy the script below to your computer and open it in RStudio.

5) Run the pageScrollDepth.R Script

If this is your first time running the googleAnalyticsR package you’ll need to authorize the script. Add your Google Analytics viewID on row 13 of the script. Make sure you are logged into your account with access to Google Analytics on your web browser. When you run the R script a Google webpage will open prompting you to choose your account.

[Image: googleAnalyticsR account authorization, choosing a Google account]

Then you’ll need to allow access to googleAuthR as shown below.

[Image: googleAnalyticsR account authorization, allowing access to googleAuthR]

Once the authentication is complete you should see a message saying “Authentication complete. Please close this page and return to R.”

[Image: googleAnalyticsR account authorization, authentication complete message]

6) Retrieve your Google Analytics viewID in R

#Use the Google Analytics Management API to see a list of Google Analytics accounts you have access to
my_accounts <- google_analytics_account_list()
View(my_accounts)

Rows 8 through 10 of the script retrieve a list of all the accounts that you have access to via the Google Analytics Management API. You are then shown a view of the resulting dataframe, called my_accounts. You can find the viewId in my_accounts for the Google Analytics view that you’ll use in the script without even opening the Google Analytics web UI!

7) googleAnalyticsR Scroll Depth Queries

#Page View Query
df1 <- google_analytics_4(my_id,
                          date_range = c("2016-10-01", "2016-12-07"),
                          metrics = c("pageviews"),
                          dimensions = c("pagePath"),
                          anti_sample = TRUE)
#Event Query
df <- google_analytics_4(my_id,
                         date_range = c("2016-10-01", "2016-12-07"),
                         metrics = c("totalEvents"),
                         dimensions = c("pagePath", "eventLabel"),
                         filtersExpression = c("ga:eventLabel=~%|#disqus"),
                         anti_sample = TRUE)

Rows 15 through 27 include the googleAnalyticsR queries. The first query on row 15 pulls pageviews for pages and the second query on row 21 pulls the scroll depth event data by page. In your query make sure to specify your date_range on rows 17 and 23. The filtersExpression on row 26 includes #disqus because I track when visitors reach the comment section, which is a div with id=disqus_thread. Remove or change this filtersExpression if you don’t use Disqus.
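To see what that filtersExpression keeps, here is a rough local illustration. The =~ operator is a regular expression match, so the pattern %|#disqus keeps any event label containing a percent sign or the text #disqus. The sketch below mimics that with base R's grepl() on a handful of labels. The label "newsletter_signup" is a made-up example of an unrelated event, and Google Analytics uses its own regex engine, so treat this as an approximation that happens to agree for a pattern this simple.

```r
# Event labels as they might appear in the eventLabel dimension.
# "newsletter_signup" is a hypothetical unrelated event, added for contrast.
labels <- c("25%", "50%", "75%", "100%", "#disqus_thread", "newsletter_signup")

# Same pattern as in the filtersExpression: a "%" anywhere, or "#disqus" anywhere
pattern <- "%|#disqus"

# grepl() does a partial (unanchored) match, like =~ in the API filter
kept <- labels[grepl(pattern, labels)]
print(kept)
```

If you track a different element than the Disqus comment div, swapping #disqus for that element's label in the pattern should be the only change needed.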

8) tidyr spread() Transforms Scroll Depth Data

#Use tidyr to make long data wide: move row data to columns
df2 <- spread(df, eventLabel, totalEvents, fill = 0)

The original Google Analytics R event query dataframe df is shown below.

[Image: original event query dataframe df in R]

The query data is transformed with the spread() function from the tidyr package.

Function:       spread(data, key, value, fill = NA, convert = FALSE)

Arguments:
        data: df            data frame
        key: eventLabel     column values to convert to multiple columns
        value: totalEvents  single column values to convert to multiple columns' values 
        fill: 0             If there isn't a value for every combination of the other variables and the key 
                            column, this value will be substituted

In the spread() function the data is df, which is the dataframe with the original event query data. The key is eventLabel, which holds the page scroll depth values that we want broken out into separate columns: 25%, 50%, 75%, 100%, and #disqus_thread. The value is totalEvents, which supplies the numbers that fill those new scroll depth columns. The fill = 0 argument fills in 0 wherever a page has no events for a given label, instead of NA.

The transformed dataframe df2 is shown below. This transformation takes the long Google Analytics event data and makes it wide. You can read more about long and wide data here.

[Image: transformed dataframe df2 after tidyr spread()]
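If you want to try the long-to-wide step without querying Google Analytics, here is a small sketch on made-up data. It assumes the tidyr package from step 3 is installed. The page paths and counts are invented for illustration. Newer tidyr releases mark spread() as superseded by pivot_wider(), but spread() is what the script in this post uses.

```r
library(tidyr)

# Toy long-format event data: one row per page and scroll depth label.
# All values here are made up.
events <- data.frame(
  pagePath    = c("/post-a", "/post-a", "/post-a", "/post-b", "/post-b"),
  eventLabel  = c("25%", "50%", "#disqus_thread", "25%", "50%"),
  totalEvents = c(90, 70, 20, 40, 10),
  stringsAsFactors = FALSE
)

# One row per page, one column per label; missing combinations become 0
wide <- spread(events, eventLabel, totalEvents, fill = 0)
print(wide)
```

Note that /post-b has no #disqus_thread row in the long data, so its value in the wide data comes from fill = 0.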

9) Merge Transformed Scroll Depth Data with Page Views Data

#merge dataframes with pageview metric with scroll depth metrics
df3 <-merge(df1,df2)
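One thing to be aware of with merge(): by default it joins on the columns the two dataframes share (here pagePath) and keeps only pages that appear in both. Pages that had pageviews but no scroll events drop out of the report. The sketch below uses made-up data to show the default behavior and, as an optional variation that is not part of the original script, all.x = TRUE to keep every page.

```r
# Toy stand-ins for df1 (pageviews) and df2 (wide scroll data); values are made up
pageviews_df <- data.frame(
  pagePath  = c("/post-a", "/post-b", "/post-c"),
  pageviews = c(100, 50, 10),
  stringsAsFactors = FALSE
)
scroll_df <- data.frame(
  pagePath = c("/post-a", "/post-b"),
  `25%`    = c(90, 40),
  `50%`    = c(70, 10),
  check.names = FALSE,        # keep "25%" and "50%" as column names
  stringsAsFactors = FALSE
)

# Default merge: only pages present in both dataframes survive
inner <- merge(pageviews_df, scroll_df)

# Variation: keep every page from the pageviews data, then turn the NAs into 0
left <- merge(pageviews_df, scroll_df, all.x = TRUE)
left[is.na(left)] <- 0

print(inner)
print(left)
```

Whether you want the zero-event pages in the report is a judgment call. Keeping them shows which pages nobody scrolled on, but it also adds rows for pages where the tracking may simply not be installed.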

10) Calculate the Percentage of People that Reach Each Scroll Depth for Each Page

#calculate percentage of people reaching each 25, 50, 75, 100, disqus comment on the page scroll depth
df3$percent25 <-round(df3$`25%`/df3$pageviews,digits = 2)
df3$percent50 <-round(df3$`50%`/df3$pageviews,digits = 2)
df3$percent75 <-round(df3$`75%`/df3$pageviews,digits = 2)
df3$percent_disqus <-round(df3$`#disqus_thread`/df3$pageviews,digits = 2)
df3$percent100 <-round(df3$`100%`/df3$pageviews,digits = 2)

Showing the percent of total pageviews that reach each page scroll depth makes it easy to compare across pages with different numbers of pageviews. For example, 100 pageviews reaching the 50% scroll depth of the homepage is pretty meaningless on its own. But 36% of the total pageviews to the homepage reaching the 50% scroll depth is a lot more meaningful. It also makes it easier to compare this percent across pages. For example, 84% of the total pageviews to the about page reach the 50% scroll depth, which is clearly greater than the homepage.

The code above creates new calculated columns for the percent of pageviews reaching each page depth and adds them to the dataframe df3. Each value is stored as a proportion rounded to 2 decimal places, so 0.71 means 71%.
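As a worked example of that calculation, here is a sketch on made-up numbers. It computes all the scroll depth columns in one step by dividing a set of columns by the pageviews column, rather than one line per column. That is an alternative to the script above rather than a replacement for it. The name toy and its values are invented for illustration.

```r
# Toy merged data standing in for df3 (values are made up)
toy <- data.frame(
  pagePath  = c("/post-a", "/post-b"),
  pageviews = c(100, 50),
  `25%`     = c(90, 40),
  `50%`     = c(71, 18),
  check.names = FALSE,        # keep "25%" and "50%" as column names
  stringsAsFactors = FALSE
)

# Divide every scroll depth column by pageviews, row by row, then round
scroll_cols <- c("25%", "50%")
shares <- round(toy[scroll_cols] / toy$pageviews, digits = 2)
names(shares) <- c("percent25", "percent50")

# Page, pageviews, and the share of pageviews reaching each depth
report <- cbind(toy[c("pagePath", "pageviews")], shares)
print(report)
```

For /post-b, 18 events over 50 pageviews gives 0.36, which reads as 36% of pageviews reaching the 50% mark.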

11) Cleanup the Page Scroll Depth Report

#remove the raw counts from dataframe for page scroll depth and only show percent reaching each page depth
df4<-df3[,c("pagePath","pageviews","percent25","percent50","percent75","percent_disqus","percent100")]

#sort dataframe by pageviews
df5<-df4[order(-df4$pageviews),]

The final step is to clean up the scroll depth report so that it shows only the page as the dimension, pageviews as one metric, and the percentage of pageviews reaching each scroll depth as the other metrics. This is captured in dataframe df4. Then you sort the data in descending order of pageviews, which is captured in dataframe df5.

12) Your Page Scroll Depth Report

Run View(df5) in the R console to see the final page scroll depth report below.

[Image: scroll depth report generated in R, showing percent of page viewed per page]

Use the Built In Google Analytics API v4 Pivot Functionality

The Google Analytics Reporting API v4 has built-in pivot table functionality. Rather than using the spread() function from the tidyr package in step 8, you could use the built-in pivot directly from the API. Below is a code example of how to use the pivot table functionality to pull the pivoted data directly from the Google Analytics API.
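What follows is a minimal sketch of such a pivot query, not the original script. It assumes the googleAnalyticsR package is installed and authorized, that its pivot_ga4() helper and the pivots argument of google_analytics_4() behave as described in the package documentation for your installed version, and that the filter from step 7 still restricts the pivot to scroll depth labels. The function name pull_pivoted_scroll_depth is made up for this example. Defining the function makes no API call until you run it with your own viewId.

```r
# Hypothetical wrapper; nothing is sent to Google Analytics until it is called.
pull_pivoted_scroll_depth <- function(view_id,
                                      date_range = c("2016-10-01", "2016-12-07")) {
  # Ask the API to pivot on eventLabel so each scroll depth label
  # comes back as its own column. maxGroupCount = 5 is an assumption that
  # covers 25%, 50%, 75%, 100% and #disqus_thread.
  scroll_pivot <- googleAnalyticsR::pivot_ga4("eventLabel",
                                              metrics = c("totalEvents"),
                                              maxGroupCount = 5)

  # Same event query as in step 7, but with pagePath as the only row dimension
  googleAnalyticsR::google_analytics_4(view_id,
                                       date_range = date_range,
                                       metrics = c("totalEvents"),
                                       dimensions = c("pagePath"),
                                       filtersExpression = "ga:eventLabel=~%|#disqus",
                                       pivots = list(scroll_pivot))
}

# Usage, once authorized (my_id is the viewId from step 6):
# df_pivot <- pull_pivoted_scroll_depth(my_id)
```

The column names returned by the pivot will differ from the ones spread() produces, so the percentage calculations in step 10 would need to be adjusted to match.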

If you have any questions or think of other great ways to analyze and report on Google Analytics scroll depth data let me know in the comments.