Google Search Console API R: Guide to get Started

google_search_console_api_r_amazing

This tutorial shows you how to setup a daily automated pull of Google Search Console data (formerly known as Google Webmaster Tools) using R. The example shows you how to save your Organic search data daily to get around Google’s current 90 day limit on historical data. Armed with this historical organic search data you can measure your content marketing and SEO efforts over months and years not just the previous 90 days. Take back your ‘not provided’ organic keyword data.

No previous knowledge of programming, APIs or R is necessary to complete this tutorial. Like my previous tutorial on getting started with the Google Analytics reporting API with Python (please check it out), I want to help you get started with a detailed example that you can use right now.

Google Search Console Data: What’s available via the API?

The data that we will use from the Google search console search analytics reporting includes: search query and organic landing page clicks, impressions, click-through rate, and ranking position. You can even see this search data for your images or videos indexed by Google. You can check out the full list of the data dimensions, metrics and breakdowns in Google Search Console’s documentation.

Google Search Console R Tutorial

1) Download and Install R.

2) Install R Studio. R Studio is an integrated development environment (IDE) that has a set of tools to make you more productive when working with R.

3) Launch R Studio and install the R packages we will use in this example: searchConsoleR & googleAuthR. In the Console module in R Studio (the bottom left pane) run the code below. Copy and paste or type each command shown below and press enter. This will install the packages via an online package repository called the Comprehensive R Archive Network or CRAN.

When your packages install successfully you’ll see a message in the console that says “package successfully installed…”

4) Download the Google Search Console R Analytics Script 

Save or copy the script below to your computer and open it in RStudio.

Google Search Console Service Account Authentication in R

5) Download and setup your client_secret.json. This tutorial uses a “server to server” authentication because we are going to be scheduling the R script to run daily via a batch file command. There is no web browser that is going to popup and let you authenticate and allow access to the R application. So you cannot use client side OAuth2 because we don’t have an “interactive environment” there is no web browser. You will be setting up a service account in the Google Developer Console that will be used in this tutorial.

Create a new Google Developers Console project with the Search Console API enabled. That link will automatically enable the search console API.

“Create a new project” should be showing in the drop down. Click continue to Create a new project.

Click on the “Go to credentials” button

On the Credentials screen in the first drop down “Which API are you using?” it should have “Google Search Console API” preselected. In the second drop down “Where will you be calling the API from?” choose “Other non-UI (e.g. cron job, daemon)”. For “What data will you be accessing?” choose the radio button for “Application data” and for “Are you using Google App Engine or Google Compute Engine?” choose the “No” radio button. Then click on the button “What credentials do I need?”. See the screen shot below for the selections:

google_developer_console_service_account_search_console_api_r

Type in a name for the “Name” field I named my service account “googlesearchconsole”. Make sure to copy and paste your service account id somewhere. This is your service account email address that you’ll need in step 6 below to allow access to your search console data for that user. Choose the “JSON” radio button for the “Key type” and click the “Continue” button.

google_developer_console_service_account_search_console_api_credentials_r

When you click continue your JSON credentials will be downloaded. A message will popup that says “Service account created The service account “googlesearchconsole” was given editor permission for the project. The account’s public/private key pair My Project-67237f1e6539.json has been saved on your computer. This is the only copy of the key, so store it securely.”

Make sure that you listen to the warning, you can’t go back into the Developer Console and download another copy of the JSON file so save it somewhere safe.

6) Give full permission for the service account email you just created in the admin of your search console account.

https://www.google.com/webmasters/tools/user-admin?hl=en&siteUrl=http://www.ryanpraski.com/

Replace http://www.ryanprask.com/ with your site URL for a direct link to the “Users and Property Owners” for your site Search Console permissions. Alternatively you can go to the Google Search Console home and click on your website (property). Then click on the gear icon in the top right > users and property owners. See the screen shot below.

google_search_console_service_account_add_new_user

Next click on the Add A New User button and paste in your service account email address and choose Full permission from the drop down.

Run the Google Search Console API Script with R

7) Open the google_search_console_daily.R script in RStudio that you downloaded in step 4 . The script will open in the top left source pane in RStudio. Make the edits below to configure the R script to pull your Google Search Console data.

• Line 14 add the path to your secret.json file for your service account that you downloaded and saved in step 5.
• Line 22 add the URL for your website as it appears in the search console (webmaster tools). Make sure to include http://.
• Line 45 add the file path you’d like to save the .csv Search Console data file that this R Script will create.
• Remember to save your R script after you make the updates.

8) Run your google_search_console_daily.R script. Click on Source at the top right of the top left pane or press Control + Shift + Enter. Alternatively highlight all the lines (select all) of the R Script and click Run at the top right next to the Source button. The screen shot below shows the script run successfully on my computer. As the script runs you’ll see the comments from the code printed in the Console pane at the bottom in black font. In red font you’ll see the URL query parameters used in request that is sent to the Search Console API end point. If everything has worked properly you will see ## Complete on the last line in the console. Your .csv file has been written to the folder you specified on Line 45 of the script.   

google_search_console_r_studio_script_run

9) Open the .csv search console data file in Excel. See the screen shot below for my search console query data. It looks like not a lot of people are searching for analytics related content on Google on a Saturday. 

google_search_console_r_excel_data

Automate a daily pull of the Google Search Console data

10) The original goal was to save a daily extract of the historical Google Search Console data so we could get around the 90 day Google limit on historical data. 

Create a .bat file file in your text editor. This is an automated way to run our R Script via a DOS Command. See the sample file below. In the first part of your file you’ll have to replace my path with your path to your R installation. In second part of the file you will need to replace my path with the path to your google_search_console_daily.R script. Save this .bat file in the same directory as your google_search_console_daily.R Script.

Create a task to run the R Google Search Console Task Daily

On Windows your Task Scheduler should be in your Control Panels > Administrative tools. Here is the path to the program on my computer.

In right navigation under Actions click on Create Basic Task. This will launch an easy to follow Wizard that will guide you through the necessary steps to create your task.

Step 1 Create a Basic Task- Name your task Google Search Console Daily R Script then click next.

google_search_console_r_automate_create_basic_task_1

Step 2 Trigger- Choose Daily for When you want the task to start then click next.

Step 3 Daily- Choose a start time a few minutes in the future so we can test to make sure our R script runs via this scheduled task. Then click next.

Step 4 Action- Choose start a program. Then click next.

Step 5 Start a Program- Click on the Browse button next to the Program/script field and select your .bat file. Recall my file was located here on my computer: C:\Users\praskry\Desktop\google_search_console_daily_R\search_console_pull_task.bat Then click next.

Step 6 Summary- This gives you a summary of the task including the trigger and action that were selected in the wizard. Then click finish.

google_search_console_r_automate_task_summary

After your task automatically runs, a new file called google_search_console_daily.Rout will be created in the same directory as your google_search_console_daily.R Script. You can open the “out” file via a text editor and it shows the details of the R Script that was run via our scheduled task and .bat file. If you R Script ran successfully your “out” file will show everything logged in the console shown in the screen shot in step 8 as well as time metrics showing how long the script took to run. If the script does not run successfully the “out” file will log the errors that occur. On the first run I would guess most errors for people will be around the file paths in the .bat file. If you have any questions getting this set up please let me know in the comments.

Automation Complete! Now everyday at 3:03pm my R Script will pull daily Google Search query data from the Google Search Console API automatically.

Combine your Daily Search Console CSV Files

Fast forward a few weeks. Our automated daily Google Search Console R API pulls are running smoothly. Now we want to put all the individual daily .csv files together for a full view of our search query data over time. To join the .csv files together we are going to use the R data.table package. Open RStudio. In the R console we are going to run the code shown below. For your file replace my path to list.files with your path to where the daily search console .csv files are saved on your computer. The pattern (file name filter) assumes that you didn’t change the names of your search console .csv files and all have “searchconsoledata” in the .csv file name. When you run this code you should see a dataframe  in R studio called “combodata” with all search query .csv files combined for all the days the automated R Script has run.

Extending the Tutorial

If you have any questions please leave a comment below or contact me directly. After you setup your daily query data pull by following the steps in the tutorial there are a few options for extending the tutorial even further. Below I’ll show you how to update the R scripts to pull even more search console data.

Pull the Previous 90 Days of Google Search Console Data

I recommend you also pull historical data for the previous 90 days. This way you have all available historic data saved. To do this simply change the start and end dates in the R Script on lines 18 and 19 as shown below:

Batching Requests to pull more than 5000 Rows

On line 40 change rowLimit to a number greater than 5000 for example rowLimit = 100000,  and add walk_data = c("byDate")) . This allows the script to batch the search console API requests by day to get the full data set when pulling the last 90 days of search data.

 

Pull Landing Page Google Search Console Data

The content on this site is for a very small niche audience, so I have a long tail of organic search terms. Google does not report on many long tail search queries in the search console reporting. The quote below is from the Google documentation.

“To protect user privacy, Search Analytics doesn’t show all data. For example, we might not track some queries that are made a very small number of times or those that contain personal or sensitive information.”

For my site this means many of the queries do not show in the search console. For my daily search console data pulls I also pull organic landing page data by changing the download_dimensions parameters in the R Script on line 25 as shown below:

For my daily search console data pulls I also pull data for page broken down by query. See the changes to R Script on line 25 below:

Pull Image or Video Google Search Console Data

Why would you want organic search data for images? Say you are a ski resort or a national park. Many of your visitors are searching for trail maps. You want to see historical organic search data to make sure trail map images from your site are ranking high and getting clicks. If images or videos indexed by Google drive organic traffic then you can get data on these content types by changing the type parameter on line 28 from ‘web’ to ‘image’ or ‘video’. See the example for ‘image’ below:

 

If you have any questions or think of a great way to analyze or visualize the Google Search Console data you’ve pulled from the API with R let me know in the comments. Thank you to Mark Edmondson for creating the searchConsoleR & googleAuthR packages. My tutorial was inspired by and is based on my attempt to complete Mark’s tutorial post on how to automatically pull daily Google search console data.

  • ryanpraski@gmail.com

    test

  • Hana Omori

    Hi Ryan,

    I’ve looked into both this and Mark’s tutorial but I’m still having a few issues. Using your method of authentication outside of scr_auth application with Google Search R resulted in some sort of improper file input, does the JSON file have to be in the working directory for it to pull correctly?

    Best,

    Hana

    • Hi Hana,
      Can you double check that the path to the JSON file in your R Script is correct. If you still have issues can you share a screen shot of the error?

      -Ryan

  • Marc

    Hi Ryan, I am eventually getting the error

    “Error: could not find function “rbindlist”” when trying to combine the data. Has something changed in the libraries or what might be the issue?

    • Hi Marc,
      You need to install then include the data.table package to use the rbindlist() function.

      -Ryan

  • Addie Olson

    Hi Ryan,
    is there a way to get the data from an email with “Restricted” access to the site like with a different scope, or does the user have to have full access?

    • Hi Addie,
      Yes the script should work with both “Full” and “Restricted” Google Search Console Users. Let me know if there is a specific error I can help you with.

      -Ryan

      • Addie Olson

        This is the error I’m getting:
        “Request Status Code: 403
        Error in checkGoogleAPIError(req) :
        JSON fetch error: User does not have sufficient permission for site “

        • Hi Addie,
          Did you complete Step 5 & 6 from the post? The service account you create needs to be added as a user and given access to the account.
          5) Download and setup your client_secret.json.
          6) Give permission for the service account email you just created in the admin of your search console account.

  • Bernardo

    I’ve been using this script and is a REAL time saver, I use it to extract my main queries on a daily basis the only downside is that google privacy policy doesn’t allow you to have all the kws.

    Keep the good work Ryan!

  • anthony dazhan

    hello, the script is awesome; however, I am encountering an issue I was hoping to get help on. When setup a batch for more than 10k rows to pull dimensions: ‘query’, ‘page’, ‘country’, ‘date’ I keep getting an error “Error in rbind(deparse.level, …) :
    numbers of columns of arguments do not match”.
    When I do 10k or less, it’ll work. Also, I use seo for excel too and I know that the data can be parsed that way because I do so with the GSC connector
    thanks!

    • Hi Anthony,

      Thanks for the question. It looks like there is a data anomaly in the Google Search Console data for September 2-6 2016. See Google’s response here: https://support.google.com/webmasters/answer/6211453#search_analytics

      I’d suggest trying again in a few days. If your issue persists please let me know.

      Best,
      Ryan

    • Hi Anthony,

      I tried running the script again this morning and it is working again on my end so Google must have resolved their issue. Is your query setup to walk_data byDate like the sample code below:

  • Haidar

    Great explanation, thank you !

  • Tope Falade

    Thanks for the tutorial.

    Everything works perfectly fine up until trying to automate the download, do you have any tips on where I could be going wrong. I’m trying to save the output to a network location, could this be the issue? The task scheduler output doesnt record any problems whatso ever !

    • Hi Tope,

      It could be an issue with the output on a network drive instead of the local drive. Try saving the output to a local drive instead. If you want to troubleshoot further email me ryan@ryanpraski.com

      Best,
      Ryan

      • Tope Falade

        Thanks Ryan, I’ve just given that a try too with no success. I’ll send you an email in the morning.

      • Tope Falade

        Hi Ryan,

        I found an easy work around for my task scheduler issue. There is an R package that allows you to Schedule R scripts on Windows: https://github.com/bnosac/taskscheduleR

        • Hi Tope,

          Glad to hear you got it working. taskscheduleR is a great package. It would be great if you could share your script. I’m sure other people would also value your solution.

          • Tope Falade

            Hey Ryan, sorry it wasn’t a script it was a add on. Link above.

  • Ashwin

    Great writeup Ryan. As a developer, this looks like something that can be automated. Do you think, a web tool that downloads the data everyday would be helpful ?

  • Artur

    Hi Ryan

    I’m getting an error

    Warning message:
    In data_parse_function(req$content, …) : No data found

    have you ever had the same problem?

    • Hi Artur,
      I would suggest that you double check in the Google Search Console (fka webmaster tools) that the URL matches what you have in your R Script. I would also check that for the date range that you are pulling in R there is data in the Google Search console web interface.

      -Ryan

  • But if i want download only positions, not ctr and impressions. How i can do that?

  • Alex

    Hi Ryan, this is great thank you. I’ve been having issues with authentication trying to get this to work across multiple domains. I’ve tried adding and verifying and additional domain the Google Cloud project credentials and generating a new key but keep getting a “JSON fetch error: User does not have sufficient permission for site”. Do I actually need to create a new Google Cloud project for every domain?

    • Hi Alex,
      You should only have to create one project and one JSON key. You will have to repeat step 6 for each domain and give the service account access in the admin of the search console for each domain.
      -Ryan

      • Alex

        Thanks that was a quick response. That was what I was doing but I realised I made a stupid mistake and was calling the non secure version of the domain!

  • Nemo

    Do you have experience using list_crawl_error_samples? I keep running into the following error: Error in if (inherits(X[[j]], “data.frame”) && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) :
    missing value where TRUE/FALSE needed
    any help would be greatly appreciated!

  • Pedro

    Hi Ryan,

    I’m trying to import search console data into a database but i’m having some problems.
    I’m using basic coding:

    library(RODBC)

    specificDB= odbcConnect(dsn =”DB”)
    sqlSave(specificDB, data, tablename=’SEO’)

    This is creating my “seo” table but is not making the insert of the data. Do you have any idea how to overcome this?

    • Hi Pedro,

      Are you getting an error message when you try to save the Google Search Console data into a database?

      -Ryan