Introducing the DHCSS Stakeholder Map workflow (part 2)

Setting up the authorisations needed to read from Google Sheets

This is the third blog in a series about how to set up a data pipeline using R, GitHub Actions, Kumu, and Shiny that was developed for the SADiLaR Stakeholder Map project. Our first blog described the launch of the Stakeholder Map project, and the second blog provided an introduction to the data pipeline and the software used. Here, we look at how to set up the authorisations needed to read data from Google Sheets. For a recording in which the data pipeline is explained, see the presentation (https://youtu.be/dAx6CWbraA4) and the slides (https://doi.org/10.5281/zenodo.6473091).

The data pipeline (Fig 1) is divided into three stages, namely the input or source, data processing, and the output or destination. In our pipeline, the input consists of data sourced from a Google Sheet linked to a Google Form. The output consists of the data visualisations in Shiny and Kumu. For the purposes of the next three blogs, the data processing stage will be explained in three steps: 1) data import and manipulation, including the authorisations needed to read from and write to Google Sheets, 2) the data output files (a Google Sheet and an RData file) used to run the visualisations, and 3) GitHub Actions that are used to automate the process.

Fig 1: The data pipeline overview.

1) Data import and manipulation, and authorisations

The first step in the data processing is to read data from the Google Sheet. In R, the googlesheets4 package can be used to read from and write to Google Sheets. Authorisations need to be set up to enable this interaction; with these in place, R scripts can run locally, but they still require manual interaction (watch the recording linked above for more information). However, if you want your script to run non-interactively, i.e. without any manual authorisation needed to access the Google Sheets, you can set this up by following the process below (Jenny Bryan gives useful pointers in this issue about using a service account for non-interactive authorisation):

  1. Create a Google Cloud Platform account
  2. Create a service account
  3. Create a service account key and download the .json file
  4. Make the service account email address an editor to your Google Sheet
  5. Point gs4_auth() to the .json from step 3

This was first set up to work locally and then adapted for GitHub Actions.

Resources and pointers for each step
  1. Create a Google Cloud Platform account
  2. Create a service account (Figs 2 & 3)
Figs 2 & 3: Create a service account for your project on Google Cloud Platform.
  3. Create a service account key (Fig 4) and download the .json file (Fig 5)
  • See here for more information on service account keys
  • .json file: for working on the R project locally, the downloaded .json file was moved into the R project folder. For GitHub Actions, though, this file needs to be encrypted and kept hidden. Find more information on this below.
Fig 4: Create a service account key.
Fig 5: Download the key .json file.
  4. Make the service account email address an editor to your Google Sheet (Fig 6)
  • Open the .json file and copy the email address of the service account (this can also be found on your Google Cloud Platform service accounts web page)
  • In the sharing settings of the Google Sheet (for us, the one linked to the Google Form), add this email address and give it editing access. This enables the authorisations needed to access the Google Sheet using your service account.
Fig 6: Make the service account email address an editor to your Google Sheet in the sharing settings.
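To find the email address you need to share the sheet with, you can also read it straight from the key file in R. A minimal sketch, using the jsonlite package; the key below is a made-up example with only a few of the fields a real key file contains, and `client_email` is the field to copy:

```r
library(jsonlite)

# A service account key file looks roughly like this (values are
# illustrative); the field needed for sharing is "client_email"
key <- fromJSON('{
  "type": "service_account",
  "project_id": "my-project",
  "client_email": "stakeholder-map@my-project.iam.gserviceaccount.com"
}')

key$client_email
#> [1] "stakeholder-map@my-project.iam.gserviceaccount.com"
```

In practice you would pass the path to your downloaded .json file to `fromJSON()` instead of a literal string.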
  5. Point gs4_auth() to the .json file
  • gs4_auth() is the function in googlesheets4 that sets the authorisations
  • In your R script, you need to point gs4_auth() to the .json file, giving the path to where the file is located
  • e.g. direct path to the .json file in the R project directory:
    • gs4_auth(email = "[my email address]", path = "~/[path to file]/[filename].json")
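Putting steps 4 and 5 together, a script that authenticates with the service account key and reads the form responses can be sketched as follows. The file path and sheet URL are placeholders, not the project's actual values; this will only run once the sheet has been shared with the service account email:

```r
library(googlesheets4)

# Non-interactive auth: point googlesheets4 at the service account key
# (placeholder path; use the location of your own .json key file)
gs4_auth(path = "~/my-r-project/service-account-key.json")

# Read the form responses; the sheet must be shared with the
# service account email address (step 4 above)
responses <- read_sheet("https://docs.google.com/spreadsheets/d/<sheet-id>")
```

With `gs4_auth()` called this way, no browser window opens and no manual confirmation is needed, which is what makes the script suitable for automation later on.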

2) The data output files

After setting up the authorisations and importing data from the Google Sheet, the R scripts (which can be found in our GitHub repository) manipulate the data into the formats needed for the data visualisations. First, a script writes data to a Google Sheet in a format ready for Kumu. This spreadsheet is linked to Kumu and therefore updates the visualisation automatically. A second script saves dataframes for Shiny to an RData file. Another R script then creates and runs the Shiny app.
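The two output steps can be sketched as below. This assumes dataframes `kumu_data` and `shiny_data` have been prepared earlier in the script, and the sheet URL and file paths are placeholders rather than the project's actual values:

```r
library(googlesheets4)

# 1) Write the Kumu-ready table to the Google Sheet that Kumu reads from;
#    Kumu picks up changes to this sheet automatically
sheet_write(kumu_data,
            ss = "https://docs.google.com/spreadsheets/d/<kumu-sheet-id>",
            sheet = "kumu")

# 2) Save the dataframes the Shiny app needs to an RData file,
#    which the app script loads with load() at startup
save(shiny_data, file = "data/shiny_data.RData")
```

The same service account authorisation covers the write as well as the read, provided the destination sheet is also shared with the service account email.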

Where to next

The way in which the authorisations were set up above is all very well for running your script locally within your R project. For the purposes of this project, however, a workflow was needed that would run everything automatically, without any interaction, and GitHub Actions is a way to do this. The main challenge here was figuring out how to keep the .json file secret. The .json file contains the information needed to access accounts and should not be made public. It is therefore very important to encrypt this file and to make the password that decrypts it a secret. See the next blog in our series for how this was done, using the tokencoder package and GitHub Actions. The GitHub Actions automate the whole workflow without any interaction, from the input (a Google Sheet linked to a Google Form) to the output (Kumu and Shiny).
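To give a flavour of where this is heading, a GitHub Actions workflow that runs an R pipeline on a schedule looks roughly like the skeleton below. This is a hypothetical sketch, not the project's actual workflow: the file names, schedule, and secret name are all placeholders, and the handling of the encrypted .json key is covered in the next blog.

```yaml
# .github/workflows/update.yml (illustrative skeleton only)
name: update-stakeholder-map
on:
  schedule:
    - cron: '0 6 * * *'   # run daily at 06:00 UTC
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - name: Run the pipeline
        run: Rscript run_pipeline.R
        env:
          # decryption password stored as a GitHub Actions secret,
          # never committed to the repository
          TOKEN_PASSWORD: ${{ secrets.TOKEN_PASSWORD }}
```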

References

Below are useful links.

Service account tokens and non-interactive workflows
Non-interactive authorisations
GitHub Actions
GitHub Actions YAML files
gargle documentation
Secrets
GitHub Actions and Shiny
Dr Anne Treasure
Research Analyst & Trainer
