How to extract Core Web Vitals & Lighthouse Insights in Bulk

Sitebee

Technical SEO
Staff member
This tutorial will show you how to extract the Core Web Vitals scores and Lighthouse insights in a convenient way that will save you time on collecting the data to help inform your technical SEO auditing and analysis.

I started to use this method back in Summer 2020 and have saved countless hours across reporting and analysis. I will share with you the methodology that I use to extract Core Web Vitals scores and Lighthouse insights across sites, key URLs, and competitor URLs.

The data extracted can be used for:
  • Site migrations
  • Tech SEO audits
  • Competitor reports
  • Pitching new clients
Later in this guide, I will touch on usages for the extracted data and how you can present to stakeholders and development teams.

Prerequisites & dependencies:
Before we start running the simple CLI commands there are a couple of prerequisites that we will be requiring to run the reports in bulk. Please ensure that you have downloaded and configured Node JS and the Lighthouse Batch Report applications. Basic knowledge of Windows PowerShell is advantageous, but not essential.

In this tutorial, I will be using Windows 10, if you are running an Apple OS or Linux I’ll assume you follow steps taken and replicate in your OS environment.

Node JS
Node JS is a browserless JavaScript runtime engine that is required to execute the Lighthouse Batch Reports. If you haven’t got Node JS installed on your computer, please download it and install it.

Download URL - https://nodejs.org/en/download/


Lighthouse Batch Reporter
We now need to install the Lighthouse batch. Following the installation of Node JS open Windows PowerShell and run the following command.

Code:
npm install lighthouse-batch -g

The image below is not quite an installation, more of an update. The command used line is the same. When you have installed Lighthouse batch close Windows PowerShell.

lighthouse-batch-ps-01.PNG


For other installation methods, there is detailed information on the NPM repository page - https://www.npmjs.com/package/lighthouse-batch

Run your first Lighthouse CLI Report
Now that we have libraries and packages required to start running the reports. Let us perform a test to double-check we have everything required to run the batch reports.

lighthouse-batch-ps-02.PNG


Open a fresh instance of Windows PowerShell. For this test, I will be running Lighthouse CLI on two on The Guardian UK’s homepage. Run this command in PowerShell. Copy/paste it rather than freehand type it.

Code:
lighthouse-batch -s https://www.theguardian.com/uk

Once the script has completed the finished Lighthouse report for that single page should be in - C:\Users\chris\report\lighthouse [replace \chris\ with your ID]. In the interest of this tutorial, let us call that folder the ‘Lighthouse Batch Reports’ In that folder there will be two files.

  • * summary.json
  • * www_theguardian_com_uk.report.json
Image - the summary.json file has topline stats for the URL.
lighthouse-batch-ps-04.PNG


The ‘www_theguardian_com_uk.report.json’ file has all the good stuff. Open this URL in your web browser - https://googlechrome.github.io/lighthouse/viewer/ and drag and drop the json file across.

lighthouse-batch-ps-05.PNG

That should open the full lighthouse report into the browser window. If the data appears as you would expect, then great, we have everything we need to run the bulk reports. This part was a test and you auto-generate the reports export folder.


lighthouse-batch-ps-06.PNG



Run your first Lighthouse Batch CLI Report
Now that we have the installation, configuration, and test report out of the way, we can move on to running a batch report.

Head back into the Lighthouse Batch Reports’ folder. Delete both JSON files, we do not need them any longer. Create a new text file, name it – sites.txt

Open that file in your preferred text editor. Paste in a list of URLs that you are seeking to extract the Lighthouse and Core Web Vitals data from. In this example, I shall be sticking with The Guardian website. I will be running the batch report on 20 URLs.

lighthouse-batch-ps-07.PNG


The reason I have opted for 20 URLs in this batch report is purely in the interest of saving time. It can take around 15-20 seconds to run the Lighthouse report checks and metrics per URL. If you are going to be running 100’s or 1000’s of URLs be mindful of the time needed to run the batch analysis.

In a fresh instance of PowerShell run the following command –
Code:
lighthouse-batch -f C:\Users\chris\report\lighthouse\sites.txt

It took around 7 minutes to run the Lighthouse batch reporter on those 20 URLs with an average file size of 1mb each JSON file. Again, mindful on storage if you plan you use guide on 1000’s of URLs.

lighthouse-batch-ps-08.PNG


For each of those JSON files (except summary.json) you can open the files with https://googlechrome.github.io/lighthouse/viewer/ to get the full GUI version of the report, per URL.

Part 2 – Import in Web Core Vitals & Lighthouse Scores into Google Sheets

In part 1, I explained how to extract the data using Node JS and Lighthouse Batch. In this post, I shall show you how to import the batch reports into Google Sheets.

lighthouse-batch-p2-01.PNG



Prerequisites
1, Access to upload data to a domain (FTP, etc)
2, Microsoft Excel (or alternate)
3, Google Account with access to Google Sheets
4, Winzip (or alternate)


Upload the JSON files to a subdomain or subfolder
Assuming you still have the list of JSON files in the C:\Users\chris\report\lighthouse\ - we need to upload them to easily accessible webspace. I recommend hosting the files on a domain (not file hosting). It is important to host the JSON files on a domain as we need to call elements of those JSON files in Google Sheets.

Add the JSON files to a zipped folder and upload them to your webspace. In this example, I created a sub-domain, although, uploading the zipped folder to a subdomain on your website is also fine too.

lighthouse-batch-ps-09.PNG


Once you have uploaded you have uploaded the zipped folder unzip it. You should have a list of the Lighthouse batch JSON files. When you have done that, test one of the JSON file locations in your web browser. If you can view the JSON file in your web browser, we are good to move on the next step.

Concatenate the file name with the website location
Time to grab the list of filenames in the C:\Users\chris\report\lighthouse\ folder and concatenate the filenames with the web address. This is a timesaver tip – saves you time manually pasting URLs into the Google Sheet.

For this, I will be using Microsoft Excel, if you do not have access to Excel search for “How To Get A List Of File Names From A Folder”. Create a new excel sheet – head to Data > Get Data > From File > From Folder.

In the folder location, paste: (remember to update with your folder location)

C:\Users\chris\report\lighthouse\

When the folder contents have loaded in the Excel sheet, delete the unwanted columns. We only need the JSON filename. Next up we are going to concatenate the filenames with the web address.

lighthouse-batch-ps-11.PNG



Copy/Paste the URLs into the Core Web Vitals & Lighthouse Scores into Google Sheet

Head to this Google Sheet and make a copy to your Google Sheets account.

https://docs.google.com/spreadsheets/d/1B_g8fsuejVKtCaYVjhP8U3snVnAnKRLugYUo8Z8bnho/edit?usp=sharing

Copy and paste the list of URLs into column B on both tabs. If the URL paths are correct you should start to see the columns populate with data from the Lighthouse JSON reports.


The tabs

1, Core Web Vitals
In columns D, E & F - I have populated the formulas to extract the Core Web Vitals data directly from the Lighthouse JSON batch reports using the ImportJSON Google Sheets script.
lighthouse-batch-ps-12.PNG


2, Lighthouse Scores
In columns D, E, F, G, H & I - I have populated the formulas to extract the Lighthouse Scores data directly from the Lighthouse JSON batch reports using the ImportJSON Google Sheets script.

lighthouse-batch-ps-13.PNG


In both tabs, I have populated 20 rows of data. If you have additional URLs, drag down the formulas for each column to extract the data.

And there you have it. That is how you can extract Core Web Vitals & Lighthouse Insights in Bulk and view the data in Google Sheets.

Feel free to comment below.
 
Last edited: