Using the Google Drive API for public folders

Nick Felker
6 min read · Feb 12, 2024


I grew up in a small town called Glassboro, and while I don’t live there now, I’d like to stay up-to-date on what’s going on. A good way of doing this is by reading the borough council minutes. These are all available on Google Drive.

I don’t necessarily want to go out of my way to remember to visit this once a month. Subscribing to an RSS feed would be a lot better. So I built a way to convert these minutes into an RSS feed.

To do this, I started with the Google Drive API. The API usually takes a bit of work to integrate since it’s meant for personal files and is set up through OAuth. However, since I only want public files, I can just create a single API key in the Google Cloud Console and use a subset of the Google Drive API to download my files.

I found this out with a bit of searching on StackOverflow and did some experimentation. I figured I’d write this blog post to share what I’ve learned and hopefully others will benefit too.

Public Drive APIs

The Council Minutes are at a public URL with a particular ID. I’m going to save this ID. Then I can make a GET request to:

https://www.googleapis.com/drive/v3/files?q='${glassboroCouncilMinutes}'+in+parents&key=${apiKey}

This returns a JSON response with a list of files:

{
kind: 'drive#fileList',
incompleteSearch: false,
files: [
{
kind: 'drive#file',
mimeType: 'application/pdf',
id: '1AKuAJcRhIz7ScbP5pMuMhCifebctS2T0',
name: '12-26-2023.pdf'
},
{
kind: 'drive#file',
mimeType: 'application/pdf',
id: '1g8W2Sf60tvYob0wqeSGkL6XmPIx1IBsX',
name: '11-28-2023.pdf'
},
...
]
}
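The list URL above puts the search query straight into the query string. A minimal sketch of building the same URL with the query percent-encoded follows; `buildListUrl` is a hypothetical helper name, not part of the Drive API, and the encoding step is an assumption about what makes the request more robust rather than something the request strictly needs for these folder IDs.

```typescript
// Hypothetical helper: builds the files.list URL with the query percent-encoded.
function buildListUrl(folderId: string, apiKey: string): string {
  // Same search expression as above: every file whose parent is this folder.
  const query = `'${folderId}' in parents`
  return 'https://www.googleapis.com/drive/v3/files'
    + `?q=${encodeURIComponent(query)}`
    + `&key=${encodeURIComponent(apiKey)}`
}
```

The spaces become `%20` instead of `+`, which should be equivalent inside a query string.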

It’s straightforward. And if I want to be able to download one of these files directly, I can call:

https://drive.google.com/uc?export=download&id=${fileId}

Using the fs and request libraries, I can download this file to a local directory:

const download = (url, dest) => {
  return new Promise((res, rej) => {
    const file = fs.createWriteStream(dest);
    const sendReq = request.get(url);

    // verify response code
    sendReq.on('response', (response) => {
      if (response.statusCode !== 200) {
        return rej('Response status was ' + response.statusCode);
      }

      sendReq.pipe(file);
    });

    // close() is async, so resolve only after it completes
    file.on('finish', () => {
      file.close(() => res(url));
    });

    // check for request errors
    sendReq.on('error', (err) => {
      // delete the (partial) file and then return the error
      fs.unlink(dest, () => rej(err.message));
    });

    file.on('error', (err) => { // handle file system errors
      // delete the (partial) file and then return the error
      fs.unlink(dest, () => rej(err.message));
    });
  })
};

Putting these concepts together, I can create a module which exposes several capabilities together:

import * as fetch from 'node-fetch'
import * as request from 'request'
import * as fs from 'fs'

let apiKey: string | undefined = undefined

export function setApiKey(key: string) {
  apiKey = key
}

interface DriveFileList {
  kind: 'drive#fileList'
  incompleteSearch: boolean
  files: {
    kind: 'drive#file'
    mimeType: string
    id: string
    name: string
  }[]
}

export async function listFiles(folderId: string): Promise<DriveFileList> {
  const res = await fetch.default(`https://www.googleapis.com/drive/v3/files?q='${folderId}'+in+parents&key=${apiKey}`)
  // The response body is untyped, so cast it to the shape we expect
  const data = (await res.json()) as DriveFileList
  return data
}

const download = (url: string, dest: string): Promise<string> => {
  return new Promise<string>((res, rej) => {
    const file = fs.createWriteStream(dest);
    const sendReq = request.get(url);

    sendReq.on('response', (response) => {
      if (response.statusCode !== 200) {
        return rej('Response status was ' + response.statusCode);
      }

      sendReq.pipe(file);
    });

    file.on('finish', () => {
      // Resolve only once the file handle has actually been closed
      file.close(() => res(url));
    });

    sendReq.on('error', (err) => {
      // Remove the partial file, then reject with the original request error
      fs.unlink(dest, () => rej(err.message));
    });

    file.on('error', (err) => {
      // Remove the partial file, then reject with the original write error
      fs.unlink(dest, () => rej(err.message));
    });
  })
};

export function downloadUrl(fileId: string) {
  const dlPrefix = 'https://drive.google.com/uc?export=download&id='
  return `${dlPrefix}${fileId}`
}

export async function downloadFile(fileId: string, filename: string) {
  const dlUrl = downloadUrl(fileId)
  await download(dlUrl, filename)
}
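One thing the module above does not handle is pagination. The sample response has no `nextPageToken`, but the Drive `files.list` endpoint returns results in pages (100 files by default) and includes a `nextPageToken` field when more remain. Below is a minimal sketch of following those tokens. The `fetchPage` callback is a hypothetical stand-in for one list request, passed in so the loop can be tried without network access; `DrivePage` and `listAllFiles` are names made up for this example.

```typescript
interface DriveFile { id: string; name: string; mimeType: string }
interface DrivePage { files: DriveFile[]; nextPageToken?: string }

// fetchPage is a hypothetical callback: it performs one files.list request and,
// when pageToken is defined, sends it as the `pageToken` query parameter.
async function listAllFiles(
  fetchPage: (pageToken?: string) => Promise<DrivePage>
): Promise<DriveFile[]> {
  const all: DriveFile[] = []
  let token: string | undefined = undefined
  do {
    const page: DrivePage = await fetchPage(token)
    all.push(...page.files)
    // An absent nextPageToken means this was the last page.
    token = page.nextPageToken
  } while (token)
  return all
}
```

In the module, `fetchPage` could wrap the same `fetch.default` call as `listFiles`, appending `&pageToken=...` whenever a token is supplied.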

Creating an RSS Feed

I also want to turn this into an RSS feed so I can load it into my reader app and find new documents right away without spending time searching for them.

To make this faster, I actually created a library called standard-feeds which includes some types and helper functions for RSS.

Then I created a new module for Glassboro where I can add all of the Google Drive folder IDs that I want. Keep in mind that, as far as I can tell, the Google Drive API does not support nesting: I cannot grab every file in every directory in one go, and I couldn’t find a way to get a folder back in the results at all. Since the minutes are stored in per-year folders, I will eventually need to return to the source to get the folder ID for the 2025 minutes.

(If this is not the case, please let me know as I’d definitely like to streamline a bit of work here.)
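One untested possibility: Drive represents a folder as a file whose `mimeType` is `application/vnd.google-apps.folder`. If a list request on a parent folder returns its subfolders that way, a recursive walk could discover the per-year folders automatically. The sketch below assumes exactly that and has not been verified against the Glassboro folders with an API key. `walkFolder` and the `list` callback are hypothetical names, with `list` standing in for `listFiles` reduced to its `files` array.

```typescript
const FOLDER_MIME = 'application/vnd.google-apps.folder'

interface DriveEntry { id: string; name: string; mimeType: string }

// `list` is a hypothetical callback returning the direct children of a folder.
async function walkFolder(
  folderId: string,
  list: (folderId: string) => Promise<DriveEntry[]>
): Promise<DriveEntry[]> {
  const found: DriveEntry[] = []
  for (const entry of await list(folderId)) {
    if (entry.mimeType === FOLDER_MIME) {
      // Assumption: subfolders show up as entries with the folder MIME type.
      found.push(...(await walkFolder(entry.id, list)))
    } else {
      found.push(entry)
    }
  }
  return found
}
```

If subfolders never appear in the response, this simply returns the files of the starting folder, so it behaves no worse than the flat listing.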

I can write a function which returns an RssArticle array:

import { RssArticle, RssFeed, toRss } from "@fleker/standard-feeds";
import { downloadUrl, listFiles } from "./gdrive";

export async function getMinutesFeed(): Promise<RssArticle[]> {
  const driveFolders = [
    // Council Minutes 2023
    '16BsP0MfJhPe1WEekmYiTNYpZgBycV-Yc',
    // Planning Board 2024
    '1nIHJPJNQakfWCHxOpeqb4BuWQE-4-Cnt'
  ]
  const articles: RssArticle[] = []
  for (const folder of driveFolders) {
    const filesInFolder = await listFiles(folder)
    for (const file of filesInFolder.files) {
      const article: RssArticle = {
        authors: ['Borough of Glassboro'],
        link: downloadUrl(file.id),
        title: file.name,
        content: 'Click link to view minutes',
        pubDate: new Date(file.name
          .replace('PB MINUTES', '')
          .replace('_001', '')
          .replace(/(\d)\./g, '$1-')
          .replace('-pdf', '')),
        guid: file.id,
      }
      articles.push(article)
    }
  }
  return articles
}

This process goes through each file of every folder and converts it into an RSS article. The result is an array of plain objects of that type.
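The publication date above comes from cleaning up the file name and handing it to `new Date`, which relies on the JavaScript engine accepting a `12-26-2023` style string. A minimal sketch of a more explicit alternative follows. `parseMinutesDate` is a hypothetical helper, and it assumes every file name contains a month, day, and four-digit year separated by dashes or dots.

```typescript
// Hypothetical helper: pulls the first M-D-YYYY or M.D.YYYY date out of a file name.
function parseMinutesDate(fileName: string): Date | undefined {
  const match = /(\d{1,2})[-.](\d{1,2})[-.](\d{4})/.exec(fileName)
  if (!match) return undefined
  const month = Number(match[1])
  const day = Number(match[2])
  const year = Number(match[3])
  // Months are zero-based in the Date constructor; the result is local midnight.
  const date = new Date(year, month - 1, day)
  // Reject values such as month 13 that the constructor would silently roll over.
  const valid = date.getMonth() === month - 1 && date.getDate() === day
  return valid ? date : undefined
}
```

Returning `undefined` for names without a recognizable date lets the caller decide whether to skip the file or fall back to another date, instead of publishing an invalid one.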

I can also use the Standard Feeds library to actually create an RSS feed with the toRss function.

const feed: RssFeed = {
  entries: articles,
  icon: 'https://images.squarespace-cdn.com/content/v1/577196f35016e1776170568d/1476126101553-O3MYV288IPSHSDHGB4CJ/glassboro-logo-green-name.png?format=1500w',
  lastBuildDate: new Date(),
  link: 'https://drive.google.com/drive/folders/0B-l-QWJCLVkhdW52OHpZWFhLbm8?resourcekey=0-mbjKln7-ZtBHmHijO_ZIPg',
  title: 'Glassboro Council Minutes',
}
console.log(toRss(feed))

When I run this, I get an output which looks great:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Glassboro Council Minutes</title>
<url>https://images.squarespace-cdn.com/content/v1/577196f35016e1776170568d/1476126101553-O3MYV288IPSHSDHGB4CJ/glassboro-logo-green-name.png?format=1500w</url>
<icon>https://images.squarespace-cdn.com/content/v1/577196f35016e1776170568d/1476126101553-O3MYV288IPSHSDHGB4CJ/glassboro-logo-green-name.png?format=1500w</icon>
<updated>2024-02-11T22:20:24.046Z</updated>
<id>https://drive.google.com/drive/folders/0B-l-QWJCLVkhdW52OHpZWFhLbm8?resourcekey=0-mbjKln7-ZtBHmHijO_ZIPg</id>
<link type="text/html" href="https://drive.google.com/drive/folders/0B-l-QWJCLVkhdW52OHpZWFhLbm8?resourcekey=0-mbjKln7-ZtBHmHijO_ZIPg" rel="alternate"/>
<itunes:block>yes</itunes:block>
<itunes:type>episodic</itunes:type>
<item>
<title>12-26-2023.pdf</title>
<description><![CDATA[Click link to view minutes]]></description>
<link>https://drive.google.com/uc?export=download&id=1AKuAJcRhIz7ScbP5pMuMhCifebctS2T0</link>
<guid isPermalink="true">https://drive.google.com/uc?export=download&id=1AKuAJcRhIz7ScbP5pMuMhCifebctS2T0?1AKuAJcRhIz7ScbP5pMuMhCifebctS2T0</guid>
<id>https://drive.google.com/uc?export=download&id=1AKuAJcRhIz7ScbP5pMuMhCifebctS2T0?1AKuAJcRhIz7ScbP5pMuMhCifebctS2T0</id>
<updated>2023-12-26T05:00:00.000Z</updated>
<published>2023-12-26T05:00:00.000Z</published>
<pubDate>Tue, 26 Dec 2023 05:00:00 GMT</pubDate>
<author>
<name>Borough of Glassboro</name>
</author>
</item>
<item>
<title>11-28-2023.pdf</title>
<description><![CDATA[Click link to view minutes]]></description>
<link>https://drive.google.com/uc?export=download&id=1g8W2Sf60tvYob0wqeSGkL6XmPIx1IBsX</link>
<guid isPermalink="true">https://drive.google.com/uc?export=download&id=1g8W2Sf60tvYob0wqeSGkL6XmPIx1IBsX?1g8W2Sf60tvYob0wqeSGkL6XmPIx1IBsX</guid>
<id>https://drive.google.com/uc?export=download&id=1g8W2Sf60tvYob0wqeSGkL6XmPIx1IBsX?1g8W2Sf60tvYob0wqeSGkL6XmPIx1IBsX</id>
<updated>2023-11-28T05:00:00.000Z</updated>
<published>2023-11-28T05:00:00.000Z</published>
<pubDate>Tue, 28 Nov 2023 05:00:00 GMT</pubDate>
<author>
<name>Borough of Glassboro</name>
</author>
</item>
...
</channel>
</rss>

Putting it all together

With my Google Drive module and my Glassboro feed, I can do a lot. But I’d definitely like to go beyond just reading these minutes. Any sort of advanced process will require going through a cloud service. I figured I’d use Google Cloud to make things a bit easier.

I can create a new main.ts with a script I can run at any point to get all of the files, download them from Drive, and upload them to Cloud Storage. To use Google Cloud Storage on my workstation, I did need to set up a service account.

import { RssFeed, toRss } from "@fleker/standard-feeds";
import { downloadFile, setApiKey } from "./gdrive";
import { getMinutesFeed } from "./glassboro";
import { Storage } from "@google-cloud/storage";
const key = require('../src/gloucester-gazette-efbc71e4fda6.json');

(async () => {
  // Step 1. Gather all of the documents
  setApiKey('PLACEHOLDER_API_KEY')
  const articles = await getMinutesFeed()
  console.info(` > found ${articles.length} articles`)
  // Step 2. Create local store
  for (const article of articles) {
    console.info(`...downloading ${article.link} as ${article.title}`)
    await downloadFile(article.guid, `${article.title}`)
  }
  console.info(` > downloaded ${articles.length} articles`)

  // Step 3. Move document store to Google Cloud
  const storage = new Storage({credentials: key})
  const bucket = storage.bucket('glassboro-minutes')
  for (const article of articles) {
    await bucket.upload(`${article.title}`)
  }
  console.info(` > uploaded ${articles.length} articles`)
})()
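The script uses each article title directly as the local file name. That works for names like `12-26-2023.pdf`, but a Drive file name can contain characters such as `/` that are not valid in a local path. A minimal sketch of a guard is shown here. `safeLocalName` is a hypothetical helper, and the list of replaced characters is an assumption about common file systems rather than a complete rule.

```typescript
// Hypothetical helper: turns a Drive file name into a name that is safe to use
// as a local path, falling back to the file ID when nothing usable is left.
function safeLocalName(title: string, fileId: string): string {
  const cleaned = title
    .replace(/[\\/:*?"<>|]/g, '_') // characters rejected by common file systems
    .replace(/^\.+/, '')           // drop leading dots so the name cannot start with '..'
    .trim()
  return cleaned.length > 0 ? cleaned : fileId
}
```

In the script, the same cleaned name would need to be passed to both `downloadFile` and `bucket.upload` so that the upload step finds the file the download step wrote.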

This script is designed for flexibility. I can swap out different feeds if I ever want to support other towns or even other hosting services.

So this is where I am right now. I had hoped that by using a Cloud Storage bucket I’d be able to shuttle these directly into Vertex AI for document summaries, but that hasn’t been working out.

I found a codelab and went through it. The analysis works great on the sample’s document set, but it has failed to return any information when pointed at the data store of town minutes.

I’m going to keep looking at what tools Google has available because their Document AI suite of products is broader than what I’ve tried. Since my files are all in Cloud Storage, it’s actually easy to try others and see whether they work.
