Catching up on news through Feedly’s API and ePub
For many years I’ve been an avid user of Feedly to manage and read the news. As of the time of writing, I follow 420 feeds including newsletters. This results in hundreds of articles reaching my account every day. Despite my attempts, it is difficult to keep up.
More recently, I got back from a trip where I was relaxing a lot and not checking my phone constantly. You should try it sometime! Unfortunately, this meant my Feedly inbox reached just about 2000 articles. It took a while to pare down that number to something more reasonable.
I decided I couldn’t catch up using my usual habit of reading the news over coffee each morning. I needed something more elaborate.
I already had experience with the Feedly API, so I spent a little time throwing together a small script that has helped me not just catch up on the news but stay caught up. Here's how I did it.
I use the ReadEra Premium app to listen to books in the epub format. It works great: I load digital books, turn on the text-to-speech audio, and go on the subway. I can also highlight and add notes like a normal reader.
So I decided to put these together. If I were to turn my Feedly articles into an epub book, I would be able to listen to them at work in the background. Beyond just listening to the articles, I would be able to highlight key parts just like using Feedly directly.
Building the script
First, I defined all the basic scaffolding to make Feedly API calls:
import { JSDOM } from 'jsdom'

const userId = '...'
const accessToken = '...'

const apiCall = async (path: string, method = 'GET', data?: any) => {
  const options: any = {
    method,
    headers: {
      Authorization: `Bearer ${accessToken}`,
      Accept: 'application/json',
    },
  }
  if (data) {
    // fetch expects a string body, so serialize JSON for write calls
    options.headers['Content-Type'] = 'application/json'
    options.body = JSON.stringify(data)
  }
  const res = await fetch(`https://cloud.feedly.com/v3/${path}`, options)
  return res.json()
}

// Format a date as YYYY-MM-DD, matching my journal naming convention
function dateToJournal(date: Date) {
  return `${date.getFullYear()}-${(date.getMonth() + 1).toString().padStart(2, '0')}-${date.getDate().toString().padStart(2, '0')}`
}

// Strip characters that would break the frontmatter blocks generated below
const sanitizeFrontmatter = (value: string) => value.replace(/[:\n]/g, ' ').trim()
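To sanity-check the scaffolding, I can hit Feedly's profile endpoint first. This is a hypothetical smoke test, not part of the final script:

// Hypothetical smoke test: verify the token works before pulling articles
const profile = await apiCall('profile')
console.log('Authenticated as', profile.email)

// dateToJournal formats dates as YYYY-MM-DD (JS months are 0-indexed)
console.log(dateToJournal(new Date(2024, 4, 1))) // 2024-05-01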
Next, I make a simple, repeated request to Feedly to grab every unread article and put it into an array. Not every article includes its content; some feeds, like the New York Times, provide little more than a headline. There's no point in including those in my output, so I filter them out.
const includeImages = false

const articles: any[] = []
let continuation: string | undefined = undefined
while (true) {
  const query = continuation ? `&continuation=${continuation}` : ''
  const res = await apiCall(`streams/contents?streamId=user/${userId}/category/global.all&unreadOnly=true&count=250${query}`)
  const items = res.items
  if (!items) {
    console.error('err', res)
    process.exit(1)
  }
  continuation = res.continuation
  articles.push(...items)
  // Fewer than a full page means we've reached the end of the stream
  if (items.length < 250) break
}
console.log(articles.length, 'items')

const articlesToExport = articles.filter(x => x.content !== undefined)
console.log(articlesToExport.length, 'filter-items')
Note how I set includeImages to false. This is intentional, and I'll use it later on. When I ran this the first few times, with over a thousand entries, the file size was approaching 100 MB. That's way too much, especially for the rest of my pipeline. I don't need the pictures anyway; if an article has extensive photography, I can look at it in Feedly later.
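For reference, here's the partial shape of a stream entry that the rest of the script relies on. This is a sketch covering only the fields used below; the real API returns many more:

// Partial shape of a Feedly stream entry, limited to the fields used here
interface FeedlyEntry {
  id: string
  title: string
  author?: string
  published?: number // epoch milliseconds
  crawled: number // epoch milliseconds
  canonicalUrl?: string
  origin?: { title: string }
  content?: { content: string } // HTML body, when the feed provides it
}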
For now, I want to transform my list of articles into HTML so that I can display it on a page.
const contents = articlesToExport.map(x => {
  // Frontmatter block, wrapped in <pre> so it survives the epub conversion
  let data = `<pre>---${x.canonicalUrl ? `
url: ${x.canonicalUrl}` : ''}
feedlyUrl: https://feedly.com/i/entry/${x.id}
title: ${x.title}
pubDate: ${dateToJournal(new Date(x.published ?? x.crawled))}
author: ${sanitizeFrontmatter(x.author ?? x.origin?.title ?? '')}${x.origin?.title ? `
publisher: ${sanitizeFrontmatter(x.origin.title)}` : ''}
---</pre>
`
  if (x.content) {
    data += `
<strong>By ${x.author} in ${x.origin.title}</strong>
<br>
<em>Published ${dateToJournal(new Date(x.published ?? x.crawled))}</em>
<br>
<div>
${x.content.content}
</div>
`
  }
  const doc = new JSDOM(data).window.document
  if (!includeImages) {
    // Strip every image to keep the epub small
    doc.querySelectorAll('img').forEach(e => e.remove())
  }
  data = doc.body.innerHTML
  return {
    title: x.title,
    author: x.author,
    data,
  }
})
There are a few things happening here. At the start, I define frontmatter that describes the article; it will come in handy later on. After the frontmatter, I add the article body, embedded in some HTML. I use the JSDOM library to delete all the images, which matters for file size. It's also a somewhat overkill fix for a common library bug. Finally, I return the data as an HTML string, along with the article's title and author.
I found an old-school library, epub-gen, that converts HTML into an epub file. It was fairly easy to use. Since each article is its own item in the contents array, each can serve as its own chapter, which makes navigation in ReadEra a lot easier.
const Epub = require('epub-gen')

const option = {
  title: `Your Evening Discourse for ${new Date().toDateString()}`,
  publisher: 'Quillcast',
  author: 'Evening Discourse',
  // Canonical data type: https://github.com/cyrilis/epub-gen
  content: contents,
}
const render = new Epub(option, `/mnt/c/Users/Nick/Dropbox/Obsidian/FeedlySync-${Date.now()}.epub`)
await render.promise
console.log('done')
This bit of code is quite specific and hardcoded for my personal setup. You can see how I set up some basic book metadata and then pass in the contents array as the chapters. Then, I save the result to Dropbox.
Once it’s in Dropbox, it automatically syncs to my phone’s file system with DropSync.
Tonight, I set my entire script to run at 8:30 am tomorrow: it sleeps for twelve hours, then captures all the articles already in my inbox plus those that arrive overnight.
sleep 12h && node lib/index.js
Then, when I leave for work, the file is already downloaded to my phone and available for listening. I could configure this as a cron job if I wanted to, so it happens automatically every morning at 9 am or so.
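Something like this crontab entry would do it; the project path here is a stand-in for wherever the compiled script actually lives:

# Run the export every morning at 9am (hypothetical project path)
0 9 * * * cd /path/to/feedly-epub && node lib/index.js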
The code I've written is not quite 'production ready', but it has been running for the last month without any problems.
When I hear an interesting paragraph, I highlight it in ReadEra. Afterwards, I want to make sure all of my annotations are properly gathered and organized.
Organizing Annotations
Normally when I finish a book in ReadEra on my phone, I can share all of the quotes to a file in Obsidian. That syncs all of the quotes to my vault through Dropbox. That works without issue.
However, I don’t necessarily want to do that here. At least, not in the same way. I already have a Feedly plugin for Obsidian which downloads annotations to individual files, one-per-article. That makes it easy for me to search later on, and the frontmatter is really handy here.
So I need to turn my one file of article annotations into many separate files. It can't be one file per annotation, and it can't be one file overall. I need to group the annotations into one file per article.
I wrote another script that can help with this. I just have to remember to make each article’s frontmatter the first annotation.
import * as fs from 'fs'

const input = `/mnt/c/Users/Nick/Dropbox/Obsidian/Feedly Annotations 12.md`
const folder = `/mnt/c/Users/Nick/Dropbox/Obsidian/Feedly Annotations/`

// Drop characters that cause trouble in filenames
const sanitizeFile = (path: string) => {
  return path.replace(/[()]/g, '')
}

const inputFile = fs.readFileSync(input, 'utf8')

// ReadEra separates each shared quote with a row of asterisks
const separator = '*****'
const notes = inputFile.split(separator).map(x => x.trim())
console.log(notes.length)

let i = 0
let buffer: string[] = []
let bufferTitle = ''
const dryRun = false

function saveBuffer(buffer: string[], bufferTitle: string, dryRun = false) {
  // A buffer holding only frontmatter has no annotations worth saving
  if (buffer.length <= 1) return
  bufferTitle = sanitizeFile(bufferTitle.replace(/[#⤴⤵/]/g, ''))
  const output = buffer.join('\n\n')
  if (dryRun) {
    console.log(output)
  } else {
    fs.writeFileSync(`${folder}${bufferTitle}`, output)
  }
  console.log(`  save as ${folder}${bufferTitle}`)
}

while (i < notes.length) {
  const n = notes[i]
  if (n.startsWith('---')) {
    // A frontmatter block marks the start of a new article:
    // flush the previous article's annotations to disk
    saveBuffer(buffer, `${bufferTitle}.md`, dryRun)
    buffer = []
    bufferTitle = ''
    // Pull the title out of the frontmatter to use as the filename
    const lines = n.split('\n')
    for (const l of lines) {
      if (l.startsWith('title: ')) {
        bufferTitle = l.substring(7)
      }
    }
    // Reduce any '---B…' remnants after the block back to a bare delimiter
    buffer.push(n.replace(/---B.*/g, '---'))
  } else {
    // Everything else is a highlighted quote; format it as a blockquote
    buffer.push(`> ${n}`)
  }
  i++
}
// Flush the final article's annotations
saveBuffer(buffer, `${bufferTitle}.md`, dryRun)
The code here uses a bunch of file paths that I've hard-coded. Even Feedly Annotations 12.md is a distinct file whose number I slowly increment, to prevent exports from being overwritten before I have a chance to process them.
You can see the code is fairly straightforward: it collects each annotation into a buffer until it hits the next article's frontmatter, flushes the buffer to a file, and starts over.
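For reference, the export file the script parses looks roughly like this. I've reconstructed it from the parsing logic above, so ReadEra's exact formatting may differ:

---
url: https://example.com/some-article
feedlyUrl: https://feedly.com/i/entry/...
title: Some Article
pubDate: 2024-05-01
author: Jane Doe
---
*****
The first passage I highlighted while listening.
*****
A second highlighted passage.
*****
---
title: The Next Article
...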
Wrapping Up
There's a lot of bad news out there. And a lot of good news. And plenty of other things going on. I don't know how healthy this is: a constant stream of news in my head while I'm writing email makes it hard to pay attention to either. And given the tone of the news, it might be better to find more time for serenity.
Caveats aside, this little side project has definitely helped me achieve my goal of catching up with the news. Should I have? I guess that’s a question for another time.