What can be done in the new semantic web?


In an earlier blog post, I gave a brief introduction to using AI to turn screenshots, or any data really, into a standardized structured format.

If I want to connect my Goodreads account to the New York Public Library, I need to standardize on a particular Book format. Unfortunately, I couldn’t use the Book type from schema.org as-is. I created a simplified version which replaced richer types with simple strings.

Beyond these two services, I have been thinking about a platform that lets a variety of microservices pull or process your data. The end user can install plugins that tie into this platform, and those plugins can take advantage of these standard schemas so it's easy to connect them together in data pipelines.

So I can define a Goodreads plugin:

import Plugin from "./_plugin"

const plugin: Plugin = {
  author: 'Nick Felker',
  description: 'Pulls all your Goodreads to-read books',
  label: 'Goodreads To-Read',
  source: 'default',
  version: '2025-01-01',
  onDataPull: async () => {
    // ...
    return true
  }
}

export default plugin

This is just the initial scaffolding. Perhaps in the future I could add more metadata. The snippet shows the concepts I'm going for: the user can enable a series of plugins, each exposing functions that get triggered by platform events. A Refresh button can iterate through every enabled plugin and run its onDataPull function.
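As a rough sketch of that Refresh action (the refreshAll name and the enabledPlugins list are hypothetical here, not part of the actual scaffolding):

import Plugin from './_plugin'

// Run every enabled plugin's data pull in sequence
async function refreshAll(enabledPlugins: Plugin[]): Promise<void> {
  for (const plugin of enabledPlugins) {
    // Each plugin pulls its own data and reports whether it succeeded
    const ok = await plugin.onDataPull()
    console.log(`${plugin.label}: ${ok ? 'pulled data' : 'nothing pulled'}`)
  }
}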

What does this do? It can open the Goodreads page in a browser, using browser automation to work around the lack of an official API, and then upsert a set of Book objects. These are richly typed and can be saved as local files to make it easy for different services to access them.
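For a concrete picture, onDataPull might look something like this sketch. I'm assuming Puppeteer for the browser automation, extractBooks is a hypothetical stand-in for the AI extraction step from the earlier post, and upsertData is the function shown further down:

import puppeteer from 'puppeteer'

// Hypothetical helpers: the AI extraction step and the upsert function below
declare function extractBooks(html: string): Promise<object[]>
declare function upsertData(type: string, json: string): Promise<void>

async function pullGoodreadsToRead(): Promise<boolean> {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  // Placeholder shelf URL; in practice this needs an authenticated session
  await page.goto('https://www.goodreads.com/review/list?shelf=to-read')
  const html = await page.content()
  await browser.close()

  // Turn the raw page into structured Book objects and save each one
  const books = await extractBooks(html)
  for (const book of books) {
    await upsertData('Book', JSON.stringify(book))
  }
  return true
}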

So with the structured data returned from an AI response above, it can upsert the data to data/Book/the_new_yorkers.book.json.

import fs from 'fs'

export async function upsertData(type: string, json: string): Promise<void> {
  const jsonObject = JSON.parse(json)
  jsonObject['@context'] = 'https://schema.org'
  jsonObject['@type'] = type

  // Create the data directory and the per-type subdirectory if they don't exist yet
  try {
    fs.readdirSync('data')
  } catch (e) {
    fs.mkdirSync('data')
  }
  try {
    fs.readdirSync(`data/${type}`)
  } catch (e) {
    fs.mkdirSync(`data/${type}`)
  }

  // Build a filesystem-safe filename from the object's name or text
  const filename = jsonObject.name || jsonObject.text || Math.random().toString()
  const safeFilename = filename
    .replace(/[^a-z0-9]/gi, '_')
    .replace(/_+/gi, '_')
    .toLowerCase()
  fs.writeFileSync(`data/${type}/${safeFilename}.${type.toLowerCase()}.json`, JSON.stringify(jsonObject))
}
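The resulting file for that example would then look roughly like this (the field values here are illustrative, not pulled from my real library):

{
  "@context": "https://schema.org",
  "@type": "Book",
  "name": "The New Yorkers",
  "numberOfPages": 300
}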

Files are alright, but not necessarily the ideal place to store this data. If I want to query it, every file would need to be read, parsed, and filtered before any results are available. But this file structure maps very closely to a NoSQL database, so I ended up installing a library called Acebase, which uses a Firebase-like interface I'm familiar with while running entirely locally.

In my idealized end-state platform, I'd add an onDataSync event which could be called automatically after a refresh and would sync all those files to whatever endpoint(s) you want. You could imagine having more than a single backend depending on the context. You could define your own Firestore backend for syncing all data and providing remote access. But you could also have a plugin which only reads Event objects and syncs them to your calendar, or Task objects and syncs them to your to-do list.
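To make that concrete, here's a rough sketch of what such a hook could look like. None of this exists in the scaffolding yet; the interface and field names are my own assumptions:

// Sketch only: a possible shape for the onDataSync hook described above
interface SyncCapablePlugin {
  label: string
  // Called after a refresh with every record of a given schema.org type
  onDataSync?: (type: string, records: object[]) => Promise<boolean>
}

const calendarSync: SyncCapablePlugin = {
  label: 'Calendar Sync',
  onDataSync: async (type, records) => {
    if (type !== 'Event') return false
    for (const event of records) {
      // Push each schema.org Event to whatever calendar backend you choose
      // await calendarApi.insertEvent(event) // hypothetical backend call
    }
    return true
  }
}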

For now, I will just add support for Acebase in the upsertData function, using a path structure that mirrors the file layout.

// ...
await acebase.ref(`${type}/${safeFilename}`).set(jsonObject)
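For completeness, here's a minimal sketch of how that acebase handle could be created, assuming the acebase npm package. loadDb is the helper name I'll reuse in the query example below, and the database name is arbitrary:

import { AceBase } from 'acebase'

let acebase: AceBase | undefined

export async function loadDb(): Promise<AceBase> {
  if (!acebase) {
    // The database name is arbitrary; AceBase persists its data to local files
    acebase = new AceBase('plugin-data')
    await acebase.ready()
  }
  return acebase
}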

Once this works, my data is available both as files and in this NoSQL database, which means I have immediate access to all of it. I can open the records as plaintext JSON files and modify them (although those modifications aren't mirrored back into the database).

It’s still a work in progress

Now that this system is in place, I can begin to pull in and access all my data. Then I can write short, higher-level scripts which query that data and do simple operations, like calculating the total number of pages.

async function sumPages() {
  const db = await loadDb()
  // Grab up to 1000 Book records and add up their numberOfPages fields
  const query = await db.query('Book').take(1000).get()
  const numPages = query.map(v => (v.val().numberOfPages ?? 0))
  return numPages.reduce((prev, next) => prev + next, 0)
}

Since I know the data has well-defined fields, standardized regardless of where the Book object came from, it becomes straightforward to write these kinds of scripts.

And beyond books, I can use AI to transform any kind of content into a structured response and represent it as one of these objects.

This is currently where I'm at in the project. I'm still trying to develop a list of use-cases. Having a good replacement for the Goodreads API will at least help me regularly manage my library book list, though building a whole platform with plugins does feel like a bit of overkill just for that.

Going forward, when I do have more ideas for automation, I'll at least know the approach will require getting data and doing something with it. So this platform will probably end up being useful anyway. I could add additional book sources in the future and take advantage of existing scripts like the page-summing function above. At the moment it's all Node.js scripts, but building a UI could make sense too.

There’s probably an opportunity for a Part 3 at some point. That will focus on the specific implementation of my end-to-end code and less on the platform as a whole. Once that's done, I can return to the platform idea and work through the next round of ideas. And I will continue to explore how to use large language models to make this whole thing easier.
