Watching what I eat: Using LLMs & AI to record nutritional data

Nick Felker
8 min read · Mar 18, 2024


Everyone wants to eat better, but it’s quite hard to track the nutrients you’re actually eating. If you buy packaged groceries in stores, you can pull everything from a nice nutrition label. In a restaurant, however, you can’t ask how many milligrams of sodium your dish has. Nor do you have the time to meticulously enter every datum into a database.

I’m a foodie and I go to restaurants a lot for a good meal. I take photos of everything I eat, because bragging about it is half the point.

In the age of AI, where large language models like Gemini 1.0 can process not just text but also photos, I had the idea that my photos could provide more than just social value. Maybe they could be valuable to my health too.

So I built a simple web app that lets me select a photo, capture the nutritional data from it, and send it directly to my records in Google Fit. The app is simple but powerful.

In this blog post I’ll explain how I did this.

Gemini Multimodal API

I’m well aware of the limits of large language models as well as their potential. While I could build, deploy, and maintain a custom vision model trained on millions of food photos, I knew I had neither the time nor the skill for that. At the same time, I didn’t want to lean on the LLM for everything.

I started out with Gemini’s multimodal API, which allows me to send a photo and a short text prompt.

With this prompt, the photo I took was quickly identified as a machine-parseable list. Giving the model a few examples helped it format the result the correct way. From my web app, I can write a bit of code to take a selected image file and send it alongside the rest of the prompt. Note that I had to shrink the image before sending it, as the full-resolution file was too large.

async classify(event: any) {
  this.processing = true
  const reader = new FileReader();
  if (event.target.files && event.target.files[0]) {
    const file = event.target.files[0]
    // Shrink the photo first; the full-resolution file is too large to send
    const miniFile = await this.resizeImage(file)
    reader.onload = async (event: any) => {
      this.journalImage = event.target.result
      const base64Data = event.target.result.split(',')[1]; // Remove the "data:..." prefix
      // Ask Gemini to identify the foods, then look each one up in the USDA database
      const plate = await this.gemini.runClassifier(base64Data)
      this.journalPlate = plate
      const nutrients = await this.getAllNutrients(plate.map(p => p.foodKey))
      this.journalUsda = nutrients
      this.processing = false
      this.generatePayload(nutrients, plate, this.journalTime)
    }
    reader.readAsDataURL(miniFile)
  }
}
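
The resizeImage helper isn’t shown here; a minimal sketch of one way to implement it with a canvas (the 512-pixel cap and JPEG quality below are assumptions, not the app’s exact values):

async resizeImage(file: File, maxDim = 512): Promise<Blob> {
  // Downscale so the longest side is at most maxDim pixels
  const bitmap = await createImageBitmap(file)
  const scale = Math.min(1, maxDim / Math.max(bitmap.width, bitmap.height))
  const canvas = document.createElement('canvas')
  canvas.width = Math.round(bitmap.width * scale)
  canvas.height = Math.round(bitmap.height * scale)
  canvas.getContext('2d')!.drawImage(bitmap, 0, 0, canvas.width, canvas.height)
  // Re-encode as JPEG; FileReader.readAsDataURL accepts the resulting Blob
  return new Promise((resolve) =>
    canvas.toBlob((blob) => resolve(blob!), 'image/jpeg', 0.8))
}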

Nutrition Database

Once I have a reliable input, or at least a parseable one, I want to move away from AI and into something more deterministic. I found the USDA’s FoodData Central database, a comprehensive dataset of different foods and their nutritional data in great detail. They even have a REST API that I can use.

Once I get my results from Gemini, I can parse that into an array.

async runClassifier(image: string): Promise<Plate> {
  const parts: Part[] = [
    {
      inlineData: {
        mimeType: "image/jpeg",
        data: image
      }
    },
    {text: " identify each food item in this photo and its portion size in a bulleted list with item and size separated by a comma\n\nfor example, if this was a picture of a hot dog, you'd return something like:\n- hot dog, 1, bun\n- french fries, 8, ounces\n- salad, 2, cups\n- pickled radishes, 3, pieces\n\nnow you try:"},
  ]
  const result = await this.model.generateContent({
    contents: [{ role: "user", parts }],
    generationConfig,
    safetySettings,
  })
  const response = result.response
  const results = response.text().trim()
  // Each row of the response looks like "- hot dog, 1, bun"
  const rows = results.split('-')
  return rows.map((row) => {
    const cols = row.split(',')
    // Skip rows that don't match the "item, count, unit" shape
    if (cols.length > 3 || cols.length < 2) return undefined
    return {
      foodKey: cols[0].trim(),
      portion: (cols[2] ?? '').trim(),
      portionNum: parseInt(cols[1].trim())
    }
  }).filter(x => x) as Plate
}
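
The Plate type isn’t defined in this snippet; from how it’s used, it’s presumably shaped something like this sketch (the comments are inferred, not the project’s actual definition):

interface PlateItem {
  foodKey: string    // e.g. 'french fries'
  portion: string    // e.g. 'ounces'
  portionNum: number // e.g. 8
}
type Plate = PlateItem[]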

Then I can use that array to make API calls against this database.

async getAllNutrients(foods: string[]): Promise<FdaNutrtionData[]> {
  // Look up every identified food in parallel
  return Promise.all(foods.map(x => this.getNutrient(x)))
}

async getNutrient(food: string): Promise<FdaNutrtionData> {
  // Search the FNDDS survey dataset for the closest matching food
  const endpoint = `https://api.nal.usda.gov/fdc/v1/foods/search?query=${encodeURIComponent(food)}&dataType=Survey%20%28FNDDS%29&pageSize=25&sortBy=dataType.keyword&sortOrder=asc&api_key=${USDA_KEY}`
  const res = await fetch(endpoint)
  const json = await res.json() as FdaNutrtionData
  return json
}
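
The FdaNutrtionData interface mirrors the FoodData Central search response. A partial sketch covering only the fields this code reads (the real response has many more):

interface FdaNutrtionData {
  foodSearchCriteria: { query: string } // echoes the search term
  foods: {
    foodNutrients: {
      nutrientName: string // e.g. 'Energy'
      unitName: string     // e.g. 'KCAL'
      value: number        // amount for the reference portion
    }[]
  }[]
}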

I added some UI affordances to let users edit the results afterward, both the kind of food and the portion size.

The app shows the total calories of the meal alongside the list of food items. While not a perfect representation of what I ate, it gets me pretty far into the process and saves a lot of manual work.

But now I need a place to store this data.

Google Fit API

I like how Google Fit works passively on my phone. It sits in my pocket and records my walking as I go about my day. It’d be great to add nutritional information too, making it a one-stop shop for all health-related information.

Google Fit has a REST API as well. Once I log into my Google account with the correct Fit permissions, I can make API calls to add nutrition data.
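
The sign-in code isn’t shown in this post, but the write calls below need the fitness.nutrition OAuth scope. A minimal sketch using Google Identity Services, where CLIENT_ID and the xs token field (used in the fetch calls later) are assumptions:

// Request an access token with the nutrition scope
const tokenClient = google.accounts.oauth2.initTokenClient({
  client_id: CLIENT_ID, // hypothetical OAuth client ID
  scope: 'https://www.googleapis.com/auth/fitness.nutrition',
  callback: (resp) => { this.xs = resp.access_token },
})
tokenClient.requestAccessToken()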

The REST API documentation explains the calls you can make.

First, I have to create a data source. This is a representation of my app, Food Fotos For Fitness. I register this with the user’s Fit account.

try {
  // Register this app as a nutrition data source on the user's account
  const res = await gapi.client.fitness.users.dataSources.create({
    userId: 'me',
    dataStreamName: 'NutritionSource',
    type: 'raw',
    application: {
      detailsUrl: 'http://example.com',
      name: 'Food Fotos for Fitness',
      version: '2024.03.13'
    },
    dataType: {
      name: 'com.google.nutrition',
    }
  })
  return res
} catch (e: any) {
  console.error('got error with ds', e)
  if (e?.result?.error?.status === 'ALREADY_EXISTS') {
    // Data Source: raw:com.google.nutrition:763294835062:NutritionSource already exists
    // Pull the existing data stream ID back out of the error message
    return {
      dataStreamId: (e?.result?.error?.message as string).match(/raw:.*?\s/)?.[0].trim()
    }
  }
  throw e
}

As you can see above, I need an awkward parser: once the data source has been registered, subsequent create calls throw an ALREADY_EXISTS error. Since I can catch that error and extract the existing data stream ID from its message, it’s not a big deal.

Then I need to send a PATCH request to my data source with a particular dataset ID. This ID can be any value, but a timestamp is probably the easiest way to guarantee uniqueness.

The JSON request I send has to be fairly precise: the mapVal, intVal, and strVal entries need to be in that exact order. I take all my nutritional data and convert it into a type that works with the Fit API:

convertUsdaToFit(foods: FdaNutrtionData[], plate: Plate, meal: Meal) {
  console.log(plate)
  const now = Date.now()
  const datum: NutritionSource = {
    // Fit expects nanosecond timestamps; pad the window slightly
    minStartTimeNs: now * 1_000_000 - 1000,
    maxEndTimeNs: now * 1_000_000 + 1000,
    point: foods.map((food, i) => ({
      startTimeNanos: now * 1_000_000 - 100,
      endTimeNanos: now * 1_000_000 + 100,
      dataTypeName: 'com.google.nutrition',
      value: [{
        // The mapVal, intVal, and strVal entries must stay in this order
        mapVal: [
          this.convertUsdaNutrientToFit(food, 'calories', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'fat.total', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'fat.saturated', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'fat.polyunsaturated', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'fat.monounsaturated', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'cholesterol', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'sodium', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'potassium', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'carbs.total', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'dietary_fiber', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'sugar', plate[i].portionNum),
          this.convertUsdaNutrientToFit(food, 'protein', plate[i].portionNum),
        ].filter(x => x) as NutritionMapVal[], // Defined only
      }, {
        intVal: meal,
      }, {
        strVal: `${food.foodSearchCriteria.query}, ${plate[i].portionNum} ${plate[i].portion}`,
      }]
    }))
  }
  console.debug(datum)
  return datum
}

convertUsdaNutrientToFit(food: FdaNutrtionData, string: Nutrient, portionSize: number): NutritionMapVal | null {
  // Translate the Fit key (e.g. 'sodium') to the USDA's nutrient name
  const usdaKey = NutrientFitToUsda[string]
  const finder = food.foods[0].foodNutrients.find(n => n.nutrientName === usdaKey)
  if (!finder) return null
  return {
    key: string,
    value: {
      fpVal: finder.value * portionSize
    }
  }
}
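
The NutrientFitToUsda lookup maps Fit’s nutrition keys to the USDA’s nutrient names. A sketch of what it might contain; the exact USDA strings are assumptions based on FoodData Central’s naming, so check a real API response before relying on them:

// Fit nutrition key -> USDA nutrientName (the USDA names are assumptions)
const NutrientFitToUsda: Record<string, string> = {
  'calories': 'Energy',
  'fat.total': 'Total lipid (fat)',
  'fat.saturated': 'Fatty acids, total saturated',
  'fat.polyunsaturated': 'Fatty acids, total polyunsaturated',
  'fat.monounsaturated': 'Fatty acids, total monounsaturated',
  'cholesterol': 'Cholesterol',
  'sodium': 'Sodium, Na',
  'potassium': 'Potassium, K',
  'carbs.total': 'Carbohydrate, by difference',
  'dietary_fiber': 'Fiber, total dietary',
  'sugar': 'Sugars, total including NLEA',
  'protein': 'Protein',
}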

Once this is all ready, I can throw all of the data into the Fit API.

async patchFitJournalEntry(journalId: string, data: NutritionSource) {
  // The dataset ID in the URL just needs to be unique; a timestamp works
  const res = await fetch(`https://www.googleapis.com/fitness/v1/users/me/dataSources/${journalId}/datasets/${Date.now().toString()}`, {
    body: JSON.stringify({
      ...data,
      dataSourceId: journalId,
    }),
    method: 'PATCH',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${this.xs}`
    }
  })
  const resData = await res.json()
  return resData
}
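
Putting it together, the flow from conversion to upload might look like this; createDataSource is a hypothetical wrapper name for the dataSources.create call shown earlier:

// Hypothetical end-to-end wiring
const source = await this.createDataSource() // the create call above
const datum = this.convertUsdaToFit(nutrients, plate, meal)
await this.patchFitJournalEntry(source.dataStreamId, datum)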

Here's a screenshot from my phone afterwards. The Fit app shows calories and you can click on each entry to see even more information.

Everything is stored in Fit’s backend and can be recalled through the API. So maybe one day I can extend this further and take full advantage of the larger corpus of nutritional info I’ve built up, now that the data is so easy to add.

After a few more days of work, I’ve improved the UI quite a bit. I’ve also added a shortcut for water, since there’s a specific hydration data type and I drink several cups of water at work every day.
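
Hydration points go through the same PATCH flow, just against a hydration data source instead. com.google.hydration stores a single volume value in liters, so a point for one cup of water might look like this sketch:

// One cup of water as a com.google.hydration point (volume is in liters)
const now = Date.now()
const waterPoint = {
  minStartTimeNs: now * 1_000_000,
  maxEndTimeNs: now * 1_000_000,
  point: [{
    startTimeNanos: now * 1_000_000,
    endTimeNanos: now * 1_000_000,
    dataTypeName: 'com.google.hydration',
    value: [{ fpVal: 0.24 }], // roughly one cup in liters
  }],
}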

On the Quantified Self

I remember how big the idea of Quantified Self was back when I was just starting college. With so many sensors and a phone in our pockets, we thought we could quantify everything and we’d be able to form a better understanding of ourselves, optimizing everything for peak performance. We could measure our fitness, our nutrition, our sleep, our air quality, and even our social interactions. Then all that data could be used to give us better insights.

That idea hasn’t failed, but it definitely didn’t succeed. While new devices like smartwatches can better assess our physical well-being, they haven’t really solved our physical ailments. If anything, they can make us hyper-aware of otherwise normal conditions.

At the same time, AI has absolutely made huge breakthroughs in medicine. Capabilities like protein folding and early cancer detection are developments that will make our lives better.

Maybe the problem was never with the theory. Maybe with enough data we could improve our physical health. Maybe the issue was that manually generating data was too hard, so people didn’t do it.

If recording nutritional info is as fast as snapping a photo, building a larger corpus becomes trivial. While there’s still a manual step, I hope this project can inspire more efforts in the fitness space.

All of the code for this project can be found on GitHub.

And you can try it yourself by visiting the website: https://felker.dev/food-foto-fitness

(There might be some OAuth warnings since I don’t want to go through the complex certification process.)
