Migrating user data in Firestore for a Pokémon fan game

Nick Felker
13 min read · Mar 28, 2024


At the end of last year I shared a project I started five years ago: Pokémon as a Service. Since that time I’ve incrementally added a bunch of new content including Pokémon from the Paldea region.

When I created a design doc of what features had to be added, one was Terastallization. In Pokémon Scarlet and Violet, this is a phenomenon which allows your Pokémon to change its type in battle. This can quickly change the battle’s dynamic.

Bulbasaur wearing an assortment of terastallization crowns

When you catch a Pokémon in the game, its ‘Tera Type’ is selected from one of its native types. However, players can later change a Pokémon’s Tera Type to any type.

I want to replicate that kind of capability in the next release of my game. However, the current data format has no room for it. When a user catches a Pokémon, the Pokémon’s data is encoded into four bytes, as shown below.

| BYTE 1                   | BYTE 2                                                  | BYTE 3   | BYTE 4       |
| Nature(3) | PokeBall (5) | Variant (4) | Gender (2) | Shiny (1) | Affectionate (1) | Form (8) | Location (8) |
| Timid: 3 | GreatBall: 1 | Var1: 1 | None: 0 | False: 0 | False: 0 | None: 255| Atlanta: 101 |

You can see there is no space to hold any other data. As there are 18 types in the game, I will need at least another 5 bits of information. Another missing feature is the Gigantamax factor from Pokémon Sword and Shield.
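The bit count above can be checked with a few lines of arithmetic. This is a minimal sketch, and `bitsNeeded` is a hypothetical helper name rather than something from the game's codebase.

```typescript
// Smallest number of bits that can represent `count` distinct values.
function bitsNeeded(count: number): number {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be a positive integer')
  }
  let bits = 0
  // Each extra bit doubles the number of representable values.
  while (2 ** bits < count) bits++
  return bits
}

// Four bits cover only 16 values, so 18 types need a fifth bit.
const typeBits = bitsNeeded(18)
console.log(typeBits)
```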

As I have a growing list of features I want to support, it became clear that I need to add a fifth byte to the game. In order to make this work, I need to update the encoding/decoding scheme and the tests. However, I can’t just change the code; I also need to change all of the user data that has been stored in Firestore.

I’ve already made this migration, and it works well! There were a few days of bugs that I’ve worked out. There are a number of changes that I hope will allow me to add new features for many years. So this blog post will explain this updated design and how I performed this migration.

New Design

| BYTE 1                   | BYTE 2                                                  | BYTE 3                        | BYTE 4       | BYTE 5                                            |
| Nature (3) | PokeBall (5) | Variant (4) | Gender (2) | Shiny (1) | Affectionate (1) | GMax (1) | Unused (1) | Form (6) | Location (8) | Tera Type (5) | Ability (2) | Ownership (1) |
| Timid: 3 | GreatBall: 1 | Var1: 1 | None: 0 | False: 0 | False: 0 | No: 0 | Nil: 0 | None: 63 | Atlanta: 101 | Normal (1) | Hidden Ability: 3 | Original: 1 |

After consulting with players, I settled on this new structure. You can see there are three new attributes which have been placed into the fifth byte. Additionally, the third byte has been refactored.

The third byte once stored just a Pokémon’s form as an index. In the Pokémon table, that index then corresponds to its form:

'potw-412': {
  species: 'Burmy', gender: ['male', 'female'],
  needForm: true, syncableForms: ['plant', 'trash', 'sandy'],
  type1: 'Bug',
  tiers: ['Tiny Cup'],
  pokedex: `To shelter itself from cold, wintry winds...`,
  hp: 40, attack: 29, defense: 45,
  spAttack: 29, spDefense: 45, speed: 36,
  move: ['Bug Bite'],
  moveTMs: [
    'Snore', 'Electroweb', 'String Shot', 'Protect',
  ],
},

In Burmy’s lookup table, we can see that the indexes 0, 1, and 2 would correspond to the syncableForms array. Additionally, a value of 63 indicates no form at all. There are several checks so that an invalid index cannot be selected.

The Pokémon with the most forms, Alcremie, only has 63. That means I’ve never needed more than 6 bits within this byte. Perhaps one day GameFreak will outdo themselves, and on that day I’ll feel regret. But until then I’ve reclaimed two additional bits. One will be used for the Gigantamax factor. The other is currently unused!
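To make the third byte's layout concrete, here is a small sketch of packing and unpacking it in isolation. The names `packByte3`, `unpackByte3`, `NO_FORM`, `GMAX_BIT` and `FORM_MASK` are made up for this illustration; the masks (128 and 63) and the "63 means no form" rule come from the description above.

```typescript
const NO_FORM = 63      // all six form bits set means "no form"
const GMAX_BIT = 0x80   // top bit of the byte (128)
const FORM_MASK = 0x3f  // low six bits (63)

// formIndex is an index into a Pokémon's form list, or undefined for no form.
function packByte3(gmax: boolean, formIndex?: number): number {
  if (formIndex !== undefined &&
      (!Number.isInteger(formIndex) || formIndex < 0 || formIndex >= NO_FORM)) {
    throw new RangeError(`form index ${formIndex} must be between 0 and 62`)
  }
  return (gmax ? GMAX_BIT : 0) | (formIndex ?? NO_FORM)
}

function unpackByte3(byte: number): { gmax: boolean, formIndex: number | undefined } {
  // Mask first so the GMax bit never leaks into the form index.
  const form = byte & FORM_MASK
  return {
    gmax: (byte & GMAX_BIT) !== 0,
    formIndex: form === NO_FORM ? undefined : form,
  }
}
```

Masking before comparing against 63 matters: a Gigantamax Pokémon with no form is stored as 191 (128 | 63), which only reads as "no form" once the top bits are stripped.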

The fifth byte includes three attributes. There is the tera type, which takes five bits. Then there’s an index for that Pokémon’s ability. Currently Pokémon do not have abilities, but this is a feature that I’ll now be able to add in the future. This might be a case of YAGNI, but I know if I don’t do it now it’ll be years before I try another migration. The last bit signifies whether you’re the original trainer of a Pokémon.
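The fifth byte can be sketched the same way. `Byte5`, `packByte5` and `unpackByte5` are hypothetical names used only here, and the fields are kept as raw indexes because the lookup arrays are not part of this example.

```typescript
interface Byte5 {
  teraIndex: number     // 5 bits, 0 to 31
  abilityIndex: number  // 2 bits, 0 to 3
  isOwner: boolean      // 1 bit
}

function packByte5({ teraIndex, abilityIndex, isOwner }: Byte5): number {
  if (!Number.isInteger(teraIndex) || teraIndex < 0 || teraIndex > 31) {
    throw new RangeError('tera index must fit in 5 bits')
  }
  if (!Number.isInteger(abilityIndex) || abilityIndex < 0 || abilityIndex > 3) {
    throw new RangeError('ability index must fit in 2 bits')
  }
  // | TERA TYPE (5) | ABILITY (2) | OWNERSHIP (1) |
  return (teraIndex << 3) | (abilityIndex << 1) | (isOwner ? 1 : 0)
}

function unpackByte5(byte: number): Byte5 {
  return {
    teraIndex: (byte & 248) >> 3,
    abilityIndex: (byte & 6) >> 1,
    isOwner: (byte & 1) === 1,
  }
}
```

For instance, a tera index of 12 with ability index 1 and the ownership bit cleared packs to 98, or `62` in hex.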

This is a lot of stuff, and the changes I made to the parser required a bunch of updates as well.

export interface Personality {
  nature?: Nature
  pokeball: PokeballId
  variant?: number
  gender: 'male' | 'female' | ''
  shiny: boolean
  affectionate: boolean
  /** Whether this Pokémon has the ability to Gigantamax */
  gmax?: boolean
  form?: PokemonForm
  location: LocationId
  /** The type this Pokémon will naturally become when terastallizing */
  teraType?: Type
  /** The ability this Pokémon has in battle (if this ever happens) */
  ability?: AbilityId
  /** Whether this Pokémon originally belongs to the trainer (can have very specific mechanics) */
  isOwner?: boolean
}

There are two key parts to this process: taking a personality string and parsing it, and taking a personality and generating a string.

export function toPersonality(personality64: string, id: number): Personality {
  const pkmn = Pkmn.get(`potw-${id}`)
  if (!pkmn) throw new Error(`No Pokemon exists potw-${id} w/${personality64}`)

  // Pad the start in the case of Hardy PokeBall (byte 1 is zero, so leading zeros get dropped)
  const personality16 = toBase16(personality64).padStart(10, '0')
  // Breaks up the personality into base16 strings, where every 2 characters is one byte
  // Byte 1
  const byte1 = personality16.substring(0, 2)
  const number1 = parseInt(byte1, 16)
  // | NATURE (3) | POKEBALL (5) |
  const natureIndex = number1 >> 5 // Apply mask
  const nature = NatureArr[natureIndex]
  const pokeballIndex = number1 & 31 // Apply mask
  const pokeball = PokeballArr[pokeballIndex]
  // Byte 2
  const byte2 = personality16.substring(2, 4)
  const number2 = parseInt(byte2, 16)
  // | VARIANT (4) | GENDER (2) | SHINY (1) | AFFECTIONATE (1) |
  const variantId = (number2 & 240) >> 4 // Apply mask
  const variant = (() => {
    if (variantId === 15) return undefined
    return variantId
  })()
  const genderId = (number2 & 12) >> 2 // Apply mask
  const gender = (() => {
    if (!pkmn.gender) return ''
    if (genderId === 3) return 'female'
    if (genderId === 2) return 'male'
    return ''
  })()
  const shiny = (number2 & 2) >> 1 === 1
  const affectionate = (number2 & 1) === 1
  // Byte 3
  // | GMAX (1) | UNUSED (1) | FORM (6) |
  const byte3 = personality16.substring(4, 6)
  const number3 = parseInt(byte3, 16)
  const gmaxable = pkmn.gmax !== undefined
  const gmax = (number3 & 128) !== 0 && gmaxable
  const formIndex = (() => {
    // Mask off the GMAX/unused bits first so that 63 always means "no form"
    const index = number3 & 63
    if (index === 63) return undefined
    return index
  })()
  const form = (() => {
    if (formIndex !== undefined) {
      // Do datastore lookup to get forms on base
      if (!pkmn.syncableForms) return undefined
      return pkmn.syncableForms![formIndex]
    }
    return undefined
  })()
  // Byte 4 - Location
  const byte4 = personality16.substring(6, 8)
  const number4 = parseInt(byte4, 16)
  const location = locationArray[number4] ?? 'Unknown'

  // Byte 5
  // | TERA TYPE (5) | ABILITY (2) | OWNERSHIP (1) |
  const byte5 = personality16.substring(8, 10)
  const number5 = parseInt(byte5, 16)
  const teraType = (() => {
    if (isNaN(number5)) {
      // Compute default tera type in a deterministic way
      if (pkmn.type2) {
        const teraModifier = ((number1 & 31) + (number2 & 12) + (number2 & 2) + number4) % 2
        return [pkmn.type1, pkmn.type2][teraModifier]
      }
      return pkmn.type1
    }
    const teraIndex = (number5 & 248) >> 3
    return types[teraIndex] ?? pkmn.type1
  })()
  const ability = (() => {
    if (isNaN(number5)) {
      return pkmn.abilities?.[1] ?? 'PlaceholderPower'
    }
    const abilityIndex = (number5 & 6) >> 1
    return pkmn.abilities?.[abilityIndex] ?? 'PlaceholderPower'
  })()
  const isOwner = (() => {
    if (isNaN(number5)) {
      return true // We don't know for sure, so let's assume
    }
    const ownerIndex = (number5 & 1)
    return ownerIndex !== 0
  })()

  return {
    pokeball: pokeball as PokeballId,
    nature: nature as Nature,
    variant,
    gender,
    shiny,
    affectionate,
    gmax,
    form,
    location: location as LocationId,
    teraType,
    ability,
    isOwner,
  }
}

This function is quite long, and will require a lot of tests to make sure I’m not missing anything. You can see that I’ve done some specific checks based on a Pokémon’s lookup data for default values.

export function fromPersonality(personality: Personality, id: number): string {
  const pkmn = Pkmn.get(`potw-${id}`)
  if (!pkmn) throw new Error(`No Pokemon exists potw-${id}`)
  // Byte 1
  const byte1 = (() => {
    const natureIndex = NatureArr.indexOf(personality.nature || 'Hardy')
    const ballIndex = PokeballArr.indexOf(personality.pokeball)
    return natureIndex << 5 | ballIndex
  })().toString(16).padStart(2, '0')
  // Byte 2
  const byte2 = (() => {
    let variant = 15
    if (personality.variant !== undefined) {
      variant = personality.variant
    }
    let gender = 0
    if (pkmn.gender) {
      if (personality.gender === 'male') {
        gender = 2
      }
      if (personality.gender === 'female') {
        gender = 3
      }
    }
    const shiny = personality.shiny ? 1 : 0
    const affection = personality.affectionate ? 1 : 0
    return variant << 4 | gender << 2 | shiny << 1 | affection
  })().toString(16).padStart(2, '0')
  // Byte 3
  const byte3 = (() => {
    const gmaxable = pkmn.gmax !== undefined
    const gmaxId = (personality.gmax === true && gmaxable ? 128 : 0)
    if (personality.form) {
      if (!pkmn.syncableForms) {
        console.error(`No Pokemon syncable forms exist for potw-${id}` +
          `/${personality.form}`)
        return 63 | gmaxId // No form
      }
      const index = pkmn.syncableForms!.indexOf(personality.form)
      if (index > -1) {
        return index | gmaxId
      }
    }
    return 63 | gmaxId
  })().toString(16).padStart(2, '0')
  // Byte 4
  const byte4 = (() => {
    const locale = locationArray.indexOf(personality.location)
    if (locale > -1) return locale
    return 0 // Unknown
  })().toString(16).padStart(2, '0')
  // Byte 5
  const byte5 = (() => {
    const teraIndex = types.indexOf(personality.teraType ?? pkmn.type1)
    const abilityIndex = pkmn.abilities?.indexOf(personality.ability!) ?? 1
    const isOwner = personality.isOwner ? 1 : 0
    return (teraIndex << 3) | (abilityIndex << 1) | isOwner
  })().toString(16).padStart(2, '0')
  return toBase64((byte1 + byte2 + byte3 + byte4 + byte5).toUpperCase())
}

The string generation is equally complicated.

Testing

So with all these complications let’s actually make sure all these changes are valid. I’ve had to go through a lot of tests again and rewrite them because a lot of the values have changed. All of my hard-coded expectations required changes.

Tests are written using Ava, which is a useful tool when trying to run a lot of tests. At this point I’ve got roughly 100 different complex tests just to keep my head above water. Otherwise it’s very easy to miss something.

You can see an example of the changes below, though this is just one of the hundreds of test cases which had to be fixed.

  const BASCULIN = 550
  const basculin = new Badge(B.Pokemon(BASCULIN, {
    pokeball: 'repeatball',
    variant: 1,
    gender: '',
    shiny: false,
    affectionate: false,
    form: 'blue_stripe',
    location: 'AT-VIE',
  }))
- t.is(basculin.toString(), '8C#h4043')
+ t.is(basculin.toString(), '8C#14g0gdy')

These tests also had to extend to new features I was including. For example, only a small number of Pokémon are legally allowed to have the Gigantamax factor. If a Pokémon isn’t one of them, my parsing code automatically clears the flag.

test('Badge: Falinks cannot Gigantamax', t => {
  const gmaxFalinks: B.Personality = {
    form: undefined,
    affectionate: false,
    gender: '',
    pokeball: 'cherishball',
    location: 'US-MTV',
    shiny: false,
    variant: undefined,
    gmax: true,
  }
  const nomaxFalinks: B.Personality = {
    form: undefined,
    affectionate: false,
    gender: '',
    pokeball: 'cherishball',
    location: 'US-MTV',
    shiny: false,
    variant: undefined,
    gmax: false,
  }
  t.is(B.fromPersonality(gmaxFalinks, 869), '1vMLUgO')
  // Falinks has no Gigantamax form, so both personalities must encode identically
  t.is(B.fromPersonality(gmaxFalinks, 869), B.fromPersonality(nomaxFalinks, 869))
})

Ensuring I had proper default cases was important, particularly as I later had to do a large data migration on user data in production:

test('Badge: Provide correct Tera types', t => {
  const squirt = new Badge(Pokemon(7))

  // Should be guaranteed Water-tera
  t.log(squirt.toString())
  t.is('7#3MfUhy', squirt.toString())
  const personality = '3MfUhy'
  const squirtPerson = toPersonality(personality, 7)
  const personality16 = toBase16(personality).padStart(10, '0')

  const byte5 = personality16.substring(8, 10)
  const number5 = parseInt(byte5, 16)
  const teraIndex = (number5 & 248) >> 3

  t.is(personality16, '00F03F8462')
  t.is(12, teraIndex)

  t.is('Water', squirt.personality.teraType)
  t.is('Water', squirtPerson.teraType)
})
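The hex string in that test, `00F03F8462`, makes a handy worked example of the whole five-byte layout. The sketch below decodes it into raw numeric fields only; `decodeHex` and `RawFields` are hypothetical names, and it skips the lookup tables (`NatureArr`, `types`, and so on), so the output is indexes rather than names.

```typescript
interface RawFields {
  nature: number
  pokeball: number
  variant: number
  gender: number
  shiny: boolean
  affectionate: boolean
  gmax: boolean
  form: number
  location: number
  tera: number
  ability: number
  owner: boolean
}

function decodeHex(personality16: string): RawFields {
  if (!/^[0-9a-fA-F]{10}$/.test(personality16)) {
    throw new Error('expected exactly 10 hex characters (5 bytes)')
  }
  // Two hex characters per byte.
  const byteAt = (n: number) => parseInt(personality16.substring(n * 2, n * 2 + 2), 16)
  const b1 = byteAt(0)
  const b2 = byteAt(1)
  const b3 = byteAt(2)
  const b4 = byteAt(3)
  const b5 = byteAt(4)
  return {
    nature: b1 >> 5,
    pokeball: b1 & 31,
    variant: b2 >> 4,
    gender: (b2 & 12) >> 2,
    shiny: (b2 & 2) !== 0,
    affectionate: (b2 & 1) !== 0,
    gmax: (b3 & 128) !== 0,
    form: b3 & 63,
    location: b4,
    tera: b5 >> 3,
    ability: (b5 & 6) >> 1,
    owner: (b5 & 1) !== 0,
  }
}

console.log(decodeHex('00F03F8462'))
```

Reading it byte by byte: `00` is nature 0 and ball 0, `F0` is variant 15 (none) with no gender or flags, `3F` is form 63 (none) without Gigantamax, `84` is location 132, and `62` is tera index 12 with ability index 1. Per the test above, index 12 resolves to Water.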

Ribbons and Marks

Ribbons and marks were also features I wanted to bring from the main series games. Marks in particular are neat ways to create new game content over time without having to invent entirely new Pokémon. Conceptually they are just characteristics that give a Pokémon a unique tag and title.

For instance, a newly caught Pikachu could have the Lunchtime Mark and be referred to as a “Pikachu the Peckish” in the game.

A Pokémon can have several marks and countless ribbons. My design had to be able to support that, along with an uncertainty of how many to expect. Right now there are 53 marks and many more ribbons, making it hard to rely on my base64 encoding scheme.

Instead I decided to dedicate one character of any kind to represent a mark or a ribbon. These characters won’t be encoded into base64, so they can stand on their own. In fact, you could use emoji to represent the marks in a literal way. For example, I could include “🍴” at the end of the Pikachu’s string representation.

I can also use a $ at the end of the string as a new delimiter, indicating that any character afterwards would use the ribbon/marks lookup table.

25#1vMLUg0$🍴
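Splitting a string like this takes a little care, because emoji usually occupy two UTF-16 code units. Here is a minimal sketch, assuming `$` never appears in the id or personality; `splitBadge` and `BadgeParts` are hypothetical names, and tags are ignored entirely.

```typescript
interface BadgeParts {
  id: string          // still encoded, e.g. '25'
  personality: string // still encoded, e.g. '1vMLUg0'
  ribbons: string[]   // one entry per ribbon/mark character
}

function splitBadge(badge: string): BadgeParts {
  // Everything after the first `$` is ribbon/mark data.
  const dollar = badge.indexOf('$')
  const core = dollar === -1 ? badge : badge.slice(0, dollar)
  const ribbonText = dollar === -1 ? '' : badge.slice(dollar + 1)
  const [id = '', personality = ''] = core.split('#')
  // Array.from iterates by code point, so an emoji outside the Basic
  // Multilingual Plane stays whole instead of splitting into surrogate halves.
  return { id, personality, ribbons: Array.from(ribbonText) }
}

console.log(splitBadge('25#1vMLUg0$🍴'))
```

One caveat: emoji built from several code points (variation selectors, joined sequences) would still be split by this approach, so the lookup table is simplest if it sticks to single-code-point characters.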

I then need to create a lookup table and amend the toLabel function to refer to it.

export const RibbonMarksTable: Record<string, RibbonMark> = {
  '👑': {
    kind: 'mark',
    name: 'Mightiest Mark',
    icon: 'menu-raid',
    title: 'the Unrivaled',
    description: 'A mark given to a formidable Pokémon',
    conditions: 'Defeating a Pokémon in a six-star raid',
  },
}

toLabel() {
  const dbRes = Pkmn.get(`potw-${this.id}`)
  if (!dbRes) {
    return undefined
  }
  let res = dbRes.species
  ...
  if (this.ribbons?.length) {
    // Only the first ribbon/mark contributes a title; unknown characters are skipped
    const title = RibbonMarksTable[this.ribbons[0]]?.title
    if (title) {
      res += ` ${title}`
    }
  }
  return res
}

Although a Pokémon can have multiple marks, only the one at the front of the array is used. This is sort of like a stack that I can peek on. In the future I’ll need to add a new capability to let players push a different ribbon/mark to the front.
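That future "push to the front" capability could be as small as the sketch below. This is only one possible shape for it, not something that exists in the game yet, and `promoteRibbon` and `activeRibbon` are hypothetical names.

```typescript
// Returns a new array with `ribbon` moved to index 0; the others keep their order.
// If the ribbon appears more than once, the copies collapse into the front entry.
function promoteRibbon(ribbons: readonly string[], ribbon: string): string[] {
  if (!ribbons.includes(ribbon)) return [...ribbons] // nothing to promote
  return [ribbon, ...ribbons.filter(r => r !== ribbon)]
}

// The displayed title comes from whichever ribbon is first.
function activeRibbon(ribbons: readonly string[]): string | undefined {
  return ribbons[0]
}

console.log(promoteRibbon(['🍴', '👑'], '👑'))
```

Returning a new array rather than mutating in place keeps the original badge untouched until the change is saved.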

All the things that could be done with marks and ribbons aren’t quite clear right now. However, since I’m doing this larger migration, this is the best opportunity to include changes for future proofing the game.

Firestore migration

What you saw above was just a fraction of all the tests and changes I had to make. This whole post is condensed in terms of time. I spoke to a bunch of players over the course of this design to get their feedback. One concern they raised was that growing complexity would make it more likely to hit the 1 MiB size limit for a document in Firestore, where all player data is stored.

They suggested updating the data type to go from a record of Pokémon IDs to a deeper-nested data structure that split Pokémon dex numbers and their personalities.

Basically the previous design was:

type TPokemon = Record<PokemonId, number>
const myPokemon: TPokemon = {
  '1#Yf_4': 1,
  '1#3L0m': 2,
  '4#Yf_4': 1,
}

The new data format is:

export type TPokemon = Record<string, Record<string, number>>
const myPokemon: TPokemon = {
  '1': {
    '1hw043': 1,
    '3LmGff': 2,
  },
  '4': {
    '1hw044': 1,
  }
}

As a player’s collection grows, you can see how you start saving data in this nested structure. However, this change did require another set of code and test changes to ensure things didn’t break. Thankfully, I had already developed helper functions scattered throughout the codebase like hasPokemon(...) and addPokemon(...) allowing me to make fewer changes.
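The reshaping itself can be sketched in a few lines. This is an illustration rather than the real helpers: `nestPokemon` and `countOf` are hypothetical names, and the `reencode` callback is a stand-in for the actual four-byte to five-byte personality conversion, which is not reproduced here.

```typescript
type FlatPokemon = Record<string, number>
type NestedPokemon = Record<string, Record<string, number>>

// Groups flat 'id#personality' keys under their dex id.
function nestPokemon(
  flat: FlatPokemon,
  reencode: (personality: string) => string = p => p, // identity by default
): NestedPokemon {
  const nested: NestedPokemon = {}
  for (const [key, count] of Object.entries(flat)) {
    const [id = '', personality = ''] = key.split('#')
    const bucket = nested[id] ?? {}
    nested[id] = bucket
    const newKey = reencode(personality)
    // Two old keys may map to the same new key, so add rather than overwrite.
    bucket[newKey] = (bucket[newKey] ?? 0) + count
  }
  return nested
}

// Lookup that tolerates missing ids and personalities.
function countOf(nested: NestedPokemon, id: string, personality: string): number {
  return nested[id]?.[personality] ?? 0
}

console.log(nestPokemon({ '1#Yf_4': 1, '1#3L0m': 2, '4#Yf_4': 1 }))
```

The dex id is now stored once per species instead of once per key, which is where the saving comes from as a collection grows.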

Then all that I had to do was perform the migration. This was complicated by the fact that it was all in production. I tested my account first and it worked, but the broader deployment was more of a challenge. I shut down several features (raids, GTS, and others) ahead of this change in order to minimize the risk of creating data compatibility glitches.

I had to write an adapter layer that would take in the older badge format and create a new one:

static from2023(pokemonId: string) {
  const badge = new Badge()
  if (typeof pokemonId !== 'string') {
    throw new Error('Cannot interpret non-string badge')
  }
  if (pokemonId === 'potw-000') {
    throw new Error('Cannot interpret badge string potw-000')
  }
  badge.original = pokemonId
  const [id, personality, tags] = pokemonId.split('#')
  badge.id = toBase10(id)
  if (personality) {
    badge.personality = Badge.toPersonality2023(personality, badge.id)
  }
  if (tags) {
    badge.defaultTags = toDefaultTags(tags)
    badge.tags = toTags(tags)
  }
  return badge
}

Then I incorporated that into a script I hoped to only run once.

async function forEveryUser(cb, lastDoc?: any) {
  ...
}

type NewPokemonMap = Record<string, number | FirebaseFirestore.FieldValue | undefined>

export const addPokemon = (user: NewPokemonMap, pkmn: Badge, count = 1) => {
  const [id, personality] = pkmn.fragments
  // Dotted key so the update targets the nested field pokemon -> id -> personality
  const key = `pokemon.${id}.${personality}`
  const current = user[key]
  if (typeof current === 'number') {
    user[key] = current + count
  } else {
    user[key] = count
  }
}

async function fixCorruptedData() {
  const corruptedSet = new Set()
  await forEveryUser(async (ref, user, uid, txn) => {
    const {pokemon, ldap} = user
    const newPokemonMap: NewPokemonMap = {}
    let needsUpdate = false

    for (const [key, value] of Object.entries(pokemon)) {
      // A top-level number means this entry is still in the old flat format
      if (typeof value === 'number') {
        corruptedSet.add(ldap)
        needsUpdate = true
        // Re-add it in the new nested format, then delete the old key
        addPokemon(newPokemonMap, Badge.from2023(key), value as number)
        newPokemonMap[`pokemon.${key}`] = admin.firestore.FieldValue.delete()
      }
    }
    if (needsUpdate) {
      console.log(newPokemonMap)
      txn.update(ref, newPokemonMap)
    }
  })
  console.log([...corruptedSet])
  console.log([...corruptedSet].length)
}

async function main() {
  await fixCorruptedData()
}

main()

However it was a bit more chaotic than that. The deployment process, updating the entire frontend and all the cloud functions, took a bit of time. In the interim some players found their data corrupted when a function encountered data it didn’t expect and changed player data in a way that was no longer valid.
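One way to make a window like that less fragile is for readers to tolerate both shapes while a rollout is in progress. The sketch below is a hedged illustration of that idea, not how the glitches were actually fixed; `StoredPokemon`, `isLegacyEntry` and `totalCaught` are hypothetical names. It relies on the same signal the migration script uses: a top-level number means the entry is still in the old flat format.

```typescript
type StoredPokemon = Record<string, number | Record<string, number>>

// Old format: 'id#personality' -> count. New format: id -> { personality -> count }.
function isLegacyEntry(value: unknown): value is number {
  return typeof value === 'number'
}

// Counts every Pokémon in a collection, whichever format each entry uses.
function totalCaught(pokemon: StoredPokemon): number {
  let total = 0
  for (const value of Object.values(pokemon)) {
    if (isLegacyEntry(value)) {
      total += value
    } else {
      for (const count of Object.values(value)) total += count
    }
  }
  return total
}

console.log(totalCaught({ '1#Yf_4': 1, '4': { '1hw044': 2 } }))
```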

Conclusion

It took about three days of off-and-on support to find all the glitches. Thankfully things stabilized soon after and the game is in a much better place now. With all of these changes, a bit overwhelming at first, the game is ready for several years of improvements and new features. Hopefully it’ll be a long time before I feel like doing this again haha.

Anyway, the game’s source code is available on GitHub for anyone to check out and host within their own community.


Nick Felker

Social Media Expert -- Rowan University 2017 -- IoT & Assistant @ Google