The first thing I saw was the body, and it was lying to me.
オルディネス. Clean. Sharp. Every stroke where it belonged, a name written in a language I don't speak, rendered perfectly in a column that said kanjiname. The next row: 荳肴?晁ュ縲」蝣ソ縲」邯カ縲」隰ア縲ヲ. Then another one clean. Then three more garbled. Then clean again. Eight hundred and fifty-two records in a MySQL table, and roughly half of them had their faces rearranged.
Not missing. Not blank. Changed. The bytes were all there. Every last one accounted for, sitting in their columns, present and correct. But something had happened between the moment those characters were written and the moment I was reading them, and whatever it was, it had turned half the Japanese titles in this database into a language that didn't exist. Not Japanese. Not English. Not any encoding a machine was supposed to produce. Just wreckage. Just the aftermath of a translation that went wrong so many times in sequence that the original meaning had been beaten out of it.
Mojibake. That's the word. It's Japanese, naturally. 文字化け. Character transformation. The Japanese have a word for it because they've been dealing with it since before most American programmers knew there were characters outside of ASCII. It's what happens when System A writes bytes in one encoding and System B reads them in another. Nobody's wrong, exactly. Both systems are following their instructions. The problem is that the instructions were written in different centuries by people who never met, and the thing in the middle — the translator, the Presentation layer, the part of the stack whose only job is to make sure that what was said is what gets heard — just shrugged and did the best it could with what it had.
The best it could was butchery.
I need to tell you about the dead man.
He ran the mailing list. Not a mailing list — the mailing list. If you collected PC Engine games in the 1990s, if you traded them, imported them, argued about them, cataloged them, or just wanted to know what the hell Legendary Axe II was actually about, you ended up on his list. It ran on a university server — an edu address from a school that probably never knew it was hosting the entire social infrastructure of an underground import gaming scene — and for ten years, that list was the gathering place. Everybody who was anybody in the scene passed through that server at some point. Developers. Importers. The guy who later started one of the biggest gaming festivals in the country. People who worked for the console maker's American distribution arm before it folded. All of them, connected by one man's willingness to keep the lights on and moderate the conversations and maintain the archive.
He also built a database. A catalog of every PC Engine, TurboGrafx-16 and PC-FX title ever released — Japanese and American, with kanji titles, ratings, metadata, the whole index. He ran it on his server for years, a living reference document for a community that existed before YouTube, before Wikipedia, before anyone could just look things up. If you wanted to know what something was called in Japan, you checked his site.
And then he died. Heart attack. Sudden. No succession plan, because mailing lists don't have succession plans. They have a guy, and when the guy goes down, the server goes dark and the community scatters to whatever platforms will take them — but that comes later. In the first days there's just confusion and people trying to confirm if a Facebook post was true.
The database survived. Barely. Before he died, he'd exported it — a .sql dump, the kind of thing you generate with a single command and email as an attachment, the kind of thing that captures every row of every table in a format that's perfectly faithful to the data and tells you absolutely nothing about the environment it came from. What version of MySQL. What default character set. What collation. What the connection encoding was when the dump was generated. Whether the client that ran the export even knew there was Japanese in the file.
A .sql dump is a snapshot of a conversation frozen mid-sentence. It captures the words. It doesn't capture the language they were spoken in. You have to already know.
And the only person who knew was dead.
The site was still up. That's the part that twisted the knife. The dead man's hosting was paid through the end of the cycle, and the server was still running, still serving pages, still displaying the catalog to anyone who happened to visit. A ghost ship with the lights on, sailing on a dead man's credit card, and when the next payment failed, everything on it would vanish — the database, the site, the mailing list archives, all of it, gone the way things go when the last person who maintained them isn't around to renew the contract.
I started asking around. Trying to find out who had copies of what. Whether anyone had backed up the mailing list. Whether the database could be recovered before the hosting lapsed and took the whole thing down with it. That's when Jimmy surfaced.
Every job has a Jimmy. The mistake is thinking that tells you what kind.
This one was a friend. A fellow traveler from the same era, the same community, the same mailing list. When the dead man's server went dark, Jimmy was the one who'd stepped forward first. He had a copy of the dump — the dead man had sent it to him a couple of years earlier, maybe for safekeeping, maybe for migration, maybe because he'd already noticed something was wrong with the kanji and was hoping fresh eyes would help. Nobody knows. Nobody can ask.
Jimmy tried to solve it himself. Of course he did. He had the .sql file. He had a MySQL server. He imported the dump and looked at the data and saw exactly what I'd later see — half the kanji intact, half of them wrecked — and he did what any reasonable person would do. He started pulling on threads. He tried different client encodings. He tried conversion functions. He probably exported it again at some point, into a new dump, through a new tool, with a new set of encoding assumptions layered on top of whatever had already gone wrong.
He couldn't crack it. The garbled records stayed garbled. The clean ones stayed clean, which was almost worse, because it meant the answer was right there in the gap between them and he just couldn't see it. After a while — months, maybe — he saw me asking around about preserving the site and passed the file my way. Not because he gave up. Because the dead man's catalog was rotting and someone needed to save it and he'd run out of things to try.
Jimmy handed me a crime scene that had been worked once already by an earnest amateur. Every cop knows what that means. The evidence isn't gone. It's just been touched. And now you have to figure out which fingerprints belong to the original crime and which ones belong to the guy who was trying to help.
Here's what they don't teach you in your certifications, and if they do, your eyes glazed over and you missed it: character encoding is a treaty. It is an agreement between the system that writes the bytes and the system that reads them about what those bytes mean. And like all treaties, it works perfectly as long as both sides show up and nobody changes the terms without telling the other party.
Shift-JIS is a treaty Japan signed with its machines in the early eighties. It encodes Japanese characters as one or two bytes: ASCII-compatible characters and half-width katakana get a single byte, kanji and full-width kana get two, and the first byte of every pair falls in a specific range (0x81 to 0x9F, or 0xE0 to 0xEF) that tells the reader: this is the start of a Japanese character, take the next byte too. It's clever. It's efficient. It was designed for an era when memory cost more than rent and every byte mattered.
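The lead-byte rule is easy to see in a few lines. This is a minimal Python sketch using the standard library's `shift_jis` codec; `is_sjis_lead_byte` and `split_sjis` are hypothetical helpers for illustration, and the lead-byte ranges are the plain JIS X 0208 ones (Microsoft's CP932 variant adds more).

```python
def is_sjis_lead_byte(b: int) -> bool:
    """True if `b` opens a two-byte Shift-JIS character (JIS X 0208 lead ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift-JIS bytes into per-character chunks by following lead bytes."""
    chars = []
    i = 0
    while i < len(data):
        if is_sjis_lead_byte(data[i]) and i + 1 < len(data):
            chars.append(data[i:i + 2])  # lead byte + trail byte = one character
            i += 2
        else:
            # single-byte character (ASCII range or half-width katakana),
            # or a dangling lead byte at the very end
            chars.append(data[i:i + 1])
            i += 1
    return chars


encoded = "PCエンジン".encode("shift_jis")
print([piece.hex() for piece in split_sjis(encoded)])
print("ア".encode("shift_jis"))  # the trail byte 0x41 is also ASCII "A"
```

The second byte of ア is 0x41, the same value as an ASCII "A". That is why nothing can read Shift-JIS one byte at a time without knowing where each pair starts.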
MySQL's default character set, for a very long time (right up until MySQL 8.0 switched the default to utf8mb4), was latin1. Western European. One byte per character. No kanji. No kana. No concept that a byte of 0x80 or higher might be the first half of something rather than a thing unto itself.
So here's what happens. You create a table. You don't specify a character set because you don't know you need to, or because the tools don't ask, or because in 1998 on a university server in America you're building a database for a niche gaming community and the idea that your character encoding configuration is a decision that will outlive you simply does not occur. MySQL defaults to latin1. You insert Shift-JIS data through a connection that's also set to latin1. And it works. The bytes go in. The bytes come out. Kanji appears on your website. Everything looks fine.
Everything looks fine because you've accidentally built a system that works by coincidence. The bytes are being stored faithfully — latin1 doesn't understand them, but it doesn't corrupt them either, because latin1 is a single-byte encoding and it just passes everything through like a courier who can't read the language on the envelope but delivers it anyway. The treaty is being honored by accident, by two systems that don't know they're in a treaty, because the man in the middle — the one who set it up, the one who knows that the bytes are Shift-JIS even though the column says latin1 — is alive and present and his server is running and nobody ever needs to ask the question. Until he dies. Until someone else exports the data and the tools read latin1 and take it at its word, reinterpreting every byte through a lens the original data was never meant for, converting faithfully in the wrong direction, compounding the damage with each step.
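The courier behavior can be reproduced in a few lines. A minimal sketch, assuming Python's `latin-1` codec as a stand-in for a MySQL latin1 column (it maps all 256 byte values one-to-one; MySQL's latin1 is really Windows cp1252 with every byte value mapped, which doesn't change the round trip) and a hypothetical export step that takes the column's label at its word and converts to UTF-8. The function names are illustrative only.

```python
def store_in_latin1_column(title: str) -> str:
    """Hypothetical: Shift-JIS bytes written through a latin1 connection into a latin1 column."""
    return title.encode("shift_jis").decode("latin-1")  # every byte becomes one "character"


def read_back_as_bytes(column_value: str) -> bytes:
    """The original setup: latin1 in, latin1 out, bytes untouched."""
    return column_value.encode("latin-1")


def export_as_utf8(column_value: str) -> bytes:
    """Hypothetical export: trust the latin1 label and convert to UTF-8."""
    return column_value.encode("utf-8")  # every byte >= 0x80 becomes two UTF-8 bytes here


for title in ["オルディネス", "Legendary Axe II"]:
    sjis = title.encode("shift_jis")
    stored = store_in_latin1_column(title)
    assert read_back_as_bytes(stored) == sjis  # the accidental treaty holds
    exported = export_as_utf8(stored)
    print(title, exported == sjis, len(sjis), len(exported))
```

The ASCII title comes out byte-identical, because bytes below 0x80 encode the same way in latin1 and UTF-8. The katakana title grows by one byte for every byte at or above 0x80, and those new bytes are no longer valid Shift-JIS.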
Each step is correct. Each tool is following its instructions. And each translation makes it worse, because you're not translating the original anymore. You're translating the last translation. It's a game of telephone played across decades and character encodings by systems that each believe they're the first one to touch the message.
And some records survive. That's the cruelest part. The titles that happen to be pure ASCII — English names, romanized titles, anything that falls below 0x80 — they sail through every conversion unscathed. They're the witnesses who were in the other room when the crime happened. They didn't see anything. They can't tell you anything. They just make the rest of the damage look selective and random when it's actually systematic and inevitable.
You solve it the way you solve any cold case. You study the wounds.
Mojibake isn't random. It's a signature. Every encoding misinterpretation produces a specific, predictable pattern of garbage characters. Shift-JIS read as latin1 produces one pattern. latin1 converted to UTF-8 produces another. Double-encoded UTF-8 (data that was already UTF-8, reinterpreted as latin1, then converted to UTF-8 again) produces that unmistakable garbage where an em dash turns into â€” and a curly quote turns into a string of characters that looks like someone fell on a keyboard.
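Two of those signatures, reproduced in a minimal Python sketch. It assumes the misreading tool decoded with Windows cp1252 (what MySQL calls latin1). Note that Python's cp1252 codec raises on the five byte values cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so this sketch only handles inputs that avoid them. `diagnose` is a hypothetical heuristic for illustration, not a detection library: it undoes the cp1252 misreading and asks which encoding the recovered bytes are valid in, trying strict UTF-8 first.

```python
def misread_as_cp1252(text: str, actual_encoding: str) -> str:
    """Bytes written in `actual_encoding`, displayed by a reader that assumes cp1252."""
    return text.encode(actual_encoding).decode("cp1252")


def diagnose(garbled: str) -> str:
    """Heuristic: undo the cp1252 misreading, then see which encoding the bytes fit."""
    try:
        raw = garbled.encode("cp1252")
    except UnicodeEncodeError:
        return "not a cp1252 misreading"
    if raw.isascii():
        return "plain ASCII, survives every step"
    # Strict UTF-8 rejects most non-UTF-8 byte runs, so try it before Shift-JIS.
    for codec, label in (("utf-8", "UTF-8 read as cp1252"),
                         ("shift_jis", "Shift-JIS read as cp1252")):
        try:
            raw.decode(codec)
            return label
        except UnicodeDecodeError:
            continue
    return "unknown"


em_dash = misread_as_cp1252("\u2014", "utf-8")            # the classic "â€”"
apostrophe = misread_as_cp1252("\u2019", "utf-8")         # the classic "â€™"
katakana = misread_as_cp1252("オルディネス", "shift_jis")  # a run of ƒ-prefixed pairs
for sample in (em_dash, apostrophe, katakana, "Legendary Axe II"):
    print(repr(sample), "->", diagnose(sample))
```

Every full-width katakana character in Shift-JIS shares the lead byte 0x83, which cp1252 displays as ƒ, so katakana titles misread this way come out as a telltale ladder of ƒ pairs.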
I pulled the garbled records and read the damage like a ballistics report. The patterns told the story the dead man couldn't. This wasn't random corruption. This wasn't disk rot or a bad sector or a truncated transfer. This was a specific sequence of encoding misinterpretations, each one technically correct, each one faithfully performing the wrong translation on data whose original language had never been declared.
The clean records confirmed the shape of the damage. ASCII titles had walked through untouched. Some Japanese titles had survived because they had never passed through the bad path, or because they had been entered later, through a different tool, under a different set of assumptions. The table wasn't random. It was a layered crime scene.
I built the decoder against the database itself. No budget, no tools, no authority. Same as always. PHP functions, hex manipulation, mb_convert_encoding with explicitly declared source and target encodings. The trick was knowing what to declare. The garbled output told me the input chain: Shift-JIS bytes, stored as latin1, exported as latin1, interpreted as latin1, converted to UTF-8, and I needed to run the whole sequence in reverse. Take the mangled UTF-8, undo the last conversion, undo the one before that, and get back to the raw Shift-JIS bytes that were sitting in a latin1 column on a dead man's MySQL instance a thousand miles and several years away.
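The real decoder was PHP and mb_convert_encoding; what follows is only a minimal Python sketch of the same reverse-the-chain idea under a simplified, hypothetical chain: Shift-JIS bytes held in a latin1 column, then some number of "converted to UTF-8, re-read as latin1" passes, with Python's `latin-1` codec standing in for MySQL's latin1. `forward_chain` and `reverse_chain` are illustrative names, and the pass count is something read off the damage, not something the code guesses.

```python
def forward_chain(title: str, utf8_passes: int) -> str:
    """Simulate the damage: Shift-JIS in a latin1 column, then repeated UTF-8 exports."""
    text = title.encode("shift_jis").decode("latin-1")  # latin1 column holding SJIS bytes
    for _ in range(utf8_passes):
        text = text.encode("utf-8").decode("latin-1")   # exported as UTF-8, re-read as latin1
    return text


def reverse_chain(mangled: str, utf8_passes: int) -> str:
    """Undo the chain last step first, ending at the original Shift-JIS bytes."""
    text = mangled
    for _ in range(utf8_passes):
        text = text.encode("latin-1").decode("utf-8")   # peel one UTF-8-over-latin1 layer
    sjis_bytes = text.encode("latin-1")                 # the raw bytes the column really held
    return sjis_bytes.decode("shift_jis")


for passes in range(4):
    damaged = forward_chain("文字化け", passes)
    print(passes, len(damaged), reverse_chain(damaged, passes))
```

Each pass makes the damaged string longer, and undoing them in the wrong number usually fails loudly (a `UnicodeError`) or hands back fresh garbage, which is one way to check a reading of the damage. Pure ASCII titles pass through `reverse_chain` unchanged at any depth.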
When the first title came back clean, I sat there for a minute. Not because it was hard. Because it was the first proof that the dead man's work wasn't gone. The bytes had been there the whole time. They'd just been misheard. An unbroken chain, three decades long, from a Shift-JIS character table in a 1980s Japanese encoding standard, through a database that started on a university server in the late 90s by a man who loved a game console nobody else in America cared about, through an export tool that didn't ask the right questions, through a friend's hands that couldn't find the right answers, to a PHP function on my server that finally said the words in the right language.
Eight hundred and fifty-two titles. Every one of them recovered.
I built the site and posted it. Two URLs. One for the mailing list archive — tens of thousands of emails preserved from every iteration of the list, back to the days when the arguments were about which import shop had the best prices on PC Engine Duos and whether Dracula X was really worth what people were paying. One for the catalog — every game, every title, every kanji name restored, searchable, browsable, hosted on my server under a domain I've kept alive for thirty years.
The response was what you'd expect from a community that's been losing ground for two decades. A few people found it. A few people cared. Someone asked if the cheat codes could be recovered.
That should have been a dead end. The cheats weren't in the .sql dump. They'd been in a separate database, behind a login wall that kept the Wayback Machine from ever crawling the actual content. Different database, different system, no export, no backup. The kind of thing you write off and move on.
But the dead man's site had been rendering those cheats as HTML before the login wall went up. Somewhere in the crawl history, the rendered pages still existed — not the database, but the output. The testimony, not the evidence locker. I scraped what was there, parsed the HTML back into structured data, and loaded it into the catalog. Nine hundred and seventy-four cheat codes across three hundred and eighty-five games, recovered not from the database that stored them but from the web pages that displayed them, pulled back from the cache of a machine that had read them when the dead man's server was still alive to answer.
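Turning rendered pages back into rows is ordinary HTML parsing. A minimal sketch with Python's standard-library `html.parser`, assuming a hypothetical layout where each cheat is a row in a `<table class="cheats">` with one cell for the code and one for its effect. The real pages' markup isn't described here, so the tag and class names are placeholders, and so are the sample values.

```python
from html.parser import HTMLParser


class CheatTableParser(HTMLParser):
    """Collect (code, effect) rows from a hypothetical <table class="cheats">.

    Nested tables are not handled; this is a sketch, not a general scraper.
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.in_table = False
        self.in_cell = False
        self.cell: list[str] = []
        self.row: list[str] = []
        self.rows: list[tuple[str, ...]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and ("class", "cheats") in attrs:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.row = []
        elif self.in_table and tag == "td":
            self.in_cell, self.cell = True, []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag == "td":
            self.in_cell = False
            self.row.append("".join(self.cell).strip())
        elif tag == "tr" and self.row:  # header rows use <th>, so they stay empty and are skipped
            self.rows.append(tuple(self.row))
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self.in_cell:
            self.cell.append(data)


page = """
<h2>Example Game</h2>
<table class="cheats">
  <tr><th>Code</th><th>Effect</th></tr>
  <tr><td>EXAMPLE-CODE-1</td><td>example effect &amp; note</td></tr>
  <tr><td>EXAMPLE-CODE-2</td><td>another example effect</td></tr>
</table>
"""
parser = CheatTableParser()
parser.feed(page)
parser.close()
print(parser.rows)
```

Each tuple is then ready to be inserted as a row keyed to its game, which is the step that turns testimony back into evidence.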
Some things stay dead. But sometimes the echo outlives the voice.
I don't know if Jimmy ever saw the finished site. I never circled back to tell him the encoding was cracked — one of those things you mean to do and then the weeks become months and the months become the kind of silence that gets harder to break the longer it goes. He handed me a broken .sql dump and a dead friend's legacy and I took it into a room and didn't come out until the kanji were clean, and by then the work had become its own conversation and the person who started it had receded into the background the way people do when you're deep enough in a problem that the problem is all you see.
He deserved better than that. He was the one who stepped forward. Who sat with a corrupted database and pulled on every thread he could reach, and when the threads ran out, he didn't throw it away. He passed it to someone else. That's not failure. That's the hardest thing a person can do with a problem they care about — admit they've hit the wall and trust someone else to get over it.
Every job has a Jimmy. In this one, he wasn't the thief. He wasn't the henchman. He wasn't the kid who hasn't learned yet. He was the man who knew when to pass the case.
The dead man's server is dark. The mailing list is silent. The community lives in fragments now — a subreddit here, a Discord there, a Facebook post that nobody set to memorial mode. Someone from his family showed up in a Reddit thread once to confirm what happened. Heart attack. Treadmill. Private service.
Translation sits between the data and the world. Its job is simple — take what was written and make sure what arrives is the same thing. The interpreter in the courtroom. The subtitle on the foreign film. The charset declaration at the top of the .sql dump that nobody thinks to check until the words come out wrong.
When it works, you never notice it. When it breaks, meaning doesn't disappear. It transforms. It becomes something else. Something that looks like data but reads like a dream someone else had in a language you almost recognize. You can stare at 荳肴?晁ュ縲」蝣ソ all day and never see the title that's buried under it, because the translation happened so many times in so many directions that the original is layered under strata of good intentions and bad assumptions, and the only person who could point at the first layer and say "this is where it started, this is what it was" died on a Tuesday afternoon and never told anyone where he kept the keys.
You don't fix that with a patch. You fix it with patience, and stubbornness, and a willingness to read the damage backward until you can hear what was originally said.
Eight hundred and fifty-two titles. Nine hundred and seventy-four cheat codes. Every last one.
Case closed.