Deciphering XP3 files

2023-10-08 20:38:46

I wish to have a pure-Python tool for extracting (and perhaps creating) XP3 archives, used by the KiriKiri visual novel engine. I'll use the free visual novel Shirogaku Misuken Man'yuutan Kyouto Yoru Genshou (白学ミス研漫遊譚～京都夜幻抄～) (homepage, VNDB) as a test subject. It can be downloaded from Freem!.

I've made an attempt at this before, but had difficulty because I couldn't find a good specification of the format, and I didn't fully trust the several tools that exist for this purpose: they fail to work with some files, which doesn't inspire confidence in their correctness. I have since found a blog post by Jakob Kreuze which looks to be a good reference. I'll work with that post and the source of several tools in various languages in hand.

Each XP3 file begins with a header, which starts with an 11-byte magic number, XP3\x0D\x0A\x20\x0A\x1A\x8B\x67\x01.
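As a first step, here is a minimal sketch of checking for that magic number in Python. The names XP3_MAGIC and has_xp3_magic are my own, not taken from any existing tool.

```python
# The 11-byte XP3 magic number, byte for byte as listed above.
XP3_MAGIC = b"XP3\x0d\x0a\x20\x0a\x1a\x8b\x67\x01"


def has_xp3_magic(data: bytes) -> bool:
    """Return True if data begins with the XP3 magic number."""
    return data[:len(XP3_MAGIC)] == XP3_MAGIC
```

In practice I'd open the archive in binary mode and pass the first 11 bytes, e.g. has_xp3_magic(f.read(11)).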

It appears (from reading xp3-extract.py from XP3Tools) that an XP3 file might contain more than one XP3 header (maybe only files made with Kirikiri Z?) within the first 4096 bytes of the file. I'll deal with that issue if I encounter it, but for now I'm going to assume files will have just one header at the top. From the start of the header, it contains (per Kreuze):

struct header {
    char     magic[11];
    uint64_t info_offset;
    uint32_t version;
    uint64_t table_size;
    uint8_t  flags;
    uint64_t table_offset;
};
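Taking that struct at face value, it can be unpacked with Python's struct module. This is only a sketch under two assumptions of mine: that every field is little-endian (the text only says so for info_offset), and that the fields are packed with no padding between them. XP3Header and parse_header are names I made up for illustration.

```python
import struct
from typing import NamedTuple

# "<" means little-endian with no alignment padding (an assumption).
# 11s = magic, Q = info_offset, I = version, Q = table_size,
# B = flags, Q = table_offset, in the order the struct lists them.
HEADER_FORMAT = "<11sQIQBQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


class XP3Header(NamedTuple):
    magic: bytes
    info_offset: int
    version: int
    table_size: int
    flags: int
    table_offset: int


def parse_header(data: bytes) -> XP3Header:
    """Unpack the header fields from the first HEADER_SIZE bytes of data."""
    return XP3Header(*struct.unpack_from(HEADER_FORMAT, data))
```

Whether real files actually follow this field order is exactly what I still need to verify against the tools' source.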

The info_offset is a little-endian offset to (according to Kreuze) the table_size member of the header, relative to the start of the header. Why? It should always be 0x17, shouldn't it? Perhaps I'm misreading. XP3Tools doesn't seem to think that's what this offset means.
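The 0x17 figure is just the combined size of the fields that precede table_size, assuming the struct is packed with no padding. A quick check:

```python
import struct

# Bytes before table_size: 11-byte magic + 8-byte info_offset
# + 4-byte version, with no padding ("<" disables alignment).
offset_of_table_size = struct.calcsize("<11sQI")
print(hex(offset_of_table_size))  # 0x17
```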

xp3-extract.py ignores every header field after info_offset and seeks straight to info_offset. Then, if the next byte is 0x80, it jumps ahead 9 bytes, reads an 8-byte new offset, then jumps to that offset (measured from the start of the header). Otherwise, it stays in place.

At the position it ends up at, it reads the next byte, expecting it to be 0x01. If so, we're at the start of the file table.
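Here is a rough sketch of that procedure as I understand it. It is my paraphrase of the description above, not code from xp3-extract.py. I'm assuming that "jumps ahead 9 bytes" counts from the position of the 0x80 byte itself, and that "stays in place" means the byte at info_offset is then the one checked against 0x01. locate_file_table is a hypothetical name.

```python
import io
import struct

MAGIC_LEN = 11  # length of the XP3 magic number


def locate_file_table(f, header_start=0):
    """Return the offset of the file table's leading 0x01 byte."""
    # info_offset is the 8-byte little-endian value right after the magic.
    f.seek(header_start + MAGIC_LEN)
    (info_offset,) = struct.unpack("<Q", f.read(8))

    pos = header_start + info_offset
    f.seek(pos)
    if f.read(1) == b"\x80":
        # Assumption: the new offset sits 9 bytes past info_offset,
        # counting the 0x80 byte itself.
        f.seek(pos + 9)
        (table_offset,) = struct.unpack("<Q", f.read(8))
        pos = header_start + table_offset

    # Assumption: without 0x80, the table starts at info_offset itself.
    f.seek(pos)
    if f.read(1) != b"\x01":
        raise ValueError(f"expected 0x01 at offset {pos}")
    return pos


# Synthetic example (not a real archive): magic, info_offset=0x17,
# version=1, 0x80, 8 filler bytes, new offset=40, then 0x01 at offset 40.
demo = io.BytesIO(
    b"XP3\x0d\x0a\x20\x0a\x1a\x8b\x67\x01"
    + struct.pack("<QI", 0x17, 1)
    + b"\x80" + bytes(8)
    + struct.pack("<Q", 40)
    + b"\x01"
)
print(locate_file_table(demo))  # 40
```

If either assumption turns out to be wrong for real files, only the two seek calls should need adjusting.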