My python XP3 tool can now extract unencrypted archives successfully. Probably it should be cleaned up a little, and there's room for some optimization, but this is a promising start. Still todo: setting the timestamp on the extracted files, handling encryption, creating archives.
I wish to have a pure python tool for extracting (and perhaps creating) XP3 archives, used by the KiriKiri visual novel engine. I'll use the free visual novel Shirogaku Misuken Man'yuutan Kyouto Yoru Genshou (白学ミス研漫遊譚～京都夜幻抄～) (homepage, VNDB) as a test subject. It can be downloaded from Freem!.
I've made an attempt at this before, but had difficulty because I couldn't find a good specification of the format, and I didn't fully trust the several tools that exist for this purpose–they fail to work with some files, which doesn't inspire confidence in their correctness. I have since found a blog post by Jakob Kreuze which looks to be a good reference. I'll work with that post and the source of several tools in various languages in hand.
Each XP3 file begins with a header, which starts with an 11-byte magic number, XP3\x0D\x0A\x20\x0A\x1A\x8B\x67\x01.
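Checking for that magic is a one-liner; here's a minimal sketch (the constant is copied from the magic above, the helper name is my own):

```python
# The 11-byte XP3 magic number, as given in the format description.
XP3_MAGIC = b"XP3\x0d\x0a\x20\x0a\x1a\x8b\x67\x01"

def has_xp3_magic(data: bytes) -> bool:
    """Return True if `data` begins with the XP3 header magic."""
    return data[: len(XP3_MAGIC)] == XP3_MAGIC
```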
It appears (from reading xp3-extract.py from XP3Tools) that an XP3 file might contain more than one XP3 header (maybe only files made with Kirikiri Z?) within the first 4096 bytes of the file. I'll deal with that issue if I encounter it, but for now I'm going to assume files will have just one header at the top. From the start of the header, it contains (per Kreuze):
The info_offset is a little-endian offset, relative to the start of the header, to (according to Kreuze) the table_size member of the header. Why? It should always be 0x17, shouldn't it? Perhaps I'm misreading. XP3Tools doesn't seem to think that's what this offset means.
xp3-extract.py ignores everything after info_offset and jumps there. If the next byte is 0x80, it skips ahead 9 bytes, reads an 8-byte new offset, and jumps to that offset (measured from the start of the header); otherwise, it stays in place. It then reads the next byte, expecting it to be 0x01, which marks the start of the file table.
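In Python, my reading of that logic looks something like this. To be clear, this is a sketch of what I think xp3-extract.py does, not a verified reimplementation; the function name is mine and the exact skip distance may be off by one.

```python
import io
import struct

def locate_file_table(f, header_start: int, info_offset: int) -> int:
    """Return the absolute offset of the 0x01 file-table marker,
    leaving the stream positioned just past it.

    Sketch of my reading of xp3-extract.py's jump logic."""
    f.seek(header_start + info_offset)
    flag = f.read(1)
    if flag == b"\x80":
        # Skip ahead 9 bytes from the flag (1 byte already consumed),
        # then read an 8-byte little-endian offset relative to the header.
        f.seek(8, io.SEEK_CUR)
        (new_offset,) = struct.unpack("<Q", f.read(8))
        f.seek(header_start + new_offset)
    else:
        f.seek(-1, io.SEEK_CUR)  # the byte we peeked belongs to the table
    marker = f.read(1)
    if marker != b"\x01":
        raise ValueError(f"expected file-table marker 0x01, got {marker!r}")
    return f.tell() - 1
```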
How is it that doing simple things requires so much power, these days? I was uploading a file to a new github release:
When displaying this tab, my GPU usage jumps from about 5% (on the page where I'm writing this, for example) to about 65%. It's got a little spinner and a little blue progress bar, and a blinking cursor, and that's it except for static elements. Why does it take more than half the power of a GTX 960 to display that? This makes no sense.
Who was it that said I should check in on my reading progress on the 15th of each month? Someone who doesn't know me very well, perhaps. I'm only a few days late, though.
I completed several Very Short Introductions on topics in philosophy. They weren't spectacular, but the time wasn't wasted, I guess.
I'm in the middle of Chapter 8 in The Great Conversation, on Plato. I've read enough about Plato that I'm not seeing anything new here, but I'll stick with it. The next chapter is on Aristotle, which I expect will be more helpful to me; I find Aristotle rather harder to understand.
I read Plato's Cratylus. Some interesting ideas about how names and things correspond, but bloated by too much uninteresting (to me, since I don't understand ancient Greek) discussion of etymology.
I read through all of Harry Potter and the Methods of Rationality over the course of a few days (a substantial undertaking–it's about 2000 pages). It's entertaining, but I found Harry to be even more annoying in it this time through than I did a decade ago. And since he's really channeling Yudkowsky, I've found that I similarly don't enjoy Less Wrong as much as I did a decade ago. It introduced me to some interesting ideas, so I'm glad I met it, but there are better places to learn philosophy.
I've resumed reading Reasons and Persons. It's very interesting, but it is really hard, sometimes, to keep a sense of where the argument is headed. Probably this is due at least in part to how slowly I'm reading it–I should bump up the priority in my scheduler so I see it more often. Anyway, at this point Parfit is working on discrediting the Self-interest Theory, and will be doing so for the next fifty pages or so. Then it's onward to the question of personal identity.
I've begun reading Probability Theory: The Logic of Science. Jaynes argues that probability theory should be understood as an extension of logic, and its results interpreted as reasoning from uncertain information. Seems very interesting, so far.
Also begun reading Chemical Principles by Zumdahl. Chemistry class in high school was so very long ago, but it's really worth understanding at least a bit of chemistry, so I hope this will be worth the time investment. I'll probably combine this with some MOOC or other.
Intertwingled: The Work and Influence of Ted Nelson continues to be moderately entertaining, but not really informative.
It's been a little more than a month since I last checked in with my reading. Maybe the 15th of the month would be a good, regular day to do that. Right, as if I could keep up a habit like that… well, it's good to dream.
Russell's The Problems of Philosophy continues to be moderately entertaining. I'm not exactly sure how Russell is going to build a useful theory of knowledge on the back of acquaintance, but we'll see how it goes. Should get clearer in the next few chapters.
I'm making slow progress on the Trek project. Gerrold's The Galactic Whirlpool was fun, but Sky's Death's Angel wasn't so good. Unless you're a completionist, I'd skip both of Sky's contributions to the Trek universe.
“We Owe It to Them to Interfere”: Star Trek and U.S. Statecraft in the 1960s and the 1990s offers an interesting perspective on the Federation's treatment of 'more primitive' cultures as a metaphor for the US's treatment of developing nations.
An extensive reading program is unlikely to provide sufficient contact with words beyond the 2000 most common families (Cobb, 2007). However, there are some possibilities for finding or deliberately constructing texts that will work better. An example is given of software that allows the reader to click a word and get a set of KWIC-style lines showing other uses of the word, either from some large corpus or, potentially, from a restricted one–the current text alone, perhaps, or a set of selected texts such as graded readers.
Swaffar (1985) observes that readers' knowledge of the genre of foreign-language text is important to their ability to understand it.
A couple of years ago, I wrote about "The Business, as Usual, during Altercations" that it was "simply unbelievable that the dilithium shortage could have reached such a critical stage galaxy-wide before anyone noticed". Now it's 2021, there's a chip shortage caused by unexpected demand, and here we all are suffering for it. Sometimes sf is more realistic than expected, I guess.
Context matters for learning. We remember better when we are in the same context as we were when we learned. What exactly counts as context? The visual and auditory environment, our state of mind (including mind-altering substances), even the color of the paper used for studying and testing. It's not clear what all the contributing factors are, or to what extent each factor impacts learning. However, since we will often, in real situations, not have the original context available, we will achieve better results by varying the context, contrary to the usual advice to always study according to an exacting routine with as little variation as possible (Carey, 2014).
This doesn't take into account student affect, though it's plausible that, for at least some people, varied study environments will also have a positive impact on that.
Extensive reading could be reasonably effective for vocabulary acquisition, providing exposure to the most common 1000-2000 words a useful number of times, but it becomes less practical for rarer words. News articles seem to give the best variety of words, as compared with fiction and academic articles (Cobb, 2007).
Mozi, a Chinese philosopher from around the fifth century B.C., suggested a set of three criteria for testing claims:
Examine its basis in "the affairs of the first sages and great kings". What was accepted by authorities in the past has reason to be accepted now.
Examine its origins by "look[ing] at the evidence from the ears and eyes of the multitude". If people can judge the claim for themselves, then their judgments are evidence of the claim's acceptability.
Put it to use: "use it in governing the state, considering its effect on the ten thousand people". An acceptable claim should be beneficial.
These criteria don't necessarily get at truth, but Mozi may not have distinguished between what we would call truth and merely 'beneficial opinion' (Melchert & Morrow, 2018, pp. 77–80).
Carey, B. (2014). How We Learn: The Surprising Truth About When, Where, and Why It Happens. Random House.
Carey (2014) writes about the importance of forgetting in learning. He describes the forgetting curve and the spacing effect, as documented by Ebbinghaus (Carey, 2014, paras. 7.25).
Philip Boswood Ballard repeatedly performed an experiment in the early 1900s, on more than ten thousand schoolchildren in all, documenting their ability to recall a poem. He found that the students' performance actually improved over the first four days or so, and then plateaued (Carey, 2014, paras. 7.62). This effect is observed when the material to be recalled involves imagery, but not for mere nonsense syllables, such as Ebbinghaus used.
Robert and Elizabeth Bjork describe a "new theory of disuse" which separates memory into two quantities: storage strength and retrieval strength (Bjork & Bjork, 1992). The former increases as the memory is used, but does not decrease. The latter decreases over time, and is strengthened by use. It is the behavior of the retrieval strength that is responsible for the spacing effect.
Traditional measures of vocabulary knowledge, such as Wesche and Paribakht's (1996) vocabulary knowledge scale (VKS), probably underestimate "hidden learning", when a word becomes more familiar to the reader, but not enough to count as fully known. A system such as described by Horst (2000, pp. 149–150) may be used to estimate this learning (Cobb, 2007, pp. 39–41).
Swaffar (1985) writes that it is important that the reader of a foreign-language text understand the purpose of the text and the cultural environment. She describes an experiment: two groups read one of two letters describing the marriage of a Hindu and Christian couple, each group (Indian and American) reading the letter in their native language. Both groups made significant misinterpretations, which are attributed to the unfamiliarity of readers with cultural elements.
Landow (2006, pp. 13–22) describes the different types of links that may be used in hypertext, and questions whether there is really any difference between an active reader pausing to perform searches for related material and a system automatically generating links to related material on demand, giving the example of such menus of links generated by Microcosm, or dictionary integration in Intermedia.
Haugeland (1985, pp. 48–52) writes on formal games, such as Chess or tic-tac-toe. In such a game, what counts are (only) the rules and the state of the game. This discussion is working toward a definition of a computer as an "interpreted automatic formal system".
Bjork, R. A., & Bjork, E. L. (1992). A New Theory of Disuse and an Old Theory of Stimulus Fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), Learning Processes to Cognitive Processes: Essays in Honor of William K. Estes (Vol. 2, pp. 35–67). Erlbaum.
Carey, B. (2014). How We Learn: The Surprising Truth About When, Where, and Why It Happens. Random House.
When reading books, I take notes on the book–brief notes indicating points of interest as well as longer notes, especially if I disagree with some point. The typical direct product of these is a chapter-by-chapter summary, with a greater or lesser degree of detail depending on my interest.
For ebooks, my notes are often organized around highlights:
Each night, a cronjob runs that extracts my highlights and comments from calibre and imports them into my database:
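The core of a job like that can be sketched in a few lines. Note that the `annotations` table, the `annot_data` JSON column, and the `highlighted_text` key below reflect my understanding of the metadata.db schema in recent calibre versions (5+); treat this as an illustration, not a drop-in script.

```python
import json
import sqlite3

def extract_highlights(db_path: str) -> list[dict]:
    """Pull highlight annotations out of a calibre metadata.db.

    Assumes calibre's `annotations` table with a JSON `annot_data`
    column; adjust the query for your calibre version."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT book, annot_data FROM annotations"
            " WHERE annot_type = 'highlight'"
        ).fetchall()
    finally:
        conn.close()
    highlights = []
    for book_id, raw in rows:
        data = json.loads(raw)
        highlights.append({
            "book": book_id,
            "text": data.get("highlighted_text", ""),
            "note": data.get("notes", ""),
        })
    return highlights
```

From there, inserting each record into one's own database (with an inbox tag) is ordinary SQL.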
The imported annotations are automatically tagged inbox, and I have a weekly repeating item on my schedule to process items from this inbox. These are presented to me in a simple list, with highlights from the same source sorted together:
If those annotations are citations to some other book or article, I add the article to zotero, in an inbox collection, which I also work through weekly (removing the inbox tag from the quotation, but retaining it as a record). Otherwise, I use them to construct my notes–either the summaries mentioned, or topical notes.
Highlights in my database can be transcluded into other notes or blog posts, like this:
Transclusion preserves the source of the quotation automatically. Annotations that aren't included directly remain linked in my database to the source document and included in searches, so I can refer to them in the future. My topic-oriented notes then come out like these notes on Newcomb's problem. I typically write them in emacs using org-roam:
A post-save hook I've written automatically synchronizes my notes in org-roam with my database, translating org-roam links to ID links in the database, which can then be automatically exported in a form like this blog post. My database has support for making only some notes or some parts of notes public, so I can use this single system for all stages of the process.
For PDFs, I usually read either in polar or on my tablet (in Moon Reader+), manually adding the annotations to my database, since it's impractical to handle automatically. For paper books, I take notes on paper. The rest of the process is the same.