Type	Software
Platform	Linux

Promnesia (Link)

Improvements

Rethinking visits and normalized URLs

DB structure

Right now, if I have saved a reddit post with a big list of URLs in it (like this one, say), it gets added to the visits table once for each URL in the list. Wasted space isn't really that much of an issue, but it's pretty clear that in this case the DB model is diverging from the mental model of the data.

A more natural model might be: I have a context (which might be empty) that applies to (potentially) multiple resources, each identified somehow. Currently, these identifiers are nominally normalized URLs, but they could be any string; even now, I have some from hypothesis that look like x-pdf%3A581e7123cddc6cccbf5d44798db73c63.
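To make that concrete for myself, here is a minimal sketch of what such a model could look like in SQLite. This is not promnesia's actual schema: the table names (contexts, context_resources), the columns, and the helpers add_context and contexts_for are all hypothetical, and the example.com identifiers are placeholders.

```python
import sqlite3

# Hypothetical schema: one row per context, plus one row per
# (context, identifier) pair, instead of one full visit row per URL.
SCHEMA = """
CREATE TABLE contexts (
    id     INTEGER PRIMARY KEY,
    source TEXT NOT NULL,            -- e.g. 'reddit' or 'hypothesis'
    body   TEXT NOT NULL DEFAULT ''  -- the context itself; may be empty
);
CREATE TABLE context_resources (
    context_id INTEGER NOT NULL REFERENCES contexts(id),
    identifier TEXT NOT NULL,        -- any string, not necessarily a URL
    PRIMARY KEY (context_id, identifier)
);
CREATE INDEX idx_resource_identifier ON context_resources(identifier);
"""


def add_context(db, source, body, identifiers):
    """Store one context and attach it to every identifier it applies to."""
    cur = db.execute(
        "INSERT INTO contexts (source, body) VALUES (?, ?)", (source, body)
    )
    context_id = cur.lastrowid
    db.executemany(
        "INSERT OR IGNORE INTO context_resources (context_id, identifier) "
        "VALUES (?, ?)",
        [(context_id, identifier) for identifier in identifiers],
    )
    return context_id


def contexts_for(db, identifier):
    """Return (id, source, body) for every context attached to an identifier."""
    rows = db.execute(
        "SELECT c.id, c.source, c.body FROM contexts c "
        "JOIN context_resources r ON r.context_id = c.id "
        "WHERE r.identifier = ? ORDER BY c.id",
        (identifier,),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# One saved post with several links: stored once, linked three times.
post_id = add_context(
    db,
    "reddit",
    "Big list of links",
    [
        "example.com/a",
        "example.com/b",
        "x-pdf%3A581e7123cddc6cccbf5d44798db73c63",
    ],
)
```

With this layout the saved post is a single contexts row, and looking up any one of its identifiers leads back to that same row.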

URLs don't make good identifiers

If a book in my database has an associated ISBN, my data export includes that ISBN in the form of a URL like https://www.worldcat.org/search?q=isbn:0538116706 (which presently gets normalized to worldcat.org/search, which is useless). What I really want is for the identifier to be something more like isbn:0538116706, so that when I'm visiting the Amazon page for the book, or a library's page for it, promnesia can extract the ISBN and notice that I have a note related to that identifier. That info is easier to extract on some sites than on others, but I want it to be possible in principle.
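For the worldcat case, the extraction could be as simple as the sketch below. The function name isbn_identifier is hypothetical, and the code assumes the search URL always has the q=isbn:... shape shown above; it only checks that the value looks like 10 or 13 characters of ISBN, not that the checksum is valid. Amazon or library pages would need their own extractors, which I haven't attempted here.

```python
import re
from urllib.parse import parse_qs, urlparse

# Loose shape check only: ISBN-10 (last character may be X) or ISBN-13.
_ISBN_QUERY = re.compile(r"isbn:([0-9]{9}[0-9Xx]|[0-9]{13})")


def isbn_identifier(url):
    """Return 'isbn:<number>' for a worldcat ISBN search URL, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host != "worldcat.org" or parsed.path != "/search":
        return None
    # parse_qs maps each query parameter to a list of its values.
    for value in parse_qs(parsed.query).get("q", []):
        match = _ISBN_QUERY.fullmatch(value.strip())
        if match:
            return "isbn:" + match.group(1).upper()
    return None
```

The point is that the query string carries the useful part, so dropping it during normalization throws away exactly what should have been the identifier.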

I might also want those contexts to be associated with the worldcat URL in particular (and for something like hypothesis annotations this is pretty certain), but the most important identifier there is the ISBN, not the URL. For some other books, I might have both ISBN and DOI, plus a link to the publisher's page. Only the last of these is reasonably treated as a URL.

There are a lot of popular sites that have some identifier that is better than a URL. A youtube video at https://www.youtube.com/watch?v=z8VhNF_0I5c might be youtube-video:z8VhNF_0I5c. A tweet at https://twitter.com/karlicoss/status/1195256775555059713 might be twitter-status:1195256775555059713. These identifiers can be reconstructed into URLs approximating the originals, or other relevant URLs like https://threadreaderapp.com/thread/1195256775555059713.html or https://mattw.io/youtube-metadata/?url=z8VhNF_0I5c&submit=true. That's maybe outside the scope of promnesia, but what is within its scope is that those last two URLs should map back to the same identifiers as the first two, so that I see my contexts related to a tweet even if I'm viewing it on threadreader.
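A rough sketch of that mapping, covering only the four URL shapes mentioned above. The function names identify and candidate_urls are hypothetical, and the URL patterns are assumptions read off those example URLs rather than anything the sites document; in particular, the mattw.io rule assumes the url parameter holds a bare video id, as it does in the example.

```python
import re
from urllib.parse import parse_qs, urlparse


def _host(netloc):
    """Lowercase the host and drop a leading 'www.'."""
    host = netloc.lower()
    return host[len("www."):] if host.startswith("www.") else host


def identify(url):
    """Map a URL to a site-independent identifier, or None if unrecognized."""
    parsed = urlparse(url)
    host = _host(parsed.netloc)
    query = parse_qs(parsed.query)

    if host == "youtube.com" and parsed.path == "/watch" and "v" in query:
        return "youtube-video:" + query["v"][0]
    if host == "mattw.io" and parsed.path.startswith("/youtube-metadata") and "url" in query:
        # Assumes a bare video id here, as in the example URL.
        return "youtube-video:" + query["url"][0]
    if host == "twitter.com":
        match = re.fullmatch(r"/[^/]+/status/(\d+)/?", parsed.path)
        if match:
            return "twitter-status:" + match.group(1)
    if host == "threadreaderapp.com":
        match = re.fullmatch(r"/thread/(\d+)\.html", parsed.path)
        if match:
            return "twitter-status:" + match.group(1)
    return None


def candidate_urls(identifier):
    """Rebuild URLs from an identifier; these only approximate the originals."""
    kind, _, value = identifier.partition(":")
    if kind == "youtube-video":
        return [
            "https://www.youtube.com/watch?v=" + value,
            "https://mattw.io/youtube-metadata/?url=" + value + "&submit=true",
        ]
    if kind == "twitter-status":
        # The username is not part of the identifier, so the original
        # twitter.com URL cannot be rebuilt exactly from it.
        return ["https://threadreaderapp.com/thread/" + value + ".html"]
    return []
```

The property I care about is the round trip: every URL that candidate_urls produces should map back to the identifier it came from, so a context attached to twitter-status:1195256775555059713 shows up whether I'm on twitter or threadreader.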

Name	Role
karlicoss	Author

Relations

Relation	Sources
Discussed in	Promnesia | beepb00p