Right now, if I have saved a reddit post with a big list of URLs in it (like this one, say), it gets added to the visits table once for each URL in the list. Wasted space isn't really that much of an issue, but it's pretty clear that in this case the DB model is diverging from the mental model of the data.
A more natural model might be: I have a context (which might be empty) that applies to (potentially) multiple resources, each identified somehow. Currently, these identifiers are nominally normalized URLs, but they could be any string. Even now, I have some from hypothesis that look like
If I have a book in my database with an associated ISBN, it includes that ISBN in my data export in the form of a URL like https://www.worldcat.org/search?q=isbn:0538116706 (which presently gets normalized to
worldcat.org/search, which is useless). What I really want is for the identifier to be something more like
isbn:0538116706, and then if I'm visiting the Amazon page for the book, or a library's page for it, I want to be able to extract the ISBN and have promnesia notice that I have a note related to that identifier. That info is easier to extract in some cases than in others, but I want it to be possible in principle.
I might also want those contexts to be associated with the worldcat URL in particular (and for something like hypothesis annotations this is pretty certain), but the most important identifier there is the ISBN, not the URL. For some other books, I might have both ISBN and DOI, plus a link to the publisher's page. Only the last of these is most reasonably treated as a URL.
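To make the proposed model concrete, here is a rough sketch of "one context, many identifiers". It is not promnesia's actual schema: the `Context` class, its fields, and `build_index` are hypothetical names made up for illustration. The point is only that the context is stored once and looked up through any of its identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Context:
    """One saved piece of context (hypothetical shape, not promnesia's schema)."""
    source: str  # where it came from, e.g. "notes" or "reddit"
    text: str    # the context itself; may be empty
    # Every resource this context applies to, as opaque identifier strings.
    identifiers: frozenset = field(default_factory=frozenset)


def build_index(contexts):
    """Map each identifier to the contexts that apply to it."""
    index = defaultdict(list)
    for ctx in contexts:
        for ident in ctx.identifiers:
            index[ident].append(ctx)
    return index


# A note about a book: the ISBN is the primary identifier, and the
# worldcat URL is kept as a secondary one pointing at the same context.
book_note = Context(
    source="notes",
    text="Worth rereading chapter 3",
    identifiers=frozenset({
        "isbn:0538116706",
        "worldcat.org/search?q=isbn:0538116706",
    }),
)

index = build_index([book_note])
```

With this shape, a reddit post containing fifty links would be one `Context` with fifty identifiers rather than fifty rows that each repeat the post.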
There are a lot of popular sites that have some identifier that is better than a URL. A youtube video at https://www.youtube.com/watch?v=z8VhNF_0I5c might be
youtube-video:z8VhNF_0I5c. A tweet at https://twitter.com/karlicoss/status/1195256775555059713 might be
twitter-status:1195256775555059713. These identifiers can be reconstructed into URLs approximating the originals, or other relevant URLs like https://threadreaderapp.com/thread/1195256775555059713.html or https://mattw.io/youtube-metadata/?url=z8VhNF_0I5c&submit=true. Reconstructing URLs like that is maybe outside the scope of promnesia, but what is within its scope is that those last two URLs should map back to the same identifiers as the first two, so that I see my contexts related to a tweet even if I'm viewing it on threadreader.
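As a rough illustration of the mapping I have in mind, here is a minimal sketch that turns the example URLs above into identifiers. The function name `identify` and the per-site rules are my own assumptions, not anything promnesia currently provides, and the rules cover only the URL shapes quoted in this post (for instance, it assumes the mattw.io `url` parameter carries a bare video ID).

```python
import re
from urllib.parse import parse_qs, urlparse

# Patterns for the URL shapes quoted above; real sites have more variants.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]+")
_TWEET_PATH = re.compile(r"/[^/]+/status/(\d+)/?")
_THREAD_PATH = re.compile(r"/thread/(\d+)\.html")
_ISBN_QUERY = re.compile(r"isbn:([0-9Xx-]+)")


def identify(url):
    """Return a site-independent identifier for url, or None if no rule applies."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    query = parse_qs(parsed.query)

    def first(key):
        # First value of a query parameter, or "" when it is absent.
        return query.get(key, [""])[0]

    if host == "youtube.com" and parsed.path == "/watch":
        if _VIDEO_ID.fullmatch(first("v")):
            return "youtube-video:" + first("v")
    if host == "mattw.io" and parsed.path.startswith("/youtube-metadata"):
        if _VIDEO_ID.fullmatch(first("url")):
            return "youtube-video:" + first("url")
    if host == "twitter.com":
        match = _TWEET_PATH.fullmatch(parsed.path)
        if match:
            return "twitter-status:" + match.group(1)
    if host == "threadreaderapp.com":
        match = _THREAD_PATH.fullmatch(parsed.path)
        if match:
            return "twitter-status:" + match.group(1)
    if host == "worldcat.org" and parsed.path == "/search":
        match = _ISBN_QUERY.fullmatch(first("q"))
        if match:
            return "isbn:" + match.group(1)
    # No rule matched: the caller would fall back to the normalized URL.
    return None
```

Under these rules the twitter.com and threadreaderapp.com URLs both come out as twitter-status:1195256775555059713, and the worldcat search URL comes out as isbn:0538116706 instead of worldcat.org/search.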