by denormative » Sun Mar 24, 2013 5:10 am
rpapo wrote:
denormative wrote: It's doable, just inelegant. Would work like my get-all-editors-for-a-volume script I guess.
The biggest problem is that because nothing enforces any particular naming convention for web page URLs, it is easy to wind up with missed pages. That could be corrected, of course, with a cleanup of the wiki.
I haven't written a web spider in a long time (since the 90s, I think).
How do you mean? Novel chapters of a volume not being translated or something?
There are some 'special pages' that may already handle what you're asking about:
There are a bunch of orphaned pages that need to be cleaned up or re-linked:
http://www.baka-tsuki.org/project/index.php?title=Special:LonelyPages&limit=500&offset=0 (Granted, about 20 of them are probably my fault, from when I was running around <ref>-ing old series.)
And way too many dead-end pages:
http://www.baka-tsuki.org/project/index.php?title=Special:DeadendPages&limit=500&offset=0 (Since every page should link to the next and previous chapters and 'up', there really should be at least two links per page.)
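For anyone who would rather check this from a script than from the special pages, here is a minimal sketch of the same two checks in Python. It assumes you have already collected a map of page title to the set of titles that page links to (how you collect it is up to you). The function name, the page titles, and the `min_outgoing=2` threshold are my own illustration of the "at least two links per page" rule of thumb, not anything the wiki itself defines.

```python
def audit_links(links, min_outgoing=2):
    """Return (orphans, under_linked) for a map of page title -> set of linked titles.

    orphans: pages that no other page links to.
    under_linked: pages with fewer than min_outgoing links to other pages.
    """
    # Build the reverse map: which pages link *to* each page.
    incoming = {}
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a self-link does not rescue a page
                incoming.setdefault(target, set()).add(source)

    orphans = sorted(page for page in links if not incoming.get(page))
    under_linked = sorted(
        page for page, targets in links.items()
        if len(targets - {page}) < min_outgoing
    )
    return orphans, under_linked


# Hypothetical page titles, for illustration only.
links = {
    "Example:Index": {"Example:Ch1", "Example:Ch2"},
    "Example:Ch1": {"Example:Index", "Example:Ch2"},
    "Example:Ch2": {"Example:Index"},  # missing its 'previous' link
    "Example:Ch3": set(),              # nothing links here, and it links nowhere
}
orphans, under_linked = audit_links(links)
print(orphans)       # ['Example:Ch3']
print(under_linked)  # ['Example:Ch2', 'Example:Ch3']
```

Note that the under-linked check is stricter than a plain dead-end check: it also flags pages that have one link but are missing a next/previous/up link.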
But yes, the chapter names are a bit eccentric. Technically speaking, it shouldn't be difficult to automate the renaming of all the volumes and chapters (across the 60ish volumes I'm PDFing, there are only half a dozen different patterns to match). And the wiki already handles the reverse lookup of "what pages reference what pages", so updating the links that point at the renamed pages is merely somewhat more dangerous, not a technical challenge.
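To give an idea of what "half a dozen patterns" looks like in practice, here is a rough sketch in Python. The three patterns, the canonical "Series:Volume N Chapter M" form, and the sample titles are all made up for illustration; the real patterns would have to come from looking at the actual page names. It also ignores MediaWiki's own title normalisation (underscores versus spaces, first-letter case), so treat it as a starting point only.

```python
import re

# Hypothetical naming patterns; the real wiki's patterns would replace these.
PATTERNS = [
    re.compile(r"^(?P<series>.+?):Volume\s*(?P<vol>\d+)\s+Chapter\s*(?P<ch>\d+)$", re.I),
    re.compile(r"^(?P<series>.+?):Vol\.?\s*(?P<vol>\d+)\s+Ch\.?\s*(?P<ch>\d+)$", re.I),
    re.compile(r"^(?P<series>.+?)\s+v(?P<vol>\d+)\s*c(?P<ch>\d+)$", re.I),
]

# Matches the target part of a wiki link: [[Target]] or [[Target|label]].
LINK = re.compile(r"\[\[([^\]|#]+)")


def canonical_title(title):
    """Return the assumed canonical title, or None if no pattern matches."""
    for pattern in PATTERNS:
        m = pattern.match(title.strip())
        if m:
            return f"{m['series'].strip()}:Volume {int(m['vol'])} Chapter {int(m['ch'])}"
    return None


def build_rename_map(titles):
    """Split titles into {old: new} renames and a list needing manual review."""
    renames, unmatched = {}, []
    for title in titles:
        new = canonical_title(title)
        if new is None:
            unmatched.append(title)
        elif new != title:
            renames[title] = new
    return renames, unmatched


def rewrite_links(wikitext, renames):
    """Point links at the new titles, so no redirects are needed afterwards."""
    def swap(m):
        target = m.group(1)
        return "[[" + renames.get(target.strip(), target)
    return LINK.sub(swap, wikitext)


titles = [
    "Example Series:Volume1 Chapter2",
    "Example Series:Vol. 3 Ch 4",
    "Example Series v05c06",
    "Example Series:Volume 7 Chapter 8",  # already canonical
    "Example Series:Afterword",           # matches nothing, left for a human
]
renames, unmatched = build_rename_map(titles)
print(renames)
print(unmatched)
print(rewrite_links("Next: [[Example Series:Volume1 Chapter2|Chapter 2]]", renames))
```

Anything that lands in `unmatched` is exactly the kind of page that gets missed, so that list is worth reviewing by hand before touching anything on the live wiki.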
It's mainly a process issue: getting approval from one of the higher-ups to make the change; making sure people are notified before you do the rip-and-replace from underneath them (you don't need to put in thousands of redirects if you're doing this right); working out whether you want to handle the alt-language pages as well, or just the English ones; and so on.