Because splitting on "file:" gives a more accurate split, whereas /b/b2/ changes for each image.
Guest wrote: Why not just parse through to the original file at https://baka-tsuki.org/project/images/b ... 1_000a.jpg and save that directly?
Index wrote: Ignore the search part delimiter, that's not even going to be a problem. I plan to just split the page URL at "file:" and use what's after it as the file name.
An example of this would be:
Image Page Url: https://baka-tsuki.org/project/index.ph ... 1_000a.jpg
Where image would be "BTS_vol_01_000a.jpg"
Code: Select all
let page = "https://baka-tsuki.org/project/index.php?title=File:BTS_vol_01_000a.jpg";
// Split case-insensitively on "file:" and keep what follows as the file name.
let image = page.split(/file:/i)[1];
I plan to use the external function call logic here in util, like I did with the chapter name, which should help keep the issue of other sites at bay.
I will also do this regardless of the resolution setting, so we don't have to worry about extra parameters getting in the way.
Edit: There, done.
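For anyone following along, here is a rough sketch of the same idea that reads the "title" query parameter first, so that any extra parameters after it are ignored. The function name fileNameFromPageUrl is made up for illustration and is not part of the extension; it only assumes the page URL looks like the example above.

```javascript
// Hypothetical helper, not part of WebToEpub: pull the file name out of a
// MediaWiki-style "File:" page URL.
function fileNameFromPageUrl(pageUrl) {
    const url = new URL(pageUrl);
    // Prefer the "title" query parameter so other parameters are ignored;
    // fall back to the path when there is no such parameter.
    const title = url.searchParams.get("title") || decodeURIComponent(url.pathname);
    // Same case-insensitive split on "file:" as in the snippet above.
    const parts = title.split(/file:/i);
    // No "File:" marker found: return null rather than guess.
    return parts.length > 1 ? parts[1] : null;
}

const page = "https://baka-tsuki.org/project/index.php?title=File:BTS_vol_01_000a.jpg";
console.log(fileNameFromPageUrl(page));
```

Returning null when there is no "File:" marker leaves it to the caller to decide what to do with pages from other sites.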
Chrome Plug-in to Convert Web Page to EPUB
-
- Mikuru's Master
- Posts: 28
- Joined: Fri May 08, 2015 7:00 pm
- Favourite Light Novel: Toaru Majutsu No Index
Re: Chrome Plug-in to Convert Web Page to EPUB
-
- Literature Club Member
- Posts: 31
- Joined: Fri Sep 19, 2014 10:02 pm
- Favourite Light Novel:
Re: Chrome Plug-in to Convert Web Page to EPUB
Version 0.0.0.6 has just been submitted to Google and Mozilla. The Chrome version should be available in a few hours. The Firefox version should be available now (just not validated).
Fixes (supplied by Index a.k.a belldandu)
- Image Gallery box is reformatted, should be cleaner.
- Option to remove images from the gallery that appear elsewhere in the text.
- Files inside ePUB now have more meaningful names.
- ken_arata
- Literature Club Member
- Posts: 39
- Joined: Sat Sep 27, 2014 12:38 pm
- Favourite Light Novel: Kamachi Projects, Clockwork Planet, Campione, Fate Zero and many others
- Contact:
Re: Chrome Plug-in to Convert Web Page to EPUB
Original filename: "Kamachi_Crossover_005.jpg"
Filename in epub: "[0004]Kamachi_Crossover_00....jpg"
Why do you do this? I've been saving images from Baka-Tsuki for years but I've never encountered images with invalid names/characters. So why do you still need to rename them?
There's also no reason to create a new file every time a new heading appears, especially for h3 headings.
Now to me it just sounds like I'm being too obnoxious about really tiny details. ._.
Also, many of the bugs that other users and I are reporting could easily be found by checking the output of your extension. Do you not check what your extension is doing?
Sorry if I sounded rude, I'm just curious about what led to the current situation.
I do... stuff.
-
- Astral Realm
Re: Chrome Plug-in to Convert Web Page to EPUB
Tried 0.0.0.6 with Blood Sign (https://www.baka-tsuki.org/project/inde ... gn:Volume1) and got this error:
Code: Select all
Error packing EPUB. Error: TypeError: Cannot read property 'nextSibling' of null
    at BakaTsukiParser.insertAfter (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:219:29)
    at BakaTsukiParser.stripGalleryBox (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:160:14)
    at BakaTsukiParser.processImages (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:141:10)
    at BakaTsukiParser.epubItemSupplier (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:88:10)
    at packEpub (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/main.js:86:60)
    at chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/main.js:76:13
Tried it with a couple of other series and had no problems there, so I'm not entirely sure what's going on here.
-
- Mikuru's Master
- Posts: 28
- Joined: Fri May 08, 2015 7:00 pm
- Favourite Light Novel: Toaru Majutsu No Index
Re: Chrome Plug-in to Convert Web Page to EPUB
dteviot wrote: Version 0.0.0.6 has just been submitted to Google and Mozilla. The Chrome version should be available in a few hours. The Firefox version should be available now (just not validated).
Fixes (supplied by Index a.k.a belldandu)
- Image Gallery box is reformatted, should be cleaner.
- Option to remove images from the gallery that appear elsewhere in the text.
- Files inside ePUB now have more meaningful names.
https://github.com/dteviot/WebToEpub/pull/19 fixes a bunch of craziness with epubcheck, as well as one other bug that flew under the radar >.>
-
- Literature Club Member
- Posts: 31
- Joined: Fri Sep 19, 2014 10:02 pm
- Favourite Light Novel:
Re: Chrome Plug-in to Convert Web Page to EPUB
ken_arata wrote: Original filename: "Kamachi_Crossover_005.jpg"
Filename in epub: "[0004]Kamachi_Crossover_00....jpg"
Why do you do this? I've been saving images from Baka-Tsuki for years but I've never encountered images with invalid names/characters. So why do you still need to rename them?
Because the converter will be working with other sites that don't follow the rules.
Also, if you take the reduced size images, you get filenames like https://baka-tsuki.org/project/thumb.ph ... &width=427, which require some, ah... fixing.
ken_arata wrote: There's also no reason to create a new file every time a new heading appears, especially for h3 headings.
Now to me it just sounds like I'm being too obnoxious about really tiny details. ._.
My preferred ePUB reader works better with lots of small files, rather than one large one.
ken_arata wrote: Also, many of the bugs that other users and I are reporting could easily be found by checking the output of your extension. Do you not check what your extension is doing?
Sorry if I sounded rude, I'm just curious about what led to the current situation.
Because I did not know epubcheck existed.
I originally built the converter for me, and it works fine for the epub reader I use.
Making it available to others was, well, I hoped people would find it useful.
-
- Literature Club Member
- Posts: 31
- Joined: Fri Sep 19, 2014 10:02 pm
- Favourite Light Novel:
Re: Chrome Plug-in to Convert Web Page to EPUB
Guest wrote: Tried 0.0.0.6 with Blood Sign (https://www.baka-tsuki.org/project/inde ... gn:Volume1) and got this error
See: https://github.com/dteviot/WebToEpub/issues/18
-
- Mikuru's Master
- Posts: 28
- Joined: Fri May 08, 2015 7:00 pm
- Favourite Light Novel: Toaru Majutsu No Index
Re: Chrome Plug-in to Convert Web Page to EPUB
ken_arata wrote: Original filename: "Kamachi_Crossover_005.jpg"
Filename in epub: "[0004]Kamachi_Crossover_00....jpg"
Why do you do this? I've been saving images from Baka-Tsuki for years but I've never encountered images with invalid names/characters. So why do you still need to rename them?
There's also no reason to create a new file every time a new heading appears, especially for h3 headings.
Now to me it just sounds like I'm being too obnoxious about really tiny details. ._.
Also, many of the bugs that other users and I are reporting could easily be found by checking the output of your extension. Do you not check what your extension is doing?
Sorry if I sounded rude, I'm just curious about what led to the current situation.
I preserve as much of the file name as possible while trying to keep it unique. We do this because we are trying to stay within file system limits and within the bounds of epubcheck.
The main thing is cross compatibility.
If you want the original file name then look at the sources. The source URL will be in the desc tag within the svg tag in 0.0.0.7, thanks to this pull request: https://github.com/dteviot/WebToEpub/pull/19. All you have to do is right click the image in your epub reader and click inspect (depending on your reader).
The main point here is that we don't care if you yourself have never had problems with them before. The issues caused by excessively long file names (especially on NTFS systems) are more annoying than you think. If the name is longer than 20 characters it will now get truncated in the middle, and a number is added to the beginning in case things aren't unique. This is to prevent any and all issues that could occur with any epub reader on any OS.
Also, it is actually smarter to split on h3 headings rather than having 1 big file. Epub is a zip file after all. Splitting on h3 headings (AKA chapters/parts) makes it easier for anyone who wants to make small edits (after the source is no longer available), since they won't have to scroll for a while to find what they want to edit.
As for the "many" bugs: some of these bugs only happen on specific pages and not on every page. If we had time to go through every single page on baka-tsuki looking for slight format changes then we wouldn't need the help of our dear users who are testing, now would we?
Also, I'm pretty sure the extension would never have gotten this far if we weren't checking the output of the extension. Hell, there are even test scripts that check if the output is incorrect and scream at us if even the tiniest thing is different.
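To make the splitting argument concrete, here is a rough sketch of splitting a page into files at headings. It is not the extension's actual code: splitIntoChapters and the { tag, text } element shape are made up for illustration, and the heading levels to split on are passed in so the effect of each choice is easy to compare.

```javascript
// Hypothetical sketch: group a flat list of elements into chapters, starting
// a new chapter at each element whose tag is listed in splitTags.
function splitIntoChapters(elements, splitTags) {
    const chapters = [];
    let current = null;
    for (const element of elements) {
        const isSplitPoint = splitTags.includes(element.tag);
        // Also open a chapter for any content that appears before the first heading.
        if (current === null || isSplitPoint) {
            current = { title: isSplitPoint ? element.text : null, elements: [] };
            chapters.push(current);
        }
        current.elements.push(element);
    }
    return chapters;
}

const page = [
    { tag: "h2", text: "Chapter 1" },
    { tag: "p", text: "..." },
    { tag: "h3", text: "Part 1" },
    { tag: "p", text: "..." },
    { tag: "h3", text: "Part 2" },
    { tag: "p", text: "..." },
    { tag: "h2", text: "Chapter 2" },
    { tag: "p", text: "..." },
];

console.log(splitIntoChapters(page, ["h2", "h3"]).length); // 4 files
console.log(splitIntoChapters(page, ["h2"]).length); // 2 files
```

Splitting on both levels gives more, smaller files; splitting only on the top level gives fewer, larger ones.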
- ken_arata
- Literature Club Member
- Posts: 39
- Joined: Sat Sep 27, 2014 12:38 pm
- Favourite Light Novel: Kamachi Projects, Clockwork Planet, Campione, Fate Zero and many others
- Contact:
Re: Chrome Plug-in to Convert Web Page to EPUB
dteviot wrote: Because I did not know epubcheck existed.
I originally built the converter for me, and it works fine for the epub reader I use.
Making it available to others was, well, I hoped people would find it useful.
Ah, that happens. I myself had no idea epub existed until sometime around last year, when I was completely frustrated by how user-unfriendly PDFs are. Good to see that you're fixing it to increase compatibility.
I will continue to post stuff to improve it. (Right now I am thinking of adding an optional credits page. I have the html template almost ready, but I figured you're busy with the new fixes so I'm holding onto this for the time being.)
Index wrote: As for the "many" bugs: some of these bugs only happen on specific pages and not on every page. If we had time to go through every single page on baka-tsuki looking for slight format changes then we wouldn't need the help of our dear users who are testing, now would we?
I was referring to the more common issues, like wrong syntax in the opf and the like. ;_;
Index wrote: Also, it is actually smarter to split on h3 headings rather than having 1 big file.
I understand this, that's why I asked to split at main headings (chapter headings). That way I can get 5-10 files that I can edit easily. The parts (h3 and lower) are usually way too small to require a new file, and it's actually confusing trying to find the correct file among almost 50-100 files in some of the epubs (one of the zashiki warashi epubs had 72 html files). It's up to you though, I won't be bugging you about this any further.
Index wrote: The main point here is that we don't care if you yourself have never had problems with them before. The issues caused by excessively long file names (especially on NTFS systems) are more annoying than you think.
No idea how this works so I'm trusting you on this.
Index wrote: If you want the original file name then look at the sources. The source URL will be in the desc tag within the svg tag in 0.0.0.7, thanks to this pull request: https://github.com/dteviot/WebToEpub/pull/19. All you have to do is right click the image in your epub reader and click inspect (depending on your reader).
Thanks. Noticed that you fixed a bunch of stuff I was about to report.
Also just realized that I could just change the naming scheme by myself, should've thought things through a bit more.
If I really want to make a version for myself that keeps image names as is or uses a different naming convention altogether, should I only change the stuff in "var safeForFileName = function (title)" in the Util.js or is there more stuff? Help please.
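As a starting point for experimenting with a different naming convention, here is a rough sketch of the kind of scheme described earlier in the thread (a numeric prefix, plus truncation in the middle for names over 20 characters). This is NOT the extension's safeForFileName or makeStorageFileName; shortenFileName and its parameters are made up for illustration, and the real rules may differ.

```javascript
// Hypothetical sketch, NOT the actual safeForFileName / makeStorageFileName.
function shortenFileName(name, index, maxLength = 20) {
    // Separate the extension so it always survives truncation.
    const dot = name.lastIndexOf(".");
    const base = dot > 0 ? name.slice(0, dot) : name;
    const extension = dot > 0 ? name.slice(dot) : "";

    let shortBase = base;
    if (base.length > maxLength) {
        // Keep the start and the end of the name and replace the middle with
        // "...", so the shortened base is exactly maxLength characters long.
        const keep = maxLength - 3;
        const keepStart = Math.ceil(keep / 2);
        const keepEnd = Math.floor(keep / 2);
        shortBase = base.slice(0, keepStart) + "..." + base.slice(base.length - keepEnd);
    }

    // A zero-padded counter keeps names unique even if two shortened names collide.
    const prefix = "[" + String(index).padStart(4, "0") + "]";
    return prefix + shortBase + extension;
}

console.log(shortenFileName("BTS_vol_01_000a.jpg", 1));
console.log(shortenFileName("Kamachi_Crossover_005.jpg", 4));
```

To keep image names as they are, a personal build could return the name unchanged here, at the cost of the uniqueness and length guarantees.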
I do... stuff.
-
- Literature Club Member
- Posts: 31
- Joined: Fri Sep 19, 2014 10:02 pm
- Favourite Light Novel:
Re: Chrome Plug-in to Convert Web Page to EPUB
ken_arata wrote: If I really want to make a version for myself that keeps image names as is or uses a different naming convention altogether, should I only change the stuff in "var safeForFileName = function (title)" in the Util.js or is there more stuff? Help please.
That's about right. Although you'll want to either wait until I merge Belldandu's stuff into the Sonako branch https://github.com/dteviot/WebToEpub/branches, or grab from Belldandu. https://github.com/belldandu/WebToEpub/ ... e10Cleanup
Also, congratulations, you've just seen the beauty of Open Source software. If it doesn't quite do what you want, you can change it.
-
- Mikuru's Master
- Posts: 28
- Joined: Fri May 08, 2015 7:00 pm
- Favourite Light Novel: Toaru Majutsu No Index
Re: Chrome Plug-in to Convert Web Page to EPUB
ken_arata wrote:
Index wrote: If you want the original file name then look at the sources. The source URL will be in the desc tag within the svg tag in 0.0.0.7, thanks to this pull request: https://github.com/dteviot/WebToEpub/pull/19. All you have to do is right click the image in your epub reader and click inspect (depending on your reader).
Thanks. Noticed that you fixed a bunch of stuff I was about to report.
Also just realized that I could just change the naming scheme by myself, should've thought things through a bit more.
If I really want to make a version for myself that keeps image names as is or uses a different naming convention altogether, should I only change the stuff in "var safeForFileName = function (title)" in the Util.js or is there more stuff? Help please.
Both safeForFileName and makeStorageFileName.
-
- Literature Club Member
- Posts: 31
- Joined: Fri Sep 19, 2014 10:02 pm
- Favourite Light Novel:
Re: Chrome Plug-in to Convert Web Page to EPUB
dteviot wrote:
Guest wrote: Tried 0.0.0.6 with Blood Sign (https://www.baka-tsuki.org/project/inde ... gn:Volume1) and got this error
See: https://github.com/dteviot/WebToEpub/issues/18
And... version 0.0.0.7, which fixes the problem, has been sent to Google.
The problem is stories where the Image Gallery at the start lacks a title.
Question: should someone update these stories on the Baka-Tsuki site?
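For the curious, the "Cannot read property 'nextSibling' of null" error reported above is what you get when an insert-after helper is handed a reference node that does not exist, such as a missing gallery title. Below is a rough sketch of a guarded version. It is not the actual BakaTsukiParser.insertAfter or the actual 0.0.0.7 fix; it only uses the standard DOM insertBefore/nextSibling idiom.

```javascript
// Hypothetical sketch, not the actual BakaTsukiParser.insertAfter.
// Inserts newNode directly after referenceNode. Returns false instead of
// throwing when there is no reference node (for example, a gallery without
// a title element) or when the reference node has no parent.
function insertAfter(referenceNode, newNode) {
    if (!referenceNode || !referenceNode.parentNode) {
        return false;
    }
    // insertBefore with a null second argument appends at the end of the
    // parent, so this also works when referenceNode is the last child.
    referenceNode.parentNode.insertBefore(newNode, referenceNode.nextSibling);
    return true;
}
```

The caller can then choose a fallback position for the gallery when the function returns false.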
-
- Mikuru's Master
- Posts: 28
- Joined: Fri May 08, 2015 7:00 pm
- Favourite Light Novel: Toaru Majutsu No Index
Re: Chrome Plug-in to Convert Web Page to EPUB
It has come to my attention that some epub readers are non-compliant with the SVG standard when it comes to <desc> tags, and some are just doing their own thing with URLs that are not in an "<a href". See https://github.com/dteviot/WebToEpub/issues/22
Please report this non-compliance to these readers' developers in whatever way you can so that this can be fixed. Feel free to link that issue when reporting bugs like this.
-
- Astral Realm
Re: Chrome Plug-in to Convert Web Page to EPUB
Using the "remove duplicate images" option leaves a bunch of empty
Code: Select all
<div>
</div>
<div>
</div>
scattered around in the illustrations html.
I mean, I guess it doesn't really affect the user-facing result so it's hardly high priority, but it does seem a wee bit untidy.
Index wrote: It has come to my attention that some epub readers are non-compliant with the SVG standard when it comes to <desc> tags, and some are just doing their own thing with URLs that are not in an "<a href". See https://github.com/dteviot/WebToEpub/issues/22
Please report this non-compliance to these readers' developers in whatever way you can so that this can be fixed. Feel free to link that issue when reporting bugs like this.
Is there any chance of making the addition of URLs to the image descriptions optional, just as a workaround, in case of developers being lazy shits?
-
- Astral Realm
Re: Chrome Plug-in to Convert Web Page to EPUB
Index wrote: It has come to my attention that some epub readers are non-compliant with the SVG standard when it comes to <desc> tags, and some are just doing their own thing with URLs that are not in an "<a href". See https://github.com/dteviot/WebToEpub/issues/22
Please report this non-compliance to these readers' developers in whatever way you can so that this can be fixed. Feel free to link that issue when reporting bugs like this.
Just tested and found that Moon+ Reader (https://play.google.com/store/apps/deta ... moonreader) does have this problem with showing the link below the image. The intricacies of HTML code have gone a little over my head though, so I'm not quite sure what to say in an email to adequately describe the cause of the issue to the developer. Could somebody more technically minded please shoot him an email? The developer's email is on the Play Store page.