Chrome Plug-in to Convert Web Page to EPUB

Index
Mikuru's Master
Posts: 28
Joined: Fri May 08, 2015 7:00 pm
Favourite Light Novel: Toaru Majutsu No Index

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Index »

Guest wrote:
Index wrote: Ignore the search part delimiter, that's not even gonna be a problem. I plan to just split the page URL at "file:" and use what's after it as the file name.

An example of this would be:

Image Page Url: https://baka-tsuki.org/project/index.ph ... 1_000a.jpg

Code: Select all

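// Image page URL on Baka-Tsuki (a MediaWiki "File:" page).
let page = "https://baka-tsuki.org/project/index.php?title=File:BTS_vol_01_000a.jpg";
// Split case-insensitively on "file:"; the second piece is the file name.
let image = page.split(/file:/i)[1];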
Where image would be "BTS_vol_01_000a.jpg"
I plan to use the external function call logic here in util like I did with the chapter name, which should help keep the issue of other sites at bay.

I will also do this regardless of the resolution setting, to avoid having to worry about extra parameters getting in the way.

Edit: There, done.
Why not just parse through to the original file at https://baka-tsuki.org/project/images/b ... 1_000a.jpg and save that directly?
Because splitting on "file:" gives a more reliable split, whereas the /b/b2/ part of the path changes for each image.
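For anyone following along, here is a rough sketch of that split, extended to drop trailing parameters such as a width setting. The function name imageNameFromPageUrl is made up for illustration and is not necessarily what the extension's util code uses.

```javascript
// Hypothetical helper (not the extension's actual code): pull the image file
// name out of a MediaWiki "File:" page URL.
function imageNameFromPageUrl(pageUrl) {
    // Case-insensitive split; the piece after the first "file:" is the candidate name.
    const parts = pageUrl.split(/file:/i);
    if (parts.length < 2) {
        return null; // not a "File:" page URL
    }
    // Drop any trailing query parameters or fragment (e.g. "&width=427").
    const name = parts[1].split(/[&#?]/)[0];
    return decodeURIComponent(name);
}

const page = "https://baka-tsuki.org/project/index.php?title=File:BTS_vol_01_000a.jpg";
console.log(imageNameFromPageUrl(page)); // BTS_vol_01_000a.jpg
```

Returning null for URLs without "file:" leaves it to the caller to decide what to do with images from other sites.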
dteviot
Literature Club Member
Posts: 31
Joined: Fri Sep 19, 2014 10:02 pm
Favourite Light Novel:

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by dteviot »

Version 0.0.0.6 has just been submitted to Google and Mozilla. The Chrome version should be available in a few hours. The Firefox version should be available now (just not validated).
Fixes (supplied by Index, a.k.a. belldandu):
  • Image Gallery box is reformatted, should be cleaner.
  • Option to remove images from the gallery that appear elsewhere in the text.
  • Files inside ePUB now have more meaningful names.
ken_arata
Literature Club Member
Posts: 39
Joined: Sat Sep 27, 2014 12:38 pm
Favourite Light Novel: Kamachi Projects, Clockwork Planet, Campione, Fate Zero and many others

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by ken_arata »

Original filename: "Kamachi_Crossover_005.jpg"
Filename in epub: "[0004]Kamachi_Crossover_00....jpg"

Why do you do this? I've been saving images from Baka-Tsuki for years but I've never encountered images with invalid names/characters. So why do you still need to rename them?
There's also no reason to create a new file every time a new heading appears, especially h3 headings.
Now to me it just sounds like I'm being too obnoxious about really tiny details. ._.

Also, many of the bugs that other users and I are reporting could easily be found by checking the output of your extension. Do you not check what your extension is doing?

Sorry if I sounded rude, I'm just curious about what led to the current situation.
I do... stuff.
Guest
Astral Realm

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Guest »

Tried 0.0.0.6 with Blood Sign (https://www.baka-tsuki.org/project/inde ... gn:Volume1) and got this error

Code: Select all

Error packing EPUB. Error: TypeError: Cannot read property 'nextSibling' of null
    at BakaTsukiParser.insertAfter (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:219:29)
    at BakaTsukiParser.stripGalleryBox (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:160:14)
    at BakaTsukiParser.processImages (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:141:10)
    at BakaTsukiParser.epubItemSupplier (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/parsers/BakaTsukiParser.js:88:10)
    at packEpub (chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/main.js:86:60)
    at chrome-extension://akiljllkbielkidmammnifcnibaigelm/js/main.js:76:13
Tried it with a couple of other series and had no problems there, so I'm not entirely sure what's going on here.
Index
Mikuru's Master
Posts: 28
Joined: Fri May 08, 2015 7:00 pm
Favourite Light Novel: Toaru Majutsu No Index

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Index »

dteviot wrote: Version 0.0.0.6 has just been submitted to Google and Mozilla. The Chrome version should be available in a few hours. The Firefox version should be available now (just not validated).
Fixes (supplied by Index, a.k.a. belldandu):
  • Image Gallery box is reformatted, should be cleaner.
  • Option to remove images from the gallery that appear elsewhere in the text.
  • Files inside ePUB now have more meaningful names.
https://github.com/dteviot/WebToEpub/pull/19 fixes a bunch of craziness with epubcheck, as well as one other bug that flew under the radar >.>
dteviot
Literature Club Member
Posts: 31
Joined: Fri Sep 19, 2014 10:02 pm
Favourite Light Novel:

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by dteviot »

ken_arata wrote:Original filename: "Kamachi_Crossover_005.jpg"
Filename in epub: "[0004]Kamachi_Crossover_00....jpg"

Why do you do this? I've been saving images from Baka-Tsuki for years but I've never encountered images with invalid names/characters. So why do you still need to rename them?
Because the converter will be working with other sites that don't follow the rules.
Also, if you take the reduced size images, you get filenames like https://baka-tsuki.org/project/thumb.ph ... &width=427, which require some, ah... fixing.
ken_arata wrote: There's also no reason to create a new file every time a new heading appears, especially h3 headings.
Now to me it just sounds like I'm being too obnoxious about really tiny details. ._.
My preferred ePUB reader works better with lots of small files, rather than one large one.
ken_arata wrote: Also, many of the bugs that other users and I are reporting could easily be found by checking the output of your extension. Do you not check what your extension is doing?
Sorry if I sounded rude, I'm just curious about what led to the current situation.
Because I did not know epubcheck existed.
I originally built the converter for me, and it works fine for the epub reader I use.
Making it available to others was, well, I hoped people would find it useful.
dteviot
Literature Club Member
Posts: 31
Joined: Fri Sep 19, 2014 10:02 pm
Favourite Light Novel:

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by dteviot »

Guest wrote:Tried 0.0.0.6 with Blood Sign (https://www.baka-tsuki.org/project/inde ... gn:Volume1) and got this error
See: https://github.com/dteviot/WebToEpub/issues/18
Index
Mikuru's Master
Posts: 28
Joined: Fri May 08, 2015 7:00 pm
Favourite Light Novel: Toaru Majutsu No Index

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Index »

ken_arata wrote:Original filename: "Kamachi_Crossover_005.jpg"
Filename in epub: "[0004]Kamachi_Crossover_00....jpg"

Why do you do this? I've been saving images from Baka-Tsuki for years but I've never encountered images with invalid names/characters. So why do you still need to rename them?
There's also no reason to create a new file every time a new heading appears, especially h3 headings.
Now to me it just sounds like I'm being too obnoxious about really tiny details. ._.

Also, many of the bugs that other users and I are reporting could easily be found by checking the output of your extension. Do you not check what your extension is doing?

Sorry if I sounded rude, I'm just curious about what led to the current situation.
I preserve as much of the file name as possible while trying to keep it unique. The reason we are doing what we are doing is because we are trying to stay within file system limits and within the bounds of epubcheck.

The main thing is cross compatibility.

If you want the original file name then look at the sources. In 0.0.0.7 the source URL will be in the desc tag within the svg tag, thanks to this pull request: https://github.com/dteviot/WebToEpub/pull/19. All you have to do is right click the image in your epub reader and click inspect (depending on your reader).

The main point here is that we don't care if you yourself have never had problems with them before. The issues caused by excessively long file names (especially on NTFS systems) are more annoying than you think. If the name is longer than 20 characters it will now get truncated in the middle, and a number is added to the beginning in case things aren't unique. This is to prevent any and all issues that could occur with any epub reader on any OS.
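To give a feel for the idea, here is a minimal sketch of "truncate in the middle, put a number in front". It is an illustration only: the name shortenFileName, the four-digit index and the exact split point are assumptions, so it will not reproduce the extension's output character for character.

```javascript
// Hypothetical sketch (not the extension's actual code): keep a file name
// within maxLength characters by cutting out the middle of the base name,
// and prefix a zero-padded index so names stay unique.
function shortenFileName(index, originalName, maxLength = 20) {
    const dot = originalName.lastIndexOf(".");
    const base = dot > 0 ? originalName.slice(0, dot) : originalName;
    const ext = dot > 0 ? originalName.slice(dot) : "";
    let shortBase = base;
    if (base.length > maxLength) {
        const keep = maxLength - 3;         // leave room for "..."
        const head = Math.ceil(keep / 2);   // characters kept from the start
        const tail = Math.floor(keep / 2);  // characters kept from the end
        shortBase = base.slice(0, head) + "..." + base.slice(base.length - tail);
    }
    const prefix = "[" + String(index).padStart(4, "0") + "]";
    return prefix + shortBase + ext;
}

console.log(shortenFileName(4, "Kamachi_Crossover_005.jpg")); // [0004]Kamachi_C...over_005.jpg
console.log(shortenFileName(1, "cover.png"));                 // [0001]cover.png
```

Keeping both the start and the end of the name means volume and page numbers at the end of a long name usually survive the cut.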

Also, it is actually smarter to split on h3 headings rather than having one big file. An epub is a zip file after all. Splitting on h3 headings (a.k.a. chapters/parts) makes it easier for anyone who wants to make small edits (after the source is no longer available), since they won't have to scroll for a while to find what they want to edit.
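As a sketch of what splitting on headings amounts to (hypothetical code, not lifted from the extension): walk the page's top-level nodes in order and start a new chunk whenever a heading of a chosen level shows up. The splitTags parameter is an assumption added here to show that the heading level could be made adjustable.

```javascript
// Hypothetical sketch: split a flat list of nodes into chunks, starting a new
// chunk at every heading whose tag is listed in splitTags. A "node" here is
// anything with a tagName property, so DOM elements or plain objects both work.
function splitAtHeadings(nodes, splitTags = ["H1", "H2", "H3"]) {
    const chunks = [];
    let current = [];
    for (const node of nodes) {
        const tag = (node.tagName || "").toUpperCase();
        // A matching heading closes the chunk collected so far.
        if (splitTags.includes(tag) && current.length > 0) {
            chunks.push(current);
            current = [];
        }
        current.push(node);
    }
    if (current.length > 0) {
        chunks.push(current);
    }
    return chunks;
}

const nodes = [
    { tagName: "H2" }, { tagName: "P" },
    { tagName: "H3" }, { tagName: "P" },
    { tagName: "H3" }, { tagName: "P" }
];
console.log(splitAtHeadings(nodes).length);               // 3
console.log(splitAtHeadings(nodes, ["H1", "H2"]).length); // 1
```

Each chunk would then become one file inside the epub.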

As for the "many" bugs: some of these bugs only show up on specific pages, not every page. If we had time to go through every single page on Baka-Tsuki looking for slight format changes, then we wouldn't need the help of our dear users who are testing, now would we?

Also, I'm pretty sure the extension would never have gotten this far if we weren't checking its output. Hell, there are even test scripts that check whether the output is incorrect and scream at us if even the tiniest thing is different.
ken_arata
Literature Club Member
Posts: 39
Joined: Sat Sep 27, 2014 12:38 pm
Favourite Light Novel: Kamachi Projects, Clockwork Planet, Campione, Fate Zero and many others

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by ken_arata »

dteviot wrote: Because I did not know epubcheck existed.
I originally built the converter for me, and it works fine for the epub reader I use.
Making it available to others was, well, I hoped people would find it useful.
Ah that happens, I myself had no idea epub existed until sometime around last year when I was completely frustrated by how not-user-friendly pdfs are. Good to see that you're fixing it for increasing compatibility.
I will continue to post stuff to improve it. (Right now I am thinking of adding an optional credits page. I have the html template almost ready, but I figured you're busy with the new fixes so I'm holding onto this for the time being.)
Index wrote: As for the "many" bugs: some of these bugs only show up on specific pages, not every page. If we had time to go through every single page on Baka-Tsuki looking for slight format changes, then we wouldn't need the help of our dear users who are testing, now would we?
I was referring to the more common issues, like wrong syntax in the opf and the like. ;_;
Index wrote: Also, it is actually smarter to split on h3 headings rather than having one big file.
I understand this, that's why I asked to split at main headings (chapter headings). That way I can get 5-10 files that I can edit easily. The parts (h3 and lower) are usually way too small to require a new file, and it's actually confusing trying to find the correct file among almost 50-100 files in some of the epubs (one of the zashiki warashi epubs had 72 html files). It's up to you though, I won't be bugging you about this any further.
Index wrote: The main point here is that we don't care if you yourself have never had problems with them before. The issues caused by excessively long file names (especially on NTFS systems) are more annoying than you think.
No idea how this works so I'm trusting you on this.
Index wrote: If you want the original file name then look at the sources. In 0.0.0.7 the source URL will be in the desc tag within the svg tag, thanks to this pull request: https://github.com/dteviot/WebToEpub/pull/19. All you have to do is right click the image in your epub reader and click inspect (depending on your reader).
Thanks. Noticed that you fixed a bunch of stuff I was about to report.
Also just realized that I could just change the naming scheme by myself, should've thought things through a bit more.
If I really want to make a version for myself that keeps image names as is or uses a different naming convention altogether, should I only change the stuff in "var safeForFileName = function (title)" in the Util.js or is there more stuff? Help please.
I do... stuff.
dteviot
Literature Club Member
Posts: 31
Joined: Fri Sep 19, 2014 10:02 pm
Favourite Light Novel:

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by dteviot »

ken_arata wrote:If I really want to make a version for myself that keeps image names as is or uses a different naming convention altogether, should I only change the stuff in "var safeForFileName = function (title)" in the Util.js or is there more stuff? Help please.
That's about right. Although you'll want to either wait until I merge Belldandu's stuff into the Sonako branch (https://github.com/dteviot/WebToEpub/branches), or grab it from Belldandu: https://github.com/belldandu/WebToEpub/ ... e10Cleanup

Also, congratulations, you've just seen the beauty of Open Source software. If it doesn't quite do what you want, you can change it.
Index
Mikuru's Master
Posts: 28
Joined: Fri May 08, 2015 7:00 pm
Favourite Light Novel: Toaru Majutsu No Index

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Index »

ken_arata wrote:
Index wrote: If you want the original file name then look at the sources. In 0.0.0.7 the source URL will be in the desc tag within the svg tag, thanks to this pull request: https://github.com/dteviot/WebToEpub/pull/19. All you have to do is right click the image in your epub reader and click inspect (depending on your reader).
Thanks. Noticed that you fixed a bunch of stuff I was about to report.
Also just realized that I could just change the naming scheme by myself, should've thought things through a bit more.
If I really want to make a version for myself that keeps image names as is or uses a different naming convention altogether, should I only change the stuff in "var safeForFileName = function (title)" in the Util.js or is there more stuff? Help please.
Both safeForFileName and makeStorageFileName.
dteviot
Literature Club Member
Posts: 31
Joined: Fri Sep 19, 2014 10:02 pm
Favourite Light Novel:

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by dteviot »

dteviot wrote:
Guest wrote:Tried 0.0.0.6 with Blood Sign (https://www.baka-tsuki.org/project/inde ... gn:Volume1) and got this error
See: https://github.com/dteviot/WebToEpub/issues/18
And... version 0.0.0.7, which fixes the problem, has been sent to Google.
The problem is stories where the Image Gallery at the start lacks a title.
Question: should someone update these stories on the Baka-Tsuki site?
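For the curious, the general shape of a guard against this kind of failure might look like the sketch below. It assumes the null in the stack trace came from looking up a gallery title that was not there; insertAfterOrAppend and fallbackParent are made-up names, and this is not necessarily how 0.0.0.7 fixes it.

```javascript
// Hypothetical guard (not the actual fix): insert newNode after referenceNode,
// or append it to fallbackParent when there is no usable reference node,
// e.g. when a gallery has no title element.
function insertAfterOrAppend(newNode, referenceNode, fallbackParent) {
    if (referenceNode && referenceNode.parentNode) {
        // insertBefore with a null second argument appends at the end,
        // so this also works when referenceNode is the last child.
        referenceNode.parentNode.insertBefore(newNode, referenceNode.nextSibling);
    } else {
        fallbackParent.appendChild(newNode);
    }
}
```

The function only relies on the standard DOM methods insertBefore and appendChild, so it can be tried out against any element in the browser console.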
Index
Mikuru's Master
Posts: 28
Joined: Fri May 08, 2015 7:00 pm
Favourite Light Novel: Toaru Majutsu No Index

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Index »

It has come to my attention that some epub readers are non-compliant with the SVG standard when it comes to <desc> tags, and some are just doing their own thing with URLs that are not in an <a href>. See https://github.com/dteviot/WebToEpub/issues/22

Please report this non-compliance to these readers' developers in whatever way you can so that this can be fixed. Feel free to link that issue when reporting bugs like this.
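For reference, this is roughly the kind of markup in question, sketched as a string builder. The function names and the exact attributes are assumptions for illustration and may not match what the extension emits. Per the SVG spec, desc is descriptive metadata and is not meant to be drawn as part of the graphic.

```javascript
// Escape text for safe use inside XML element content and attribute values.
function escapeXml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical sketch: wrap an image in an <svg> element and record where it
// came from in a <desc> child.
function makeImageSvg(href, width, height, sourceUrl) {
    return `<svg xmlns="http://www.w3.org/2000/svg" ` +
        `xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" ` +
        `width="100%" height="100%" viewBox="0 0 ${width} ${height}">` +
        `<desc>${escapeXml(sourceUrl)}</desc>` +
        `<image xlink:href="${escapeXml(href)}" width="${width}" height="${height}"/>` +
        `</svg>`;
}

console.log(makeImageSvg("images/0001_cover.jpg", 600, 800,
    "https://example.com/index.php?title=File:cover.jpg&width=600"));
```

A reader that prints the URL under the picture is rendering the desc content as visible text.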
Guest
Astral Realm

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Guest »

Using the "remove duplicate images" option leaves a bunch of empty

Code: Select all

<div>
			
			
		</div>
<div>
			
			
		</div>
scattered around in the illustrations html.

I mean, I guess it doesn't really affect the user-facing result so it's hardly high priority, but it does seem a wee bit untidy.
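Something like the following might tidy it up, sketched as a post-processing pass over the serialized HTML rather than whatever DOM code the extension really uses. stripEmptyDivs is a made-up name, and the pattern only catches attribute-less divs like the ones above.

```javascript
// Hypothetical sketch: remove <div> elements that contain only whitespace.
// Repeats until nothing changes, so divs that become empty after their
// empty children are removed get cleaned up too.
function stripEmptyDivs(html) {
    const emptyDiv = /<div>\s*<\/div>\s*/g;
    let previous;
    do {
        previous = html;
        html = html.replace(emptyDiv, "");
    } while (html !== previous);
    return html;
}

const sample = "<div>\n\t\t\t\n\t\t</div>\n<div><img src=\"a.jpg\"/></div>";
console.log(stripEmptyDivs(sample)); // <div><img src="a.jpg"/></div>
```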

Index wrote: It has come to my attention that some epub readers are non-compliant with the SVG standard when it comes to <desc> tags, and some are just doing their own thing with URLs that are not in an <a href>. See https://github.com/dteviot/WebToEpub/issues/22

Please report this non-compliance to these readers' developers in whatever way you can so that this can be fixed. Feel free to link that issue when reporting bugs like this.
Is there any chance of making adding the URLs to the image descriptions optional just as a workaround, in case of developers being lazy shits?
Destinot
Astral Realm

Re: Chrome Plug-in to Convert Web Page to EPUB

Post by Destinot »

Index wrote: It has come to my attention that some epub readers are non-compliant with the SVG standard when it comes to <desc> tags, and some are just doing their own thing with URLs that are not in an <a href>. See https://github.com/dteviot/WebToEpub/issues/22

Please report this non-compliance to these readers' developers in whatever way you can so that this can be fixed. Feel free to link that issue when reporting bugs like this.
Just tested and found that Moon+ Reader (https://play.google.com/store/apps/deta ... moonreader) does have this problem, showing the link below the image. The intricacies of HTML code have gone a little over my head though, so I'm not quite sure what to say in an email to adequately describe the cause of the issue to the developer. Could somebody more technically minded please shoot him an email? The developer's email is on the Play Store page.