
Database for all current novels

Posted: Tue Sep 29, 2015 12:14 am
by Shadowys
I'm thinking of continuing EusthEnoptEron's idea of periodically saving all of our novels into a database for better search and indexing, but I don't know if that is viable within BT's framework.

The proposed mechanism will be as follows:
1. Initial seeding:
--1. All uploads are stopped for one day, and a MySQL DB is created to store all the data.
--2. Data is pulled through the parser API (for example, http://btapi-shadowys.rhcloud.com/; hopefully this API can be merged into the baka-tsuki domain before this starts, so it can handle the load).
--3. The data is then passed to another application that downloads and sorts all the text and images in each volume and chapter.
--4. Once everything is sorted, the same application updates the DB and records the time it completed the update.
2. Continuous integration:
--1. The app will poll MediaWiki every minute (for example, using the above API's /update route); if there are changes, it starts a new process that reads from the MediaWiki site and updates the DB again.
--2. Each month, the app will do a full site check and update the DB if necessary.
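A minimal sketch of the schedule in steps 1 and 2, assuming the /update route and a full-site listing are wrapped in two hypothetical functions (`fetch_changed_pages`, `fetch_all_pages`) and using a plain dict in place of the MySQL DB. The first tick doubles as the initial seeding, "each month" is approximated as 30 days, and the actual downloading and sorting of text and images is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass
class Syncer:
    # Hypothetical wrapper around the parser API's /update route:
    # returns the page names changed since the given time.
    fetch_changed_pages: Callable[[datetime], List[str]]
    # Hypothetical full-site listing, used for seeding and the monthly check.
    fetch_all_pages: Callable[[], List[str]]
    # Stand-in for the MySQL DB: page name -> time it was last synced.
    store: Dict[str, datetime] = field(default_factory=dict)
    poll_interval: timedelta = timedelta(minutes=1)
    full_check_interval: timedelta = timedelta(days=30)  # "each month"
    last_poll: datetime = datetime.min
    last_full_check: datetime = datetime.min

    def tick(self, now: datetime) -> List[str]:
        """Run whatever work is due at `now` and return the pages synced."""
        if now - self.last_full_check >= self.full_check_interval:
            # Initial seeding (first call) or the periodic whole-site check.
            pages = self.fetch_all_pages()
            self.last_full_check = now
        elif now - self.last_poll >= self.poll_interval:
            # Continuous integration: only re-read pages that changed.
            pages = self.fetch_changed_pages(self.last_poll)
        else:
            return []  # nothing is due yet
        self.last_poll = now
        for page in pages:
            # The real app would download and sort the text/images here.
            self.store[page] = now
        return pages
```

A real deployment would call `tick()` about once a minute from a loop or cron job, passing the current time as a naive datetime to match the `datetime.min` defaults.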

Once this is done, we could probably move to Markdown (or Org mode :P) or another structured format for inputting data.

Re: Database for all current novels

Posted: Sun Oct 25, 2015 3:59 am
by _AzSiAz
Hi, it's AzSiAz from GitHub :)
I was thinking of doing the same on my own: storing novel chapter pages so I don't have to make requests to download the text and images every time someone asks for a page. I also wanted to use it to send real notifications from my push server based on edit time, so I can say which novel chapter was updated/added instead of just saying the novel was updated.
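A rough sketch of chapter-level notifications based on edit time, assuming we keep each chapter's last edit timestamp from the wiki in our own store. Comparing the stored map with a freshly fetched one tells us which chapters were added and which were updated, rather than only that the novel changed. The function names and the message format are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def diff_chapters(
    stored: Dict[str, datetime], fresh: Dict[str, datetime]
) -> Tuple[List[str], List[str]]:
    """Compare chapter -> last-edit maps and return (added, updated) names."""
    # A chapter we have never seen before is new.
    added = sorted(name for name in fresh if name not in stored)
    # A known chapter with a newer edit time was updated.
    updated = sorted(
        name for name in fresh if name in stored and fresh[name] > stored[name]
    )
    return added, updated


def notifications(novel: str, added: List[str], updated: List[str]) -> List[str]:
    """Turn the diff into per-chapter push messages (format is illustrative)."""
    return [f"{novel}: new chapter {c}" for c in added] + [
        f"{novel}: updated chapter {c}" for c in updated
    ]
```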

Re: Database for all current novels

Posted: Sun Oct 25, 2015 4:45 pm
by Shadowys
Agreed, the current API makes it extremely hard to analyse the updates of each novel.

Re: Database for all current novels

Posted: Fri Oct 30, 2015 4:17 am
by _AzSiAz
Yeah, we can't see them by language, and with every language available on Baka-Tsuki, it's nearly impossible to be sure it's the English version that was updated and not the Spanish/French/... version.

I'm still wondering how I will do it, specifically on the database side^^

Re: Database for all current novels

Posted: Fri Oct 30, 2015 11:47 pm
by Shadowys
I think it would be possible, once the database has been built, to cross-reference the updates against the names of the alternate-language novels obtained from the light novel category and language tag search.

For example, we first build a database table with name and language columns and populate it with information from /category?type=LIGHT_NOVEL&language={language}.
We then get the updated novels from the /time?updates=100 API, and for each one we run

Code:

SELECT * FROM novels WHERE name = '{name_on_the_list}' AND language = 'ENGLISH';

which is better than doing it on the MediaWiki side.
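A minimal sketch of that lookup, using Python's built-in sqlite3 in place of MySQL so it runs standalone. The rows stand in for what /category?type=LIGHT_NOVEL&language={language} would return (its real response format isn't shown in this thread, so these tuples are hypothetical), and the query uses placeholders instead of pasting {name_on_the_list} into the SQL string, which also avoids quoting problems with titles that contain apostrophes.

```python
import sqlite3

# Hypothetical stand-in for /category?type=LIGHT_NOVEL&language={language} results.
CATEGORY_ROWS = [
    ("No Game No Life", "ENGLISH"),
    ("No Game No Life", "SPANISH"),
    ("Example Novel", "SPANISH"),
    ("Sword Art Online", "ENGLISH"),
]


def build_db(rows):
    """Create the (name, language) table and fill it from the category listing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE novels (name TEXT NOT NULL, language TEXT NOT NULL, "
        "PRIMARY KEY (name, language))"
    )
    conn.executemany("INSERT INTO novels (name, language) VALUES (?, ?)", rows)
    return conn


def english_updates(conn, updated_names):
    """Keep only the updated names that exist as English novels."""
    found = []
    for name in updated_names:
        # Placeholders let sqlite3 handle quoting of the title.
        row = conn.execute(
            "SELECT name FROM novels WHERE name = ? AND language = ?",
            (name, "ENGLISH"),
        ).fetchone()
        if row is not None:
            found.append(row[0])
    return found
```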

Re: Database for all current novels

Posted: Sat Oct 31, 2015 2:27 am
by _AzSiAz
It could work, but sometimes we get a name like "No Game No Life-Spanish Español:Volumen 4 Epílogo". We can still use regex to filter some of them, but I don't think we can filter every possibility.
I was thinking about something even simpler, not sure it would work: two servers, one that builds the database and checks for updates every 5 minutes, for example. When an update is detected (i.e. when we update a certain table to add a chapter), it sends a message through IPC/AMQP/a socket or something similar to the second server, which is used to generate/update an RSS feed.
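A single-process sketch of the two-server idea: a thread stands in for the update checker, a `queue.Queue` stands in for the IPC/AMQP/socket channel, and the consumer builds an RSS 2.0 feed with `xml.etree.ElementTree`. In a real setup the queue would be an actual broker or socket between two machines; the feed title, link, and message fields here are hypothetical.

```python
import queue
import threading
import xml.etree.ElementTree as ET

STOP = None  # sentinel telling the consumer there are no more messages


def detector(updates, out_q):
    """Server 1 stand-in: publish one message per chapter added to the DB."""
    for novel, chapter in updates:
        out_q.put({"novel": novel, "chapter": chapter})
    out_q.put(STOP)


def rss_builder(in_q, title, link, description):
    """Server 2 stand-in: consume messages and build an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    while (msg := in_q.get()) is not STOP:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{msg['novel']}: {msg['chapter']}"
    return ET.tostring(rss, encoding="unicode")


def run_pipeline(updates):
    """Wire the two stand-ins together and return the generated feed."""
    q = queue.Queue()
    producer = threading.Thread(target=detector, args=(updates, q))
    producer.start()
    feed = rss_builder(
        q, "Baka-Tsuki updates", "https://example.org/feed.xml",
        "Chapters added to the database",
    )
    producer.join()
    return feed
```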

Re: Database for all current novels

Posted: Sat Oct 31, 2015 8:53 am
by Cthaeh
Not sure if it would work with whatever you're developing, but I think the most accurate automated way to check the language of a given page, using only information on the wiki, is to look at the pages that link to it and then cross-reference those against a list of project pages for each language (the category members for each language). If only one project page links to the page in question, then you know it belongs to the language that project page is listed under.
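A small sketch of that rule as a pure function, assuming the backlinks and the per-language project page lists have already been fetched. On a MediaWiki install they would typically come from the API's `list=backlinks` and `list=categorymembers` query modules; which category holds each language's projects on BT isn't stated here, so that part is left open.

```python
from typing import Dict, Iterable, Optional, Set


def detect_language(
    backlinks: Iterable[str], project_pages: Dict[str, Set[str]]
) -> Optional[str]:
    """Guess a page's language from the project pages that link to it.

    backlinks: titles of pages linking to the page in question.
    project_pages: language -> titles of that language's project pages.
    """
    linking = set(backlinks)
    # Every (language, project page) pair where that project page links here.
    hits = [
        (language, page)
        for language, pages in project_pages.items()
        for page in pages
        if page in linking
    ]
    # Only trust the result when exactly one project page links to the page;
    # zero or several hits are left undecided.
    return hits[0][0] if len(hits) == 1 else None
```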

I think you're right that the page names are not standardized enough to work that well currently. Page name standardization was proposed, which would make it possible for a regex on the page title to give the language, but that'd be a fair bit of work, so it's only worth doing if there will be significant benefits.

Re: Database for all current novels

Posted: Sat Oct 31, 2015 6:56 pm
by Shadowys
Cthaeh wrote: but I think the most accurate automated way to check the language of a given page only with information on the wiki is to look at the pages that link to the given page, and then crossreference those pages against a list of project pages for each language (category members for each language).
Yeah, that's what I was thinking too; the API is available to pull this data from BT. But as AzSiAz stated, some of the pages don't follow standard naming conventions and may or may not contain the title, or the title may even be translated (I'm not sure if we have this). If they bear some resemblance to the original title, though, I think a regex with some modifications should do. (Just stripping the _ and special characters before comparing should do, I think.)

Turn [No Game No Life-Spanish Español:Volumen 4 Epílogo] into [No Game No Life Spanish Español].
Then find [No Game No Life] in [No Game No Life Spanish Español], the former taken from the list of novels compiled previously.
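A sketch of that normalize-then-search step. It assumes the part after the first ':' is the volume/chapter part of the page name, as in the example above; the whole-word padding and the longest-match rule are extra choices (not from the post) to stop a short title from matching inside a longer one.

```python
import re
from typing import Iterable, Optional


def normalize_title(page_name: str) -> str:
    """Reduce a wiki page name to a plain title for comparison."""
    # Drop the part after the first ':' (e.g. ":Volumen 4 Epílogo").
    base = page_name.split(":", 1)[0]
    # Treat underscores and hyphens as word separators.
    base = re.sub(r"[_\-]+", " ", base)
    # Strip other special characters; \w keeps Unicode letters such as "ñ".
    base = re.sub(r"[^\w\s]", "", base)
    # Collapse repeated whitespace.
    return " ".join(base.split())


def match_novel(page_name: str, known_titles: Iterable[str]) -> Optional[str]:
    """Find which known novel title occurs in the page name, if any."""
    # Pad with spaces so a title only matches whole words.
    haystack = f" {normalize_title(page_name).casefold()} "
    candidates = [
        title
        for title in known_titles
        if f" {normalize_title(title).casefold()} " in haystack
    ]
    # Prefer the longest match, so "No Game No Life" beats a shorter title.
    return max(candidates, key=len) if candidates else None
```

Titles that themselves contain a ':' would be cut short by this normalization, so they would need special handling.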

After that, we could treat every page just as the wiki does, and do the polling AzSiAz suggested.

Re: Database for all current novels

Posted: Thu May 26, 2016 5:27 am
by _AzSiAz
Any news on this?
It would help to have an API serving consistent data from the database, without the time spent parsing pages ^^

Re: Database for all current novels

Posted: Thu May 26, 2016 6:38 pm
by Shadowys
I've got it working for English entries, but I haven't gotten around to working out the right schema for all the data this needs. I'll probably start next month, but I'll post what I have now when I get back to my computer.
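For reference, one possible shape for such a schema, purely hypothetical and not the one mentioned in this post: novels keyed by title and language, volumes and chapters below them, chapter text plus the last wiki edit time, and image URLs per chapter. It uses sqlite3 so it runs standalone; MySQL DDL would differ slightly (e.g. `AUTO_INCREMENT` ids and `VARCHAR` for the unique columns). The query returns the chapters in one language edited after a given time, which is the kind of pre-parsed data an API could serve directly.

```python
import sqlite3

# Hypothetical schema sketch; names and columns are illustrative only.
SCHEMA = """
CREATE TABLE novels (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    language TEXT NOT NULL,
    UNIQUE (title, language)
);
CREATE TABLE volumes (
    id       INTEGER PRIMARY KEY,
    novel_id INTEGER NOT NULL REFERENCES novels(id),
    title    TEXT NOT NULL
);
CREATE TABLE chapters (
    id        INTEGER PRIMARY KEY,
    volume_id INTEGER NOT NULL REFERENCES volumes(id),
    page_name TEXT NOT NULL UNIQUE,  -- MediaWiki page title
    body      TEXT,                  -- downloaded chapter text
    last_edit TEXT NOT NULL          -- ISO 8601 time of the last wiki edit
);
CREATE TABLE images (
    id         INTEGER PRIMARY KEY,
    chapter_id INTEGER NOT NULL REFERENCES chapters(id),
    url        TEXT NOT NULL
);
"""


def create_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def chapters_edited_since(conn, language, since):
    """Return (novel title, page name) for chapters edited after `since`."""
    # ISO 8601 strings in one format compare correctly as text.
    return conn.execute(
        """SELECT n.title, c.page_name FROM chapters c
           JOIN volumes v ON v.id = c.volume_id
           JOIN novels n ON n.id = v.novel_id
           WHERE n.language = ? AND c.last_edit > ?
           ORDER BY c.last_edit""",
        (language, since),
    ).fetchall()
```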

Re: Database for all current novels

Posted: Thu May 26, 2016 11:00 pm
by _AzSiAz
Well, good to know. Now let's wait :D