Cleaning up the API

Post by **Shadowys** » Sat Apr 18, 2015 9:47 pm

I think I could clean up the API by creating a separate service that translates plain requests into MediaWiki ones, or by writing a structured API doc.
Either way, could someone post examples of how the MediaWiki API is used, and say which part of the API I should clean up first?

For example:

Usage: Getting all sections of Seirei_Tsukai_no_Blade_Dance in JSON
API: baka-tsuki.org/project/api.php?action=parse&prop=sections&page=Seirei_Tsukai_no_Blade_Dance&format=json
Suggestion:
Getting only the volumes listed in Seirei_Tsukai_no_Blade_Dance sections from the above API
Maybe like: bt.api/lntitle=Seirei_Tsukai_no_Blade_Dance&volume=all
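
A minimal sketch of the suggestion above, assuming that volume headings contain the word "Volume" followed by a number (a heuristic, not a rule the wiki guarantees). The function names and the sample response are hypothetical; the response shape follows what MediaWiki returns for action=parse with prop=sections, where each section has a `line` field holding the heading text.

```python
import re
from urllib.parse import urlencode

# Endpoint path taken from the post; the scheme and "www" prefix are assumptions.
API_ENDPOINT = "https://www.baka-tsuki.org/project/api.php"


def build_sections_url(page: str) -> str:
    """Build the action=parse URL that lists the sections of a page."""
    params = {"action": "parse", "prop": "sections", "page": page, "format": "json"}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def volume_sections(response: dict) -> list:
    """Keep only section headings that look like volumes (heuristic)."""
    sections = response.get("parse", {}).get("sections", [])
    pattern = re.compile(r"\bvolume\s+\d+", re.IGNORECASE)
    return [s["line"] for s in sections if pattern.search(s.get("line", ""))]


# Invented sample data, shaped like a prop=sections response.
SAMPLE_RESPONSE = {
    "parse": {
        "title": "Example_Series",
        "sections": [
            {"toclevel": 1, "level": "2", "line": "Story Synopsis", "index": "1"},
            {"toclevel": 1, "level": "2", "line": "Volume 1", "index": "2"},
            {"toclevel": 1, "level": "2", "line": "Volume 2", "index": "3"},
            {"toclevel": 1, "level": "2", "line": "Project Staff", "index": "4"},
        ],
    }
}

if __name__ == "__main__":
    print(build_sections_url("Seirei_Tsukai_no_Blade_Dance"))
    print(volume_sections(SAMPLE_RESPONSE))
```

Pages that label volumes differently would slip through this filter, so treat the pattern as a starting point only.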

Post by **cloudii** » Sat Apr 18, 2015 10:28 pm

From what I've heard from the Android App developers, it's easier to parse the html than use the XML/JSON returned from mediawiki.

The reason being, it is actually impossible to reliably obtain a list of volumes/chapters owned by series.

Mediawiki API can fetch you the contents of any given page (in wiki markup q____q), but the problem is how do you extract the order of volumes, the chapters, and whether they're complete (many of them aren't)? The formatting of every project page is not consistent and there are many more links on a page than of actual content we want.

It's kind of a painful undertaking, but kudos to you if you can make something out of it!
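
To illustrate the problem described above, here is a rough sketch that pulls every `[[...]]` link out of a piece of wiki markup. The markup sample, the namespace list, and the function names are all invented for illustration; the point is only that a page yields more links than chapters, so some filtering rule is always needed.

```python
import re

# [[Target]], [[Target|Label]] or [[Target#Anchor|Label]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|([^\]]*))?\]\]")

# Assumed list of prefixes that never point at chapter content.
NON_CONTENT_PREFIXES = ("User:", "Category:", "File:", "Image:", "Talk:")


def extract_links(wikitext: str) -> list:
    """Return (target, label) pairs for every wiki link in the text."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        label = (match.group(2) or target).strip()
        links.append((target, label))
    return links


def chapter_candidates(wikitext: str) -> list:
    """Drop links whose target is in a namespace that cannot be a chapter."""
    return [
        (target, label)
        for target, label in extract_links(wikitext)
        if not target.startswith(NON_CONTENT_PREFIXES)
    ]


# Invented markup in the general style of a project page.
SAMPLE_WIKITEXT = """
==Volume 1==
*[[Example Series:Volume 1 Chapter 1|Chapter 1]]
*[[Example Series:Volume 1 Chapter 2|Chapter 2]]
==Project Staff==
*[[User:SomeTranslator]]
[[Category:Light novel (English)]]
"""

if __name__ == "__main__":
    print(len(extract_links(SAMPLE_WIKITEXT)), "links in total")
    for target, label in chapter_candidates(SAMPLE_WIKITEXT):
        print(label, "->", target)
```

Even after the namespace filter, nothing here says whether a chapter is complete or in which order volumes appear, which is the harder part of the problem.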

Post by **Shadowys** » Sun Apr 19, 2015 5:30 am

So I tried my hand at it and got something like this:

baka-tsuki-api.herokuapp.com/api?title=Chrome_Shelled_Regios&volume=available

baka-tsuki-api.herokuapp.com/api?title=Campione!&volume=9

Some docs here: baka-tsuki-api.herokuapp.com/

Not perfect, but I hope it's in the right direction.
For CSR and Campione!, the URLs for the short stories and the volumes are quite different @@
Hopefully the folks working on the Android app can give me some tips on the various loopholes in the formats.

Post by **Shadowys** » Sun Apr 19, 2015 4:45 pm

New API docs:
https://baka-tsuki-api.herokuapp.com/

Example API:
baka-tsuki-api.herokuapp.com/api?title=Date_A_Live&series=date&volume=1|puppet

Please disregard the previous post. I have rewritten it to use a filtering mechanism and regex instead.
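
One way a pipe-separated filter such as `1|puppet` could be turned into a regex is sketched below. This is a guess at the general idea, not the code in the repository; `compile_filter`, `filter_titles`, and the sample titles are hypothetical.

```python
import re


def compile_filter(query: str):
    """Turn 'a|b' into a case-insensitive whole-word regex, or None if empty."""
    terms = [re.escape(term.strip()) for term in query.split("|") if term.strip()]
    if not terms:
        return None
    return re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)


def filter_titles(titles: list, query: str) -> list:
    """Keep the titles that contain at least one of the query terms."""
    pattern = compile_filter(query)
    if pattern is None:
        return list(titles)  # an empty filter keeps everything
    return [title for title in titles if pattern.search(title)]


if __name__ == "__main__":
    # Invented section titles for illustration.
    titles = ["Volume 1", "Volume 2", "Volume 10", "Puppet Side Story"]
    print(filter_titles(titles, "1|puppet"))
```

The word boundaries keep `1` from also matching "Volume 10"; whether the real API behaves that way is not stated in the thread.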

Sometimes startup may be slow as heroku's dynos need time to warm up, but subsequent calls should be fast.

Github repo:
https://github.com/Shadowys/btapi

Post by **Shadowys** » Mon Apr 20, 2015 8:11 pm

I have switched to hosting the API here due to some problems with Heroku. The one on Heroku is still usable, though, and both are updated simultaneously:

http://btapi-shadowys.ngapp.io/

Post by **Simon** » Tue Apr 21, 2015 11:12 am

Rendered:
http://www.baka-tsuki.org/project/index ... lade_Dance

Raw:
http://www.baka-tsuki.org/project/index ... lade_Dance

Just a random script I wrote to parse the raw text. It works... more or less. About 70% of pages should parse; the other 30%... well, a pain in the ass.
https://github.com/Lord-Simon/Scripts/b ... ter/btp.py

Post by **Shadowys** » Tue Apr 21, 2015 8:16 pm

Thanks! So there is a render action that gets me the HTML output directly.
Is there a core template that BT uses for project pages?

Post by **animeout** » Wed Apr 22, 2015 6:39 pm

cloudii wrote: From what I've heard from the Android App developers, it's easier to parse the html than use the XML/JSON returned from mediawiki.

The reason being, it is actually impossible to reliably obtain a list of volumes/chapters owned by series.

Mediawiki API can fetch you the contents of any given page (in wiki markup q____q), but the problem is how do you extract the order of volumes, the chapters, and whether they're complete (many of them aren't)? The formatting of every project page is not consistent and there are many more links on a page than of actual content we want.

It's kind of a painful undertaking, but kudos to you if you can make something out of it!

I would advise the same. I am currently working on an iOS app (LN reader) that uses BT as its main source. I started the project almost a year ago but had to drop it because the official MediaWiki API was a complete mess.
Since then I have slowly and steadily built my own unofficial API, which I find better: it scrapes the HTML content of the page and then parses it.

This has a few benefits:
1. It is easier
2. Content is always the latest and up to date, and you avoid making sub-API calls to MediaWiki
3. You don't rely on the MediaWiki API, which can change or break when its updates change structures
4. The returned content is foolproof as long as the tags on the page don't change
5. You can create unlimited sub-features and APIs based on the content and even filter out the content you want or don't want.
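
The scraping approach in the list above can be sketched with nothing but the standard library. This is an assumed illustration, not the poster's implementation: the HTML sample is invented and `LinkCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Invented HTML fragment in the general shape of a rendered chapter list.
SAMPLE_HTML = (
    "<h2>Volume 1</h2><ul>"
    '<li><a href="/project/index.php?title=Example:Volume_1_Chapter_1">Chapter 1</a></li>'
    '<li><a href="/project/index.php?title=Example:Volume_1_Chapter_2">Chapter 2</a></li>'
    "</ul>"
)

if __name__ == "__main__":
    for href, text in collect_links(SAMPLE_HTML):
        print(text, "->", href)
```

As point 4 says, this only holds up while the tags on the page stay the same.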

In my case, I created a sub-api for mangaupdates as well to go with my BT API. What it does is, it queries a LN I get from BT on MU, to get extra info like synopsis (BT one is not proper in all series), author, artist, start date, alternate titles and genres. The genre thing is important as this will let me create filters for users to search specific light novels.

Right now I am still working on improving the API, but as I mentioned in my other thread, some series on BT really need to be updated to use the proper tag for chapter blocks (::* is the tag to use) so that the parser doesn't break.
I have not yet written a JSON output function, but I will post the results to show what my API looks like.

Post by **Shadowys** » Wed Apr 22, 2015 10:35 pm

I agree on the HTML part, since for each title most of the info is on the main page of the project. For some other information, such as update times and which chapters were updated, I used the MediaWiki API.

Currently I'm basing the features of the API on the mobile version of this site: http://lknovel.lightnovel.cn/ , so the data pulled should be enough to recreate something like it.

Post by **EusthEnoptEron** » Thu Apr 23, 2015 12:37 am

For the HTML content you can just use action=parse in the MW API, which you're already using at one point I think. IMO that's a more robust way than relying on the website output that could change any time depending on the theme.
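
A small sketch of that route, assuming the same api.php endpoint quoted earlier in the thread. With the default JSON format the rendered HTML sits under `parse.text["*"]`; with `formatversion=2` it is a plain string, so the helper accepts both. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint path as quoted in the thread; the scheme and "www" prefix are assumptions.
API_ENDPOINT = "https://www.baka-tsuki.org/project/api.php"


def build_parse_url(page: str) -> str:
    """URL that asks MediaWiki for the rendered HTML of one page."""
    params = {"action": "parse", "page": page, "prop": "text", "format": "json"}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_html(response: dict) -> str:
    """Pull the HTML string out of an action=parse JSON response."""
    if "error" in response:
        raise ValueError(response["error"].get("info", "MediaWiki API error"))
    text = response["parse"]["text"]
    # Default format nests the HTML under "*"; formatversion=2 returns a string.
    return text["*"] if isinstance(text, dict) else text


if __name__ == "__main__":
    print(build_parse_url("Date_A_Live"))
    sample = {"parse": {"title": "Example", "text": {"*": "<p>Hello</p>"}}}
    print(extract_html(sample))
```

The HTML returned this way is the page content only, without the site theme around it.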

It would be great if the API provided a way to fetch the latest updates in chapters, but for that you would need to use a database in the background (unless you parsed the recent updates section for each project...). Well, you'll ultimately want a DB anyway for caching purposes as I already mentioned in the PM.
I haven't used nodejs in a while, but last I checked Redis, MongoDB, MariaDB and RethinkDB were some good options.
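
As a rough illustration of the caching idea, here is an in-memory cache with a time-to-live. It is only a stand-in for a real store such as the databases listed above; `TTLCache` and its methods are hypothetical names, and the clock is injectable so the expiry logic can be checked without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache fetched values in memory and refetch once they are too old."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, object]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], object]) -> object:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh, skip the upstream call
        value = fetch()
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=300)

    def slow_lookup():
        print("fetching from the wiki...")
        return {"title": "Example", "volumes": 3}

    cache.get_or_fetch("Example", slow_lookup)  # calls slow_lookup
    cache.get_or_fetch("Example", slow_lookup)  # served from the cache
```

A persistent store would additionally survive restarts, which matters on hosts that put idle processes to sleep.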

Post by **Shadowys** » Thu Apr 23, 2015 3:55 am

For the recent updates of chapters, I found this API from MW
baka-tsuki.org/project/api.php?action=query&list=recentchanges&rclimit=100&rctoponly=true&format=json

Which lists all the current changes. The current API already provides a way to get the last revision time for each project (needs query to /api and /api/time )
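
A sketch of how the recent-changes response could be reduced to one timestamp per page. The URL parameters are the ones from the post; the sample entries and the function names are invented, and the response shape assumed here is the default one, where each entry carries `type`, `ns`, `title`, and `timestamp`.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint path as quoted in the post; the scheme and "www" prefix are assumptions.
API_ENDPOINT = "https://www.baka-tsuki.org/project/api.php"


def recent_changes_url(limit: int = 100) -> str:
    params = {
        "action": "query",
        "list": "recentchanges",
        "rclimit": limit,
        "rctoponly": "true",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def parse_timestamp(value: str) -> datetime:
    """MediaWiki timestamps look like 2015-04-22T20:58:55Z (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_change_per_title(response: dict) -> dict:
    """Map each page title to the time of its most recent listed change."""
    latest = {}
    for change in response.get("query", {}).get("recentchanges", []):
        when = parse_timestamp(change["timestamp"])
        title = change["title"]
        if title not in latest or when > latest[title]:
            latest[title] = when
    return latest


# Invented sample entries.
SAMPLE = {
    "query": {
        "recentchanges": [
            {"type": "edit", "ns": 0, "title": "Example:Volume 1", "timestamp": "2015-04-22T20:58:55Z"},
            {"type": "edit", "ns": 0, "title": "Example:Volume 2", "timestamp": "2015-04-22T19:00:00Z"},
        ]
    }
}

if __name__ == "__main__":
    print(recent_changes_url())
    for title, when in latest_change_per_title(SAMPLE).items():
        print(title, when.isoformat())
```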

I will integrate this into the API sometime later. I'm still hoping that a database of everything can be compiled, but that would require an API to pull the data from the pages, which brings us back to the same point...

Post by **EusthEnoptEron** » Thu Apr 23, 2015 4:04 am

No, the output of the Recent Pages page is pretty much garbage. What I'm interested in as a user of the API are the chapters that have been uploaded, not the MediaWiki pages that have been changed. That would probably mean keeping track of the links in a project page that have turned from red to blue, so to speak, and don't have some template that indicates progress.
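
The "red to blue" idea can be sketched as a comparison between two snapshots of a project page's links. How the snapshots are produced is left out and the names below are hypothetical; each snapshot is assumed to map a link target to whether that page exists. The check for progress templates mentioned above is not covered.

```python
def newly_available(previous: dict, current: dict) -> list:
    """Targets that were missing (red) before and exist (blue) now."""
    return sorted(
        target
        for target, exists in current.items()
        if exists and previous.get(target) is False
    )


if __name__ == "__main__":
    # Invented snapshots: target -> page exists?
    yesterday = {
        "Example:Volume 1 Chapter 1": True,
        "Example:Volume 1 Chapter 2": False,
        "Example:Volume 1 Chapter 3": False,
    }
    today = {
        "Example:Volume 1 Chapter 1": True,
        "Example:Volume 1 Chapter 2": True,
        "Example:Volume 1 Chapter 3": False,
        "Example:Volume 1 Chapter 4": True,
    }
    print(newly_available(yesterday, today))
```

Links that first appear already blue (Chapter 4 in the sample) are deliberately ignored here; counting them as new uploads would be an equally defensible choice.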

Post by **Shadowys** » Thu Apr 23, 2015 4:19 am

I see. Indeed that would be harder to get from the current API except by looping through the recent changes for the "new" flag.
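
Looping through the recent changes for new pages might look like the sketch below. Each recent-changes entry has a `type` field (for example `edit` or `new`); MediaWiki can also do this filtering on the server with the `rctype=new` parameter. The sample entries and the function name are invented.

```python
def newly_created_pages(response: dict, namespace: int = 0) -> list:
    """Titles of pages created (type 'new') in the given namespace."""
    changes = response.get("query", {}).get("recentchanges", [])
    return [
        change["title"]
        for change in changes
        if change.get("type") == "new" and change.get("ns") == namespace
    ]


if __name__ == "__main__":
    # Invented sample entries.
    sample = {
        "query": {
            "recentchanges": [
                {"type": "new", "ns": 0, "title": "Example:Volume 2 Chapter 1"},
                {"type": "edit", "ns": 0, "title": "Example:Volume 1 Chapter 1"},
                {"type": "new", "ns": 2, "title": "User:SomeTranslator"},
            ]
        }
    }
    print(newly_created_pages(sample))
```

A newly created page is not necessarily a finished chapter, so this still overstates what has actually been uploaded.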

Post by **animeout** » Thu Apr 23, 2015 5:04 am

So, thanks to your replies clearing up my doubts about parsing chapter links, I was motivated again to work on my version of the API (and the iOS app to follow).
I worked for a few hours to finish almost all of the pending changes and get the API to output proper JSON.

Here is what has been done so far (the links below are temporary until I host it on a domain of its own):

/**
* Baka-Tsuki List Of Light Novels
* url - /bakatsuki/list
* method - GET
* params -
*/
http://gator3224.hostgator.com/~whyclou ... tsuki/list
Right now shows a total count of series available and their titles

/**
* Baka-Tsuki Light Novel Information
* url - /bakatsuki/novel
* method - GET
* params - title
*/
http://gator3224.hostgator.com/~whyclou ... lute%20Duo
Use the LN title to get its detailed information. The title should contain spaces (use the title exactly as it is shown in the list API).
Uses MangaUpdates data for synopsis, author, genre, illustrator, date. Might add more.

/**
* Baka-Tsuki Chapters For Title
* url - /bakatsuki/chapters
* method - GET
* params - title
*/
http://gator3224.hostgator.com/~whyclou ... 20Tsukaima
Use the LN title to get its chapters. The title should contain spaces (use the title exactly as it is shown in the list API).
I will be updating this API to also return a "chapterAPI" url, an internal API link for getting the chapter data within the API (to-do).
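
A client for the three endpoints above could build its URLs along these lines. The base URL is a placeholder because the posted links are truncated, and passing `title` as a query parameter is an assumption based on the `params - title` lines; `endpoint_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE_URL = "https://api.example.invalid"  # placeholder, not the real host

ENDPOINTS = {
    "list": "/bakatsuki/list",
    "novel": "/bakatsuki/novel",
    "chapters": "/bakatsuki/chapters",
}


def endpoint_url(name: str, title: str = None, base_url: str = BASE_URL) -> str:
    """Build a GET URL; 'novel' and 'chapters' need a title, 'list' takes none."""
    path = ENDPOINTS[name]
    if name == "list":
        if title is not None:
            raise ValueError("the list endpoint takes no parameters")
        return base_url + path
    if not title:
        raise ValueError(f"the {name} endpoint requires a title")
    # Titles keep their spaces, which are percent-encoded as %20.
    return f"{base_url}{path}?title={quote(title, safe='')}"


if __name__ == "__main__":
    print(endpoint_url("list"))
    print(endpoint_url("novel", "Example Title"))
    print(endpoint_url("chapters", "Example Title"))
```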

My current To-Do list:
-Add a chapters API to show the chapter data itself within the API
-Save all the above data in my own database with date_added and date_updated
-Add a lastUpdated column in all the APIs to create another sub-api for changes and updates, can be integrated in RSS feeds as well
-Add caching and CDN to images and the database
-Add a Put/Insert API for users and translators outside BT to submit their translations to the database
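
The date_added / date_updated item on the list could be sketched with SQLite as below. The table layout and function names are assumptions for illustration; timestamps are stored as ISO 8601 UTC strings, which sort correctly as plain text.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS chapters (
               title TEXT PRIMARY KEY,
               body TEXT NOT NULL,
               date_added TEXT NOT NULL,
               date_updated TEXT NOT NULL
           )"""
    )
    return conn


def save_chapter(conn: sqlite3.Connection, title: str, body: str, now: str) -> None:
    """Insert a chapter, or update it while keeping the original date_added."""
    cur = conn.execute(
        "UPDATE chapters SET body = ?, date_updated = ? WHERE title = ?",
        (body, now, title),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO chapters (title, body, date_added, date_updated) VALUES (?, ?, ?, ?)",
            (title, body, now, now),
        )
    conn.commit()


def updated_since(conn: sqlite3.Connection, since: str) -> list:
    """Titles changed after the given timestamp, oldest change first."""
    rows = conn.execute(
        "SELECT title FROM chapters WHERE date_updated > ? ORDER BY date_updated, title",
        (since,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_db()
    save_chapter(db, "Example:Chapter 1", "first draft", "2015-04-23T05:00:00Z")
    save_chapter(db, "Example:Chapter 1", "revised", "2015-04-23T07:00:00Z")
    print(updated_since(db, "2015-04-23T06:00:00Z"))
```

A query like `updated_since` is the kind of thing a changes feed or RSS endpoint could be built on.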

PS: Right now, the API is matching content and structure from baka-tsuki however, it will be evolving more to suit my needs for the iOS project. Such as adding things like discussions, ratings, views, popularities, related anime/manga, amazon/YP buy now link, blah blah

Suggestions and feedback are welcome.
@OP It would have been great if we were working in the same backend language; it would have made developing the API faster. Still, if you need any general help, let me know.

Post by **Shadowys** » Thu Apr 23, 2015 5:47 am

I've implemented the API to search for the newest updated pages, excluding user and talk pages.

http://btapi-shadowys.ngapp.io/api/time ... T20:58:55Z

docs:
http://btapi-shadowys.ngapp.io/time
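
A sketch of the kind of filtering described in this post, not the hosted implementation. It relies on MediaWiki's namespace numbering, where talk namespaces are odd, User is 2, and special namespaces are negative. The sample entries and function names are invented.

```python
USER_NAMESPACE = 2


def is_content_namespace(ns: int) -> bool:
    """False for special (negative), talk (odd) and User namespaces."""
    return ns >= 0 and ns % 2 == 0 and ns != USER_NAMESPACE


def pages_updated_since(changes: list, since: str) -> list:
    """Titles changed at or after `since`, skipping user and talk pages.

    Timestamps are ISO 8601 UTC strings such as 2015-04-22T20:58:55Z,
    so plain string comparison matches chronological order.
    """
    return [
        change["title"]
        for change in changes
        if is_content_namespace(change["ns"]) and change["timestamp"] >= since
    ]


if __name__ == "__main__":
    # Invented sample entries.
    changes = [
        {"ns": 0, "title": "Example:Volume 1", "timestamp": "2015-04-22T21:10:00Z"},
        {"ns": 1, "title": "Talk:Example", "timestamp": "2015-04-22T21:20:00Z"},
        {"ns": 2, "title": "User:SomeTranslator", "timestamp": "2015-04-22T21:30:00Z"},
        {"ns": 0, "title": "Example:Volume 0", "timestamp": "2015-04-22T20:00:00Z"},
    ]
    print(pages_updated_since(changes, "2015-04-22T20:58:55Z"))
```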

For structure's sake, one day all of BT data should be placed into a database, but that will have to wait until the API is stable.
Just a question here, is Baka Tsuki going for a responsive website front end for readers?

animeout wrote:PS: Right now, the API is matching content and structure from baka-tsuki however, it will be evolving more to suit my needs for the iOS project. Such as adding things like discussions, ratings, views, popularities, related anime/manga, amazon/YP buy now link, blah blah

Does your app plan to sync with BR-EX?

