Golden Time: Help Offers

rpapo
I.D.S.E Humanoid Interface [LSB]
Posts: 1530
Joined: Mon Dec 21, 2009 5:15 am
Favourite Light Novel: Ahouka!
Location: Michigan, USA

Re: Help Offers for Golden Time

Post by rpapo »

I correct myself as far as the dictionary size is concerned: running the program right now, the raw dictionary had 156,208 entries. With conjugations, the size ballooned to 5,511,101 entries. The conjugations currently performed are:
  • Stem
  • Plain Non-Past (positive and negative (2 forms))
  • Plain Past (positive and negative)
  • Plain Imperative (2 forms)
  • Plain Presumptive (2 forms)
  • Plain Conjunctive
  • Plain Provisional (positive and negative)
  • Plain Conditional (positive and negative)
  • Polite Conditional (positive and negative)
  • Plain Passive Non-Past
  • Plain Passive Past
  • Plain Wish
  • Polite Non-Past (positive and negative)
  • Polite Past (positive and negative)
  • Polite Presumptive
  • Potential (2 forms)
  • Causative
  • Negative Conjunctive (-zu)
  • Plain Alternative (positive and negative)
  • Polite Alternative (positive and negative)
  • Plain Progressive
  • Plain Progressive Casual
  • Polite Progressive
  • Plain Progressive Past (2 forms)
  • Polite Progressive Past
  • Plain Negative Presumptive
  • Polite Negative Presumptive
  • Noun (from adjective)
  • Adverb (from adjective)
  • Seems like (from adjective)
Some of the conjugations combine, so there's even more stuff generated. Only the current memory consumption is stopping me from adding even more variants . . . some of which I could really use.
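
To make the blow-up concrete, here is a minimal sketch of how one raw entry could be expanded into several of the forms listed above. It is only an illustration: it covers ichidan (-ru) verbs only, and `Form` and `conjugateIchidan` are made-up names, not anything taken from the actual program.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One generated form: the surface text plus a label naming the conjugation.
struct Form {
    std::wstring surface;
    std::wstring label;
};

// Expand an ichidan verb (dictionary form ending in る) into a few forms.
// Godan and irregular verbs need their own rules and are not handled here.
std::vector<Form> conjugateIchidan(const std::wstring& dictionaryForm) {
    // Precondition of this sketch: the word really ends in る.
    assert(!dictionaryForm.empty() && dictionaryForm.back() == L'る');
    const std::wstring stem =
        dictionaryForm.substr(0, dictionaryForm.size() - 1);
    return {
        {stem, L"Stem"},
        {dictionaryForm, L"Plain Non-Past"},
        {stem + L"ない", L"Plain Non-Past negative"},
        {stem + L"た", L"Plain Past"},
        {stem + L"なかった", L"Plain Past negative"},
        {stem + L"ます", L"Polite Non-Past"},
        {stem + L"ません", L"Polite Non-Past negative"},
        {stem + L"ました", L"Polite Past"},
        {stem + L"たい", L"Plain Wish"},
        {stem + L"ず", L"Negative Conjunctive (-zu)"},
    };
}
```

Calling `conjugateIchidan(L"食べる")` gives ten surface forms from a single entry, 食べない and 食べません among them. For scale, 5,511,101 / 156,208 works out to roughly 35 generated entries per raw entry.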
YoakeNoHikari
Project Translator
Posts: 1367
Joined: Sat Mar 26, 2011 12:39 pm
Favourite Light Novel: Ahouka!
Location: この混乱の街並みの中でも、この2人は独自の世界を展開・・・

Re: Help Offers for Golden Time

Post by YoakeNoHikari »

Oh my Lord. I didn't even realize there were so many conjugations.
Even eternity can be encased in ice.

Post by rpapo »

YoakeNoHikari wrote:Oh my Lord. I didn't even realize there were so many conjugations.
I haven't finished putting them all in, either... 8)

See http://www.epochrypha.com/japanese/materials/verbs/ for a pretty good list.

Post by rpapo »

hobogunner wrote:...what language is your program in, rpapo? Would it be possible to get it? (I suppose the language doesn't matter if it's an .exe. :lol: )
Come and get it. Download http://mywebpages.comcast.net/rpapo/Nihongo.zip.

The program to be run is at x64\release\nihongo.exe, and it requires two main input parameters: the name of the file to be read in (it must be Unicode TXT format), and the name of the file to be generated as output. Additionally, you can specify the line number (zero-based) where you want to start parsing, and the line number where you want to stop parsing. A complete command line might look like this:

Code:

c:\Nihongo> x64\release\nihongo.exe GoldenTime_1.txt OutputText.txt 1623 1631
Beyond that, the dictionary extensions are not yet configurable. The program has a number of extended dictionary entries hard-coded in it (for names like Banri, Taiga Aisaka, etc.), but that extension list is not in a separate file at this point.
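
For anyone curious how a command line like that could be read, here is a rough sketch. This is not the program's actual argument handling: `Options`, `parseLineNumber` and `parseArgs` are hypothetical names, and rejecting a stop line smaller than the start line is an assumption on my part.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for the four parameters described above.
struct Options {
    std::string inputFile;
    std::string outputFile;
    std::optional<std::size_t> firstLine;  // zero-based; absent = start of file
    std::optional<std::size_t> lastLine;   // absent = end of file
};

// Parse a non-negative decimal number; returns nullopt for anything else.
// Overflow is not checked in this sketch.
std::optional<std::size_t> parseLineNumber(const std::string& text) {
    if (text.empty()) return std::nullopt;
    std::size_t value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return std::nullopt;
        value = value * 10 + static_cast<std::size_t>(c - '0');
    }
    return value;
}

// args holds the command-line arguments without the program name.
std::optional<Options> parseArgs(const std::vector<std::string>& args) {
    if (args.size() < 2 || args.size() > 4) return std::nullopt;
    Options opts{args[0], args[1], std::nullopt, std::nullopt};
    if (args.size() >= 3) {
        opts.firstLine = parseLineNumber(args[2]);
        if (!opts.firstLine) return std::nullopt;
    }
    if (args.size() == 4) {
        // A stop line is only reachable after a start line has been parsed.
        assert(opts.firstLine.has_value());
        opts.lastLine = parseLineNumber(args[3]);
        if (!opts.lastLine || *opts.lastLine < *opts.firstLine) {
            return std::nullopt;
        }
    }
    return opts;
}
```

With the example command line above, `parseArgs` would return the two file names plus a start line of 1623 and a stop line of 1631. Leaving off the last two arguments leaves both line numbers unset. Requires C++17 for `std::optional`.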
hobogunner
Administrator
Posts: 8820
Joined: Wed Aug 17, 2011 2:24 pm
Favourite Light Novel:
Location: Elsewhere.

Re: Help Offers for Golden Time

Post by hobogunner »

Awesome...I will download that when I get home. :D

Can you create custom output? For example, after it references the dictionaries, could it create a separate entry that lists any names? (The romaji converter that I use online writes Nemoto as Konpon...) It would be nice to be able to explicitly tell it what a name is. (On a different line is fine, just so I know.)
Maybe this is just too fast, too real -Stay Close, Parabelle
Snails see the benefits, the beauty in every inch -Snails, The Format
You thought you could find happiness just over that green hill; you thought you would be satisfied, but you never will learn to be still
-Learn To Be Still, The Eagles

Post by rpapo »

hobogunner wrote:Awesome...I will download that when I get home. :D

Can you create custom output? For example, after it references the dictionaries, could it create a separate entry that lists any names? (The romaji converter that I use online writes Nemoto as Konpon...) It would be nice to be able to explicitly tell it what a name is. (On a different line is fine, just so I know.)
That's what the dictionary extensions are all about. I don't have that externalized at this point: the extensions are registered in the code. Currently I use this for most of the characters' names, plus a few place names and spoken elisions like してしまった => しちゃった.

Post by hobogunner »

Hmmm... I will certainly look into figuring that out. I'm not too familiar with C++, but if the parser reads the text line by line, how hard would it be to create a new subroutine that adds a secondary 'personal dictionary' reference? As in, have a third line aside from the other 2 dictionaries?

Post by rpapo »

hobogunner wrote:Hmmm... I will certainly look into figuring that out. I'm not too familiar with C++, but if the parser reads the text line by line, how hard would it be to create a new subroutine that adds a secondary 'personal dictionary' reference? As in, have a third line aside from the other 2 dictionaries?
The logic is more complicated than what you're thinking. For one thing, the basic unit for parsing is not a line, but rather a phrase, which is something delimited by punctuation. After all, Japanese words in general do not contain periods, commas, spaces, parentheses or quotation marks. Since we are attempting to break the text into individual words, we can safely bet that punctuation divides words.
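
As a rough sketch of that phrase-splitting idea (an illustration only, not the code from the program): the delimiter list below is an assumed, partial one, and `isDelimiter` and `splitIntoPhrases` are made-up names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Characters treated as phrase delimiters in this sketch (assumed, partial list).
bool isDelimiter(wchar_t c) {
    static const std::wstring delimiters =
        L"。、．，・！？「」『』（）…"  // common Japanese punctuation
        L"\u3000"                        // ideographic (full-width) space
        L".,!?()\"' \t\r\n";             // ASCII punctuation and whitespace
    return delimiters.find(c) != std::wstring::npos;
}

// Split text into phrases: maximal runs of characters with no delimiter.
std::vector<std::wstring> splitIntoPhrases(const std::wstring& text) {
    std::vector<std::wstring> phrases;
    std::wstring current;
    for (wchar_t c : text) {
        if (isDelimiter(c)) {
            // A delimiter closes the phrase collected so far, if any.
            if (!current.empty()) {
                phrases.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) phrases.push_back(current);
    // Sanity check: consecutive delimiters must never produce empty phrases.
    for (const auto& p : phrases) assert(!p.empty());
    return phrases;
}
```

Given 「そうだ、行こう。」と言った。 this yields three phrases: そうだ, 行こう and と言った. Each phrase would then be handed to the dictionary lookup on its own.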

Post by hobogunner »

I understand that. However, even without dividing marks, one could find a certain piece of text by simply running through it again. There is obviously a text scraper set up to compare and get the longest phrase it can, so it has to have some way of going through every individual level... Telling it to scrape again and specifically find certain characters wouldn't be too hard, would it?

Sorry if that idea makes no sense, I have yet to look at it since I'm on my iPad.

Post by rpapo »

hobogunner wrote:I understand that. However, even without dividing marks, one could find a certain piece of text by simply running through it again. There is obviously a text scraper set up to compare and get the longest phrase it can, so it has to have some way of going through every individual level... Telling it to scrape again and specifically find certain characters wouldn't be too hard, would it?

Sorry if that idea makes no sense, I have yet to look at it since I'm on my iPad.
If you want the parser to recognize names, you simply add a dictionary entry for the name. With the current code of my parser, you need to add to the list that starts at line 977 of Nihongo.cpp. You will find entries like the following:

Code:

AddWord ( L"竹宮", L"たけみや", L"(name) Takemiya" ) ;
All you have to do is add entries to suit yourself, and then recompile the application afterwards (of course).
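
To show why an added entry is enough for a name to be picked up, here is a simplified sketch. It assumes a greedy longest-match lookup, which the thread hints at but does not confirm, and it lets a later entry replace an earlier one, whereas a real dictionary would probably keep several readings per word. The `Dictionary` class, `Entry`, `Find` and `Segment` are hypothetical; only the shape of `AddWord` is taken from the line above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical dictionary entry: kana reading plus an English gloss.
struct Entry {
    std::wstring reading;
    std::wstring gloss;
};

class Dictionary {
public:
    // Same argument order as the AddWord call above. A later call replaces
    // an earlier one, so an extension entry overrides the stock reading.
    void AddWord(const std::wstring& word, const std::wstring& reading,
                 const std::wstring& gloss) {
        assert(!word.empty());
        entries_[word] = Entry{reading, gloss};
        longest_ = std::max(longest_, word.size());
    }

    // Returns the entry for an exact match, or nullptr if there is none.
    const Entry* Find(const std::wstring& word) const {
        auto it = entries_.find(word);
        return it == entries_.end() ? nullptr : &it->second;
    }

    // Greedy longest match: at each position take the longest known word;
    // a character with no match is emitted on its own.
    std::vector<std::wstring> Segment(const std::wstring& phrase) const {
        std::vector<std::wstring> words;
        std::size_t pos = 0;
        while (pos < phrase.size()) {
            std::size_t len = std::max<std::size_t>(
                1, std::min(longest_, phrase.size() - pos));
            while (len > 1 && !Find(phrase.substr(pos, len))) --len;
            words.push_back(phrase.substr(pos, len));
            pos += len;
        }
        return words;
    }

private:
    std::unordered_map<std::wstring, Entry> entries_;
    std::size_t longest_ = 0;  // length of the longest word added so far
};
```

For example, if 根本 is first registered with the reading こんぽん and then again as `AddWord ( L"根本", L"ねもと", L"(name) Nemoto" )`, a lookup returns ねもと, and the phrase 竹宮と根本 is segmented as 竹宮 / と / 根本.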

Post by hobogunner »

That's awesome....now I also see why it takes so much memory. :lol:

Post by rpapo »

hobogunner wrote:That's awesome....now I also see why it takes so much memory. :lol:
My intent is to externalize the list of additional words, so you don't have to be a programmer with Visual Studio 2008 handy to modify the list and recompile the program.
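
One way such an external word list might be read is sketched below. Nothing here is decided: the tab-separated, UTF-8 file format, the `#` comment lines, and the names `ExtraWord` and `loadExtraWords` are all assumptions for illustration.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One user-supplied word: written form, kana reading, English gloss.
struct ExtraWord {
    std::string word;
    std::string reading;
    std::string gloss;
};

// Read "word<TAB>reading<TAB>gloss" lines (an assumed format, UTF-8 bytes
// passed through unchanged). Blank lines and lines starting with '#' are
// skipped; lines without two tabs or with an empty word are ignored.
std::vector<ExtraWord> loadExtraWords(std::istream& in) {
    std::vector<ExtraWord> words;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF
        if (line.empty() || line[0] == '#') continue;
        const auto tab1 = line.find('\t');
        if (tab1 == std::string::npos) continue;
        const auto tab2 = line.find('\t', tab1 + 1);
        if (tab2 == std::string::npos) continue;
        assert(tab1 < tab2);  // the second search starts past the first tab
        ExtraWord w{line.substr(0, tab1),
                    line.substr(tab1 + 1, tab2 - tab1 - 1),
                    line.substr(tab2 + 1)};
        if (w.word.empty()) continue;
        words.push_back(w);
    }
    return words;
}
```

Each loaded entry would then be passed to `AddWord` at startup in place of the hard-coded calls. The conversion from UTF-8 to the wide strings `AddWord` takes is left out of this sketch.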

FWIW, as a side-effect of the parsing run, two special TXT files get created: Dictionary.txt and Index.txt. These files are a text representation of the fully elaborated dictionary with all the generated conjugations. They are huge...
jonathanasdf
Kyon's Imouto-Chan
Posts: 454
Joined: Mon Feb 28, 2011 11:35 pm
Favourite Light Novel: Ahouka!

Re: Help Offers for Golden Time

Post by jonathanasdf »

Hey, could I add you on msn/gtalk or something? If you would like, I could work together with you on this. The ability to add your own words and phrases and stuff seems like it might be pretty nice.

Edit: actually, never mind, it's long enough that I suddenly got lazy...
Last edited by jonathanasdf on Tue Nov 22, 2011 8:40 pm, edited 1 time in total.

Post by hobogunner »

I created this topic: viewtopic.php?f=31&t=4691#p121114

to help those of us who are going to be adding individual components to it. It's for discussing what we've done.
rock96
Senior Project Translator
Posts: 333
Joined: Wed Jul 27, 2011 12:16 am
Favourite Light Novel:

Re: Help Offers for Golden Time

Post by rock96 »

Is there a need to merge the posts from this topic?
Kadi - hero we don't deserve.
Honorable mention for moderating and translating Campione, editing Gekkou and Hakomari.
O White Knight standing sadly amidst hordes of filthy plebs, return to us, please.