Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 9:00 am
by rpapo
I correct myself as far as the dictionary size is concerned: running the program right now, the raw dictionary had 156,208 entries. With conjugations, the size ballooned to 5,511,101 entries. The conjugations currently performed are:
  • Stem
  • Plain Non-Past (positive and negative (2 forms))
  • Plain Past (positive and negative)
  • Plain Imperative (2 forms)
  • Plain Presumptive (2 forms)
  • Plain Conjunctive
  • Plain Provisional (positive and negative)
  • Plain Conditional (positive and negative)
  • Polite Conditional (positive and negative)
  • Plain Passive Non-Past
  • Plain Passive Past
  • Plain Wish
  • Polite Non-Past (positive and negative)
  • Polite Past (positive and negative)
  • Polite Presumptive
  • Potential (2 forms)
  • Causative
  • Negative Conjunctive (-zu)
  • Plain Alternative (positive and negative)
  • Polite Alternative (positive and negative)
  • Plain Progressive
  • Plain Progressive Casual
  • Polite Progressive
  • Plain Progressive Past (2 forms)
  • Polite Progressive Past
  • Plain Negative Presumptive
  • Polite Negative Presumptive
  • Noun (from adjective)
  • Adverb (from adjective)
  • Seems like (from adjective)
Some of the conjugations combine, so there's even more stuff generated. Only the current memory consumption is stopping me from adding even more variants . . . some of which I could really use.
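
To give a feel for why the dictionary balloons (5,511,101 / 156,208 is roughly 35 generated entries per raw entry), here is a minimal sketch of expanding a single verb. It is not the code from the program: ConjugateIchidan and the Conjugated struct are made-up names, it only handles ichidan (る-dropping) verbs, and it produces just a handful of the forms from the list above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One generated dictionary entry: the surface form plus a label for the conjugation.
struct Conjugated {
    std::wstring form;
    std::wstring label;
};

// Expand an ichidan verb such as 食べる into a few conjugated forms.
// Hypothetical helper for illustration; the real program covers far more
// verb classes and forms than this.
std::vector<Conjugated> ConjugateIchidan(const std::wstring& dictionaryForm) {
    // Ichidan verbs end in る in dictionary form; dropping it gives the stem.
    assert(!dictionaryForm.empty() && dictionaryForm.back() == L'る');
    const std::wstring stem = dictionaryForm.substr(0, dictionaryForm.size() - 1);

    return {
        {stem, L"Stem"},
        {dictionaryForm, L"Plain Non-Past"},
        {stem + L"ない", L"Plain Non-Past (negative)"},
        {stem + L"た", L"Plain Past"},
        {stem + L"なかった", L"Plain Past (negative)"},
        {stem + L"て", L"te-form"},
        {stem + L"れば", L"Plain Provisional"},
        {stem + L"たら", L"Plain Conditional"},
        {stem + L"たい", L"Plain Wish"},
        {stem + L"ます", L"Polite Non-Past"},
        {stem + L"ません", L"Polite Non-Past (negative)"},
        {stem + L"ました", L"Polite Past"},
        {stem + L"ている", L"Plain Progressive"},
    };
}
```

Calling ConjugateIchidan(L"食べる") yields 13 entries from one raw entry, and that is before any forms are combined with each other.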

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 9:19 am
by YoakeNoHikari
Oh my Lord. I didn't even realize there were so many conjugations.

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 9:21 am
by rpapo
YoakeNoHikari wrote:Oh my Lord. I didn't even realize there were so many conjugations.
I haven't finished putting them all in, either... 8)

See http://www.epochrypha.com/japanese/materials/verbs/ for a pretty good list.

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 9:30 am
by rpapo
hobogunner wrote:...what language is your Program in, Rpapo? would it be possible to get it? (I suppose language doesn't matter if it's an .exe. :lol: )
Come and get it. Download http://mywebpages.comcast.net/rpapo/Nihongo.zip.

The program to be run is at x64\release\nihongo.exe, and it requires two main input parameters: the name of the file to be read in (it must be Unicode TXT format), and the name of the file to be generated as output. Additionally, you can specify the line number (zero-based) where you want to start parsing, and the line number where you want to stop parsing. A complete command line might look like this:

Code:

c:\Nihongo> x64\release\nihongo.exe GoldenTime_1.txt OutputText.txt 1623 1631
Beyond that, the dictionary extensions are not yet configurable. The program has a number of extended dictionary entries hard-coded in it (for names like Banri, Taiga Aisaka, etc.), but that extension list is not in a separate file at this point.
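
For anyone curious how a command line like the one above could be handled, here is a rough sketch. It is an assumption about the argument handling, not the actual code in nihongo.exe: Options and ParseArgs are made-up names, and the defaults (start at line 0, parse to the end when no stop line is given) are guesses. It needs a C++17 compiler for std::optional.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for the four parameters described above.
struct Options {
    std::string inputPath;         // Unicode TXT file to read
    std::string outputPath;        // file to generate
    long firstLine = 0;            // zero-based; assumed default is the first line
    std::optional<long> lastLine;  // empty is assumed to mean "parse to the end"
};

// args holds everything after the program name.
// Returns std::nullopt when required arguments are missing or a line number is malformed.
std::optional<Options> ParseArgs(const std::vector<std::string>& args) {
    if (args.size() < 2 || args.size() > 4) return std::nullopt;

    Options opts;
    opts.inputPath = args[0];
    opts.outputPath = args[1];

    // Accept only non-negative decimal integers with no trailing junk.
    auto parseLine = [](const std::string& text) -> std::optional<long> {
        char* end = nullptr;
        const long value = std::strtol(text.c_str(), &end, 10);
        if (text.empty() || *end != '\0' || value < 0) return std::nullopt;
        return value;
    };

    if (args.size() >= 3) {
        const auto first = parseLine(args[2]);
        if (!first) return std::nullopt;
        opts.firstLine = *first;
    }
    if (args.size() == 4) {
        const auto last = parseLine(args[3]);
        if (!last || *last < opts.firstLine) return std::nullopt;
        opts.lastLine = *last;
    }
    return opts;
}
```

Whether the stop line itself is parsed or excluded is not stated above, so the sketch only stores the number and leaves that decision to the caller.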

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 11:20 am
by hobogunner
Awesome...I will download that when I get home. :D

Can you create custom output? Such as after it references the dictionaries, create a separate entry that goes through and tells you any names? (the romaji converter that I use online writes Nemoto as Konpon....) so it would be nice to be able to explicitly tell it what it is. (In a different line is fine, but just to make sure I know.)

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 11:32 am
by rpapo
hobogunner wrote:Awesome...I will download that when I get home. :D

Can you create custom output? Such as after it references the dictionaries, create a separate entry that goes through and tells you any names? (the romaji converter that I use online writes Nemoto as Konpon....) so it would be nice to be able to explicitly tell it what it is. (In a different line is fine, but just to make sure I know.)
That's what the dictionary extensions are all about. I don't have that externalized at this point: the extensions are registered in the code. Currently I use this for most of the characters' names, plus a few place names and spoken elisions like してしまった => しちゃった.

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 1:03 pm
by hobogunner
Hmmm....I will certainly look into figuring that out. I'm not too familiar with C++, but if the parser reads the text line by line, how hard would it be to create a new subroutine that adds a secondary 'personal dictionary' reference? As in, have a third line aside from the other 2 dictionaries?

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 3:06 pm
by rpapo
hobogunner wrote:Hmmm....I will certainly look into figuring that out. I'm not too familiar with C++, but if the parser reads the text line by line, how hard would it be to create a new subroutine that adds a secondary 'personal dictionary' reference? As in, have a third line aside from the other 2 dictionaries?
The logic is more complicated than what you're thinking. For one thing, the basic unit for parsing is not a line, but rather a phrase, which is something delimited by punctuation. After all, Japanese words in general do not contain periods, commas, spaces, parentheses or quotation marks. Since we are attempting to break the text into individual words, we can safely bet that punctuation divides words.
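
As an illustration of the phrase idea, a splitter along these lines would do the job. This is only a sketch: IsDelimiter and SplitIntoPhrases are made-up names, and the exact set of delimiter characters is an assumption, not the list the parser really uses.

```cpp
#include <string>
#include <vector>

// Punctuation treated as a phrase boundary. The character set is an
// assumption for illustration and is certainly incomplete.
bool IsDelimiter(wchar_t ch) {
    static const std::wstring kDelimiters =
        L" \t\r\n.,!?()\"'"          // ASCII whitespace and punctuation
        L"　。、！？「」『』（）…";  // full-width space and Japanese punctuation
    return kDelimiters.find(ch) != std::wstring::npos;
}

// Break text into phrases: maximal runs of characters with no delimiter in them.
std::vector<std::wstring> SplitIntoPhrases(const std::wstring& text) {
    std::vector<std::wstring> phrases;
    std::wstring current;
    for (wchar_t ch : text) {
        if (IsDelimiter(ch)) {
            // A delimiter closes the phrase in progress, if there is one.
            if (!current.empty()) {
                phrases.push_back(current);
                current.clear();
            }
        } else {
            current += ch;
        }
    }
    if (!current.empty()) phrases.push_back(current);
    return phrases;
}
```

Each phrase that comes out can then be broken into words on its own, since no dictionary word should straddle a delimiter.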

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 3:11 pm
by hobogunner
I understand that, however even without dividing marks, one could find a certain piece of text by simply running through it again. There is obviously a text scraper set up to compare and get the longest phrase it can, therefore it has to have some way of going through every individual level.....telling it to scrape again and specifically find certain characters wouldn't be too hard, would it?

Sorry if that idea makes no sense, I have yet to look at it since I'm on my iPad.

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 3:19 pm
by rpapo
hobogunner wrote:I understand that, however even without dividing marks, one could find a certain piece of text by simply running through it again. There is obviously a text scraper set up to compare and get the longest phrase it can, therefore it has to have some way of going through every individual level.....telling it to scrape again and specifically find certain characters wouldn't be too hard, would it?

Sorry if that idea makes no sense, I have yet to look at it since I'm on my iPad.
If you want the parser to recognize names, you simply add a dictionary entry for the name. With the current code of my parser, you need to add to the list of entries that starts at line 977 of Nihongo.cpp. You will find entries like the following:

Code:

AddWord ( L"竹宮", L"たけみや", L"(name) Takemiya" ) ;
All you have to do is add entries to suit yourself, and then recompile the application afterwards (of course).
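
To show why adding an entry is enough for a name to be recognized, here is a toy version of the idea. Only the three-argument shape of AddWord comes from the line above; the Dictionary class, the Entry struct, the Segment function and the greedy longest-match strategy are all assumptions made for illustration, not a description of what Nihongo.cpp really does.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical storage for one dictionary entry.
struct Entry {
    std::wstring reading;
    std::wstring gloss;
};

class Dictionary {
public:
    // Same three arguments as the AddWord call shown above; the body is a guess.
    void AddWord(const std::wstring& word, const std::wstring& reading,
                 const std::wstring& gloss) {
        entries_[word] = Entry{reading, gloss};
        longest_ = std::max(longest_, word.size());
    }

    // Greedy longest match: at each position take the longest dictionary word,
    // falling back to a single character when nothing matches.
    std::vector<std::wstring> Segment(const std::wstring& phrase) const {
        std::vector<std::wstring> words;
        std::size_t pos = 0;
        while (pos < phrase.size()) {
            std::size_t len = std::min(longest_, phrase.size() - pos);
            while (len > 1 && entries_.count(phrase.substr(pos, len)) == 0) --len;
            words.push_back(phrase.substr(pos, len));
            pos += len;
        }
        return words;
    }

private:
    std::map<std::wstring, Entry> entries_;
    std::size_t longest_ = 1;  // length of the longest word added so far
};
```

With 竹宮 registered, the phrase 竹宮は comes out as 竹宮 followed by は. Without the entry, the same lookup would fall back to the individual characters 竹 and 宮.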

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 3:20 pm
by hobogunner
That's awesome....now I also see why it takes so much memory. :lol:

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 3:26 pm
by rpapo
hobogunner wrote:That's awesome....now I also see why it takes so much memory. :lol:
My intent is to externalize the list of additional words, so you don't have to be a programmer with Visual Studio 2008 handy to modify the list and recompile the program.

FWIW, as a side-effect of the parsing run, two special TXT files get created: Dictionary.txt and Index.txt. These files are a text representation of the fully elaborated dictionary with all the generated conjugations. They are huge...

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 8:18 pm
by jonathanasdf
Hey, could I add you on msn/gtalk or something? If you would like, I could work together with you on this. The ability to add your own words and phrases and stuff seems like it might be pretty nice.

Edit: actually never mind, it's a bit too long and I suddenly got lazy...

Re: Help Offers for Golden Time

Posted: Tue Nov 22, 2011 8:22 pm
by hobogunner
I created this topic: viewtopic.php?f=31&t=4691#p121114

to help those of us who are going to be adding individual components to it, and to discuss what we've done.

Re: Help Offers for Golden Time

Posted: Wed Nov 23, 2011 3:10 am
by rock96
Is there a need to merge the posts from this topic?