
Re: Rpapo's Translation Assistant.

Posted: Sat Jun 23, 2012 6:09 pm
by Mystes
rpapo wrote:Oops. Something's not working as well as before. Going to have to look into it . . . :oops:
Don't worry, take your time. XD

Re: Rpapo's Translation Assistant.

Posted: Sun Jun 24, 2012 2:13 am
by rpapo
Kira0802 wrote:
rpapo wrote:Oops. Something's not working as well as before. Going to have to look into it . . . :oops:
Don't worry, take your time. XD
The problem I saw turns out not to be as serious as I thought. The parser still has its weaknesses, and the biggest one is a relatively low tolerance for less-than-perfect spelling. I had mistranscribed ヅラ as ジラ, and it got all confused.

Re: Rpapo's Translation Assistant.

Posted: Mon Jun 25, 2012 2:29 pm
by rpapo
I've posted an update to the program that works around the problem mentioned above. Now, if we find a sequence of katakana characters and cannot find anything matching it in the dictionary or its extensions, we gather those characters together and call them a word with an unknown meaning. Most (though not all) such words are actually borrowed from English.
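For anyone curious how that fallback might slot into a longest-match parser, here is a minimal Python sketch. It is not rpapo's code: the toy DICTIONARY, MAX_WORD_LEN and tokenize() are hypothetical names, and the katakana test only covers the main katakana letters plus the long-vowel mark ー.

```python
from __future__ import annotations

# Hypothetical toy dictionary; the real program loads EDICT plus conjugations.
DICTIONARY = {"これ": "this", "は": "(topic marker)", "です": "is"}
MAX_WORD_LEN = 8  # assumed upper bound on entry length, in characters


def is_katakana(ch: str) -> bool:
    # Katakana letters ァ..ヺ (U+30A1..U+30FA) plus the long-vowel mark ー (U+30FC)
    return "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def tokenize(text: str) -> list[tuple[str, str | None]]:
    """Greedy longest match; unmatched katakana runs become one unknown word."""
    tokens: list[tuple[str, str | None]] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in DICTIONARY:
                tokens.append((candidate, DICTIONARY[candidate]))
                i += length
                break
        else:
            if is_katakana(text[i]):
                # No dictionary hit: swallow the whole katakana run as one word.
                j = i
                while j < len(text) and is_katakana(text[j]):
                    j += 1
                tokens.append((text[i:j], None))
                i = j
            else:
                # Any other unknown character is emitted on its own.
                tokens.append((text[i], None))
                i += 1
    return tokens


print(tokenize("これはコンピュータです"))
# [('これ', 'this'), ('は', '(topic marker)'), ('コンピュータ', None), ('です', 'is')]
```

The None meaning marks the "word with an unknown meaning" case, so a later stage can still show the reader that コンピュータ was treated as a single unit.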

By making this change, the misbehavior I was seeing goes away . . . at least for now.

This time around, strangely enough, the culprit was ツン (tsun), which was in the dictionary in hiragana form (つん), but not in katakana.
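One way to close that kind of gap (my assumption, not something the program is stated to do) is to fold katakana to hiragana and retry the lookup. The two Unicode blocks are laid out in parallel, so ァ (U+30A1) through ヶ (U+30F6) map to ぁ through ゖ by subtracting 0x60. The helper names below are hypothetical.

```python
from __future__ import annotations


def katakana_to_hiragana(text: str) -> str:
    # ァ (U+30A1) .. ヶ (U+30F6) sit exactly 0x60 above ぁ (U+3041) .. ゖ (U+3096).
    # Anything else, including the long-vowel mark ー, is passed through unchanged.
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch for ch in text
    )


def lookup(word: str, dictionary: dict[str, str]) -> str | None:
    """Exact match first, then retry with katakana folded to hiragana."""
    if word in dictionary:
        return dictionary[word]
    return dictionary.get(katakana_to_hiragana(word))


print(katakana_to_hiragana("ツン"))  # つん
```

Trying the exact form first matters because folding is lossy: ー has no hiragana counterpart, and some words are listed only in katakana.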

Re: Rpapo's Translation Assistant.

Posted: Sun Mar 17, 2013 12:16 pm
by rpapo
An update on this. I finally kicked myself hard and did some work on an interactive version of my Japanese translation assistant program. This is what it looks like now:

[Image: screenshot of the interactive translation assistant]

The dictionary file must first be built with the NIHONGO.exe program, but once you have it, the program above can be run. Once it has fully started, type or paste the text you want to analyze into the upper entry field, then click the Translate button. The results appear in the lower window.

This is very much a work in progress. I did what you see above to help me with manga translations, where I don't bother to make a TXT transcript file. Most of my time, however, is spent transcribing and translating the Golden Time books. Witness the fact that this forum topic hasn't been posted to in over six months...

Things to be done (no ETA, so don't ask for one):
(1) Provide for dictionary maintenance from the above GUI.
(2) Provide for user hints (split a word here, join these two words...).
(3) Make the parser less reliant on the dictionary having every possible verb conjugation already precomputed.
(4) Make the parser smarter.

Making it into a real translator is a whole different can of worms...

Re: Rpapo's Translation Assistant.

Posted: Sun Mar 17, 2013 9:57 pm
by didntloginD:
rpapo wrote:Things to be done (no ETA, so don't ask for one):
(1) Provide for dictionary maintenance from the above GUI.
(2) Provide for user hints (split a word here, join these two words...).
(3) Make the parser less reliant on the dictionary having every possible verb conjugation already precomputed.
(4) Make the parser smarter.
:lol:

My biggest question is about overhead: is there any way to cut down how much is required right now? IIRC, one has to load the entire dictionary and index into RAM in order to access it through the program itself. That's probably the biggest "improvement" I could think of. I don't know everything the WWDIJ(?)(?)(?) (I can never remember the entire acronym) file gives you access to in terms of interaction with the dictionary they have generated. I assume the WWDIJ(?)(?)(?) thing is what you're loading when you say to load the dictionary beforehand. (Unless this is all wrong and I'm just going on about nothing.) Either way, it'd be nice to slim down the overhead some.

I sit around doing nothing too much nowadays anyway, so feel free to PM me if you want to expand it or bounce ideas back and forth. I am quite interested in this thing (and in the overhead it takes on my 32-bit WinXP system :roll: ).

Re: Rpapo's Translation Assistant.

Posted: Mon Mar 18, 2013 2:11 am
by rpapo
didntloginD: wrote:My biggest question is that of overhead: is there any way for you to make it so that one doesn't have to have as much as is required right now? IIRC, one has to load the entire dictionary and index into RAM in order to access it through the program itself. That's probably the biggest "improvement" I could think of. I don't know what all the WWDIJ(?)(?)(?) (I can never remember the entire acronym) file gives you access to in terms of interaction with the dictionary they have generated. I assume that the WWDIJ(?)(?)(?) thing you're loading when you say to load the dictionary beforehand. (Unless this is all wrong and I'm just going on about nothing.) Either way, it'd be nice to slim down the amount of overhead some.

I sit around doing nothing nowadays too much anyways, feel free to PM me if you want to expand it / bounce ideas back and forth. I am quite interested in this thing (and the overhead taken with my WinXP 32bit system :roll: ).
You are correct in remembering that the program loads the whole dictionary before getting to work. But it's actually much worse than simply loading all of WWWJDIC. That part is easy, with "only" about 160,000 entries. The problem is that my current, rather dumb parser relies on the dictionary being pre-processed with tons of verb and adjective conjugations, which swells the basic EDICT (Japanese-English dictionary) file from 13 MB to an indexed binary image that currently takes almost 1.4 GB. The dictionary expands roughly 100 times in size.
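To see why pre-computing conjugations inflates the dictionary, here is a hypothetical Python sketch that expands a single ichidan (-ru) verb into a handful of forms. The suffix table is a small illustrative subset I picked, not the program's actual conjugation list; godan verbs, adjectives and chained forms would multiply the count further.

```python
from __future__ import annotations

# Illustrative subset of ichidan endings (attached to the stem after dropping る).
ICHIDAN_SUFFIXES = {
    "ない": "negative",
    "なかった": "past negative",
    "た": "past",
    "て": "te-form",
    "ます": "polite",
    "れば": "conditional",
    "られる": "passive/potential",
    "させる": "causative",
    "よう": "volitional",
}


def expand_ichidan(dictionary_form: str, gloss: str) -> dict[str, str]:
    """Return the dictionary form plus one pre-computed entry per suffix."""
    stem = dictionary_form[:-1]  # 食べる -> 食べ
    expanded = {dictionary_form: gloss}
    for suffix, label in ICHIDAN_SUFFIXES.items():
        expanded[stem + suffix] = f"{gloss} ({label})"
    return expanded


entries = expand_ichidan("食べる", "to eat")
print(len(entries))  # 10
```

Even this toy table turns one entry into ten; stacking forms (食べさせられなかった and friends) is the kind of thing that pushes the growth toward two orders of magnitude.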

I know how I want to get around the problem, but it will take some time to do it right. Instead of a dumb parser that relies on longest matches against a huge pre-processed dictionary (with a few minor optimizations), I need a smart parser that works out dynamically how Japanese words conjugate and relate to each other. I have two partial prototypes for that (Analyzer and Parser2) in the code package I publish:
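The dynamic approach can be pictured the other way around: store only dictionary forms and undo conjugations at lookup time. This is a minimal, hypothetical Python illustration (it says nothing about how Analyzer or Parser2 actually work) and covers only a few ichidan endings; godan verbs such as 書いた → 書く would need their own rules.

```python
from __future__ import annotations

# (surface ending, replacement, label); longer endings are listed first.
DEINFLECT_RULES = [
    ("なかった", "る", "past negative"),
    ("ない", "る", "negative"),
    ("ます", "る", "polite"),
    ("た", "る", "past"),
    ("て", "る", "te-form"),
]


def deinflect(surface: str, dictionary: dict[str, str]) -> list[tuple[str, str]]:
    """Return (dictionary form, label) candidates that actually exist."""
    if surface in dictionary:
        return [(surface, "plain")]
    candidates = []
    for ending, replacement, label in DEINFLECT_RULES:
        if surface.endswith(ending):
            base = surface[: -len(ending)] + replacement
            if base in dictionary:
                candidates.append((base, label))
    return candidates


print(deinflect("食べなかった", {"食べる": "to eat"}))  # [('食べる', 'past negative')]
```

Since only base forms are stored, memory would stay much closer to the size of EDICT itself; the price is that the rules must cope with ambiguity, which is why the function returns a list of candidates.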

http://home.comcast.net/~rpapo/Nihongo.zip

Those prototypes are far from ready for anybody else's evaluation, though, and I haven't spent time on them in quite a while. By the time I get back to them, I may simply start yet a third new project...

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 12:32 pm
by Lery
OMG...
This is a really nice program you've made here.

It will spare me tons of time when translating ^^''
(Because I'm terrible with kanji so far... :oops: )

I'm going to try it asap. :mrgreen:

Edit: heck, I can't get it to work. Nihongo.exe tells me "no source file specfied", so it doesn't build the dictionary. (On Win7 x64)

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 12:42 pm
by rpapo
Lery wrote:OMG...
This is a really nice program you've made here.

It will spare me tons of time when translating ^^''
(Because I'm terrible with kanji so far... :oops: )

I'm going to try it asap. :mrgreen:

Edit : heck, unable to get it to work: the Nihongo.exe tells me "no source file specfied" and so it doesn't build the dictionary. (On Win7x64)
In your case, do the following steps:

(1) Open a command prompt. Change (CD) to the directory you extracted my entire package to.
(2) Create a Unicode TXT file with nothing in it, and save it to that directory. Let's suppose you called it TEST.TXT.
(3) Execute "x64\release\nihongo.exe TEST.txt OUTPUT.txt 0 9999". This will take a while and consume monstrous amounts of memory (4-5 GB), but it will create a dictionary file that can be loaded quickly the next time you run.
(4) Now that the dictionary DICTIONARY.DAT has been created, you can run "x64\release\Honyaku_no_Hojo.exe"

I have not yet integrated it all into Honyaku.

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 12:49 pm
by Lery

Code:

>>C:\Nihongo\x64\release>Nihongo.exe test.txt output.txt 0 9999
Loading dictionary.
ERROR: Unable to open dictionary file 'Dictionary.dat' for reading.
Building dictionary file from EDICT.
ERROR: Unable to open source file.  Error 2:(null)
:?
Still seems to be a problem.

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 12:53 pm
by rpapo
Lery wrote:Still seems to be a problem.
But it looks like it got the dictionary built. Check for DICTIONARY.DAT. It should exist, and it should be quite big. There will also be a file DICTIONARY.TXT that shows the entire dictionary in human-readable form. Beware: the file is so large that most editors cannot swallow it whole.

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 1:00 pm
by Lery
Doesn't look like that's the case:

Code:

>>C:\Nihongo\x64\release>dir /B
Honyaku_No_Hojo.exe
Juman.dll
Nihongo.exe
test.txt

>>C:\Nihongo\x64\release>Nihongo.exe test.txt output.txt 0 9999
Loading dictionary.
ERROR: Unable to open dictionary file 'Dictionary.dat' for reading.
Building dictionary file from EDICT.
ERROR: Unable to open source file.  Error 2:(null)

>>C:\Nihongo\x64\release>dir /B
Honyaku_No_Hojo.exe
Juman.dll
Nihongo.exe
test.txt
It seems to be unable to connect to the EDICT-thing. :?

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 1:02 pm
by rpapo
Run from c:\Nihongo, not from c:\Nihongo\x64\Release. The program is looking for specific files relative to the project home directory, which in your case is c:\Nihongo.

My earlier instructions stated that...

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 1:03 pm
by Lery
My bad, the habit... :oops:

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 1:05 pm
by rpapo
Likewise, when you run Honyaku, do so from c:\Nihongo.

Re: Rpapo's Translation Assistant.

Posted: Sat May 25, 2013 1:07 pm
by Lery
Mhhh.

Code:

>>C:\Nihongo>x64\release\Nihongo.exe test.txt output.txt 0 9999
Loading dictionary.
ERROR: Unable to open dictionary file 'Dictionary.dat' for reading.
Building dictionary file from EDICT.
  167018 dictionary entries.
Loading additional words.
ERROR: Unable to open file 'AddedWords.txt'.
Dumping dictionary to text file.
Building word/phrase index.
  6851438 index entries.
Saving dictionary.
ERROR: Unable to delete old dictionary file 'Dictionary.dat'.
Processing document.
ERROR: Invalid source file 'test.txt'.
Is that normal?? There isn't any Dictionary.dat in my folder, but a Dictionary.txt appeared. :wink:

Mhhh, Honyaku isn't working like that... So I guess the Dictionary.dat is required.
But if I create an empty file called "Dictionary.dat", I still get the same error: "Unable to delete old dictionary file 'Dictionary.dat'."