Some additional information about the program. It was written to help me work on these translations, automating what was within my abilities. I am not a professor of linguistics with a specialty in computer science, but I have been writing code professionally for more than thirty years now.
The program is NOT a translator. It is a parser. That is, it attempts to break Japanese text into its constituent words, and once that is done it does exhaustive dictionary lookups on the resulting words, giving a translator much of the raw material he needs to piece together the meaning of the text. It does not attempt to produce the translation itself, though it provides much of what would be needed as input to a translator program, should someone attempt to write one. That task would require a considerable background in theoretical linguistics, I think.
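To make the "parse, then look up" idea concrete, here is a minimal Python sketch. It is not the program's actual algorithm, which is not described here. It uses a simple greedy longest-match scan as a stand-in technique, and the tiny dictionary `TOY_DICT` and the function names are made up for illustration.

```python
# Hypothetical toy dictionary; the real program uses EDICT instead.
TOY_DICT = {
    "私": ["I; me"],
    "は": ["(topic marker particle)"],
    "日本語": ["Japanese (language)"],
    "日本": ["Japan"],
    "を": ["(object marker particle)"],
    "勉強": ["study"],
}


def segment(text, dictionary):
    """Greedy longest-match: at each position, take the longest known word."""
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary word starts here: emit the single character
            # so the scan can continue past it.
            words.append(text[i])
            i += 1
    return words


def lookup_all(words, dictionary):
    """Pair every word with its glosses (an empty list if unknown)."""
    return [(word, dictionary.get(word, [])) for word in words]


if __name__ == "__main__":
    for word, glosses in lookup_all(segment("私は日本語を勉強", TOY_DICT), TOY_DICT):
        print(word, glosses)
```

Greedy matching prefers 日本語 over 日本 here because it tries longer candidates first. A real parser needs more than this, since the longest match is not always the right one.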
To use what I have built, in its current form, you need to download the file given in the previous posting and unzip it to a directory/folder all its own. On my system, this is c:\Projects\Nihongo, though you may wish to just use c:\Nihongo. If you have Microsoft Visual Studio 2008 installed on your system, then you can load the solution file, Nihongo.sln, and rebuild the whole thing for yourself. The startup project should be set to "Nihongo". There are several other semi-independent projects in the solution, "Analyzer" and "Parser2", as well as several subcomponents, "JDICT", "JIS0208" and "Juman". More on that stuff later, as it is only relevant to people actually trying to play with the code and modify it.
A copy of the main dictionary file used by the program, EDICT, can be found in the JDICT folder. This file may be updated at any time by downloading a new dictionary from the WWWJDICT web site at http://www.csse.monash.edu.au/~jwb/edict.html. Since this dictionary is constantly improving, updating it every now and then is a good idea. Nothing like getting free improvements!
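For anyone curious what is inside the file, here is a small Python sketch of reading it. It assumes the classic EDICT layout, one entry per line in the form `KANJI [KANA] /gloss/gloss/` (or `KANA /gloss/` when there is no kanji), with the file encoded as EUC-JP. The function names and the path argument are illustrative only, and this is not how the JDICT component itself loads the file.

```python
import re

# headword, optional [reading], then slash-delimited glosses
_EDICT_LINE = re.compile(r"^(\S+)\s+(?:\[([^\]]+)\]\s+)?/(.*)/\s*$")


def parse_edict_line(line):
    """Return (headword, reading or None, [glosses]), or None if malformed."""
    match = _EDICT_LINE.match(line)
    if match is None:
        return None
    headword, reading, gloss_field = match.groups()
    glosses = [gloss for gloss in gloss_field.split("/") if gloss]
    return headword, reading, glosses


def load_edict(path):
    """Read a whole EDICT file into a dict keyed by headword."""
    entries = {}
    # Assumption: the downloaded file is EUC-JP encoded.
    with open(path, encoding="euc_jp") as handle:
        for line in handle:
            parsed = parse_edict_line(line)
            if parsed is not None:
                headword, reading, glosses = parsed
                entries.setdefault(headword, []).append((reading, glosses))
    return entries
```

A headword can appear on more than one line, so `load_edict` keeps a list of entries per headword rather than overwriting earlier ones.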
There are a number of things that need to be done with the program yet, including:
(1) Externalize the personal additions to the dictionary, placing them into a Unicode TXT file that you don't have to be a programmer to update.
(2) Create a graphical user interface with (as a minimum) controls for specifying the input file, the output file and which line(s) to parse. Later improvements could include providing a user interface for the maintenance of the personal dictionary extensions.
(3) Improve the parser to handle yet more of the possible verb/adjective conjugations. To do this without requiring a system with 32 GB (or more!) of RAM, some while ago I started a parallel project which determines conjugations on the fly. This is the "Parser2" project. It is far from complete, and not ready to use.
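As a rough illustration of what "on the fly" can mean in item (3), the Python sketch below strips a conjugated ending at lookup time and checks whether the resulting base form is in the dictionary, instead of storing every conjugated form of every word in memory. This is a generic deinflection sketch, not the Parser2 design. The rule table is a deliberately tiny, hypothetical sample (a few ichidan verb and i-adjective endings), and the names are invented for the example.

```python
# (conjugated ending, dictionary-form ending); a small hypothetical sample.
RULES = [
    ("ました", "る"),    # polite past, ichidan verb: 食べました -> 食べる
    ("ません", "る"),    # polite negative, ichidan verb
    ("ます", "る"),      # polite non-past, ichidan verb
    ("ない", "る"),      # plain negative, ichidan verb
    ("た", "る"),        # plain past, ichidan verb
    ("くない", "い"),    # negative, i-adjective: 高くない -> 高い
    ("かった", "い"),    # past, i-adjective: 高かった -> 高い
]


def base_form_candidates(surface):
    """Generate every base form the rules allow, plus the surface form itself."""
    candidates = {surface}
    for ending, base_ending in RULES:
        # Require a non-empty stem before the ending.
        if surface.endswith(ending) and len(surface) > len(ending):
            candidates.add(surface[:-len(ending)] + base_ending)
    return candidates


def deinflect(surface, known_base_forms):
    """Keep only the candidates that really are dictionary entries."""
    return sorted(base_form_candidates(surface) & set(known_base_forms))


if __name__ == "__main__":
    known = {"食べる", "高い"}
    print(deinflect("食べました", known))
    print(deinflect("高くない", known))
```

The rules overgenerate on purpose (高くない also yields the nonsense candidate 高くる), and the dictionary check is what throws the bad guesses away. The memory cost is then the size of the rule table rather than the number of words times the number of conjugations.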
I have been trying to restrain myself from working too much on the program, mainly because I need/want to improve my personal grasp of Japanese. Computer programming for me is fun, and all too tempting when there are other things I ought to be doing with my time. That said, though, writing this program hugely improved my knowledge of how the language works grammatically, and especially in the area of verb and adjective conjugations.