So it seems that a lot of what I have done around here is writing up the daily kanji posts (approx. 66% of my posts). Boy, oh boy, do I love writing those posts. However, the thought of opening the edict file in a text editor and using the ol' Ctrl-F left me feeling nauseous with cold sweats, ants in my pants, and even caused my lawn to turn a shade directly between green and dead. To get over this horrible feeling I immediately opened my nearest text editor and started checking what was new on Slashdot.

Once I had that out of my system, I came up with a little script to do all that searching for me. After adding the ability to remember which words had already been shown, I had a way to find words from edict that include only the kanji I have posted, carry the (P) flag, which marks words flagged as priority to learn, and have not been shown yet. So when we did 二 it showed a bunch of words based on that kanji. When I showed 十, it showed 二十 because 二 was a kanji we had done before. I also made it accept any hiragana or katakana in a word.
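To make the filtering rule concrete, here is a rough sketch of the idea on a few sample lines. The helper names `build_pattern` and `matching_words` are made up for this illustration, and the sample entries are simplified stand-ins written in the edict layout (word, reading in brackets, then glosses), not lines copied from the real file.

```php
<?php
// Hypothetical helper (for illustration only): builds the filter pattern
// from a string containing every kanji posted so far.
function build_pattern(string $postedKanji): string
{
    // Any kana is allowed; kanji are allowed only if already posted.
    $allowed = '\p{Katakana}\p{Hiragana}' . preg_quote($postedKanji, '/');
    // The word must be followed by its [reading] and carry the (P) flag.
    return '/^([' . $allowed . ']*) \[.*\(P\)/u';
}

// Hypothetical helper: returns the headwords of all lines that pass the filter.
function matching_words(string $postedKanji, array $lines): array
{
    $pattern = build_pattern($postedKanji);
    $words = array();
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m)) {
            $words[] = $m[1];
        }
    }
    return $words;
}

// Simplified sample entries (assumed, not taken verbatim from edict).
$sampleLines = array(
    '二 [に] /(num) two/(P)/',
    '二十 [にじゅう] /(n) twenty/(P)/',
    '二月 [にがつ] /(n) February/(P)/',
);
```

With only 二 posted, `matching_words('二', $sampleLines)` keeps just 二. Once 十 has been posted too, `matching_words('二十', $sampleLines)` also keeps 二十, while 二月 stays hidden until 月 gets its own post.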
Note: I saved the edict file in UTF-8 encoding. The script will not work correctly with the EUC-JP (JIS) encoding the file comes in.
You can get the edict file here.
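If your text editor cannot re-save the file as UTF-8, a small conversion script is one way to do it. This is only a sketch: it assumes the download is EUC-JP encoded, that PHP's mbstring extension is available, and the function name `convert_edict_to_utf8` and the file names in the usage comment are my own placeholders.

```php
<?php
// Hypothetical helper: copies $source to $target, converting each line
// from EUC-JP to UTF-8. Returns the number of lines written.
function convert_edict_to_utf8(string $source, string $target): int
{
    $in  = fopen($source, 'r');
    $out = fopen($target, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException("Could not open $source or $target");
    }

    $lines = 0;
    // Converting line by line is safe here because the newline byte never
    // occurs inside a multi-byte EUC-JP character.
    while (($line = fgets($in)) !== false) {
        fwrite($out, mb_convert_encoding($line, 'UTF-8', 'EUC-JP'));
        $lines++;
    }

    fclose($in);
    fclose($out);
    return $lines;
}

// Example call (file names are assumptions):
// convert_edict_to_utf8('edict', 'edict.txt');
```

Reading line by line keeps memory use low, which matters a little since the dictionary file is fairly large.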
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head>
<body>
<?php
// This code is not under any licence and can be considered public domain.
setlocale(LC_ALL, 'en_US.UTF-8');

// This section grabs all the words that have already been posted.
$shownalreadywords = array();
if (file_exists("showalready.txt")) {
    $shownalready = fopen("showalready.txt", "r");
    while (($line = fgets($shownalready)) !== false) {
        $shownalreadywords[trim($line)] = true;
    }
    fclose($shownalready);
}
// end section

// Reopen the file for appending new words found (it is created if missing).
$shownalready = fopen("showalready.txt", "a");
$handle = fopen("edict.txt", "r");

// This regex will get run on each line as we read through the file.
// To get the new words, simply add the new kanji into the regex.
$preg = '/^([\p{Katakana}\p{Hiragana}一二三四五六七八九十右左百上下千中人日円金見天気水木火]*) \[.*\(P\).*/u';

// This loop will grab each line of the edict file and run the regex
// against it. When it finds a match it will check if that word
// was posted yet. If it has not, it echoes it to the browser and
// appends it to the showalready.txt file.
while (($buffer = fgets($handle)) !== false) {
    if (preg_match($preg, $buffer, $matches)) {
        if (!isset($shownalreadywords[$matches[1]])) {
            // Write out that we have shown this word.
            fputs($shownalready, $matches[1] . "\n");
            // There seem to be multiple listings for the same word.
            // This will make sure we only get it once.
            $shownalreadywords[$matches[1]] = true;
            echo $matches[1] . "<br />";
        }
    }
}
fclose($handle);
fclose($shownalready);
?>
</body>
</html>