I have written several articles on converting StarDict dictionaries into Apple’s Dictionary format with DictUnifier.app, but it seems that not every StarDict dictionary can be converted easily. Quite a few errors reported on the bulletin board of the “mac-dictionary-kit” project remain unsolved.
I ran into this problem myself yesterday while converting Klaus Mylius’ Sanskrit-Deutsch dictionary:
An error occurred during the
tr: illegal byte sequence
$ cd ~/tmp
$ wget http://mac-dictionary-kit.googlecode.com/files/sdconv-0.3.tar.bz2
$ bunzip2 -c sdconv-0.3.tar.bz2 | tar xvf -
$ cp -auv sdconv /usr/local/
$ /usr/local/sdconv/convert stardict-mylius-sanskrit-deutsch.tar.bz2
...
- Building mylius.dictionary.
- Cleaning objects directory.
- Preparing dictionary template.
- Preprocessing dictionary sources.
tr: Illegal byte sequence
Error.
...
OK, I reproduced the same error. Next, I tried again with LC_ALL=C (following a hint given here).
$ LC_ALL=C /usr/local/sdconv/convert stardict-mylius-sanskrit-deutsch.tar.bz2
...
- Building mylius.dictionary.
- Cleaning objects directory.
- Preparing dictionary template.
- Preprocessing dictionary sources.
utf8 "\xB9" does not map to Unicode at /usr/local/sdconv/bin/make_line.pl line 51, <> chunk 5045.
utf8 "\xC4" does not map to Unicode at /usr/local/sdconv/bin/make_line.pl line 51, <> chunk 9403.
utf8 "\xB9" does not map to Unicode at /usr/local/sdconv/bin/make_line.pl line 51, <> chunk 25687.
utf8 "\xB9" does not map to Unicode at /usr/local/sdconv/bin/make_line.pl line 51, <> chunk 27566.
utf8 "\xB9" does not map to Unicode at /usr/local/sdconv/bin/make_line.pl line 51, <> chunk 36932.
utf8 "\xDC" does not map to Unicode at /usr/local/sdconv/bin/make_line.pl line 51, <> chunk 50059.
utf8 "\xC4" does not map to Unicode at /usr/local/sdconv/bin/make_line.pl line 51, <> chunk 52864.
- Extracting index data.
- Preparing dictionary bundle.
- Adding body data.
- Preparing index data.
- Building key_text index.
- Building reference index.
- Fixing dictionary property.
- Copying CSS.
- Finished building objects/mylius.dictionary.
Done.
...
This time the conversion completed, albeit with some warnings. Here is how the result looks in Dictionary.app:
I cannot vouch for its long-term stability, but for the moment it seems to work.
DictUnifier.app is a very useful tool, but it appears that we should not rely on it too heavily.
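The make_line.pl warnings suggest that a handful of entries contain bytes that are not valid UTF-8. To find out which lines are affected, a small Python sketch like the following might help. It is not part of sdconv: `find_invalid_utf8` is a hypothetical helper, and which file to scan (for example, a text dump of the dictionary) is left to you.

```python
from typing import Iterator, Tuple


def find_invalid_utf8(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, offending_byte) for each line that is not valid UTF-8.

    Hypothetical helper for illustration; only the first bad byte
    of each line is reported.
    """
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that failed to decode
            yield lineno, line[err.start]
```

To use it, read the suspect file in binary mode (`open(path, "rb").read()`), pass the bytes to `find_invalid_utf8`, and print each byte with something like `f"\\x{byte:02X}"` to compare against the values in the warnings.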
Convert Babylon Dictionaries?!
While browsing the reports in mac-dictionary-kit’s Issues, I found an interesting comment posted on Issue 4, “Stardict-babylon format not supported” (Comment no. 27). The commenter explains “HOW CONVERT/ADD BABYLON DICTIONARIES TO MAC DICTIONARY”. If this works on Mac OS X, we could make use of the thousands of free Babylon dictionaries found here:
To state the conclusion first: having worked through the procedure myself, I found it a little complicated. Furthermore, the conversion has to be done on Linux (such as Ubuntu), so this is not a topic for every Mac user.
Nevertheless, I will briefly note the procedure below before I forget it. It may be useful, but it comes without any warranty. I am NOT experienced with Linux, so please let me know if I have made any serious mistakes.
The idea is to convert an existing Babylon dictionary to StarDict format (on Linux), then convert the StarDict dictionary to Apple’s Dictionary format (on Mac OS X). The dictionary I have chosen is Jeffrey Hopkins’ Tibetan-Sanskrit-English Dictionary, which is provided here only in *.bgl format.
For the next several steps I work on Ubuntu 10.10 (Maverick Meerkat), running in VMware Fusion (3.1.3) on Mac OS X.
In a terminal (Ubuntu):
$ cd ubuntu-work # <-- shared with osx
$ mkdir hopkins
$ cd hopkins
$ wget http://buddhistinformatics.ddbc.edu.tw/glossaries/files/babylon-hopkins.ddbc.bgl
...
$ ls
babylon-hopkins.ddbc.bgl
$ sudo apt-get install stardict stardict-tools dictconv
...
$ dictconv babylon-hopkins.ddbc.bgl -o babylon-hopkins.ddbc.ifo # <-- pay attention "*.ifo"
...
Results File: babylon-hopkins.ddbc.ifo
Title: Jeffrey Hopkins' Tibetan-Sanskrit-English Dictionary
Author: Jeffrey Hopkins
Email: email@example.com
Version:
License: This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Description:
Original Language: Other
Destination Language: English
Headwords: 18441
Words: 18382
$ ls
babylon-hopkins.ddbc.idx babylon-hopkins.ddbc.bgl # <-- confirm 3 files (*.ifo, *.dict, *.idx)
babylon-hopkins.ddbc.ifo babylon-hopkins.ddbc.dict
$ mkdir hopkins-stardict
$ mv babylon-hopkins.ddbc.idx babylon-hopkins.ddbc.ifo babylon-hopkins.ddbc.dict hopkins-stardict/
$ sudo cp -auv hopkins-stardict /usr/share/stardict/dic/hopkins
$ stardict & # <-- launch stardict
At this point, you should see the following window:
At first sight, it seems OK, but some characters are broken and the HTML tags (<p> </p>, etc.) remain displayed.
Bizarre… I have to check the contents.
$ cd hopkins-stardict/
$ stardict2txt babylon-hopkins.ddbc.ifo # <-- convert to text file
Write to file: babylon-hopkins.ddbc.txt
$ emacs babylon-hopkins.ddbc.txt
Some characters (ā, ī, ū, etc.) were not converted correctly. So I MANUALLY collected the mojibake sequences, checked each one against the correct character as displayed by GoldenDict, and also converted the HTML tags into the linefeed code (\n), like this:
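# hopkins_conv.sed
# for unicode characters
# (the patterns for ā, ṭ, ḍ and ṝ end in a non-printing character
#  that may not survive copy and paste)
s/Ä/ā/g
s/âˆš/√/g
s/á¹/ṭ/g
s/á¹ƒ/ṃ/g
s/Å›/ś/g
s/á¹‡/ṇ/g
s/Ã±/ñ/g
s/á¹›/ṛ/g
s/á¸/ḍ/g
s/Å«/ū/g
s/á¹£/ṣ/g
s/Ä€/Ā/g
s/Ä«/ī/g
s/á¸¥/ḥ/g
s/á¹…/ṅ/g
s/á¹/ṝ/g
s/Åš/Ś/g
s/â€”/---/g
# for HTML tags
# (tabfile expects the two-character sequence \n, written \\n in sed)
s/<p><b>/\\n/g
s/<b>/\\n/g
s/<\/b>/\\n/g
s/<\/p>//g
s/<ul><li>//g
s/<\/li><li>/\\n/g
s/<\/li><\/ul>//g
s/\\n\\n/\\n/g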
This manual procedure was tedious, and I may have overlooked some characters… Does anyone know a better way?
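One possibly less manual route, offered only as a sketch: the broken sequences look like UTF-8 bytes that were decoded as Windows-1252 somewhere along the way (for example, ā is the bytes C4 81 in UTF-8, which show up as Ä followed by an invisible character). If that assumption holds, reversing the mis-decoding repairs all such characters at once instead of one sed rule per character. The function names below are hypothetical, and I have not run this against the Hopkins dictionary itself.

```python
import re

# Runs of consecutive non-ASCII characters are the repair candidates.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def _to_original_bytes(run: str) -> bytes:
    """Map each character back to the single byte it was (wrongly) decoded from."""
    out = bytearray()
    for ch in run:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Python's cp1252 leaves a few bytes (0x81, 0x8D, ...) undefined;
            # fall back to latin-1, which raises again for anything > U+00FF.
            out += ch.encode("latin-1")
    return bytes(out)


def _fix_run(match: "re.Match[str]") -> str:
    run = match.group(0)
    try:
        return _to_original_bytes(run).decode("utf-8")
    except UnicodeError:
        # Not mojibake (or not this kind): leave the run untouched.
        return run


def fix_mojibake(text: str) -> str:
    """Repair UTF-8 text that was mis-decoded as Windows-1252 (assumed cause)."""
    return NON_ASCII_RUN.sub(_fix_run, text)
```

Applied line by line to babylon-hopkins.ddbc.txt (opened with `encoding="utf-8"`), this would stand in for the "unicode characters" half of the sed script; the HTML tag rules would still be needed. A run that mixes correct and broken characters is left unchanged, so a manual check of the output is still advisable.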
Then apply this script to the original file (babylon-hopkins.ddbc.txt).
$ sed -f hopkins_conv.sed babylon-hopkins.ddbc.txt > babylon-hopkins-rev.txt
$ /usr/lib/stardict-tools/tabfile babylon-hopkins-rev.txt
Convert over.
babylon-hopkins-rev wordcount: 18382
$ mkdir hopkins-rev
$ mv babylon-hopkins-rev.ifo babylon-hopkins-rev.idx babylon-hopkins-rev.dict.dz hopkins-rev/
$ sudo cp -auv hopkins-rev /usr/share/stardict/dic/hopkins
$ stardict &
This time it looks good.
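Here tabfile reported 18382 words, the same count that dictconv printed. If those numbers ever disagree, it may be worth checking that every line of the edited text file still has the headword, a tab, and a definition. A minimal sketch, assuming that one-entry-per-line layout; `check_tabfile` is a hypothetical helper, not part of stardict-tools, and skipping blank lines is my own choice.

```python
from typing import Iterable, List, Tuple


def check_tabfile(lines: Iterable[str]) -> Tuple[int, List[int]]:
    """Return (entry_count, problem_line_numbers) for tab-separated entries.

    Assumes each entry is "headword<TAB>definition" on a single line.
    """
    count = 0
    problems: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines are skipped in this sketch
        word, tab, definition = line.partition("\t")
        if tab and word and definition:
            count += 1
        else:
            problems.append(lineno)
    return count, problems
```

Passing it the open file object for babylon-hopkins-rev.txt would give a count to compare with tabfile’s wordcount, plus the numbers of any lines that lost their tab or were split by a stray newline.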
OK, now back to Mac OS X.
In a terminal (Mac OS X):
$ cd ~/ubuntu-work/hopkins/hopkins-stardict/ # <-- shared with ubuntu
$ tar -jcvf babylon-stardict-hopkins.tar.bz2 hopkins-rev
$ /usr/local/sdconv/convert babylon-stardict-hopkins.tar.bz2 -n Hopkins_Tibetan_Dictionary -i hopkins
...
Done.
To test the new dictionary, try Dictionary.app.
$ open -a Dictionary.app
Voilà, it works!