Penelope
Abstract
Penelope is a multi-tool for creating, editing, converting, and merging electronic dictionaries, especially for eReader devices such as Kobo and Bookeen Cybook Odyssey.
I do not assume any legal liability or responsibility for any damage, data loss or inconvenience that you might cause to yourself or to other people by following the procedures below. RTFM, first.
Updates
IMPORTANT UPDATE (2015-02-22) This page is being discontinued and it will be kept online only for historical reasons. Please refer to the GitHub page instead.
IMPORTANT UPDATE (2014-06-30) I moved Penelope to GitHub, and released it under the MIT License, with the version code v2.0.0.
Features
With the current version (v. 2.0.1, 2015-01-25) of Penelope you can:
- convert a dictionary FROM/TO the following formats:
- Bookeen Cybook Odyssey (R/W)
- Kobo (R index only, W unencrypted/unobfuscated only)
- StarDict (R/W)
- XML (R/W)
- CSV (R/W)
- merge multiple dictionaries (of the same type) into a single dictionary
- define your own parser for each word/definition
- define your own collation function when outputting to Bookeen Cybook Odyssey format
- generate an EPUB file containing the index of a given dictionary (e.g., to cope with the lack of a search function on your eReader)
Download
Please download the files from the GitHub repo.
You can either:
- download the handy ZIP archive from the Releases tab (preferred option);
- clone the repository using Git (git); or
- download all the source files into the same directory, in raw format (not as HTML pages!).
You need Python, either version 2.x or 3.x, installed on your system to run Penelope.
You might need dictzip installed on your system to read from/write to StarDict dictionaries.
If you want to read from/write to Kobo format, you need a compiled version of MARISA. In that case, you must modify the values of the variables MARISA_BUILD_PATH and MARISA_REVERSE_LOOKUP_PATH in penelope.py (Python 2.x) or penelope3.py (Python 3.x), making them point to the marisa-build and marisa-reverse-lookup executables (see the corresponding comments in the source code).
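Before editing the two variables, it can help to confirm that the MARISA executables exist and that your user may run them. The following is a minimal sketch, not part of Penelope: the helper name check_executable and the two /usr/local/bin paths are assumptions made for illustration, so substitute the locations where you actually compiled MARISA.

```python
import os


def check_executable(path):
    """Return True if path is an existing file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Hypothetical install locations (assumption): replace with your own paths.
MARISA_BUILD_PATH = "/usr/local/bin/marisa-build"
MARISA_REVERSE_LOOKUP_PATH = "/usr/local/bin/marisa-reverse-lookup"

for name, path in [
    ("marisa-build", MARISA_BUILD_PATH),
    ("marisa-reverse-lookup", MARISA_REVERSE_LOOKUP_PATH),
]:
    status = "ok" if check_executable(path) else "missing or not executable"
    print("%s: %s (%s)" % (name, path, status))
```

If either line reports "missing or not executable", fix the path or the file permissions before running Penelope with Kobo input or output.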
Usage
In a terminal, issue:
$ python penelope.py -h
to get the list of available options:
$ python penelope.py -p <prefix list> -f <language_from> -t <language_to> [OPTIONS]
Required arguments:
-p <prefix list> : list of the dictionaries to be merged/converted (without extension, comma separated)
-f <language_from> : ISO 639-1 code language_from of the dictionary to be converted
-t <language_to> : ISO 639-1 code language_to of the dictionary to be converted
Optional arguments:
-d : enable debug mode and do not delete temporary files
-h : print this usage message and exit
-i : ignore word case while building the dictionary index
-z : create the .install zip file containing the dictionary and the index
--sd : input dictionary in StarDict format (default)
--odyssey : input dictionary in Bookeen Cybook Odyssey format
--xml : input dictionary in XML format
--kobo : input dictionary in Kobo format (reads the index only!)
--csv : input dictionary in CSV format
--output-odyssey : output dictionary in Bookeen Cybook Odyssey format (default)
--output-sd : output dictionary in StarDict format
--output-xml : output dictionary in XML format
--output-kobo : output dictionary in Kobo format
--output-csv : output dictionary in CSV format
--output-epub : output EPUB file containing the index of the input dictionary
--title <string> : set the title string shown on the Odyssey screen to <string>
--license <string> : set the license string to <string>
--copyright <string> : set the copyright string to <string>
--description <string> : set the description string to <string>
--year <string> : set the year string to <string>
--parser <parser.py> : use <parser.py> to parse the input dictionary
--collation <coll.py> : use <coll.py> as collation function when outputting in Bookeen Cybook Odyssey format
--fs <string> : use <string> as CSV field separator, escaping ASCII sequences (default: \t)
--ls <string> : use <string> as CSV line separator, escaping ASCII sequences (default: \n)
Examples:
$ python penelope.py -h
$ python penelope.py -p foo -f en -t en
$ python penelope.py -p bar -f en -t it
$ python penelope.py -p "bar,foo,zam" -f en -t it
$ python penelope.py --xml -p foo -f en -t en
$ python penelope.py --xml -p foo -f en -t en --output-sd
$ python penelope.py -p bar -f en -t it --output-kobo
$ python penelope.py -p bar -f en -t it --output-xml -i
$ python penelope.py --kobo -p bar -f it -t it --output-epub
$ python penelope.py --odyssey -p bar -f en -t en --output-epub
$ python penelope.py -p bar -f en -t it --title "My EN->IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0"
$ python penelope.py -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary"
$ python penelope.py -p foo -f en -t en --collation custom_collation.py
$ python penelope.py --xml -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n"
$ python penelope.py --csv -p foo -f en -t en --output-xml --fs "\t\t" --ls "\n"
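If you convert many dictionaries, you may prefer to assemble the command line from a script. The sketch below is only an illustration built on the options listed above: the helper names build_command and run are hypothetical, and it assumes penelope.py sits in the current directory.

```python
import subprocess
import sys


def build_command(prefixes, lang_from, lang_to, extra=(), script="penelope.py"):
    """Return the argument list for one Penelope run.

    prefixes: dictionary prefixes (file names without extension).
    extra:    further options from the usage message, passed through as-is.
    """
    command = [sys.executable, script]
    command += ["-p", ",".join(prefixes)]  # -p takes a comma-separated list
    command += ["-f", lang_from, "-t", lang_to]
    command += list(extra)
    return command


def run(command):
    """Run the command; check_call raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(command)


# Merge three StarDict dictionaries into one Kobo dictionary.
command = build_command(["bar", "foo", "zam"], "en", "it", extra=["--output-kobo"])
print(" ".join(command))
# run(command)  # uncomment once penelope.py is in the current directory
```

With Python 3.x, pass script="penelope3.py" instead of relying on the default.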
Notes
- If you use Python 3.x, replace penelope.py with penelope3.py.
- You must have the Python executable (or a directory containing it) listed in your PATH environment variable, or you need to supply its full path.
- If you get an error about MARISA, check that you have compiled it correctly, and that your user has execute permission on its executables.
- Bear in mind that no official specifications are published by either Bookeen or Kobo, hence the dictionaries produced by Penelope for Bookeen Cybook Odyssey and Kobo devices work as far as their specifications have been reverse-engineered, by others and myself. (See, for example, the following MobileRead forum threads: T1 T2 T3 T4)
- I tried to comment every key point of my script and it should be easy to follow. I took this as a practical exercise to learn Python, so please forgive me if you find my code naive, and drop me an email with your advice to improve it, thanks!
Commented Examples
Example 1
$ python penelope.py -h
Print usage message and exit
Example 2
$ python penelope.py -p foo -f en -t en
Create English monolingual dictionary en.foo.dict and en.foo.dict.idx from StarDict files foo.*
Example 3
$ python penelope.py -p bar -f en -t it
Create English-to-Italian dictionary en-it.dict and en-it.dict.idx from StarDict files bar.*
Example 4
$ python penelope.py -p "bar,foo,zam" -f en -t it
Create English-to-Italian dictionary en-it.dict and en-it.dict.idx merging together StarDict dictionaries bar, foo, and zam
Example 5
$ python penelope.py --xml -p foo -f en -t en
Create English monolingual dictionary en.foo.dict and en.foo.dict.idx, but the input dictionary foo.xml is in XML format
Example 6
$ python penelope.py --xml -p foo -f en -t en --output-sd
As above, but output in StarDict format instead of Bookeen Cybook Odyssey format
Example 7
$ python penelope.py -p bar -f en -t it --output-kobo
Create English-to-Italian dictionary from StarDict files bar.*, but output in Kobo format, creating dicthtml-en-it.zip
Example 8
$ python penelope.py -p bar -f en -t it --output-xml -i
Read from StarDict format and output in XML format, creating bar.xml, lowercasing all the keywords
Example 9
$ python penelope.py --kobo -p bar -f it -t it --output-epub
Read from Kobo format and create the dictionary index in EPUB format, bar.epub
Example 10
$ python penelope.py --odyssey -p bar -f en -t en --output-epub
As above, but input is in Bookeen Cybook Odyssey format
Example 11
$ python penelope.py -p bar -f en -t it --title "My EN-IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0"
Create English-to-Italian dictionary but also set title, year and license metadata
Example 12
$ python penelope.py -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary"
Create English monolingual dictionary, set its title, and use foo_parser.py to parse the input dictionary definitions. A detailed description of custom parser/collation can be found in the old page.
Example 13
$ python penelope.py -p foo -f en -t en --collation custom_collation.py
Create English monolingual dictionary, using custom_collation.py to perform key collation. A detailed description of custom parser/collation can be found in the old page.
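To give an idea of what a collation function does, here is a generic sketch of a case-insensitive comparison between two keys. It is not Penelope's actual interface: the function name collate and its two-string signature are assumptions, and the interface that Penelope expects from the --collation file is the one described in the old page.

```python
import functools


def collate(key1, key2):
    """Compare two keys ignoring case.

    Returns a negative number, zero, or a positive number when key1 sorts
    before, equal to, or after key2 (hypothetical signature, see the note above).
    """
    a, b = key1.lower(), key2.lower()
    return (a > b) - (a < b)


keys = ["banana", "Apple", "cherry"]
# cmp_to_key adapts a two-argument comparison function for use with sorted().
print(sorted(keys, key=functools.cmp_to_key(collate)))
# prints ['Apple', 'banana', 'cherry']
```

A plain byte-wise sort would put "Apple" first only because uppercase letters precede lowercase ones; a collation function makes the intended ordering explicit.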
Example 14
$ python penelope.py --xml -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n"
Create CSV English dictionary foo.csv from XML dictionary foo.xml, using a double tab as field separator and a newline as line separator
Example 15
$ python penelope.py --csv -p foo -f en -t en --output-xml --fs "\t\t" --ls "\n"
Create XML English dictionary foo.xml from CSV dictionary foo.csv, using a double tab as field separator and a newline as line separator
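The role of the field and line separators can be illustrated with a small sketch. It assumes one entry per line, with the word first and the definition second; that layout and the helper names format_entries and parse_entries are assumptions for illustration, not a description of Penelope's own CSV code. Note that in Python source "\t\t" already denotes two real tab characters, whereas on the command line Penelope receives the escaped sequence.

```python
def format_entries(entries, fs="\t\t", ls="\n"):
    """Serialize (word, definition) pairs, one entry per line."""
    return "".join(word + fs + definition + ls for word, definition in entries)


def parse_entries(text, fs="\t\t", ls="\n"):
    """Parse text produced by format_entries back into (word, definition) pairs."""
    entries = []
    for line in text.split(ls):
        if fs not in line:
            continue  # skip empty lines and lines without a field separator
        word, definition = line.split(fs, 1)  # split on the first separator only
        entries.append((word, definition))
    return entries


entries = [("cat", "a small domesticated feline"), ("dog", "a domesticated canine")]
text = format_entries(entries)
print(repr(text))
print(parse_entries(text) == entries)
```

Choosing a separator that cannot occur inside a definition, such as the double tab used in Examples 14 and 15, keeps the parsing unambiguous.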
Links
- The project files at GitHub
- The old repo at Google Code
- Related SBF Thread (Italian)
- MobileRead Thread about the dictionaries for Odyssey
- MobileRead Thread about the dictionaries for Kobo 1
- MobileRead Thread about the dictionaries for Kobo 2
- StarDict format
- XDXF format
- Bookeen homepage
- Kobo homepage
- List of ISO 639-1 codes for languages
- Old page, containing a detailed discussion of the dictionary format used by Bookeen Odyssey and Kobo devices