Penelope
Abstract
Penelope is a multi-tool for creating, editing, converting, and merging electronic dictionaries, especially for eReaders such as Kobo and Bookeen Cybook Odyssey devices.
I do not assume any legal liability or responsibility for any damage, data loss or inconvenience that you might cause to yourself or to other people by following the procedures below. RTFM, first.
Updates
IMPORTANT UPDATE (2015-02-22) This page is being discontinued and it will be kept online only for historical reasons. Please refer to the GitHub page instead.
IMPORTANT UPDATE (2014-06-30) I moved Penelope to GitHub, and released it under the MIT License, with the version code v2.0.0.
Features
With the current version (v. 2.0.1, 2015-01-25) of Penelope you can:
- convert a dictionary FROM/TO the following formats:
  - Bookeen Cybook Odyssey (R/W)
  - Kobo (R index only, W unencrypted/unobfuscated only)
  - StarDict (R/W)
  - XML (R/W)
  - CSV (R/W)
- merge multiple dictionaries (of the same type) into a single dictionary
- define your own parser for each word/definition
- define your own collation function when outputting to Bookeen Cybook Odyssey format
- generate an EPUB file containing the index of a given dictionary (e.g., to cope with the lack of a search function on your eReader)
Download
Please download the files from the GitHub repo.
You can either:
- download the handy ZIP archive from the Releases tab (preferred option);
- clone the repository using Git (git); or
- download all the source files into the same directory, in raw format (not as HTML pages!).
You need Python, either version 2.x or 3.x, installed on your system to run Penelope.
You might need dictzip installed on your system to read from/write to StarDict dictionaries.
If you want to read from/write to Kobo format, you need a compiled version of MARISA. In that case, you must modify the values of the variables MARISA_BUILD_PATH and MARISA_REVERSE_LOOKUP_PATH in penelope.py (Python 2.x) or penelope3.py (Python 3.x), making them point to the marisa-build and marisa-reverse-lookup executables (see the corresponding comments in the source code).
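Before attempting a Kobo conversion, it can help to confirm that both paths really point to executable files. The following is a minimal sketch and not part of Penelope: the helper name check_executable and the example paths are hypothetical, and only the two variable names come from the text above.

```python
import os


def check_executable(path):
    """Return True if `path` is an existing regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Hypothetical locations: replace them with the values you set
# in penelope.py (Python 2.x) or penelope3.py (Python 3.x).
MARISA_BUILD_PATH = "/usr/local/bin/marisa-build"
MARISA_REVERSE_LOOKUP_PATH = "/usr/local/bin/marisa-reverse-lookup"

if __name__ == "__main__":
    for name, path in [
        ("MARISA_BUILD_PATH", MARISA_BUILD_PATH),
        ("MARISA_REVERSE_LOOKUP_PATH", MARISA_REVERSE_LOOKUP_PATH),
    ]:
        status = "ok" if check_executable(path) else "missing or not executable"
        print("%s = %s: %s" % (name, path, status))
```

If either line reports "missing or not executable", recheck the MARISA build and the execution permission on the file before running Penelope.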
Usage
In a terminal, issue:
to get the list of available options:
Notes
- If you use Python 3.x, replace penelope.py with penelope3.py.
- You must have the Python executable (or a directory containing it) listed in your PATH environment variable, or you need to supply its full path.
- If you get an error about MARISA, check that you have compiled it correctly, and that your user has the execution right on its executables.
- Bear in mind that neither Bookeen nor Kobo publishes official specifications, hence the dictionaries produced by Penelope for Bookeen Cybook Odyssey and Kobo devices work only as far as their formats have been reverse-engineered, by others and myself. (See, for example, the following MobileRead forum threads: T1 T2 T3 T4)
- I tried to comment every key point of my script and it should be easy to follow. I took this as a practical exercise to learn Python, so please forgive me if you find my code naive, and drop me an email with your advice to improve it, thanks!
Commented Examples
Example 1
Print usage message and exit
Example 2
Create English monolingual dictionary en.foo.dict and en.foo.dict.idx from StarDict files foo.*
Example 3
Create English-to-Italian dictionary en-it.dict and en-it.dict.idx from StarDict files bar.*
Example 4
Create English-to-Italian dictionary en-it.dict and en-it.dict.idx by merging the StarDict dictionaries bar, foo, and zam
Example 5
Create English monolingual dictionary en.foo.dict and en.foo.dict.idx from the input dictionary foo.xml, which is in XML format
Example 6
As above, but output in StarDict format instead of Bookeen Cybook Odyssey format
Example 7
As above, but output in Kobo format, creating dicthtml-en-it.zip
Example 8
Read from StarDict format and output in XML format, creating bar.xml and lowercasing all the keywords
Example 9
Read from Kobo format and output in XML format, also creating the dictionary index in EPUB format as bar.epub
Example 10
As above, but input is in Bookeen Cybook Odyssey format
Example 11
Create English-to-Italian dictionary but also set title, year and license metadata
Example 12
As above but set its title and use foo_parser.py to parse the input dictionary definitions. A detailed description of custom parser/collation can be found in the old page.
Example 13
As above but use custom_collation.py to perform key collation. A detailed description of custom parser/collation can be found in the old page.
Example 14
Create CSV English dictionary foo.csv from XML dictionary foo.xml, using a double tab as the field separator and a newline as the line separator
Example 15
Create XML English dictionary foo.xml from CSV dictionary foo.csv, using a double tab as the field separator and a newline as the line separator
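To make the separators in Examples 14 and 15 concrete, here is a rough sketch of what a file with a double tab between fields and a newline between entries looks like when written and read back. It is not Penelope's own code: the function names format_entries and parse_entries are hypothetical, and it assumes one headword and one definition per line, with neither containing the separators.

```python
FIELD_SEPARATOR = "\t\t"  # double tab between headword and definition
LINE_SEPARATOR = "\n"     # newline between entries


def format_entries(entries):
    """Serialize (headword, definition) pairs, one entry per line."""
    return "".join(
        word + FIELD_SEPARATOR + definition + LINE_SEPARATOR
        for word, definition in entries
    )


def parse_entries(text):
    """Parse the text back into (headword, definition) pairs."""
    entries = []
    for line in text.split(LINE_SEPARATOR):
        if FIELD_SEPARATOR not in line:
            continue  # skip blank or malformed lines
        # Split on the first separator only, so the definition stays whole.
        word, definition = line.split(FIELD_SEPARATOR, 1)
        entries.append((word, definition))
    return entries


if __name__ == "__main__":
    sample = [("cat", "a small domesticated feline"),
              ("dog", "a domesticated canine")]
    text = format_entries(sample)
    assert parse_entries(text) == sample
    print(repr(text))
```

A multi-character separator such as a double tab is a reasonable choice when definitions may themselves contain single tabs or commas.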
Links
- The project files at GitHub
- The old repo at Google Code
- Related SBF Thread (Italian)
- MobileRead Thread about the dictionaries for Odyssey
- MobileRead Thread about the dictionaries for Kobo 1
- MobileRead Thread about the dictionaries for Kobo 2
- StarDict format
- XDXF format
- Bookeen homepage
- Kobo homepage
- List of ISO 639-1 codes for languages
- Old page, containing a detailed discussion of the dictionary format used by Bookeen Odyssey and Kobo devices