EPUB reading systems vs invalid OCFs
Created 08 Jun 2014 • Written by Alberto Pettarin
Prompted by a Twitter conversation generated by my previous post, I investigated whether some popular, real-world EPUB reading systems actually check for OCF conformance when loading an EPUB file. You know, for science.
Methodology
I started by creating this valid EPUB file.
Then, I derived from it three variations:
- testocf1.epub, by just zipping all the files with zip -r testocf1.epub *,
- testocf2.epub, by also deleting the mimetype file,
- testocf3.epub, by also deleting the mimetype file and the META-INF directory.
(I also modified the title and the UUID of each EPUB file, to avoid caching effects.)
Clearly, these files are detected as invalid by EpubCheck:
Please observe that in the third case the content validation is not performed, as container.xml is not present, so the OPF cannot be read.
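The container-level problems in the three variations can be illustrated with a short Python sketch. It is not how EpubCheck is implemented, and it covers only a few of the OCF rules (mimetype present, first in the archive, stored uncompressed, with exactly the expected content, plus the presence of META-INF/container.xml). The function name check_ocf and the wording of the messages are my own, chosen for illustration.

```python
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"
CONTAINER_PATH = "META-INF/container.xml"


def check_ocf(path):
    """Return a list of OCF problems found in the ZIP archive at `path`.

    `path` can be a file name or a binary file-like object.
    An empty list means that none of the checks below failed.
    """
    problems = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        names = [info.filename for info in infos]

        if "mimetype" not in names:
            problems.append("mimetype file is missing")
        else:
            # The mimetype file must be the first entry of the archive...
            if infos[0].filename != "mimetype":
                problems.append("mimetype is not the first entry")
            # ...it must be stored without compression...
            if zf.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
            # ...and it must contain exactly the EPUB media type string.
            if zf.read("mimetype") != EPUB_MIMETYPE:
                problems.append("mimetype content is not application/epub+zip")

        if CONTAINER_PATH not in names:
            problems.append("META-INF/container.xml is missing")
    return problems
```

Calling check_ocf("testocf3.epub") on an archive built like the third variation should report both the missing mimetype and the missing container.xml. What it reports for an archive built like testocf1.epub depends on the entry order and the compression that zip happened to apply.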
I tried to sideload these three files on the following reading systems:
- Blio (iOS)
- Bluefire Reader (iOS)
- Calibre (Debian)
- Kobo Glo (eReader)
- iBooks (iOS)
- Lektz (iOS)
- Lucifox (Mozilla Firefox plugin, Debian)
- Lydhor (iOS)
- Marvin (iOS)
- Menestrello (Android, iOS)
- Readium (Google Chrome plugin, Debian)
- txtr (iOS)
I then verified whether each file was sideloaded successfully and made available to the user.
I chose these because I had them readily at hand this morning, they all allow sideloading, and they are quite popular. Building an extensive survey was not the purpose of this experiment.
Results
All the above reading systems open testocf1.epub (incorrect zipping).
All the above reading systems open testocf2.epub (no mimetype), except iBooks, which alerts the user that the file is invalid.
On testocf3.epub (no mimetype and no META-INF), there are several different behaviors:
- Blio does not open the book and it does not alert the user,
- Bluefire Reader creates a dummy "ePub" item in the library that cannot be opened (but it can be deleted),
- Calibre opens the file (I guess it searches for an OPF file in the container),
- iBooks does not open the book and it does not alert the user,
- Kobo Glo reports the file as "protected by Adobe DRM" (and it does not open it, as my test device is not registered with Adobe),
- Lektz alerts the user that the file is invalid,
- Lucifox alerts the user that the file is invalid,
- Lydhor crashes,
- Marvin opens the file (I guess it searches for an OPF file in the container),
- Menestrello alerts the user that the file is invalid,
- Readium alerts the user that the file is invalid,
- txtr crashes.
Comments
It looks like no tested reading system actually checks whether an EPUB file has been properly zipped.
The same observation, except for iBooks, holds for the presence of the mimetype file.
These two facts support the argument of those proposing to ditch it in a future version of the EPUB specification (at least as far as reading systems are concerned).
(Note: iBooks also seems to check that mimetype actually starts with the application/epub+zip string, but it does not check whether it contains only that. See testocf4.epub, testocf5.epub, and testocf6.epub.)
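The difference between "starts with" and "contains only" can be made concrete with a small Python sketch. I do not know how iBooks actually implements its check. The two functions below are hypothetical and only contrast a lenient prefix test, which would match the behavior observed above, with the strict equality test that the OCF rules call for. The sample payloads are invented for illustration and are not the contents of the test files.

```python
EPUB_MIMETYPE = b"application/epub+zip"


def lenient_mimetype_ok(data: bytes) -> bool:
    """Accept any payload that begins with the expected string."""
    return data.startswith(EPUB_MIMETYPE)


def strict_mimetype_ok(data: bytes) -> bool:
    """Accept only the exact expected string, with nothing after it."""
    return data == EPUB_MIMETYPE


# Hypothetical payloads of a mimetype file.
samples = [
    b"application/epub+zip",            # exact
    b"application/epub+zip\n",          # trailing newline
    b"application/epub+zip and more",   # trailing garbage
    b"text/plain",                      # wrong type altogether
]

for payload in samples:
    print(payload, lenient_mimetype_ok(payload), strict_mimetype_ok(payload))
```

Only the first payload passes the strict check, while the first three pass the lenient one.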
On the other hand, almost all the reading systems do not render the third test file, since they cannot determine the location of the OPF file, which must be declared in the (missing) META-INF/container.xml.
The exceptions are Calibre and Marvin, which I guess take a heuristic approach, looking for an OPF file in the EPUB container anyway.
(And I do not like it. At the very least, they should alert the user that the file is malformed.)
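To make the two strategies concrete, here is a minimal Python sketch of how a reading system might locate the OPF file. It is only my guess at the kind of heuristic involved, not the actual code of Calibre or Marvin. The function name find_opf and the returned used_fallback flag are hypothetical. The flag is there so that the caller can alert the user when the path had to be guessed.

```python
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_PATH = "META-INF/container.xml"
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def find_opf(zf):
    """Locate the OPF file inside an open zipfile.ZipFile.

    Returns (opf_path, used_fallback); opf_path is None if nothing is found.
    """
    names = zf.namelist()

    # Regular route: read the path from the first rootfile element
    # declared in META-INF/container.xml.
    if CONTAINER_PATH in names:
        root = ET.fromstring(zf.read(CONTAINER_PATH))
        rootfile = root.find(f".//{{{CONTAINER_NS}}}rootfile")
        if rootfile is not None and rootfile.get("full-path"):
            return rootfile.get("full-path"), False

    # Heuristic route: no usable container.xml, so pick the first
    # entry (in sorted order) whose name ends with ".opf".
    candidates = sorted(n for n in names if n.lower().endswith(".opf"))
    if candidates:
        return candidates[0], True

    return None, False
```

A reading system built this way could still open a file like testocf3.epub, and the used_fallback flag gives it what it needs to tell the user that the file is malformed.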
It is unclear to me whether multiple renditions (say, XHTML+SVG, or XHTML+PDF) are actually popular, or whether we are really going to have multiple OPF files (in EPUB2, having multiple OPF-based rootfile elements was discouraged, and only the first one was to be processed), so that META-INF/container.xml will actually prove useful.
Open problems and future work
What do you think about this issue? Is the complexity of this (and similar) mechanisms necessary? What use cases does it serve? Should the IDPF consider simplifying the OCF (and other aspects of EPUB)?