Subsetting Fonts with glyphIgo
Created 16 Sep 2014 • Written by Alberto Pettarin
glyphIgo is a handy Python script that helps you deal with fonts and EPUB eBooks.
At the moment glyphIgo offers seven main commands. To get its usage message, run it without arguments:
$ python glyphIgo.py
usage: $ python glyphIgo.py check|convert|count|list|lookup|obfuscate|subset [options]
or add -h or --help to get the full list of options and examples:
$ python glyphIgo.py --help
You might also want to have a look at the usage examples with their output hosted in the GitHub repo.
In this post I would like to focus on the subset command.
As the name suggests, when you subset a font file a.otf according to a character set S, you want to get a font file b.otf containing only the glyphs present both in a.otf and in S.
For example, a.otf might be a universal font, containing glyphs for all Unicode codepoints, and S might be the union of the Basic Latin and Cyrillic Unicode blocks. You want the new font b.otf to contain the glyphs of these two blocks only.
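The definition above boils down to a set intersection. Here is a minimal Python sketch of that idea, working on plain codepoints rather than on real font files; the function name subset_codepoints and the sample "font" are made up for illustration and are not part of glyphIgo:

```python
def subset_codepoints(font_codepoints, wanted_chars):
    """Return the sorted codepoints present both in the font and in the wanted set."""
    wanted = {ord(c) for c in wanted_chars}
    return sorted(set(font_codepoints) & wanted)


# Hypothetical font covering printable Basic Latin and Greek, but no Cyrillic
font = list(range(0x0020, 0x007F)) + list(range(0x0370, 0x0400))

# S: three Latin letters plus two Cyrillic letters (U+0430, U+0431)
s = "abc" + "\u0430\u0431"

kept = subset_codepoints(font, s)
print([hex(cp) for cp in kept])  # ['0x61', '0x62', '0x63']
```

The two Cyrillic letters are dropped because the sample font has no glyphs for them: a subset can only keep what the original font already contains.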
You might want to subset a font to reduce its size (e.g., to embed it into an eBook or to save bandwidth on your Web site), or because the font license requires it: many font foundries ask you to subset their font files before embedding them into apps or documents, so that a user cannot extract the whole font and reuse it later.
In the following examples I will use an OTF font, but glyphIgo also works on TTF and WOFF fonts, as the heavy lifting is done by the FontForge library, which supports those formats as well.
Subsetting using EPUB files (-e switch)
If you have an EPUB eBook ebook.epub and you want to embed a big font file font.otf, but you only used, say, Latin characters, you might want to subset the font to avoid also embedding the glyphs for Cyrillic, Greek, Math, Mahjong Tiles (yes, they exist!), etc.
With glyphIgo you do not even need to know exactly what characters you used: the script will compute the character set of your eBook and use it to subset your font:
$ python glyphIgo.py subset -f font.otf -e ebook.epub -o minimized.otf
[INFO] Subsetting font 'font.otf' with ebook 'ebook.epub' into new font 'minimized.otf', containing the following glyphs:
' ' 32 0x20 SPACE
'!' 33 0x21 EXCLAMATION MARK
''' 39 0x27 APOSTROPHE
'(' 40 0x28 LEFT PARENTHESIS
')' 41 0x29 RIGHT PARENTHESIS
'*' 42 0x2a ASTERISK
'+' 43 0x2b PLUS SIGN
',' 44 0x2c COMMA
'-' 45 0x2d HYPHEN-MINUS
'.' 46 0x2e FULL STOP
'/' 47 0x2f SOLIDUS
'0' 48 0x30 DIGIT ZERO
'1' 49 0x31 DIGIT ONE
'2' 50 0x32 DIGIT TWO
'3' 51 0x33 DIGIT THREE
'4' 52 0x34 DIGIT FOUR
'5' 53 0x35 DIGIT FIVE
'6' 54 0x36 DIGIT SIX
'7' 55 0x37 DIGIT SEVEN
'8' 56 0x38 DIGIT EIGHT
'9' 57 0x39 DIGIT NINE
':' 58 0x3a COLON
';' 59 0x3b SEMICOLON
'?' 63 0x3f QUESTION MARK
'@' 64 0x40 COMMERCIAL AT
'A' 65 0x41 LATIN CAPITAL LETTER A
'B' 66 0x42 LATIN CAPITAL LETTER B
'C' 67 0x43 LATIN CAPITAL LETTER C
'D' 68 0x44 LATIN CAPITAL LETTER D
'E' 69 0x45 LATIN CAPITAL LETTER E
'F' 70 0x46 LATIN CAPITAL LETTER F
'G' 71 0x47 LATIN CAPITAL LETTER G
'H' 72 0x48 LATIN CAPITAL LETTER H
'I' 73 0x49 LATIN CAPITAL LETTER I
'J' 74 0x4a LATIN CAPITAL LETTER J
'K' 75 0x4b LATIN CAPITAL LETTER K
'L' 76 0x4c LATIN CAPITAL LETTER L
'M' 77 0x4d LATIN CAPITAL LETTER M
'N' 78 0x4e LATIN CAPITAL LETTER N
'O' 79 0x4f LATIN CAPITAL LETTER O
'P' 80 0x50 LATIN CAPITAL LETTER P
'Q' 81 0x51 LATIN CAPITAL LETTER Q
'R' 82 0x52 LATIN CAPITAL LETTER R
'S' 83 0x53 LATIN CAPITAL LETTER S
'T' 84 0x54 LATIN CAPITAL LETTER T
'U' 85 0x55 LATIN CAPITAL LETTER U
'V' 86 0x56 LATIN CAPITAL LETTER V
'W' 87 0x57 LATIN CAPITAL LETTER W
'X' 88 0x58 LATIN CAPITAL LETTER X
'Z' 90 0x5a LATIN CAPITAL LETTER Z
'`' 96 0x60 GRAVE ACCENT
'a' 97 0x61 LATIN SMALL LETTER A
'b' 98 0x62 LATIN SMALL LETTER B
'c' 99 0x63 LATIN SMALL LETTER C
'd' 100 0x64 LATIN SMALL LETTER D
'e' 101 0x65 LATIN SMALL LETTER E
'f' 102 0x66 LATIN SMALL LETTER F
'g' 103 0x67 LATIN SMALL LETTER G
'h' 104 0x68 LATIN SMALL LETTER H
'i' 105 0x69 LATIN SMALL LETTER I
'j' 106 0x6a LATIN SMALL LETTER J
'k' 107 0x6b LATIN SMALL LETTER K
'l' 108 0x6c LATIN SMALL LETTER L
'm' 109 0x6d LATIN SMALL LETTER M
'n' 110 0x6e LATIN SMALL LETTER N
'o' 111 0x6f LATIN SMALL LETTER O
'p' 112 0x70 LATIN SMALL LETTER P
'q' 113 0x71 LATIN SMALL LETTER Q
'r' 114 0x72 LATIN SMALL LETTER R
's' 115 0x73 LATIN SMALL LETTER S
't' 116 0x74 LATIN SMALL LETTER T
'u' 117 0x75 LATIN SMALL LETTER U
'v' 118 0x76 LATIN SMALL LETTER V
'w' 119 0x77 LATIN SMALL LETTER W
'x' 120 0x78 LATIN SMALL LETTER X
'y' 121 0x79 LATIN SMALL LETTER Y
'z' 122 0x7a LATIN SMALL LETTER Z
'{' 123 0x7b LEFT CURLY BRACKET
'}' 125 0x7d RIGHT CURLY BRACKET
' ' 160 0xa0 NO-BREAK SPACE
'§' 167 0xa7 SECTION SIGN
'©' 169 0xa9 COPYRIGHT SIGN
'È' 200 0xc8 LATIN CAPITAL LETTER E WITH GRAVE
'à' 224 0xe0 LATIN SMALL LETTER A WITH GRAVE
'á' 225 0xe1 LATIN SMALL LETTER A WITH ACUTE
'è' 232 0xe8 LATIN SMALL LETTER E WITH GRAVE
'é' 233 0xe9 LATIN SMALL LETTER E WITH ACUTE
'ì' 236 0xec LATIN SMALL LETTER I WITH GRAVE
'ñ' 241 0xf1 LATIN SMALL LETTER N WITH TILDE
'ò' 242 0xf2 LATIN SMALL LETTER O WITH GRAVE
'ù' 249 0xf9 LATIN SMALL LETTER U WITH GRAVE
'–' 8211 0x2013 EN DASH
'—' 8212 0x2014 EM DASH
'’' 8217 0x2019 RIGHT SINGLE QUOTATION MARK
'“' 8220 0x201c LEFT DOUBLE QUOTATION MARK
'”' 8221 0x201d RIGHT DOUBLE QUOTATION MARK
'…' 8230 0x2026 HORIZONTAL ELLIPSIS
Note that, in this case, if we had used an online tool like FontSquirrel and selected only the Latin subset, we might have forgotten to include the 0x20?? characters (the dashes, the curly quotation marks and the horizontal ellipsis at the end of the list above).
(Please read the Technical Notes for the details of how glyphIgo computes the eBook character set.)
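To give a rough idea of what "computing the character set of an eBook" can look like, here is a simplified Python sketch. It is not glyphIgo's actual code (see the Technical Notes for that). It assumes the EPUB is a ZIP archive whose content documents end in .xhtml, .html or .htm and are encoded in UTF-8, and it simply collects every character found in text nodes. The names epub_character_set and _TextCollector are hypothetical.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the characters found in text nodes, ignoring the markup."""

    def __init__(self):
        # convert_charrefs=True turns entities like &#233; into real characters
        super().__init__(convert_charrefs=True)
        self.chars = set()

    def handle_data(self, data):
        # Simplification: this also sees whitespace and <style>/<script> content
        self.chars.update(data)


def epub_character_set(path):
    """Return the set of characters used in the (X)HTML files of an EPUB.

    path may be a file name or a binary file-like object.
    """
    chars = set()
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector = _TextCollector()
                # Assumption: content documents are UTF-8 encoded
                collector.feed(archive.read(name).decode("utf-8"))
                collector.close()
                chars |= collector.chars
    return chars
```

Calling sorted(epub_character_set("ebook.epub")) would then give a list of characters comparable to the one printed above, although the real tool may treat some files or elements differently.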
Subsetting using Unicode ranges (-p switch)
OK, but what if you need to subset a font and you know exactly what characters you want to keep? In other words, you want to specify a font and one or more Unicode ranges, and get the corresponding subset into a new font.
With glyphIgo the process is simple: just create a UTF-8, plain text file, containing all the characters you need, and use the -p switch instead of -e.
For example, if you just want the union of the 0-9 and A-F ranges, you can create a file 09AF.set as follows:
$ echo -n "0123456789ABCDEF" > 09AF.set
$ cat 09AF.set
0123456789ABCDEF
Then, you will specify it using the -p parameter:
$ python glyphIgo.py subset -f font.otf -p 09AF.set -o minimized.otf
[INFO] Subsetting font 'font.otf' with ebook '09AF.set' into new font 'minimized.otf', containing the following glyphs:
'0' 48 0x30 DIGIT ZERO
'1' 49 0x31 DIGIT ONE
'2' 50 0x32 DIGIT TWO
'3' 51 0x33 DIGIT THREE
'4' 52 0x34 DIGIT FOUR
'5' 53 0x35 DIGIT FIVE
'6' 54 0x36 DIGIT SIX
'7' 55 0x37 DIGIT SEVEN
'8' 56 0x38 DIGIT EIGHT
'9' 57 0x39 DIGIT NINE
'A' 65 0x41 LATIN CAPITAL LETTER A
'B' 66 0x42 LATIN CAPITAL LETTER B
'C' 67 0x43 LATIN CAPITAL LETTER C
'D' 68 0x44 LATIN CAPITAL LETTER D
'E' 69 0x45 LATIN CAPITAL LETTER E
'F' 70 0x46 LATIN CAPITAL LETTER F
Using the list command, we can check that the new font minimized.otf indeed contains only these glyphs:
$ python glyphIgo.py list -f minimized.otf
[INFO] Glyphs in 'minimized.otf':
'0' 48 0x30 DIGIT ZERO
'1' 49 0x31 DIGIT ONE
'2' 50 0x32 DIGIT TWO
'3' 51 0x33 DIGIT THREE
'4' 52 0x34 DIGIT FOUR
'5' 53 0x35 DIGIT FIVE
'6' 54 0x36 DIGIT SIX
'7' 55 0x37 DIGIT SEVEN
'8' 56 0x38 DIGIT EIGHT
'9' 57 0x39 DIGIT NINE
'A' 65 0x41 LATIN CAPITAL LETTER A
'B' 66 0x42 LATIN CAPITAL LETTER B
'C' 67 0x43 LATIN CAPITAL LETTER C
'D' 68 0x44 LATIN CAPITAL LETTER D
'E' 69 0x45 LATIN CAPITAL LETTER E
'F' 70 0x46 LATIN CAPITAL LETTER F
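If you want to cross-check a .set file by eye against the output of the list command, a small Python sketch like the following prints each character in a similar layout (character, decimal codepoint, hexadecimal codepoint, Unicode name). The layout is only approximated from the listings above, and glyph_listing is a hypothetical helper, not a glyphIgo function.

```python
import unicodedata


def glyph_listing(chars):
    """Return one line per distinct character, sorted by codepoint."""
    lines = []
    for cp in sorted({ord(c) for c in chars}):
        # Unnamed codepoints (e.g. control characters) get a placeholder name
        name = unicodedata.name(chr(cp), "UNKNOWN NAME")
        lines.append("'%s' %d %s %s" % (chr(cp), cp, hex(cp), name))
    return lines


for line in glyph_listing("F0"):
    print(line)
# '0' 48 0x30 DIGIT ZERO
# 'F' 70 0x46 LATIN CAPITAL LETTER F
```

To list the contents of a real file, you could pass it open("09AF.set", encoding="utf-8").read() instead of the literal string.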
Clearly, if you need to subset several fonts, or several/complex character sets, you can generate (and store) your own range files:
$ for ((i=0x0000; i<=0x007F; i++)); do printf '\u'`printf %04x $i`; done > BasicLatin.set
$ for ((i=0x0400; i<=0x04FF; i++)); do printf '\u'`printf %04x $i`; done > Cyrillic.set
$ for ((i=0x16A0; i<=0x16FF; i++)); do printf '\u'`printf %04x $i`; done > Runic.set
$ printf '\u2026' > HorizontalEllipsis.set
$ cat Runic.set
ᚠᚡᚢᚣᚤᚥᚦᚧᚨᚩᚪᚫᚬᚭᚮᚯᚰᚱᚲᚳᚴᚵᚶᚷᚸᚹᚺᚻᚼᚽᚾᚿᛀᛁᛂᛃᛄᛅᛆᛇᛈᛉᛊᛋᛌᛍᛎᛏᛐᛑᛒᛓᛔᛕᛖᛗᛘᛙᛚᛛᛜᛝᛞᛟᛠᛡᛢᛣᛤᛥᛦᛧᛨᛩᛪ᛫᛬᛭ᛮᛯᛰᛱᛲᛳᛴᛵᛶᛷᛸ
$ cat HorizontalEllipsis.set
…
$ cat BasicLatin.set Cyrillic.set Runic.set HorizontalEllipsis.set > my.set
$ python glyphIgo.py subset -f font.otf -p my.set -o minimized.otf
[INFO] Subsetting font 'font.otf' with ebook 'my.set' into new font 'minimized.otf', containing the following glyphs:
' ' 32 0x20 SPACE
'!' 33 0x21 EXCLAMATION MARK
'"' 34 0x22 QUOTATION MARK
...
'ӵ' 1269 0x4f5 CYRILLIC SMALL LETTER CHE WITH DIAERESIS
'Ӹ' 1272 0x4f8 CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
'ӹ' 1273 0x4f9 CYRILLIC SMALL LETTER YERU WITH DIAERESIS
'…' 8230 0x2026 HORIZONTAL ELLIPSIS
In this example:
- the Basic Latin Unicode block includes the codepoints 0x0000..0x007F, the Cyrillic block 0x0400..0x04FF, and the Runic block 0x16A0..0x16FF. If you need the hexadecimal range of a particular Unicode block, you can find it in this file;
- font.otf does not contain Runic glyphs, and hence neither does minimized.otf (glyphIgo cannot create glyphs that are not present in the original font!);
- you can also add single characters, like the horizontal ellipsis (0x2026).
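If you prefer Python to the shell loops above, the same range files can be produced with a few lines of code. This is just a sketch under some assumptions: the helper names block_characters and write_set_file are mine, and the ranges must not include surrogate codepoints (0xD800..0xDFFF), which cannot be encoded in UTF-8.

```python
def block_characters(first, last):
    """Return a string with every codepoint from first to last, inclusive."""
    return "".join(chr(cp) for cp in range(first, last + 1))


def write_set_file(path, *pieces):
    """Write the given strings, concatenated, to a UTF-8 plain text file."""
    # newline="" keeps the CR and LF characters of Basic Latin untouched
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write("".join(pieces))


# Same character set as my.set above: three blocks plus a single character
my_set = (
    block_characters(0x0000, 0x007F)    # Basic Latin
    + block_characters(0x0400, 0x04FF)  # Cyrillic
    + block_characters(0x16A0, 0x16FF)  # Runic
    + "\u2026"                          # HORIZONTAL ELLIPSIS
)
print(len(my_set))  # 481 = 128 + 256 + 96 + 1
```

A call like write_set_file("my.set", my_set) would then create a file you can pass to the -p switch.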
Happy font subsetting with glyphIgo!