Subsetting Fonts with glyphIgo

Created 16 Sep 2014  •  Written by Alberto Pettarin

glyphIgo is a handy Python script that helps you deal with fonts and EPUB eBooks.

At the moment glyphIgo offers seven main commands. To get its usage message, run it without arguments:

$ python glyphIgo.py 
usage: $ python glyphIgo.py check|convert|count|list|lookup|obfuscate|subset [options]

Or add -h or --help to get the full list of options and examples:

$ python glyphIgo.py --help

You might also want to have a look at the usage examples, with their output, hosted in the GitHub repository.

In this post I would like to focus on the subset command.

As the name suggests, when you subset a font file a.otf according to a character set S, you want to get a font file b.otf containing only the glyphs present both in a.otf and in S. For example, a.otf might be a universal font, containing glyphs for all Unicode codepoints, and S might be the union of the Basic Latin and Cyrillic Unicode blocks. You want the new font b.otf to contain the glyphs of these two blocks only.
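In set terms, the glyphs kept in b.otf are the intersection of the codepoints covered by a.otf with the codepoints in S. The following Python sketch shows just that step; the function name and the toy font coverage are hypothetical, since in practice the coverage comes from the font file itself (glyphIgo delegates the actual font handling to FontForge):

```python
def subset_codepoints(font_codepoints, charset):
    """Codepoints to keep: those covered by the font AND present in the character set."""
    wanted = {ord(ch) for ch in charset}
    return set(font_codepoints) & wanted


# Hypothetical font coverage: Basic Latin (U+0000..U+007F) plus Cyrillic (U+0400..U+04FF).
font_cps = set(range(0x0000, 0x0080)) | set(range(0x0400, 0x0500))

# Requested characters: three Latin letters, one Cyrillic letter, one Runic letter.
requested = "AbcЖᚠ"

kept = subset_codepoints(font_cps, requested)
print([hex(cp) for cp in sorted(kept)])  # ['0x41', '0x62', '0x63', '0x416']: Runic ᚠ is dropped
```

The Runic letter disappears because the font has no glyph for it: subsetting can only remove glyphs, never add them.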

You might want to subset a font to reduce its size (e.g., to embed it into an eBook, or to save bandwidth on your Web site), or because the font license requires it: many font foundries require you to subset their fonts before embedding them into apps or documents, so that users cannot extract the whole font and reuse it later.

In the following examples I will use an OTF font, but glyphIgo also works on TTF and WOFF fonts, as the heavy lifting is done by the FontForge library, which supports those formats as well.

Subsetting using EPUB files (-e switch)

If you have an EPUB eBook ebook.epub and you want to embed a big font file font.otf, but you only used, say, Latin characters, you might want to subset the font to avoid also embedding its Cyrillic, Greek, Math, Mahjong Tiles (yes, that block exists!), etc. glyphs.

With glyphIgo you do not even need to know exactly which characters you used: the script computes the character set of your eBook and uses it to subset your font:

$ python glyphIgo.py subset -f font.otf -e ebook.epub -o minimized.otf

[INFO] Subsetting font 'font.otf' with ebook 'ebook.epub' into new font 'minimized.otf', containing the following glyphs:
' ' 32  0x20    SPACE
'!' 33  0x21    EXCLAMATION MARK
''' 39  0x27    APOSTROPHE
'(' 40  0x28    LEFT PARENTHESIS
')' 41  0x29    RIGHT PARENTHESIS
'*' 42  0x2a    ASTERISK
'+' 43  0x2b    PLUS SIGN
',' 44  0x2c    COMMA
'-' 45  0x2d    HYPHEN-MINUS
'.' 46  0x2e    FULL STOP
'/' 47  0x2f    SOLIDUS
'0' 48  0x30    DIGIT ZERO
'1' 49  0x31    DIGIT ONE
'2' 50  0x32    DIGIT TWO
'3' 51  0x33    DIGIT THREE
'4' 52  0x34    DIGIT FOUR
'5' 53  0x35    DIGIT FIVE
'6' 54  0x36    DIGIT SIX
'7' 55  0x37    DIGIT SEVEN
'8' 56  0x38    DIGIT EIGHT
'9' 57  0x39    DIGIT NINE
':' 58  0x3a    COLON
';' 59  0x3b    SEMICOLON
'?' 63  0x3f    QUESTION MARK
'@' 64  0x40    COMMERCIAL AT
'A' 65  0x41    LATIN CAPITAL LETTER A
'B' 66  0x42    LATIN CAPITAL LETTER B
'C' 67  0x43    LATIN CAPITAL LETTER C
'D' 68  0x44    LATIN CAPITAL LETTER D
'E' 69  0x45    LATIN CAPITAL LETTER E
'F' 70  0x46    LATIN CAPITAL LETTER F
'G' 71  0x47    LATIN CAPITAL LETTER G
'H' 72  0x48    LATIN CAPITAL LETTER H
'I' 73  0x49    LATIN CAPITAL LETTER I
'J' 74  0x4a    LATIN CAPITAL LETTER J
'K' 75  0x4b    LATIN CAPITAL LETTER K
'L' 76  0x4c    LATIN CAPITAL LETTER L
'M' 77  0x4d    LATIN CAPITAL LETTER M
'N' 78  0x4e    LATIN CAPITAL LETTER N
'O' 79  0x4f    LATIN CAPITAL LETTER O
'P' 80  0x50    LATIN CAPITAL LETTER P
'Q' 81  0x51    LATIN CAPITAL LETTER Q
'R' 82  0x52    LATIN CAPITAL LETTER R
'S' 83  0x53    LATIN CAPITAL LETTER S
'T' 84  0x54    LATIN CAPITAL LETTER T
'U' 85  0x55    LATIN CAPITAL LETTER U
'V' 86  0x56    LATIN CAPITAL LETTER V
'W' 87  0x57    LATIN CAPITAL LETTER W
'X' 88  0x58    LATIN CAPITAL LETTER X
'Z' 90  0x5a    LATIN CAPITAL LETTER Z
'`' 96  0x60    GRAVE ACCENT
'a' 97  0x61    LATIN SMALL LETTER A
'b' 98  0x62    LATIN SMALL LETTER B
'c' 99  0x63    LATIN SMALL LETTER C
'd' 100 0x64    LATIN SMALL LETTER D
'e' 101 0x65    LATIN SMALL LETTER E
'f' 102 0x66    LATIN SMALL LETTER F
'g' 103 0x67    LATIN SMALL LETTER G
'h' 104 0x68    LATIN SMALL LETTER H
'i' 105 0x69    LATIN SMALL LETTER I
'j' 106 0x6a    LATIN SMALL LETTER J
'k' 107 0x6b    LATIN SMALL LETTER K
'l' 108 0x6c    LATIN SMALL LETTER L
'm' 109 0x6d    LATIN SMALL LETTER M
'n' 110 0x6e    LATIN SMALL LETTER N
'o' 111 0x6f    LATIN SMALL LETTER O
'p' 112 0x70    LATIN SMALL LETTER P
'q' 113 0x71    LATIN SMALL LETTER Q
'r' 114 0x72    LATIN SMALL LETTER R
's' 115 0x73    LATIN SMALL LETTER S
't' 116 0x74    LATIN SMALL LETTER T
'u' 117 0x75    LATIN SMALL LETTER U
'v' 118 0x76    LATIN SMALL LETTER V
'w' 119 0x77    LATIN SMALL LETTER W
'x' 120 0x78    LATIN SMALL LETTER X
'y' 121 0x79    LATIN SMALL LETTER Y
'z' 122 0x7a    LATIN SMALL LETTER Z
'{' 123 0x7b    LEFT CURLY BRACKET
'}' 125 0x7d    RIGHT CURLY BRACKET
' '    160 0xa0    NO-BREAK SPACE
'§'    167 0xa7    SECTION SIGN
'©'    169 0xa9    COPYRIGHT SIGN
'È'    200 0xc8    LATIN CAPITAL LETTER E WITH GRAVE
'à'    224 0xe0    LATIN SMALL LETTER A WITH GRAVE
'á'    225 0xe1    LATIN SMALL LETTER A WITH ACUTE
'è'    232 0xe8    LATIN SMALL LETTER E WITH GRAVE
'é'    233 0xe9    LATIN SMALL LETTER E WITH ACUTE
'ì'    236 0xec    LATIN SMALL LETTER I WITH GRAVE
'ñ'    241 0xf1    LATIN SMALL LETTER N WITH TILDE
'ò'    242 0xf2    LATIN SMALL LETTER O WITH GRAVE
'ù'    249 0xf9    LATIN SMALL LETTER U WITH GRAVE
'–'   8211    0x2013  EN DASH
'—'   8212    0x2014  EM DASH
'’'   8217    0x2019  RIGHT SINGLE QUOTATION MARK
'“'   8220    0x201c  LEFT DOUBLE QUOTATION MARK
'”'   8221    0x201d  RIGHT DOUBLE QUOTATION MARK
'…'   8230    0x2026  HORIZONTAL ELLIPSIS

Note that, in this case, if we had used an online tool like FontSquirrel and selected only the Latin subset, we might have forgotten to include the 0x20xx characters: the en dash, em dash, curly quotes, and horizontal ellipsis at the end of the list belong to the General Punctuation block (0x2000..0x206F), not to a Latin block.

(See the Technical Notes for details on how glyphIgo computes the eBook character set.)
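For intuition only, here is a rough Python sketch of how one could collect the character set of an EPUB with the standard library: open the container as a ZIP archive, parse every XHTML/HTML content document, and gather the characters of its text nodes. This is not glyphIgo's actual algorithm; it assumes content documents are recognised by their file extension and encoded in UTF-8, it skips <script> and <style> contents, and it ignores text coming from CSS or attributes. The class and function names are hypothetical.

```python
import io
import zipfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the characters of text nodes, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # so "&#8230;" reaches handle_data as "…"
        self.chars = set()
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chars.update(data)


def epub_charset(epub):
    """Characters used in the text of an EPUB, given a path or a binary file object."""
    chars = set()
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            # Assumption: content documents are recognised by extension and encoded in UTF-8.
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextCollector()
                parser.feed(archive.read(name).decode("utf-8-sig"))  # drops a leading BOM
                parser.close()
                chars |= parser.chars
    # Design choice of this sketch: newlines, carriage returns and tabs only lay out markup.
    return chars - set("\n\r\t")


def beyond_latin1(chars):
    """Characters above U+00FF, i.e. those a Latin-1-only subset would miss."""
    return sorted((ch for ch in chars if ord(ch) > 0xFF), key=ord)


# Usage: build a one-chapter EPUB in memory and inspect its character set.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr(
        "OEBPS/ch1.xhtml",
        "<html><head><style>p{color:red}</style></head>"
        "<body><p>Così &#8220;ciao&#8221;&#8230;</p></body></html>",
    )
buffer.seek(0)

charset = epub_charset(buffer)
print(sorted(charset, key=ord))
print(beyond_latin1(charset))  # the characters a Latin-1-only subset would miss
```

The beyond_latin1 helper makes the FontSquirrel pitfall mentioned above concrete: the curly quotes and the ellipsis are exactly the characters it reports.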

Subsetting using Unicode ranges (-p switch)

OK, but what if you need to subset a font and you know exactly which characters you want to keep? In other words, you want to specify a font and one or more Unicode ranges, and get the corresponding subset as a new font.

With glyphIgo the process is simple: create a plain-text file, encoded in UTF-8, containing all the characters you need, and use the -p switch instead of -e.

For example, if you only want the union of the 0-9 and A-F ranges, write a file 09AF.set containing exactly those characters:

$ echo -n "0123456789ABCDEF" > 09AF.set

$ cat 09AF.set
0123456789ABCDEF

Then pass it to glyphIgo with the -p switch:

$ python glyphIgo.py subset -f font.otf -p 09AF.set -o minimized.otf

[INFO] Subsetting font 'font.otf' with ebook '09AF.set' into new font 'minimized.otf', containing the following glyphs:
'0'     48      0x30    DIGIT ZERO
'1'     49      0x31    DIGIT ONE
'2'     50      0x32    DIGIT TWO
'3'     51      0x33    DIGIT THREE
'4'     52      0x34    DIGIT FOUR
'5'     53      0x35    DIGIT FIVE
'6'     54      0x36    DIGIT SIX
'7'     55      0x37    DIGIT SEVEN
'8'     56      0x38    DIGIT EIGHT
'9'     57      0x39    DIGIT NINE
'A'     65      0x41    LATIN CAPITAL LETTER A
'B'     66      0x42    LATIN CAPITAL LETTER B
'C'     67      0x43    LATIN CAPITAL LETTER C
'D'     68      0x44    LATIN CAPITAL LETTER D
'E'     69      0x45    LATIN CAPITAL LETTER E
'F'     70      0x46    LATIN CAPITAL LETTER F

Using the list command, we can verify that the new font minimized.otf indeed contains exactly these glyphs:

$ python glyphIgo.py list -f minimized.otf

[INFO] Glyphs in 'minimized.otf':
'0'     48      0x30    DIGIT ZERO
'1'     49      0x31    DIGIT ONE
'2'     50      0x32    DIGIT TWO
'3'     51      0x33    DIGIT THREE
'4'     52      0x34    DIGIT FOUR
'5'     53      0x35    DIGIT FIVE
'6'     54      0x36    DIGIT SIX
'7'     55      0x37    DIGIT SEVEN
'8'     56      0x38    DIGIT EIGHT
'9'     57      0x39    DIGIT NINE
'A'     65      0x41    LATIN CAPITAL LETTER A
'B'     66      0x42    LATIN CAPITAL LETTER B
'C'     67      0x43    LATIN CAPITAL LETTER C
'D'     68      0x44    LATIN CAPITAL LETTER D
'E'     69      0x45    LATIN CAPITAL LETTER E
'F'     70      0x46    LATIN CAPITAL LETTER F
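The same check can be scripted. Below is a small Python sketch that parses a listing in the column layout shown above (quoted character, decimal codepoint, hexadecimal codepoint, Unicode name) and compares it with the characters of a .set file. The helper names are hypothetical, and the sketch assumes glyphIgo keeps printing rows in exactly this layout:

```python
import re

# One listing row: quoted character, decimal codepoint, hex codepoint, Unicode name.
ROW = re.compile(r"^'.*'\s+(\d+)\s+0x([0-9a-fA-F]+)\s+\S")


def listed_codepoints(listing):
    """Extract the codepoints from listing text, skipping lines such as the [INFO] header."""
    codepoints = set()
    for line in listing.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        decimal, hexadecimal = int(match.group(1)), int(match.group(2), 16)
        if decimal != hexadecimal:  # both columns must describe the same codepoint
            raise ValueError(f"inconsistent row: {line!r}")
        codepoints.add(decimal)
    return codepoints


def check_subset(listing, set_text):
    """Compare the glyphs listed for a font with the characters of a .set file."""
    expected = {ord(ch) for ch in set_text}
    found = listed_codepoints(listing)
    return {"missing": sorted(expected - found), "unexpected": sorted(found - expected)}


# Usage with a three-row excerpt of the listing above.
listing = """[INFO] Glyphs in 'minimized.otf':
'0'     48      0x30    DIGIT ZERO
'1'     49      0x31    DIGIT ONE
'A'     65      0x41    LATIN CAPITAL LETTER A
"""
print(check_subset(listing, "01AB"))  # {'missing': [66], 'unexpected': []}
```

Since the excerpt stops at A, the letter B (decimal 66) is reported as missing; an empty result for both keys means the font matches the set file.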

If you need to subset several fonts, or to use several or complex character sets, you can generate (and store) your own range files:

$ for ((i=0x0000; i<=0x007F; i++)); do printf '\u'`printf %04x $i`; done > BasicLatin.set
$ for ((i=0x0400; i<=0x04FF; i++)); do printf '\u'`printf %04x $i`; done > Cyrillic.set
$ for ((i=0x16A0; i<=0x16FF; i++)); do printf '\u'`printf %04x $i`; done > Runic.set
$ printf '\u2026' > HorizontalEllipsis.set

$ cat Runic.set
ᚠᚡᚢᚣᚤᚥᚦᚧᚨᚩᚪᚫᚬᚭᚮᚯᚰᚱᚲᚳᚴᚵᚶᚷᚸᚹᚺᚻᚼᚽᚾᚿᛀᛁᛂᛃᛄᛅᛆᛇᛈᛉᛊᛋᛌᛍᛎᛏᛐᛑᛒᛓᛔᛕᛖᛗᛘᛙᛚᛛᛜᛝᛞᛟᛠᛡᛢᛣᛤᛥᛦᛧᛨᛩᛪ᛫᛬᛭ᛮᛯᛰᛱᛲᛳᛴᛵᛶᛷᛸ᛹᛺᛻᛼᛽᛾᛿
 
$ cat HorizontalEllipsis.set
…

$ cat BasicLatin.set Cyrillic.set Runic.set HorizontalEllipsis.set > my.set

$ python glyphIgo.py subset -f font.otf -p my.set -o minimized.otf

[INFO] Subsetting font 'font.otf' with ebook 'my.set' into new font 'minimized.otf', containing the following glyphs:
' '     32      0x20    SPACE
'!'     33      0x21    EXCLAMATION MARK
'"'     34      0x22    QUOTATION MARK
...
'ӵ'     1269    0x4f5   CYRILLIC SMALL LETTER CHE WITH DIAERESIS
'Ӹ'     1272    0x4f8   CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS
'ӹ'     1273    0x4f9   CYRILLIC SMALL LETTER YERU WITH DIAERESIS
'…'     8230    0x2026  HORIZONTAL ELLIPSIS

In this example:

  1. the Basic Latin Unicode block includes the codepoints 0x0000..0x007F, the Cyrillic block 0x0400..0x04FF, and the Runic block 0x16A0..0x16FF. If you need the hexadecimal range of a particular Unicode block, you can find it in this file;
  2. font.otf does not contain Runic glyphs, hence neither does minimized.otf (glyphIgo cannot create glyphs that are not present in the original font!);
  3. you can also add single characters, like the horizontal ellipsis (0x2026).
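The bash loops above rely on a printf that understands \u escapes (bash 4.2 or later). If that is not available, a short Python script can produce the same files. In this sketch the helper names are hypothetical; it writes each set as raw UTF-8 bytes without a trailing newline (like echo -n), and it skips the surrogate codepoints 0xD800..0xDFFF, which cannot be encoded in UTF-8 (none of the blocks used here contains them):

```python
from pathlib import Path


def block_chars(start, end):
    """Characters U+start..U+end inclusive; surrogates (not encodable in UTF-8) are skipped."""
    return "".join(chr(cp) for cp in range(start, end + 1) if not 0xD800 <= cp <= 0xDFFF)


def write_set(path, chars):
    """Write a .set file as raw UTF-8 bytes: no trailing newline, no newline translation."""
    Path(path).write_bytes(chars.encode("utf-8"))


# Same ranges as the shell example above.
sets = {
    "BasicLatin.set": block_chars(0x0000, 0x007F),
    "Cyrillic.set": block_chars(0x0400, 0x04FF),
    "Runic.set": block_chars(0x16A0, 0x16FF),
    "HorizontalEllipsis.set": "\u2026",
}
for filename, chars in sets.items():
    write_set(filename, chars)

# Equivalent of `cat BasicLatin.set Cyrillic.set Runic.set HorizontalEllipsis.set > my.set`
# (dicts keep insertion order, so the sets are concatenated in the same order).
write_set("my.set", "".join(sets.values()))
```

Writing bytes rather than text is deliberate: in text mode on Windows, Python would turn the U+000A character of Basic Latin into a CR LF pair.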

Happy font subsetting with glyphIgo!