

In shaping text, a cluster is a sequence of code points that needs to be treated as a single, indivisible unit.

When you add text to a HarfBuzz buffer, each character is associated with a cluster value. As far as HarfBuzz is concerned, this is an arbitrary number.

Most clients use UTF-8, UTF-16, or UTF-32 indices, but the actual number does not matter. Cluster values are not required to be monotonically increasing, and the code itself makes no such assumption, although nearly all of HarfBuzz's tests are run on monotonically increasing cluster values. With that in mind, let's examine what happens to cluster values during shaping under each cluster level.
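To make the choice of indices concrete, here is a minimal Python sketch of how a client might derive one cluster value per code point from UTF-8, UTF-16, or UTF-32 indices. The function name `cluster_values` is hypothetical and is not part of the HarfBuzz API; the sketch only illustrates the numbering, not how HarfBuzz stores it.

```python
def cluster_values(text, encoding="utf-32"):
    """Return one cluster value per code point: the index of its first code unit.

    Hypothetical helper for illustration only; not a HarfBuzz function.
    """
    # Bytes per code unit, and a codec without a byte-order mark.
    unit_size = {"utf-8": 1, "utf-16": 2, "utf-32": 4}[encoding]
    codec = {"utf-8": "utf-8", "utf-16": "utf-16-le", "utf-32": "utf-32-le"}[encoding]

    values = []
    offset = 0
    for ch in text:  # Python iterates over code points
        values.append(offset)
        offset += len(ch.encode(codec)) // unit_size
    return values


sample = "a\u00e9\U0001F600b"  # 'a', e-acute, an emoji outside the BMP, 'b'
utf8_values = cluster_values(sample, "utf-8")
utf16_values = cluster_values(sample, "utf-16")
utf32_values = cluster_values(sample, "utf-32")
```

All three numberings are monotonically increasing and distinct; they differ only in how large the gaps between consecutive values are.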

HarfBuzz provides three levels of clustering support. Level 0 is the default and reproduces the behavior of the old HarfBuzz library. Level 1 tweaks this behavior slightly to produce better results, so level 1 clustering is recommended for code that does not need to stay backward compatible with the old HarfBuzz.

Level 2 differs significantly in how it treats cluster values. Levels 0 and 1 both process ligatures and glyph decomposition by merging clusters; level 2 does not.
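As a rough illustration of what "merging clusters" means, the toy model below gives every glyph in a range the same cluster value. It assumes the merged cluster takes the minimum of the incoming values; that rule is not stated in this passage and is treated here as an assumption. The function `merge_clusters` is a hypothetical Python helper, not HarfBuzz's implementation.

```python
def merge_clusters(clusters, start, end):
    """Return a copy in which glyphs in [start, end) share one cluster value.

    Toy model only. Assumption: the merged cluster takes the minimum
    of the cluster values that went into the merge.
    """
    merged = list(clusters)
    if end - start < 2:
        return merged  # nothing to merge
    lowest = min(merged[start:end])
    for i in range(start, end):
        merged[i] = lowest
    return merged


# Four glyphs with distinct clusters; glyphs 1 and 2 are merged.
before = [0, 1, 2, 3]
after = merge_clusters(before, 1, 3)
```

Under this model the sequence stays monotone after a merge, which matches the conceptual model for levels 0 and 1.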

The conceptual model for what the cluster values mean, in levels 0 and 1, is this:

  • the sequence of cluster values will always remain monotone

  • each value represents a single cluster

  • each cluster contains one or more glyphs and one or more characters
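The three properties above can be checked on a list of output cluster values with a short Python sketch. Both helper names are hypothetical, and "monotone" is taken here to mean either non-decreasing or non-increasing, which is an interpretation rather than something the text spells out.

```python
from itertools import groupby


def is_monotone(values):
    """True if the values never decrease, or never increase."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def group_glyphs(glyph_clusters):
    """Group adjacent glyph indices that share a cluster value.

    Returns a list of (cluster_value, [glyph indices]) tuples.
    """
    groups = []
    index = 0
    for value, run in groupby(glyph_clusters):
        count = len(list(run))
        groups.append((value, list(range(index, index + count))))
        index += count
    return groups


glyph_clusters = [0, 0, 2, 3]  # hypothetical shaping output
groups = group_glyphs(glyph_clusters)
```

Each tuple in `groups` corresponds to one cluster, and each cluster holds at least one glyph by construction.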

Assuming the initial cluster values were monotonically increasing and distinct, all adjacent glyphs with the same cluster value belong to the same cluster, and each character belongs to the cluster whose value is the highest one not larger than that character's initial cluster value. This will become clearer with an example.
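The rule for assigning characters to clusters can be sketched in Python as a lookup for the highest output cluster value not larger than each character's initial value. The function name and the sample data are hypothetical, and the sketch assumes, as the text does, that the initial cluster values were monotonically increasing and distinct.

```python
from bisect import bisect_right


def cluster_of_characters(char_clusters, glyph_clusters):
    """Map each character's initial cluster value to its final cluster.

    A character belongs to the cluster with the highest value that is
    not larger than the character's initial cluster value.
    """
    distinct = sorted(set(glyph_clusters))
    result = []
    for value in char_clusters:
        pos = bisect_right(distinct, value)
        # pos == 0 means no cluster value is small enough; not expected
        # when the input satisfies the stated assumptions.
        result.append(distinct[pos - 1] if pos else None)
    return result


# Five characters with initial clusters 0..4; hypothetical shaping output
# in which characters 0-1 and 3-4 ended up sharing clusters.
char_clusters = [0, 1, 2, 3, 4]
glyph_clusters = [0, 0, 2, 3]
assignment = cluster_of_characters(char_clusters, glyph_clusters)
```

Because the lookup sorts the distinct output values first, it gives the same answer when the glyph sequence runs in decreasing cluster order.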
