Japanese Characters and Set Theory

Presentation

Set theory is universal in the sense that the elements of sets can be any objects. In mathematics, these objects are typically numbers, but they can also be people, buildings, or trams. In linguistics, the primary focus is on graphemes. It’s intriguing to consider the graphemes of Japanese characters.

Most writing systems are phonetic and fall into three main types: alphabetic (like the Latin script), consonantal (like the Arabic script), and syllabic (like Hiragana and Devanagari). The graphemes in these systems can be represented as two-dimensional vectors with a visual and a phonetic component. For example, the Devanagari letter त, pronounced “ta,” can be represented as the pair (त, ta).

Hieroglyphic characters are found not only in Chinese and Japanese writing but also in Egyptian, Mayan, and other writing systems. However, this discussion will focus solely on Japanese characters. The difference between a hieroglyphic character and a phonetic sign is that the former has a third component: a semantic one! This means it is represented by a three-dimensional vector with a written form, a reading, and a meaning. A Japanese character can have several readings and meanings, for example (日, jitsu, ‘day’) and (日, nichi, ‘sun’).
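The three-component description can be sketched in code. The example below is a minimal illustration in Python, assuming a hypothetical `Sign` tuple type (not part of the original text) with fields for written form, reading, and meaning. It collects the readings and meanings listed for one written form.

```python
from typing import NamedTuple


class Sign(NamedTuple):
    """A hypothetical three-component 'vector' for a character."""
    form: str      # written form (visual component)
    reading: str   # phonetic component
    meaning: str   # semantic component


# The two triples given in the text for the character 日.
signs = [
    Sign("日", "jitsu", "day"),
    Sign("日", "nichi", "sun"),
]


def readings_of(form: str, entries: list[Sign]) -> set[str]:
    """Collect every reading listed for the given written form."""
    return {s.reading for s in entries if s.form == form}


def meanings_of(form: str, entries: list[Sign]) -> set[str]:
    """Collect every meaning listed for the given written form."""
    return {s.meaning for s in entries if s.form == form}


print(sorted(readings_of("日", signs)))  # ['jitsu', 'nichi']
print(sorted(meanings_of("日", signs)))  # ['day', 'sun']
```

Grouping the triples by written form is what later allows a character to be treated as a set of readings or a set of meanings.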

Now let’s return to set theory. Just as the arithmetic operations of addition, subtraction, multiplication, and division are defined for numbers, there are algebraic operations on sets: union, intersection, difference, symmetric difference, and complement. The union (U) contains all elements of both sets, while the intersection (∩) contains only their common elements. To find the difference of two sets (\), we remove from the first set any elements that are also in the second set. The symmetric difference (Δ) is the difference between the union and the intersection. In this sense, the union is analogous to addition, while the intersection corresponds to multiplication. For example, let A = {1, 3, 4} and B = {1, 4, 5}. Then AUB = {1, 3, 4, 5}, A∩B = {1, 4}, A \ B = {3}, AΔB = {3, 5}.
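These operations map directly onto Python's built-in set type. The following is a minimal sketch using the same A and B; the variable names are illustrative only.

```python
# The two sets from the example in the text.
A = {1, 3, 4}
B = {1, 4, 5}

union = A | B                  # every element of either set
intersection = A & B           # only the elements common to both
difference = A - B             # elements of A that are not in B
symmetric_difference = A ^ B   # elements in exactly one of the two sets

print(sorted(union))                 # [1, 3, 4, 5]
print(sorted(intersection))          # [1, 4]
print(sorted(difference))            # [3]
print(sorted(symmetric_difference))  # [3, 5]

# The symmetric difference is the union with the intersection removed.
assert symmetric_difference == union - intersection
```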

If we consider a character as the set of its readings or meanings, all set-theory operations apply to it readily, except the complement. What does the complement of a set mean? It is the set of all elements other than the given ones, within some universal set defined by context. For example, the complement of the Russian letter {ш} in the context of modern Russian Cyrillic includes all letters of that alphabet except ш. A reading of a Japanese character, however, can be a syllable, a morpheme, or a whole word, so no natural universal set exists here. This holds even more true for the semantics of characters.
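The dependence of the complement on a universal set can be shown with the Cyrillic example. This is a minimal sketch, assuming the universal set is the 33 lowercase letters of the modern Russian alphabet; the names `RUSSIAN_ALPHABET` and `complement` are illustrative.

```python
# Assumed universal set: the 33 lowercase letters of the modern Russian alphabet.
RUSSIAN_ALPHABET = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")


def complement(subset: set[str], universe: set[str]) -> set[str]:
    """Return every element of the universe that is not in the subset."""
    return universe - subset


rest = complement({"ш"}, RUSSIAN_ALPHABET)

print(len(RUSSIAN_ALPHABET))  # 33
print(len(rest))              # 32
print("ш" in rest)            # False
```

The function cannot be called without its second argument, which is the point made above: without an agreed universal set of readings or meanings, the complement of a character is not defined.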

Let’s look at examples of applying set theory to the readings and meanings of a character, selecting instances where the reading and meaning partially overlap. The subscript y indicates the phonetic component, while z indicates the semantic component.

文_y = {mon, bun, fumi}, 門_y = {mon, kado}.

(文U門)_y = {mon, bun, kado, fumi}, (文∩門)_y = {mon},

(文 \ 門)_y = {bun, fumi}, (文Δ門)_y = {bun, kado, fumi}.

The meaning of the character 文 is “text, writing, literature,” and 門 means “gate.”

市_z = {city, market}, 町_z = {city, street}.

(市U町)_z = {city, market, street}, (市∩町)_z = {city},

(市 \ 町)_z = {market}, (市Δ町)_z = {market, street}.
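The semantic example can be reproduced with ordinary sets of strings. This is a minimal sketch; the dictionary `meanings` is an illustrative container holding only the glosses given above.

```python
# Meanings (the z component) of the two characters, as given in the text.
meanings = {
    "市": {"city", "market"},
    "町": {"city", "street"},
}

a, b = meanings["市"], meanings["町"]

print(sorted(a | b))  # ['city', 'market', 'street']
print(sorted(a & b))  # ['city']
print(sorted(a - b))  # ['market']
print(sorted(a ^ b))  # ['market', 'street']
```

Every result is again a set of meanings, so the operations never leave the realm of semantics.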

Let’s take another look at the characters’ visual components. Theoretically, a character can also be represented as a set of strokes. Most characters include vertical or horizontal strokes, but since they appear in a wide variety of combinations, discussing their union or intersection—applying set theory operations—makes little sense.

So it’s better to consider a character as a set of radicals (each character has a defining part called a radical; the traditional classification counts 214 of them) or other significant elements. We mark this visual component with the subscript x. Yet even then, it’s generally impossible to apply any set-theory operations to the characters themselves. That’s because the set resulting from the union, intersection, or difference of two characters typically won’t be a real character at all! (When uniting or intersecting phonetic and semantic components, we’re still within the realms of phonetics and semantics.) Exceptions are quite rare but do exist. For instance, uniting the characters for “sun” and “moon” forms a character with the meaning “bright”: (日U月)_x = 明_x. In this case, the union operation clearly isn’t commutative, as no character corresponds to (月U日)_x, with the components in the reverse order!
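A plain set union does not depend on the order of its operands, so the sketch below keys its lookup on an ordered pair of components to reflect the observation about 日 and 月. The table `COMPOSITES` and the function `unite` are hypothetical and hold only the single example from the text; a result of `None` means "not in this toy table," nothing more.

```python
from typing import Optional

# Toy table, purely illustrative: ordered pairs of components that form
# a real character. Only the example from the text is listed.
COMPOSITES = {
    ("日", "月"): "明",  # "sun" + "moon" -> "bright"
}


def unite(left: str, right: str) -> Optional[str]:
    """Return the character formed by the ordered pair, or None if the table has none."""
    return COMPOSITES.get((left, right))


print(unite("日", "月"))  # 明
print(unite("月", "日"))  # None
```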

Now let’s consider a set of characters that share the same radical. The radical itself must also be a character; otherwise, it can’t be considered an element of the set. For example, radical No. 149 言 “to speak” is a character, while radical No. 40 宀 (“roof”), the upper part of 安 “calm, cheap,” is not. Additionally, a radical can have different representations (allographs), which we will treat as identical in this model. For instance, radical No. 61 心 “heart” can appear in both standard (忘 “to forget”) and modified forms (性 “nature, gender,” where it is the left-hand element).

It might seem that if a radical is a character, then for a set of characters sharing the same radical, we could introduce the intersection operation: (抱∩押)_x = 手_x (抱 “to embrace, to hold,” 押 “to press, to push,” 手 “hand”). However, this is not generally the case: characters with the same radical can have common elements beyond the radical itself, for example: (姉∩婦)_x ≠ 女_x, since both characters also contain the element 巾 (姉 “older sister,” 婦 “lady,” 女 “woman”).

Let’s consider an arbitrary character H and denote its radical by K, treating K as a subset of H. Then the intersection and union operations (H ∩ K)_x, (H U K)_x are defined if K is a character. It is evident that (H ∩ K)_x = K_x; (H U K)_x = H_x. The intersection operation can be extended to any number of characters sharing the same radical: (H1 ∩ H2 ∩ H3 ∩ … ∩ Hn ∩ K)_x = K_x.

As an example, let’s take the radical ‘tree’ 木 and several characters with this radical: 村 “village,” 桜 “sakura,” 柱 “pillar, column,” 机 “table.” Then (木∩柱)_x = (村∩桜)_x = 木_x; (木U村)_x = 村_x; (木U机)_x = 机_x.
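The identities for a radical K treated as a subset of a character H can be checked on the 木 example. This is a minimal sketch; the decompositions in `components` are an assumption made for illustration (in particular, "⺍" stands in for the three strokes at the top right of 桜) and are not a standard decomposition.

```python
# Illustrative decompositions into visual components (an assumption of this sketch).
components = {
    "木": {"木"},
    "村": {"木", "寸"},
    "桜": {"木", "⺍", "女"},
    "柱": {"木", "主"},
    "机": {"木", "几"},
}

radical = components["木"]

# Intersecting the radical with a character, or two characters that share
# nothing but the radical, yields the radical itself.
assert components["木"] & components["柱"] == radical
assert components["村"] & components["桜"] == radical

# Uniting a character with its own radical returns the character unchanged.
assert components["木"] | components["村"] == components["村"]
assert components["木"] | components["机"] == components["机"]

# Intersecting all of the listed characters together also leaves only the radical.
common = set.intersection(*components.values())
print(common)  # {'木'}
```

The checks pass only because these particular characters share no element other than 木; for pairs with further common elements, the intersection is larger than the radical.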

To determine the difference of sets, we need not only a pair (H, K) but also an existing character (H \ K)_x. Clearly, in general, no such character exists, and the operation of set difference can only be introduced in a limited number of cases. For example, consider 言 “to speak,” 計 “to calculate,” 訪 “to visit,” 訓 “instruction, kun-reading.” We can then define their difference: (計 \ 言)_x = 十_x, (訓 \ 訪)_x = 川_x. The symmetric difference of two sets is even rarer: for example, (木Δ村)_x = 寸_x; (計Δ言)_x = 十_x.
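The requirement that the result be an existing character can be modeled as a lookup from a component set back to a character. This is a minimal sketch; the `components` table and the function `to_character` are hypothetical and cover only the characters named above, so `None` means "no match in this toy table."

```python
from typing import Optional

# Illustrative decompositions (an assumption of this sketch).
components = {
    "言": {"言"},
    "十": {"十"},
    "川": {"川"},
    "方": {"方"},
    "計": {"言", "十"},
    "訪": {"言", "方"},
    "訓": {"言", "川"},
}


def to_character(parts: set[str]) -> Optional[str]:
    """Return the character whose component set equals `parts`, or None."""
    for char, char_parts in components.items():
        if char_parts == parts:
            return char
    return None


print(to_character(components["計"] - components["言"]))  # 十
print(to_character(components["訓"] - components["訪"]))  # 川
print(to_character(components["計"] ^ components["言"]))  # 十
print(to_character(components["言"] - components["計"]))  # None (empty set)
print(to_character(components["計"] ^ components["訪"]))  # None (two leftover parts)
```

The last two calls show the general case: the set operation itself always succeeds, but its result only rarely corresponds to a character.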

Apraksin Blues