Hi Pandion,

Just read the post mentioned in the title, and wanted to give some
feedback. :-)

Firstly, yes, as per your "Concerns" section, using Unicode
graphemes for emphasis plays havoc with screenreaders. Quoting the
"Fonts/Background" page i recently added to the Gentoo wiki:

Note that using Unicode to create 'bold', 'italic', etc. effects
doesn't actually involve the use of different fonts; instead, it
involves the use of particular codepoints with specific
stylistic representations within a single font. For example,
'bolding' the word 'Gentoo' by using the Unicode MATHEMATICAL
BOLD codepoints to write '𝐆𝐞𝐧𝐭𝐨𝐨' results in people using
screenreaders not hearing the word 'Gentoo' read out, but
instead "MATHEMATICAL BOLD CAPITAL G MATHEMATICAL BOLD SMALL E
MATHEMATICAL BOLD SMALL N ..."

-- https://wiki.gentoo.org/wiki/Fonts/Background

In other words, it's not "text" in the sense that screenreaders
understand it (not to mention that it uses Unicode codepoints in
ways they definitely weren't designed for). So for accessibility
reasons - and accessibility is, personally, what i think is one of
the advantages of the text-oriented nature of Geminispace - i
strongly discourage people from using Unicode to achieve stylistic
effects.
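
As a rough illustration of the point above (a sketch, not something taken from the wiki page), Python's standard `unicodedata` module can list the codepoint names behind the 'bold' word. The variable names are made up for this example, and the output only shows the Unicode names, not what any particular screenreader will actually say.

```python
import unicodedata

# The 'bold' word from the quote, written with explicit escapes so the
# codepoints are unambiguous: MATHEMATICAL BOLD G, e, n, t, o, o.
styled = "\U0001D406\U0001D41E\U0001D427\U0001D42D\U0001D428\U0001D428"
plain = "Gentoo"

# Each styled letter is its own codepoint with its own Unicode name.
names = [unicodedata.name(ch) for ch in styled]
for ch, name in zip(styled, names):
    print(f"U+{ord(ch):04X} {name}")

# The two strings are different sequences of codepoints ...
print(styled == plain)

# ... and only compare equal after compatibility normalization (NFKC),
# which maps the mathematical letters back to ordinary Latin letters.
normalized = unicodedata.normalize("NFKC", styled)
print(normalized == plain)
```

Software that doesn't normalize in this way (search, spellcheckers, some assistive tools) simply sees six mathematical symbols rather than a word.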

Secondly, i write "Unicode graphemes" rather than "UTF-8 symbols",
because UTF-8 is merely one possible encoding / bit-level
representation of a "Unicode grapheme". Again quoting the
"Fonts/Background" page:

The term character typically refers to what is better described
as a grapheme - the smallest 'unit' within a writing system
...
A glyph is a particular stylistic representation of a grapheme -
for example, the glyph for a serif version of the 'a' grapheme
can be different from a sans-serif version of that same
grapheme. A font within a typeface family provides a collection
of similarly-styled glyphs, each representing a particular
grapheme.

Unicode is a collection of graphemes, such as LATIN SMALL LETTER
A, LATIN CAPITAL LETTER B, and so on. UTF-8 is one way to
represent each of those graphemes as a sequence of bytes; another
widely-used way is UTF-16, which Windows and Java use internally
for strings. But in both cases, they're a way of referring to a
particular grapheme in Unicode. In other words, 'a' is typically
encoded as UTF-8 on my Linux laptop, and as UTF-16 on Windows
machines, but it's still the same character / grapheme / symbol in
both cases.
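
A minimal sketch of that distinction in Python (variable names are illustrative only): the same codepoint, U+0061, comes out as different bytes under each encoding, and decodes back to the same character.

```python
ch = "a"  # LATIN SMALL LETTER A

# The codepoint is the same regardless of encoding.
codepoint = ord(ch)
print(f"U+{codepoint:04X}")

# UTF-8 uses a single byte for this codepoint.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes.hex(" "))

# UTF-16 (little-endian, no byte-order mark) uses two bytes.
utf16_bytes = ch.encode("utf-16-le")
print(utf16_bytes.hex(" "))

# Decoding either byte sequence gives back the same character.
roundtrip_same = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == ch
print(roundtrip_same)
```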

Myself, i use underscores to designate emphasis in Gemtext,
_like this_, and the Gemtext-to-HTML code i wrote to generate the
HTML version of my Gemini capsule translates that to
"<em>like this</em>". i had originally used forward slashes, '/',
but i didn't like how much that clashed with Unix-style directory
paths.
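
One way such a translation could be sketched in Python is shown below. This is not the actual Gemtext-to-HTML code mentioned above; the `EMPHASIS` pattern and the `emphasise` function are hypothetical, and the rule that underscores inside words (such as `my_file_name`) are left alone is an assumption made for this example.

```python
import html
import re

# Assumed rule: an emphasis span starts with '_' that is not preceded by a
# word character, ends with '_' that is not followed by one, and has no
# whitespace directly inside either underscore.
EMPHASIS = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")


def emphasise(line: str) -> str:
    """Escape a line of text for HTML, then turn _spans_ into <em> elements."""
    escaped = html.escape(line, quote=False)
    return EMPHASIS.sub(r"<em>\1</em>", escaped)


print(emphasise("emphasis in Gemtext, _like this_, and so on"))
print(emphasise("see my_file_name.txt"))
```

Escaping before substituting means any literal '<' or '&' in the Gemtext line can't be mistaken for markup in the generated HTML.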

Alexis.

❇️ ❇️ ❇️

CC BY-NC-ND 4.0 flexibeast@gmail.com 2024
