Hi Pandion,
Just read the post mentioned in the title, and wanted to give some
feedback. :-)
Firstly, yes, as per your "Concerns" section, using Unicode
graphemes for emphasis plays havoc with screenreaders. Quoting the
"Fonts/Background" page i recently added to the Gentoo wiki:
Note that using Unicode to create 'bold', 'italic', etc. effects
doesn't actually involve the use of different fonts; instead, it
involves the use of particular codepoints with specific
stylistic representations within a single font. For example,
'bolding' the word 'Gentoo' by using the Unicode MATHEMATICAL
BOLD codepoints to write '𝐆𝐞𝐧𝐭𝐨𝐨' results in people using
screenreaders not hearing the word 'Gentoo' read out, but
instead "MATHEMATICAL BOLD CAPITAL G MATHEMATICAL BOLD SMALL E
MATHEMATICAL BOLD SMALL N ..."
-- https://wiki.gentoo.org/wiki/Fonts/Background
In other words, it's not "text" in the sense that screenreaders
understand it (not to mention using Unicode codepoints in ways
they definitely weren't designed for). So for accessibility
reasons - which, personally, i think is one of the advantages of
the text-oriented nature of Geminispace - i strongly discourage
people from using Unicode to achieve stylistic effects.
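To make the quoted point concrete, here is a minimal Python sketch that builds the 'bold' version of a word from the MATHEMATICAL BOLD block and then asks Python's `unicodedata` module what those codepoints actually are. The helper names `to_math_bold` and `describe` are made up for illustration. The offsets assume the bold block layout: U+1D400 is MATHEMATICAL BOLD CAPITAL A and U+1D41A is MATHEMATICAL BOLD SMALL A. Exactly what a given screenreader announces will vary, but the names below are the ones the quoted passage describes.

```python
import unicodedata

def to_math_bold(text: str) -> str:
    """Map ASCII letters to MATHEMATICAL BOLD codepoints; leave everything else alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            # U+1D400 is MATHEMATICAL BOLD CAPITAL A; capitals follow contiguously.
            out.append(chr(0x1D400 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            # U+1D41A is MATHEMATICAL BOLD SMALL A; small letters follow contiguously.
            out.append(chr(0x1D41A + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)

def describe(text: str) -> list[str]:
    """Return the official Unicode name of each codepoint in text."""
    return [unicodedata.name(ch) for ch in text]

bold = to_math_bold("Gentoo")
print(describe(bold))        # MATHEMATICAL BOLD CAPITAL G, MATHEMATICAL BOLD SMALL E, ...
print(bold == "Gentoo")      # the 'bold' word is not the same text as the plain word
# NFKC normalisation folds these compatibility characters back to plain letters.
print(unicodedata.normalize("NFKC", bold) == "Gentoo")
```

Note that NFKC folding only recovers the plain word if a tool chooses to apply it. Nothing guarantees that a screenreader or a search engine will do so.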
Secondly, i write "Unicode graphemes" rather than "UTF-8 symbols",
because UTF-8 is merely one possible encoding / bit-level
representation of a "Unicode grapheme". Again quoting the
"Fonts/Background" page:
The term character typically refers to what is better described
as a grapheme - the smallest 'unit' within a writing system
...
A glyph is a particular stylistic representation of a grapheme -
for example, the glyph for a serif version of the 'a' grapheme
can be different from a sans-serif version of that same
grapheme. A font within a typeface family provides a collection
of similarly-styled glyphs, each representing a particular
grapheme.
Unicode is a collection of graphemes, such as LATIN SMALL LETTER
A, LATIN CAPITAL LETTER B, and so on. UTF-8 is one way to
encode each of those graphemes as a sequence of bytes; another widely used
way is UTF-16, which is the default used by Windows and Java. But
in both cases, they're a way of referring to a particular grapheme
in Unicode. In other words, 'a' is typically encoded as UTF-8 on my
Linux laptop, and as UTF-16 on Windows machines, but it's still the same
character / grapheme / symbol in both cases.
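A small Python sketch can illustrate the distinction between a codepoint and its encodings. It encodes the same three codepoints as UTF-8 and as little-endian UTF-16, then checks that each byte sequence decodes back to the identical character. The helper name `encodings` is hypothetical. The codec name "utf-16-le" is chosen here because the plain "utf-16" codec also prepends a byte-order mark.

```python
def encodings(text: str) -> dict[str, bytes]:
    """Encode the same text as UTF-8 and as little-endian UTF-16 (no BOM)."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-le": text.encode("utf-16-le"),
    }

# LATIN SMALL LETTER A, LATIN SMALL LETTER E WITH ACUTE, MATHEMATICAL BOLD CAPITAL G
for sample in ["a", "\u00e9", "\U0001D406"]:
    enc = encodings(sample)
    # The byte sequences differ, but each decodes back to the identical codepoint.
    assert enc["utf-8"].decode("utf-8") == sample
    assert enc["utf-16-le"].decode("utf-16-le") == sample
    print(f"U+{ord(sample):04X}  utf-8: {enc['utf-8'].hex(' ')}  utf-16-le: {enc['utf-16-le'].hex(' ')}")
```

For 'a', UTF-8 produces the single byte 61 and UTF-16LE produces the two bytes 61 00. Either way, both name the same codepoint, U+0061.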
Myself, i use underscores to designate emphasis in Gemtext, _like
this_, and the Gemtext-to-HTML code i wrote to generate the HTML
version of my Gemini capsule translates that to "<em>like
this</em>". i had originally used forward slashes, '/', but i
didn't like how much that clashed with Unix-style directory paths.
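The underscore-to-`<em>` translation might be sketched as follows. This is not the actual Gemtext-to-HTML code mentioned above; `EMPHASIS` and `emphasise` are hypothetical names. The regex assumes one simple rule: an underscore opens emphasis only when it is not preceded by a word character, and closes it only when it is not followed by one, so identifiers like snake_case_names are left alone. The line is HTML-escaped first so that a literal '<' in the text cannot become markup.

```python
import html
import re

# Hypothetical rule: "_like this_" becomes "<em>like this</em>", but underscores
# embedded inside words (e.g. snake_case_names) are not treated as delimiters.
EMPHASIS = re.compile(r"(?<!\w)_(?=\S)(.+?)(?<=\S)_(?!\w)")

def emphasise(line: str) -> str:
    """Escape a line of text for HTML, then wrap _underscored_ runs in <em>."""
    escaped = html.escape(line, quote=False)
    return EMPHASIS.sub(r"<em>\1</em>", escaped)

print(emphasise("i use underscores, _like this_, for emphasis"))
print(emphasise("but not inside snake_case_names"))
```

A real converter would presumably also skip preformatted blocks and the URL part of link lines, where underscores are common and carry no stylistic meaning.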
Alexis.
❇️ ❇️ ❇️
CC BY-NC-ND 4.0 flexibeast@gmail.com 2024