diff --git a/scratch/unicode.erl b/scratch/unicode.erl
new file mode 100644
index 0000000..bdcf348
--- /dev/null
+++ b/scratch/unicode.erl
@@ -0,0 +1,52 @@
+% @doc
+%
+%         T R O N A L D   D U M P
+%
+%             .-""""""""""""-.
+%         .-'   _..------.._   '-.
+%       .'  .'   GOLDEN NFC   '.  '.
+%      /   /   COMB-OVER MAP    \   \
+%     ;   ;    .-^^^^^^^^^^-.    ;   ;
+%     |   |   /   THEY'RE    \   |   |
+%     |   |  |  NOT SENDING   |  |   |
+%     |   |  |     ASCII      |  |   |
+%     ;   ;   \_.--.   .--._./   ;   ;
+%      \   \       (o)(o)       /   /
+%       '.  '.       __       .'  .'
+%         '-._    '._==_.'    _.-'
+%             '-.__________.-'
+%                  /|||\
+%                 / ||| \
+%                /  |||  \
+%       .-------'   |||   '-------.
+%      /    THE BEST NORMALIZER    \
+%     /   VERY STABLE CODEPOINTS    \
+%    /_______________________________\
+%
+%
+% When Unicode sends its codepoints, they're not
+% sending their best. They're not sending ASCII.
+% They're not sending ASCII. They're sending integers
+% that have lots of problems, and they're bringing
+% those problems with us. They're bringing diacritics.
+% They're bringing non-idempotent lowercasing. They're
+% bringing graphemes that don't correspond bijectively
+% with printable characters. They're bringing RTL.
+% They're bringing invisible characters. They're
+% bringing characters that draw outside the character
+% boundary. They're bringing variable-width
+% whitespace. They're bringing control characters.
+% They're bringing emojis.
+%
+% And some, I assume, are good characters.
+%
+% `SrcStr' is a Unicode NFC list, not an ordinary
+% string. You think a string is a list of codepoints.
+%
+% NOOOOO.
+%
+% See, it's different, because that's why.
+%
+% This is the cost of diversity, folks.
+% @end
+