November 20, 2019

Editorial: A Brief Essay on Lexical Ambiguity by G. Irwin

The original commentary was posted in Red's Gitter channel, here, by Gregg Irwin, one of our core team members, in response to various requests for the ability to create new datatypes in Red.

As a writer, Red has always appealed to me because of its flexibility; but, of course, "the [lexicon] devil is in the details," as the idiom goes (okay, I edited that idiom a little, but it was too cool a link to pass up). It means the more specific we try to be, the more challenges and limitations we encounter, and we can lose some of the amazing versatility of the language. On the other hand, precision and refinement--the "exact right word at the exact right time," can powerfully enhance a language's utility. The dynamic tension between what he calls "generality and specificity, human friendliness and artifice," in the text below, can be an energetic ebb and flow that serves to strengthen our language, to make it more robust. 

Two quotes from community members provide some context:

> The real problem is not number of datatypes, but the lexical syntax of the new ones. -@Oldes

> ...However if something like utype! is added, nothing prevents you from (ab)using system/lexer/pre-load and reinventing whole syntax. -@Rebolek

"I don't support abusing system/lexer/pre-load, and (in the long view) there will almost certainly be special cases where a new lexical form makes sense. We can't see the future, so we can't rule it out. But, and this is key, how much value does each new one add?

I believe that each new lexical form adds less value, and there is a point of diminishing returns. This is not just a lexical problem for Red, but for humans. We have limited capacity to remember rules, and a constrained hierarchy helps enormously here. Think more like linguists, and less like programmers or mathematicians.

In language we have words and numbers. Numbers can be represented as words, with their notation being a handy shortcut for use in the domain of mathematics. And while we classify nouns, verbs, and adjectives by their use, they are all words, and don't have syntax specific to their particular part of speech. That's important because a single word may be used in more than one context, for more than one purpose.

This is interesting, as a tangent, because human language can be ambiguous, though some synthetic languages try to eliminate that (e.g. Lojban). The funny thing is that it's almost impossible to write poetry or tell jokes in Lojban. Nobody┬║ speaks Lojban. This ties to programming because, while we all know the strengths and value of strict typing, and even more extreme features and designs meant to promote correctness, dynamic languages are used more at higher levels [such as poetry, songwriting and humor, where even the sounds used in one single word can be employed to evoke specific emotive responses in the listener--the effects of devices like assonance, consonance, and loose associations we make with even single letters, in the way a repeated letter R throughout a line of poetry or literature can subtly impart a sense of momentum and intensity to the text...possibly because it evokes a growl... -Ed.]. Why is that? Humans.

When Carl designed Rebol, it had a goal, and a place in time. He had to choose just how far to go. Even what to call things like email!, which are very specific to a particular type of technology. This is what gives Redbol langs so much of their power. They were designed as a data format, meant for exchanging information. That's the core. What are the "things" we need to exchange information about with other humans, not just other programmers?

Do I want new types? I'm pushing for at least one: Ref! with an @some-name-here syntax. It's not username! or filename+line-number!, or specific in any way. It's very general, as lexical types should be; their use and meaning being context-specific (the R in Redbol, which stands for "relative"). I also think ~ could be a leading numeric sigil to denote approximation. It came mainly from wanting a syntax for floats, to make it clear that they are imprecise; but it's tricky, because it could also be much richer, and has to take variables into account. ~.1 is easy, but what about x = ~n+/-5%? Units are also high value, but they are just a combination of words and numbers. (Still maybe worth a lexical form.)

When we look at what Red should support, and the best way to let users fulfill application and purpose-specific needs, we can learn from the past, and also see that there is no single right answer. Structs, Maps, Objects, data structures and functions versus OOP, strict vs dynamic.

As Forth was all about "Build a vocabulary and write your program in that," think about what constitutes a vocabulary; a lexicon. It's a balance, in Red, between generality and specificity, human friendliness and artifice. So when we ask for things, myself and Nenad included, we should first try to answer our need with what is in Red today, and see where our proposed solution falls on the line of diminishing returns. To this end, we can and should abuse system/lexer/pre-load for experimentation."
Fork me on GitHub