November 20, 2019

Editorial: A Brief Essay on Lexical Ambiguity by G. Irwin

The original commentary was posted in Red's Gitter channel, here, by Gregg Irwin, one of our core team members, in response to various requests for the ability to create new datatypes in Red.

As a writer, Red has always appealed to me because of its flexibility; but, of course, "the [lexicon] devil is in the details," as the idiom goes (okay, I edited that idiom a little, but it was too cool a link to pass up). It means the more specific we try to be, the more challenges and limitations we encounter, and we can lose some of the amazing versatility of the language. On the other hand, precision and refinement--the "exact right word at the exact right time," can powerfully enhance a language's utility. The dynamic tension between what he calls "generality and specificity, human friendliness and artifice," in the text below, can be an energetic ebb and flow that serves to strengthen our language, to make it more robust. 

Two quotes from community members provide some context:
_____________________________

> The real problem is not number of datatypes, but the lexical syntax of the new ones. -@Oldes

> ...However if something like utype! is added, nothing prevents you from (ab)using system/lexer/pre-load and reinventing whole syntax. -@Rebolek


"I don't support abusing system/lexer/pre-load, and (in the long view) there will almost certainly be special cases where a new lexical form makes sense. We can't see the future, so we can't rule it out. But, and this is key, how much value does each new one add?

I believe that each new lexical form adds less value, and there is a point of diminishing returns. This is not just a lexical problem for Red, but for humans. We have limited capacity to remember rules, and a constrained hierarchy helps enormously here. Think more like linguists, and less like programmers or mathematicians.

In language we have words and numbers. Numbers can be represented as words, with their notation being a handy shortcut for use in the domain of mathematics. And while we classify nouns, verbs, and adjectives by their use, they are all words, and don't have syntax specific to their particular part of speech. That's important because a single word may be used in more than one context, for more than one purpose.

This is interesting, as a tangent, because human language can be ambiguous, though some synthetic languages try to eliminate that (e.g. Lojban). The funny thing is that it's almost impossible to write poetry or tell jokes in Lojban. Nobodyº speaks Lojban. This ties to programming because, while we all know the strengths and value of strict typing, and even more extreme features and designs meant to promote correctness, dynamic languages are used more at higher levels [such as poetry, songwriting and humor, where even the sounds used in one single word can be employed to evoke specific emotive responses in the listener--the effects of devices like assonance, consonance, and loose associations we make with even single letters, in the way a repeated letter R throughout a line of poetry or literature can subtly impart a sense of momentum and intensity to the text...possibly because it evokes a growl... -Ed.]. Why is that? Humans.

When Carl designed Rebol, it had a goal, and a place in time. He had to choose just how far to go. Even what to call things like email!, which are very specific to a particular type of technology. This is what gives Redbol langs so much of their power. They were designed as a data format, meant for exchanging information. That's the core. What are the "things" we need to exchange information about with other humans, not just other programmers?

Do I want new types? I'm pushing for at least one: Ref! with an @some-name-here syntax. It's not username! or filename+line-number!, or specific in any way. It's very general, as lexical types should be; their use and meaning being context-specific (the R in Redbol, which stands for "relative"). I also think ~ could be a leading numeric sigil to denote approximation. It came mainly from wanting a syntax for floats, to make it clear that they are imprecise; but it's tricky, because it could also be much richer, and has to take variables into account. ~.1 is easy, but what about x = ~n+/-5%? Units are also high value, but they are just a combination of words and numbers. (Still maybe worth a lexical form.)

When we look at what Red should support, and the best way to let users fulfill application and purpose-specific needs, we can learn from the past, and also see that there is no single right answer. Structs, Maps, Objects, data structures and functions versus OOP, strict vs dynamic.

As Forth was all about "Build a vocabulary and write your program in that," think about what constitutes a vocabulary; a lexicon. It's a balance, in Red, between generality and specificity, human friendliness and artifice. So when we ask for things, myself and Nenad included, we should first try to answer our need with what is in Red today, and see where our proposed solution falls on the line of diminishing returns. To this end, we can and should abuse system/lexer/pre-load for experimentation."

3 comments:

  1. I get the feeling that Greg thinks that lots of people would use the same set of features (types). What if the "Core Red" would contain most use-full types. However you could download (a module?) and use a new type easily. It's not like everyone needs every type that there is in the language.
    https://github.com/red/red/wiki/%5BPROP%5D-User-defined-types-(UDT)-and-dependent-types

    ReplyDelete
    Replies
    1. It's easy enough to create modules that have their own user level types (think `object!`), but there are two key considerations we can start with, when it comes to defining new datatypes:

      1) Runtime support. As soon as you build support in for a custom datatype, your runtime is incompatible with everyone else's. If you're only building standalone apps, and never use a shared runtime, it's all on you, and there may be value there.

      2) If you add a lexical form for a new datatype, now your Red *data* is incompatible with everyone else's, likely including tooling. Plus, you're then modifying Red's lexer. For that to be a good use of your time, you need to get a LOT of value out of those new types, and being incompatible with the world.

      Delete
    2. 2) Does it really have to be incompatible? I mean, sure, some "mutation testing engine" would know about integers but not about "my type". However the Red nor a "mutation testing engine" shouldn't break (a syntax error). It should be easy enough to extend a "mutation testing engine".

      And person doesn't need to get a LOT of value - it depends HOW easily is to do it. Building simple program based on OOP is easy but it doesn't mean non-OOP way is less good.

      Delete

Fork me on GitHub