March 20, 2020

GTK, fast lexer, money, deep testing, and our first commercial product

It's been a busy start to the year for the team. Work has continued on many fronts, and we have some new team members helping us keep the momentum going. We announced in January that GTK and the fast lexer were close, and they are even closer now. The hard part about making announcements is that some of this work is unpredictable and changes in scope after we announce it. Or the world steps in and a pandemic throws a wrench into your plans.

@bitbegin has done an enormous amount of work on the GTK branch. You can check out the GTK branch to see some of what goes into supporting a new GUI system. The more features we include in Red, the more that have to be ported and maintained. Unfortunately, most operating systems and UI systems have large, complicated sets of APIs and interactions.

Because GUI systems are so complex, and Red not only has to handle them but also adds its own reactive framework on top, there are more places for bugs to hide. And users are involved, which is the worst part: they do nothing but cause problems. ;-) To that end, in addition to his normal deep diving and bug hunting, @hiiamboris has been working on an automated view test system, which is no small feat. @9214 has joined him on the hunt, and we will squash a good number of bugs for our next release.

The fast lexer was in near-final testing when we decided it was worth delaying its merge in order to incorporate some new lexical forms we had planned to include. Then we looked at some old tickets related to the modulo and division operators, and a couple of lexing questions came up related to tag!. Suddenly the fast lexer work was back in code mode. New lexical forms usually mean new datatypes, and that's the case here.

New Datatypes

One we've expected for some time, and thanks to @9214 it's now a reality: money! is coming. There is a branch for it, but no need to comment on it at this time. There are a few features, like round, still to be completed, but the bulk of the work is done. @BeardPower did some great experimental work that we thought might be used for money, based on Douglas Crockford's Dec64 design, but until Red is fully 64-bit it could only be Dec32, and that limited range was judged not to be enough. That work won't go to waste though; it's just waiting for its moment in the sun. The current version of money! is a BCD implementation, but that shouldn't concern anyone outside the core team. What you care about is that it can be used for accurate financial calculations that don't suffer from floating point errors. It will also support an optional currency identifier, e.g. USD or EUR, and automatic group separators when formed.
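
Since the branch isn't merged yet, here's only a rough sketch of how money! values might look in practice, based on the description above; the exact literal syntax and formed output may differ:

    price: $19.99
    total: price + $0.01
    print total              ; $20.00 (exact, no floating point drift)

    fee: USD$1234.56         ; optional currency identifier
    print fee                ; formed with group separators, e.g. USD$1'234.56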

Another new type is called ref! and is still being designed. The basic concept is simple: @reference is a form most people are familiar with today, just as hashtags are known (though we use the historical name issue!, because the new name hadn't become part of our global lexicon back then). Issues, in Rebol 2, were a string type, but Red made them a word type instead. That has benefits (mainly efficiency), but also costs (symbol table space and lexical limitations). For instance, in R2 you could say #abc:123, but not so in Red. Life is compromise. Ref! will be a string type, making it quite flexible. While we most often think of them as referring to a person, they can refer to, for example, a location in a file. You can do that with strings too, of course, but that's the beauty of rich datatypes. By using a ref!, you can build rich dialects and make your intent clear in the data itself.
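
As a purely illustrative sketch of how a ref! might read inside a dialect (the tracker entry, ticket number, and file below are invented for the example, and the type is still being designed):

    entry: [
        #4321                 ; ticket id (issue!)
        @hiiamboris           ; assignee, as a ref!
        %tests/view-auto.red  ; file involved
        "Regression in the layout engine"
    ]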

Finally, Red is going to add a new raw-string! form. It's a combination of the raw string literals some languages support and heredocs. The goal is to make it easier to include content that would otherwise require escape sequences, which sometimes clutter inline data and lead to errors. A lot of time and effort went into the (sometimes heated) discussion around the need, use cases, and syntax. Right now this is just a new literal form, rather than a whole new datatype. They will still be strings when loaded, until we see how they are used by others and whether they deserve to be a separate type.
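
For illustration only, assuming the %{...}% form that came out of that discussion (the final lexical form may differ), a raw string lets backslashes, quotes, and caret sequences pass through untouched:

    win-path: %{C:\Users\red\Documents\file.txt}%    ; no escaping of backslashes needed
    snippet:  %{print "hello ^(line) world"}%        ; quotes and ^(line) stay literal here
    print win-path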

We don't add new datatypes lightly, and design choices have to keep the big picture in mind as well. Balancing the value of new types against their added complexity in the language is hard work, but satisfying if it makes everyone's life better. To that end, when these new types and features become available, we need your input on how they work, where you use them, and what gaps need to be filled.

Both the fast lexer and money datatype leverage some new features in Red/System, which we'll talk about in a future post.

All Aboard!

We're also very excited to announce the imminent release of our first commercial product. We alluded to it at the beginning of the year, and it's almost ready to leave the station. There will be plenty of train-related puns, because it's a Railroad syntax diagram generator. Here's what it looks like:
More details will come soon, but here's a short list of some of its features:

  • Live coded diagramming. You write your grammar and the diagram appears in real time.
  • Red parse support (of course), but also support for foreign grammars like ABNF, McKeeman, and YACC. That's right, you can write using those grammars and get the same diagram. 
  • When writing parse rules, you can use block level parsing, rather than character level.
  • Test inputs. Not only can you test a single input, but also entire directories of test files.
  • Test specific rules. Not only can you test from your top-level rule, but any rule in your grammar. You can also find where a particular rule matches part of your input.
  • Custom diagram styles, including options for a cleaner, more abstract view, different charset rendering options, and whether actions (parens) are displayed.
  • The ability to generate inputs that your grammar will recognize. This opens up many other use cases, including educational ones. Here's one of my favorites, which @toomasv created:

Look at the sentences in the input area. Those were created by clicking the generate button. It's like parsing in reverse. So if you're not sure what inputs your grammar might recognize, or want to see examples, this lets you view things in a whole new way.
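
To give a feel for the kind of grammar you might feed the tool, here's a small, purely illustrative Red parse grammar (not taken from the product) that recognizes comma-separated key=value pairs:

    digit:  charset "0123456789"
    letter: charset [#"a" - #"z" #"A" - #"Z"]
    key:    [letter any [letter | digit]]
    value:  [some digit]
    pair:   [key "=" value]
    line:   [pair any ["," pair]]

    parse "a=1,b=22" line    ; == true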

Stay Tuned

It's an exciting time to be a Reducer, and we're rolling with changes just like everyone else right now. The best way to stay healthy during this pandemic is to stay away from other people and take the Red pill.

Until next time.

January 1, 2020

Happy New Year!

Hello and happy new year, friends of Red! We have some exciting projects we’ve been working on that will be available this year, including a new product. Let’s talk a little about what the team has been working on behind the scenes.

(TL;DR: A cool new product with Red in 2020...plus, a robust preliminary draft of Parse documentation can now be previewed...CLI library...fast-lexer to merge soon...GTK on the horizon...and a new native OS calendar widget!)


December 4, 2019

November 2019 in Review

Welcome to December, friends of Red. It's an intense and focused time of year as we wind down 2019, and the core team is making important moves (sometimes literally!) to set us up for an ambitious 2020. But first, here are just a few things that happened in November.
  • First and foremost, it's always great when community members help compile resources for use by others, and we'd like to acknowledge @rebolek for his excellent compendium of historic automatic builds: https://rebolek.com/builds/ (they weren't available for a while, but now they're back). They can be useful if you're in need of a previous version for a specific project. Of course, you can always go here for Red's daily automated builds. But since our goal is to be a self-sustaining, self-selecting group of doers, this spirit of providing collective resources is perfectly aligned with the Red we always want to be.
  • From the community to outreach: thanks to community member @loziniak, Red is now on StackShare, so be sure to follow us there and chime in. As a repo with 4.1k GitHub stars (and infinite possibilities), Red has a lot to offer the wider community of developers and engineers, and StackShare is a great place to help compare and contrast us with other languages.
  • Now, a challenge! In the coming new year we'll be needing beta testers willing to lend their expertise in refining a new product built with Red. You read that right! If you think you'd like to be one of the contributors to spearhead a move into our next phase, we want YOU! Drop a line to @greggirwin to get in on the ground floor.
  • An appreciation to @hiiamboris for his deeply thought-out proposal regarding "series evolution," a framework for standardizing and testing the functions we use in Red for manipulating series. Design is hard, and we have a number of initiatives in the works taking a lot of brain power right now.
  • Over 60 commits were made to Red's GTK branch in November, making it almost ready for "prime time." The branch is the product of a lot of work by a fair handful of the core team, and heavy lifter @bitbegin says that merging to the master branch is possibly the next step.
  • And, to close, a bit of a tease: Watch this space for some snazzy website changes coming soon.
All the warmest wishes for the upcoming holidays, from the Red family to yours. <3

-Lucinda.

November 20, 2019

Editorial: A Brief Essay on Lexical Ambiguity by G. Irwin

The original commentary was posted in Red's Gitter channel, here, by Gregg Irwin, one of our core team members, in response to various requests for the ability to create new datatypes in Red.

As a writer, I have always been drawn to Red because of its flexibility; but, of course, "the [lexicon] devil is in the details," as the idiom goes (okay, I edited that idiom a little, but it was too cool a link to pass up). It means the more specific we try to be, the more challenges and limitations we encounter, and we can lose some of the amazing versatility of the language. On the other hand, precision and refinement--the "exact right word at the exact right time"--can powerfully enhance a language's utility. The dynamic tension between what he calls "generality and specificity, human friendliness and artifice" in the text below can be an energetic ebb and flow that serves to strengthen our language, to make it more robust.

Two quotes from community members provide some context:
_____________________________

> The real problem is not number of datatypes, but the lexical syntax of the new ones. -@Oldes

> ...However if something like utype! is added, nothing prevents you from (ab)using system/lexer/pre-load and reinventing whole syntax. -@Rebolek


"I don't support abusing system/lexer/pre-load, and (in the long view) there will almost certainly be special cases where a new lexical form makes sense. We can't see the future, so we can't rule it out. But, and this is key, how much value does each new one add?

I believe that each new lexical form adds less value, and there is a point of diminishing returns. This is not just a lexical problem for Red, but for humans. We have limited capacity to remember rules, and a constrained hierarchy helps enormously here. Think more like linguists, and less like programmers or mathematicians.

In language we have words and numbers. Numbers can be represented as words, with their notation being a handy shortcut for use in the domain of mathematics. And while we classify nouns, verbs, and adjectives by their use, they are all words, and don't have syntax specific to their particular part of speech. That's important because a single word may be used in more than one context, for more than one purpose.

This is interesting, as a tangent, because human language can be ambiguous, though some synthetic languages try to eliminate that (e.g. Lojban). The funny thing is that it's almost impossible to write poetry or tell jokes in Lojban. Nobody speaks Lojban. This ties to programming because, while we all know the strengths and value of strict typing, and even more extreme features and designs meant to promote correctness, dynamic languages are used more at higher levels [such as poetry, songwriting and humor, where even the sounds used in one single word can be employed to evoke specific emotive responses in the listener--the effects of devices like assonance, consonance, and loose associations we make with even single letters, in the way a repeated letter R throughout a line of poetry or literature can subtly impart a sense of momentum and intensity to the text...possibly because it evokes a growl... -Ed.]. Why is that? Humans.

When Carl designed Rebol, it had a goal, and a place in time. He had to choose just how far to go. Even what to call things like email!, which are very specific to a particular type of technology. This is what gives Redbol langs so much of their power. They were designed as a data format, meant for exchanging information. That's the core. What are the "things" we need to exchange information about with other humans, not just other programmers?

Do I want new types? I'm pushing for at least one: Ref! with an @some-name-here syntax. It's not username! or filename+line-number!, or specific in any way. It's very general, as lexical types should be; their use and meaning being context-specific (the R in Redbol, which stands for "relative"). I also think ~ could be a leading numeric sigil to denote approximation. It came mainly from wanting a syntax for floats, to make it clear that they are imprecise; but it's tricky, because it could also be much richer, and has to take variables into account. ~.1 is easy, but what about x = ~n+/-5%? Units are also high value, but they are just a combination of words and numbers. (Still maybe worth a lexical form.)

When we look at what Red should support, and the best way to let users fulfill application and purpose-specific needs, we can learn from the past, and also see that there is no single right answer. Structs, Maps, Objects, data structures and functions versus OOP, strict vs dynamic.

As Forth was all about "Build a vocabulary and write your program in that," think about what constitutes a vocabulary; a lexicon. It's a balance, in Red, between generality and specificity, human friendliness and artifice. So when we ask for things, myself and Nenad included, we should first try to answer our need with what is in Red today, and see where our proposed solution falls on the line of diminishing returns. To this end, we can and should abuse system/lexer/pre-load for experimentation."

October 30, 2019

A Deeper Dive Into the Fast-Lexer Changes

What made the fast-lexer branch a priority?


Several things. It started when @dockimbel looked into ticket #3606, which was impossible to fix with the lexer as it stood, and we didn't want to give up on the auto-syncing between the /text and /data facets. So he had to consider bigger options, including how to make the lexer instrumentable. It was not easy, because the current lexer is not re-entrant, so having the lexer emit events to a callback function could have caused serious problems.

Digging through all of Red's repos showed that the current lexer code was duplicated in two places beyond the basic lexing needed by load: once in the console code and once in the VSCode plugin, each time for syntax coloring purposes, and each copy lagging behind the original implementation. Not good.

@Dockimbel then considered changing the current lexer to make it instrumentable, but the changes were significant and would have made the parse rules much more complex. At the same time, @qtxie did some benchmarking, and the result showed Red's lexer was ~240 times slower than Rebol's. This is not due to parse, but rather because the high-level rules were optimized for readability, not performance.

The lexer's (lack of) performance also caused delays in the VSCode plugin. The high-level code has served Red well, and was a showcase for parse, but community members are also loading ever-larger data sets, and data sizes will just keep growing. With some projects we have on the horizon, the lexer's performance became a higher priority.
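
If you want a rough feel for load performance on your own data, here's a minimal timing sketch (the %words.txt input file is hypothetical; substitute any data file you like):

    time-it: func [code [block!] /local start][
        start: now/time/precise
        do code
        now/time/precise - start
    ]

    data: read %words.txt
    print ["load took:" time-it [load data]]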

As planned since the beginning (the lexer used to be R/S-only during the pre-Unicode era), @dockimbel decided the best option was to not postpone the conversion of the lexer to pure R/S code any longer, by porting R3's C-based lexer to R/S. After studying Rebol's lexer in detail, he realized that the code was quite complex in some places (mostly the prescanner), and would lead to less than optimal R/S code that would be hard to maintain.

Evaluating the state of the art in fast parsers for programming languages, he found inspiration in some unpublished papers. He then started prototyping the next lexer in R/S, and realized that it could be several times faster than Rebol's, with the additional benefit of much smaller and simpler code. Then he embarked on the full implementation. Knowing he and @qtxie would not have the opportunity to work on that for probably a year with all the big tasks ahead on the roadmap, he committed to it full time.

Red's new R/S lexer is half the size of Rebol's, far simpler, with more maintainable code, and it performs at similar speeds (sometimes a bit faster, sometimes a bit slower). That is a fantastic result, because it means that with an optimizing backend (Red/Pro), our lexer will be 4-8 times faster than R3's. It should then be possible to load gigabytes of Red data in memory in just a few seconds (using the future 64-bit version). 😉

An additional benefit was brought by @qtxie, who added a hashtable for symbol lookup in Red contexts. That sped up word loading tremendously, and should have a measurable improvement on Red's start up time; especially on slow platforms like Raspberry Pi.

@Dockimbel is almost done with the lexer itself, with just date! and time! left to add, and it should be possible to replace the old one with the new one after thorough testing and debugging. Then we'll add the hooks for a user-provided callback, allowing us to instrument the lexer in ways Redbolers could only dream about until now. One application of that will be the ability to implement "predictive loading," which will tell you the type and size of a Red value in a string, without loading it, and at extremely high speed (~370MB/s currently, 1-2GB/s with /Pro). Such a feature will allow us to finally address the #3606 issue with a very clean and efficient solution, while keeping the facets' auto-syncing feature.

October 25, 2019

October 2019 In Review

Over the last few weeks the Red Lang core team drilled down to make some truly great progress on Red's fast-lexer branch--while we also gained valuable support from the contributions of Red doers and makers as they consolidate a world of useful information and resources.


Fast-Lexer Benchmarks


In the fast-lexer branch of Red, you can see lots of new work from Red creator @dockimbel (Nenad Rakocevic) and core teammate @qtxie. Among other fixes and optimizations, they substituted a hashtable for what had previously been a large array in context!

The numbers so far: loading 100'000 words (5 to 15 characters, 1MB file): Red (master): 19000ms. Red (fast-lexer): 150ms. Nenad's observations on further testing:
"FYI, we just [ran] some simple benchmarks on the new low-level lexer for Red using 1M 10-digit integers. The new lexer completes the loading about 100 times faster than the current high-level one. Loading 1M 10-digit integers in one block: Red: 175ms; R2: 136ms; R3: 113ms. 
"We use a faster method than Rebol, relying on several lookup tables and a big FSM with pre-calculated transition table (while Rebol relies on a lot of code for scanning, with many branches, so bad for modern CPU with branch predictions). With an optimizing backend, Red's LOAD should in theory run 2-3 times faster than Rebol's one. (Though, we still need to optimize the symbol table loading in order to reach peak performance).  Given that Rebol relies on optimized C code while Red relies on sub-optimal code from R/S compiler, that speaks volume about the efficiency of our own approach. So, Red/Pro should give us a much faster LOAD.
"The lexer is not finished yet, but the hard part is done. We still need to figure out an efficient way to load keywords, like escaped character names (`^(line), ^(page), ...) and month nouns in dates."
This is a huge accomplishment, and it's shaping up to make future goals even more impressive. The fast-lexer branch is a work in progress, but stay tuned: Nenad has more to say about why it's been prioritized just now, which we will have in an upcoming post.


Red's MVPs Contribute New Resource Material & Tools


If you're new to Red, sometimes the flexibility of the language can leave you uncertain about which aggregate structure to use. In red/red's wiki on GitHub, @9214 contributes a useful guide for those seeking to tease apart the differences. For example, map! works better with data that can be atomized, or framed as a conventional associative array, while hash! lends itself to data that will be queried at a high volume and which will require fewer updates. Learn further linguistic nuances, including object! and block!, as well as a useful comparison table of their algorithmic complexity, here. @Rebolek, meanwhile, has furnished us with loads of useful information, diving deeper into code statistics. His value datatype distribution, here, his unofficial Red build archive here, and his rebolek/red-tools repo containing various tools--line parsers, codecs, APIs and documentation among them--are greatly appreciated. The tools repo has a number of new features you can check out here.
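
A minimal illustration of the difference, using invented data:

    ; map! - associative lookup by key
    prices: #(apple: 1.20 pear: 0.90)
    print prices/apple               ; 1.2

    ; hash! - a series optimized for fast, repeated searches
    inventory: make hash! [apple 10 pear 5 plum 7]
    print select inventory 'plum     ; 7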


About Those Ports...


Wondering about port!? Here's the latest. We've got port! in the master branch already, but low-level input/output networking abilities aren't complete yet, so we need to focus on this, and your feedback can always help. "We have a working async TCP and TLS ports implementation (both client and server-side)," explains Nenad, "but they still require more work to cover all the target platforms." Here, he goes on to explain the prerequisites for our team to complete this process; your thoughts and code contributions are welcomed.


Games and Experiments


It's a fun one to end this update on: Red community member @GalenIvanov's "Island Alleys," a game of unspooling Hamiltonian paths! A Hamiltonian path visits each vertex of a graph exactly once (close the loop back on itself and you have a Hamiltonian cycle), a structure that can also lend itself to neural network-related interpretations. And @planetsizedcpu offers a wintry little spin on this repo. Enjoy, and thanks to all!
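
For the curious, here's a tiny sketch in Red of checking whether a vertex sequence is a Hamiltonian path in a small graph. The graph and helper below are invented for the example, not taken from the game:

    ; a small undirected graph as adjacency lists
    graph: [
        a [b c]
        b [a d]
        c [a d]
        d [b c]
    ]

    hamiltonian-path?: func [g [block!] path [block!] /local prev][
        if (length? path) <> ((length? g) / 2) [return false]    ; must visit every vertex
        if path <> unique path [return false]                    ; ...and each one only once
        prev: none
        foreach v path [
            ; consecutive vertices must be joined by an edge
            if all [prev not find select g prev v] [return false]
            prev: v
        ]
        true
    ]

    probe hamiltonian-path? graph [a b d c]    ; true
    probe hamiltonian-path? graph [a b c d]    ; false (no edge between b and c)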

September 15, 2019

The Latest: Red could help AI be more precise; community stars; one CSV codec to rule them all?

Hello to all the great makers, doers and creative people who are using Red, helping the Red Language grow and improve! As always, there's a standing invitation for you to join us on Gitter, Telegram or Github (if you haven't already) to ask questions and tell us about your Red-powered projects.

Here are some recent highlights we’d like to share with you:

1. Tickets Get Priority

In the last month, our core team has closed a large number of tickets. We'd like to thank community members rgchris, giesse, and dumblob, who are just a few of the passionate contributors putting Red through its paces and providing feedback as fixes and changes occur. @WArP ran the numbers for us, showing a cyclical growth pattern linking bursts of closed issues and some serious Red progress, and September's not even done yet:


2. CSV Codec Available

Our newly updated CSV codec has been merged in the master branch and is now a part of the nightly (or automatic) build here. It is in an experimental phase, and we want your feedback.

Should the standard codec only support block results, so it’s as simple as possible? Or do people want and need record and column formats as well (using the load-csv/to-csv helper funcs, rather than load/as)? Including those features as standard means they’re always available, rather than moving them to an extended CSV module; but the downside is added size to the base Red binary.
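
As a quick, hedged sketch of the block form (the exact result shape, header handling, and whether the helper funcs ship by default depend on the codec's final form):

    csv-text: "name,qty^/apple,10^/pear,5"

    rows: load/as csv-text 'csv    ; e.g. [["name" "qty"] ["apple" "10"] ["pear" "5"]]
    print to-csv rows              ; back to CSV text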

Applause goes to @rebolek for his excellent organization and his wiki on the codec, which explains the various ways in which Red can represent data matrices. He writes, “Choosing the right format depends on a lot of circumstances, for example, memory usage - column store is more efficient if you have more rows than columns. The bigger the difference, the more efficient.”

You can judge their efficiency here, where @rebolek has laid out the compile time, size and speed of each version, including encapping and lite. Be sure to get the latest build, and chat with everyone on Gitter to tell us what you think.

3. Red has reached 4K stars on GitHub!

We’re truly grateful for all the interest and support, and we are proud of the way our growth has been powered by this community.

4. AI + Red Lang Stack: Precision Tuning With Local OR Web-Based Datasets

In conversation with @ameridroid:
“Presently, it seems like most AI systems available today either allow building an AI from scratch using low level code (difficult and time-consuming), *OR* using a pre-built AI system that doesn't allow any fine-tuning or low-level programming...with the advent of NPUs (Neural Processing Units) akin to CPUs and GPUs, an AI toolkit would allow specifying what type of AI we want to perform (facial, image or speech recognition, generic neural net for specialized AI functions, etc.), the training data (images, audio, etc.) and then allow us to send it the input data stream and receive an output data stream…[using Red] would also allow us to integrate with the AI system at a low level if we have specific needs not addressed by the higher-level funcs. Red dialects would be a good way to define the AI functionality that's desired (a lot like VID does for graphics), but also allow the AI components, like the learning dataset or output data stream sanitization routines, to be fine-tuned via functions. Red can already work on web-based data using 'read or 'load, or work on local data in the same way; the learning data for a particular AI task could be located on the web or on the local machine. That's not easily possible with a lot of the AI solutions available today.”

Check back in the next few days for an update from @dockimbel!

Ideas, contributions, feedback? Leave a comment here, or c’mon over and join our conversation on Telegram, Gitter, or GitHub.