Red Programming Language
July 29, 2022
New Red binaries
- Red GUI : Red interpreter + View + GUI console
- Red CLI : Red interpreter + CLI console
- Red Toolchain : Encapper for Red + Red/System compiler
July 14, 2022
The Road To 1.0
You cannot have missed that in recent months (and even recent years), our overall progress has slowed down drastically. One of the main reasons is that we have spread our limited resources chasing different objectives while making little progress on the core language. That is not satisfying at all and would most likely bring us to a dead end as we exhaust our funding. We have spent the last weeks discussing how to change that. This is our updated action plan.
From now on, our only focus will be to finish the core language and bring it to the much-awaited version 1.0. We need to reach that point in order to kickstart a broader adoption and provide us and our users a stable and robust foundation upon which we can build commercial products and services necessary for sustainability.
Given the complexities involved in completing the language and bringing an implementation that can run on modern 64-bit platforms, we have devised a two-stage plan.
Upgrade the current 32-bit Red implementation
👉 Language specification
It is now time to write a complete language specification, in order to clean up some semantic rules and address all possible edge cases, which will help fulfill our goals of implementation robustness and stability. The process of writing down the complete language specs will result in dropping some existing features that turn out to be problematic or inconsistent. On the other hand, we might add some new features that will need to be implemented for 1.0.
👉 Modules
We need a proper module system in order for Red to scale. We also need a proper package management system, tied to a central repo where we can gather third-party libraries. That would also enable modular/incremental compilation (or encapping), which will most probably be supported in the self-hosted toolchain.
👉 Concurrency
We need a proper model for concurrent execution in order to leverage multicore architectures. We will define one and make a prototype implementation in the 32-bit version.
👉 Toolchain
Before starting work on the new toolchain, we will make some changes to the existing version in order to prepare for the transition. The biggest change is the dropping of the Red compiler; the toolchain will then only act as a (smart) encapper. Routines and #system directives will still be supported, but probably with some restrictions. The Red preprocessor might also see some changes. This change means that Red will have only one execution model instead of the current two. The Red compiler has become more of a burden than a help: the speed gains are not that significant in real code (even if they can be in some micro-benchmarks), and the impossibility for the compiler to support the exact same semantics as the interpreter is a bigger problem. This move will not only bring more stability by eliminating some edge-case issues, but also reduce the toolchain by almost 25% in size, which will help reduce the number of features to support in the new toolchain.
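For readers who have not used those two features, here is a minimal sketch of what they look like in a compiled Red program (illustrative only; the exact restrictions that will apply under the encapper model are not specified in this post):

    Red []

    ;-- A routine: its body is Red/System code, compiled into the program
    ;-- and callable like a regular Red function.
    add-fast: routine [a [integer!] b [integer!] return: [integer!]][
        a + b
    ]

    ;-- A #system directive injects raw Red/System code into the program.
    #system [print-line "hello from Red/System"]

    print add-fast 1 2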
👉 Runtime library
Some improvements are long overdue in the Red runtime library. Among them:
- unified Red evaluation stack.
- unified node! management.
- improved processing of path calls with refinements.
- improved object! semantics.
All those changes are meant to simplify and reduce the runtime library code, and to address some systemic issues (e.g. stack management issues and GC node leaks).
👉 Documentation
We need proper, exhaustive, user-oriented documentation for the Red core language. This is one of the mandatory tasks that needs to be completed and done well for wider adoption.
Self-hosted Red for 64-bit version
👉 Toolchain
In order to go 64-bit, we have to drop our current toolchain code, which is based on Rebol2, entirely, and rewrite it with a newer architecture in Red itself. The current toolchain code was disposable anyway; it was never meant to live this long, so this is a move we had to make for 1.0 in any case.
So the new toolchain will feature:
- a new compilation pipeline with a plugin model.
- an IR layer.
- one or more optimizing layers.
- modular/incremental compilation support.
- x64, AArch64 and WASM backends.
- linker support for 64-bit executable file formats for the big three OSes.
- support for linking third-party static libraries.
👉 Runtime library
Roadmap
Here are the main milestones:
- v0.7 : Full I/O with async support.
- v1.0b : (beta) completed self-hosted Red with 64-bit support.
- v1.0r : (release) first official stable and complete Red/Core language release.
- v1.1 : View 64-bit release.
- v1.2 : Android backend and toolchain release.
- v1.3 : Red/C3 release.
- v1.4 : Web backend for View release.
- v2.0 : Red JIT-compiler release.
- v3.0 : Red/...
Version 0.7 should be the last release for the 32-bit Red and the current toolchain, and we will be working on it first.
For reaching the 1.0-beta milestone, we target 12 months of intensive work, so that will bring us to Q3 2023. That's an ambitious goal but necessary to reach for the sake of Red's future.
The currently planned beta period for 1.0 is 2-3 months. We want a polished, rock-solid, production-ready 1.0 release.
For the 1.1, we will probably make some (needed) improvements to the View engine architecture and backends.
For Red/C3, as the Ethereum network is transitioning to 2.0 and a new EVM, we need the WASM backend in order to support it.
Version 1.4 will bring a proper web runtime environment to the WASM backend, including GUI support.
The 2.0 will be focused on bringing a proper JIT compiler to the Red runtime, which should radically improve execution of critical code paths without having to drop to R/S.
Version 3.0 is already planned, but I will announce that once 1.0 is released. ;-)
One major platform is missing from the above plan: iOS. Given how closed that platform is, we will need to come up with a specific plan for how to support it, as we won't be able to cross-compile for it (you need a Mac computer), nor probably generate iOS apps without relying on Xcode at some point (not even mentioning the dynamic code restrictions on the App Store), which are the kinds of layers of complexity that Red is trying to fight against in the first place... So for now, that platform is not among our priorities.
To finish, let me borrow some words from someone who succeeded more than anyone else in our industry:
Expect me to say "no" even more so from now on, as we get laser-focused on our primary goal.
Cheers and let's go!
December 31, 2021
2021 Winding Down
Another quarter, another blog post. Seems almost rushed after the previous drought.
To set the stage, I'll start with a bit of a rant about complexity. If you just want the meat of what's happening in the Red world, feel free to skip the introduction.
Complexity Considerations: Part 1
I liked what the InfoWorld article Complexity is Killing Software Developers said about difficult domains (voice and image recognition, etc.) being available as APIs, something we all know by now. This lets us tackle things we couldn't in some cases. Though I imagine @dockimbel or others also used Dragon Dictate's libraries back in the 90s. What we have now is massive data to train systems like that. Those work well, allowing us to add features we otherwise couldn't with a small team.
The problem I see is that the trend has become for everything to be outsourced, including simple features like logging, and the number of those libraries has exploded. There must be graphs available to show the change. Libraries for moderately complex domains, UIs for example, have risen in number and lead to what @hiiamboris calls Brownian Movement: a random collection of things, not designed to work together, without a coherent vision. A quote from the above article says it this way:
"Complexity is less the issue than inconsistency in an environment."
It used to be that you could take a FORTRAN, COBOL, Lisp, VB, Pascal/Delphi, Access/PowerBuilder, dBase/Clipper/Paradox, or even a Java developer, drop them into a project, and they could work from a solid core, learning the team's custom bits and any commercial tools as they went. With JS leading the way, but not alone in this, a programmer can only rely on a much smaller core, relative to how many libraries are used.
Because those libraries, and the choices to use a particular combination of them were not designed to work together, there is no guarantee (or perhaps hope) of consistency to leverage. It's worse if you came from a history of other tools that were based on different principles or priorities, because you have to unlearn, breaking the patterns in your mind. Or you convince people to use what you did before, even if there is overlap with tools already in use.
Things are changing now, and will even more. New service-based companies are coming, and a drive to APIs rather than libraries. So we not only have risks like LeftPad, but also companies going out of business under you. The modern trend means it's no longer dependent on an author or team committed to a project long term, but to what investors want, and what changes are made to gain adoption at all costs. As a service-based company you can't hold dearly to design principles if the investors tell you to pivot. Because it's no longer about your vision, but their return. If it is a solo FOSS author or small team, what is their incentive to maintain a project for free, while others profit from it? Success can be your worst enemy, and we need a more equitable solution than what we have now. The software business model has changed dramatically, and will likely continue to do so.
Here is what I personally see as the crux of the problem: the goal of scaling. FOSS projects and companies are only considered successful if they have millions (or, indirectly, billions) of users. Companies that want to be sustainable, providing long term, moderate profits don't make headlines, but they make the world go 'round. They are not the next big social media disruption where end users are the product, to be bought and sold. It is a popular business model and profit is the goal. It's nothing personal.
This has led us to the thinking that every project needs to be designed for millions of users at the very least. Sub-second telemetry for all the data collected is another explosion, giving rise to data analytics for everyone, not just Business Intelligence (BI) for large companies. I won't argue against having data. I love data and learning from it. But I do believe there is a point of diminishing returns which is often ignored. Rather, in this case, there is a cost of entry that small projects wouldn't otherwise need to pay.
What do you do, as an "architect" (see the previous blog post about my thoughts on software architecture) or developer on a team? Your small team (we all know small teams are best, plenty of research and history there) simply can't design and build every piece to support these scaling demands, while the sword of Damocles hangs over you in the form of potential pivots (dramatic changes in goals).
As an industry, we are being inexorably forced to make these choices. Either you're a leader and make your own Faustian bargain, or you're in the general mass of developers being whipped and driven to the gates of Hell.
Only you, dear reader, can decide the turns this tragic story will take, and what you forgive in this telling perchance I should exaggerate.
Complexity Considerations: Part 2
Complexity doesn't come only in the form McCabe is famous for, the decision points in a piece of code, but also in how many pieces there are and how often they change, either by choice or necessity. Temporal complexity, if you will (unrelated to algorithmic time complexity). Rebol2, for any faults we can point out, still works to this day (except where the world changed out from under it, e.g. in protocols). It was self-contained, and relied only on what the OS (Operating System) provided. As long as OSs don't break a core set of functionality that tools rely on, things keep working. R2 had a full GUI system (non-native, which insulated it from changes there), and I can only smile when I run code that is 20 years old and it works flawlessly. If that sounds silly, remember that technology, in most cases, is not the goal. It is a means to an end. A lot of very old code is still in production, keeping businesses running.
We talk about needing to keep up with changes, but some things don't change very much, if at all. Other things change rapidly, but for no good reason, and without being an improvement. If a change is just a lateral move there is no value in it, unless it aligns us on a different, and better, path for the future. I started programming with QuickBASIC, but also used other tools as I quickly learned my tool of choice came with a stigma attached, and I wanted to be a serious, "real" programmer. What became clear was that QB was a great tool, with a few companies providing terrific ASM libraries, and a wonderful IDE to boot. It was simpler not only as a language, but also over time: every 12-18 months (the release cycle way back then) my new C compiler would break something in my code, while QB, and later BASIC/PDS and then VB, very rarely broke working code. Temporal complexity.
Even then there were more complex options. The cool kids used Zortech C++ and there were various cross-platform GUI toolkits. But those advanced tools were often misapplied to simple projects. We still do that today. Much of that is human nature, and the nature of programmers. If it's easy we are no longer special. We may not mean to, but we make things harder than they need to be. Some of us are even elitist about what we do, to our own detriment. If you don't need to be cross platform, why do you have multiple machines or VMs each with a different compiler setup? If you need a GUI, why are you using a language that was not designed with them in mind? If you need easy deployment, which is simpler: a single EXE with no dependencies, or a containerization approach with all that entails? How many technologies do you need in your web stack? Are you the victim of peer pressure, where you feel your site has to be shiny and "responsive", or use the latest framework?
A big argument for using others' work is performance. They've taken time, and may be experts, to optimize Thing X far beyond what you could ever do. That JIT compiler, an incredible virtual DOM, such clever CSS tricks, the key-value DB with no limits, and yet...and yet our software is slower and more bloated than ever. How can that be? Is it possible we're overbuilding? Is software sprawl just something we accept now?
Earlier I mentioned that a hodge-podge assembly of parts with no standards, norms, or even aesthetic sense applied does not make our lives easier. Lego blocks, the originals anyway, are limited, but consistent in how they can be used. We misapply that analogy, because the things we build are far from consistent or designed to interact. That holds even for the UX and A/B testing on subsets of users that companies apply today. I love the idea of data-driven HCI to guide us to a more evidence-oriented approach. This includes languages. But when a site or service moves fast and changes its interface based on its own A/B testing, it doesn't account for the others doing the same. Temporal complexity.
As a user, every app or site I access may change out from under me in the flash of refresh or automatic update I didn't ask for. Maybe it's better, an actual improvement, if you only use that one site. But if all your tools constantly change out from under you, it's like someone sneaking into your office and rearranging it every night while you sleep. Maybe this is the developer's revenge, for the pain we inflict on ourselves by constantly changing our own tools. If we suffer, why shouldn't our users? For those who truly have empathy for their users and don't want to drive them mad, or away, perhaps the lesson is to have empathy for ourselves, for our own tribe. I don't want to see my friends and colleagues burn out, when it was probably the enjoyment and passion that solving problems with software can bring which led them here to begin with.
Every moving part in your system is a potential point of failure. Reduce the moving parts and reliability increases. Whether it's the OS you run on (we now have more of those than ever, between Linux distros and mobile platforms always trying to outdo each other), extra packages or commercial tools, FOSS libraries, environments, *aaS, or platform components like containers and cluster management, every single piece is a point of failure. And if any of them break your code, or your system, even in the name of improvements or bug fixes, you may find yourself running just to stay in the same place. Many of those pieces are touted as the solution to reliability problems, but a lot of them just push problems around, or target problems you don't have. Don't solve problems you don't have. That adds complexity, and now you really have a problem.
Less Philosophy, More Red
Interpreter Events
The interpreter can now emit events during evaluation, which you can observe by passing a callback function to do/trace. For example, here is a simple event logger:

    logger: function [
        event  [word!]                 ;-- Event name
        code   [any-block! none!]      ;-- Currently evaluated block
        offset [integer!]              ;-- Offset in evaluated block
        value  [any-type!]             ;-- Value currently processed
        ref    [any-type!]             ;-- Reference of current call
        frame  [pair!]                 ;-- Stack frame start/top positions
    ][
        print [
            pad uppercase form event 8
            mold/part/flat either any-function? :value [:ref][:value] 20
        ]
    ]

Given this code:
    do/trace [print 1 + 2] :logger

It will output:
    INIT   none     ;-- Initializing tracing mode
    ENTER  none     ;-- Entering block to evaluate
    FETCH  print    ;-- Fetching and evaluating `print` value
    OPEN   print    ;-- Results in opening a new call stack frame
    FETCH  +        ;-- Fetching and evaluating `+` infix operator
    OPEN   +        ;-- Results in opening a new call stack frame
    FETCH  1        ;-- Fetching left operand `1`
    PUSH   1        ;-- Pushing integer! value `1` on stack
    FETCH  2        ;-- Fetching and evaluating right operand
    PUSH   2        ;-- Pushing integer! value `2`
    CALL   +        ;-- Calling `+` operator
    RETURN 3        ;-- Returning the resulting value
    CALL   print    ;-- Calling `print`
    3               ;-- Outputting 3
    RETURN unset    ;-- Returning the resulting value
    EXIT   none     ;-- Exiting evaluated block
    END    none     ;-- Ending tracing mode

Several tools are now provided in the Red runtime library, built on top of this event system:
- An interactive debugger console, with many capabilities (step by step evaluation, a flexible breakpoint system, and call stack visualisation).
- A simple profiler that we will improve over time (especially on the accuracy aspects).
- A simple tracer. The current evaluation steps are quite low-level, but @hiiamboris has already built an extended version, operating at the expression level, that will soon be integrated into the master branch.
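As a small illustration of what this event API enables beyond the logger above, here is a sketch of a callback that simply counts PUSH events during an evaluation. It reuses the callback signature shown above and assumes only the events listed in the trace; the finer details of the event API may differ:

    pushes: 0
    counter: func [
        event  [word!]
        code   [any-block! none!]
        offset [integer!]
        value  [any-type!]
        ref    [any-type!]
        frame  [pair!]
    ][
        if event = 'push [pushes: pushes + 1]    ;-- count values pushed on the stack
    ]

    do/trace [print 1 + 2] :counter
    print ["values pushed:" pushes]              ;-- 2, per the trace above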
Format
Split
Markup Codec
CLI Module
IPv6 Datatype
Getting Near
The Daily Grind
Roadmap
Q4 2021 (retrospective)
- We hoped to have `format` and `split` deployed, but they will be pushed back to Jan-2022.
- `CLI` module approved, needs to be merged, then refined as necessary.
- `Markup Codec` took longer than expected due to extensive design chat on formats.
- Interpreter instrumentation, with a PoC debugger and profiler. It took longer than expected, but they are out now.
- Async I/O is out, but some extra bits didn't make it in. One unplanned addition was `IPv6!` as a datatype. It's experimental and subject to change.
- @galenivanov did some great work on his animation dialect, but @toomasv's `diagram` dialect took a back seat and will move to Q1 2022.
- Audio has 3 working back ends and a basic port implementation. Next up is higher level design, device and format enumeration, and device control. A `port!` may not be the way to go for all this, but it was step one.
- Animation has more great examples all the time. Like this and this. @GalenIvanov is doing great work, and we are planning to make his dialect a standard addition to Red.
2022
- `Table` module, `node!` datatype and other REP reviews
- Full HTTP/S protocol and basic web server framework
- New DiaGrammar release
- Animation dialect
- New release process
- New web sites updated and live
- Red/C3 (Including ETH 2.0 client protocol)
- Red Language Specification (Principles, Core Language, Evaluation Rules, Datatype Specs (including literal forms), Action/Native specs, Modules spec)
- 64-bit support (LLVM was a possibility, but we learned from Zig that LLVM breaking changes can be quite painful for small teams to keep up with. We may be better off continuing to roll our own, though it's a big task.)
- Android update
- Red Spaces cross-platform GUI
- Module and package system design
- RAPIDE (Rapid API Development Environment)
RAPIDE, from Redlake Technologies
In conclusion
August 4, 2021
Long Time No Blog
It's been almost a year since our last blog post. Sorry about that. It's one of those things that falls off our radar without a person dedicated to it, and we run lean so don't have anyone filling that role right now. We know it's important, even if we have many other channels where people can get information. So here we are.
Last year was a tough year all around, even for us. We were already a remote-only team, but the effect the pandemic had on the world, particularly travel, hit us too. We had some team changes, and also split our focus into product development alongside core Red Language development. This is necessary for sustainability, because people don't pay for programming languages, and they don't pay for Open Source software. There's no need to comment on the exceptions to these cases, because they are exceptions. The commercial goal, starting out, is to focus on our core strengths and knowledge, building developer-centric tools. Our first product, DiaGrammar for Windows, was released in December 2020, and we've issued a number of updates to it since then. Our thanks to Toomasv for his ingenuity and dedication in creating DiaGrammar. We are a team, but he really accepted ownership of the project and took it from an idea to a great product. Truly, there is nothing else like it on the market.
We learned a lot from the process of creating a product, and will apply that experience moving forward. An important lesson is that the product itself is only half the work. As technologists, we're used to writing the code and maybe writing some docs to go with it. We don't think about outreach, marketing, payments, support, upgrade processes for users, web site issues, announcements, and more. The first time you do something is the hardest, and we're excited to improve and learn more as we update DiaGrammar and work on our next product. We'll probably announce what it will be in Q4. One thing we can say right now is that the work on DiaGrammar led to a huge amount of work on a more general diagramming subsystem for Red. It's really exciting, and we'll talk more about that in a future blog post.
So what have we been doing?
Since our last blog post we've logged over 400 fixes and 100 features in Red itself. Some of these are small but important, others are headline-worthy; some are deep voodoo and some are visible to every Reducer (what we call Red users). For example, most people use the console (the REPL), so the fixes and improvements there are easy to see. A prime example: the GUI console (but not the CLI console) didn't show output if the UI couldn't process events. This could happen if you printed output in a tight loop: the results would only show up at the termination of the loop, when the system could breathe again. That's been addressed, but it wasn't easy and still isn't perfect. Red is still single threaded, so there's no separate UI thread (pros and cons there). We make these tradeoffs every day, and need feedback from users and real world scenarios to help find the right balance. Less obvious are things like improvements to parse, which not everyone uses, or how fmod works across platforms, and edge cases for lexical forms (e.g. is -1.#NaN valid?). The latter is particularly important, because Red is a data language first.
JSON is widely used, but people may not notice that the JSON decoder is 20x faster now, unless they're dealing with extremely large JSON datasets. JSON is so widely used that we felt the time spent, and the tradeoffs made, were worth it. It also nicely shows one of Red's strengths. Profiling showed that the codec spent a lot of time in its unescape function. @hiiamboris rewrote that as a Red/System routine, tweaked it, and got a massive speedup. No external compiler needed, no need to use C, and the code is inlined so it's all in context. Should your JSON be malformed, you'll also get nicer error information now. As always, Red gives you options. Use high level Red as much as possible, for the most concise and flexible code, but drop into Red/System when it makes sense.
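To make that "drop into Red/System when it makes sense" pattern concrete, here is a minimal, hypothetical sketch (not the actual unescape rewrite) of a hot loop written as a routine inside an ordinary compiled Red program:

    Red []

    ;-- The routine body below is Red/System: it is compiled, runs at native
    ;-- speed, and is called like any other Red function.
    sum-to: routine [n [integer!] return: [integer!] /local i acc][
        acc: 0
        i: 1
        while [i <= n][
            acc: acc + i
            i: i + 1
        ]
        acc
    ]

    print sum-to 100    ;-- 5050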
Some features cross the boundary of what's visible. A huge amount of work went into D2D support on Windows. D2D is Direct2D, the hardware-accelerated backend for vector graphics on Windows. For users, nothing should change as all the details are hidden. But the rendering behavior is not exactly the same. We try to work around that, but sometimes users have to make adjustments as well; we know because DiaGrammar is written in Red and uses the draw dialect heavily. It's an important step forward, but comes at a cost. GDI+ is now a legacy graphics back end, and won't see regular updates. Time marches on and we need to look forward. As if @qtxie wasn't busy enough with that, he and @dockimbel also pushed Full I/O forward in a big way. It hasn't been merged into the main branch yet, but we expect that to happen soon. @rebolek has been banging on it, and has a full working HTTP protocol ready to go, which is great. TLS/SSL support gets an A+ rating, which is also a testament to the design and implementation. It's important to note that the new I/O system is a low level interface. The higher level API is still being designed. At the highest level, these details will all be hidden from users. You'll continue to use read, write, save, load exactly as you do today, unless you need async I/O.
Another big "feature" came from @vazub: NetBSD support. The core team has to focus on what stands to help the project overall, with regard to users and visibility. Community support for lesser known platforms is key. If you're on one of those platforms, be (or find) a champion. We'll help all we can, but that's what Open Source is for. Thanks for this contribution @vazub!
We also have some new Python primers up, thanks to @GalenIvanov. Start at Coming-to-Red-from-Python. Information like this is enormously important. Red is quite different from other languages, and learning any new language can be hard. We're used to a set of functionality and behaviors, which sometimes makes the syntax the easiest part to learn. Just knowing what things are called is a learning curve. Red doesn't use the same names, because we (and Carl when he designed Rebol) took a more holistic view. That's a hard sell though. We feel the pain. A user who found Red posted a video as they tried to do some basic things. We learned a lot from watching it. Where other languages required you to import a networking library, it's already built into Red. When they were looking for request or http.get, and expecting strings to be used for URLs, they couldn't find answers. In Red you just read http://.... It's obvious to us, but not to the rest of the world. So these new primers are very exciting. We have reference docs, and Red by Example, but still haven't written a User's Guide for Red. We'll get there though.
Why do things take so long?
Even with that many fixes and features logged, and huge amounts of R&D, it can still feel like progress is slow. The world moves fast, and software projects are often judged by their velocity. We even judge ourselves that way, and have to be reminded to stay the course, our course, rather than imitating others. Red's flexibility also comes into play. Where other languages may limit how you can express solutions, we don't. It's so flexible that people can do crazy things or perform advanced tricks which end up being logged as bugs and wishes. Sometimes we say No (a lot of times in fact), but we also try to keep an open mind. We have to ask "Should that be allowed?", "Why would you want to do that (even though I never have)?", and "What are the long term consequences?" We have to acknowledge that Red is a data format first, and we never want to break that. It has to evolve, but not breaking the format is fundamental. And while code is expected to change, once people depend on a function or library it causes them pain if we break compatibility. We don't want to do that, though sometimes we will for the greater good and the long view. There are technical bandages we can patch over things, but it's a big issue that doesn't have a single solution. Not just for us, but for all software development. We'll talk more about this in the future as well.
I'll note some internal projects related to our "slow and steady" process:
- Composite is a simple function that does for strings what compose does for blocks. It's a basic interpolator. But the design has taken many turns. Not just in the possible notations, but whether it should be a mezzanine function, a macro, or both. Each has pros and cons (Side note: we don't often think about "cons" being an abbreviation for "consequences"). This simple design and discussion is stalled again, because another option would be a new literal form for interpolated strings. That's what other languages do, but is it a good fit for Red? We belabor the point of how tight lexical space is already, so have to weigh that against the value of a concise notation.
- Non-native GUI. Red's native GUI system was chosen in response to Rebol's choice to go non-native. Unfortunately it's another case of needing both. Being cross platform is great for Red users, but Hell for us. Throw in mobile and it's even worse. Don't even talk about running in the browser. But every platform has native widget limitations. Once you move beyond static text, editable fields, buttons, and simple lists, you're in the realm of "never the twain shall meet". How do you define and interact with grids and tables or collapsible trees? Red already has its own rich-text widget, so you don't have to embed (even if you could) an entire web browser and then write in HTML and CSS. To address all this, with much research and extensive use case outlines, @hiiamboris has spent a lot of time and effort on Red Spaces. Show me native widgets that can do editable spiral text, put any layout inside a rotator, or define recursive UIs. I didn't think so. Oh, and the wiggling you see in the GIFs there are not mistakes or artifacts, they are tests to show that any piece of the UI can be animated.
- Other projects include format, split, HOFs, and modules, each with a great deal of design work and thought put into them. As an example, look at Boris' HOF analysis. They are large and important pieces, based on historical and contemporary research, but not something we will just drop into Red, though we could. A simple map function is a no-brainer, and could have been there day one. But that's not how we work. It's not a contest to see how many features we can add, or how fast; but how we can move software forward, make things easier, and push the state of the art. Not just in technical features (the engineering part), but in the design of a language and its ecosystem.
Not Everyone Has These Problems
@hiiamboris: It was a (R/S) compiler issue after all. ;-) `size? a` was the guilty part. The compiler was wrongly generating code for loading `a`, even though `size?` is statically evaluated by the compiler and replaced by a static integer value. Given that `a` was a float type, its value was pushed onto the x87 FPU stack, but never popped. That stack has a 7-slot limit. Running the loop 5 times was enough to leave only 2 slots free. When the big float expression in the `dtoa` library is encountered, it requires 3 free slots on the FPU stack, which fails and results in producing a NaN value, which wreaks havoc in the rest of the code.
The Big Picture
August 20, 2020
Red/System: New Features
In the past months, many new features were added to Red/System, the low-level dialect embedded in Red. Here is a summary in case you missed them.
Subroutines
During work on the low-level parts of the new Red lexer, the need arose for intra-function factorization abilities, to keep the lexer code as DRY as possible. Subroutines were introduced to solve that. They act like the GOSUB directive from the BASIC language: they are defined as a separate block of code inside a function's body and are called like regular functions (but without any arguments). So they are much lighter and faster than real function calls, requiring just one slot of stack space to store the return address.
The declaration syntax is straightforward:
    <name>: [<body>]

    <name> : subroutine's name (local variable).
    <body> : subroutine's code (regular R/S code).
To define a subroutine, you declare a local variable with the subroutine! datatype, then set that variable to a block of code. You can then invoke the subroutine by calling its name from anywhere in the function body (but after the subroutine's own definition).
Here is a first example: a fictitious function processing I/O events:
    process: func [buf [byte-ptr!] event [integer!] return: [integer!]
        /local log do-error [subroutine!]
    ][
        log:      [print-line [">>" tab e "<<"]]
        do-error: [print-line ["** Error:" e] return 1]

        switch event [
            EVT_OPEN  [e: "OPEN"  log unless connect buf [do-error]]
            EVT_READ  [e: "READ"  log unless receive buf [do-error]]
            EVT_WRITE [e: "WRITE" log unless send buf    [do-error]]
            EVT_CLOSE [e: "CLOSE" log unless close buf   [do-error]]
            default   [e: "<unknown>" do-error]
        ]
        0
    ]
This second example is more complete. It shows how subroutines can be combined and how values can be returned from a subroutine:
    #enum modes! [
        CONV_UPPER
        CONV_LOWER
        CONV_INVERT
    ]

    convert: func [mode [modes!] text [c-string!] return: [c-string!]
        /local
            lower? upper? alpha? do-conv [subroutine!]
            delta [integer!]
            s     [c-string!]
            c     [byte!]
    ][
        lower?:  [all [#"a" <= c c <= #"z"]]
        upper?:  [all [#"A" <= c c <= #"Z"]]
        alpha?:  [any [lower? upper?]]
        do-conv: [s/1: s/1 + delta]

        delta: 0
        s: text
        while [s/1 <> null-byte][
            c: s/1
            if alpha? [
                switch mode [
                    CONV_UPPER  [if lower? [delta: -32 do-conv]]
                    CONV_LOWER  [if upper? [delta: 32 do-conv]]
                    CONV_INVERT [delta: either upper? [32][-32] do-conv]
                    default     [assert false]
                ]
            ]
            s: s + 1
        ]
        text
    ]

    probe convert CONV_UPPER  "Hello World!"
    probe convert CONV_LOWER  "There ARE 123 Dogs."
    probe convert CONV_INVERT "This SHOULD be INVERTED!"
This will output:

    HELLO WORLD!
    there are 123 dogs.
    tHIS should BE inverted!
Support for getting a subroutine's address and dispatching dynamically on it is planned for the future (something akin to a computed GOTO). More examples of subroutines can be found in the new lexer code, for instance in the load-date function.
New system intrinsics
Lock-free atomic intrinsics
- system/atomic/fence: generates a read/write data memory barrier.
- system/atomic/load: thread-safe atomic read from a given memory location.
- system/atomic/store: thread-safe atomic write to a given memory location.
- system/atomic/cas: thread-safe atomic compare&swap to a given memory location.
- system/atomic/<math-op>: thread-safe atomic math or bitwise operation on a given memory location (add, sub, or, xor, and).

Other new intrinsics:

- system/stack/allocate/zero: allocates storage space on the stack and zero-fills it.
- system/stack/push-all: saves all registers to stack.
- system/stack/pop-all: restores all registers from stack.
- system/fpu/status: retrieves the FPU exception bits status as a 32-bit integer.
Improved literal arrays
The main change is the removal of the hidden size inside the /0 index slot. The size of a literal array can now only be retrieved using the size? keyword, which is resolved at compile time (rather than run-time for /0 index access).
A notable addition is the support for binary arrays. Those arrays can be used to store byte-oriented tables or embed arbitrary binary data into the source code. For example:
    table: #{0042FA0100CAFE00AA}

    probe size? table             ;-- outputs 9
    probe table/2                 ;-- outputs "B"
    probe as integer! table/2     ;-- outputs 66

The new Red lexer code uses them extensively.
Variables and arguments grouping
It is now possible to group the type declaration for local variables and function arguments. For example:
    foo: func [
        src dst    [byte-ptr!]
        mode delta [integer!]
        return:    [integer!]
        /local
            p q buf  [byte-ptr!]
            s1 s2 s3 [c-string!]
    ]
Note that the compiler supports those features through code expansion at compile time, so error reports will still show each argument or variable with its own type declaration.
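In other words, the grouped declaration above is treated by the compiler roughly as if each argument and local had been declared with its own type:

    foo: func [
        src     [byte-ptr!]
        dst     [byte-ptr!]
        mode    [integer!]
        delta   [integer!]
        return: [integer!]
        /local
            p   [byte-ptr!]
            q   [byte-ptr!]
            buf [byte-ptr!]
            s1  [c-string!]
            s2  [c-string!]
            s3  [c-string!]
    ]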
Integer division handling
Integer division handling at low level has notorious shortcomings, with different behavior for each edge case depending on the hardware platform. The Intel IA-32 architecture tends to handle those cases in a slightly safer way, while the ARM architecture silently produces erroneous results, typically for the following two cases:
- division by zero
- division overflow (-2147483648 / -1)
IA-32 CPUs will generate an exception, while ARM ones will return invalid results (respectively 0 and -2147483648). This makes it difficult to produce code that behaves the same on both architectures when integer divisions are used. In order to reduce this gap, the R/S compiler will now generate extra code to detect those cases on ARM targets and raise a runtime exception. Those extra checks for ARM are produced only in debug compilation mode. In release mode, priority is given to performance, so no runtime exception will occur in such cases on ARM (as the overhead is significant). So be sure to check your code thoroughly on ARM platforms in debug mode before releasing it. This is not a perfect solution, but at least it makes it possible to detect those cases through testing in debug mode.
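A tiny Red/System sketch of those two expressions (illustrative only): on IA-32 both raise a CPU exception; on ARM, without the new debug-mode checks, they would silently return 0 and -2147483648 respectively.

    Red/System []

    d: 0
    m: 80000000h              ;-- -2147483648 (minimum 32-bit integer)

    print-line 1 / d          ;-- division by zero
    print-line m / -1         ;-- division overflow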
Others
Here is a list of other changes and fixes in no particular order:
- Cross-referenced aliased fields in structs defined in the same context are now allowed. Example:

    a!: alias struct! [next [b!] prev [b!]]
    b!: alias struct! [a [a!] c [integer!]]
- -0.0 special float literal is now supported.
- +1.#INF is also now supported as a valid literal, in addition to 1.#INF, for positive infinity.
- Context-aware get-words resolution.
- New #inline directive to inline assembled binary code.
- Dropped support for % and // operators on float types: as they relied on the FPU's support, the results were not reliable across platforms. Use the fmod function instead from now on.
- Added --show-func-map compilation option: when used, it will output a map of R/S function addresses/names, to ease low-level debugging.
- FIX: issue #4102: ASSERT false doesn't work.
- FIX: issue #4038: cast integer to float32 in math expression gives wrong result.
- FIX: byte! to integer! conversion not happening in some cases. Example: i: as-integer (p/1 - #"0")
- FIX: compiler state not fully cleaned up after premature termination. This affects multiple compilation jobs done in the same Rebol2 session, resulting in weird compilation errors.
- FIX: issue #4414: round-trip pointer casting returns an incorrect result in some cases.
- FIX: literal arrays containing true/false words could corrupt the array. Example: a: ["hello" true "world" false]
- FIX: improved error report on bad declare argument.
August 3, 2020
A New Fast and Flexible Lexer
- High performance, typically 50 to 200 times faster than the older one.
- New scanning features: identify values and their datatypes without loading them.
- Instrumentation: customize the lexer's behavior at will using an event-oriented API.
The reference documentation is available there. This new lexer is available in Red's auto-builds since June.
Performance
- 100 x compiler.r: loads 100 times compiler.r source file from memory (~126KB, so about ~12MB in total).
- 1M short integers: loads a string of 1 million `1` separated by a space.
- 1M long integers: loads a string of 1 million `123456789` separated by a space.
- 1M dates: loads a string of 1 million `26/12/2019/10:18:25` separated by a space.
- 1M characters: loads a string of 1 million `#"A"` separated by a space.
- 1M escaped characters: loads a string of 1 million `#"^(1234)"` separated by a space.
- 1M words: loads a string of 1 million `random "abcdefghijk"` separated by a space.
- 100K words: loads a string of 100 thousand `random "abcdefghijk"` separated by a space.
    Loading Task            v0.6.4 (sec)   Current (sec)   Gain factor
    ------------------------------------------------------------------
    100 x compiler.r           41.871          0.463            90
    1M short integers          14.295          0.071           201
    1M long integers           18.105          0.159           114
    1M dates                   29.319          0.389            75
    1M characters              14.865          0.092           162
    1M escaped characters      14.909          0.120           124
    1M words                      n/a          1.216           n/a
    100K words                 23.183          0.070           331
Scanning
>> scan "123" == integer! >> scan "w:" == set-word! >> scan "user@domain.com" == email! >> scan "123a" == error!
    >> scan/fast "123"
    == integer!
    >> scan/fast "a:"
    == word!
    >> scan/fast "a/b"
    == path!
src: "hello 123 world 456.789" until [ probe first src: scan/next src empty? src: src/2 ]
    word!
    integer!
    word!
    float!
Matching by datatype in Parse
    >> parse to-binary "Hello 2020 World!" [word! integer! word!]
    == true

    >> parse to-binary "My IP is 192.168.0.1" [3 word! copy ip tuple!]
    == true
    >> ip
    == #{203139322E3136382E302E31}
    >> load ip
    == 192.168.0.1
Instrumentation
- Trace the behavior of the lexer for debugging or statistical purposes.
- Catch errors and resume loading by skipping invalid data.
- On-the-fly input transformation (to remove/alter some non-loadable parts).
- Extend the lexer with new lexical forms.
- Process serialized Red data without having to fully load the input.
- Extract line comments that would be lost otherwise.
    transcode/trace <input> <callback>

    <input>    : series to load (binary! string!).
    <callback> : a callback function to process lexer events (function!).
    >> transcode/trace "hello 123" :system/lexer/tracer
    prescan word 1x6 1 " 123"
    scan word 1x6 1 " 123"
    load word hello 1 " 123"
    prescan integer 7x10 1 ""
    scan integer 7x10 1 ""
    load integer 123 1 ""
    == [hello 123]
Implementation notes
FSM graph -> Excel file -> CSV file -> binary table
- Manually edit changes in the lexer-states.txt file.
- Port those changes into the lexer.xlsx file by properly setting the transition values.
- Save that Excel table in CSV format as lexer.csv.
- Run the generate-lexer-table.red script from the Red repo root folder. The lexer-transitions.reds file is regenerated.