September 22, 2012

Plan for Unicode support


Red is growing up fast, even if just born two weeks ago! It is time we implement basic string support so we can do our first, real, hello-word. ;-)

Red strings will natively support Unicode. In order to achieve that in an efficient and cross-platform way, we need a good plan. Here is the list of Unicode native formats used by our main target platforms API:

        Windows       : UTF-16
        Linux         : UTF-8
        MacOSX/Cocoa  : UTF-16
        MacOSX/Darwin : UTF-8
        Java          : UTF-16
        .Net          : UTF-16
        Javascript    : UTF-8
        Syllable      : UTF-8
   
All these formats are variable-width encodings, requiring any indexed access to pay the cost of walking through the string.

Fortunately, there are also fixed-width Unicode encodings that can be used to give us back constant time for indexed accesses. So, in order to make it the most space-efficient, Red strings will internally support only these encoding formats:

        Latin-1 (1 byte/codepoint)
        UCS-2   (2 bytes/codepoint)
        UCS-4   (4 bytes/codepoint)

This is not something new, at least Python 3.3 does it in the same way.

Additionally, UTF-8 and UTF-16 codecs will be supported, in order to deal with I/O accesses on host platforms.

Red will use UTF-8 for exchanging strings with outer world by default, except when accessing a UTF-16 API is necessary. Conversion for input and output strings will be done on-the-fly between one of the internal representation and UTF-8/UTF-16. When reading an input string, Red will select the most space-efficient internal format depending on highest codepoint in the input string. Also users should be able to force the encoding of a string to a given internal format, when possible.

So far, this is the plan for additing Unicode to Red, a prototype implementation will be done quickly, so we can fine-tune it if required.

Comments and suggestions are welcome.

September 6, 2012

Red is born


Yesterday, I finally got my first real Red program compiling and running:
    Red []
    print 1
This doesn't look like much but it proves that the whole current Red stack is working properly. Red compiler generates Red/System code in memory, that is then compiled to native code and linked in a 14KB executable file (contains the full Red runtime with currently 9 datatypes partially implemented).

The baby needs a few more days in the nursery before I commit the new code to the v0.3.0 branch on Github. Once the existing datatypes will be more complete and once we choose how to deal internally with Unicode strings, we should be able to release the first Red alpha.

The current Red compiler is pretty simple and should remain light in the future. I have finally chosen an hybrid dynamic/static type system, to avoid diving into a complex type inference engine now, as I realized that once we get out of the bootstrap and have the final Red JIT-compiler, it will be much more easier to achieve. Also, I want to pass that bootstrap stage as soon as possible, because it is really limiting the full Red potential.

Stay tuned, the next months will be fun! ;-)

Fork me on GitHub