Thursday, August 27, 2020

async and await in a nutshell

The async and await keywords introduced in Ecmascript 2017 are significant improvements on the way asynchronous functions in Javascript have been called in the past, and are now supported in all major browsers, and in Node.js. Unfortunately, most explanations on the Web are marred by unnecessary detail. For the benefit of other confused readers I will try to explain the underlying concepts in the simplest possible terms.

Javascript functions come in two basic flavours: asynchronous and synchronous. Asynchronous functions execute when the time is right, whereas synchronous functions execute straightaway. Asynchronous functions give Javascript its power by maximising the usage of the CPU instead of blocking while network and other in/out operations complete.

The async and await keywords simplify the calling of asynchronous functions, which can now be declared with the async keyword. This can be applied to a simple function, a method in a class or a static class method:

async function foo(){ return 1;}
class bar { async foo(){ return 1;} }
class bar { static async foo(){ return 1;} }

When called, these functions or methods will yield the CPU to other pending functions rather than execute immediately. The await keyword lets you treat asynchronous functions as if they were synchronous. That is, you don't have to obscure the clarity of your program with callbacks, explicit promises or then-clauses to get things done. (Hallelujah!) Here's a simple example that uses async/await in a loop:

async function bar() {
   return 2;
}
async function foo(){
    let arr = Array();
    arr.push(1);
    arr.push(7);
    arr.push(3);
    let total = 0;
    for ( let i=0;i<arr.length;i++ ) {
        if ( i == 1 )
            total += await bar();
        else 
            total += arr[i];
    }
    console.log(total);
}
foo();

This adds up a list of numbers: 1, 7 and 3, whose total should be 11. But when it tries to add the second element of the array it gets it instead from the asynchronous bar() function, which returns 2 rather than 7. The program can be written as if it were synchronous because that call to bar() is prefixed by the await keyword. In effect, the code pauses at that point until bar() completes, before executing the next step in the loop. So the answer is 6, not 11.

However, you can only use await inside a function declared with async. Also, if you call an async function inside a synchronous function, or at the top level, the next step will execute immediately and not wait for the async function to complete. In the above example the final call to foo() is from the synchronous top-level, but it doesn't matter because this is the last statement in the program.

And finally, many functions defined in frameworks are already asynchronous. You can also call these using the await keyword. For example in jQuery you can use await with $.get, as long as you call it within a function declared with async:

let data = await $.get(url);
// do something with data...

So you don't have to worry about callbacks, and that's cool.

Tuesday, April 14, 2020

Converting UTF-8 and UTF-16 arrays to strings in Javascript and vice versa

Support for UTF-8 and UTF-16 conversion is not that great in Javascript. There are libraries for Node.js, like StringDecoder, but you have to require them. And in the browser they won't work. For browser Javascript you can use TextEncoder but it doesn't work in all browsers consistently and only in Node.js via the util module. So if you want (like me) something that can convert UTF-8 byte arrays and UTF-16 character arrays into strings and vice versa, and have exactly the same code work in both Node.js and in browsers with no dependencies you might begin to understand my problem.

A few people recommend using unescape(encodeURIComponent(s)) to encode utf-8 and decodeURIComponent(escape(s)) to decode, but both escape and unescape are deprecated. Also this method only produces strings, not Uint8Arrays and doesn't handle the UTF-16 case. Why would you need an array of UTF-8 bytes or UTF-16 characters? Because char and byte arrays can be compared and indexed into more easily. Also files store string data in these formats, especially UTF-8. If only bits of your file are in UTF-8 then you have to convert the string parts piecemeal. There are probably other uses too, or else Uint8Array and Uint16Array wouldn't exist.

UTF-8

For UTF-8 conversion Javascript already has two functions that do most of the work: encodeURIComponent and decodeURIComponent. encodeURIComponent takes a string and escapes a few reserved characters and also ASCII codes greater than 127 into single byte escape sequences. So '%" becomes '%25' and 'ó' becomes '%C3%B3'. This method also works on Unicode characters outside the Basic MultiLingual plane, for example the gothic character Hwair: 𐍈, which is escaped to '%F0%90%BD%88'. Once we have the escaped sequence it is fairly easy to take each byte and encode it as a 8-bit integer within a Uint8Array. The reverse process (Uint8Array to string) is also simple: any byte less than 128 can be converted back into a character using String.fromCodePoint(n), where n is the 8-bit value. For code points from 128-255 they can be converted back into their escape string form. Then the string built up this way can be passed through decodeURIComponent to produce the original string.

UTF-16

UTF-16 is even easier since all Javascript strings are already encoded in UTF-16. To convert a string to an array we can use str.charCodeAt(index), where str is our string and index is the index into the string. If the character is longer than a 16-bit integer it will be encoded as a 'surrogate pair', but it will still be extracted by charCodeAt as two 16-bit integers. Indeed, the length of the string in that latter case is the number of UTF-16 characters, not the length of the Unicode string, which will be shorter, because each surrogate pair is only 1 character. To reverse the process we can use String.fromCharCode, which converts each half of the surrogate pair separately and the character is put back together by the browser.

Here's my code. For Node.js just trim it to the class definition and add module.exports=unicode. This way you can test it in the browser easily.

<!DOCTYPE html> 
<head><script>
/**
 * A simple class to convert utf8 or utf16 byte arrays to strings etc
 * Works in Node.js OR in any browser. No dependencies.
 */
class unicode {
    /**
     * Convert a Uint8Array in UTF-8 to a Javascript string
     * @param uint8_array a Uint8Array in UTF-8
     * @return a Javascript string encoded in standard UTF-16
     */
    static utf8_to_string(uint8_array) {
        var str = "";
        for ( var i=0;i<uint8_array.byteLength;i++ ) {
            if ( uint8_array[i] < 128 )
                str += String.fromCodePoint(uint8_array[i]);
            else 
                str += '%'+uint8_array[i].toString(16);
        }
        return decodeURIComponent(str);
    }
    /**
     * Convert a javascript string to Uint8Array UTF-8. 
     * @param str the string to convert
     * @return a Uint8Array in UTF-8
     */
    static string_to_utf8(str) {
        var encoded = encodeURIComponent(str);
        // NB % sign itself encoded as %25
        var bytes = Array();
        var state = 0;
        for ( var i=0;i<encoded.length;i++ ) {
            switch ( state ) {
                case 0:    // convert characters to bytes
                    if ( encoded[i] == '%' )
                        state = 1;
                    else
                        bytes.push(encoded.codePointAt(i));
                    break;
                case 1:    // seen '%'
                    state = 2;
                    break;
                case 2:    // seen %H
                    bytes.push(parseInt(encoded.substring(i-1,i+1),16));
                    state = 0;
                    break;
            }
        }
        return new Uint8Array(bytes);
    }
    /**
     * Convert a javascript string to Uint16Array UTF-16. 
     * @param str the string to convert
     * @return a Uint16Array in UTF-16
     */
    static string_to_utf16(str) {
        var arr = new Uint16Array(str.length);
        for ( var i=0;i<str.length;i++ ) 
            arr[i] = str.charCodeAt(i);
        return arr;
    }
    /**
     * Convert a Uint16Array in UTF-16 to a Javascript string
     * @param uint16_array a Uint16Array in utf-16
     * @return a Javascript string
     */
    static utf16_to_string(uint16_array) {
        var str = "";
        for ( var i=0;i<uint16_array.length;i++ ) 
            str += String.fromCharCode(uint16_array[i]);
        return str;
    }
}
function test() {
    var u8_arr = unicode.string_to_utf8("dógs lov€ 𤭢s");
    var str = unicode.utf8_to_string(u8_arr);
    console.log(("dógs lov€ 𤭢s"==str)?"utf-8 test passed":"utf-8 test failed");
    var u16_arr = unicode.string_to_utf16("dógs lov€ 𤭢s");
    str = unicode.utf16_to_string(u16_arr);
    console.log(("dógs lov€ 𤭢s"==str)?"utf-16 test passed":"utf-16 test failed");
}
</script>
</head>
<body>
<p><input type="button" value="test" onclick="test()"> (read result in console)</p>
</body>
</html>

Wednesday, March 22, 2017

Sending mail via smtp and libcurl with tls

I was writing a daemon that monitored users on a shared system, which sent an email to them if they were using too much CPU. The problem was that in a daemon you don't want to kick off a new process by calling a commandline tool. I wanted to send it from within the daemon process itself. I could get the email address of the user from ldap easy enough, but how to actually send the email?

The only way to do it via open source software I found was to use libcurl. Unfortunately I wanted to send to an outlook server, which uses NTLM for authentication process. And libcurl at least on redhat linux uses NSS, not openssl, for the TLS (SSL) transport. And nss does not support NTLM. That meant I had to recompile the libcurl library using openssl not nss. I read lots of postings on this, none of which resolved the problem. The libcurl people simply refused to provide an openssl version of libcurl. But with the proper configuration you can rebuild libcurl. Here's what I used, after getting the source code from the libcurl github account:

./configure --with-ssl=/usr/lib64 --without-nss

Then, make, make install worked OK but installed into /usr/local/lib, not /lib/64. There was already a copy of libcurl there so I deleted it and swapped it over. with the fresh copy and voila: it worked. Here's the test code in case someone else has the same problem. The original code I found here. It has more comments than my version. You have to fill in your own credentials of course.

Monday, December 12, 2016

Why MVC will die out like the incunable and the dinosaur

There is hardly a web-framework around these days that doesn't base itself around model-view-controller, a programming paradigm as old as the hills of software engineering. It was invented at Xerox PARC in the '70s. Back then, when dinosaurs roamed those digital hills, MVC was a useful abstraction that simplified the development of desktop applications. With the invention of the Web, MVC was quickly ported to many frameworks that advocated doing everything on the server to overcome the inconsistency in Web-browsers' implementations of Javascript and CSS. So Java frameworks like Spring, and PHP frameworks like Zend adopted the MVC idea and made it core. But what is MVC exactly? "Model" is clear enough: that's the database and all interactions with it. "View" is clearly the GUI, which in Web applications is HTML/CSS/Javascript. "Controller", though, is less clear. It's the glue that arbitrates interactions between the model and the view. Without it the view would have to directly manipulate the model, or vice versa, which would be bad. But the basic problem with MVC is that it compels the developer to conflate the database code with the GUI development. That is, the model and the view are written in the same language and then the GUI result is spat out to the client who consumes it. All this is very Web 1.0 by design.

Web 2.0 and what it changed

Web 2.0 changed all that by turning the pages themselves into the web application. Without the GUI being part of the server-side code, web-applications are simply services that supply and consume data. And that data is, increasingly, being sent and received in JSON format. What need of the controller now? The funny thing is, the MVC paradigm was revised to cope with Web 2.0 as well. A sort of "web-incunable" – an incunable being a 15th century book that aped the design of manuscript books in type. In the same way, MVC is a kind of desktop application being aped in web applications – trying to do all work in a single place, when the separation of the model and the view is already implied by the Web 2.0 application model.

15th Century print incunable (manuscript lookalike)

Doing everything in one place using whatever framework we choose compels us to handle GUI-related stuff (e.g. composing HTML code and ensuring that it calls the correct Javascript functions etc) on the server. And that means that the framework will be very complex, and all those beans, taglibs, JSPs, ASPs and PHPs only exist to cope with all that functionality. And when the framework gets updated, the poor programmer has to dance the tune of the framework developer. "Oh, by the way Sir, in version 2.1.1 you have to change all your calls to the database because we changed the API. Sorry.". Or worse still: "Due to a fall in demand the product Framework X has been discontinued. Users should migrate their code to Framework Y. Sorry for any inconvenience this may cause." And the poor programmer again is compelled to do a lot of work because he/she joined all their code at the hip to that once popular framework.

Doing (almost) everything on the client

Doing almost everything on the client reduces the complex server part to a mundane: "here is data, store it" or "Get me some data about X". And the language is HTTP/JSON. The GUI code can GET and POST all it needs directly without reference to a "Controller". A web-page is increasingly a template where data is filled in asynchronously as it becomes available. We can now deal with the "business logic" where it is logically decided: in the GUI, and now your web-application will become as simple as a piece of cake: simple to develop, simple to maintain. Inconsistency in the way that browsers handle Javascript and/or CSS is not quite a thing of the past but is at a sufficiently low level to make this possible. The natural separation between model and view is now enshrined in the physical separation of server and client. And MVC will eventually go the way of the incunable and the dinosaurs that preceded it.

An extinct dinosaur

Won't that slow down the client?

I hear some skeptics cry: "But that will slow down the client, which is a thing we never do." In fact MVC slows down the client all the time by rebuilding the GUI on the server whenever the data changes. Even the humblest smart phone nowadays packs a whallop in processing power. The real limit is bandwidth. Once the application is downloaded to the client it runs there as long as it is needed, fetching only the necessary resources. "Oh but we can do AJAX also with MVC." True, but done properly that will no longer be MVC. If you really want a responsive interface then the code has to execute locally. So MVC is not only overly complex, but a resource hog too. You still hear the old mantra: "Your application should still run even if Javascript is turned off." But nowadays, turning off Javascript is like turning off the Web. No one is seriously going to do that.

Tuesday, August 16, 2016

Is Drupal 8 ready for prime time?

Drupal 8 is the latest instantiation of the popular Drupal Content Management system. Although not an entirely new product, Drupal 8 represents a significant upgrade from 7, and users hoping to upgrade their modules and themes to 8 might suffer from a steep learning curve. For module development the code is now split basically into two halves: php5 style hooks are retained in the .module file but much of the code is now moved into class definitions using php7 OOP features. While the latter is nice I wonder why half the system using the old PHP5 syntax has been retained. Either one approach or the other is desired; by choosing both the developers have overly complicated module development to the point where it will appear unattractive to would-be new Drupal developers, and old ones will be tempted to stay where they are with 7.

One of the major problems is the fragility regarding the installing/uninstalling of modules. Drupal 7 was more forgiving in that respect. You could delete a module on disk and the module would disappear from the modules list. Now any such action, even an attempt to put back a deleted module, renders the entire Drupal instance unusable. Modules can then neither be enabled nor disabled. The only option is to reinstall everything from scratch. A similar situation arises frequently whenever some mistake is made in development and a package becomes broken. This kind of time-wasting is what turns developers off. What 8 lacks is simplicity. Power doesn't have to equate with complexity at all. Sure there are some nice new features in 8 but instead of a few mandatory files in the module folder we now have multi-way dependencies, 'routes', libraries, entities, controllers, interfaces, configurations and loads of stuff that should either be fully documented or left out. Any incorrect change to one of the example programs seems likewise to break the system. Unlike in 7 you don't seem able to alter an installed module by renaming properties and methods. There's too much copying-in of original files into 'installed' data in the database, which creates fragile dependencies. After a week and a half I'm calling it quits. It's just not worth the effort.

For those of you thinking that they will eventually migrate to 8, my hunch is that 8 will never make it to the big time. As Steve Ballmer used to say: 'Developers, developers, developers, developers ...' Wagging a big stick at them and telling them that they really shouldn't be doing X won't persuade them to bat on your side.

Sunday, August 7, 2016

Extract the value of a field in a json file using just bash

I had a need to extract the value of a particular string field in a JSON file. There were a lot of files and I wanted to process them all so I could use that value in a bash script:

#!/bin/bash
# arg: filename
# return: contents of file without \n
function build_file
{
    while read line
    do
        str="$str $line"
    done < $1
    echo $str
}
# first arg: filename
# second arg: field
function extract_field
{
    text=`build_file $1`
    regex='"'$2'":\s*"([^"]*)"'
    if [[ $text =~ $regex ]]; then
        echo ${BASH_REMATCH[1]}
    else
        echo "$2 not found in $1"
    fi
}
# change this value to that of your field-name
field=docid
for f in *.json; do
    echo `extract_field $f $field`
done

You run this script in a directory where there are .json files. It then prints out the value of that field (minus the quotes) or an error message. Change the "docid" line to the name of your desired field.

Sunday, July 24, 2016

Eduroam and Ubuntu 16.04

My new laptop running Ubuntu 16.04 wouldn't connect to eduroam, but it was fine connecting at home to my modem. And my old laptop connected using Ubuntu 15.10. Why? I checked my credentials and settings. They were all correct and as recommended. I read all the blog entries by show-off geeks trying to explain what worked for them though they couldn't explain why. Yes there have been some small changes to the interface of the network manager connection editing tool, but nothing substantial has changed. Before you go and bang your head against a brick wall at least check this first:

It's easy to forget when setting up a new machine that eduroam requires userid@unisite.edu (or whatever), that is, the full location including the site name because it is a global service, not a local one. That's what got me, although a comprehensible error or even log message would have helped. As is often the case, the problem lay not in the technology but in the question itself.

That didn't work for you? Oh well. The brick wall is that way ↦.