Monday, December 12, 2016

Why MVC will die out like the incunable and the dinosaur

There is hardly a web-framework around these days that doesn't base itself on model-view-controller, a programming paradigm as old as the hills of software engineering. It was invented at Xerox PARC in the '70s. Back then, when dinosaurs roamed those digital hills, MVC was a useful abstraction that simplified the development of desktop applications. With the invention of the Web, MVC was quickly ported to many frameworks that advocated doing everything on the server to overcome the inconsistency in Web-browsers' implementations of Javascript and CSS. So Java frameworks like Spring and PHP frameworks like Zend adopted the MVC idea and made it core.

But what is MVC exactly? "Model" is clear enough: that's the database and all interactions with it. "View" is clearly the GUI, which in Web applications is HTML/CSS/Javascript. "Controller", though, is less clear. It's the glue that arbitrates interactions between the model and the view. Without it the view would have to manipulate the model directly, or vice versa, which would be bad. But the basic problem with MVC is that it compels the developer to conflate the database code with the GUI development. That is, the model and the view are written in the same language, and the GUI result is then spat out to the client, who consumes it. All this is very Web 1.0 by design.

Web 2.0 and what it changed

Web 2.0 changed all that by turning the pages themselves into the web application. Without the GUI being part of the server-side code, web-applications are simply services that supply and consume data. And that data is, increasingly, being sent and received in JSON format. What need of the controller now? The funny thing is, the MVC paradigm was revised to cope with Web 2.0 as well – a sort of "web-incunable", an incunable being a 15th-century book that aped the design of manuscript books in type. In the same way, MVC is a kind of desktop application being aped in web applications – trying to do all the work in a single place, when the separation of the model and the view is already implied by the Web 2.0 application model.

15th Century print incunable (manuscript lookalike)

Doing everything in one place using whatever framework we choose compels us to handle GUI-related stuff (e.g. composing HTML code and ensuring that it calls the correct Javascript functions) on the server. And that means that the framework will be very complex, and all those beans, taglibs, JSPs, ASPs and PHPs only exist to cope with all that functionality. And when the framework gets updated, the poor programmer has to dance to the tune of the framework developer. "Oh, by the way Sir, in version 2.1.1 you have to change all your calls to the database because we changed the API. Sorry." Or worse still: "Due to a fall in demand the product Framework X has been discontinued. Users should migrate their code to Framework Y. Sorry for any inconvenience this may cause." And the poor programmer again is compelled to do a lot of work, because he/she joined all their code at the hip to that once-popular framework.

Doing (almost) everything on the client

Doing almost everything on the client reduces the complex server part to a mundane "here is data, store it" or "get me some data about X". And the language is HTTP/JSON. The GUI code can GET and POST all it needs directly, without reference to a "Controller". A web-page is increasingly a template where data is filled in asynchronously as it becomes available. We can now deal with the "business logic" where it is logically decided: in the GUI. And now your web-application becomes a piece of cake: simple to develop, simple to maintain. Inconsistency in the way that browsers handle Javascript and/or CSS is not quite a thing of the past, but it is at a sufficiently low level to make this possible. The natural separation between model and view is now enshrined in the physical separation of server and client. And MVC will eventually go the way of the incunable and the dinosaurs that preceded it.
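The whole client-server conversation then reduces to a couple of lines. Here is a sketch (the /api/books endpoint and the field names are invented for illustration):

```javascript
// Fetch one record from the server and hand back the parsed JSON.
// No "controller" mediates: the GUI talks to the data service directly.
function loadBook(id) {
    return fetch("/api/books/" + id)
        .then(function (response) { return response.json(); });
}

// Fill in the template asynchronously as the data becomes available:
// loadBook(42).then(function (book) { titleElement.textContent = book.title; });
```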

An extinct dinosaur

Won't that slow down the client?

I hear some skeptics cry: "But that will slow down the client, which is a thing we never do." In fact MVC slows down the client all the time, by rebuilding the GUI on the server whenever the data changes. Even the humblest smartphone nowadays packs a wallop in processing power. The real limit is bandwidth. Once the application is downloaded to the client it runs there as long as it is needed, fetching only the necessary resources. "Oh, but we can do AJAX also with MVC." True, but done properly that will no longer be MVC. If you really want a responsive interface then the code has to execute locally. So MVC is not only overly complex, but a resource hog too. You still hear the old mantra: "Your application should still run even if Javascript is turned off." But nowadays, turning off Javascript is like turning off the Web. No one is seriously going to do that.

Tuesday, August 16, 2016

Is Drupal 8 ready for prime time?

Drupal 8 is the latest incarnation of the popular Drupal content management system. Although not an entirely new product, Drupal 8 represents a significant upgrade from 7, and users hoping to upgrade their modules and themes to 8 face a steep learning curve. For module development the code is now split basically into two halves: PHP 5-style hooks are retained in the .module file, but much of the code has moved into class definitions using OOP features. While the latter is nice, I wonder why half of the system has been retained in the old hook style. Either one approach or the other would have been preferable; by choosing both, the developers have overcomplicated module development to the point where it will appear unattractive to would-be new Drupal developers, and old ones will be tempted to stay where they are with 7.

One of the major problems is the fragility regarding the installing/uninstalling of modules. Drupal 7 was more forgiving in that respect. You could delete a module on disk and the module would disappear from the modules list. Now any such action, even an attempt to put back a deleted module, renders the entire Drupal instance unusable. Modules can then neither be enabled nor disabled. The only option is to reinstall everything from scratch. A similar situation arises frequently whenever some mistake is made in development and a package becomes broken. This kind of time-wasting is what turns developers off. What 8 lacks is simplicity. Power doesn't have to equate with complexity at all. Sure there are some nice new features in 8 but instead of a few mandatory files in the module folder we now have multi-way dependencies, 'routes', libraries, entities, controllers, interfaces, configurations and loads of stuff that should either be fully documented or left out. Any incorrect change to one of the example programs seems likewise to break the system. Unlike in 7 you don't seem able to alter an installed module by renaming properties and methods. There's too much copying-in of original files into 'installed' data in the database, which creates fragile dependencies. After a week and a half I'm calling it quits. It's just not worth the effort.

For those of you thinking that you will eventually migrate to 8, my hunch is that 8 will never make it to the big time. As Steve Ballmer used to say: 'Developers, developers, developers, developers ...' Wagging a big stick at them and telling them that they really shouldn't be doing X won't persuade them to bat on your side.

Sunday, August 7, 2016

Extract the value of a field in a json file using just bash

I had a need to extract the value of a particular string field in a JSON file. There were a lot of files and I wanted to process them all so I could use that value in a bash script:

# arg: filename
# return: contents of file without \n
function build_file {
    local str=""
    while read -r line; do
        str="$str $line"
    done < "$1"
    echo "$str"
}
# first arg: filename
# second arg: field-name
function extract_field {
    local text=$(build_file "$1")
    local regex="\"$2\"[[:space:]]*:[[:space:]]*\"([^\"]*)\""
    if [[ $text =~ $regex ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        echo "$2 not found in $1"
    fi
}
# change this value to that of your field-name
field="docid"
for f in *.json; do
    extract_field "$f" "$field"
done

You run this script in a directory where there are .json files. It then prints out the value of that field (minus the quotes) or an error message. Change the "docid" line to the name of your desired field.

Sunday, July 24, 2016

Eduroam and Ubuntu 16.04

My new laptop running Ubuntu 16.04 wouldn't connect to eduroam, but it was fine connecting at home to my modem. And my old laptop connected using Ubuntu 15.10. Why? I checked my credentials and settings. They were all correct and as recommended. I read all the blog entries by show-off geeks trying to explain what worked for them though they couldn't explain why. Yes there have been some small changes to the interface of the network manager connection editing tool, but nothing substantial has changed. Before you go and bang your head against a brick wall at least check this first:

It's easy to forget when setting up a new machine that eduroam requires the full username including the site name (or whatever your institution calls its domain), because it is a global service, not a local one. That's what got me, although a comprehensible error or even a log message would have helped. As is often the case, the problem lay not in the technology but in the question itself.

That didn't work for you? Oh well. The brick wall is that way ↦.

Friday, July 22, 2016

Ubuntu 16.04 on HP Envy 13 d107tu

One of the problems with buying a new Linux laptop is that the latest machines have not yet been tested for Linux compatibility. Will the wifi card and the bluetooth work? Do the function keys do their respective jobs? Do suspend and hibernate work correctly (and waking up from them)? Does the backlit keyboard work? Buying always involves risk, so this time I thought I would document the experience fully to benefit others, as my way of thanking those who have done the same for me.

Choosing a laptop for Linux

Installing Linux (in my case Ubuntu) on a laptop begins with the choice of model and make. You need to analyse your requirements. What can you afford? What features are must-haves? etc. In my case I whittled down my list to a set of requirements I was not prepared to compromise on:

  1. Low power usage. My last laptop spent all the time running the fan at full blast. It began to smell and smoke and was very loud. And yet the CPU usage stayed at around 10%. What it would do at close to 100% I have no idea. The newest laptop models sport the Intel 6th generation chips that run on as little as 15W. The whole laptop can run on just 45W. I think that is cool, in every sense of the word. One of the drawbacks of high power is the way it goes through the battery so quickly. At 120W it will last around 1-2 hours. This means that it quickly reaches the end of its lifespan of 400 recharge cycles in around 100 days. 45W and 10 hours puts a lot less stress on the battery.
  2. A backlit keyboard. This is not a luxury for those of us who wake up in the middle of the night and try to catch up on email from overseas, which I do all the time. Or watch the cricket. I hate turning on the room light at 2:00AM to do that. Laptop manufacturers mostly take the view that a backlight is a luxury, even though it only costs them $5 to install. ASUS for example, have a nice range of elegant laptops that would suit my purposes but the mid-range machines lack a backlight. ASUS designers take note! Apple has made a lot of money from this observation.
  3. A FHD (1920x1080) screen. I spend too much time in Netbeans IDE to settle for anything less. HD has insufficient pixels to allow me to show the subwindows like projects, files, variables, debug and the source code all at the same time. I realise that a lot of ordinary users only need HD (1366x768) but FHD is becoming the new baseline. An IPS screen to allow better viewing angles would also be preferable but is not essential.
  4. Compatibility with Linux. I don't have the time to install obscure drivers and maintain them every time I update the kernel. So I'd really like as much as possible to work out of the box.
  5. A solid-state drive. It is so much faster to boot up and consumes less power that I couldn't bear going back to a boring Winchester type drive. And I only use around 64GB, so 128GB would be enough, and 256GB would be perfect.

This seemed a simple list but very few laptops under AUS$2000 fit the bill. The Dell XPS 13 had everything except that it has a fan, and a noisy one at that. The ASUS Zenbook UX303LA was nice but lacked a backlit keyboard, and reportedly didn't support Linux well. The Toshiba Tecra Z40-C had everything too, but it was not available in my country (Australia) yet, or not with a backlit keyboard. The Apple 13 inch Macbook Air's screen lacked FHD and had a weak processor. So I turned to HP and was surprised by the specifications of their latest range. I selected the HP Envy 13-d107TU, which I had delivered (almost) to my door for AUS$1550.

What's in the box

The box is Spartan. Inside I found only two hardware items: a tiny power adaptor and its cable, and the laptop. The left hand side has a wide SD card slot and a single USB port, plus the combination speaker/microphone jack and a security lock point. The right hand side has HDMI, power and two more USBs, and a power-on light. The power jack goes in firmly and deeply, reassuring me it will not break easily. The power cable is thick and sturdy, with strain relief at each end (Apple take note!). The case is made of aluminium in two pieces that fit closely together, with the seam only visible on the underside. The feet are reduced to two large rubber feet at the back and a long one at the front. Hopefully this design will prevent them falling off. The screen hinges are recessed and I'm not sure how robust this design is. My last two laptops suffered from hinge break, and so did my son's Samsung.

The screen is glossy, although the technical description said "anti-glare". I guess they mean an anti-glare coating on an otherwise glossy surface. I prefer matte screens. The screen is set in by about a millimetre so that when closed it doesn't rub against greasy keys. When opened, the bottom of the screen serves as a prop to raise the back. These are good design touches, but the overall appearance resembles the Macbook Air a bit too closely, although that is a good design to copy.

The keyboard backlight comes on through a function key, saving the battery when it is not needed. Caps lock and sound mute have separate extra LEDs for status. The keyboard travel seems generous and the spacing of the keys is not cramped. Up and down keys are half-size and a bit awkward though. The keys seem to be made of metal with the characters as see-through grey plastic. This makes them easy to read even in full light, and the letters presumably won't ever rub off with age or solvent.

Trying Ubuntu 16.04.1

The guarantee says only that it covers faults in manufacture of hardware for one year, but not against damage of any kind. I presume they mean normal wear and tear. For example, if opening the lid like 2000 times caused the screen to fall off then that would be my fault, not a flaw in their design. But it doesn't say explicitly that installing Linux would invalidate the warranty, only that if software damaged the hardware somehow then they wouldn't cover it. I think Linux is pretty safe in that regard, and nowadays they usually blame you for whatever goes wrong within the warranty period so they can charge you regardless, so I decided to proceed with the installation.

Since the laptop boots automatically into Windows I had to restart a few times before I discovered that holding down the F10 key enters the system setup. The boot setup is hidden in the "System Configuration" menu under "Boot options". I didn't disable UEFI, just chose "USB Diskette on Key/USB Hard Disk" in the UEFI boot order and moved it up to the top of the list with the F6 key. One horror: the system configuration lists the CPU fan as "always on". Yikes! I can hardly hear it though. I rebooted, saving the configuration. That got me to the Try Ubuntu screen. I selected it and reached the Ubuntu desktop in 26 seconds.

The trackpad works well. The left click is deep and positive, and the movement of the cursor is smooth. Right clicking works, but the menu came up once when not clicking it.

The network came on without a struggle. This seems stable and I doubt it will be a problem.

The backlight on the keyboard was on at the start but it turned off and back on again when I pressed F5 (the correct key). Hooray!

Airplane mode seems to work. At least it cut me off the network but it came back on when I pressed it again.

The sound function keys work fine.

Even the print screen key works!

The menu coming up on the desktop by accident was because the default setting is "tap to click". I turned that off in Mouse and Trackpad (System Settings) and it is now fine. But now if I let my thumb hover over the bottom of the trackpad in anticipation of a click and then touch it even slightly, it scrolls or jumps and I end up opening the rubbish bin. Previously I had a trackpad with separate buttons, but now the "buttons" are concealed within the trackpad itself. I think this is just something I have to get used to.

Brightness function keys work, and there is no shortage of backlight power. The brightness setting is preserved after reboot. Screen reading at sharp angles is excellent. I begin to see the advantages of IPS.

Suspend and waking up from it works (by pressing the start button as usual). Waking up from ordinary screen sleep also works by pressing any key. The network still works on waking from suspend, even after several hours, but I had to save my network settings before it would work.

The help function key does not bring up Ubuntu help, but perhaps this is only because this is a trial version. I see no help in the desktop menu.

Now for the acid test: does it support bluetooth? I used a pair of Bluedio Turbine headphones which worked on my previous laptop (with the Intel 7260). This machine also has an Intel wifi card, so it should work. After setting the headphones into "ready to pair" mode, the laptop picked them up and allowed me to save the headphone configuration. I played a YouTube video and heard high-fidelity sound without problems.


The installation process took only a couple of minutes. Is installing Windows this quick? I think not. My recollection is that it requires you to reboot umpteen times before it will work. I decided not to install third-party graphics drivers as the ones on the trial screen worked fine. Also I chose to completely erase the hard disk. Since I never use Windows this saves me a lot of space. On reboot it gave me a quick message in the top left hand corner that I couldn't read in 1/10th of a second, but all seems fine. F1 still does not work.

One problem though is that recharging seems to be slow. After more than an hour it went from 35 to 68%. So it will take several hours for a full charge. I guess though that this reflects the low power of the charger compared to more "powerful" machines.

I can only estimate battery life, but in just one hour it went down to 83%, although that was with the keyboard backlight on most of the time. So I guess about 6-8 hours under a normal workload, not the 10 they claim. But every manufacturer does this.


This laptop works so well with Linux they should sell it preinstalled and save us all from the Microsoft tax. Compared to my previously owned Gigabyte P15-F and Clevo W550EU it supports Linux better and appears to be a more robust machine. Only time will tell if that last bit is true.

Two weeks on...

I have gotten used to the trackpad, so my initial complaints about it proved groundless. So far I have no complaints. Let's wait until something falls off, shall we?

Three months on...

The front rubber foot, which runs along the entire front of the base underneath, started to come off. Due to a design flaw, every time you pick up the laptop you must hold it by this rubber strip, which is only fixed to the body by some feeble contact tape. I had to re-glue it on with superglue and it now seems fine. Also the right hinge is getting a bit creaky and the fan now makes a tiny bit more noise. It's still barely audible, but I know these are both things that can later go wrong.

Also the network drops out sometimes when I wake it from sleep. It never used to do this so it appears to be a Linux problem. I have to reboot to reconnect to wifi. It's a bit annoying but not crucial.

Friday, June 10, 2016

Copy to clipboard in javascript

A common requirement in Web applications is to copy some text to the clipboard. For security reasons this has traditionally been hard to do. The preferred solution is to call document.execCommand('copy') when the text of a text input or textarea is selected. This is supported in most modern browsers. Since I couldn't get this to work in jQuery, the solution here uses pure javascript instead. In cases where the browser doesn't support execCommand it will return false or raise an exception. In either case the fallback is to prompt the user to copy the text in question manually. This can be done easily on an iPad by tapping once to get "select all" and then tapping again to copy. A lot of solutions suggest "ctrl-c" but on tablets this is usually not possible.

This method creates an input field momentarily, copies and then removes it. In the case of failure it will hang around until the user clicks "OK" or "Cancel" in the prompt dialog. If that's unsightly then just style the text field to appear off-screen. For example, add to the CSS for your page: input.secret {position:absolute;left:-1000px}. Unfortunately you can't hide it completely, because then select() won't work and document.execCommand('copy') will fail silently.
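A sketch of the method described above (the function name is mine; the "secret" class matches the CSS suggestion):

```javascript
// Create a temporary input, select its contents, try execCommand('copy'),
// and fall back to a manual-copy prompt if that fails.
function copyToClipboard(text) {
    var input = document.createElement("input");
    input.setAttribute("class", "secret"); // style off-screen via CSS
    input.value = text;
    document.body.appendChild(input);
    input.select();
    var ok = false;
    try {
        ok = document.execCommand("copy");
    } catch (e) {
        ok = false;
    }
    if (!ok) {
        // fallback: ask the user to copy the selected text manually
        window.prompt("Copy the text below, then press OK", text);
    }
    document.body.removeChild(input);
    return ok;
}
```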

Friday, February 5, 2016

Why Drupal sucks

I use "sucks" here as a terminus technicus meaning "doesn't work in the way desired or expected".

Drupal "sucks" because it gets upgraded every couple of months and you have to upgrade. Now that's fine if you are using it in a business context where there are constantly paid employees to do that. But on a free or academic site, once set up it will be only occasionally updated. And when that new release comes out it effectively advertises the flaws to all the hackers of the universe: "Hey guys! You can hack Drupal sites with this neat exploit if the Drupal version is current-version minus 1". And of course they do. So you have to upgrade, and pronto, but you can't afford to. So you get hacked. Then you have to strip down your entire server because you don't know what the hackers did. And that really sucks.

Now all this would be gone if only the Drupal people would provide an automatic or at least easy upgrade path for new releases. Instead you have to take your site offline, then back up your database (why? Is the new release buggy?), then download the new server core, delete all your old stuff except the sites directory – and don't forget to cut and paste all those arcane rewrite rules from the old to the new .htaccess file – then install the new server, copy back the old sites, run the update script, take it back online and hope and pray that all your modules still work. They might not, especially that one you can't do without, whose author didn't bother to patch either. All this takes about half an hour, and you have to do it every 2 months on every damn Drupal site you maintain. So why not automate it? Is it really so hard? And don't tell me drush does all that because it only automates a couple of the easiest steps, and then you also have to worry about whether drush worked. I notice that Drupal 8 doesn't have an automatic update feature either. So that's why I think Drupal sucks. You're better off writing your own CMS. At least that way no one will know how to hack it.

mod_rewrite: redirect to another directory with additional parameters

Mod_rewrite is one of the hardest pieces of apache to get right: the documentation is awful, there is no way to test it and the syntax is arcane. All I wanted to do was redirect URLs from one directory ("olddir") to another ("newdir"), copying the existing GET parameters and adding some more. But there are a number of tricks to get it right. An additional requirement was that I wanted to add the rule to .htaccess in the CMS directory, not the root, since that's where the rules applied. The following rule worked in the "mycms" directory:

  RewriteEngine on
  RewriteRule ^olddir$ /mycms/newdir?param2=poo&param3=roo  [L,R,QSA]
  ... (other rules)

Since I was using Drupal, the <IfModule... section was already present in the existing .htaccess file, so all I needed to do was add the correct RewriteRule. To explain what I understand by this: "^olddir$" matches a URL that both starts (^) and ends ($) with "olddir", relative to the mycms directory (that's where the .htaccess file is), so only query parameters may follow. The rule then redirects to /mycms/newdir; if you don't start with "/" it will prepend the entire server path. Next, add your desired parameters, and finally the existing ones are appended via the QSA flag, which means "query string append". The "R" flag is needed because this is a redirect. The "L" flag is needed because otherwise other rules might be applied, and this is the "last" rule we need here. I'm not sure all this is correct, but it is simple and it works, so I'm sticking with it.
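To make the effect concrete, here is a hypothetical before-and-after (the URLs are invented; only the rule itself comes from above):

```apache
# request:    http://example.com/mycms/olddir?param1=foo
# redirected: http://example.com/mycms/newdir?param2=poo&param3=roo&param1=foo
#             (param1=foo is appended after the new parameters by QSA)
RewriteEngine on
RewriteRule ^olddir$ /mycms/newdir?param2=poo&param3=roo  [L,R,QSA]
```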

Tuesday, January 19, 2016

Passing parameters to javascript via drupal_add_js

Nowadays most Web programs contain a significant Javascript (or JQuery) component but content management systems are still run in PHP, Java, python etc. In the case of the popular CMS Drupal, that language is PHP, and it would be rather nice if we had a reliable mechanism for passing arguments to a javascript file so that information would be available as parameters to customise its functionality. Unfortunately there is no such mechanism. Various hacks have been proposed and used within Drupal itself, so one can implant something like <script src="myscript.js?arg1=foo&arg2=bar"></script> and expect it to work, with the aid of a script that locates the script element, strips out the arguments and then passes them in javascript to a javascript function within myscript.js. Here's an example function that does it:
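A minimal sketch of such a function (assuming the script was included as myscript.js; the name getScriptParams is mine):

```javascript
// Find this script's own <script> tag and parse the query string
// that was appended to its src attribute into a key/value map.
function getScriptParams(scriptName) {
    var params = {};
    var scripts = document.getElementsByTagName("script");
    for (var i = 0; i < scripts.length; i++) {
        var src = scripts[i].getAttribute("src") || "";
        if (src.indexOf(scriptName) !== -1 && src.indexOf("?") !== -1) {
            var query = src.substring(src.indexOf("?") + 1);
            var pairs = query.split("&");
            for (var j = 0; j < pairs.length; j++) {
                var kv = pairs[j].split("=");
                params[decodeURIComponent(kv[0])] = decodeURIComponent(kv[1] || "");
            }
        }
    }
    return params;
}
```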

Using drupal_add_js

Another technique is to install a javascript file before the page is fully loaded so it will get executed when loading is complete. For this purpose the function drupal_add_js comes into play. It has to be called within a hook_init or hook_preprocess_page function. But since parameters to javascript files are not really allowed, this function escapes any content like '&', '=' or '?' into %26, %3D, %3F, thereby turning a request for a javascript file into a request for something that simply isn't there.

A workaround I used successfully is to precede the call to drupal_add_js('myscript.js','file') with another call to drupal_add_js(...,'inline');. In this way I managed to store the parameters in browser local storage and retrieve them similarly. e.g. for the inline call I used: drupal_add_js("localStorage.setItem('module_params', 'docid=english/harpur/h080&target=tabs-content')",'inline'); And to retrieve these values a function similar to the one above works just fine:
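A sketch of such a retrieval function (the name is mine; it assumes the key=value&key=value format stored above):

```javascript
// Read the parameter string saved by the inline script out of
// localStorage and split it into key/value pairs.
function getStoredParams(key) {
    var params = {};
    var stored = localStorage.getItem(key);
    if (stored != null) {
        var pairs = stored.split("&");
        for (var i = 0; i < pairs.length; i++) {
            var kv = pairs[i].split("=");
            params[kv[0]] = kv[1] || "";
        }
    }
    return params;
}
```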

Sunday, January 10, 2016

Compressing an unsorted list of integers

You see a lot on offer for compressing sorted lists of integers, but not so much on the rather uncomfortable question of how to compress a list of unsorted numbers. So why would you want to do that? Surely all one needs to do is sort the list then compress it? The problem is that in many cases the order of the integers represents information that sorting would destroy. Here is my case in point.

Actual hit-positions in a search engine

In a search-engine you have to make a list of the documents in which a term is found. Each document is assigned an identifier. So say we had a list of documents in which the word 'dog' occurs. It might be found in 7 documents: 0,1,2,3,4,6,11. We can use an algorithm like FastPFOR because the sorted list can be converted into a list of deltas, or the differences between successive entries (0,1,1,1,1,2,5 for the list above). These will typically be much shorter numbers than the actual values. In my case an array of 100 document identifiers compressed down to just 13 integers. Cool. But what if you wanted to store the locations in those documents where the word 'dog' occurred as well? This would bloat the index considerably, since 'dog' might occur 100 times in a single document. I could sort and compress it the same way, but quite often it would be really short, maybe only one entry. Then the compression algorithm would actually increase the list size three- or four-fold, since the overhead for a compressed list adds a number of ints to the start, and – this is the real killer – one compressed list would be needed for each document the word was found in. So trying to compress it the conventional way would first, probably increase the overall index size, and second, require you to maintain a lot of compressed lists.

The solution

Ideally, we would like just one list of word-positions for each word, just as we had one list of document identifiers for each word. But such a list would have to be unsorted, because if we sorted it we would lose the information about which documents those positions refer to. Fortunately, most positions are quite small. They can't be greater than the length of the longest document the word is found in. Or if it was found in only a few documents, or always at the start, the values would be even smaller. Let's say that all the values are less than some amount like 127. (Convenient, huh?) Then the list could be stored in 8-bit integers with one 32-bit integer at the start to say how many bytes there were per integer. Or if 16 bits were needed, then we could use 2-byte ints, or 3 bytes for 24 bits etc. The worst case is when the documents are bigger than 4MB or so, when we will need 32-bit integers. But that's rare.

So the strategy is pretty simple. Scan all the numbers first to see what is the biggest (or smallest negative number) and work out how many bits we will need. I kind of cheat by rounding this up to the nearest 8 bits, but if you're interested you can refine it to have an arbitrary number of bits. But you won't gain much in compression and you will lose something in speed. Here's my Java code. It just uses ByteBuffer to build arbitrary-sized ints up to 32 bits, in 8-bit hops, and then stores the list as an array of 32-bit ints – compressed, of course. So typically what you'd expect is a 25%-50% reduction in the array size and in some cases 75%. Compressing it further by any significant amount seems to be impossible, given the near-random nature of the data.
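As an illustration of the byte-width trick, here is a sketch (in JavaScript rather than the original Java; the names are invented, and the list length is passed in rather than stored):

```javascript
// Pack a list of ints using the smallest byte-width (1-4) that holds
// the widest value; a 32-bit header records the width.
function compressInts(values) {
    var max = 0;
    for (var i = 0; i < values.length; i++) {
        var v = Math.abs(values[i]);
        if (v > max) max = v;
    }
    var bytesPer = 1;
    while (bytesPer < 4 && max >= Math.pow(2, 8 * bytesPer - 1)) bytesPer++;
    var buf = new Uint8Array(4 + values.length * bytesPer);
    buf[0] = bytesPer; // header: bytes per integer
    var pos = 4;
    for (var i = 0; i < values.length; i++) {
        for (var b = 0; b < bytesPer; b++) {
            buf[pos++] = (values[i] >> (8 * b)) & 0xff; // little-endian
        }
    }
    return buf;
}

// Unpack: rebuild each value and sign-extend it back to 32 bits.
function decompressInts(buf, count) {
    var bytesPer = buf[0];
    var out = [];
    var pos = 4;
    for (var i = 0; i < count; i++) {
        var v = 0;
        for (var b = 0; b < bytesPer; b++) {
            v |= buf[pos++] << (8 * b);
        }
        var shift = 32 - 8 * bytesPer;
        out.push((v << shift) >> shift); // sign-extend
    }
    return out;
}
```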

I offer no guarantees that this works for all cases etc. But it is freeware. Do what you like with it.