(c) 2018 by Darek Mihocka, founder, Emulators.com.
June 18 2018
Welcome back to the 40th blog post in this No Execute series, my 11+ year long rant about all things that bother me and interest me about CPUs and emulators. I am going to continue from the previous post on alternative 64-bit instruction sets such as RISC-V and ARM64, dynamic binary translation (a.k.a. "jitting" a.k.a. "dynamic optimization"), speculation about what Transmeta and nVidia might be up to, and repeating my long time wish of "emulation everywhere". Today I want to specifically continue on the topics of Atari 800 and x86 emulation and ARM64 - appropriate given that exactly 10 years ago this week I was in Beijing China where I co-presented the very "emulation everywhere" themed Bochs paper with Stanislav Shwartsman. As you will see, some pretty cool x86 emulation technology has arrived in Windows 10 on ARM64.
I am also going to try something new with this post: demo videos!
I will begin with this tongue-in-cheek teaser video, shot on location in Utah and Las Vegas, to poke fun at my three-year long writer's block since the previous post:
In all seriousness, in 2015 I was too optimistic about the pace of change, expecting technology to just transform overnight. But as far as I can tell nVidia never did develop a low-power x86-compatible device based on the Transmeta technology they'd licensed for their Denver ARM64 processor. So much for my Windows apps on Android tablets theory, although I do have to say their TK1 ARM32 and TX1 ARM64 boards are fun Ubuntu Linux boards to experiment with and I still have a theory (which I will keep to myself) as to what nVidia is really up to. Intel hasn't really followed up with much at all since their last truly great leap in architecture with "Haswell" in 2013. Later in 2015 I did custom build a new Core i7-6600 "Skylake" desktop machine which runs great but is virtually identical in features and performance to the Core i7 "Haswell" desktop machine I had built and written about in 2013. Desktop and mobile Core i5/i7 processors have been stalled at 14nm for 4 years now and at the AVX2 user-mode instruction set for going on 5 years now. AMD, as I first pointed out in 2011, has a foot in both x86 and ARM64 chips, but I don't see any consumer devices from them taking advantage of that. And when Windows 10 did come out in the summer of 2015 there was no support for Surface RT devices or much of a push for ARM. So I had this touch-capable tablet-ready ARM port upgrade of my Atari 800 emulator, Xformer 10, with no devices for people to really run it on.
I shelved Xformer 10 for the time being and after months of sitting on the couch I poked around talking to a few different tech companies. In the end I was intrigued enough by the apparent changes within the post-Ballmer Microsoft (the public embrace of Git, the open source projects, the great things I was hearing from past colleagues) that I returned there later in 2015 to resume work on an old project that I worked on at Microsoft Research. I have referred to this project before by the old codenames "Nirvana" and "iDNA", but today it is officially called "Time Travel Debugging" or "TTD". It is amazing technology that I have been involved with since 2002 that is far more powerful than traditional "stepping backwards in the debugger" reverse execution. Microsoft posted the first public preview of Time Travel Debugging 9 months ago: https://blogs.windows.com/buildingapps/2017/09/27/time-travel-debugging-now-available-windbg-preview/ and I will circle back in an upcoming post to give you a detailed look at the amazing things you can do with it.
How an exhibit at the Vintage Computer Festival 2018 in Seattle turned into a major Xformer update
I pulled the dust off Xformer 10 after I received an invitation to join long time 8-bit Atarians Bill Kendrick, Michael Glaser, and Kevin Savetz at their Atari 8-bit booth at the 2018 Vintage Computer Festival, VCF for short, which was being held here in Seattle at the Living Computer Museum. Of course I said yes since how often does one get to take their old Atari 800 hardware out to a museum and play Star Raiders all day in the name of education? :-)
Danny joined the booth as well and we all dragged out our Atari 400, 800, 1200XL, and 130XE computers, 810 and 1050 drives, and stacks of game cartridges to demo. VCF turned out to be a surprisingly fun event, just like those old Atari swap meets. I am told VCF even set a new one-day attendance record at the Living Computer Museum. Tons of people came out to look at old Radio Shack CoCo's, Unix machines, Sun pizza box servers, and my favorite, the MOnSter 6502 - a complete MOS 6502 CPU clone soldered together from discrete parts!
While Danny, Kevin, and Bill demoed games and Atari BASIC on real 1980's Atari hardware (with some help from modern flat screen monitors) I demoed Xformer running emulated Atari 800 virtual machines on a Windows tablet. With the fixes I'd already put in for Surface RT to support things like screen resolution changes, screen rotation, and touch screen support to act as a joystick fire button, I thought this is ready to ship, a quick little blog posting and a pointer to the GitHub repository of the Xformer 10 branch and we're done. Not so fast! :-)
Danny had ideas for a few improvements. He pointed out that the star field in Star Raiders was not as random on Xformer as on a real Atari 800. My bad, I didn't update the random number generator often enough for performance reasons. Remember, I originally wrote this code in 1986 to run on 8 MHz computers. It was hand-coded assembly, cycle counted, to barely squeeze out enough performance to emulate a 1 MHz Apple II or Atari 800 at normal speed. So I took a few shortcuts, like, well, not doing cycle-accurate 17-bit polynomial random number generation. It hasn't really been an issue for 30 years but when you are dealing with a purist and the emulator is off a little, it matters.
In my defense, "ST Xformer" was the first Atari 800 emulator back when I first started writing it in 1986, followed soon by my port to MS-DOS called "PC Xformer", and so needed to run on really weak hardware. The kids who waited until CPUs got much faster to start writing similar emulators had the advantage of being able to design cycle accuracy into their code from day one and burn many more cycles doing it. A typical PC today runs at over 250 times the clock speed of the 8 MHz machines I targeted, plus uses CPU cores with out-of-order pipelines and large caches. The end result is if you are writing an Apple II or Atari 800 or Commodore 64 emulator for today's PCs, you can actually burn about 1000 times the CPU power to do that emulation and still run as fast as the original 1980's era machines. And of course, LOTS of emulators do this, burning gigahertz of CPU cycles to emulate a few megahertz. That REALLY annoys me, which is one reason I have resisted the urge to add cycle accuracy to Xformer even if it meant a few games don't run correctly.
But all the Xformer code has been open source for years anyway so I invited Danny to branch off the Xformer 10 code and make his fixes but please do not slow down the emulation! A few percent slowdown, sure, but not 10x slowdown or worse.
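For readers curious what the "17-bit polynomial random number generation" mentioned above looks like in practice, here is a minimal sketch in Python of a 17-bit linear feedback shift register. This is not Xformer's code and is not verified against real Atari hardware: the tap positions (bits 0 and 5), the seed, and all function names are assumptions chosen for illustration.

```python
MASK_17 = 0x1FFFF  # seventeen 1-bits


def lfsr17_step(state: int) -> int:
    """Advance a 17-bit linear feedback shift register by one bit."""
    # Feedback is the XOR of bits 0 and 5 (assumed taps, for illustration only).
    feedback = (state ^ (state >> 5)) & 1
    # Shift right and feed the new bit in at the top, bit 16.
    return ((state >> 1) | (feedback << 16)) & MASK_17


def lfsr17_period(seed: int = MASK_17) -> int:
    """Count how many steps it takes for the register to return to its seed."""
    if not 0 < seed <= MASK_17:
        raise ValueError("seed must be a non-zero 17-bit value")
    state = lfsr17_step(seed)
    steps = 1
    while state != seed:
        state = lfsr17_step(state)
        steps += 1
    return steps


def random_byte(state: int) -> int:
    """Read the low 8 bits of the register, as a program polling it would."""
    return state & 0xFF
```

The trade-off described above falls out of this directly: a cycle-accurate emulator has to call something like lfsr17_step once per emulated clock, while an emulator that advances the register only occasionally hands the game the same byte on repeated reads, which is the kind of shortcut that can make a star field look less random.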
But then another idea occurred to us. We noticed that on my Haswell and Skylake Core i7 desktops just running a single instance of Xformer showed 0% CPU usage in Windows Task Manager. Obviously it is not saying _zero_ zero, but it meant that under 1% of a quad-core 4.4 GHz Intel processor's power was needed to emulate a full speed Atari 800. Flipping that around one could ask: how many Atari 800 virtual machines could be run on a single CPU without running slower than a real Atari 800? That automatically then becomes a benchmark to measure performance.
Task Manager's number implied that we should be able to emulate 100 Atari virtual machines at once without a slowdown. As it happens, after doing the math and realizing that a 4K display (3840x2160 pixels) can be neatly subdivided into an 11 by 9 grid of 352x240 sized Atari 800 screens, "Tiled Mode" was born! Tiled mode crams in and runs as many Atari 800 virtual machines simultaneously as the monitor can fit! In this case about a dozen tiles on the ARM64 tablet but the full 11 by 9 grid = 99 tiles on a 4K television:
Sure enough on a Core i7 machine feeding a 4K display we can in fact run 99 Atari 800 virtual machines simultaneously at normal Atari 800 speed in real time.
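The grid arithmetic behind Tiled Mode can be checked with a few lines of Python. This is only a sketch: the function name and the returned fields are hypothetical, and the post does not say how Xformer actually lays out or clips its tiles. The sketch reports both the number of whole tiles that fit and the number of tiles needed to cover the display, on the assumption that the last column may be partly clipped.

```python
def tile_grid(display_w: int, display_h: int,
              tile_w: int = 352, tile_h: int = 240) -> dict:
    """Work out how many emulator screens tile a display of the given size."""
    # Whole tiles that fit without any clipping.
    full_cols = display_w // tile_w
    full_rows = display_h // tile_h
    # Tiles needed to cover the display, counting a partly visible last
    # column or row (integer ceiling division).
    cover_cols = -(-display_w // tile_w)
    cover_rows = -(-display_h // tile_h)
    return {
        "full": (full_cols, full_rows),
        "covering": (cover_cols, cover_rows),
        "tiles": cover_cols * cover_rows,
        # Pixels left over after the whole tiles, horizontally and vertically.
        "leftover_px": (display_w - full_cols * tile_w,
                        display_h - full_rows * tile_h),
    }


grid_4k = tile_grid(3840, 2160)
```

For a 3840x2160 display the rows divide exactly (9 x 240 = 2160), while ten 352-pixel columns leave 320 pixels over, so the eleventh column in the 11 by 9 = 99 count is slightly narrower than the others under this assumption.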
But I am getting ahead of myself. I will let Danny explain his improvements in his own words in our brand new Xformer 10 demo video (click the Play button to watch):
Late in 2016 Microsoft made a bold announcement with Qualcomm introducing new devices for ARM64 that would support existing legacy x86 Win32 applications via emulation: https://www.anandtech.com/show/10889/microsoft-and-qualcomm-bring-windows-10-to-snapdragon-processors. This was a very "WOW!" moment for me, like "wow, they're doing this!" and "wow, this will be fun to experiment with!". This was the kind of announcement I had expected from nVidia a year earlier - an "emulation everywhere" device. To run existing x86 Win32 desktop apps, it means that this device would load arbitrary x86 code, not a closed Store-only device like a Surface RT, an iPad, or a typical Android tablet. This therefore implied that my existing Atari and Mac emulators would just work unmodified, allowing Atari 800 games, Atari ST apps, Apple Macintosh binaries, as well as x86 Windows applications to all run on an ARM64 device.
From an engineering point of view alone this was fascinating and a long overdue return of x86 emulation to Windows. Recall that at one time in the mid-1990's Windows NT had been ported and was running not just on x86 processors but also on MIPS, Alpha, PowerPC, and later Itanium. The NT kernel (unlike the MS-DOS based Windows kernel) was designed to be portable to any 32-bit or 64-bit processor and with technology like FX!32 and IA32-EL you could get some amount of x86 backward compatibility. So this concept did exist, but for various reasons (size, cost, speed) the market thinned and AMD and Intel's x86 chips became the sole survivors on desktop PCs by Windows 7 days.
PowerPC (which I've used since the very early days of PowerPC 601 based Macintosh computers) continued to exist on other platforms. Back in 2010 in my Part 32 post on PowerPC, I wrote about what a great architecture PowerPC was and how sad it was that Sony was removing the PowerPC Linux support from the Playstation 3, then Fedora dropped PowerPC as a first-class distro. Later of course we know Playstation and Xbox both went to x86 for their next generations, and Apple had already dropped PowerPC from Macs a few years earlier, so PowerPC for various reasons also fell out of favor.
In the preceding Part 31 post I pondered that maybe ARM might be an alternative to PowerPC, that ARM devices could be the "emulation everywhere" platform. To quote a few things I said 8 years ago:
"Apple iPad makes such a scenario even more compelling... a small thin device with mostly always-on Internet connectivity such as the iPad... to provide access to cloud instances while still having the juice and horsepower to run applications locally if so desired... QEMU could hold the key to providing a non-VT alternative to virtualization on mainstream PCs."
Back at the 2002 Consumer Electronics Show, Bill Gates showed the prototype "Mira", a wireless ARM-based tablet device that acted as a remote desktop tethered to a larger PC. The hardware industry made other attempts at such devices in various form factors afterwards. Some of you may recall my old posts in 2004 about the Sony VAIO U750P pocket Windows XP tablet that I brought back from a trip to Japan once. The U750P features a Pentium M x86 processor running the full Windows XP desktop release in a handheld touch screen device. Then by 2008 there was the wave of "netbooks" such as the Acer Aspire and the ASUS EEE PC pictured here which was a traditional keyboard-based clamshell, but just very small, great for maps and email but slow compared to a desktop:
In 2010 the iPad was really the device that hit the right form factor, inspiring an endless series of similar Android tablets and the Surface RT. I remember writing about buying the Thinkpad Android tablet 7 years ago and seeing the potential of what such a device could bring. But unfortunately the iPad, the Surface RT, the endless Android tablets even today still cannot run arbitrary code such as a Linux distro or an unlocked desktop version of Windows. You had to use their built-in apps or purchase store apps. And many of these earlier devices lacked 3G or 4G connectivity, so you had to tether to a Wifi hotspot.
But key to me: none of those devices could act as their own development environment, as they are not standalone PCs. My existing software will not just work on them. The Surface RT for example did not run Visual Studio, so you had to develop the ARM apps on another PC then copy them over - the cell phone development model basically. And from what I understand certain stores do not even permit emulators to be submitted, so trying to even develop an Atari ST or Macintosh emulator on an iPad is pretty much a non-starter.
Certainly there have been some really cool standalone unlocked devices which I show in the demo video below, such as the Intel NUC and the nVidia Jetson TX1. These devices are still not portable though, nor are they battery powered; they are merely tiny desktop machines which still require an AC charger.
So you can understand now why the Microsoft/Qualcomm announcement really got me excited. It ticks off a number of checkboxes that are appealing to a developer:
By the 2018 Consumer Electronics Show early this year it was announced that devices from HP, Lenovo, and ASUS would be available: https://www.theverge.com/2017/12/5/16737288/microsoft-windows-10-qualcomm-arm-laptops-launch
HP shipped here in the U.S. first, so I ordered an HP Envy X2 device directly from the hp.com website and that is the device I feature in the unboxing video below. I recently also placed orders for the ASUS NovaGo and Lenovo Miix devices and hope to have those in my hands for a future post this summer.
To stress test the Envy X2 I have been carrying it with me everywhere for the past two months and love it. This weekend for example I took it to the Boston Red Sox game at Safeco Field. Getting a 4G signal and doing pull commits from GitHub was no problem, no Wifi connection required.
Of the typical programs, tools, and apps that I use on my other PCs every day (meaning of course that they are x86 binaries) I can confirm that all of these run on the HP Envy X2 ARM64 device:
One thing not to miss when watching Microsoft's BUILD video (which is embedded further down on this page) and which is also confirmed here is that these ARM64 devices also now support the Windows Subsystem for Linux as have their AMD/Intel counterparts since last year. This means that you can install the latest ARM64 version of Ubuntu 18.04 and run unmodified Ubuntu binaries such as bash, ssh, apt, and the gcc compiler toolset side-by-side with your Windows applications. I have just begun to explore this feature and will follow up at a later time.
I have confirmed the claim about "all day" battery life, the ability to carry this device around all day without needing to recharge or carry a USB-C charger around. I have verified this day after day for the past few months as I pull it off the USB-C charger in the morning and carry the device around for about 12 hours per day before putting it back on the charger. Most days the charge does not drop below 50-60%. The other way I tested this was to stress the device by maxing out the CPU all day long. I loaded up Xformer 10 and ran it in tiled mode which ran multiple Atari 800 games at once, I loaded YouTube up in a browser and just let it loop through videos, loaded Outlook to pull emails in the background, launched Task Manager to confirm the CPU was pegged at 100%, left 4G and Wifi both enabled, turned off all battery saving and sleep modes, and set the brightness of the screen to a decent level. I then disconnected the charger and left the Envy X2 running all day starting at 8am. I checked on it a few times during the day to make sure all the videos and games were still playing, and by 9pm the battery had finally dropped down to below 10% charge. So figure it would have run out of juice around 10pm, that's about 14 hours of no-idle full-tilt 100% CPU usage without a charge. It really does live up to the claim.
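The "about 14 hours" figure is a straight-line extrapolation, which can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the battery started at 100%, read exactly 10% at 9pm, and drains at a constant rate, none of which was measured precisely.

```python
def estimate_runtime_hours(hours_elapsed: float,
                           start_pct: float,
                           end_pct: float) -> float:
    """Extrapolate total battery runtime assuming a constant drain rate."""
    drained = start_pct - end_pct
    if hours_elapsed <= 0 or drained <= 0:
        raise ValueError("need a positive elapsed time and a falling charge")
    pct_per_hour = drained / hours_elapsed
    # Hours needed to drain the whole starting charge at that rate.
    return start_pct / pct_per_hour


# 8am to 9pm is 13 hours; charge assumed to fall from 100% to 10%.
stress_test_hours = estimate_runtime_hours(13, 100, 10)
```

Under those assumptions the estimate comes out a little over 14 hours, which is consistent with the device running out of charge somewhere after 10pm.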
Another feature I am able to verify is the 4K HDMI video output via one of many USB-C dongles that can be plugged in to the device. As pictured here, I was able to trivially connect the HP Envy X2 to my Sony 4K television to output a second screen at 3840x2160 resolution. I was also able to trivially pair a Surface Keyboard and a Surface Dial, giving me external keyboard and scroll wheel functionality.
Setting up the HP Envy X2 in Windows 10 Pro mode with Visual Studio 2017
You hopefully understand now why I am so excited about these ARM64 devices. So with no further delay here is my exciting very over-caffeinated video of me unboxing the HP Envy X2. :-)
To summarize the video, I show a variety of earlier generation low-power devices such as the Surface RT, the nVidia Jetson TX1, the Sony VAIO U750P, the Samsung Chromebook, and several generations of Surface Pro devices as well as the Surfacebook. After unboxing the Envy X2 device I talk briefly about different 4G LTE SIM card options, such as the Sprint card that comes bundled with some devices, and the Google Project Fi card which is a T-Mobile SIM card. (Personally, Project Fi is my data plan of choice.)
Should you the reader acquire one of the HP, ASUS, or Lenovo ARM64 devices, after you unbox your device follow these steps to set it up as a Windows desktop PC and software development machine:
Step 1: Apply Windows Update
After charging the device and powering it up for the first time, you will be prompted through the Windows 10 OOBE (Out of Box Experience) which creates a local administrator on the machine. The first thing you will want to do of course is go to Windows Update and apply any Windows updates that came out while your device was in transit. As of this writing, your device should upgrade itself to Windows 10 1803, the "Spring Creators Update", also known as build 17134. To check for updates, click on the Start button, click "Settings" (the little round gear icon), click "Update & Security", and then "Check for updates". My device recently received the 1803 update, which looked like this, involving both a Windows 10 update as well as a Qualcomm firmware update:
Step 2: Switch to Windows 10 Pro mode
Next, ARM64 devices come configured in Windows 10 S mode (think "S" for "Store" or "Secure") which limits the device to only running pre-installed applications and Windows Store apps. S mode is a locked down mode of operation, not unlike Windows RT or a Chromebook, which blocks untrusted legacy code from running. This is useful in some situations such as a shared PC used by multiple family members for example.
However, if you are a developer or an experienced user needing to install legacy applications what you will want to do is switch over to Windows 10 Pro mode, the full desktop mode of Windows 10 allowing command prompts to be opened, PowerShell to run, and x86 command line applications and legacy x86 Win32 applications to run. To do this go to the Windows Store (click on the Start button then click the Store tile in the menu) and search for Windows 10 Pro, then follow the instructions to perform the free switch to Windows 10 Pro mode:
Step 3: Install Visual Studio 2017 with ARM64 support
The next step for a developer is to install Visual Studio 2017, the Windows development environment and tools, which now includes compiler tools for ARM and ARM64. Simply go to https://www.visualstudio.com and download the free Community Edition to run the installer.
At the time of this writing in mid-June the latest preview release containing ARM64 support is Visual Studio 2017 15.8 Preview 2.0, but I have found that ever since about the 15.6 releases a few months ago the ARM64 compiler was included but just not installed by default. I suggest clicking on the "Individual components" list and manually verifying that the ARM64 compiler tools and SDK libraries options are checked. I also recommend selecting the "Git for Windows" option to install Git source control command line tools.
Your ARM64 Windows 10 device is now ready to run x86 desktop applications and run the Visual Studio development tools.
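As a quick sanity check after the three steps above, the build number from Step 1 can be confirmed from a script. The following Python sketch assumes a Python interpreter has been installed on the device, which is not part of the steps above; the function names are hypothetical, and 17134 is the build number quoted in Step 1.

```python
import sys
from typing import Optional

SPRING_CREATORS_UPDATE_BUILD = 17134  # Windows 10 1803, as quoted in Step 1


def has_1803_or_later(build_number: int) -> bool:
    """Return True if the build is the Spring Creators Update or newer."""
    return build_number >= SPRING_CREATORS_UPDATE_BUILD


def current_windows_build() -> Optional[int]:
    """Return the running Windows build number, or None on other systems."""
    if sys.platform != "win32":
        return None
    # sys.getwindowsversion() is only available on Windows.
    return sys.getwindowsversion().build


build = current_windows_build()
if build is not None:
    print("Build", build, "- 1803 or later:", has_1803_or_later(build))
```

On anything other than Windows the script simply prints nothing, so it is safe to keep in a cross-platform setup script.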
Microsoft's BUILD 2018 Presentation
For us software developer types there is additional information about these devices revealed at Microsoft's Windows on ARM talk at BUILD 2018 held recently in May 2018 here in Seattle. It gives a technical overview of not only the device hardware but how Microsoft implemented the x86 emulation capabilities and how the x86 and ARM64 parts all fit together:
Among the key points to take away from this talk:
So unlike the traditional approach of just emulating a full Windows VM on top of QEMU or VirtualPC, Windows 10 on ARM64 is a hybrid mix of a lot of native ARM/ARM64 code with a bit of emulated x86 legacy code. This results in much faster performance than having to emulate kernel mode and emulate hardware, as was the case with VirtualPC for Mac and QEMU when I discussed that scenario many years ago.
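One way to see this hybrid arrangement from inside a program is to compare the architecture the process was built for with the architecture of the machine underneath it. The Python sketch below reads the PROCESSOR_ARCHITECTURE and PROCESSOR_ARCHITEW6432 environment variables that Windows sets for 32-bit processes running on a 64-bit system. That these report "x86" and "ARM64" respectively for an emulated x86 process on these devices is my assumption, not something confirmed in the talk, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def describe_process(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report the process architecture and the native machine architecture."""
    env = os.environ if env is None else env
    # Architecture this process was built for.
    process_arch = env.get("PROCESSOR_ARCHITECTURE", "unknown").upper()
    # Only set for a 32-bit process on a 64-bit system; otherwise the
    # process is assumed to be running natively.
    native_arch = env.get("PROCESSOR_ARCHITEW6432", process_arch).upper()
    return {
        "process": process_arch,
        "native": native_arch,
        "emulated_x86_on_arm64": process_arch == "X86" and native_arch == "ARM64",
    }


# Hypothetical environment of an x86 program running on an ARM64 device.
example = describe_process({
    "PROCESSOR_ARCHITECTURE": "x86",
    "PROCESSOR_ARCHITEW6432": "ARM64",
})
```

Passing the environment in as a parameter keeps the function easy to try on any machine; calling describe_process() with no argument inspects the real environment of the running process.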
My own benchmarking and analysis so far shows the ARM64 "big cores" which peak at 2.6 GHz are very comparable in performance to the 3rd generation Core i5 "Ivy Bridge" processors. Clock for clock the ARM64 and 64-bit Ivy Bridge are practically identical in performance despite the vast vast difference in design. A brand new Intel Core i7 based laptop costing more money will still outperform ARM64 by sheer brute force of higher clock speed. At about this same price level ($699 for the ASUS NovaGo to $999 for this HP Envy X2 that I purchased), the performance I have measured does outperform a similarly priced Surface 3 (which is based on Intel's much slower Atom mobile CPU) and in many cases delivers performance similar to the Ivy Bridge era Surface Pro 2.
Coming next... Building your first native ARM64 program
I will follow up next with a deeper dive into ARM64 architecture, how ARM and ARM64 instruction sets differ from x86 and x64, and show you how to port an existing x86 Windows app to ARM64. This latest Visual Studio 2017 15.8 release allows the ARM64 devices to target three flavors of CPU architectures which can be compiled and run directly on the device: x86, ARM, and ARM64. The Atari 800 emulator Xformer 10 just discussed turned out to be a good software project to use as a test case for compiling to these various architectures. I will use our Xformer 10 source code branch in GitHub as the example case so you will all be able to follow along with us and build your own private versions of Xformer 10, and then after that your own apps.
Until then, if you have any questions or comments (specifically what did you think of the demo videos and were they more useful than just my usual text blogs?) you can of course email me at firstname.lastname@example.org or message me via my Facebook page at https://www.facebook.com/darekmihocka/