NO EXECUTE!

(c) 2008 by Darek Mihocka, founder, Emulators.com.

September 19 2008


[Part 1]  [Part 2]  [Part 3]  [Part 4]  [Part 5]  [Part 6]  [Part 7]  [Part 8]  [Part 9]  [Part 10]  [Part 11]  [Part 12]  [Part 13]  [Part 14]  [Part 15]  [Part 16]  [Part 17]  [Part 18]  [Part 19]  [Part 20]  [Part 21]  [Part 22]  [Part 23]  [Part 24]  [Part 25]  [Part 26]  [Next]  [Return to Emulators.com]

The Ever Shrinking Flash Drive

Intel shook up the flash memory market recently with its announcement that it will enter the solid-state hard disk market with large-capacity high-speed drives. Even in the few months since I began poking around with various flash technologies, prices have just continued to plummet. Most 1GB, 2GB, and 4GB USB memory sticks are now under 10 dollars. At Fry's here in Seattle they are in the discount bucket, right next to the obsolete HD-DVD movies selling for 6 dollars.

Just because they are practically free now does not mean that the "old" memory sticks are worthless. In fact, some of the two dollar memory sticks now make excellent Windows Vista ReadyBoost memory sticks. If you recall, ReadyBoost is a feature in big fat Vista to cache the virtual memory swapfile on external USB flash. When Vista first hit the market in early 2007, many USB drives did not meet the performance level to be useful for ReadyBoost (i.e. too slow, so you may as well stick with your existing hard disk based swapfile). Flash memory on the market today is almost always up to par, and manufacturers now specifically stamp a Windows Vista logo on flash drives to indicate that they are fast enough.

I recently picked up some KingMax brand USB drives for literally 2 to 5 dollars each, and can confirm that they do in fact make excellent ReadyBoost drives. They are tiny as well, barely larger than the USB connector itself, meaning they can be plugged in to the back of a PC or the side of a notebook without protruding too much, as was the case with some older flash drives which I kept breaking off the side of my notebooks by accident. The tiny KingMax USB drives shown above with their little Windows Vista logo are available on Amazon.com for literally anywhere from 2 to 5 dollars depending on the capacity.

There is another little drive worth noting. Last month in Part 22, I talked about my experiments to build a solid-state hard disk from cheap plug adapters and Compact Flash cards. You may recall this photo of my homemade CF-based drive which I plugged directly into my Athlon XP system's motherboard IDE connector:

You can barely see the CF card in the bottom middle of the photo. Well, look at that same AMD Athlon system again and squint even harder to see a newer Transcend 4GB flash drive, at the middle right of the photo between the memory DIMMs and the CD-ROM IDE cable:

The folks over at Transcend, who made some of the other flash I discussed last month, also make tiny little IDE hard disks in a form factor barely larger than the IDE connector itself! Here on my kitchen table of science, I have put a traditional Seagate desktop hard disk next to the Transcend IDE drive. Below that, you see two of the tiny little 4GB KingMax USB thumb drives:

The Transcend IDE drive pictured above is the 512MB model, which sells on CDW.com for about 20 dollars (http://www.cdw.com/shop/products/default.aspx?EDC=1124712), and the 4GB drive I show inside the computer is about 60 dollars.

I am currently trying a little experiment to see if these cheap tiny USB drives can be used as an alternative to mailing CDs and DVDs internationally. It normally costs under a dollar to burn a DVD-R, but factor in several dollars in postage and disk mailer costs to securely ship that DVD overseas. These little USB drives are smaller than most postage stamps, so I thought to myself, what if I simply tape one of these little drives to the inside of a regular letter envelope and mail it as a standard letter? Total cost: 2 dollars for the KingMax drive and 90 cents first class postage, with no hassle over customs forms or special mailers. A friend in Australia will confirm in a few days whether he received the thumb drive in one piece. :-)

You may be asking, why pay 20 dollars for a 512-megabyte solid-state drive? Well, if you have old 486 and Pentium systems lying around as I do, 512 megabytes is probably as large as or larger than the antique noisy drive you have in there now (and which you paid a few hundred dollars for!). Also, your old MS-DOS operating system likely doesn't handle larger drives (remember the 32MB and 512MB limits?). My next experiment will be to upgrade one of my old 486 machines with that 512MB flash drive and give it a go with Windows 98.

As for the 4GB Transcend drive in the AMD Athlon system, I repeated the same experiments on it as on the other solid-state drives last month: installing Windows XP from scratch and running some benchmarks. The good news: GETBLOCK reports about a 26 megabytes/second raw read speed, and HDTEST32 reports a pretty consistent write throughput of about 13 to 16 megabytes/second depending on the block size. The 60 dollar 4GB Transcend drive's performance is almost exactly comparable to that of the home-built 4GB drive I made out of the SanDisk Compact Flash, but it is certainly smaller and more convenient than the homemade solution. Consider it a cheap way to upgrade an older PC.


Summer Of 2008 - Windows Vista SP1, Visual Studio 2008 SP1, and Hyper-V Still Disappoint

Given that yesterday was Microsoft's annual employee company meeting in downtown Seattle, which screwed up traffic for the rest of us, hrmph, let me use this last posting of the summer of 2008 to vent a few issues about Microsoft's summer of 2008 offerings. To help make my point, I am giving you some free code today which I promised you earlier in the summer. Source code and Windows executables for my HDTEST32 and GETBLOCK disk performance utilities, which I used to test drive speeds, can be found here (HDTEST.ZIP) and here (GETBLOCK.ZIP). Compile with Microsoft Visual Studio 98 or later. It is left to the reader (use my gcc C++ code sample from a previous posting as a guide) to make the code portable enough to compile under gcc.

To use HDTEST32, which tests the performance of the Windows WriteFile call, you specify whether to let Windows buffer the writes in the disk cache or force them to flush directly to the hard disk being tested. Do that by specifying either U (for "unbuffered") or B (for "buffered") as the first parameter. Then specify a path to write the test file to. I generally just pass a dot "." which specifies the current directory. You can also pass a specific drive letter and path, such as D:\ or E:\, as different disk formats (FAT/FAT32/NTFS), the amount of fragmentation on that disk, and even the placement of the disk partition on the physical hard disk will affect the performance. The optional third parameter specifies the size in kilobytes of each write. Use a small value such as 16 or 64 (for 64-kilobyte blocks) to test the write throughput of typical small file operations. Use a large value such as 1024 (to specify 1-megabyte blocks) to get a better feel for the performance of large file copy operations. The fourth parameter optionally lets you spawn multiple write threads; use this on multi-core machines to see how well your system's write performance scales. Finally, the fifth parameter is a maximum file size (in gigabytes) for the test. Usually it is sufficient to leave it at the default of 1 gigabyte, but you may want to crank that up when testing a large number of threads with large write blocks.
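
There is no magic under the hood; HDTEST32 essentially times a loop of WriteFile calls. The following is a minimal sketch of the idea (my illustration for this posting, not the shipping HDTEST32 source; the file name and sizes are arbitrary). Unbuffered mode corresponds to opening the file with FILE_FLAG_NO_BUFFERING and FILE_FLAG_WRITE_THROUGH, which requires sector-aligned buffers - hence VirtualAlloc, which returns page-aligned memory:

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        const DWORD cbBlock = 64 * 1024;    // 64-kilobyte writes
        const DWORD cBlocks = 1024;         // 64 megabytes total

        // page-aligned buffer satisfies the sector alignment rules
        void *pb = VirtualAlloc(NULL, cbBlock, MEM_COMMIT, PAGE_READWRITE);

        // "U" mode: bypass the Windows disk cache entirely
        HANDLE hf = CreateFileA("hdtest.tmp", GENERIC_WRITE, 0, NULL,
            CREATE_ALWAYS, FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, NULL);
        if (pb == NULL || hf == INVALID_HANDLE_VALUE)
            return 1;

        DWORD tick0 = GetTickCount();
        for (DWORD i = 0; i < cBlocks; i++)
        {
            DWORD cb;
            if (!WriteFile(hf, pb, cbBlock, &cb, NULL))
                return 1;
        }
        DWORD ms = GetTickCount() - tick0;
        CloseHandle(hf);

        // kilobytes per millisecond approximates megabytes per second
        printf("%u MB written in %u ms, about %u MB/s\n",
            ((cbBlock >> 10) * cBlocks) >> 10, ms,
            ms ? ((cbBlock >> 10) * cBlocks) / ms : 0);
        return 0;
    }

Drop those two flags and you get buffered mode, where the writes land in the Windows disk cache first - exactly the difference being measured in the runs below.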

The following is some sample output of four runs of HDTEST32 on one of my Windows Vista SP1 based computers which uses an older mechanical IDE hard disk:

C:\Users\DarekM>hdtest32 U D:\ 16 1 20
HDTEST: writing unbuffered 16K buffers in 1 threads up to 20 GB each
1.000: 18 MB written, 18 MB/s sustained, 18 MB/S incremental
2.000: 41 MB written, 20 MB/s sustained, 23 MB/S incremental
3.000: 64 MB written, 21 MB/s sustained, 23 MB/S incremental
4.000: 94 MB written, 23 MB/s sustained, 30 MB/S incremental
5.000: 121 MB written, 24 MB/s sustained, 27 MB/S incremental
6.031: 149 MB written, 24 MB/s sustained, 27 MB/S incremental
7.031: 176 MB written, 25 MB/s sustained, 27 MB/S incremental

C:\Users\DarekM>hdtest32 B D:\ 16 1 20
HDTEST: writing buffered 16K buffers in 1 threads up to 20 GB each
1.063: 134 MB written, 126 MB/s sustained, 126 MB/S incremental
2.063: 159 MB written, 77 MB/s sustained, 25 MB/S incremental
3.063: 192 MB written, 62 MB/s sustained, 33 MB/S incremental
4.063: 219 MB written, 53 MB/s sustained, 27 MB/S incremental
5.079: 253 MB written, 49 MB/s sustained, 33 MB/S incremental
6.094: 286 MB written, 46 MB/s sustained, 32 MB/S incremental
7.094: 319 MB written, 44 MB/s sustained, 33 MB/S incremental
8.110: 351 MB written, 43 MB/s sustained, 31 MB/S incremental

C:\Users\DarekM>hdtest32 U D:\ 1024 1 20
HDTEST: writing unbuffered 1024K buffers in 1 threads up to 20 GB each
1.000: 32 MB written, 32 MB/s sustained, 32 MB/S incremental
2.015: 65 MB written, 32 MB/s sustained, 32 MB/S incremental
3.031: 99 MB written, 32 MB/s sustained, 33 MB/S incremental
4.031: 132 MB written, 32 MB/s sustained, 33 MB/S incremental
5.047: 164 MB written, 32 MB/s sustained, 31 MB/S incremental
6.047: 197 MB written, 32 MB/s sustained, 33 MB/S incremental

C:\Users\DarekM>hdtest32 B D:\ 1024 1 20
HDTEST: writing buffered 1024K buffers in 1 threads up to 20 GB each
1.063: 26 MB written, 24 MB/s sustained, 24 MB/S incremental
2.063: 49 MB written, 23 MB/s sustained, 23 MB/S incremental
3.063: 73 MB written, 23 MB/s sustained, 24 MB/S incremental
4.063: 95 MB written, 23 MB/s sustained, 22 MB/S incremental
5.078: 122 MB written, 24 MB/s sustained, 26 MB/S incremental
6.078: 142 MB written, 23 MB/s sustained, 20 MB/S incremental
7.078: 167 MB written, 23 MB/s sustained, 25 MB/S incremental
8.078: 189 MB written, 23 MB/s sustained, 22 MB/S incremental

You can actually see something quite surprising from these runs, which reminds me of what I was saying a few weeks ago about how Windows 7 really needs to take a long hard look at disk caching and possibly remove it. Disk caching is a great idea for disk drives with long latencies, such as the 10-millisecond or longer seek times and spin-up times of mechanical drives. Small 16-kilobyte writes cause enough track-to-track head movement that it makes sense to have the operating system buffer up the writes and then batch them to the hard disk as larger writes. You can see how the unbuffered writes give a pretty consistent write speed as the file grows larger. With buffering, there is an initial burst of 126 megabytes in the first second, which quickly degrades as the file gets larger. What happens is that Windows Vista fills up its available memory and can't buffer any more. By the second second of the test, the incremental write throughput drops down to almost the rate of the unbuffered writes.

If you increase the block size to one megabyte - more representative of large file copies - you can see that buffered file writes are significantly slower than unbuffered writes. The overhead of having the operating system buffer the file data, not to mention the pressure that puts on memory, delivers a good 25% slowdown (32 MB/s unbuffered versus 23 MB/s buffered) compared to simply not having the disk cache there. And this is on an old mechanical drive like the one you the reader are probably using right now. In the upcoming era of solid-state hard disks with near-instantaneous seek times, it starts to make me wonder whether Windows Vista really needs SuperFetch, and ReadyBoost, and prefetching, and disk caching, and any other caching technology that may in fact be a performance bottleneck!

Try it yourself; this disk cache slowdown effect is real. For example, why the hell does the Windows Explorer (i.e. the Windows desktop shell) use buffered file writes for large file copies? If I am copying thousands of files from one disk to another, and that file copy is on the order of gigabytes, there is virtually no point in having the operating system buffer the data in a disk cache. The data is already buffered in memory by the Windows Explorer, so this double buffering is only a tax. And you wonder why file copies are so slow in Windows Vista, even in Windows Vista Service Pack 1? The disk caching algorithms are designed for 1990's hardware!

GETBLOCK is handier for manipulating raw disk sectors, imaging physical drives, and measuring raw disk throughput. It takes four parameters: a source file (which can be either an ordinary disk file or an entire hard disk), a hexadecimal starting offset (into the file or hard disk), a hexadecimal copy size, and finally an optional fourth parameter which is a destination file to copy to. For example, using the Windows "\\.\device" notation, you can dump out the boot sector of your primary hard disk with this command:

C:\Users\DarekM>getblock \\.\PhysicalDrive0 0 200
Source file size = FFFFFFFF bytes
skipping 0 bytes

Dump of \\.\PhysicalDrive0...
00000000 33 c0 8e d0 bc 00 7c 8e c0 8e d8 be 00 7c bf 00 3.....|......|..
00000010 06 b9 00 02 fc f3 a4 50 68 1c 06 cb fb b9 04 00 .......Ph.......
00000020 bd be 07 80 7e 00 00 7c 0b 0f 85 10 01 83 c5 10 ....~..|........
00000030 e2 f1 cd 18 88 56 00 55 c6 46 11 05 c6 46 10 00 .....V.U.F...F..
00000040 b4 41 bb aa 55 cd 13 5d 72 0f 81 fb 55 aa 75 09 .A..U..]r...U.u.
00000050 f7 c1 01 00 74 03 fe 46 10 66 60 80 7e 10 00 74 ....t..F.f`.~..t
00000060 26 66 68 00 00 00 00 66 ff 76 08 68 00 00 68 00 &fh....f.v.h..h.
00000070 7c 68 01 00 68 10 00 b4 42 8a 56 00 8b f4 cd 13 |h..h...B.V.....
00000080 9f 83 c4 10 9e eb 14 b8 01 02 bb 00 7c 8a 56 00 ............|.V.
00000090 8a 76 01 8a 4e 02 8a 6e 03 cd 13 66 61 73 1e fe .v..N..n...fas..
000000a0 4e 11 0f 85 0c 00 80 7e 00 80 0f 84 8a 00 b2 80 N......~........
000000b0 eb 82 55 32 e4 8a 56 00 cd 13 5d eb 9c 81 3e fe ..U2..V...]...>.
000000c0 7d 55 aa 75 6e ff 76 00 e8 8a 00 0f 85 15 00 b0 }U.un.v.........
000000d0 d1 e6 64 e8 7f 00 b0 df e6 60 e8 78 00 b0 ff e6 ..d.⌂....`.x....
000000e0 64 e8 71 00 b8 00 bb cd 1a 66 23 c0 75 3b 66 81 d.q......f#.u;f.
000000f0 fb 54 43 50 41 75 32 81 f9 02 01 72 2c 66 68 07 .TCPAu2....r,fh.
00000100 bb 00 00 66 68 00 02 00 00 66 68 08 00 00 00 66 ...fh....fh....f
00000110 53 66 53 66 55 66 68 00 00 00 00 66 68 00 7c 00 SfSfUfh....fh.|.
00000120 00 66 61 68 00 00 07 cd 1a 5a 32 f6 ea 00 7c 00 .fah.....Z2...|.
00000130 00 cd 18 a0 b7 07 eb 08 a0 b6 07 eb 03 a0 b5 07 ................
00000140 32 e4 05 00 07 8b f0 ac 3c 00 74 fc bb 07 00 b4 2.......<.t.....
00000150 0e cd 10 eb f2 2b c9 e4 64 eb 00 24 02 e0 f8 24 .....+..d..$...$
00000160 02 c3 49 6e 76 61 6c 69 64 20 70 61 72 74 69 74 ..Invalid partit
00000170 69 6f 6e 20 74 61 62 6c 65 00 45 72 72 6f 72 20 ion table.Error
00000180 6c 6f 61 64 69 6e 67 20 6f 70 65 72 61 74 69 6e loading operatin
00000190 67 20 73 79 73 74 65 6d 00 4d 69 73 73 69 6e 67 g system.Missing
000001a0 20 6f 70 65 72 61 74 69 6e 67 20 73 79 73 74 65 operating syste
000001b0 6d 00 00 00 00 62 7a 99 ff c1 d3 e6 00 00 80 01 m....bz.........
000001c0 01 00 07 fe ff ff 3f 00 00 00 37 be e3 04 00 fe ......?...7.....
000001d0 ff ff 07 fe ff ff 00 c0 e3 04 00 68 f8 00 00 fe ...........h....
000001e0 ff ff 07 fe ff ff 00 28 dc 05 00 c8 74 03 00 00 .......(....t...
000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 aa ..............U.

Or, specify a much larger copy size, such as hex 10000000 (256 megabytes), and "nul" as the destination file, and you can very easily measure raw disk read speed:

C:\Users\DarekM>timer getblock \\.\PhysicalDrive0 0 10000000 nul
getblock \\.\PhysicalDrive0 0 10000000 nul
Source file size = FFFFFFFF bytes
skipping 0 bytes
0FF00000: writing 1048576 bytes...
Time of execution 6.959

In this example, that works out to about a 37 megabyte-per-second disk read speed: 256 megabytes divided by 6.9 seconds. You can use GETBLOCK to save and restore hard disk boot sectors, to extract portions of a binary file, or for benchmarking purposes as I've just shown.
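
Reading raw sectors the way GETBLOCK does is surprisingly little code. Here is a minimal sketch (my illustration, not the actual GETBLOCK source): CreateFile accepts the "\\.\PhysicalDrive0" device name just like an ordinary file name, as long as you run it elevated as administrator and read in whole sector multiples:

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        // raw device reads must be sector-sized and sector-aligned;
        // a page-aligned 512-byte buffer satisfies both requirements
        BYTE *pb = (BYTE *)VirtualAlloc(NULL, 512, MEM_COMMIT, PAGE_READWRITE);

        // the "\\.\" namespace opens the physical disk itself
        // (run from an elevated administrator prompt)
        HANDLE hDisk = CreateFileA("\\\\.\\PhysicalDrive0", GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL);
        if (pb == NULL || hDisk == INVALID_HANDLE_VALUE)
            return 1;

        DWORD cb;
        if (ReadFile(hDisk, pb, 512, &cb, NULL) && cb == 512)
        {
            for (int i = 0; i < 512; i += 16)   // 16 bytes per line, hex then ASCII
            {
                printf("%08x ", i);
                for (int j = 0; j < 16; j++)
                    printf("%02x ", pb[i + j]);
                for (int j = 0; j < 16; j++)
                    putchar((pb[i + j] >= 0x20 && pb[i + j] < 0x7f) ? pb[i + j] : '.');
                putchar('\n');
            }
        }
        CloseHandle(hDisk);
        return 0;
    }

Point the same kind of loop at larger offsets and sizes and you have the makings of the imaging and benchmarking uses I just described.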

The next free tool I'm giving you is the latest build of my CPU_TEST micro-benchmarking suite, available here (CPU_TEST.ZIP). CPU_TEST is my assembly language based test harness which I've used to figure out the micro-architectural details and differences of various CPU designs. When I started developing this tool almost a decade ago, it was originally to figure out the differences between the Intel Pentium III, the Intel Pentium 4, and the AMD Athlon. I used it throughout 2000 and 2001 to fine-tune my Gemulator and SoftMac emulators to avoid the significant performance hazards of any given CPU architecture, hazards of which the Pentium 4 had plenty.

As I have added more and more tests over the years - hundreds of tests by now - it has become an indispensable tool for benchmarking not just the silicon, but also for seeing differences between different Windows operating system releases, and between different virtual machine hypervisors. For example, something that may execute relatively quickly on the native hardware, such as reading the on-chip timestamp, may balloon to hundreds or thousands of clock cycles under a virtual machine hypervisor.

Some of the simple benchmark tests I've posted in the past were actually just very distilled, stripped down versions of CPU_TEST. The version I am posting today is not the full suite of tests, but it does contain many of them, from measuring operating system latencies, to MMX and SSE multimedia instruction latencies, to basic memory read/write operations. What is significant is not any particular raw result on its own, but rather what you see when comparing two processors, say, the original Pentium 4 against the final Pentium 4 design, or the Intel Pentium III against its clone the AMD Athlon, to see what differs.

CPU_TEST is pretty easy to run. At its simplest, open up a Windows command line prompt, make sure that you have disabled any background processes such as search indexers or virus scans, and simply type CPU_TEST at the command prompt. A default set of tests will run, giving you output like you have seen here before, showing the name of the test, the raw execution time, the throughput of the test in MIPS (millions of instructions or operations per second), and then a column that displays either the number of clock cycles per operation, or the number of operations per clock cycle.
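
The core measurement technique is no secret: read the CPU's timestamp counter around a long loop of the operation under test and divide by the iteration count. Here is a minimal sketch (my simplified illustration, not the actual CPU_TEST source, which calibrates and subtracts loop overhead far more carefully):

    #include <intrin.h>     // __rdtsc() intrinsic in Visual C++ 2005 and later
    #include <stdio.h>

    int main()
    {
        const int cIter = 10000000;
        volatile int x = 0;

        // time a dependent load/increment/store, akin to a "memory" test
        unsigned __int64 t0 = __rdtsc();
        for (int i = 0; i < cIter; i++)
            x = x + 1;
        unsigned __int64 t1 = __rdtsc();
        printf("%.2f cycles per memory increment\n", (double)(t1 - t0) / cIter);

        // now time RDTSC itself - the operation that balloons under hypervisors
        unsigned __int64 sum = 0;
        t0 = __rdtsc();
        for (int i = 0; i < cIter; i++)
            sum += __rdtsc();           // keep the result live so it isn't optimized away
        t1 = __rdtsc();
        printf("%.2f cycles per RDTSC (checksum %I64u)\n",
            (double)(t1 - t0) / cIter, sum);
        return 0;
    }

The second loop is exactly the kind of test that exposes a hypervisor: tens of cycles per RDTSC natively can balloon into the thousands when the hypervisor intercepts the instruction.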

A snippet of sample output of CPU_TEST will look like this:

C:\Users\DarekM>cpu_test
CPU Perf Tester by Darek Mihocka. Built Sep 19 2008
x86 native version
calibrating... : 38 ms, 395 ps/instr, 2526 MIPS, 1.0 IPC

Measuring using a clock speed of 2526 MHz
Hardware performance frequency = 3 MHz
On-chip cycles clock frequency = 2526 MHz
GetTickCount elapsed time (ms) = 1000 ms

Simple tests of integer and memory operations.
Ideally, MIPS should equal the clock speed of your CPU.

test 1 integer   : 19 ms, 197 ps/instr, 5052 MIPS, 2.0 IPC
test 1 address   : 19 ms, 197 ps/instr, 5052 MIPS, 2.0 IPC
test 1 memory mx : 38 ms, 395 ps/instr, 2526 MIPS, 1.0 IPC
test 1 memory sr : 76 ms, 791 ps/instr, 1263 MIPS, 2.0 clk

If you know your host CPU's clock frequency, you may use the /MHz switch to get a more accurate reading. The /clk switch normalizes everything in terms of clock cycles per operation. If your CPU is a later-model 486 or anything Pentium class and newer, use the /rdtsc switch to enable slightly more accurate timing. The /all switch activates all tests. So for example, on my 2.53 GHz Pentium 4 machine, I used this command line:

C:\Users\DarekM> cpu_test /clk /rdtsc /MHz 2530 /all >p4_253.txt

to generate this output file. I know the test names are probably cryptic, so if you want to know specifics, please email me. I will explain some of these tests in a future posting when I compare in detail the characteristic differences between the Pentium 4, the Pentium III / Core 2, AMD Athlon/Opteron/Phenom, and the new Intel Atom processors.

One easy test you AMD fans can try is to run CPU_TEST on an AMD Athlon64 and then on the new AMD Phenom. You'll see that they are for all intents and purposes the same chip. Similarly, compare the Athlon64 or the Opteron to an older Athlon XP. Again, virtually the same chip with almost identical clock cycle timings. AMD's basic CPU architecture has not changed much since the original Athlon in 1999. However, repeat the same comparison between, say, a Pentium 4 and a Core 2, or a Core 2 and an Atom, and the differences show up like night and day. Finding these differences, understanding why they happen, and then writing code which avoids the slow cases is the key to writing portable yet efficient code.

Hyper-V. Besides the fact that the original release of Hyper-V in June is little more than a cleaned-up release of Virtual Server, there are still performance, compatibility, and documentation issues:

  • Hyper-V cannot import existing virtual machines from Virtual PC, from VMware, from Xen, or from Virtual Server. You are back to square one, setting up brand new virtual machines all over again. How many freaking times will this be necessary? With open source products, I can share virtual machine state from one product to another, such as between Bochs and QEMU. I use the exact same disk images on both, and I use the same disk images whether I'm running on my Pentium 4, Core 2, or AMD Phenom systems. Microsoft can't even migrate a virtual machine between its own products!
  • As is the fatal design flaw of most VT-based hypervisors, including VMware Workstation and Virtual PC, Hyper-V can't migrate a virtual machine between Intel and AMD hosts. I tried it; so much for the "live migration" hype. I installed Windows Server 2008 on my recent quad-core Core 2 home-built system and on my recent quad-core AMD Phenom home-built system. Same OS, same clock speed (2.4 GHz), same number of cores (4), same amount of RAM (8GB DDR2), same Seagate SATA hard disks, same nVidia PCIe video cards. Virtually identical systems, except one is based on the Intel Core 2, the other on the AMD Phenom. The flaw, which I've harped on for the past year, is that VT-based hypervisors do not truly isolate the guest virtual machine from the host hardware. The guest can "see" that it is running on Intel hardware as opposed to AMD hardware; it can "see" the clock speed, the MMX/SSE capabilities, and other implementation-specific details which violate the very definition of a virtual machine.
  • Another design flaw carried over from Virtual PC and Virtual Server is that Hyper-V locks down host physical memory corresponding to guest RAM. This is a common shortcut used by most virtual machine hypervisors to simplify their implementation, but it is completely unnecessary. On my quad-core machines that have 8 gigabytes of RAM, realistically I am able to run three 2-gigabyte Windows XP guests. Anything more and my host Windows 2008 machine starts to thrash, or Hyper-V simply fails to launch another guest. There is no technical reason why guest RAM, like all guest resources, isn't virtualized. Hyper-V is not so much acting as a virtual machine monitor as it is serving to partition RAM. VMware, for example, supports what is called memory overcommit, a feature that even the open source Xen hypervisor on Linux is now implementing. Why is Hyper-V years behind the curve?
  • The latencies in Hyper-V are far worse than even those of Virtual PC 2007 or Virtual Server 2005. The latencies of page faults and system calls are higher than I've ever seen them, to the tune of thousands of extra clock cycles per event. A hypervisor should be a very low-latency, low-overhead layer, yet Hyper-V trends in the opposite direction. Hyper-V now makes the cost of most system calls, page faults, ring transitions and such about 4 to 6 times slower than native. It's getting down into QEMU and Bochs interpreted x86 territory at that rate. The performance penalties on such operations are the worst of any of the recent hypervisors I've benchmarked.
  • An embarrassing aspect of the Hyper-V launch is that for several weeks after launch, Microsoft had not even posted the Hyper-V documentation. The links in the built-in Help pointed to online placeholder pages. This gives me the impression that Hyper-V was rushed out the door unfinished. From what I can tell, three months later that documentation is finally up.
  • In a slimy game that Microsoft plays, one is forced to install Windows Server 2008 in order to run the "free" Hyper-V, when Server 2008 and Vista SP1 are really one and the same thing. Requiring the Windows Server 2008 host is a marketing restriction more than any kind of technical restriction.

Try it yourself, assuming you shell out the bucks for Windows Server 2008. Create a Windows XP virtual machine under Hyper-V, then run my various benchmark tools, specifically CPU_TEST, and compare the results to what you get when you run Windows XP natively on the same system. You will see how page faults jump from about 5000 cycles to about 28000 cycles, how file mapping operations jump from about 8000 cycles to 32000 cycles, and how basic Windows system calls such as PeekMessage() jump from about 1000 cycles to about 5000 cycles. The clock cycle penalty imposed by the Hyper-V hypervisor is just ridiculous and unnecessary. If it is soooooo integrated into the Windows Server 2008 kernel as Microsoft would have you believe (http://www.microsoft.com/windowsserver2008/en/us/hyperv-features.aspx), then how can Hyper-V's overhead be worse than that of Virtual PC 2007, a standalone virtual machine product that even runs on Windows XP and Vista? What is the point of Hyper-V then?
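
The page fault penalty in particular is easy to reproduce without any of my tools. The first write to a freshly committed page triggers a demand-zero page fault into the kernel - and inside a Hyper-V guest, an extra bounce through the hypervisor on top of that. A minimal sketch of the measurement (my illustration, not the actual CPU_TEST page fault test):

    #include <windows.h>
    #include <intrin.h>
    #include <stdio.h>

    int main()
    {
        const int cPages = 4096;        // touch 16 MB worth of 4K pages
        BYTE *pb = (BYTE *)VirtualAlloc(NULL, cPages * 4096,
            MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (pb == NULL)
            return 1;

        unsigned __int64 t0 = __rdtsc();
        for (int i = 0; i < cPages; i++)
            pb[i * 4096] = 1;           // first touch faults each page in
        unsigned __int64 t1 = __rdtsc();

        printf("about %I64u cycles per demand-zero page fault\n",
            (t1 - t0) / cPages);
        return 0;
    }

Run it natively, then inside a Hyper-V guest on the same hardware, and the difference per page is pure hypervisor overhead.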

As I pointed out in July, Hyper-V is so technically flawed that VMware never needed to sweat or lose a night's sleep over it, let alone panic and fire its CEO. Microsoft is firmly still playing catch-up in the virtualization space.

Finally, Service Pack 1 of Visual Studio 2008 was released. I installed it, recompiled Bochs with it, tried to use the fancy PGO (Profile Guided Optimization) feature, and ran into exactly the same issue I ran into a year ago. The optimizer de-virtualizes indirect function calls to such an extreme that the "optimized" code for the Bochs inner CPU loop looks like one big giant set of "if/else" statements that nest dozens of levels deep. The original code is basically a 5-byte indirect function call to an x86 opcode handler at the heart of the Bochs CPU dispatch loop:

    ; 179 : BX_CPU_CALL_METHOD(i->execute, (i));

    000d8 8b cb    mov ecx, ebx
    000da ff 53 04 call DWORD PTR [ebx+4]

And here is the problem: the profile-guided "optimized" code breaks that single call down into a ridiculous cascade of if/else/if/else type code that takes longer and generates more branch mispredictions than the original loop. The optimized code is so large I had to put it in a separate file for you; it bloats the dispatch code from 5 bytes to about 400 bytes. Microsoft blew it. I don't care that C# now has generics or whatever. Interns use C#. Real developers use C or C++. I want excellent code generation for my C++ code, and I expect "optimized" code to run faster than un-optimized code. As it is, the "optimized" build of Bochs runs about 20% slower due to all this ridiculous inline expansion of the indirect function calls.
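
In C++ terms, here is roughly what that transformation amounts to - a hedged sketch with stand-in handler names of my own invention (the real Bochs handlers are BX_CPU_C member functions, and the real cascade covers far more cases):

    #include <stdio.h>

    struct bxInstruction_c;
    typedef void (*BxExecutePtr)(bxInstruction_c *);
    struct bxInstruction_c { BxExecutePtr execute; /* decoded operands... */ };

    // stand-in opcode handlers; the real ones do actual CPU work
    void ADD_EdGd(bxInstruction_c *) { puts("ADD handler"); }
    void MOV_EdGd(bxInstruction_c *) { puts("MOV handler"); }

    void dispatch_original(bxInstruction_c *i)
    {
        i->execute(i);      // the one short indirect call
    }

    void dispatch_pgo(bxInstruction_c *i)
    {
        // effectively what the profile-guided build emits instead:
        if (i->execute == ADD_EdGd)
            ADD_EdGd(i);    // body gets inlined in the real output
        else if (i->execute == MOV_EdGd)
            MOV_EdGd(i);    // ...and dozens more cases like these...
        else
            i->execute(i);  // fallback indirect call
    }

    int main()
    {
        bxInstruction_c i = { ADD_EdGd };
        dispatch_original(&i);
        dispatch_pgo(&i);
        return 0;
    }

De-virtualization like this can be a win when one or two targets dominate the profile, but a CPU interpreter's dispatch profile is spread across hundreds of opcode handlers, so you pay for the whole cascade of compares and mispredicted branches on nearly every instruction.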

Back to the drawing board, Microsoft. Discontinue the sale of Windows Vista and just give people the better Windows XP they are asking for; Windows XP SP3 has become a bit of a pig as it requires at least an extra gigabyte over XP SP2, ouch. Discontinue the sale of Visual Studio 2008, because the generated code quality is an embarrassment compared to what I was getting from Visual Studio 98 and Visual Studio 2003 years ago; gcc, a free C/C++ compiler, runs circles around the current Visual Studio 2008, and as of Apple's latest Xcode 3.1 release it supports the LLVM virtual bytecode, which could derail both Java and .NET. And of course, Hyper-V is just the same old Virtual PC crap recycled in a manner to sell more Windows Server licenses; just write a large check to VMware and be done with it. From what I can tell, all the smart people at Microsoft have taken shelter at Xbox these days.


And Justice For All?

AJFA, the title of the fine Metallica album released 20 years ago when OS/2 and the Atari ST were king. Linux and Mac OS X may still be behind Windows in market share, but there is some justice in this world to be found. Namely, this year's top live acts aren't idiotic bubble gum garbage; instead, this year we have AC/DC, Metallica, Motorhead, Def Leppard, Iron Maiden, Slayer, and Judas Priest all touring. And particularly, AC/DC and Metallica will be playing here in Seattle on back-to-back nights in about two months. Unless you have been in a coma this week (not likely following the liquefaction of your stock portfolio), you are probably aware that Metallica's latest release - Death Magnetic - is currently #1 on the charts, EVERYWHERE. Ah, so now my cross-country drive to Ozzfest last month to see Metallica doesn't seem so crazy after all, huh? I can't stop listening to the new album - literally since 9am last Friday when it went on sale at the Virgin store in San Francisco (yes, I was on yet another road trip, this time to the CTIA wireless conference at Moscone Center). I listen to it ten, fifteen, twenty times a day. It is amazing. The guitar solo is back! And Napster is now reduced to merely yet another brand at Best Buy, ha! The metal hammer of justice crushes all.

Unfortunately, justice does not prevail in more serious matters. It is absolutely stunning to me how the U.S. government can afford a trillion dollars to invade Iraq to depose one man, and has now found another trillion dollars under its seat cushions to bail out the bankers, stock brokers, gamblers, and bottom feeders who used to bait hard working people with 1% mortgage loans. I even used to get mortgage offers at my UPS Store mailbox! After 9/11, America went hog wild with zero percent financing offers to try to stimulate its economy. The American public got so used to free money that even staunchly conservative banks such as Seattle's Washington Mutual got greedy and started giving out money to people who had no business taking that money or "buying" homes that they could not afford. Now WaMu is on the rocks.

Everyone is at fault here - the U.S. government for allowing loose lending in the first place, the American public who took that free money, and the fat cat bankers who financed it with virtual money that didn't exist. It is unbelievable to me that, as in the 1980's S&L collapse, the U.S. government chooses to bail out the multi-millionaires. So now the U.S. government owns foreclosed homes which in many cases have been looted and trashed by the now criminal "buyers" who had no business ever moving in.

For years the U.S. government has made excuses about not having a paltry few billion dollars to help the uninsured in this country. It is ok to lose your home due to illness and medical bills, but God forbid you can't make the mortgage payment on your 9,000 square foot mega-McMansion on just your 7-11 sales clerk salary. Look at China. Sure, they're building their economy on the broken backs of their peasants. But the Chinese government is also spending its money on education, on roads, on literacy, on bringing up the standard of living for millions of its people, and on putting on events such as the Olympics to attract even more money. And ironically, it is doing so mostly with America's money. Those brand new skyscrapers I saw all over Beijing three months ago were paid for by Wal-Mart's money pipeline from North America.

I'm no economist, but it would seem to me that if America had not written itself a blank check between 2002 and 2007 and spent so much of it on imported goods that many people now can't afford, China would not now be in a position to buy up what is left of America's economy. The American public really has no right to whine about any of this now. They're driving Hummers and Lexus SUVs in the age of $4/gallon gasoline, throwing money away on ring tones, and paying $4 for a cup of coffee (I raise my hand, guilty as charged on the Starbucks count). America voted for Bush. They took his free money. He sold the country. Deal with it. This trillion dollar bailout is nothing more than the completion of a trillion dollar money transfer from the U.S. to China that started years ago. One trillion dollars of free money. Nothing wrong with that, I guess. I'd have preferred that it went to Canada or my bank account. Well, congratulations to China, and I loved watching the Olympics which I indirectly helped pay for.

That said, please keep supporting my $4/cup Starbucks coffee habit. I couldn't have driven over 10000 miles across North America in the past 6 weeks without it. Go to the Starbucks Online Store, purchase a prepaid gift card, and send it to me at:

Darek Mihocka c/o Emulators
14150 N.E. 20th Street, Suite 302
Bellevue, WA 98007-3700
U.S.A.

I haven't asked for a while, but if you have comments, please email them to me by clicking on one of the two links below:
 

Darek, your postings are better than Death Magnetic itself...
 
Darek, your postings are worse than any song on St. Anger...

Will work for coffee. :-)

