WebSID: HTML5 Audio + GWT 2.0 -> Commodore 64 emulator SID music playback on Safari 4 (http://cromwellian.appspot.com/websid...)
Note, I tried it with web workers and other optimizations (window.btoa for base64 encoding), still too slow. It needs a lot of optimization work, and surprisingly, Chrome is slower than Safari 4. - Ray Cromwell
Holy crap, that's cool. Do you have any sense for where most of the time's going? I assume it's in the emulation, but that base-64 encoding part sounds nasty. Why can't the web standards ever specify sensible low-level interfaces to these things? - Joel Webber
data: urls have a non-encoded version as well, but I've never done any tests to see how transparent they are to binary data. I've used that mode for HTML before. Maybe something to experiment with (if b64 turns out to be the issue). - Matt M (inactive)
Most of the time is spent in emulation; even if I comment out the audio dataURL part, it's still 2x too slow. The main loop is a giant switch/case which dispatches to other switch/cases (IO register writes). Perhaps changing these to functions, so instead of switch(op) { case FOO }, you have dispatch[op].exec(), would be faster? Of course, that would mean using lots of JSNI to set this up to avoid anonymous classes/double indirection - Ray Cromwell
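A minimal sketch of the two dispatch styles being compared, in plain JavaScript rather than GWT/JSNI. The two opcodes, the `cpu` object shape, and all function names are hypothetical; this is not the real 6502 opcode map or WebSID's code, and it says nothing about which variant is actually faster on a given engine.

```javascript
// Hypothetical mini-CPU: one accumulator, a program counter, 256 bytes of memory.
function makeCpu() {
  return { a: 0, pc: 0, mem: new Array(256).fill(0) };
}

// Variant 1: one switch statement in the main loop.
function stepSwitch(cpu) {
  const op = cpu.mem[cpu.pc++];
  switch (op) {
    case 0x01: // load immediate into the accumulator
      cpu.a = cpu.mem[cpu.pc++];
      break;
    case 0x02: // add immediate, wrapping to 8 bits
      cpu.a = (cpu.a + cpu.mem[cpu.pc++]) & 0xff;
      break;
    default:
      throw new Error("unknown opcode " + op);
  }
}

// Variant 2: a table of handler functions indexed by opcode.
const dispatch = [];
dispatch[0x01] = function (cpu) { cpu.a = cpu.mem[cpu.pc++]; };
dispatch[0x02] = function (cpu) { cpu.a = (cpu.a + cpu.mem[cpu.pc++]) & 0xff; };

function stepTable(cpu) {
  const op = cpu.mem[cpu.pc++];
  const handler = dispatch[op];
  if (!handler) throw new Error("unknown opcode " + op);
  handler(cpu);
}

// Load a program at address 0 and run it to the end with the given step function.
function run(step, program) {
  const cpu = makeCpu();
  for (let i = 0; i < program.length; i++) cpu.mem[i] = program[i];
  while (cpu.pc < program.length) step(cpu);
  return cpu.a;
}
```

Both `run(stepSwitch, program)` and `run(stepTable, program)` should produce the same accumulator value, so the two loops can be timed against each other on the same input.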
Or perhaps you could cross-compile the opcodes to something more direct? Are we talking about actual 6502 emulation here, or just the SID chip? - Joel Webber
6502 emulation, SID emulation of 3 waveform oscillators, two CIA timer chips. To sound identical to the SID, you have to do cycle-accurate emulation. You could theoretically pre-compile the ROM/ASM code stuff, but it would still end up gated by a clock-driven loop. It would probably run faster, but I think the fact that V8 is a factor of 2X slower, while the JVM eats this for breakfast, really shows a) how slow JS still really is and b) how terrible JS is at manipulating byte-oriented data :) - Ray Cromwell
BTW, did it work for you in your Safari? - Ray Cromwell
Yup. It worked precisely as you described (about half-speed). So you're having to emulate basically the whole bloody machine minus the video chips. Wow. I'm surprised that works at all. When you say v8 is 2x slower, do you mean than nitro in safari, or than the jvm? - Joel Webber
2x slower than real time. Probably 6-10x slower than JVM, and maybe 20% slower than Safari. I don't think pre-compilation will work due to self-modifying 6502 code, so you'd need to do JIT translation, and invalidate any address range written to. I'm not even sure it would be that big of a win. Javascript is awesome. - Ray Cromwell
Oh, right. Ah, the joys of self-modifying code. With the exception of people who get off on bytecode rewriting tricks, I thought we had mostly left that world behind. 6-10x slower than the JVM is about what I'd expect. I've noticed a few cases lately (there was a js jpeg encoder on Ajaxian earlier) where V8 was coming out slower than Nitro on byte manipulation stuff. Oddly, my experience has been the opposite on more "normal" js code. - Joel Webber
Is self-modifying code a large majority of the code emulated here? If not, you might be able to borrow some of the tricks from QEMU: it translates code at a basic block level (i.e., atomic assembly sequences without jumps in or out). As code is modified, you translate the dirtied basic blocks again (which would potentially happen in a worker). - Matt M (inactive)
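A rough sketch of the "translate per basic block, invalidate on write" idea, in JavaScript. `BlockCache` and the `translate` callback are hypothetical names; the callback stands in for a real 6502-to-JS translator, which is not shown, and this is not how QEMU itself is implemented.

```javascript
// Hypothetical translation cache keyed by the start address of a basic block.
class BlockCache {
  constructor(translate) {
    // translate(startAddr) must return { end, run }, where `end` is the
    // address one past the block's last byte and `run` executes the block.
    this.translate = translate;
    this.blocks = new Map(); // startAddr -> { start, end, run }
    this.translations = 0;   // counts how often we had to (re)translate
  }

  // Return the cached block for startAddr, translating it on a miss.
  get(startAddr) {
    let block = this.blocks.get(startAddr);
    if (!block) {
      const t = this.translate(startAddr);
      block = { start: startAddr, end: t.end, run: t.run };
      this.blocks.set(startAddr, block);
      this.translations++;
    }
    return block;
  }

  // Call on every emulated memory write: drop each block whose byte range
  // covers addr, so self-modified code is retranslated on its next use.
  invalidate(addr) {
    for (const [start, block] of this.blocks) {
      if (addr >= start && addr < block.end) this.blocks.delete(start);
    }
  }
}
```

The linear scan in `invalidate` keeps the sketch short; a real emulator would more likely keep a per-page or per-address index so that ordinary data writes stay cheap.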
This is all gated by the fact that I did this in a few hours Thanksgiving night so there's a limit to how much I'm going to do to fix it. I'll put up the source code in github for anyone who wants to contribute. :) - Ray Cromwell
Get this into the Sunspider benchmark and it'll quickly become a priority for browser vendors. :) - Matt M (inactive)
Worker<->main page communication can be slower in Chrome than Safari (workers are out of process) - but otherwise Chrome should be just as fast. Would you happen to have some code that I could do isolated side-by-side comparisons with to see where the slowdown is? I would be surprised (but not completely shocked) if it was a pure V8 vs Nitro issue. - James Robinson
@James: have a look at the js jpeg encoder posted on Ajaxian the other day. It was also about 20% slower on v8, with no worker threads. I'm guessing Nitro has implemented some special cases for this kind of byte array code. Js semantics are terrible for this kind of stuff. I bet if you put this in the v8 benchmarks, Lars & co will make it a lot faster :) - Joel Webber
I'll put the source up in a few days. For Chrome benchmarks, I just have the Audio output part log a timestamp and size of the buffer; you can then compute throughput for both Safari and Chrome (# samples output/time). Right now, the Audio driver I wrote for it will dump the buffer whenever 2 seconds worth of samples fill up (@44kHz 16-bit, this is a 176k array. Too much larger and data URIs choke on some platforms) - Ray Cromwell
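A hedged sketch of the arithmetic and the data-URI step described above, in modern JavaScript. It assumes 44.1 kHz mono 16-bit PCM (which gives 176,400 bytes for 2 seconds, consistent with the "176k" figure), a log of `{ timeMs, samples }` entries written at each buffer flush, and typed arrays, which were not available in the browsers discussed here. All names are hypothetical and this is not the WebSID audio driver.

```javascript
const SAMPLE_RATE = 44100;  // assumed; the thread only says "44Khz"
const BYTES_PER_SAMPLE = 2; // 16-bit samples
const BUFFER_SECONDS = 2;

function bufferBytes(sampleRate, bytesPerSample, seconds) {
  return sampleRate * bytesPerSample * seconds;
}

// Seconds of audio produced per second of wall-clock time, measured from the
// first log entry to the last. 1.0 is real time; 0.5 is "2x too slow".
function realtimeFactor(log, sampleRate) {
  let samples = 0;
  for (let i = 1; i < log.length; i++) samples += log[i].samples;
  const elapsedSec = (log[log.length - 1].timeMs - log[0].timeMs) / 1000;
  return samples / sampleRate / elapsedSec;
}

// Wrap mono 16-bit PCM samples in a 44-byte WAV header and base64-encode the
// result into a data: URI that could be handed to new Audio(uri).
function pcmToWavDataUri(samples, sampleRate) {
  const dataLen = samples.length * 2;
  const bytes = new Uint8Array(44 + dataLen);
  const view = new DataView(bytes.buffer);
  const tag = function (offset, s) {
    for (let i = 0; i < s.length; i++) bytes[offset + i] = s.charCodeAt(i);
  };
  tag(0, "RIFF"); view.setUint32(4, 36 + dataLen, true);
  tag(8, "WAVE"); tag(12, "fmt ");
  view.setUint32(16, 16, true);             // size of the fmt chunk
  view.setUint16(20, 1, true);              // format 1 = uncompressed PCM
  view.setUint16(22, 1, true);              // one channel
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // bytes per second
  view.setUint16(32, 2, true);              // bytes per sample frame
  view.setUint16(34, 16, true);             // bits per sample
  tag(36, "data"); view.setUint32(40, dataLen, true);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * 2, samples[i], true);
  }
  let bin = "";
  for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
  const b64 = typeof btoa === "function"
    ? btoa(bin)
    : Buffer.from(bin, "binary").toString("base64"); // fallback outside browsers
  return "data:audio/wav;base64," + b64;
}
```

Logging one entry per flush and feeding the log to `realtimeFactor` gives a single number that can be compared across Safari and Chrome.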
@James Question: does postMessage() from within a worker thread run on the event queue? That is, if you've got a CPU-heavy worker loop that is not yielding the CPU and it's invoking postMessage(), can the browser onmessage() handler run and process these messages while the worker is pegging the CPU? - Ray Cromwell
@Matt - data urls have a non-encoded form? Any references? (I know the encoding parameter is optional, but I've never seen non-base64 examples) - Nick Lothian
Nick, this is what it looks like for HTML: data:text/html,hi<b>there. If you are just encoding HTML, it's far less wordy than the base-64 version. I'm not sure how browsers treat embedded NUL and binary in there, however. - Matt M (inactive)
Is &#00; allowed in XML/HTML? It's weird, but if they handled 8-bit MIME encoding, that would be pretty cool. Still, I'd like new Audio(array or object) - Ray Cromwell
Looks like chrome filters NUL out from unencoded data URLs. Unless you can represent an audio sample without it, I guess it isn't possible. - Matt M (inactive)
A high-performance AudioData analogue of some of the low-level canvas stuff keeps coming up. I hope that makes it into the spec soon. - Matt M (inactive)
I haven't been following that stuff too closely -- it would seem like things like CanvasFloatArray, et al should be used for both Audio and WebGL. FloatBuffer, anyone? - Joel Webber
What sort of performance advantage do those canvas arrays have? Is it a good idea to use these in standard code instead of JS arrays? Might be a good experiment. Neither FF3.5 nor Safari 4 supports them right now, so it might be a bit premature (Chrome does in the dev channel, Safari in the nightlies). - Matt M (inactive)
Ran an experiment to test performance of canvas arrays. After some false hope from a bad benchmark, it turns out that WebGL buffers are slower than JS arrays by about 50% on Chrome and 300% on Safari. I guess neither engine is properly JITing those things right now. - Matt M (inactive)
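For anyone wanting to repeat this kind of comparison, a minimal micro-benchmark sketch in JavaScript. It uses today's `Uint8Array` as a stand-in for the WebGL/Canvas array types discussed here, which is an assumption; the function names, sizes, and round counts are arbitrary, and this is not the benchmark that produced the numbers above. Micro-benchmarks like this are easy to get wrong, so treat the timings as rough.

```javascript
// Write a repeating 0..255 byte pattern into the buffer.
function fill(buf) {
  for (let i = 0; i < buf.length; i++) buf[i] = i & 0xff;
  return buf;
}

// Read every element back and add it up.
function sum(buf) {
  let total = 0;
  for (let i = 0; i < buf.length; i++) total += buf[i];
  return total;
}

// Time `rounds` fill+sum passes over buffers built by makeBuffer(n).
// The checksum keeps the work observable and lets the two runs be compared.
function timeIt(makeBuffer, n, rounds) {
  const start = Date.now();
  let checksum = 0;
  for (let r = 0; r < rounds; r++) checksum += sum(fill(makeBuffer(n)));
  return { ms: Date.now() - start, checksum: checksum };
}

const N = 1 << 16;
const plain = timeIt(function (n) { return new Array(n); }, N, 20);
const typed = timeIt(function (n) { return new Uint8Array(n); }, N, 20);
console.log("plain array:", plain.ms, "ms; typed array:", typed.ms, "ms");
```

The two checksums should match; if they do not, the runs are not doing the same work and the timings are not comparable.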
Reminds me of the initial DirectByteBuffer vs arrays[] vs HeapByteBuffer() benchmarks. First problem was, ByteBuffers were always inherently polymorphic thanks to bad API design (casting to MappedByteBuffer works). Second, HotSpot didn't have enough 'semantic inlining', which was added later, so that bb.put() essentially becomes *ptr++ = val. Now DirectBuffers are very speedy for communicating with JNI. I expect any specialized buffers will need a lot of work in browsers; hell, they can't even properly handle polymorphic DOM binding calls yet. - Ray Cromwell
After playing with those arrays for a bit, I really like them. It's a bit annoying that they are named WebGL*Array rather than being a set of objects in JS itself. Not sure if these will be available in the worker thread context, for example. - Matt M (inactive)
Even worse, they're being changed to Canvas*Array now, which is no less specific, but at least it's a breaking change :) - Joel Webber
Also, the fact that they're 50% slower on Chrome and 300% on Safari likely explains the 20% difference in V8 vs. Nitro -- that pretty strongly indicates a very fast byte[] implementation in the latter. - Joel Webber
Apparently the new name is actually WebGL* after all (http://twitter.com/ohunt...), but of course you won't find that anywhere outside of random blog posts and tweets. The whole situation is annoying because there is basically no public spec right now. I couldn't figure out why WebKit browsers didn't have the Canvas* types until I happened across a 3rd-party wiki entry that explained how the private spec changed and browsers were slowly matching it. (http://learningwebgl.com/cookboo...). Why is a spec for the web being developed in private? - Matt M (inactive)
Things seemed better when the HTML5 spec was a giant dumping ground for in-progress stuff. After they decided to move lots of stuff to separate specs, those seem updated less. - Ray Cromwell