arch detail

Wednesday, March 21, 2007

The ugly side of the "Web 2.0" approach

I recently did some work on my Google Summer of Code application. They want to know where my online resume is published. I put one up on Google Docs and made it public. Now anyone can spider the Web and read my home address, phone number, and e-mail address. The e-mail address was intentionally obfuscated from XYZ@gmail.com to "GMail: XYZ" in hopes that I won't be inundated with "VIxAy Gzzzz RA" and "hot nuked grils" in my inbox.

So now I find that I have the following:

- a LiveJournal
- a Facebook profile
- a blog on Blogger
- videos on YouTube
- a resume on Google Docs

So now we have tools like OpenID, but of the above, LiveJournal is the only one that supports it, so that helps, uh, none. FOAF (Friend-of-a-Friend) is a machine-friendly format for describing relationships between people (like how we have "friends" on Facebook, "friends" on LiveJournal, and "friends" on Flickr). But again, nobody uses it, so that doesn't help.

So I could list my identities on LJ, Facebook, Blogger, YouTube... on every one of an increasingly large list of content hosts. This is annoying, and it also means that a potential employer who reads my blog has to go to my Blogger profile to find my resume, where they'll also find things like my personal music work (which, honestly, isn't something that makes me feel at ease). And really, I want my friends to know about my YouTube videos and music and such. And I shouldn't have to obfuscate my actual contact information in my resume or expect employers to use Facebook (heh).

So I'm now contemplating going whole-hog before my inbox gets spammed to death and just buying some web space and registering, say, jonny-ash.com. Here's what I want for myself:

Some web service which will keep track of my online identity in all its forms. Something that says "I am X on OpenID, Y on Blogger, Z on Flickr." I also want it to allow me to present myself in a professional way while keeping added goodies for people who want to know about me personally, and if possible, I'd like to restrict access to the personal stuff to friends. And lastly, I'd like it to have actual contact information that is protected from spiders and crawlers that sell my information to spammers.

The thing is, a service which *does* all this is totally possible. Nobody's done it yet.

I could.

Here's how it would work: It should support open standards like OpenID and FOAF. When a user logs in, they can set up an account with links to their various web resources. They should be able to add contact information and make it public or private, but hide it behind a "verify that you are human" image so spiders can't read it. They should also be able to hide other information behind FOAF, as in, my "friends" and their "friends" (ideally, whether they are "friends" on Facebook, or LJ, or...) should be able to see my Aikido videos and music recordings. Other people shouldn't.
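
Just to make the idea concrete, here's a rough Python sketch of the data model I have in mind. All the names here are made up for illustration; this assumes nothing about any real service's API.

    from dataclasses import dataclass, field

    @dataclass
    class Profile:
        openid: str                                     # "I am X on OpenID"
        identities: dict = field(default_factory=dict)  # e.g. {"Blogger": url, "Flickr": url}
        public_contact: dict = field(default_factory=dict)
        # Private contact info is only ever served behind a "verify you
        # are human" image, never as crawlable text.
        private_contact: dict = field(default_factory=dict)
        friends: set = field(default_factory=set)       # OpenIDs gathered from FOAF files
        personal_links: dict = field(default_factory=dict)  # videos, music, etc.

    def visible_links(profile, viewer_openid):
        # Everyone sees the professional face; FOAF friends also get the goodies.
        links = dict(profile.identities)
        if viewer_openid in profile.friends:
            links.update(profile.personal_links)
        return links

The point is that the professional face is the default, and the FOAF-derived friends list is the only key to the personal stuff.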

All of this should be pretty trivial to do, and I think I could get the web hosting for, say, $6 a month. I could sell advertising.

Would anyone be interested? Does anyone care? Should I just set up a personal website for myself and let everyone else do the same...?

Thursday, March 08, 2007

Finally, How to Really Parallelize *Any* Stackless DFA

I posted a little while ago about my idea for parallelizing a Huffman Decode. What I didn't realize at the time is that this could be generalized to parallelize any stackless DFA. Here's a simple writeup of a method for calculating the end state of a stackless DFA with input A, on a computer with infinite processors:

Break down the input, A, into N equal (or nearly equal) substrings, A0, A1... A(N-1).

Next, create groups of processors, G0, G1, G2... G(N-1), one per substring. G0 will be a single processor, which will start in the start state of the automaton. Each Gi (where i > 0) will be a group of S processors, where S is the number of states in the machine, and each processor in the group will start in a different state of the machine.

Each processor in Gi will process the input Ai.

Call G0 "finished" when it has either processed all of its input or reached a "finish" state (e.g. SError, or, say, a "terminal" leaf in a tree). Call Gi, where i > 0, "finished" when the processor in Gi that was contacted by Gi-1 has finished processing all of its input.

When Gi finishes, it is either in a "finish" state (e.g. SError) or some other state. If it is in a "finish" state, return this state. If it is in another state, it contacts the processor in Gi+1 which started in the state where Gi finished. The state returned by G(N-1) is the final state of the automaton.

Simple enough, right? Funny that I haven't seen this before - and that I hadn't thought of it sooner.
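
To make the hand-off concrete, here's a rough Python sketch of the scheme. It assumes the DFA is given as a plain transition table (delta[state][symbol] -> state), and a thread pool stands in for the "infinite processors"; all the names are mine, not any standard API.

    from concurrent.futures import ThreadPoolExecutor

    def run_chunk(delta, start_state, chunk):
        # One "processor": run the DFA over one substring from a given state.
        state = start_state
        for symbol in chunk:
            state = delta[state][symbol]
        return state

    def parallel_final_state(delta, start_state, inp, n):
        states = list(delta.keys())
        size = -(-len(inp) // n)  # ceiling division
        chunks = [inp[i * size:(i + 1) * size] for i in range(n)]

        with ThreadPoolExecutor() as pool:
            # G0 is a single processor starting in the real start state.
            groups = [{start_state: pool.submit(run_chunk, delta, start_state, chunks[0])}]
            # Each Gi (i > 0) runs one processor per state of the machine.
            for chunk in chunks[1:]:
                groups.append({s: pool.submit(run_chunk, delta, s, chunk)
                               for s in states})

        # The hand-off: Gi's end state picks the processor in Gi+1 that
        # started in that state; G(N-1)'s answer is the final state.
        state = start_state
        for group in groups:
            state = group[state].result()
        return state

    # Example: a two-state DFA tracking the parity of '1' bits.
    delta = {0: {'0': 0, '1': 1}, 1: {'0': 1, '1': 0}}
    assert parallel_final_state(delta, 0, "110100111", 3) == 0

Python's GIL means this won't actually run any faster, but it shows the bookkeeping: every candidate run is launched up front, and the hand-off chain then reads out just one result per group.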


Realistic Applications

Of course, the real drawback here is that for a speedup of N, we need (N-1)*S processors, plus one for G0 (where S is the number of states in the machine). If we have very few states and a huge input, this could genuinely be useful. However, if we have a huge number of states, we will require a far larger number of processors to get any speedup. Furthermore, this is going to generate a lot of wasted cycles and drive up electricity bills, which is an increasing concern.

However, if we take this general method as a model for parallel DFA simulation, then it starts to get pretty useful. For example, in the Huffman decoding method I described in a prior post, we know that the DFA is actually a tree, so group Gi only needs to simulate the subtrees which start at a given depth (namely len(A0) + len(A1) + ... + len(Ai-1), the number of symbols consumed before Ai begins), and the number of such subtrees is likely to be much smaller than the number of states in the DFA.

More generally, if we know beforehand what states are reachable after a given number of inputs (which can be computed easily), we can often reduce the number of processors which must be used.
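
As a sketch (using the same hypothetical transition-table representation as the code above), that precomputation is just repeated set expansion:

    def reachable_after(delta, start_state, k):
        # States the machine could be in after consuming exactly k symbols.
        frontier = {start_state}
        for _ in range(k):
            frontier = {delta[s][c] for s in frontier for c in delta[s]}
        return frontier

Group Gi then only needs one processor per state in reachable_after(delta, start, k), where k is the offset at which Ai begins.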


Long-term, Pie-in-the-Sky Approaches

The approach I described before is actually very similar to some forms of branch prediction. A certain branch-predicting processor might look at a branch that happens on a conditional, then actually start computing two different trains of logic - one based on the assumption that the conditional will evaluate to "true," and one on the assumption that it will evaluate to "false" (okay, so most processors are a little more complex, but that's the idea). When the conditional is evaluated, one of these strings of computation is thrown out and the results of the other are retained.

Imagine if our computer knew that a certain computation-heavy function would return one of X different values, and simply assigned a processor to each of those assumptions, retaining the results from whichever of the X processors guessed right. It's not all that different from the method I've described for simulating DFAs.
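
As a toy illustration - and this is pure speculation on my part, not how any real processor exposes anything, with made-up names throughout - here's the same trick sketched in Python: start one continuation per candidate return value while the expensive call runs, then keep only the one that matched.

    from concurrent.futures import ThreadPoolExecutor

    def speculate(expensive, possible_values, continuation):
        # Assumes expensive() always returns one of possible_values.
        with ThreadPoolExecutor() as pool:
            # Launch every speculative train of logic up front...
            guesses = {v: pool.submit(continuation, v) for v in possible_values}
            actual = expensive()  # ...while the real computation runs.
        # Keep the result built on the correct assumption; discard the rest.
        return guesses[actual].result()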

Granted, at this point, this approach is only useful if we know that the function will return one of a handful of known values. But what if we effectively had an infinite number of processors to throw at this problem? Say that we had a function which returns a 4-byte integer. That's quite a few possibilities, but what if we still just assigned a processor to each of them? If the number of cores in a processor doubles every 18 months, then we might actually be able to do something like this - and while it would be a dumb way to parallelize all our sequential code, it would still be possible to do entirely at the hardware level.

In the meantime, more formal languages allow us to specify the exact range of values a function could return. Again, this could make such a "shotgun" approach as I have described much less wasteful.

D-Wave's Quantum Computing Not Totally Legit

I've been trying to learn about that damned quantum computing demonstration from D-Wave for a while now. Ars Technica's article is the first one that's satisfied me.

The most important detail: Quantum computing, at least as D-Wave has it, does not mean that NP-complete problems will be solvable in polynomial time. I had heretofore been under the impression that they might be.

Thus, computer scientists will not be rendered obsolete.

Thank you, Ars Technica, for clearing that up.

Openbox is Great

I've grown disenchanted with Fluxbox. I'd been loyal, but I had some very strange issues with X/Fluxbox going into a nasty freeze, and the FB community was less than helpful.

I've tried Blackbox, and I have to say, it's too damned ugly. Admittedly, I never tried very hard at fixing that.

So today I tried Openbox. It is a breath of fresh air. It is beautiful. It gets all my fonts right. And the more I learn about it, the better it seems. I think my first feeling of excitement came from reading the Openbox "about" page:

Openbox works with your applications, and makes your desktop easier to manage. This is because the approach to its development was the opposite of what seems to be the general case for window managers. Openbox was written first to comply with standards and to work properly. Only when that was in place did the team turn to the visual interface.

Finally, a WM with its head on straight.

The behavior is more configurable than in any WM I've seen before, because just about anything you can imagine is specifiable through an XML configuration file. Setting up even a simple menu by hand in XML is no small chore, but tools like denu greatly simplify the process.

One of the things that most impressed me was that they made a point of complying with freedesktop.org standards. If the post-fd.o world is really this well-done, the Free world will finally be a wonderful thing.