Monday, July 4, 2011

MySQL: Really?

I only recently started developing against MySQL. Most of my development has been with Oracle or MS SQL Server (don't sneer, SQL Server's quite good, and has been good since v7), and I'm used to dealing with the query planners on those platforms. MySQL's not backed by equally big bucks, but it's sufficiently old and widely used that I assumed that its query planning wouldn't be seriously lacking in any way, at least on ordinary, standard, everyday sorts of queries.

So I was stunned and a little disbelieving when this query didn't run instantly, and in fact timed out:
select * from inventors
where id in
(select inventor_id from patent_inventors
where patent_id in (2398463, 2525267, 2516478))
inventors' PK is id, and patent_inventors has an index on patent_id. So this query should do three lookups in the latter index, obtain a handful of inventor_ids, and then do a handful of fetches from inventors. The work of a few milliseconds.

But instead it loops through inventors (several million recs), performing an indexed lookup into patent_inventors on each iteration. Which is incredibly awful. In very rough terms, six orders of magnitude worse than the proper plan.

I won't paste the EXPLAIN output here (Blogger doesn't handle wide lines of text well), but the big red flag is that the patent_inventors lookup is marked DEPENDENT_SUBQUERY: that is, MySQL thinks that the outcome of the subquery will vary depending on what record is currently being processed by the outer or 'main' query. But that is clearly not the case: the subquery is absolutely invariant, does not depend on the outer query in the slightest, and only needs to be run once.

Now, in this case it's easy enough to rewrite the query as a two-table join with no subquery, and MySQL handles that correctly, so this is in no way a showstopper. But it's pretty disturbing. It's disturbing because it suggests either that MySQL's query analyzer can't produce a correct dependency graph, or else that MySQL isn't even using dependency graphs to plan queries. I haven't dug into the source code to see which of these is the case. But either way, it suggests that there are more unhappy surprises waiting for me when I find myself needing to write queries that actually aren't trivial.

Note: At the time of this writing the newest MySQL community GA release is 5.5.13. I am using MySQL 5.1 + InnoDB plugin, because that's what the Ubuntu 11 instance I have makes easy to install. The next VM I set up will probably not use Ubuntu, because its MySQL packages lag the releases so badly and I don't want to have to roll my own installs for something as crucial as this.

Friday, June 24, 2011

Measuring UI utility

Google Tech Talk: Human-Computer Interaction meets Economics: How to Measure Interface Utility with Mechanical Turkers.

This was interesting - an approach to measuring the quality of your user interface by how much you have to pay Mechanical Turkers to accomplish tasks with them (expressed as amount of actual work accomplished at a given wage).
Seems like a very worthwhile thing to do with proposed UI changes before you roll them out to production, and I will definitely keep this option in mind when I am considering different UIs for my current project.

Some of the audio quality is a little dicey, so I switched on Google's automatic transcription, which feels like it's made some progress since I looked at it last. It's often surprisingly smart, but the error rate is still high.

Saturday, June 11, 2011

About the Apple spaceship HQ

The Apple 'Spaceship' headquarters proposal is being discussed to death, so I will only add a little bit here.

I have been on the site where it is to be built. It is a completely unexceptional sprawl of big blocky Silicon Valley office buildings surrounded by endless asphalt parking lottage. The buildings use the usual massive amounts of power for air conditioning, the parking acreage makes everything on the outside even hotter, and while it's not uglier than most office parks, it doesn't contribute to the landscape either.

So if city government has the choice between leaving it as-is or replacing it with a green campus that will provide a huge construction boost to the local economy while being built and will be a symbol of community progressiveness and pride afterward, there's really only one way it makes sense for them to lean. And that's absent any threats that Apple might move its tax base out of the city.

The biggest downside to this project, and I mean this quite seriously, is that the eye-catching campus is right alongside a major freeway (Hwy 280 northbound). Rubbernecking may be a big problem.