So I was stunned and a little disbelieving when this query didn't run instantly, and in fact timed out:
select * from inventorsinventors' PK is id, and patent_inventors has an index on patent_id. So this query should do three lookups in the latter index, obtain a handful of inventor_ids, and then do a handful of fetches from inventors. The work of a few milliseconds.
where id in
(select inventor_id from patent_inventors
where patent_id in (2398463, 2525267, 2516478))
But instead it loops through inventors (several million recs), performing an indexed lookup into patent_inventors on each iteration. Which is incredibly awful. In very rough terms, six orders of magnitude worse than the proper plan.
I won't paste the EXPLAIN output here (Blogger doesn't handle wide lines of text well), but the big red flag is that the patent_inventors lookup is marked DEPENDENT_SUBQUERY: that is, MySQL thinks that the outcome of the subquery will vary depending on what record is currently being processed by the outer or 'main' query. But that is clearly not the case: the subquery is absolutely invariant, does not depend on the outer query in the slightest, and only needs to be run once.
Now, in this case it's easy enough to rewrite the query as a two-table join with no subquery, and MySQL handles that correctly, so this is in no way a showstopper. But it's pretty disturbing. It's disturbing because it suggests either that MySQL's query analyzer can't produce a correct dependency graph, or else that MySQL isn't even using dependency graphs to plan queries. I haven't dug into the source code to see which of these is the case. But either way, it suggests that there are more unhappy surprises waiting for me when I find myself needing to write queries that actually aren't trivial.
Note: At the time of this writing the newest MySQL community GA release is 5.5.13. I am using MySQL 5.1 + InnoDB plugin, because that's what the Ubuntu 11 instance I have makes easy to install. The next VM I set up will probably not use Ubuntu, because its MySQL packages lag the releases so badly and I don't want to have to roll my own installs for something as crucial as this.