One thing that disturbs me is the proliferation evil agents who love key-value stores. Especially those that love key-value stores in a latency-infested cloud. What upsets me more are the infinitely confused people who believe a database is *worse* than their key-value storage.
Here is one where Ian prefers Cassandra over proper database indexes:
For some reason, Ian has compared his terrible query to his optimized Cassandra implementation. The query (and schema) are so bad, I suspect it’s a straw man.
Ian does not provide the SQL which makes him conclude that “Computing the intersection with a JOIN is much too slow in MySQL, so we have to do it in PHP.”. Any statement that implies a join is done faster outside the database should set of warning bells: The database should have all the information required to make your queries fast. If this is not the case, then something is seriously wrong with your indexes.
An all-database solution, even if it is a stored procedure, will be faster than a networked solution just because of latency. Personally, I have found returning a few hundred extra rows from a single “close enough” query significantly faster than issuing two queries with perfect results: Latency is your biggest enemy.
Let’s look at the Digg schema provided:
CREATE TABLE 'Diggs' (
'id' INT(11),
'itemid' INT(11),
'userid' INT(11),
'digdate' DATETIME,
PRIMARY KEY ('id'),
KEY 'user' ('userid'),
KEY 'item' ('itemid')
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE 'Friends' (
'id' INT(10) AUTO_INCREMENT,
'userid' INT(10),
'username' VARCHAR(15),
'friendid' INT(10),
'friendname' VARCHAR(15),
'mutual' TINYINT(1),
'date_created' DATETIME,
PRIMARY KEY ('id'),
UNIQUE KEY 'Friend_unique' ('userid','friendid'),
KEY 'Friend_friend' ('friendid')
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Some changes to the indexes would help:
- KEY ‘user’ (‘userid’) – does not help much when a user has digged many items: The index will help by pointing to all the actual ‘Diggs’ records, but the database will have to load every one of those blocks from disk to get that information (very likely one block per record). I would have suggested UNIQUE KEY ‘user’ (‘userid’, ‘itemid’, ‘digdate’) – This would have allowed the query to simply use the index, and not have to go back to the massive, unsorted, ‘Diggs’ table.
- UNIQUE KEY ‘Friend_unique’ (‘userid’,‘friendid’) – Seems to be the correct index to for “Query Friends for all my friends.”; this should be a single block lookup. There is no reason this should take 1.5seconds.
- KEY ‘Friend_friend’ (‘friendid’) – Maybe instead, Ian intended to have a list of all users that made ‘me’ a friend, rather than all users ‘I’ have befriended. This certainly explains the 1.5sec response time. In this case, the index should be expanded so the table blocks do not need to be loaded: UNIQUE KEY ‘Friend_friend’ (‘friendid’, ‘userid’).
- Maybe MySQL is poor database and loads the original records during a query even if the columns are not needed
Anyone who may be complaining about the extra disk space required to write these indexes should note that Ian’s Cassandra implementation consumes much more space than I am advocating here.
Even *IF* the Digg database is so big that the index lookups take too long, we should realize that we can pre-compute query results in the database, just like in Ian’s Cassandra implementation. If the database does not have materialized views, we can always add triggers to do the job ourselves. The former is still a limited technology, and the latter is quite messy, but both are better than changing your whole platform.
Finally, it seems Ian is trying to optimize for the worst case: “Kevin Rose, for example, has 40,000 followers”. I disagree with changing your infrastructure for a single use case for a minority of users, but that is a business decision that involves more issues than Ian’s blog entry can be expected to consider.
In conclusion, I am angry that the human race has lost another soul to the legion of key-value fanatics. I am further incensed that apparently 298 other nameless souls have followed Ian into the pits of hell. (298 diggs at time of writing).