Notes on Redis: Data Modeling, Hashes, and Namespaces
March 30, 2010
h3. What is Redis? Redis is a simple database. It stores its data in memory and saves a snapshot to disk in the background periodically. It can be used as a simple key-value store like Memcached, but it also supports more complex data types such as lists, sets, sorted sets and hashes. These will come in handy. Redis is versatile and is used for a plethora of tasks: "as a job queue":http://github.com/defunkt/resque, as a cache store, for "tracking downloads of Ruby Gems":http://gist.github.com/296921, for "storing A/B testing results":http://vanity.labnotes.org/, and even as the backend for "a chat server":http://try.redis-db.com:8082. Redis has been receiving a lot of praise lately and is the subject of lot of excitement online. This praise is deserved – it's an excellent piece of software. I've had the pleasure of using Redis recently and here are some of my findings. h3. Our Use Case On "mediaFAIL":http://mediafail.com, we wanted to track how many mentions a fail receives on Twitter. We wanted to show a count of a fail's Twitter mentions and roll that number into a fail's votes – this feature lies somewhere between "act.ly":http://act.ly/ and "Tweetmeme":http://tweetmeme.com/. Couldn't we just store each tweet in our @votes@ table? Perhaps. But this would break our "user has many fails through votes" association. How do I store this data in Redis? h3. Data Modeling My first pass at the problem took advantage Redis's sets. For each fail that was mentioned on Twitter, I created a set containing the users that mention a fail. So, key @fail:1:tweets@ contained the set @['alexanderkahn', 'levjoy', 'TimKarr']@. Since elements in sets must be unique, I wouldn't have to worry about a user making multiple mentions of a fail getting counted more than once. This was an elegant solution: it allowed looking up the number of mentions a fail has received using @SLEN@ (set length) and retrieving all the Twitter usernames for a fail using @SMEMBERS@ (set members). As I continued work on this feature, I realized that I wanted to link to the specific tweet where a Twitter user mentioned a fail. One way to do that would be to set a string key for each member of a set that corresponds to a member of the @tweet:1:tweets@ set. So, for the above example, I would also set @fail:1:tweet:alexanderkahn@ to "12330508057", the tweet ID for my mention of the fail. But this would cause a lot of keys to be created and mean that looking up usernames and tweet IDs for a fail would require a great deal of queries for a popular fail. Thanks to Redis's new hash data structure, there was a cleaner way. In a hash, I can store both the username and the tweet ID under the same @fail:1:tweets@ key. The hash for this key would look like @{"alexanderkahn" => "12330508057", "levjoy" => "12330508067", "TimKarr" => "12330518057"}@. Since hash keys have to be unique, a Twitter mention can't be counted twice, just like with a set. Now I can look up how many mentions a fail has received with @HLEN@ (hash length) and fetch both usernames an tweet IDS with one @HGETALL@ (hash get all) query. Groovy. h3. Redis and Ruby The Ruby library for Redis, written by Ezra Zygmuntowicz, is a pleasure to use. It has certain touches that make it feel like Redis was born for Ruby. One example of this quality is "how it handles Redis's @MULTI@/@EXEC@ transactions":http://github.com/ezmobius/redis-rb/blob/master/lib/redis/client.rb#L395-407:r = Redis.new
r.multi do
r.set 'foo', 'bar'
r.incr 'baz'
end
redis = Redis.new
namespaced = Redis::Namespace.new(:my_feature, :redis => redis)
# Interact with namespaced Redis as you would a normal Redis client:
namespaced.set "foo", "bar"
namespaced.get "foo"
# => 'bar'
namespaced.keys "*"
# => ['foo']
# The actual key in Redis is prefixed with our namespace:
redis.keys "*"
# => ['my_feature:foo']
$redis = Redis.new # Or a namespaced equivalent