super ninja privacy techniques

5 April, 2007

At ETech 2007, Marc Hedlund and Brad Greenlee gave a technical talk about privacy techniques for web applications. They both work for Wesabe, an online community where people manage their money. Users upload their bank account information, which is aggregated across the community. From the collected information, Wesabe can derive tips and recommendations that help people reach their financial goals.

The talk itself, however, was mostly about low-level techniques for better privacy. It covered five main points, followed by some miscellaneous best practices:

  1. keep critical data local – for a user it can be rather frightening to upload all of their account information to Wesabe. It is not just about ‘shameful expenses’; there are simply some things about your spending habits that you don’t want the rest of the world to see. Wesabe’s solution is to offer a local download client with filters: the tool downloads information from your bank, filters it and then uploads it to Wesabe. This does not guarantee real privacy (a user cannot really check whether the information is actually filtered), but it does address the trust issue. The approach has some downsides: users have to clear an extra hurdle (downloading and installing a client), security now depends on the user’s own computer, and there is a serious risk of trojans.
  2. privacy wall – this is a clever idea. Normally, tables in a database are connected through keys: each row in one table has an identifier, and rows in another table hold a reference to that identifier. At Wesabe, some tables connect the user (via their id) to a piece of information (which references that id). It would be better to keep this connection secret, and that is easily done by storing a cryptographic hash, derived from the user’s password, in place of the plain reference to the id. Without the password, the connection cannot be made. Again, there are some problems with this approach, the biggest being a user who forgets their password: in that case it takes extra effort to get all the information back (see Brad’s comment on this writeup). For a more in-depth explanation of this idea, read Brad’s blog posting on the subject.
  3. partitioning – this concept is somewhat related to the previous one. A system can always be compromised by people with bad intentions, and when that happens the damage should be kept to a minimum. Wesabe partitions its databases so that different kinds of data about the same user are stored in different places. For example, the membership database and the account database can be kept apart, so that a security breach stays compartmentalized. The compartmentalization is even stronger when the databases run on different systems (not only physically separate machines, but also different operating systems, database systems, etc.).
  4. data fuzzing and log scrubbing – when a web application is built with modern frameworks (for example Ruby on Rails or Django), a lot of debug output and logging is produced automatically. This poses a serious threat, because these logs often contain sensitive information. Not only explicitly: timestamps and IP addresses can also be traced back to specific users or reveal other information. Logs and debug information should therefore be handled very carefully when designing and building such a system. Wesabe made a point of scrubbing its logs meticulously and had a retention policy for logs. Error messages, which are normally emailed around to developers, are instead stored on disk and only a link is sent. Once the error has been dealt with, the log is deleted immediately. Even so, closing every possible hole remains a challenge (backups of logs, for example, also pose problems).
  5. voting algorithms – Wesabe relies on the community to build up knowledge about account information. For example, the codes for bank account numbers are hard to read. When a user renames such a code to a sensible name, that name might be useful to other users as well. Again, this can be a privacy problem: not every user should see the name someone else gives to an account number. The solution is a voting algorithm, similar to the approach Google uses to classify pictures: if enough people classify a picture as a cat, it is probably a cat. This way, only common knowledge becomes public, without introducing privacy problems.
  6. miscellaneous – finally, some best practices were mentioned. Passwords should, of course, always be stored hashed in the database. Database IDs should be randomized rather than sequential (even though sequential IDs are often the default in database systems). Finally, the company or website should have a policy describing how privacy-sensitive information is handled.
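For the first point (keeping critical data local), the filtering step of such a download client might look roughly like the sketch below. The `Transaction` type, the `PRIVATE_PAYEES` rules and the "last four digits" masking are hypothetical choices for illustration, not Wesabe’s actual client.

```python
import re
from dataclasses import dataclass


@dataclass
class Transaction:
    account: str
    payee: str
    amount: float


# Hypothetical user-defined rules; in a real client these would live in the
# user's local configuration and never be sent to the server.
PRIVATE_PAYEES = [re.compile(p, re.IGNORECASE) for p in (r"pharmacy", r"casino")]


def mask_account(number: str) -> str:
    """Keep only the last four digits of an account number."""
    digits = re.sub(r"\D", "", number)
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


def filter_for_upload(transactions):
    """Drop transactions the user marked private and mask account numbers."""
    cleaned = []
    for tx in transactions:
        if any(p.search(tx.payee) for p in PRIVATE_PAYEES):
            continue  # this transaction stays on the local machine only
        cleaned.append(Transaction(mask_account(tx.account), tx.payee, tx.amount))
    return cleaned


txs = [
    Transaction("1234-5678-9012", "Corner Pharmacy", -12.50),
    Transaction("1234-5678-9012", "Grocery Store", -40.00),
]
to_upload = filter_for_upload(txs)
```

Here only the grocery transaction is uploaded, with its account number reduced to `********9012`. The important property is that the filtering runs before anything leaves the user’s computer.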
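The privacy wall from the second point can be sketched with SQLite and PBKDF2 as follows. The assumptions: the link key is derived from the user id plus the password with a per-user salt, and the table and column names are hypothetical. Brad’s own posting describes Wesabe’s actual scheme. Password verification itself is left out to keep the sketch short.

```python
import hashlib
import os
import sqlite3


def wall_key(user_id: int, password: str, salt: bytes) -> str:
    """Derive the opaque link between a user and their data.

    Without the password this value cannot be recomputed, so the
    user -> data connection is not visible in the database itself.
    """
    material = f"{user_id}:{password}".encode()
    return hashlib.pbkdf2_hmac("sha256", material, salt, 100_000).hex()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, salt BLOB)")
# The data table holds no user id, only the hashed wall key.
db.execute("CREATE TABLE accounts (wall_key TEXT, bank TEXT, balance REAL)")


def _salt(user_id: int) -> bytes:
    (salt,) = db.execute("SELECT salt FROM users WHERE id = ?", (user_id,)).fetchone()
    return salt


def add_account(user_id: int, password: str, bank: str, balance: float) -> None:
    key = wall_key(user_id, password, _salt(user_id))
    db.execute("INSERT INTO accounts VALUES (?, ?, ?)", (key, bank, balance))


def accounts_for(user_id: int, password: str):
    key = wall_key(user_id, password, _salt(user_id))
    return db.execute(
        "SELECT bank, balance FROM accounts WHERE wall_key = ?", (key,)
    ).fetchall()


db.execute("INSERT INTO users VALUES (?, ?, ?)", (1, "joost", os.urandom(16)))
add_account(1, "s3cret", "Example Bank", 250.0)
```

With the right password, `accounts_for(1, "s3cret")` finds the row. With a wrong one it returns an empty list, which is exactly the forgotten-password problem discussed in the comments below.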
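The partitioning idea from the third point can be illustrated with two separate stores. Both use SQLite `:memory:` connections here purely for the demo. The random `data_token` linking them is an assumption, and in practice it could be replaced by a privacy-wall key. In production the stores would sit on different hosts, ideally with different operating systems and database systems.

```python
import secrets
import sqlite3

# Two independent stores; a breach of one should not expose the other.
membership = sqlite3.connect(":memory:")
transactions = sqlite3.connect(":memory:")

membership.execute("CREATE TABLE members (email TEXT PRIMARY KEY, data_token TEXT)")
transactions.execute("CREATE TABLE tx (data_token TEXT, payee TEXT, amount REAL)")


def register(email: str) -> str:
    """Create a member plus an unguessable token that links to their data."""
    token = secrets.token_hex(16)
    membership.execute("INSERT INTO members VALUES (?, ?)", (email, token))
    return token


def record(token: str, payee: str, amount: float) -> None:
    transactions.execute("INSERT INTO tx VALUES (?, ?, ?)", (token, payee, amount))


token = register("alice@example.com")
record(token, "Grocery Store", -40.0)

# An attacker who dumps only the transaction store sees spending data,
# but no e-mail addresses or names to tie it to.
leaked = transactions.execute("SELECT * FROM tx").fetchall()
```

Only an attacker who compromises both stores can join them, which is why running them on different systems raises the bar further.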
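For the fourth point, log scrubbing and data fuzzing can be hooked into Python’s standard `logging` module with a filter on the handler. The regular expressions below are hypothetical and deliberately simple. Long digit runs are treated as account numbers and masked, and IPv4 addresses are fuzzed by zeroing the last octet. A real deployment would need rules tailored to its own data.

```python
import io
import logging
import re

ACCOUNT_RE = re.compile(r"\b\d{8,}\b")  # long digit runs: account or card numbers
IPV4_RE = re.compile(r"\b(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}\b")


def scrub(text: str) -> str:
    """Mask account-like numbers and fuzz IPv4 addresses."""
    text = ACCOUNT_RE.sub(lambda m: "#" * (len(m.group()) - 4) + m.group()[-4:], text)
    return IPV4_RE.sub(r"\1.0", text)  # keep only the /24 network


class ScrubFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first, then replace it with the scrubbed version.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True  # keep the record, just scrubbed


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(ScrubFilter())  # on the handler, so it sees every record

log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

log.info("upload from %s for account %s", "192.168.1.42", "123456789012")
```

The log line written to `stream` reads `upload from 192.168.1.0 for account ########9012`. Scrubbing at the handler means sensitive values never reach disk, which also keeps them out of any later log backups.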
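The voting approach from the fifth point can be sketched as a simple threshold vote. This is a minimal sketch, assuming one vote per user per code, a hypothetical `MIN_VOTES` of 3, and a strict-majority rule. Wesabe’s actual algorithm was not described in that much detail.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional

MIN_VOTES = 3  # hypothetical threshold

# code -> {user_id: suggested name}; a user's later suggestion replaces the earlier one
votes: Dict[str, Dict[int, str]] = defaultdict(dict)


def suggest(user_id: int, code: str, name: str) -> None:
    votes[code][user_id] = name.strip().lower()


def public_name(code: str) -> Optional[str]:
    """Return a name only once enough users independently agree on it."""
    counts = Counter(votes[code].values())
    if not counts:
        return None
    name, n = counts.most_common(1)[0]
    total = sum(counts.values())
    if n >= MIN_VOTES and n * 2 > total:
        return name
    return None  # private names stay private


for uid in (1, 2, 3):
    suggest(uid, "0042-7781", "Joint Savings")
suggest(4, "0042-7781", "rainy day fund for my secret hobby")
suggest(5, "9911-0000", "my own label")
```

Here `public_name("0042-7781")` yields `"joint savings"`, while the single, personal label for `"9911-0000"` is never shown to anyone else.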
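The miscellaneous best practices can be sketched with the standard library: salted PBKDF2 hashes for passwords and random, non-sequential record IDs from `secrets`. PBKDF2, the iteration count and the `iterations$salt$digest` storage format are assumptions made for this example. The talk only said that passwords should be hashed and IDs randomized.

```python
import hashlib
import hmac
import os
import secrets


def hash_password(password: str, iterations: int = 200_000) -> str:
    """Return 'iterations$salt$digest' with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison to avoid leaking information through timing.
    return hmac.compare_digest(digest.hex(), digest_hex)


def new_record_id() -> str:
    # 128 random bits instead of an auto-increment counter, so IDs cannot be
    # enumerated or used to estimate how many records exist.
    return secrets.token_hex(16)
```

Because the salt is random, two users with the same password get different stored hashes. Random IDs stop an attacker from simply walking `id = 1, 2, 3, …` through an API.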

2 Responses to “super ninja privacy techniques”

  1. Thanks for the writeup, Joost. One clarification on the privacy wall: if a user loses their password, all is not lost. For one, if the user can give you enough information about their private data (in our case, last 4 digits of account numbers, names of banks, and some recent transaction names and amounts) so that you can (a) find their data in the system and (b) verify that they own that data, you can then re-associate it with their user record. Also, I didn’t get into this during the talk, but we have a whole password recovery mechanism in place (using security questions and an email address) that allows a user to reset their password if they forget it, without losing their data. I’m planning on doing a follow-up blog post soon that gets into more of these details.

  2. Brad, thank you very much for your comment, I’ve changed the post, and I’m looking forward to the follow-up!
