Tuesday, May 29, 2012

Facebook database design?


I have always wondered how Facebook designed the friend <-> user relation.



I figure the user table is something like this:




user_email PK
user_id PK
password



I figure the table with user's data (sex, age etc connected via user email I would assume).



How does it connect all the friends to this user?



Something like this?




user_id
friend_id_1
friend_id_2
friend_id_3
friend_id_N



Probably not. Because the number of users is unknown and will expand.


Source: Tips4all

15 comments:

  1. Keep a friend table that holds the UserID and then the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the Users table.

    Somewhat useful example:

    Table Name: User
    Columns:
    UserID PK
    EmailAddress
    Password
    Gender
    DOB
    Location

    TableName: Friends
    Columns:
    UserID PK FK
    FriendID PK FK
    (This table features a composite primary key made up of the two foreign
    keys, both pointing back to the user table. One ID will point to the
    logged in user, the other ID will point to the individual friend
    of that user)


    Example Usage:

    Table User
    --------------
    UserID EmailAddress Password Gender DOB Location
    ------------------------------------------------------
    1 bob@bob.com bobbie M 1/1/2009 New York City
    2 jon@jon.com jonathan M 2/2/2008 Los Angeles
    3 joe@joe.com joseph M 1/2/2007 Pittsburgh

    Table Friends
    ---------------
    UserID FriendID
    ----------------
    1 2
    1 3
    2 3


    This will show that Bob is friends with both Jon and Joe and that Jon is also friends with Joe. In this example we will assume that friendship is always two ways, so you would not need a row in the table such as (2,1) or (3,2) because they are already represented in the other direction. For examples where friendship or other relations aren't explicitly two way, you would need to also have those rows to indicate the two-way relationship.

    ReplyDelete
  2. It's most likely a many to many relationship:

    FriendList (table)

    user_id -> users.user_id
    friend_id -> users.user_id
    friendVisibilityLevel


    EDIT

    The user table probably doesn't have user_email as a PK, possibly as a unique key though.

    users (table)

    user_id PK
    user_email
    password

    ReplyDelete
  3. My best bet is that they created a graph structure. The nodes are users and "friendships" are edges.

    Keep one table of users, keep another table of edges. Then you can keep data about the edges, like "day they became friends" and "approved status," etc.

    ReplyDelete
  4. Take a look at these articles describing how LinkedIn and Digg are built:


    http://hurvitz.org/blog/2008/06/linkedin-architecture
    http://highscalability.com/scaling-digg-and-other-web-applications


    There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:

    http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html

    Also, there's this article that talks about non-relational databases and how they're used by some companies:

    http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

    You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher level concepts than most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.

    There are a lot of links on the first two articles that should give you some more insight.

    HTH

    ReplyDelete
  5. Its not possible to retrive datas from rdbms for user friends datas for datas which crossed more than half a billion at a constant time
    so facebook implemented this using hash database(no sql) and they opensourced the database called cassandra
    so every user has its own key and the friends details in a queue
    to know how cassandra works look at this
    http://prasath.posterous.com/cassandra-55

    thanks

    ReplyDelete
  6. Searching a little on Google I found this reverse-engineered object model / DB schema.

    However, this is probably far from what you would find in reality on Facebook's servers. They claim to have >200 million users and we can easily guess that the number of FriendRelations will be in the range of billions. So this will need a highly optimized schema and database engine probably only the people working at Facebook will know.

    ReplyDelete
  7. Have a look at the following database schema, reverse engineered by Anatoly Lubarsky:

    ReplyDelete
  8. You're looking for foreign keys. Basically you can't have an array in a database unless it has it's own table.



    Example schema:


    Users Table
    userID PK
    other data
    Friends Table
    userID -- FK to users's table representing the user that has a friend.
    friendID -- FK to Users' table representing the user id of the friend

    ReplyDelete
  9. Keep in mind that database tables are designed to grow vertically (more rows), not horizontally (more columns)

    ReplyDelete
  10. Regarding the performance of a many-to-many table, if you have 2 32-bit ints linking user IDs, your basic data storage for 200,000,000 users averaging 200 friends apiece is just under 300GB.

    Obviously, you would need some partitioning and indexing and you're not going to keep that in memory for all users.

    ReplyDelete
  11. My guess would be something along the lines of a large key-value store for speed. This probably isn't what you'd be doing for a smaller site, as it makes things a lot more complex. For example, there would be something along the lines of:

    get_data(userid = 12345) returns: { name: "John Doe", email: ... }
    get_friends(userid = 12345) returns: { 22222, 33333, 44444, ... }


    When updating friends information, the data on both sides would need to be updated.

    ReplyDelete
  12. Well thinking about Oracle, I've heared that is used for military purpose too... Maybe it has something to deal with a lot of datas too. I think is a big advantage, for facebook, to use a relational database instead of a non relatinal.... but obviusly this is my way of thinking

    ReplyDelete
  13. Its a type of graph database:
    http://components.neo4j.org/neo4j-examples/1.2-SNAPSHOT/social-network.html

    Its not related to Relational databases.

    Google for graph databases.

    ReplyDelete
  14. Probably there is a table, which stores the friend <-> user relation, say "frnd_list", having fields 'user_id','frnd_id'.

    Whenever a user adds another user as a friend, two new rows are created.

    For instance, suppose my id is 'deep9c' and I add a user having id 'akash3b' as my friend, then two new rows are created in table "frnd_list" with values ('deep9c','akash3b') and ('akash3b','deep9c').

    Now when showing the friends-list to a particular user, a simple sql would do that: "select frnd_id from frnd_list where user_id="
    where is the id of the logged-in user (stored as a session-attribute).

    ReplyDelete
  15. I dunno if its the right thing to do - but what if a separate table is made which contains the friends' id?. For example the tables would be named as User1_frndlist, User2_frndlist,... and so on.

    These would contain the ids of other users(friends) which that user(id 1) has added.

    We can control which table to see by looking into the session variable and appending that id no. to the table name.

    Would this method be inefficient in some way?

    ReplyDelete