[system design] data partitioning

Data Partioning

Data partitioning is a technique to break up a big database (DB) into many smaller parts. It is the process of splitting up a DB/table across multiple machines to improve the manageability, performance, availability, and load balancing of an application.

Partitioning Methods

  • Horizontal Partitioning(Data Sharding) In this scheme, we put different rows into different tables.
    • Disadvantage If the value whose range is used for partitioning isn’t chosen carefully, then the partitioning scheme will lead to unbalanced servers.
  • Vertical Partitioning In this scheme, we divide our data to store tables related to a specific feature in their own server.
    • Example For example, if we are building Instagram like application - where we need to store data related to users, photos they upload, and people they follow - we can decide to place user profile information on one DB server, friend lists on another, and photos on a third server.
    • Disadvantage The main problem with this approach is that if our application experiences additional growth, then it may be necessary to further partition a feature specific DB across various servers
  • Directory Based Partitioning A loosely coupled approach to work around issues mentioned in the above schemes is to create a lookup service which knows your current partitioning scheme and abstracts it away from the DB access code.

Partitioning Criteria

  • Key or Hash-based partitioning We apply a hash function to some key attributes of the entity we are storing; that yields the partition number.
  • List partitioning