Whatever....: database theory

Showing posts with label database theory. Show all posts

Tuesday, January 25, 2011

Theory : database sharding strategies

There are a number of database sharding strategies that meet the diverse requirements of different categories of application.

Shard by Modulus
For many applications, it is appropriate to shard based on a shard key such as a User ID. Using a modulus of a numeric ID, especially an auto increment primary key, ensures even distribution of data between shards.

Shard by Date/Time Range
For time-based data such as feeds or blogs where data is accumulating over time, it may make sense to shard by date range. For example, each shard could contain data for a single month. New shards can be added each month and old shards can be dropped once historic data is no longer needed.

Master Lookup
It is sometimes a requirement to control the sharding manually or in an application specific manner. One example would be a requirement to host key customer accounts on shards hosted on higher specification hardware. To support this requirement, a master shard can be created which contains lookup tables to map customer IDs to a specific shard number.

Session-based Sharding
Some categories of application, particularly user-centric web applications, can choose a shard when a customer logs in and then direct all queries to the same shard for the duration of that user session.

Fixed Shard
Tables are mapped to specific fixed shards. Also known as table based sharding.

Custom Sharding
If there is any specific logic for sharding, a piece of code can be used to shard data based on that logic.

Global Tables
Global tables are tables which are hosted in all shards and data is automatically replicated across all shards. The benefit is that these tables can be used in joins with sharded tables in each shard. Global tables are typically fairly static tables or with low write volume, such as product codes, countries, and other reference data.

Wednesday, September 02, 2009

DB Basics : Types of joins

Join Types

So, you have designed a normalized database design for your application. Maybe up to the 3rd normal form. And now, when you need to run queries, you would need to join the tables in the query to get the required information. There has to be some common data that allow those tables to be connected in some meaningful way. Although it’s possible to have more than one common column between two tables, most often, the join key will be the primary key of one table and a foreign key in the other.

Lets perform queries against the following table.

company : company_id (pk), company_name
model : model_id(pk), company_id(fk), model_name

Here, we have a company table and a model table in 1:n relationship. A car company can have multiple models in the market. Let us take some sample data

company_id	company_name
1	Audi
2	GM
3	Ford
4	Toyota
5	Tata
6	BMW

model_id	company_id	model_name
1	1	A4
2	1	A6
3	1	A8
4	1	Q7
5	2	Chevrolet
6	2	Hummer
7	3	Ikon
8	3	Endeavor
9	3	Fiesta
10	4	Corola
11	4	Camry
12	4	Innova
13	5	Indica
14	5	Nano
15	5	Safari
16	Null	Custom