Showing posts with label database theory. Show all posts
Showing posts with label database theory. Show all posts

Tuesday, January 25, 2011

Theory : database sharding strategies

There are a number of database sharding strategies that meet the diverse requirements of different categories of application.

Shard by Modulus
For many applications, it is appropriate to shard based on a shard key such as a User ID. Using a modulus of a numeric ID, especially an auto increment primary key, ensures even distribution of data between shards.

Shard by Date/Time Range
For time-based data such as feeds or blogs where data is accumulating over time, it may make sense to shard by date range. For example, each shard could contain data for a single month. New shards can be added each month and old shards can be dropped once historic data is no longer needed.

Master Lookup
It is sometimes a requirement to control the sharding manually or in an application specific manner. One example would be a requirement to host key customer accounts on shards hosted on higher specification hardware. To support this requirement, a master shard can be created which contains lookup tables to map customer IDs to a specific shard number.

Session-based Sharding
Some categories of application, particularly user-centric web applications, can choose a shard when a customer logs in and then direct all queries to the same shard for the duration of that user session.

Fixed Shard
Tables are mapped to specific fixed shards. Also known as table based sharding.

Custom Sharding
If there is any specific logic for sharding, a piece of code can be used to shard data based on that logic.

Global Tables
Global tables are tables which are hosted in all shards and data is automatically replicated across all shards. The benefit is that these tables can be used in joins with sharded tables in each shard. Global tables are typically fairly static tables or with low write volume, such as product codes, countries, and other reference data.

Wednesday, September 02, 2009

DB Basics : Types of joins

Join Types

So, you have designed a normalized database design for your application. Maybe up to the 3rd normal form. And now, when you need to run queries, you would need to join the tables in the query to get the required information. There has to be some common data that allow those tables to be connected in some meaningful way. Although it’s possible to have more than one common column between two tables, most often, the join key will be the primary key of one table and a foreign key in the other.

Lets perform queries against the following table.

company : company_id (pk), company_name
model : model_id(pk), company_id(fk), model_name

Here, we have a company table and a model table in 1:n relationship. A car company can have multiple models in the market. Let us take some sample data









company_idcompany_name
1Audi
2GM
3Ford
4Toyota
5Tata
6BMW




















model_idcompany_idmodel_name
11A4
21A6
31A8
41Q7
52Chevrolet
62Hummer
73Ikon
83Endeavor
93Fiesta
104Corola
114Camry
124Innova
135Indica
145Nano
155Safari
16NullCustom


Inner join : An inner join is defined as a join in which rows must match in both tables in order to be included in the result set.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1 INNER JOIN model t2 on t1.company_id = t2.company_id where t1.company_name = 'Audi';

+--------------+-------+
| Manufacturer | Model |
+--------------+-------+
| Audi | A4 |
| Audi | A6 |
| Audi | A8 |
| Audi | Q7 |
+--------------+-------+


Outer Join : Outer joins will return records in one table that aren’t matched in another. Outer joins can be further divided into the two types of left and right. In a left outer join, all records from the first (left-hand) table in a join that meet any conditions set in the WHERE clause are returned, whether or not there’s a match in the second (right-hand) table.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1 left join model t2 on t1.company_id = t2.company_id where t1.company_name in ('Toyota','BMW');
+--------------+--------+
| Manufacturer | Model |
+--------------+--------+
| Toyota | Corola |
| Toyota | Camry |
| Toyota | Innova |
| BMW | NULL |
+--------------+--------+


Here 'BMW' is returned even when it does not have any entry in the model Table.

Similar to the left outer join, a right outer join returns all records from the second (right-hand) table in a join that meet any conditions set in the WHERE clause, whether or not there’s a match in the first (left-hand) table.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1 right join model t2 on t1.company_id = t2.company_id where t2.model_name in ('Custom','Nano');
+--------------+--------+
| Manufacturer | Model |
+--------------+--------+
| Tata | Nano |
| NULL | Custom |
+--------------+--------+


Cross-join :

The cross-join, also referred to as a Cartesian product, returns all the rows in all the tables listed in the query. Each row in the first table is paired with all the rows in the second table. This happens when there is no relationship defined between the two tables. We do not require cross join in our general applications, so we should try to avoid it. A cross join happens when we fail to provide a filler for the join in the query.

mysql> select t1.company_name as 'Manufacturer', t2.model_name as 'Model' from company t1, model t2;
+--------------+-----------+
| Manufacturer | Model |
+--------------+-----------+
| Audi | A4 |
| GM | A4 |
| Ford | A4 |
| Toyota | A4 |
| Tata | A4 |
| BMW | A4 |
| Audi | A6 |
| GM | A6 |
.....
.....
.....
| Ford | Custom |
| Toyota | Custom |
| Tata | Custom |
| BMW | Custom |
+--------------+-----------+