Which table do I join first?

From time to time, I get confused about the order and the outcome of joins when presented with multiple options. I have decided to make a cheat sheet for future reference.

You can either follow the examples step by step below or jump straight to the summary at the end.

Let’s prepare the sample tables and data:

use tempdb

create table Security
(
    SecurityID int
    , SecurityName varchar(100)
)

create table SecurityCodeType
(
    SecurityCodeTypeID int
    , SecurityCodeType varchar(100)
)

create table SecurityCode
(
    SecurityID int
    , SecurityCodeTypeID int
    , SecurityCode varchar(100)
)
    
insert Security values (123, 'BHP BILLITON LIMITED')
insert SecurityCodeType values (1, 'ASX')
insert SecurityCode values (123, 1, 'BHP')

Now, looking at the following three queries, they all return the same result.

select a.SecurityName, b.SecurityCode, c.SecurityCodeType
from Security a
left join SecurityCode b on a.SecurityID = b.SecurityID
join SecurityCodeType c  on b.SecurityCodeTypeID = c.SecurityCodeTypeID
                            and c.SecurityCodeType = 'ASX'

image

select a.SecurityName, b.SecurityCode, c.SecurityCodeType
from Security a
left join SecurityCode b        on a.SecurityID = b.SecurityID
left join SecurityCodeType c    on b.SecurityCodeTypeID = c.SecurityCodeTypeID
                                    and c.SecurityCodeType = 'ASX'

image

select a.SecurityName, b.SecurityCode, b.SecurityCodeType
from Security a
left join 
(
    select b.SecurityID, b.SecurityCode, c.SecurityCodeType
    from SecurityCode b    
    join SecurityCodeType c    on b.SecurityCodeTypeID = c.SecurityCodeTypeID
                                and c.SecurityCodeType = 'ASX'                                    
) b on a.SecurityID = b.SecurityID

image

This is only true if the matching data is found in all three tables. In other words, the LEFT OUTER JOINs don’t serve much purpose here.

Now if we manipulate the queries so the LEFT OUTER JOINs start to have effects, the results will become:

select a.SecurityName, b.SecurityCode, c.SecurityCodeType
from Security a
left join SecurityCode b    on a.SecurityID = b.SecurityID
join SecurityCodeType c        on b.SecurityCodeTypeID = c.SecurityCodeTypeID
                                and c.SecurityCodeType = 'SEDOL'

image

select a.SecurityName, b.SecurityCode, c.SecurityCodeType
from Security a
left join SecurityCode b        on a.SecurityID = b.SecurityID
left join SecurityCodeType c    on b.SecurityCodeTypeID = c.SecurityCodeTypeID
                                    and c.SecurityCodeType = 'SEDOL'

image

select a.SecurityName, b.SecurityCode, b.SecurityCodeType
from Security a
left join 
(
    select b.SecurityID, b.SecurityCode, c.SecurityCodeType
    from SecurityCode b    
    join SecurityCodeType c    on b.SecurityCodeTypeID = c.SecurityCodeTypeID
                                and c.SecurityCodeType = 'SEDOL'                                    
) b on a.SecurityID = b.SecurityID

image

If the differences in the above queries are not obvious to you, see if it is any better if we read them in English:

1. Return all securities which has a SEDOL code.

2. Return all securities AND their codes if available AND indicate the SEDOL codes.

3. Return all securities AND the codes only if they are SEDOL codes.

Now I hope you understand the strategic placement of the LEFT OUTER JOIN is a critical decision to make that it may totally change the information you intended to extract from the database.

I have summarised the above observations in a table below for quick future reference.

Pattern Mandatory Optional
from Table1
left join Table2
join Table3
Table1, Table3. Table2.
from Table1
left join Table2
left join Table3
Table1. Table2, Table3.
from Table1
left join
(
    from Table2
    join Table3
)
Table1. Table2 + Table3.