Joining Data in SQL
Short Description
Discover all the fundamental in PostgreSQL, like inner, outer self, semi, anti, and cross joins, as well as unions, intersections, and except clauses.
Long Description
Now that you’ve learned the basics of SQL in our Introduction to SQL course, it’s time to supercharge your queries using joins and relational set theory. In this course, you’ll learn all about the power of joining tables while exploring interesting features of countries and their cities throughout the world. You will master inner and outer joins, as well as self joins, semi joins, anti joins and cross joins—fundamental tools in any PostgreSQL wizard’s toolbox. Never fear set theory again after learning all about unions, intersections, and except clauses through easy-to-understand diagrams and examples. Lastly, you’ll be introduced to the challenging topic of subqueries. You will be able to visually grasp these ideas by using Venn diagrams and other linking illustrations.
1 1. Introduction to joins
In this chapter, you’ll be introduced to the concept of joining tables, and will explore the different ways you can enrich your queries using inner joins and self joins. You’ll also see how to use the case statement to split up a field into different categories.
1.1 Introduction to INNER JOIN
Theory. Coming soon …
1. Welcome to the course!
Hi, my name is Chester Ismay and I’ll be your instructor for this course on Joining Data in PostgreSQL. As the name suggests, the focus of this course is using SQL to join two or more database tables together into a single table, an essential skill for data scientists. In this chapter, you’ll learn about the INNER JOIN, which along with LEFT JOIN are probably the two most common JOINs. You’ll see diagrams throughout this course that are designed to help you understand the mechanics of the different joins. Let’s begin with a diagram showing the layout of some data and then how an INNER JOIN can be applied to that data.
2. Initial data diagram
In the videos in this chapter and the next, we’ll often work with two tables named left and right. You can see that matching values of the id field are colored with the same color. The id field is known as a KEY field since it can be used to reference one table to another. Both the left and right tables also have another field named val. This will be useful in helping you see specifically which records and values are included in each join.
3. INNER JOIN diagram
An INNER JOIN only includes records in which the key is in both tables. You can see here that the id field matches for values of 1 and 4 only. With inner joins we look for matches in the right table corresponding to all entries in the key field in the left table.
4. INNER JOIN diagram (2)
So the focus here shifts to only those records with a match in terms of the id field. The records not of interest to INNER JOIN have been faded.
5. INNER JOIN diagram (3)
Here’s a resulting single table from the INNER JOIN clause that gives the val field from the right table with records corresponding to only those with id value of 1 or 4, which are colored as yellow and purple. Now that you have a sense for how INNER JOIN works, let’s try an example in SQL.
6. prime_ministers table
The prime_ministers table is one of the tables in the leaders database. It is displayed here. Note the countries that are included. Suppose you were interested in determining nations that have both a prime minister and a president AND putting the results into a single table. Next you’ll see the presidents table.
7. presidents table
How did I display all of the prime_ministers table in the previous slide? Recall the use of SELECT and FROM clauses as is shown for the presidents table here.Which countries appear in both tables? With small tables like these, it is easy to notice that Egypt, Portugal, Vietnam, and Haiti appear in both tables. For larger tables, it isn’t as simple as just picking these countries out visually. So what does the syntax look like for SQL to get the results of countries with a prime minister and a president from these two tables into one?
8. INNER JOIN in SQL
The syntax for completing an INNER JOIN from the prime_ministers table to the presidents table based on a key field of country is shown. Note the use of aliases for prime_ministers as p1 and presidents as p2. This helps to simplify your code, especially with longer table names like prime_ministers and presidents. A SELECT statement is used to select specific fields from the two tables. In this case, since country exists in both tables, we must write p1 and the period to avoid a SQL error. Next we list the table on the left of the inner join after FROM and then we list the table on the right after INNER JOIN. Lastly, we specify the keys in the two tables that we would like to match on.
9. Let’s practice!
You’ll now practice applying an inner join to two tables and to three tables. Let’s get to it!
1.2 Inner join
Although this course focuses on PostgreSQL, you’ll find that these joins and the material here applies to different forms of SQL as well.
Throughout this course, you’ll be working with the countries
database containing information about the most populous world cities as well as country-level economic data, population data, and geographic data. This countries
database also contains information on languages spoken in each country.
You can see the different tables in this database by clicking on the corresponding tabs. Click through them to get a sense for the types of data that each table contains before you continue with the course! Take note of the fields that appear to be shared across the tables.
Recall from the video the basic syntax for an INNER JOIN
, here including all columns in both tables:
You’ll start off with a SELECT
statement and then build up to an INNER JOIN
with the cities
and countries
tables. Let’s get to it!
Steps
- Begin by selecting all columns from the
cities
table.
name | country_code | city_proper_pop | metroarea_pop | urbanarea_pop |
---|---|---|---|---|
Abidjan | CIV | 4765000 | NA | 4765000 |
Abu Dhabi | ARE | 1145000 | NA | 1145000 |
Abuja | NGA | 1235880 | 6000000 | 1235880 |
Accra | GHA | 2070463 | 4010054 | 2070463 |
Addis Ababa | ETH | 3103673 | 4567857 | 3103673 |
Ahmedabad | IND | 5570585 | NA | 5570585 |
Alexandria | EGY | 4616625 | NA | 4616625 |
Algiers | DZA | 3415811 | 5000000 | 3415811 |
Almaty | KAZ | 1703481 | NA | 1703481 |
Ankara | TUR | 5271000 | 4585000 | 5271000 |
- Inner join the
cities
table on the left to thecountries
table on the right, keeping all of the fields in both tables. - You should match the tables on the
country_code
field incities
and thecode
field incountries
. - Do not alias your tables here or in the next step. Using
cities
andcountries
is fine for now.
SELECT *
FROM cities
-- Inner join to countries
INNER JOIN countries
-- Match on the country codes
ON cities.country_code = countries.code;
name | country_code | city_proper_pop | metroarea_pop | urbanarea_pop | code | name..7 | continent | region | surface_area | indep_year | local_name | gov_form | capital | cap_long | cap_lat |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abidjan | CIV | 4765000 | NA | 4765000 | CIV | Cote d’Ivoire | Africa | Western Africa | 322463 | 1960 | Cote dIvoire | Republic | Yamoussoukro | -4.03050 | 5.33200 |
Abu Dhabi | ARE | 1145000 | NA | 1145000 | ARE | United Arab Emirates | Asia | Middle East | 83600 | 1971 | Al-Imarat al-´Arabiya al-Muttahida | Emirate Federation | Abu Dhabi | 54.37050 | 24.47640 |
Abuja | NGA | 1235880 | 6000000 | 1235880 | NGA | Nigeria | Africa | Western Africa | 923768 | 1960 | Nigeria | Federal Republic | Abuja | 7.48906 | 9.05804 |
Accra | GHA | 2070463 | 4010054 | 2070463 | GHA | Ghana | Africa | Western Africa | 238533 | 1957 | Ghana | Republic | Accra | -0.20795 | 5.57045 |
Addis Ababa | ETH | 3103673 | 4567857 | 3103673 | ETH | Ethiopia | Africa | Eastern Africa | 1104300 | -1000 | YeItyop´iya | Republic | Addis Ababa | 38.74680 | 9.02274 |
Ahmedabad | IND | 5570585 | NA | 5570585 | IND | India | Asia | Southern and Central Asia | 3287260 | 1947 | Bharat/India | Federal Republic | New Delhi | 77.22500 | 28.63530 |
Alexandria | EGY | 4616625 | NA | 4616625 | EGY | Egypt | Africa | Northern Africa | 1001450 | 1922 | Misr | Republic | Cairo | 31.24610 | 30.09820 |
Algiers | DZA | 3415811 | 5000000 | 3415811 | DZA | Algeria | Africa | Northern Africa | 2381740 | 1962 | Al-Jazair/Algerie | Republic | Algiers | 3.05097 | 36.73970 |
Almaty | KAZ | 1703481 | NA | 1703481 | KAZ | Kazakhstan | Asia | Southern and Central Asia | 2724900 | 1991 | Qazaqstan | Republic | Astana | 71.43820 | 51.18790 |
Ankara | TUR | 5271000 | 4585000 | 5271000 | TUR | Turkey | Asia | Middle East | 774815 | 1923 | Turkiye | Republic | Ankara | 32.36060 | 39.71530 |
- Modify the
SELECT
statement to keep only the name of the city, the name of the country, and the name of the region the country resides in. - Alias the name of the city
AS city
and the name of the countryAS country
.
-- Select name fields (with alias) and region
SELECT cities.name AS city, countries.name AS country, region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;
city | country | region |
---|---|---|
Abidjan | Cote d’Ivoire | Western Africa |
Abu Dhabi | United Arab Emirates | Middle East |
Abuja | Nigeria | Western Africa |
Accra | Ghana | Western Africa |
Addis Ababa | Ethiopia | Eastern Africa |
Ahmedabad | India | Southern and Central Asia |
Alexandria | Egypt | Northern Africa |
Algiers | Algeria | Northern Africa |
Almaty | Kazakhstan | Southern and Central Asia |
Ankara | Turkey | Middle East |
Great work! In the next exercise you’ll explore how you can do more aliasing to limit the amount of writing.
1.3 Inner join (2)
Instead of writing the full table name, you can use table aliasing as a shortcut. For tables you also use AS
to add the alias immediately after the table name with a space. Check out the aliasing of cities
and countries
below.
Notice that to select a field in your query that appears in multiple tables, you’ll need to identify which table/table alias you’re referring to by using a .
in your SELECT
statement.
You’ll now explore a way to get data from both the countries
and economies
tables to examine the inflation rate for both 2010 and 2015.
Sometimes it’s easier to write SQL code out of order: you write the SELECT
statement after you’ve done the JOIN
.
Steps
- Join the tables
countries
(left) andeconomies
(right) aliasingcountries AS c
andeconomies AS e
. - Specify the field to match the tables
ON
. - From this join,
SELECT
:c.code
, aliased ascountry_code
.name
,year
, andinflation_rate
, not aliased.
c.code
, aliased ascountry_code
.name
,year
, andinflation_rate
, not aliased.
-- Select fields with aliases
SELECT c.code AS country_code, name, year, inflation_rate
FROM countries AS c
-- Join to economies (alias e)
INNER JOIN economies AS e
-- Match on code
ON c.code = e.code;
country_code | name | year | inflation_rate |
---|---|---|---|
AFG | Afghanistan | 2010 | 2.179 |
AFG | Afghanistan | 2015 | -1.549 |
AGO | Angola | 2010 | 14.480 |
AGO | Angola | 2015 | 10.287 |
ALB | Albania | 2010 | 3.605 |
ALB | Albania | 2015 | 1.896 |
ARE | United Arab Emirates | 2010 | 0.878 |
ARE | United Arab Emirates | 2015 | 4.070 |
ARG | Argentina | 2010 | 10.461 |
ARG | Argentina | 2015 | NA |
Nicely done! Using this short aliases takes some getting used to, but it will save you a lot of typing.
1.4 Inner join (3)
The ability to combine multiple joins in a single query is a powerful feature of SQL, e.g:
As you can see here it becomes tedious to continually write long table names in joins. This is when it becomes useful to alias each table using the first letter of its name (e.g. countries AS c
)! It is standard practice to alias in this way and, if you choose to alias tables or are asked to specifically for an exercise in this course, you should follow this protocol.
Now, for each country, you want to get the country name, its region, the fertility rate, and the unemployment rate for both 2010 and 2015.
Note that results should work throughout this course with or without table aliasing unless specified differently.
Steps
- Inner join
countries
(left) andpopulations
(right) on thecode
andcountry_code
fields respectively. - Alias
countries AS c
andpopulations AS p
. - Select
code
,name
, andregion
fromcountries
and also selectyear
andfertility_rate
frompopulations
(5 fields in total).
-- Select fields
SELECT c.code, name, region, year, fertility_rate
-- From countries (alias as c)
FROM countries AS c
-- Join with populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code;
code | name | region | year | fertility_rate |
---|---|---|---|---|
ABW | Aruba | Caribbean | 2010 | 1.704 |
ABW | Aruba | Caribbean | 2015 | 1.647 |
AFG | Afghanistan | Southern and Central Asia | 2010 | 5.746 |
AFG | Afghanistan | Southern and Central Asia | 2015 | 4.653 |
AGO | Angola | Central Africa | 2010 | 6.416 |
AGO | Angola | Central Africa | 2015 | 5.996 |
ALB | Albania | Southern Europe | 2010 | 1.663 |
ALB | Albania | Southern Europe | 2015 | 1.793 |
AND | Andorra | Southern Europe | 2010 | 1.270 |
AND | Andorra | Southern Europe | 2015 | NA |
- Add an additional
INNER JOIN
witheconomies
to your previous query by joining oncode
. - Include the
unemployment_rate
column that became available through joining witheconomies
. - Note that
year
appears in bothpopulations
andeconomies
, so you have to explicitly usee.year
instead ofyear
as you did before.
-- Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code
ON c.code = e.code;
code | name | region | year | fertility_rate | unemployment_rate |
---|---|---|---|---|---|
AFG | Afghanistan | Southern and Central Asia | 2015 | 5.746 | NA |
AFG | Afghanistan | Southern and Central Asia | 2010 | 5.746 | NA |
AFG | Afghanistan | Southern and Central Asia | 2015 | 4.653 | NA |
AFG | Afghanistan | Southern and Central Asia | 2010 | 4.653 | NA |
AGO | Angola | Central Africa | 2015 | 6.416 | NA |
AGO | Angola | Central Africa | 2010 | 6.416 | NA |
AGO | Angola | Central Africa | 2015 | 5.996 | NA |
AGO | Angola | Central Africa | 2010 | 5.996 | NA |
ALB | Albania | Southern Europe | 2015 | 1.663 | 17.1 |
ALB | Albania | Southern Europe | 2010 | 1.663 | 14.0 |
- Scroll down the query result and take a look at the results for Albania from your previous query. Does something seem off to you?
- The trouble with doing your last join on
c.code = e.code
and not also includingyear
is that e.g. the 2010 value forfertility_rate
is also paired with the 2015 value forunemployment_rate
. - Fix your previous query: in your last
ON
clause, useAND
to add an additional joining condition. In addition to joining oncode
inc
ande
, also join onyear
ine
andp
.
-- Select fields
SELECT c.code, name, region, e.year, fertility_rate, unemployment_rate
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (as p)
INNER JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Join to economies (as e)
INNER JOIN economies AS e
-- Match on country code and year
ON c.code = e.code AND e.year = p.year;
code | name | region | year | fertility_rate | unemployment_rate |
---|---|---|---|---|---|
AFG | Afghanistan | Southern and Central Asia | 2010 | 5.746 | NA |
AFG | Afghanistan | Southern and Central Asia | 2015 | 4.653 | NA |
AGO | Angola | Central Africa | 2010 | 6.416 | NA |
AGO | Angola | Central Africa | 2015 | 5.996 | NA |
ALB | Albania | Southern Europe | 2010 | 1.663 | 14.00 |
ALB | Albania | Southern Europe | 2015 | 1.793 | 17.10 |
ARE | United Arab Emirates | Middle East | 2010 | 1.868 | NA |
ARE | United Arab Emirates | Middle East | 2015 | 1.767 | NA |
ARG | Argentina | South America | 2010 | 2.370 | 7.75 |
ARG | Argentina | South America | 2015 | 2.308 | NA |
Good work! Time to learn something new!
1.5 INNER JOIN via USING
Theory. Coming soon …
1. INNER JOIN via USING
Congratulations on making it through the first set of exercises on using INNER JOIN to combine two or three tables into one! You’ll next learn about the USING keyword in SQL and how it can be used in joins.
2. The INNER JOIN diagram again
Recall the INNER JOIN diagram you saw in the last video. Think about the SQL code needed to complete this diagram. Let’s check it out. We select and alias three fields and use the left table on the left of the join and the right table on the right of the join matching based on the entries for the id key field.
3. The INNER JOIN diagram with USING
When the key field you’d like to join on is the same name in both tables, you can use a USING clause instead of the ON clause you have seen so far.Since id is the same name in both the left table and the right table we can specify USING instead of ON here. Note that the parentheses are required around the key field with USING. Let’s revisit the example of joining the prime_ministers table
4. Countries with prime ministers and presidents
to the presidents table to determine countries with both types of leaders. How could you fill in the blanks to get the result with USING? (Pause for a few seconds)Did you get it? (PAUSE) Ah, I played a bit of a trick on you here. But why does this work? Since an INNER JOIN includes entries in both tables and both tables contain the countries listed, it doesn’t matter the order in which we place the tables in the join if we SELECT these columns. You’ll be told in the exercises which table to use on the left and on the right to avoid this confusion. Note again the use of the parentheses around country after USING.
5. Let’s practice!
Now you’ll test your understanding of INNER JOINs before we delve into an exercise with USING. Go get ’em!
1.6 Review inner join using on
1.7 Question
Why does the following code result in an error?
⬜ Thelanguages
table has more rows than thecountries
table.
⬜ There are multiple languages spoken in many countries.
✅INNER JOIN
requires a specification of the key field (or fields) in each table.
⬜ Join queries may not be followed by a semi-colon.
1.8 Inner join with using
When joining tables with a common field name, e.g.
You can use USING
as a shortcut:
You’ll now explore how this can be done with the countries
and languages
tables.
Steps
- Inner join
countries
on the left andlanguages
on the right withUSING(code)
. - Select the fields corresponding to:
- country name
AS country
, - continent name,
- language name
AS language
, and - whether or not the language is official.
Remember to alias your tables using the first letter of their names.
-- Select fields
SELECT c.name AS country, continent, l.name AS language, official
-- From countries (alias as c)
FROM countries AS c
-- Join to languages (as l)
INNER JOIN languages AS l
-- Match using code
USING(code);
country | continent | language | official |
---|---|---|---|
Afghanistan | Asia | Dari | TRUE |
Afghanistan | Asia | Pashto | TRUE |
Afghanistan | Asia | Turkic | FALSE |
Afghanistan | Asia | Other | FALSE |
Albania | Europe | Albanian | TRUE |
Albania | Europe | Greek | FALSE |
Albania | Europe | Other | FALSE |
Albania | Europe | unspecified | FALSE |
Algeria | Africa | Arabic | TRUE |
Algeria | Africa | French | FALSE |
Well done! Another technique to save you some typing!
1.9 Self-ish joins, just in CASE
Theory. Coming soon …
1. Self-ish joins, just in CASE
You’ll now dive into inner joins where a table is joined with itself. Sounds a little selfish, doesn’t it? These types of joins, as you may have guessed, are called self joins. You’ll also explore how to slice a numerical field into categories using the CASE command. Joining a table to
2. Join a table to itself?
itself may seem like a bit of a crazy, strange thing to ever want to do. Self-joins are used to compare values in a field to other values of the same field from within the same table. Let’s further explore this with an example. Recall the prime_ministers table from earlier. What if you wanted to create a new table showing countries that are in the same continent matched as pairs? Let’s explore a chunk of INNER JOIN code using the prime_ministers table.
3. Join prime_ministers to itself?
You might want to pause the video and think about what the resulting tablewill look like. The country column is selected twice as well as continent. The prime_ministers table is on both the left and the right. The vital step here is setting the key columns by which we match the table to itself. For each country, we will have a match if the country in the “right table” (that is also prime_ministers) is in the same continent. Lastly, since the results of this query are more than can fit on the slide, you’ll only see the first 14 records. See how we have exactly this in the result! It’s a pairing of each country with every other country in its same continent. But do you see a problem here? We don’t want to list the country with itself after all. In the next slide, you’ll see a way to do this. Pause to think about how to get around this before continuing. We don’t want to include rows
4. Finishing off the self-join on prime_ministers
where the country is the same in the country1 and country2 fields. The AND clause can check that multiple conditions are met. Here a match will not be made between prime_ministers and itself if the countries match.You, thus, have the correct table now; the results here are again limited in order for them to fit on the slide. Notice that self-join doesn’t have a syntax quite as simple as INNER JOIN (You can’t just write SELF JOIN in SQL code).
5. CASE WHEN and THEN
The next command isn’t a join, but is a useful tool in your repertoire. You’ll be introduced to using CASE with another table in the leaders database. The states table contains numeric data about different countries in the six inhabited world continents. We’ll focus on the field indep_year now. Suppose we’d like to group the year of independence into categories of before 1900, between 1900 and 1930, and after 1930. CASE will get us there! CASE is a way to do multiple if-then-else statements in a simplified way in SQL.
6. Preparing indep_year_group in states
You can now see the basic layout for creating a new field containing the groupings. How might we fill them in? After the first WHEN should specify that we want to check for indep_year being less than 1900. Next we want indep_year_group to contain ‘between 1900 and 1930’ in the next blank. Lastly any other record not matching these conditions will be assigned the value of ‘after 1930’ for indep_year_group.
7. Creating indep_year_group in states
Check out the completed query with completed blanks.Notice how the values of indep_year are grouped in indep_year_group. Also observe how continent relates to indep_year_group.
8. Let’s practice!
You’ll now work on a couple of exercises for practice, then complete a challenge testing your knowledge of the Chapter 1 material.
1.10 Self-join
In this exercise, you’ll use the populations
table to perform a self-join to calculate the percentage increase in population from 2010 to 2015 for each country code!
Since you’ll be joining the populations
table to itself, you can alias populations
as p1
and also populations
as p2
. This is good practice whenever you are aliasing and your tables have the same first letter. Note that you are required to alias the tables with self-joins.
Steps
- Join
populations
with itselfON
country_code
. - Select the
country_code
fromp1
and thesize
field from bothp1
andp2
. SQL won’t allow same-named fields, so aliasp1.size
assize2010
andp2.size
assize2015
.
-- Select fields with aliases
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015
-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code;
country_code | size2010 | size2015 |
---|---|---|
ABW | 101597 | 103889 |
ABW | 101597 | 101597 |
ABW | 103889 | 103889 |
ABW | 103889 | 101597 |
AFG | 27962208 | 32526562 |
AFG | 27962208 | 27962208 |
AFG | 32526562 | 32526562 |
AFG | 32526562 | 27962208 |
AGO | 21219954 | 25021974 |
AGO | 21219954 | 21219954 |
- Notice from the result that for each
country_code
you have four entries laying out all combinations of 2010 and 2015. - Extend the
ON
in your query to include only those records where thep1.year
(2010) matches withp2.year - 5
(2015 - 5 = 2010). This will omit the three entries percountry_code
that you aren’t interested in.
-- Select fields with aliases
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015
-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code
-- and year (with calculation)
AND p1.year = p2.year - 5;
country_code | size2010 | size2015 |
---|---|---|
ABW | 101597 | 103889 |
AFG | 27962208 | 32526562 |
AGO | 21219954 | 25021974 |
ALB | 2913021 | 2889167 |
AND | 84419 | 70473 |
ARE | 8329453 | 9156963 |
ARG | 41222876 | 43416756 |
ARM | 2963496 | 3017712 |
ASM | 55636 | 55538 |
ATG | 87233 | 91818 |
- As you just saw, you can also use SQL to calculate values like
p2.year - 5
for you. With two fields likesize2010
andsize2015
, you may want to determine the percentage increase from one field to the next:two numeric fields \(A\) and \(B\), the percentage growth from \(A\) to \(B\) can be calculated as \((B - A) / A * 100.0\).a new field toSELECT
, aliased asgrowth_perc
, that calculates the percentage population growth from 2010 to 2015 for each country, usingp2.size
andp1.size
.
-- Select fields with aliases
SELECT p1.country_code,
p1.size AS size2010,
p2.size AS size2015,
-- Calculate growth_perc
((p2.size - p1.size)/p1.size * 100.0) AS growth_perc
-- From populations (alias as p1)
FROM populations AS p1
-- Join to itself (alias as p2)
INNER JOIN populations AS p2
-- Match on country code
ON p1.country_code = p2.country_code
-- and year (with calculation)
AND p1.year = p2.year - 5;
country_code | size2010 | size2015 | growth_perc |
---|---|---|---|
ABW | 101597 | 103889 | 2.2559721 |
AFG | 27962208 | 32526562 | 16.3232967 |
AGO | 21219954 | 25021974 | 17.9171920 |
ALB | 2913021 | 2889167 | -0.8188750 |
AND | 84419 | 70473 | -16.5199772 |
ARE | 8329453 | 9156963 | 9.9347457 |
ARG | 41222876 | 43416756 | 5.3219963 |
ARM | 2963496 | 3017712 | 1.8294608 |
ASM | 55636 | 55538 | -0.1761449 |
ATG | 87233 | 91818 | 5.2560385 |
Nice!
1.11 Case when and then
Often it’s useful to look at a numerical field not as raw data, but instead as being in different categories or groups.
You can use CASE
with WHEN
, THEN
, ELSE
, and END
to define a new grouping field.
Steps
Using the
countries
table, create a new fieldAS geosize_group
that groups the countries into three groups:- If
surface_area
is greater than 2 million,geosize_group
is'large'
. - If
surface_area
is greater than 350 thousand but not larger than 2 million,geosize_group
is'medium'
. - Otherwise,
geosize_group
is'small'
.
- If
SELECT name, continent, code, surface_area,
-- First case
CASE WHEN surface_area > 2000000 THEN 'large'
-- Second case
WHEN surface_area > 350000 THEN 'medium'
-- Else clause + end
ELSE 'small' END
-- Alias name
AS geosize_group
-- From table
FROM countries;
name | continent | code | surface_area | geosize_group |
---|---|---|---|---|
Afghanistan | Asia | AFG | 652090 | medium |
Netherlands | Europe | NLD | 41526 | small |
Albania | Europe | ALB | 28748 | small |
Algeria | Africa | DZA | 2381740 | large |
American Samoa | Oceania | ASM | 199 | small |
Andorra | Europe | AND | 468 | small |
Angola | Africa | AGO | 1246700 | medium |
Antigua and Barbuda | North America | ATG | 442 | small |
United Arab Emirates | Asia | ARE | 83600 | small |
Argentina | South America | ARG | 2780400 | large |
Well done! Time for the last exercise of this chapter!
1.12 Inner challenge
The table you created with the added geosize_group
field has been loaded for you here with the name countries_plus
. Observe the use of (and the placement of) the INTO
command to create this countries_plus
table:
You will now explore the relationship between the size of a country in terms of surface area and in terms of population using grouping fields created with CASE
.
By the end of this exercise, you’ll be writing two queries back-to-back in a single script. You got this!
Steps
Using the
populations
table focused only for theyear
2015, create a new field aliased aspopsize_group
to organize populationsize
into'large'
(> 50 million),'medium'
(> 1 million), and'small'
groups.
Select only the country code, population size, and this new
popsize_group
as fields.
SELECT country_code, size,
-- First case
CASE WHEN size > 50000000 THEN 'large'
-- Second case
WHEN size > 1000000 THEN 'medium'
-- Else clause + end
ELSE 'small' END
-- Alias name (popsize_group)
AS popsize_group
-- From table
FROM populations
-- Focus on 2015
WHERE year = 2015;
country_code | size | popsize_group |
---|---|---|
ABW | 103889 | small |
AFG | 32526562 | medium |
AGO | 25021974 | medium |
ALB | 2889167 | medium |
AND | 70473 | small |
ARE | 9156963 | medium |
ARG | 43416756 | medium |
ARM | 3017712 | medium |
ASM | 55538 | small |
ATG | 91818 | small |
- Use
INTO
to save the result of the previous query aspop_plus
. You can see an example of this in thecountries_plus
code in the assignment text. Make sure to include a;
at the end of yourWHERE
clause!
- Then, include another query below your first query to display all the records in
pop_plus
usingSELECT * FROM pop_plus;
so that you generate results and this will displaypop_plus
in the query result.
country_code | size | popsize_group |
---|---|---|
ABW | 103889 | small |
AFG | 32526562 | medium |
AGO | 25021974 | medium |
ALB | 2889167 | medium |
AND | 70473 | small |
ARE | 9156963 | medium |
ARG | 43416756 | medium |
ARM | 3017712 | medium |
ASM | 55538 | small |
ATG | 91818 | small |
- Write a query to join
countries_plus AS c
on the left withpop_plus AS p
on the right matchingthe country code fields. - Sort the data based on
geosize_group
, in ascending order so thatlarge
appears on top. - Select the
name
,continent
,geosize_group
, andpopsize_group
fields.
-- Select fields
SELECT name, continent, geosize_group, popsize_group
-- From countries_plus (alias as c)
FROM countries_plus AS c
-- Join to pop_plus (alias as p)
INNER JOIN pop_plus AS p
-- Match on country code
ON c.code = p.country_code
-- Order the table
ORDER BY geosize_group;
name | continent | geosize_group | popsize_group |
---|---|---|---|
India | Asia | large | large |
United States | North America | large | large |
Saudi Arabia | Asia | large | medium |
China | Asia | large | large |
Kazakhstan | Asia | large | medium |
Sudan | Africa | large | medium |
Argentina | South America | large | medium |
Algeria | Africa | large | medium |
Congo, The Democratic Republic of the | Africa | large | large |
Canada | North America | large | medium |
This concludes chapter 1 and you now know the INs of JOINs. Off to chapter 2 to learn the OUTs!
2 2. Outer joins and cross joins
In this chapter, you’ll come to grips with different kinds of outer joins. You’ll learn how to gain further insights into your data through left joins, right joins, and full joins. In addition to outer joins, you’ll also work with cross joins.
2.1 LEFT and RIGHT JOINs
Theory. Coming soon …
1. LEFT and RIGHT JOINs
Congratulations on completing Chapter 1 on INNER JOINs. Welcome to Chapter 2 on OUTER JOINs! You can remember outer joins as reaching OUT to another table while keeping all of the records of the original table. Inner joins keep only the records IN both tables. You’ll begin this chapter by exploring (1) LEFT JOINs, (2) RIGHT JOINs, and (3) FULL JOINs, which are the three types of OUTER JOINs. Let’s begin by exploring how a LEFT JOIN differs from an INNER JOIN via a diagram.
2. INNER JOIN diagram
Recall the inner join diagram from Chapter 1. The only records that were included in the resulting table of the INNER JOIN query were those in which the id field had matching values.
3. LEFT JOIN initial diagram
In contrast, a LEFT JOIN notes those records in the left table that do not have a match on the key field in the right table. This is denoted in the diagram by the open circles remaining close to the left table for id values of 2 and 3. These values of 2 and 3 do not appear in the id field of the right table.
4. LEFT JOIN diagram
You now see the result of the LEFT JOIN query. Whereas the INNER JOIN kept just the records corresponding to id values of 1 and 4, a LEFT JOIN keeps all of the original records in the left table but then marks the values as missing in the right table for those that don’t have a match. The missing values are marked with dark gray boxes here for clarity. Note that the values of 5 and 6 for id in the right table are not found in the result of LEFT JOIN in any way.
5. Multiple INNER JOIN diagram
It isn’t always the case that each key value in the left table corresponds to exactly one record in the key column of the right table. In these examples, we have this layout. Missing entries still occur for ids of 2 and 3 and the value of R3 is brought into the join from right2 since it matches on id 4. Duplicate rows are shown in the LEFT JOIN for id 1 since it has two matches corresponding to the values of R1 and R2 in the right2 table.
6. The syntax of a LEFT JOIN
The syntax of the LEFT JOIN is similar to that of the INNER JOIN. Let’s explore the same code you used before to determine the countries with a prime minister and a president, but let’s use a LEFT JOIN instead of an INNER JOIN. Further, let’s remove continent to save space on the screen. The first four records in this table are the same as those from the INNER JOIN. The last six correspond to the countries that do not have a president and thus their president values are missing.
7. RIGHT JOIN
The RIGHT JOIN is much less common than the LEFT JOIN so we won’t spend as much time on it here. The diagram will help you to understand how it works. Instead of matching entries in the id column on the left table TO the id column of the right table, a RIGHT JOIN does the reverse. Therefore, we see open circles on the ids of 5 and 6 in the right table since they are not found in the left table. The resulting table from the RIGHT JOIN shows these missing entries in the L_val field.As you can see in SQL the right table appears after RIGHT JOIN and the left table appears after FROM.
8. Let’s practice!
I’ll see you again soon to introduce FULL JOINs after you complete the next few exercises.
2.2 Left Join
Now you’ll explore the differences between performing an inner join and a left join using the cities
and countries
tables.
You’ll begin by performing an inner join with the cities
table on the left and the countries
table on the right. Remember to alias the name of the city field as city
and the name of the country field as country
.
You will then change the query to a left join. Take note of how many records are in each query here!
Steps
- Fill in the code based on the instructions in the code comments to complete the inner join. Note how many records are in the result of the join in the query result.
-- Select the city name (with alias), the country code,
-- the country name (with alias), the region,
-- and the city proper population
SELECT c1.name AS city, code, c2.name AS country,
region, city_proper_pop
-- From left table (with alias)
FROM cities AS c1
-- Join to right table (with alias)
INNER JOIN countries AS c2
-- Match on country code?
ON c1.country_code = c2.code
-- Order based on descending country code
ORDER BY code DESC;
city | code | country | region | city_proper_pop |
---|---|---|---|---|
Harare | ZWE | Zimbabwe | Eastern Africa | 1606000 |
Lusaka | ZMB | Zambia | Eastern Africa | 1742979 |
Cape Town | ZAF | South Africa | Southern Africa | 3740026 |
Johannesburg | ZAF | South Africa | Southern Africa | 4434827 |
Durban | ZAF | South Africa | Southern Africa | 3442361 |
Ekurhuleni | ZAF | South Africa | Southern Africa | 3178470 |
Sana’a | YEM | Yemen | Middle East | 1937451 |
Ho Chi Minh City | VNM | Vietnam | Southeast Asia | 7681700 |
Hanoi | VNM | Vietnam | Southeast Asia | 6844100 |
Caracas | VEN | Venezuela | South America | 1943901 |
- Change the code to perform a
LEFT JOIN
instead of anINNER JOIN
. After executing this query, note how many records the query result contains.
SELECT c1.name AS city, code, c2.name AS country,
region, city_proper_pop
FROM cities AS c1
-- Join right table (with alias)
LEFT JOIN countries AS c2
-- Match on country code
ON c1.country_code = c2.code
-- Order by descending country code
ORDER BY code DESC;
city | code | country | region | city_proper_pop |
---|---|---|---|---|
Taichung | NA | NA | NA | 2752413 |
Tainan | NA | NA | NA | 1885252 |
Kaohsiung | NA | NA | NA | 2778918 |
Bucharest | NA | NA | NA | 1883425 |
Taipei | NA | NA | NA | 2704974 |
New Taipei City | NA | NA | NA | 3954929 |
Harare | ZWE | Zimbabwe | Eastern Africa | 1606000 |
Lusaka | ZMB | Zambia | Eastern Africa | 1742979 |
Cape Town | ZAF | South Africa | Southern Africa | 3740026 |
Ekurhuleni | ZAF | South Africa | Southern Africa | 3178470 |
Perfect! Notice that the INNER JOIN
version resulted in 230 records. The LEFT JOIN
version returned 236 rows.
2.3 Left join (2)
Next, you’ll try out another example comparing an inner join to its corresponding left join. Before you begin though, take note of how many records are in both the countries
and languages
tables below.
You will begin with an inner join on the countries
table on the left with the languages
table on the right. Then you’ll change the code to a left join in the next bullet.
Note the use of multi-line comments here using /*
and */
.
Steps
- Perform an inner join and alias the name of the country field as
country
and the name of the language field aslanguage
. - Sort based on descending country name.
/*
Select country name AS country, the country's local name,
the language name AS language, and
the percent of the language spoken in the country
*/
SELECT c.name AS country, local_name, l.name AS language, percent
-- From left table (alias as c)
FROM countries AS c
-- Join to right table (alias as l)
INNER JOIN languages AS l
-- Match on fields
ON c.code = l.code
-- Order by descending country
ORDER BY country DESC;
country | local_name | language | percent |
---|---|---|---|
Zimbabwe | Zimbabwe | Shona | NA |
Zimbabwe | Zimbabwe | Tonga | NA |
Zimbabwe | Zimbabwe | Tswana | NA |
Zimbabwe | Zimbabwe | Venda | NA |
Zimbabwe | Zimbabwe | Xhosa | NA |
Zimbabwe | Zimbabwe | Sotho | NA |
Zimbabwe | Zimbabwe | sign | NA |
Zimbabwe | Zimbabwe | Shangani | NA |
Zimbabwe | Zimbabwe | Ndau | NA |
Zimbabwe | Zimbabwe | Nambya | NA |
- Perform a left join instead of an inner join. Observe the result, and also note the change in the number of records in the result.
- Carefully review which records appear in the left join result, but not in the inner join result.
/*
Select country name AS country, the country's local name,
the language name AS language, and
the percent of the language spoken in the country
*/
SELECT c.name AS country, local_name, l.name AS language, percent
-- From left table (alias as c)
FROM countries AS c
-- Join to right table (alias as l)
LEFT JOIN languages AS l
-- Match on fields
ON c.code = l.code
-- Order by descending country
ORDER BY country DESC;
country | local_name | language | percent |
---|---|---|---|
Zimbabwe | Zimbabwe | Chibarwe | NA |
Zimbabwe | Zimbabwe | Shona | NA |
Zimbabwe | Zimbabwe | Ndebele | NA |
Zimbabwe | Zimbabwe | English | NA |
Zimbabwe | Zimbabwe | Chewa | NA |
Zimbabwe | Zimbabwe | Xhosa | NA |
Zimbabwe | Zimbabwe | Venda | NA |
Zimbabwe | Zimbabwe | Tswana | NA |
Zimbabwe | Zimbabwe | Tonga | NA |
Zimbabwe | Zimbabwe | Sotho | NA |
Perfect! Notice that the INNER JOIN
version resulted in 909 records. The LEFT JOIN
version returned 916 rows.
2.4 Left join (3)
You’ll now revisit the use of the AVG()
function introduced in our introductory SQL course. You will use it in combination with left join to determine the average gross domestic product (GDP) per capita by region in 2010.
Steps
- Begin with a left join with the
countries
table on the left and theeconomies
table on the right. - Focus only on records with 2010 as the
year
.
-- Select name, region, and gdp_percapita
SELECT name, region, gdp_percapita
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010;
name | region | gdp_percapita |
---|---|---|
Afghanistan | Southern and Central Asia | 539.667 |
Angola | Central Africa | 3599.270 |
Albania | Southern Europe | 4098.130 |
United Arab Emirates | Middle East | 34628.630 |
Argentina | South America | 10412.950 |
Armenia | Middle East | 3121.780 |
Antigua and Barbuda | Caribbean | 13531.780 |
Australia | Australia and New Zealand | 56362.840 |
Austria | Western Europe | 46757.130 |
Azerbaijan | Middle East | 5847.260 |
- Modify your code to calculate the average GDP per capita
AS avg_gdp
for each region in 2010. - Select the
region
andavg_gdp
fields.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region;
region | avg_gdp |
---|---|
Southern Africa | 5051.598 |
Caribbean | 11413.339 |
Eastern Africa | 1757.348 |
Southern Europe | 22926.411 |
Eastern Asia | 26205.851 |
South America | 7420.599 |
Baltic Countries | 12631.030 |
North America | 47911.510 |
Australia and New Zealand | 44792.385 |
Southern and Central Asia | 2797.155 |
- Arrange this data on average GDP per capita for each region in 2010 from highest to lowest average GDP per capita.
-- Select fields
SELECT region, AVG(gdp_percapita) AS avg_gdp
-- From countries (alias as c)
FROM countries AS c
-- Left join with economies (alias as e)
LEFT JOIN economies AS e
-- Match on code fields
ON c.code = e.code
-- Focus on 2010
WHERE year = 2010
-- Group by region
GROUP BY region
-- Order by descending avg_gdp
ORDER BY avg_gdp DESC;
region | avg_gdp |
---|---|
Western Europe | 58130.96 |
Nordic Countries | 57074.00 |
North America | 47911.51 |
Australia and New Zealand | 44792.38 |
British Islands | 43588.33 |
Eastern Asia | 26205.85 |
Southern Europe | 22926.41 |
Middle East | 18204.64 |
Baltic Countries | 12631.03 |
Caribbean | 11413.34 |
Well done. Notice how gradually you’re adding more and more building blocks to your SQL vocabulary. This enables you to answer questions of ever-increasing complexity!
2.5 Right join
Right joins aren’t as common as left joins. One reason why is that you can always write a right join as a left join.
Steps
- The left join code is commented out here. Your task is to write a new query using rights joins that produces the same result as what the query using left joins produces. Keep this left joins code commented as you write your own query just below it using right joins to solve the problem.
Note the order of the joins matters in your conversion to using right joins!
-- convert this code to use RIGHT JOINs instead of LEFT JOINs
/*
SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
indep_year, languages.name AS language, percent
FROM cities
LEFT JOIN countries
ON cities.country_code = countries.code
LEFT JOIN languages
ON countries.code = languages.code
ORDER BY city, language;
*/
SELECT cities.name AS city, urbanarea_pop, countries.name AS country,
indep_year, languages.name AS language, percent
FROM languages
RIGHT JOIN countries
ON languages.code = countries.code
RIGHT JOIN cities
ON countries.code = cities.country_code
ORDER BY city, language;
city | urbanarea_pop | country | indep_year | language | percent |
---|---|---|---|---|---|
Abidjan | 4765000 | Cote d’Ivoire | 1960 | French | NA |
Abidjan | 4765000 | Cote d’Ivoire | 1960 | Other | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Arabic | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | English | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Hindi | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Persian | NA |
Abu Dhabi | 1145000 | United Arab Emirates | 1971 | Urdu | NA |
Abuja | 1235880 | Nigeria | 1960 | English | NA |
Abuja | 1235880 | Nigeria | 1960 | Fulani | NA |
Abuja | 1235880 | Nigeria | 1960 | Hausa | NA |
Correct; everything should be reversed!
2.6 FULL JOINs
Theory. Coming soon …
1. FULL JOINs
The last of the three types of OUTER JOINs is the FULL JOIN. In this video, you’ll see the differences between a FULL JOIN and the other joins you’ve learned about. In particular, the instruction will focus on comparing them to INNER JOINs and LEFT JOINs and then to LEFT JOINs and RIGHT JOINs.Let’s first review how the diagram changes between an INNER JOIN and a LEFT JOIN for our basic example using the left and right tables. Then we’ll delve into the FULL JOIN diagram and its SQL code.
2. INNER JOIN vs LEFT JOIN
Recall that an INNER JOIN keeps only the records that have matching key field values in both tables. A LEFT JOIN keeps all of the records in the left table while bringing in missing values for those key field values that don’t appear in the right table. Let’s next review the differences between a LEFT JOIN and a RIGHT JOIN.
3. LEFT JOIN vs RIGHT JOIN
Now you can see the differences between a LEFT JOIN and a RIGHT JOIN. The id values of 2 and 3 in the left table do not match with the id values in the right table, so missing values are brought in for them in the LEFT JOIN. Likewise for the RIGHT JOIN, missing values are brought in for id values of 5 and 6.
4. FULL JOIN initial diagram
A FULL JOIN combines a LEFT JOIN and a RIGHT JOIN as you can see by looking at this diagram. So it will bring in all records from both the left and the right table and keep track of the missing values accordingly.
5. FULL JOIN diagram
Note the missing values here and that all six of the values of id are included in the table. You can also see from the SQL code to produce this FULL JOIN result that the general format aligns closely with the SQL syntax you’ve seen for both an INNER JOIN and a LEFT JOIN. You’ll next explore an example from the leaders database.
6. FULL JOIN example using leaders database
Let’s revisit the example of looking at countries with prime ministers and/or presidents. We’ll walk through the code line by line to do this using a FULL JOIN. The SELECT statement starts us off by including the country field from both of our tables of interest and also the prime_minister and president fields.
7. FULL JOIN example using leaders database
Next, the left table is specified as prime_ministers. Note that the order matters here and if you switched the two tables you’d get slightly different output.
8. FULL JOIN example using leaders database
The right table is specified as presidents with the alias of p2. prime_ministers was aliased as p1 in the previous line.
9. FULL JOIN example using leaders database
Lastly, the join is done based on the key field of country in both tables.
10. FULL JOIN example results using leaders
Time for some practice!
11. Let’s practice!
2.7 Full join
In this exercise, you’ll examine how your results differ when using a full join versus using a left join versus using an inner join with the countries
and currencies
tables.
You will focus on the North American region
and also where the name
of the country is missing. Dig in to see what we mean!
Begin with a full join with countries
on the left and currencies
on the right. The fields of interest have been SELECT
ed for you throughout this exercise.
Then complete a similar left join and conclude with an inner join.
Steps
- Choose records in which
region
corresponds to North America or isNULL
.
SELECT name AS country, code, region, basic_unit
-- From countries
FROM countries
-- Join to currencies
FULL JOIN currencies
-- Match on code
USING (code)
-- Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- Order by region
ORDER BY region;
country | code | region | basic_unit |
---|---|---|---|
Canada | CAN | North America | Canadian dollar |
United States | USA | North America | United States dollar |
Bermuda | BMU | North America | Bermudian dollar |
Greenland | GRL | North America | NA |
NA | TMP | NA | United States dollar |
NA | FLK | NA | Falkland Islands pound |
NA | AIA | NA | East Caribbean dollar |
NA | NIU | NA | New Zealand dollar |
NA | ROM | NA | Romanian leu |
NA | SHN | NA | Saint Helena pound |
- Repeat the same query as before, using a
LEFT JOIN
instead of aFULL JOIN
. Note what has changed compared to theFULL JOIN
result!
SELECT name AS country, code, region, basic_unit
-- From countries
FROM countries
-- Join to currencies
LEFT JOIN currencies
-- Match on code
USING (code)
-- Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- Order by region
ORDER BY region;
country | code | region | basic_unit |
---|---|---|---|
Bermuda | BMU | North America | Bermudian dollar |
Canada | CAN | North America | Canadian dollar |
United States | USA | North America | United States dollar |
Greenland | GRL | North America | NA |
- Repeat the same query again but use an
INNER JOIN
instead of aFULL JOIN
.what has changed compared to theFULL JOIN
andLEFT JOIN
results!
SELECT name AS country, code, region, basic_unit
-- From countries
FROM countries
-- Join to currencies
INNER JOIN currencies
-- Match on code
USING (code)
-- Where region is North America or null
WHERE region = 'North America' OR region IS NULL
-- Order by region
ORDER BY region;
country | code | region | basic_unit |
---|---|---|---|
Bermuda | BMU | North America | Bermudian dollar |
Canada | CAN | North America | Canadian dollar |
United States | USA | North America | United States dollar |
Have you kept an eye out on the different numbers of records these queries returned? The FULL JOIN
query returned 18 rows, the OUTER JOIN
returned 4 rows, and the INNER JOIN
only returned 3 rows. Do these results make sense to you?
2.8 Full join (2)
You’ll now investigate a similar exercise to the last one, but this time focused on using a table with more records on the left than the right. You’ll work with the languages
and countries
tables.
Begin with a full join with languages
on the left and countries
on the right. Appropriate fields have been selected for you again here.
Steps
- Choose records in which
countries.name
starts with the capital letter'V'
or isNULL
. - Arrange by
countries.name
in ascending order to more clearly see the results.
SELECT countries.name, code, languages.name AS language
-- From languages
FROM languages
-- Join to countries
FULL JOIN countries
-- Match on code
USING (code)
-- Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- Order by ascending countries.name
ORDER BY countries.name;
name | code | language |
---|---|---|
Vanuatu | VUT | Tribal Languages |
Vanuatu | VUT | English |
Vanuatu | VUT | French |
Vanuatu | VUT | Other |
Vanuatu | VUT | Bislama |
Venezuela | VEN | Spanish |
Venezuela | VEN | indigenous |
Vietnam | VNM | Vietnamese |
Vietnam | VNM | English |
Vietnam | VNM | Other |
- Repeat the same query as before, using a
LEFT JOIN
instead of aFULL JOIN
. Note what has changed compared to theFULL JOIN
result!
SELECT countries.name, code, languages.name AS language
-- From languages
FROM languages
-- Join to countries
LEFT JOIN countries
-- Match using code
USING (code)
-- Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- Order by ascending countries.name
ORDER BY countries.name;
name | code | language |
---|---|---|
Vanuatu | VUT | English |
Vanuatu | VUT | Other |
Vanuatu | VUT | French |
Vanuatu | VUT | Tribal Languages |
Vanuatu | VUT | Bislama |
Venezuela | VEN | indigenous |
Venezuela | VEN | Spanish |
Vietnam | VNM | English |
Vietnam | VNM | Vietnamese |
Vietnam | VNM | Other |
- Repeat once more, but use an
INNER JOIN
instead of aLEFT JOIN
. Note what has changed compared to theFULL JOIN
andLEFT JOIN
results.
SELECT countries.name, code, languages.name AS language
-- From languages
FROM languages
-- Join to countries
INNER JOIN countries
-- Match using code
USING (code)
-- Where countries.name starts with V or is null
WHERE countries.name LIKE 'V%' OR countries.name IS NULL
-- Order by ascending countries.name
ORDER BY countries.name;
name | code | language |
---|---|---|
Vanuatu | VUT | Tribal Languages |
Vanuatu | VUT | Bislama |
Vanuatu | VUT | English |
Vanuatu | VUT | French |
Vanuatu | VUT | Other |
Venezuela | VEN | Spanish |
Venezuela | VEN | indigenous |
Vietnam | VNM | Vietnamese |
Vietnam | VNM | English |
Vietnam | VNM | Other |
Well done. Again, make sure to compare the number of records the different types of joins return and try to verify whether the results make sense.
2.9 Full join (3)
You’ll now explore using two consecutive full joins on the three tables you worked with in the previous two exercises.
Steps
- Complete a full join with
countries
on the left andlanguages
on the right. - Next, full join this result with
currencies
on the right. - Use
LIKE
to choose the Melanesia and Micronesia regions (Hint:'M%esia'
). - Select the fields corresponding to the country name
AS country
, region, language nameAS language
, and basic and fractional units of currency.
-- Select fields (with aliases)
SELECT c1.name AS country, region, l.name AS language,
basic_unit, frac_unit
-- From countries (alias as c1)
FROM countries AS c1
-- Join with languages (alias as l)
FULL JOIN languages AS l
-- Match on code
USING (code)
-- Join with currencies (alias as c2)
FULL JOIN currencies AS c2
-- Match on code
USING (code)
-- Where region like Melanesia and Micronesia
WHERE region LIKE 'M%esia';
country | region | language | basic_unit | frac_unit |
---|---|---|---|---|
Kiribati | Micronesia | English | Australian dollar | Cent |
Kiribati | Micronesia | Kiribati | Australian dollar | Cent |
Marshall Islands | Micronesia | Other | United States dollar | Cent |
Marshall Islands | Micronesia | Marshallese | United States dollar | Cent |
Nauru | Micronesia | Other | Australian dollar | Cent |
Nauru | Micronesia | English | Australian dollar | Cent |
Nauru | Micronesia | Nauruan | Australian dollar | Cent |
New Caledonia | Melanesia | Other | CFP franc | Centime |
New Caledonia | Melanesia | French | CFP franc | Centime |
Palau | Micronesia | Other | United States dollar | Cent |
Well done! How many countries are in the regions you filtered on?
2.10 Review outer joins
A(n) ___ join is a join combining the results of a ___ join and a ___ join.
2.11 Question
Fill out!
⬜ left, full, right
⬜ right, full, left
⬜ inner, left, right
✅ None of the above are true
Correct!
2.12 CROSSing the rubicon
Theory. Coming soon …
1. CROSSing the Rubicon
Now that you’ve worked with INNER JOINs and OUTER JOINs it’s time to check out the CROSS JOIN. CROSS JOINs create all possible combinations of two tables. Let’s explore the diagram for a CROSS JOIN next.
2. CROSS JOIN diagram
In this diagram we have two tables named table1 and table2. Each table only has one field, both with the name of id. The result of the CROSS JOIN is all nine combinations of the id values of 1, 2, and 3 in table1 with the id values of A, B, and C for table2. Next you’ll explore an example from the leaders database and look over the SQL syntax for a CROSS JOIN.
3. Pairing prime ministers with presidents
Suppose that all prime ministers in North America and Oceania in the prime_ministers table are scheduled for individual meetings with all presidents in the presidents table. You can look at all of these combinations by using a CROSS JOIN. The syntax here remains similar to what you’ve seen earlier in the course. We use a WHERE clause to focus on only prime ministers in North America and Oceania in the prime_ministers table. The results of the query give us the pairings for the two prime ministers in North America and Oceania from the prime_ministers table with the seven presidents in the presidents table.
4. Let’s practice!
You’ll now hop into an exercise focusing on a couple of cities in a tribute to the author Charles Dickens. This chapter closes with a challenge to test your comprehension of the content covered here. Good luck!
2.13 A table of two cities
This exercise looks to explore languages potentially and most frequently spoken in the cities of Hyderabad, India and Hyderabad, Pakistan.
Steps
- Create a
CROSS JOIN
withcities AS c
on the left andlanguages AS l
on the right. - Make use of
LIKE
andHyder%
to choose Hyderabad in both countries. - Select only the city name
AS city
and language nameAS language
.
-- Select fields
SELECT c.name AS city, l.name AS language
-- From cities (alias as c)
FROM cities AS c
-- Join to languages (alias as l)
CROSS JOIN languages AS l
-- Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';
city | language |
---|---|
Hyderabad (India) | Dari |
Hyderabad | Dari |
Hyderabad (India) | Pashto |
Hyderabad | Pashto |
Hyderabad (India) | Turkic |
Hyderabad | Turkic |
Hyderabad (India) | Other |
Hyderabad | Other |
Hyderabad (India) | Albanian |
Hyderabad | Albanian |
- Use an
INNER JOIN
instead of aCROSS JOIN
. Think about what the difference will be in the results for thisINNER JOIN
result and the one for theCROSS JOIN
.
-- Select fields
SELECT c.name AS city, l.name AS language
-- From cities (alias as c)
FROM cities AS c
-- Join to languages (alias as l)
INNER JOIN languages AS l
-- Match on country code
ON c.country_code = l.code
-- Where c.name like Hyderabad
WHERE c.name LIKE 'Hyder%';
city | language |
---|---|
Hyderabad (India) | Hindi |
Hyderabad (India) | Bengali |
Hyderabad (India) | Telugu |
Hyderabad (India) | Marathi |
Hyderabad (India) | Tamil |
Hyderabad (India) | Urdu |
Hyderabad (India) | Gujarati |
Hyderabad (India) | Kannada |
Hyderabad (India) | Malayalam |
Hyderabad (India) | Oriya |
Good one! Can you see the difference between a CROSS JOIN
and a INNER JOIN
?
2.14 Outer challenge
Now that you’re fully equipped to use OUTER JOIN
s, try a challenge problem to test your knowledge!
In terms of life expectancy for 2010, determine the names of the lowest five countries and their regions.
Steps
- Select country name
AS country
,region
, and life expectancyAS life_exp
. - Make sure to use
LEFT JOIN
,WHERE
,ORDER BY
, andLIMIT
.
-- Select fields
SELECT c.name AS country,
region,
life_expectancy AS life_exp
-- From countries (alias as c)
FROM countries AS c
-- Join to populations (alias as p)
LEFT JOIN populations AS p
-- Match on country code
ON c.code = p.country_code
-- Focus on 2010
WHERE year = 2010
-- Order by life_exp
ORDER BY life_exp
-- Limit to 5 records
LIMIT 5;
country | region | life_exp |
---|---|---|
Lesotho | Southern Africa | 47.48341 |
Central African Republic | Central Africa | 47.62532 |
Sierra Leone | Western Africa | 48.22895 |
Swaziland | Southern Africa | 48.34576 |
Zimbabwe | Eastern Africa | 49.57466 |
This was the last exercise of this chapter on outer joins and cross joins. In the next chapter, you’ll learn about set theory clauses!
3 3. Set theory clauses
In this chapter, you’ll learn more about set theory using Venn diagrams and get an introduction to union, union all, intersect, and except clauses. You’ll finish by investigating semi joins and anti joins, which provide a nice introduction to subqueries.
3.1 State of the UNION
Theory. Coming soon …
1. State of the UNION
Welcome to Chapter 3! You’ve made it through some challenging material so far. Keep up the great work! Next are set theory clauses. In this video, we’ll focus on the operations of UNION and UNION ALL. In addition to joining diagrams, in this chapter, you’ll also see how Venn diagrams can be used to represent set operations. Let’s begin with these Venn diagrams now.
2. Set Theory Venn Diagrams
You can think of each circle as representing a table of data. The shading represents what is included in the result of the set operation from each table. Beginning in the top left, UNION includes every record in both tables but DOES NOT double count those that are in both tables. UNION ALL includes every record in both tables and DOES replicate those that are in both tables. This is why the center is shaded black. The two diagrams on the bottom represent only subsets of data being selected. INTERSECT results in only those records found in both of the two tables. EXCEPT results in only those records in one table BUT NOT the other. Let’s investigate what UNION looks like as a joining diagram.
3. UNION diagram
In this diagram, you have two tables with names left_one and right_one. The “one” here corresponds to each table only having one field. If you run a UNION on these two fields you get each record appearing in either table, but notice that the id values of 1 and 4 in right_one are not included again in the UNION since they were already found in the left_one table.
4. UNION ALL diagram
By contrast (with the same two tables left_one and right_one), UNION ALL includes all duplicates in its result. So left_one and right_one both having four records yields eight records for the result of the UNION ALL. If it were the case that right_one had these same four values and also one more value of 1 for id, you’d see three entries for 1 in the resulting UNION ALL. Let’s check out the SQL syntax using the leaders database for both UNION and UNION ALL, but first you’ll see one more table in the leaders database.
5. monarchs table
Check out the monarchs table in the leaders database that we will use in examples here. The table lists the country, continent, and the name of the monarch for that country. Do some of these names look familiar based on the other tables you’ve seen? They should! We’ll come back to this.
6. All prime ministers and monarchs
You can use UNION on the prime_ministers and monarchs table to show all of the different prime ministers and monarchs in these two tables. The country field is also included here for reference. Note that the prime_minister field has been aliased as leader. In fact, the resulting field from the UNION will have the name of leader. That’s an important property of the set theory clauses you will see in this chapter. The fields included in the operation must be of the same data type since they come back as just a single field. You can’t stack a number on top of a character field in other words.
7. Resulting table from UNION
Our resulting table from the UNION gives all the leaders and their corresponding country. Does something stand out to you here? (PAUSE)
8. UNION ALL with leaders
The countries of Brunei and Oman were listed only once in the UNION table. These countries have monarchs that also act as prime ministers. This can be seen in the UNION ALL results. You’ve seen
9. Let’s practice!
that UNION and UNION ALL clauses do not do the lookup step that JOINs do. They simply stack records on top of each other from one table to the next.
3.2 Union
You have two new tables, economies2010
and economies2015
, available to you. The economies
table is also included for reference.
Steps
- Combine the two new tables into one table containing all of the fields in
economies2010
. - Sort this resulting single table by country code and then by year, both in ascending order.
-- Select fields from 2010 table
SELECT *
-- From 2010 table
FROM economies2010
-- Set theory clause
UNION
-- Select fields from 2015 table
SELECT *
-- From 2015 table
FROM economies2015
-- Order by code and year
ORDER BY code, year;
code | year | income_group | gross_savings |
---|---|---|---|
AFG | 2010 | Low income | 37.133 |
AFG | 2015 | Low income | 21.466 |
AGO | 2010 | Upper middle income | 23.534 |
AGO | 2015 | Upper middle income | -0.425 |
ALB | 2010 | Upper middle income | 20.011 |
ALB | 2015 | Upper middle income | 13.840 |
ARE | 2010 | High income | 27.073 |
ARE | 2015 | High income | 34.106 |
ARG | 2010 | Upper middle income | 17.361 |
ARG | 2015 | Upper middle income | 14.111 |
What a beauty!
3.3 Union (2)
UNION
can also be used to determine all occurrences of a field across multiple tables. Try out this exercise with no starter code.
Steps
- Determine all (non-duplicated) country codes in either the
cities
or thecurrencies
table. The result should be a table with only one field calledcountry_code
. - Sort by
country_code
in alphabetical order.
-- Select field
SELECT country_code
-- From cities
FROM cities
-- Set theory clause
UNION
-- Select field
SELECT code
-- From currencies
FROM currencies
-- Order by country_code
ORDER BY country_code;
country_code |
---|
ABW |
AFG |
AGO |
AIA |
ALB |
AND |
ARE |
ARG |
ARM |
ATG |
Well done! Let’s take it up a notch!
3.4 Union all
As you saw, duplicates were removed from the previous two exercises by using UNION
.
To include duplicates, you can use UNION ALL
.
Steps
- Determine all combinations (include duplicates) of country code and year that exist in either the
economies
or thepopulations
tables. Order bycode
thenyear
. - The result of the query should only have two columns/fields. Think about how many records this query should result in.
- You’ll use code very similar to this in your next exercise after the video. Make note of this code after completing it.
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
UNION ALL
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code, year
ORDER BY code, year;
code | year |
---|---|
ABW | 2010 |
ABW | 2015 |
AFG | 2010 |
AFG | 2010 |
AFG | 2015 |
AFG | 2015 |
AGO | 2010 |
AGO | 2010 |
AGO | 2015 |
AGO | 2015 |
Can you spot some duplicates in the query result?
3.5 INTERSECTional data science
Theory. Coming soon …
1. INTERSECTional data science
You saw with UNION and UNION ALL via examples that they do not do quite the same thing as what a join does. They only bind fields on top of one another in the two tables. The set theory clause INTERSECT works in a similar fashion to UNION and UNION ALL, but remember from the Venn diagram that INTERSECT only includes those records in common to both tables and fields selected. Let’s investigate the diagram for INTERSECT and the corresponding SQL code to achieve it.
2. INTERSECT diagram and SQL code
The result of the INTERSECT on left_one and right_one is only the records in common to both left_one and right_one: 1 and 4. Let’s next see how you could use INTERSECT to determine all countries having both a prime minister and a president.
3. Prime minister and president countries
The code for each of these set operations has a similar layout. You first select which fields you’d like to include in your first table, and then you specify the name of the first table. Next you specify the set operation to perform. Lastly, you denote which fields in the second table you’d like to include and then the name of the second table.The result of the query is the four countries with both a prime minister and a president in the leaders database.
4. INTERSECT on two fields
Next, let’s think about what would happen if we tried to select two columns instead of one from our previous example. The code shown does just that. What will be the result of this query? Will this also give you the names of the countries that have both a prime minister and a president? Hmmm [PAUSE]The actual result is an empty table. Why is that? When INTERSECT looks at two columns it includes both columns in the search. So it didn’t find any countries with prime ministers AND presidents having the same name. INTERSECT looks for RECORDS in common, not individual key fields like what a join does to match. This is an important distinction.
5. Let’s practice!
Let’s get some practice!
3.6 Intersect
UNION ALL
will extract all records from two tables, while INTERSECT
will only return records that both tables have in common. In this exercise, you will create a similar query as before, however, this time you will look at the records in common for country code and year for the economies
and populations
tables.
Note the number of records from the result of this query compared to the similar UNION ALL
query result of 814 records.
Steps
- Use
INTERSECT
to determine the records in common for country code and year for theeconomies
andpopulations
tables. - Again, order by
code
and then byyear
, both in ascending order.
-- Select fields
SELECT code, year
-- From economies
FROM economies
-- Set theory clause
INTERSECT
-- Select fields
SELECT country_code, year
-- From populations
FROM populations
-- Order by code and year
ORDER BY code, year;
code | year |
---|---|
AFG | 2010 |
AFG | 2015 |
AGO | 2010 |
AGO | 2015 |
ALB | 2010 |
ALB | 2015 |
ARE | 2010 |
ARE | 2015 |
ARG | 2010 |
ARG | 2015 |
Boom!
3.7 Intersect (2)
As you think about major world cities and their corresponding country, you may ask which countries also have a city with the same name as their country name?
Steps
- Use
INTERSECT
to answer this question withcountries
andcities
!
-- Select fields
SELECT name
-- From countries
FROM countries
-- Set theory clause
INTERSECT
-- Select fields
SELECT name
-- From cities
FROM cities;
name |
---|
Singapore |
Hong Kong |
Nice one! It looks as though Singapore is the only country that has a city with the same name!
3.8 Review union and intersect
3.9 Question
Which of the following combinations of terms and definitions is correct?
⬜UNION
: returns all records (potentially duplicates) in both tables
⬜UNION ALL
: returns only unique records
✅INTERSECT
: returns only records appearing in both tables
⬜ None of the above are matched correctly
Correct!
3.10 EXCEPTional
Theory. Coming soon …
1. EXCEPTional
Way to go! You’ve made it to the last of the four set theory clauses in this course. EXCEPT allows you to include only the records that are in one table, but not the other. Let’s mix things up and look into the SQL code and result first and then dive into the diagram.
2. Monarchs that aren’t prime ministers
You saw earlier that there are some monarchs that also act as the prime minister for their country. One way to determine those monarchs in the monarchs table that do not also hold the title of prime minister is to use the EXCEPT clause. [CLICK]This SQL query selects the monarch field from monarchs and then looks for common entries with the prime_ministers field, while also keeping track of the country for each leader. [CLICK] You can see in the resulting query that only the two European monarchs are not also prime ministers in the leaders database.
3. EXCEPT diagram
This diagram gives the structure of EXCEPT clauses. Only the records that appear in the left table BUT DO NOT appear in the right table are included.
4. Let’s practice!
After a couple exercises on using EXCEPT clauses, you’ll check out the last two types of joins for the course: semi-joins and anti-joins. I’ll see you in the next video for them!
3.11 Except
Get the names of cities in cities
which are not noted as capital cities in countries
as a single field result.
Note that there are some countries in the world that are not included in the countries
table, which will result in some cities not being labeled as capital cities when in fact they are.
Steps
- Order the resulting field in ascending order.
- Can you spot the city/cities that are actually capital cities which this query misses?
-- Select field
SELECT name
-- From cities
FROM cities
-- Set theory clause
EXCEPT
-- Select field
SELECT capital
-- From countries
FROM countries
-- Order by result
ORDER BY name;
name |
---|
Abidjan |
Ahmedabad |
Alexandria |
Almaty |
Auckland |
Bandung |
Barcelona |
Barranquilla |
Basra |
Belo Horizonte |
EXCEPTional!
3.12 Except (2)
Now you will complete the previous query in reverse!
Determine the names of capital cities that are not listed in the cities
table.
Steps
- Order by
capital
in ascending order. - The
cities
table contains information about 236 of the world’s most populous cities. The result of your query may surprise you in terms of the number of capital cities that do not appear in this list!
-- Select field
SELECT capital
-- From countries
FROM countries
-- Set theory clause
EXCEPT
-- Select field
SELECT name
-- From cities
FROM cities
-- Order by ascending capital
ORDER BY capital;
capital |
---|
Agana |
Amman |
Amsterdam |
Andorra la Vella |
Antananarivo |
Apia |
Ashgabat |
Asmara |
Astana |
Asuncion |
Well done. Is this query surprising, as the instructions suggested?
3.13 Semi-joins and Anti-joins
Theory. Coming soon …
1. Semi-joins and Anti-joins
You’ll now close this chapter by returning to joins. The six joins you’ve worked with so far are all additive joins in that they add columns to the original “left” table. Can you name all six? (1. INNER JOIN, 2. self-join, 3. LEFT JOIN, 4. RIGHT JOIN, 5. FULL JOIN, and 6. CROSS JOIN).
2. Building up to a semi-join
The last two joins we will cover use a right table to determine which records to keep in the left table. In other words, you use these last two joins (semi-join and anti-join) in a way similar to a WHERE clause dependent on the values of a second table. Let’s try out some examples of semi-joins and anti-joins and then return to the diagrams for each.Suppose that you are interested in determining the presidents of countries that gained independence before 1800. Let’s first determine which countries this corresponds to in the states table. Recall from your knowledge of SQL before you knew anything about JOINs how this could be done. To get only the countries meeting this condition you can use the WHERE clause. We’ll next set up the other part of the query to get the
3. Another step towards the semi-join
presidents we want. What code is needed to retrieve the president, country, and continent columns from the presidents table in that order? [PAUSE]Now we need to use this result with the one in the previous slide to further filter the country field in the presidents table to give us the correct result. Let’s see how this might be done next.
4. Finish the semi-join (an intro to subqueries)
In the first query of this example, we determined that Portugal and Spain were both independent before 1800. In the second query, we determined how to display the table in a nice form to answer our question. In order to combine the two tables together we will again use a WHERE clause and then use the first query as the condition to check in the WHERE clause. Check it out!This is your first example of a subquery: a query that sits inside of another query. You’ll explore these more in Chapter 4. What does this give as a result? Is it the presidents of Spain and of Portugal? Since Spain does not have a president, it is not included here and only the Portuguese president is listed. The semi-join chooses records in the first table where a condition IS met in a second table. An anti-join chooses records in the first table where a condition IS NOT met in the second table. How might you determine countries in the Americas founded after 1800?
5. An anti-join
Using the code from the previous example, you only need to add a few pieces of code. So what goes in the blanks? [PAUSE]Fill in the WHERE clause by choosing only those continents ending in America and then fill in the other space with a NOT to exclude those countries in the subquery. The presidents of
6. The result of the anti-join
countries in the Americas founded after 1800 are given in the table.
7. Semi-join and anti-join diagrams
The semi-join matches records by key field in the right table with those in the left. It then picks out only the rows in the left table that match that condition. The anti-join picks out those columns in the left table that do not match the condition on the right table. Semi-joins and anti-joins don’t have the same built-in SQL syntax that INNER JOIN and LEFT JOIN have. They are useful tools in filtering one table’s records on the records of another table.
8. Let’s practice!
This chapter’s challenge exercise will ask you to combine set theory clauses with semi-joins. Before you get to that, you’ll try out some exercises on semi-joins and anti-joins. See you again in Chapter 4!
3.14 Semi-join
You are now going to use the concept of a semi-join to identify languages spoken in the Middle East.
Steps
- Begin by selecting all country codes in the Middle East as a single field result using
SELECT
,FROM
, andWHERE
.
-- Select code
SELECT code
-- From countries
FROM countries
-- Where region is Middle East
WHERE region = 'Middle East';
code |
---|
ARE |
ARM |
AZE |
BHR |
GEO |
IRQ |
ISR |
YEM |
JOR |
KWT |
- Below the commented code, select only unique languages by name appearing in the
languages
table. - Order the resulting single field table by
name
in ascending order.
-- Query from step 1:
/*
SELECT code
FROM countries
WHERE region = 'Middle East';
*/
-- Select field
SELECT DISTINCT name
-- From languages
FROM languages
-- Order by name
ORDER BY name;
name |
---|
Afar |
Afrikaans |
Akyem |
Albanian |
Alsatian |
Amerindian |
Amharic |
Angolar |
Antiguan creole |
Arabic |
- Combine the previous two queries into one query by adding a
WHERE IN
statement to theSELECT DISTINCT
query to determine the unique languages spoken in the Middle East. - Order the result by
name
in ascending order.
-- Query from step 2
SELECT DISTINCT name
FROM languages
-- Where in statement
WHERE code IN
-- Query from step 1
-- Subquery
(SELECT code
FROM countries
WHERE region = 'Middle East')
-- Order by name
ORDER BY name;
name |
---|
Arabic |
Aramaic |
Armenian |
Azerbaijani |
Azeri |
Baluchi |
Bulgarian |
Circassian |
English |
Farsi |
Your first subquery is a fact! Let’s dive a little deeper into the concept.
3.15 Relating semi-join to a tweaked inner join
Let’s revisit the code from the previous exercise, which retrieves languages spoken in the Middle East.
Sometimes problems solved with semi-joins can also be solved using an inner join.
3.16 Question
This inner join isn’t quite right. What is missing from this second code block to get it to match with the correct answer produced by the first block?
⬜HAVING
instead ofWHERE
✅DISTINCT
⬜UNIQUE
3.17 Diagnosing problems using anti-join
Another powerful join in SQL is the anti-join. It is particularly useful in identifying which records are causing an incorrect number of records to appear in join queries.
You will also see another example of a subquery here, as you saw in the first exercise on semi-joins. Your goal is to identify the currencies used in Oceanian countries!
Steps
- Begin by determining the number of countries in
countries
that are listed in Oceania usingSELECT
,FROM
, andWHERE
.
-- Select statement
SELECT COUNT(*)
-- From countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania';
count |
---|
19 |
- Complete an inner join with
countries AS c1
on the left andcurrencies AS c2
on the right to get the different currencies used in the countries of Oceania. - Match
ON
thecode
field in the two tables. - Include the country
code
, countryname
, andbasic_unit AS currency
.
-- Select fields (with aliases)
SELECT c1.code, name, basic_unit AS currency
-- From countries (alias as c1)
FROM countries AS c1
-- Join with currencies (alias as c2)
INNER JOIN currencies AS c2
-- Match on code
ON c1.code = c2.code
-- Where continent is Oceania
WHERE c1.continent = 'Oceania';
code | name | currency |
---|---|---|
AUS | Australia | Australian dollar |
PYF | French Polynesia | CFP franc |
KIR | Kiribati | Australian dollar |
MHL | Marshall Islands | United States dollar |
NRU | Nauru | Australian dollar |
NCL | New Caledonia | CFP franc |
NZL | New Zealand | New Zealand dollar |
PLW | Palau | United States dollar |
PNG | Papua New Guinea | Papua New Guinean kina |
WSM | Samoa | Samoan tala |
Note that not all countries in Oceania were listed in the resulting inner join with
currencies
. Use an anti-join to determine which countries were not included!- Use
NOT IN
and(SELECT code FROM currencies)
as a subquery to get the country code and country name for the Oceanian countries that are not included in thecurrencies
table.
- Use
-- Select fields
SELECT code, name
-- From Countries
FROM countries
-- Where continent is Oceania
WHERE continent = 'Oceania'
-- And code not in
AND code NOT IN
-- Subquery
(SELECT code
FROM currencies);
code | name |
---|---|
ASM | American Samoa |
FJI | Fiji Islands |
GUM | Guam |
FSM | Micronesia, Federated States of |
MNP | Northern Mariana Islands |
Nice! Can you tell which countries were not included now?
3.18 Set theory challenge
Congratulations! You’ve now made your way to the challenge problem for this third chapter. Your task here will be to incorporate two of UNION
/UNION ALL
/INTERSECT
/EXCEPT
to solve a challenge involving three tables.
In addition, you will use a subquery as you have in the last two exercises! This will be great practice as you hop into subqueries more in Chapter 4!
Steps
- Identify the country codes that are included in either
economies
orcurrencies
but not inpopulations
. - Use that result to determine the names of cities in the countries that match the specification in the previous instruction.
-- Select the city name
SELECT name
-- Alias the table where city name resides
FROM cities AS c1
-- Choose only records matching the result of multiple set theory clauses
WHERE country_code IN
(
-- Select appropriate field from economies AS e
SELECT e.code
FROM economies AS e
-- Get all additional (unique) values of the field from currencies AS c2
UNION
SELECT c2.code
FROM currencies AS c2
-- Exclude those appearing in populations AS p
EXCEPT
SELECT p.country_code
FROM populations AS p
);
name |
---|
Bucharest |
Kaohsiung |
New Taipei City |
Taichung |
Tainan |
Taipei |
Success! Head over to the final chapter of this course to feel the power of subqueries at your fingertips!
4 4. Subqueries
In this closing chapter, you’ll learn how to use nested queries and you’ll use what you’ve learned in this course to solve three challenge problems.
4.1 Subqueries inside WHERE and SELECT clauses
Theory. Coming soon …
1. Subqueries inside WHERE and SELECT clauses
You’ve made it all the way to the last chapter of this course. Congratulations and keep up the excellent work. This last chapter is focused on embedding queries inside of queries. These are called nested queries and also known as subqueries as you saw in Chapter 3. The most common type of subquery is one inside of a WHERE statement. Let’s check out another one of these now with a little bit of setting up to do first, of course.
2. Subquery inside WHERE clause set-up
You’ve seen many examples of using a subquery inside a WHERE clause already with the semi-join and anti-join examples and exercises you just completed. With the WHERE clause being the most common place for a subquery to be found, it’s important that you see just one more example of doing so. With this being the final chapter, it’s time to unveil the remaining fields in the states table. Note that the continent field is not shown to display all the fields here.The fert_rate field gives an estimate for the average number of babies born per woman in each country. The women_parli_perc field gives the percentage of women in the elected federal parliament for each country. Across these 13 countries, how would you determine the average fertility rate?
3. Average fert_rate
We will use the average fertility rate as part of a subquery. Recall how this is done. The average babies born to women across these countries is 2-point-28 children.
4. Asian countries below average fert_rate
Let’s use the previous slide’s query as a subquery to determine Asian countries that fall under this average. You’ll see the code in a couple steps. First we select the country name and the fertility rate for Asian countries.
5. Asian countries below average fert_rate
Next, we want to choose records where fert_rate is smaller than What comes next?
6. Asian countries below average fert_rate
The subquery is to get the average fertility rate! And now we can check out our result to make sure it makes sense.
7. Asian countries below average fert_rate
It appears so. These are the two Asian countries we were looking for with fertility rates below 2-point-28 babies per woman.
8. Subqueries inside SELECT clauses - setup
The second most common type of a subquery is inside of a SELECT clause. The task here is to count the number of countries listed in the states table for each continent in the prime_ministers table. Let’s again take the stepwise approach to setting up the problem. What does this code do? [PAUSE] It gives each of the five continents in the prime_ministers table. Let’s keep building our answer in the next slide.
9. Subquery inside SELECT clause - complete
Next is determining the counts of the number of countries in states for each of the continents in the last slide. Combining a COUNT clause with a WHERE statement matching the continent fields in the two tables gets us there. Let’s check out the code and then discuss a bit further. The subquery involving states also can reference the prime_ministers table in the main query. Any time you do a subquery inside a SELECT statement like this, you need to give the subquery an alias like countries_num here. Please pause the video here and carefully review this code. The result of this query comes next. [PAUSE] It’s kinda like magic that this works, huh?! If you haven’t discovered it already, there are often many different ways to solve problems with SQL queries. You could use a carefully constructed JOIN to achieve this same result, for example.
10. Let’s practice!
Test out your subquery expertise with a few exercises. I’ll see you back soon in the subqueries inside FROM clauses video!
4.2 Subquery inside where
You’ll now try to figure out which countries had high average life expectancies (at the country level) in 2015.
Steps
- Begin by calculating the average life expectancy across all countries for 2015.
-- Select average life_expectancy
SELECT AVG(life_expectancy)
-- From populations
FROM populations
-- Where year is 2015
WHERE year = 2015;
avg |
---|
71.67634 |
- Recall that you can use SQL to do calculations for you. Suppose we wanted only records that were above
1.15 * 100
in terms of life expectancy for 2015:
Select all fields from populations
with records corresponding to larger than 1.15 timesaverage you calculated in the first task for 2015. In other words, change the 100
in the example above with a subquery.
-- Select fields
SELECT *
-- From populations
FROM populations
-- Where life_expectancy is greater than
WHERE life_expectancy >
-- 1.15 * subquery
1.15 * (SELECT AVG(life_expectancy)
FROM populations
WHERE year = 2015) AND
year = 2015;
pop_id | country_code | year | fertility_rate | life_expectancy | size |
---|---|---|---|---|---|
21 | AUS | 2015 | 1.833 | 82.45122 | 23789752 |
376 | CHE | 2015 | 1.540 | 83.19756 | 8281430 |
356 | ESP | 2015 | 1.320 | 83.38049 | 46443992 |
134 | FRA | 2015 | 2.010 | 82.67073 | 66538392 |
170 | HKG | 2015 | 1.195 | 84.27805 | 7305700 |
174 | ISL | 2015 | 1.930 | 82.86098 | 330815 |
190 | ITA | 2015 | 1.370 | 83.49024 | 60730584 |
194 | JPN | 2015 | 1.460 | 83.84366 | 126958470 |
340 | SGP | 2015 | 1.240 | 82.59512 | 5535002 |
374 | SWE | 2015 | 1.880 | 82.55122 | 9799186 |
Good work! Let’s see how you do on a more high-level question in one go.
4.3 Subquery inside where (2)
Use your knowledge of subqueries in WHERE
to get the urban area population for only capital cities.
Steps
- Make use of the
capital
field in thecountries
table in your subquery. - Select the city name, country code, and urban area population fields.
-- Select fields
SELECT name, country_code, urbanarea_pop
-- From cities
FROM cities
-- Where city name in the field of capital cities
WHERE name IN
-- Subquery
(SELECT capital
FROM countries)
ORDER BY urbanarea_pop DESC;
name | country_code | urbanarea_pop |
---|---|---|
Beijing | CHN | 21516000 |
Dhaka | BGD | 14543124 |
Tokyo | JPN | 13513734 |
Moscow | RUS | 12197596 |
Cairo | EGY | 10230350 |
Kinshasa | COD | 10130000 |
Jakarta | IDN | 10075310 |
Seoul | KOR | 9995784 |
Mexico City | MEX | 8974724 |
Lima | PER | 8852000 |
Alright. You’ve got some practice on subqueries inside WHERE
now. Time to see how you do when these subqueries are in the SELECT
statement!
4.4 Subquery inside select
In this exercise, you’ll see how some queries can be written using either a join or a subquery.
You have seen previously how to use GROUP BY
with aggregate functions and an inner join to get summarized information from multiple tables.
The code given in the first query selects the top nine countries in terms of number of cities appearing in the cities
table. Recall that this corresponds to the most populous cities in the world. Your task will be to convert the second query to get the same result as the provided code.
Steps
- Submit the code to view the result of the provided query.
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
/*
SELECT ___ AS ___,
(SELECT ___
FROM ___
WHERE countries.code = cities.country_code) AS cities_num
FROM ___
ORDER BY ___ ___, ___
LIMIT 9;
*/
country | cities_num |
---|---|
China | 36 |
India | 18 |
Japan | 11 |
Brazil | 10 |
Pakistan | 9 |
United States | 9 |
Indonesia | 7 |
Russian Federation | 7 |
South Korea | 7 |
- Convert the
GROUP BY
code to use a subquery inside ofSELECT
by filling in the blanks to get a result that matches the one given using theGROUP BY
code in the first query. - Again, sort the result by
cities_num
descending and then bycountry
ascending.
/*
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM cities
INNER JOIN countries
ON countries.code = cities.country_code
GROUP BY country
ORDER BY cities_num DESC, country
LIMIT 9;
*/
SELECT countries.name AS country,
-- Subquery
(SELECT COUNT(*)
FROM cities
WHERE countries.code = cities.country_code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;
country | cities_num |
---|---|
China | 36 |
India | 18 |
Japan | 11 |
Brazil | 10 |
Pakistan | 9 |
United States | 9 |
Indonesia | 7 |
Russian Federation | 7 |
South Korea | 7 |
Great! The next video will introduce you to using subqueries in the FROM
clause. Exciting stuff!
4.5 Subquery inside FROM clause
Theory. Coming soon …
1. Subquery inside the FROM clause
The last basic type of a subquery exists inside of a FROM clause. A motivating example pertaining to the percentage of women in parliament will be used now to help you understand this style of subquery. Let’s dig in!
2. Build-up
First, let’s determine the maximum percentage of women in parliament for each continent listed in states. Recall that this query will only work if you include continent as one of the fields in the SELECT clause since we are grouping based on that field. Let’s check out the result. We see that Europe has the largest value and North America has the smallest value for the countries listed in the states table.
3. Focusing on records in monarchs
What if you weren’t interested in all continents, but specifically those in the monarchs table. You haven’t seen this yet in the course but you can include multiple tables in a FROM clause by adding a comma between them. Let’s investigate a way to get the continents only in monarchs using this new trick. We have at least part of our answer here, but how do we get rid of those duplicate entries? And what about the maximum column?
4. Finishing off the subquery
To get Asia and Europe to appear only once, use the DISTINCT command in your SELECT statement. But now how do you get that maximum column to also come along with Asia and Europe? Instead of including states in the FROM clause, include the subquery instead and alias it with a name like subquery. There you have it! That’s how to include a subquery as a temporary table in your FROM clause.
5. Let’s practice!
You are very close to the end of the course. Awesome work on making it this far! The last remaining exercises are designed to really test your understanding of the material. You may be a bit frustrated but stick with it and you’ll take the knowledge gained in these problems with you as you continue to develop your SQL skills. See you in a bit for the course review video!
4.6 Subquery inside from
The last type of subquery you will work with is one inside of FROM
.
You will use this to determine the number of languages spoken for each country, identified by the country’s local name! (Note this may be different than the name
field and is stored in the local_name
field.)
Steps
- Begin by determining for each country code how many languages are listed in the
languages
table usingSELECT
,FROM
, andGROUP BY
. - Alias the aggregated field as
lang_num
.
-- Select fields (with aliases)
SELECT code, COUNT(*) AS lang_num
-- From languages
FROM languages
-- Group by code
GROUP BY code;
code | lang_num |
---|---|
PRY | 2 |
NRU | 3 |
MDG | 3 |
ASM | 5 |
TZA | 4 |
PLW | 6 |
NLD | 1 |
VEN | 2 |
BMU | 2 |
MSR | 1 |
- Include the previous query (aliased as
subquery
) as a subquery in theFROM
clause of a new query. - Select the local name of the country from
countries
. - Also, select
lang_num
fromsubquery
. - Make sure to use
WHERE
appropriately to matchcode
incountries
and insubquery
. - Sort by
lang_num
in descending order.
SELECT local_name, subquery.lang_num
FROM countries,
(SELECT code, COUNT(*) AS lang_num
FROM languages
GROUP BY code) AS subquery
WHERE countries.code = subquery.code
ORDER BY lang_num DESC;
local_name | lang_num |
---|---|
Zambia | 19 |
YeItyop´iya | 16 |
Zimbabwe | 16 |
Bharat/India | 14 |
Nepal | 14 |
Mali | 13 |
France | 13 |
South Africa | 13 |
Malawi | 12 |
Angola | 12 |
This one wasn’t easy!
4.7 Advanced subquery
You can also nest multiple subqueries to answer even more specific questions.
In this exercise, for each of the six continents listed in 2015, you’ll identify which country had the maximum inflation rate, and how high it was, using multiple subqueries. The table result of your final query should look something like the following, where anything between <
>
will be filled in with appropriate values:
+------------+---------------+-------------------+
| name | continent | inflation_rate |
|------------+---------------+-------------------|
| <country1> | North America | <max_inflation1> |
| <country2> | Africa | <max_inflation2> |
| <country3> | Oceania | <max_inflation3> |
| <country4> | Europe | <max_inflation4> |
| <country5> | South America | <max_inflation5> |
| <country6> | Asia | <max_inflation6> |
+------------+---------------+-------------------+
Again, there are multiple ways to get to this solution using only joins, but the focus here is on showing you an introduction into advanced subqueries.
Steps
- Create an
INNER JOIN
withcountries
on the left andeconomies
on the right withUSING
, without aliasing your tables or columns. - Retrieve the country’s name, continent, and inflation rate for 2015.
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
USING (code)
-- Where year is 2015
WHERE year = 2015;
name | continent | inflation_rate |
---|---|---|
Afghanistan | Asia | -1.549 |
Angola | Africa | 10.287 |
Albania | Europe | 1.896 |
United Arab Emirates | Asia | 4.070 |
Argentina | South America | NA |
Armenia | Asia | 3.731 |
Antigua and Barbuda | North America | 0.969 |
Australia | Oceania | 1.461 |
Austria | Europe | 0.810 |
Azerbaijan | Asia | 4.049 |
- Select the maximum inflation rate in 2015
AS max_inf
grouped by continent using the previous step’s query as a subquery in theFROM
clause.
Thus, in your subquery you should:
- Create an inner join with
countries
on the left andeconomies
on the right withUSING
(without aliasing your tables or columns). - Retrieve the country name, continent, and inflation rate for 2015.
- Alias the subquery as
subquery
.
- Create an inner join with
- This will result in the six maximum inflation rates in 2015 for the six continents as one field table. Make sure to not include
continent
in the outerSELECT
statement.
-- Select the maximum inflation rate as max_inf
SELECT MAX(inflation_rate) AS max_inf
-- Subquery using FROM (alias as subquery)
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
USING (code)
WHERE year = 2015) AS subquery
-- Group by continent
GROUP BY continent;
max_inf |
---|
21.858 |
39.403 |
121.738 |
7.524 |
48.684 |
9.784 |
- Now it’s time to append your second query to your first query using
AND
andIN
to obtain the name of the country, its continent, and the maximum inflation rate for each continent in 2015. - For the sake of practice, change all joining conditions to use
ON
instead ofUSING
.
-- Select fields
SELECT name, continent, inflation_rate
-- From countries
FROM countries
-- Join to economies
INNER JOIN economies
-- Match on code
ON countries.code = economies.code
-- Where year is 2015
WHERE year = 2015
-- And inflation rate in subquery (alias as subquery)
AND inflation_rate IN (
SELECT MAX(inflation_rate) AS max_inf
FROM (
SELECT name, continent, inflation_rate
FROM countries
INNER JOIN economies
ON countries.code = economies.code
WHERE year = 2015) AS subquery
-- Group by continent
GROUP BY continent);
name | continent | inflation_rate |
---|---|---|
Haiti | North America | 7.524 |
Malawi | Africa | 21.858 |
Nauru | Oceania | 9.784 |
Ukraine | Europe | 48.684 |
Venezuela | South America | 121.738 |
Yemen | Asia | 39.403 |
Wow! Well done! This code works since each of the six maximum inflation rate values occur only once in the 2015 data. Think about whether this particular code involving subqueries would work in cases where there are ties for the maximum inflation rate values.
4.8 Subquery challenge
Let’s test your understanding of the subqueries with a challenge problem! Use a subquery to get 2015 economic data for countries that do not have
gov_form
of'Constitutional Monarchy'
or'Republic'
in theirgov_form
. Here,gov_form
stands for the form of the government for each country. Review the different entries forgov_form
in thecountries
table.
Steps
- Select the country code, inflation rate, and unemployment rate.
- Order by inflation rate ascending.
- Do not use table aliasing in this exercise.
-- Select fields
SELECT code, inflation_rate, unemployment_rate
-- From economies
FROM economies
-- Where year is 2015 and code is not in
WHERE year = 2015 AND code NOT IN
-- Subquery
(SELECT code
FROM countries
WHERE (gov_form = 'Constitutional Monarchy' OR gov_form LIKE '%Republic%'))
-- Order by inflation rate
ORDER BY inflation_rate;
code | inflation_rate | unemployment_rate |
---|---|---|
AFG | -1.549 | NA |
CHE | -1.140 | 3.178 |
PRI | -0.751 | 12.000 |
ROU | -0.596 | 6.812 |
BRN | -0.423 | 6.900 |
TON | -0.283 | NA |
OMN | 0.065 | NA |
TLS | 0.553 | NA |
BEL | 0.620 | 8.492 |
CAN | 1.132 | 6.900 |
Superb! Let’s review subqueries before you head off to the last video of this course!
4.9 Subquery review
4.10 Question
Within which SQL clause are subqueries most frequently found?
✅ WHERE
⬜ FROM
⬜ SELECT
⬜ IN
Correct!
4.11 Course review
Theory. Coming soon …
1. Course Review
Only the challenge problems remain! Way to go! You’re on your way to being a SQL query warrior using PostgreSQL! Before you tackle the three challenge problems, let’s review the main topics covered throughout the course.
2. Types of joins
In SQL, a join combines columns from one or more tables in a relational database via a lookup process.There are four different types of joins you learned about in this course. First, an INNER JOIN is also denoted as just JOIN in SQL. A special case of an INNER JOIN you explored is called a self-join.Second, there are three OUTER JOINs denoted as LEFT JOIN (or LEFT OUTER JOIN), RIGHT JOIN (or RIGHT OUTER JOIN), and FULL JOIN (or FULL OUTER JOIN).Third, you worked with CROSS JOINs to create all possible combinations between two tables. Lastly, you investigated semi-joins and anti-joins. Remember that words appearing in ALL capital letters correspond to the joins having simple SQL syntax. Self-joins, semi-joins, and anti-joins don’t have built-in SQL syntax.
3. INNER JOIN vs LEFT JOIN
An INNER JOIN keeps only the records in which the key field (or fields) is in both tables. A LEFT JOIN keeps all the records in fields specified in the left table and includes the matches in the right table based on the key field or fields. Key field values that don’t match in the right table are included as missing data in the resulting table of a LEFT JOIN.
4. RIGHT JOIN vs FULL JOIN
A RIGHT JOIN keeps all the records specified in the right table and includes the matches from the key field (or fields) in the left table. Those that don’t match are included as missing values in the resulting table from the RIGHT JOIN query. A FULL JOIN is a combination of a LEFT JOIN and a RIGHT JOIN showing exactly which values appear in both tables and those that appear in only one or the other table.
5. CROSS JOIN with code
A CROSS JOIN matches all records from fields specified in one table with all records from fields specified in another table. Remember that a CROSS JOIN does not have an ON or USING clause, but otherwise looks very similar to the code for an INNER JOIN, LEFT JOIN, RIGHT JOIN, or FULL JOIN.
6. Set Theory Clauses
Recall that UNION includes every record in both tables but DOES NOT double count those that are in both tables whereas UNION ALL DOES replicate those that are in both tables. INTERSECT gives only those records found in both of the two tables. EXCEPT gives only those records in one table BUT NOT the other.
7. Semi-joins and Anti-joins
When you’d like to filter your first table based on conditions set on a second table, you should use a semi-join to accomplish your task. If instead you’d like to filter your first table based on conditions NOT being met on a second table, you should use an anti-join. Anti-joins are particularly useful in diagnosing problems with other joins in terms of getting fewer or more records than you expected.
8. Types of basic subqueries
The most common type of subquery is done inside of a WHERE clause. The next most frequent types of subqueries are inside SELECT clauses and inside FROM clauses. As you’ll see in the challenge exercises, subqueries can also find their way into the ON statement of a join in ways similar to what you’ve seen inside WHERE clauses too.
9. Own the challenge problems! You got this!
Well, you are only three exercises away from mastering the content in this course. You are a true SQL ninja. Now take down these last three problems using all the skills you’ve built up in this course!
4.12 Final challenge
Welcome to the end of the course! The next three exercises will test your knowledge of the content covered in this course and apply many of the ideas you’ve seen to difficult problems. Good luck!
Read carefully over the instructions and solve them step-by-step, thinking about how the different clauses work together.
In this exercise, you’ll need to get the country names and other 2015 data in the economies
table and the countries
table for Central American countries with an official language.
Steps
- Select unique country names. Also select the total investment and imports fields.
- Use a left join with
countries
on the left. (An inner join would also work, but please use a left join here.) - Match on
code
in the two tablesAND
use a subquery inside ofON
to choose the appropriatelanguages
records. - Order by country name ascending.
- Use table aliasing but not field aliasing in this exercise.
-- Select fields
SELECT DISTINCT name, total_investment, imports
-- From table (with alias)
FROM countries AS c
-- Join with table (with alias)
LEFT JOIN economies AS e
-- Match on code
ON (c.code = e.code
-- and code in Subquery
AND c.code IN (
SELECT l.code
FROM languages AS l
WHERE official = 'true'
) )
-- Where region and year are correct
WHERE region = 'Central America' AND year = 2015
-- Order by field
ORDER BY name;
name | total_investment | imports |
---|---|---|
Belize | 22.014 | 6.743 |
Costa Rica | 20.218 | 4.629 |
El Salvador | 13.983 | 8.193 |
Guatemala | 13.433 | 15.124 |
Honduras | 24.633 | 9.353 |
Nicaragua | 31.862 | 11.665 |
Panama | 46.557 | 5.898 |
One down, two to go!
4.13 Final challenge (2)
Whoofta! That was challenging, huh?
Let’s ease up a bit and calculate the average fertility rate for each region in 2015.
Steps
- Include the name of region, its continent, and average fertility rate aliased as
avg_fert_rate
. - Sort based on
avg_fert_rate
ascending. - Remember that you’ll need to
GROUP BY
all fields that aren’t included in the aggregate function ofSELECT
.
-- Select fields
SELECT region, continent, AVG(fertility_rate) AS avg_fert_rate
-- From left table
FROM countries AS c
-- Join to right table
INNER JOIN populations AS p
-- Match on join condition
ON c.code = p.country_code
-- Where specific records matching some condition
WHERE year = 2015
-- Group appropriately?
GROUP BY region, continent
-- Order appropriately
ORDER BY avg_fert_rate;
region | continent | avg_fert_rate |
---|---|---|
Southern Europe | Europe | 1.426100 |
Eastern Europe | Europe | 1.490889 |
Baltic Countries | Europe | 1.603333 |
Eastern Asia | Asia | 1.620714 |
Western Europe | Europe | 1.632500 |
North America | North America | 1.765750 |
British Islands | Europe | 1.875000 |
Nordic Countries | Europe | 1.893333 |
Australia and New Zealand | Oceania | 1.911500 |
Caribbean | North America | 1.950571 |
Interesting. It seems that the average fertility rate is lowest in Southern Europe and highest in Central Africa. Two down, one to go!
4.14 Final challenge (3)
Welcome to the last challenge problem. By now you’re a query warrior! Remember that these challenges are designed to take you to the limit to solidify your SQL knowledge! Take a deep breath and solve this step-by-step.
You are now tasked with determining the top 10 capital cities in Europe and the Americas in terms of a calculated percentage using city_proper_pop
and metroarea_pop
in cities
.
Do not use table aliasing in this exercise.
Steps
- Select the city name, country code, city proper population, and metro area population.
- Calculate the percentage of metro area population composed of city proper population for each city in
cities
, aliased ascity_perc
.
- Focus only on capital cities in Europe and the Americas in a subquery.
- Make sure to exclude records with missing data on metro area population.
- Order the result by
city_perc
descending. - Then determine the top 10 capital cities in Europe and the Americas in terms of this
city_perc
percentage.
-- Select fields
SELECT name, country_code, city_proper_pop, metroarea_pop,
-- Calculate city_perc
city_proper_pop / metroarea_pop * 100 AS city_perc
-- From appropriate table
FROM cities
-- Where
WHERE name IN
-- Subquery
(SELECT capital
FROM countries
WHERE (continent = 'Europe'
OR continent LIKE '%America'))
AND metroarea_pop IS NOT NULL
-- Order appropriately
ORDER BY city_perc DESC
-- Limit amount
LIMIT 10;
name | country_code | city_proper_pop | metroarea_pop | city_perc |
---|---|---|---|---|
Lima | PER | 8852000 | 10750000 | 82.34419 |
Bogota | COL | 7878783 | 9800000 | 80.39575 |
Moscow | RUS | 12197596 | 16170000 | 75.43349 |
Vienna | AUT | 1863881 | 2600000 | 71.68773 |
Montevideo | URY | 1305082 | 1947604 | 67.00962 |
Caracas | VEN | 1943901 | 2923959 | 66.48182 |
Rome | ITA | 2877215 | 4353775 | 66.08552 |
Brasilia | BRA | 2556149 | 3919864 | 65.21015 |
London | GBR | 8673713 | 13879757 | 62.49182 |
Budapest | HUN | 1759407 | 2927944 | 60.09018 |