A Guide to Combining Data from Multiple Tables in SQL Using Joins

SQL is a powerful language for querying and manipulating data from relational databases. One of the most common tasks in SQL is to join data from multiple tables based on a related column or condition. Joining data allows you to create more complex and meaningful queries that can answer a variety of business questions.

In this blog post, we will explore the different types of joins in SQL, when to use them, and some tips and best practices for writing efficient and readable join queries. We will use the AdventureWorks Database 2022, a sample database that contains data about a fictional company that sells bicycles and accessories. You can download the backup file for this database and restore it to your SQL Server instance following the instructions here.

What are Joins in SQL?

Joins in SQL are used to combine information from two or more tables based on a common relationship between them. Imagine you have two sets of data stored in different tables, and you want to bring them together to analyze or present them as a single dataset. Joins allow you to do just that by matching rows from one table with rows from another table, using a shared column or condition. This enables you to retrieve related information from multiple tables in a single query, making it easier to understand and work with your data.

Why do we need to combine data in SQL?

There are many reasons why we might need to combine data from different tables in SQL. Some of the common ones are:

  • To get a complete picture of the data. For example, if we want to analyze the sales performance of our online store, we might need to combine data from the orders, products, and customers tables to get information about the order details, product categories, and customer segments.

  • To perform calculations or aggregations on the data. For example, if we want to calculate the total revenue of our online store, we might need to combine data from the orders and products tables to get the order quantity and product price for each order.

  • To filter or sort the data based on criteria from different tables. For example, if we want to find the top 10 customers who spent the most on our online store, we might need to combine data from the orders and customers tables to get the order amount and customer name for each order, and then sort the result by the order amount in descending order.
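As a rough sketch of the last two scenarios, the query below joins three small tables and ranks customers by how much they spent. The table names, column names, and rows are made up for illustration (they are not part of AdventureWorks), and the syntax assumes SQL Server.

```sql
-- Hypothetical sample data, built inline so the query runs on its own.
WITH customers AS (
    SELECT * FROM (VALUES
        (1, 'Ada'),
        (2, 'Ben')
    ) AS v (customer_id, customer_name)
),
products AS (
    SELECT * FROM (VALUES
        (10, 25.00),
        (11, 40.00)
    ) AS v (product_id, price)
),
orders AS (
    SELECT * FROM (VALUES
        (100, 1, 10, 2),
        (101, 1, 11, 1),
        (102, 2, 10, 1)
    ) AS v (order_id, customer_id, product_id, quantity)
)
-- Join orders to products for the price and to customers for the name,
-- then total the spend per customer and keep the ten biggest spenders.
SELECT TOP (10)
    c.customer_id,
    c.customer_name,
    SUM(o.quantity * p.price) AS total_spent
FROM orders AS o
INNER JOIN products AS p
    ON o.product_id = p.product_id
INNER JOIN customers AS c
    ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY total_spent DESC;
```

With these rows, Ada comes first with a total of 90.00 (two units at 25.00 plus one at 40.00), followed by Ben with 25.00.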

Types of Joins in SQL

There are different types of joins in SQL, each serving different purposes. Our choice of join also depends on how we want to combine the data from the tables. The most common types of joins are:

  • Inner Join

  • Left Join

  • Right Join

  • Full Join

Inner Join

INNER JOIN merges rows from two or more tables by matching values in a related column, serving as a filter that only returns rows with corresponding entries in both tables. This join is particularly useful when you need to combine related data from different sources to form a comprehensive dataset. It is the most common type of join and also the default: writing JOIN on its own is the same as writing INNER JOIN.

Looking at the picture above, let’s assume the user wants to get the countries of the customers in the DimCustomer table from the DimCountry table. Using an Inner Join will result in the picture below:

Did you notice what happened?
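The same scenario can be sketched with a few rows typed directly into the query. The rows below are invented for illustration and are only an assumption about what DimCustomer and DimCountry might hold; the syntax assumes SQL Server.

```sql
-- Hypothetical sample rows for the two tables.
WITH DimCustomer AS (
    SELECT * FROM (VALUES
        (1, 'Ada', 1),
        (2, 'Ben', 2),
        (3, 'Chi', NULL)   -- this customer has no country recorded
    ) AS v (CustomerID, FirstName, CountryID)
),
DimCountry AS (
    SELECT * FROM (VALUES
        (1, 'Nigeria'),
        (2, 'Ghana'),
        (3, 'Kenya')       -- no customer points to this country
    ) AS v (CountryID, Country)
)
-- Keep only the customers whose CountryID has a match in DimCountry.
SELECT c.CustomerID, c.FirstName, k.Country
FROM DimCustomer AS c
INNER JOIN DimCountry AS k
    ON c.CountryID = k.CountryID
ORDER BY c.CustomerID;
```

With these rows, the result contains Ada (Nigeria) and Ben (Ghana). Chi is dropped because her CountryID is NULL, and Kenya does not appear because no customer references it.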

Both tables were matched using the CountryID column, which is common to both tables. The INNER JOIN combined rows from the two tables where the CountryID matches, showing only the customers who have a corresponding CountryID in DimCountry.

Let’s explore an example using the AdventureWorks Database 2022 to answer the question: “What are the names, IDs, and product numbers of the products that have been sold, including their sales order ID and unit price?” To achieve this, we have to join two tables: Sales.SalesOrderDetail and Production.Product.

The Sales.SalesOrderDetail table typically contains detailed line items for each sales order, including which products were sold in each order, their quantities, prices, and discounts. The Production.Product table typically holds information about the products, such as their name, ID, and product number.

The query below answers the question above:

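-- Product details for every sales order line.
-- A plain JOIN means the same thing as INNER JOIN.
SELECT p.Name AS ProductName, p.ProductID, p.ProductNumber, s.SalesOrderID, s.UnitPrice
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS s
    ON p.ProductID = s.ProductID;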

The query selects the product name, product ID, and product number from the Production.Product table, along with SalesOrderID and UnitPrice from the Sales.SalesOrderDetail table. The join is made on the ProductID column, which is common to both tables, so the result combines product details with sales order details for every product that appears in both tables. Products that have never been sold are left out.

The picture below shows a snippet of the query result:

Left Join

LEFT JOIN returns all the rows from the left table and the matching rows from the right table. If there is no match for a row in the left table, the result set will contain null values for the columns from the right table.

Imagine two tables, DimCustomer and DimCountry, representing a database structure in a retail context. The DimCustomer table holds customer data with CustomerID, FirstName, and a CountryID linking customers to their countries. Conversely, DimCountry contains country data with CountryID and Country name. A left join operation between these tables allows you to list all customers along with their associated country names. The join ensures that even if some customers don't have a linked country (due to missing or NULL CountryID), they are still included in the query output, providing a complete view of the customer records.
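Here is a minimal sketch of that left join. The sample rows are invented for illustration and are only an assumption about the contents of DimCustomer and DimCountry; the syntax assumes SQL Server.

```sql
-- Hypothetical sample rows for the two tables.
WITH DimCustomer AS (
    SELECT * FROM (VALUES
        (1, 'Ada', 1),
        (2, 'Ben', 2),
        (3, 'Chi', NULL)   -- this customer has no country recorded
    ) AS v (CustomerID, FirstName, CountryID)
),
DimCountry AS (
    SELECT * FROM (VALUES
        (1, 'Nigeria'),
        (2, 'Ghana'),
        (3, 'Kenya')
    ) AS v (CountryID, Country)
)
-- DimCustomer is the left table, so every customer is returned.
-- Country is NULL where no matching CountryID exists.
SELECT c.CustomerID, c.FirstName, k.Country
FROM DimCustomer AS c
LEFT JOIN DimCountry AS k
    ON c.CountryID = k.CountryID
ORDER BY c.CustomerID;
```

With these rows, Ada and Ben come back with their countries, and Chi is still listed, with NULL in the Country column. Kenya does not appear because it belongs to the right table and has no matching customer.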

Right Join

A RIGHT JOIN works like a left join in reverse: it returns all rows from the right table, regardless of whether they have matching entries in the left table. If a matching row is not found in the left table, the result set will contain NULL values for the left table's columns. While not as commonly used as inner or left joins, right joins have their moment in the spotlight when you need to ensure no data on the 'right' is left unacknowledged.
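A short sketch of a right join follows, using invented DimCustomer and DimCountry rows (an assumption for illustration, written for SQL Server).

```sql
-- Hypothetical sample rows for the two tables.
WITH DimCustomer AS (
    SELECT * FROM (VALUES
        (1, 'Ada', 1),
        (2, 'Ben', 2),
        (3, 'Chi', NULL)
    ) AS v (CustomerID, FirstName, CountryID)
),
DimCountry AS (
    SELECT * FROM (VALUES
        (1, 'Nigeria'),
        (2, 'Ghana'),
        (3, 'Kenya')       -- no customer points to this country
    ) AS v (CountryID, Country)
)
-- DimCountry is the right table, so every country is returned.
-- Customer columns are NULL where no customer matches.
SELECT k.CountryID, k.Country, c.CustomerID, c.FirstName
FROM DimCustomer AS c
RIGHT JOIN DimCountry AS k
    ON c.CountryID = k.CountryID
ORDER BY k.CountryID;
```

With these rows, Nigeria and Ghana are paired with Ada and Ben, while Kenya is returned with NULL in the customer columns. Swapping the two table names and using LEFT JOIN would give the same rows, which is one reason right joins are seen less often.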

Full Join

FULL JOIN returns all the rows from both tables, whether they have a match or not. If there is no match for a row in either table, the result set will contain null values for the columns from the other table. This type of join is the union of both Left and Right Joins, bringing together rows that match the join condition and also those that do not, with NULLs filling in for missing data.

The Full Join shines in scenarios where you need a complete picture, one that includes all data from the involved tables. Whether it’s for a comprehensive audit, data reconciliation, or when dealing with optional relationships, the Full Join is your comprehensive net that captures every data point.

From the image above, imagine two tables within a retail database: DimCustomer and DimCountry. The DimCustomer table stores individual customer information, while DimCountry keeps track of countries. A Full Join operation between these tables would allow us to see all customers with their associated countries and all countries, revealing those without any linked customers. This complete overview provides valuable insights for business strategies, like identifying potential markets without current customer representation or customers lacking country information.
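The full join described above can be sketched as follows. As before, the rows are invented for illustration and are only an assumption about what the two tables contain; the syntax assumes SQL Server.

```sql
-- Hypothetical sample rows for the two tables.
WITH DimCustomer AS (
    SELECT * FROM (VALUES
        (1, 'Ada', 1),
        (2, 'Ben', 2),
        (3, 'Chi', NULL)   -- customer lacking country information
    ) AS v (CustomerID, FirstName, CountryID)
),
DimCountry AS (
    SELECT * FROM (VALUES
        (1, 'Nigeria'),
        (2, 'Ghana'),
        (3, 'Kenya')       -- country without any customers
    ) AS v (CountryID, Country)
)
-- Return matched pairs plus the unmatched rows from both sides.
SELECT c.CustomerID, c.FirstName, k.CountryID, k.Country
FROM DimCustomer AS c
FULL JOIN DimCountry AS k
    ON c.CountryID = k.CountryID;
```

With these rows, the result has four rows: the two matched pairs (Ada with Nigeria, Ben with Ghana), Chi with NULL country columns, and Kenya with NULL customer columns. Adding WHERE c.CustomerID IS NULL would narrow the result to countries with no customers.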

Tips and Best Practices for Writing Join Queries

  • Understand the Data Model: Know the relationships between tables to determine the most appropriate join type.

  • Use Aliases: Aliases can simplify your queries and make them more readable.

  • Be Explicit with Join Conditions: Clearly state the conditions under which the tables should be joined to avoid Cartesian products.

  • Know When to Use Outer Joins: Use left or right outer joins when you need to include rows that do not have matching rows in the other table.

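  • Avoid Redundant Data: A join repeats a row once for every match it finds in the other table, so one-to-many relationships can inflate row counts and totals. Check the result for repeated rows, especially with outer joins.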

  • Optimize for Performance: Be mindful of the join order and WHERE clauses, as they can significantly impact query performance.
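To see why an explicit join condition matters, the sketch below counts the rows produced with and without one. The tables and rows are made up for illustration, and the syntax assumes SQL Server.

```sql
-- Hypothetical sample rows: three customers and three countries.
WITH DimCustomer AS (
    SELECT * FROM (VALUES
        (1, 'Ada', 1),
        (2, 'Ben', 2),
        (3, 'Chi', NULL)
    ) AS v (CustomerID, FirstName, CountryID)
),
DimCountry AS (
    SELECT * FROM (VALUES
        (1, 'Nigeria'),
        (2, 'Ghana'),
        (3, 'Kenya')
    ) AS v (CountryID, Country)
)
SELECT
    -- No condition: every customer is paired with every country.
    (SELECT COUNT(*)
     FROM DimCustomer AS c
     CROSS JOIN DimCountry AS k) AS RowsWithoutCondition,
    -- Explicit condition: only genuine matches are kept.
    (SELECT COUNT(*)
     FROM DimCustomer AS c
     INNER JOIN DimCountry AS k
         ON c.CountryID = k.CountryID) AS RowsWithCondition;
```

With these rows, the first count is 9 (3 customers times 3 countries) and the second is 2. On real tables with thousands of rows, the gap grows very quickly, which is why a missing or wrong join condition is worth catching early.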

 

In summary, mastering the different types of joins in SQL is essential for any data professional looking to harness the full potential of relational databases. Through the examples above, we've seen how each join serves a unique purpose in data retrieval and analysis. Remember, the key to writing effective joins is to understand your data and write clear and concise conditions. With these best practices in hand, you can write queries that are not only accurate but also efficient. As you continue to explore the power of SQL joins, let these insights be your guide to deeper data discovery.
