Do you really need Foreign Keys?

Relational database - Edition #21

May 06, 2024

Hey, I'm Marco and welcome to my newsletter!

As a software engineer, I created this newsletter to share my first-hand knowledge of the development world. Each topic we will explore will provide valuable insights, with the goal of inspiring and helping all of you on your journey.

In this episode, I introduce the concept of foreign keys in relational databases such as MySQL by showing the advantages and disadvantages of using them

Subscribe to Implementing 💻

👋 Introduction

Imagine we want to represent a reality where we have customers and the orders they have placed, with a one-to-many relationship.

In UML we could represent it like this:

that in MySQL is reflected in this DDL:

CREATE TABLE customers
(
    id    INT AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(50),
    email VARCHAR(100)
);

CREATE TABLE orders
(
    id          INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT,
    order_date  DATE,
    amount      DECIMAL(10, 2)
);

To establish a 1-to-many relationship, we create the foreign key customer_id in the orders table. This allows each order to be associated with the respective customer who placed it. In this way, we created a connection from the orders table to the customers table.

Foreign keys

The purpose of foreign keys is to facilitate navigation and linking between two tables, managed through a CONSTRAINT in the database. This means that when inserting new records in tables containing foreign keys, the database ensures that the value entered for the FK exists in the reference table, thus guaranteeing so-called referential integrity.

The foreign key constraint is specified during the definition of the database tables. In the previous example of customers and orders, we can make the constraint explicit like this:

CREATE TABLE customers
(
    id    INT AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(50),
    email VARCHAR(100)
);

CREATE TABLE orders
(
    id          INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT,
    order_date  DATE,
    amount      DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customers (id)
);

this constraint introduces advantages and disadvantages in its use, which will be addressed in the following sections.

Subscribe to Implementing 💻

Pros

✅ Database consistency

Maintaining data consistency implies that the relationships between tables, especially those established by foreign keys (FK), remain consistent. Imagine a scenario in which we are dealing with data stored in several tables that are related to each other. If we do not manage these relationships correctly, there is a risk of encountering orphan records, i.e. cases where a foreign key in one table points to a record that no longer exists in the reference table.

This can lead to inaccuracies and inconsistencies in data analysis.

Through foreign key constraints, we prevent this from happening by ensuring that our analyses reflect only valid and meaningful data.

✅ CASCADE actions

When we set up foreign key constraints, we can choose strategies to handle what happens when we delete a parent row. In MySQL, with ON DELETE CASCADE, if we delete the parent row, the related child rows can also be automatically deleted. Another option is ON DELETE SET NULL, which means if we delete the parent row, the foreign key values in the child rows will be set to NULL.

CREATE TABLE customers
(
    id    INT AUTO_INCREMENT PRIMARY KEY,
    name  VARCHAR(50),
    email VARCHAR(100)
);

CREATE TABLE orders
(
    id          INT AUTO_INCREMENT PRIMARY KEY,
    customer_id INT,
    order_date  DATE,
    amount      DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customers (id) ON DELETE CASCADE
);

However, it is important to be careful because implementing these actions can potentially impact a large number of records and could lead to a slowdown in database operations. When a parent row is deleted, especially in scenarios where there are many associated child rows, the database must perform additional operations to maintain data consistency according to the specified constraints. This can lead to increased processing times, especially in the presence of complex relationships and a significant volume of data.

✅ Model alignment

Without clear documentation, developers may have difficulty understanding the model and the relationships between entities through foreign keys.

There are several tools, such as DataGrip, that facilitate introspection by providing a visual representation of the database schema. This graphical view helps developers to more easily understand the structure of the database and get an overview, enabling them to navigate the relationships and understand the connections between different entities, resulting in more efficient development processes.

Database model of my automatic AI-based newsletter Quickview for YouTube video summaries

you can delve into the model in this episode:

How I made Quickview, my AI-based Crypto Newsletter

Marco Moauro

Feb 5

Read full story

Subscribe to Implementing 💻

Cons

❌ Schema evolution

Once two tables are linked via a foreign key, changing the data type of the columns becomes challenging. For example, changing a column from int to bigint in MySQL can create difficulties. This limitation stems from the fact that the FK constraint is based on the consistency of data types and lengths between referenced columns. Changing column types violates this constraint and compromises data integrity.

When considering horizontal scaling techniques such as sharding or partitioning, maintaining FK relationships between shards or partitions presents a significant challenge as data is distributed across multiple nodes or partitions, requiring careful planning and implementation to avoid inconsistencies.

❌ Increased locking

When performing update or delete operations on a table that has external relationships with other tables through foreign keys, the database might need to perform more frequent locking operations to maintain data referential integrity. This can increase the load on the system and potentially lead to performance degradation in case of long running queries impacting records referenced from other tables, especially in highly concurrent environments.

❌ Slower writes and memory consumption

Foreign keys impact write operations, since the database must ensure that they respect referential integrity constraints. Without an index on the foreign key column, this verification process can be slow, as the database sequentially scans the related table to verify the key constraint. This may be particularly time-consuming for large tables.

To speed up this procedure, an index can be included on the column affected by the foreign key. While this speeds up the checking of the key constraint, it also introduces additional overhead during write, update or delete operations, as the index must be adjusted accordingly.

At each insert operation we should therefore add the time T1 for checking the integrity constraint and T2 for updating the index.

Subscribe to Implementing 💻

📚 References

🌟 Top lectures of the week

Why I'm not worried about AI

Jade Wilson

Small Bets for Engineers — with Daniel Vassallo 🎙️

Luca Rossi

What is Kafka? Architecture Deep Dive

Javarevisited Newsletter

Why reading whitepapers takes your career to the next level (and how to do it)

Jordan Cutler

And that’s it for today! If you are finding this newsletter valuable, consider doing any of these:

🍻 Read with your friends — Implementing lives thanks to word of mouth. Share the article with someone who would like it.
Share
📣 Provide your feedback — We welcome your thoughts! Please share your opinions or suggestions for improving the newsletter, your input helps us adapt the content to your tastes.
Leave a comment
💬 Chat with me — If you have any doubts or curiosity, please write to me, I will be happy to answer you!