Using last_insert_id() and how it works
An item that comes up frequently is managing data in two tables where the data is related and you anticipate managing multiple inserts with the data. the worry becomes how to manage the data without running into integrity problems between users.
Let's illustrate the problem with a basic table set-up. We will have one table for inserting articles and the other will hold the categories the articles can belong to.
create table articles
(articleid int,
title varchar(50),
content varchar(250));
create table articletypes(
articleid int
category int);
Normally we'd use TEXT type or some other larger column type for the content, but we are less interested in the size of this data than some of the other data. This size will suffice for now.
Now we know we need to give an article an id, we will also need it for both tables. One approach I've seen people use is to manage the id by having a file hold the id, get the id from the file, increment the id and store it back into the file.
There are many problems with this, but I will quickly outline the two that should stand out. First, you would be saving data outside your database to manipulate data inside your database. You should not have to do so as the database already has such a tool to help you. Secondly and more importantly is the data integrity issue. What happens if two users grab the value at once or both try to write the file at the same time? That's right, both could end up with the same value for their insert, or both could write two conflicting values to the file at the same time.
The next "solution" I've seen is to retrieve the value of the largest id value from the initial table, increment it and then use that new value for insert into both tables. Again it brings up the same data integrity issues, what is known as a race condition. You grab the max value from the id in the articles table, increment it by 1. Let's say the value is 100. So you grab the value of 100 by doing a select:
select max(articleid) from articles
You increment the value so you have 101 to insert into the articles table. At the same time, I also retrieve the value of 100 from the articles table and increase by 1. Perhaps you insert your row a split second after mine. We now have an issue of two articles being the same id, or you already have declared articleid as a primary key and thus the second insert fails and generates a warning of duplicate key on the insert. What now? Let's take this a step further. What happens if you insert your row in the first table and then I do mine shortly after yours. I then quickly insert my data into the second table. We now have two articles in the articles table with the same articleid and I insert my data into the second table before yours? We now have conflicting data in the two tables.
What can we do about this problem? The answer is to declare articleid in the articles table an auto_increment column. Like this:
drop table if exists articles;
create table articles
(articleid int auto_increment primary key,
title varchar(50),
content varchar(250));
An auto increment column works like this, each row that is inserted gets an increased articleid, each user that gets logged in has their own session if you will, so when they insert a row and get the value for articleid for insert into a next table it is not affected by any inserts by concurrent users.
When you do an insert, you either declare NULL for the value of the articleid or you leave that column out entirely. Like this:
insert into articles values
(null, "our article", "our article is about using auto increment columns to ensure data integrity between simultaneous inserts by users.")
Note the value is NULL, meaning the abscense of value and not "null" which would merely be the string with the value "null"
If this method is not quite comfortable to you, then name the columns in your insert and leave off the articleid column entirely:
insert into articles (title, content) values
("our article", "our article is about using auto increment columns to ensure data integrity between simultaneous inserts by users.")
The value that was inserted as the articleid is retrievable by using last_insert_id().
select last_insert_id()
You then have the value to enter into the next table. If another user does an insert into the first table and the second table before you insert your row in the second table, they will not get the same articleid as you, nor would the wrong value get inserted into the second table.
Inserting into the second table is easy as well as you can use last_insert_id directly in your insert like this:
insert into articles (title, content) values
("our article", "our article is about using auto increment columns to ensure data integrity between simultaneous inserts by users.");
insert into articletypes values
(last_insert_id(), 'full articles')
Now when we look at the last inserted rows for both tables we can see they match
select
articleid,
title,
content
from articles
order by articleid desc
limit 1
select
articleid,
category
from articletypes
order by articleid desc
limit 1
If you are using more than two tables there are two scenarios to be aware of. The first type is where the second and third tables do not have an auto_increment type column and are inserted into using the value of the last_insert_id() from the first table. The second scenario involves two of the tables containing auto_increment columns.
Scenario one: as described, one table with an auto increment column, two tables without such a column
create table movies(
id int auto_increment primary key,
title varchar(40));
create table admissions(
movieid int,
gross decimal(10,2));
create table genres(
genre varchar(10),
movieid int);
So now we want to enter data for a film. we will get the auto increment value from the movies table and insert that into both the admissions and genres table.
insert into movies
(title)
values
('Jaws');
select
id, title
from movies;
insert into admissions
(movieid, gross)
values
(last_insert_id(), 20500.00);
select
movieid, gross
from admissions;
insert into genres
(genre, movieid)
values
('Horror',last_insert_id());
select
genre, movieid
from genres;
So you can see how simple it is to manage tables when there is only one auto increment type column involved. What happens when you have two different tables with auto increment columns and you need to insert info from the related tables into subsequent tables? Let's say we are creating a music database, the tables will be artist, albums and tracks.
create table artist(
artistid int auto_increment primary key,
name varchar(20));
insert into artist (name) values ('Nirvana');
create table albums(
albumid int auto_increment primary key,
album varchar(40));
create table tracks(
ar_id int,
alb_id int,
trackname varchar(50));
I'm going to enter a single row in the albums table (perhaps you know of an artist you want to add but don't have an album yet), the reason for entering this row will be evident in the explanation below of using auto_increment in multiple tables.
Upon entering a row in the artist table we will have incremented the auto_increment artistid column. We can then save that to a variable for use in the tracks table.
insert into artist (name) values ('Black Sabbath');
set @artistid = last_insert_id();
insert into albums (album) values ('Paranoid');
insert into tracks values
(@artistid, last_insert_id(), 'War Pigs');
Note we could have saved the album id to a variable as well and used that for the insert into the tracks table.
If we don't save the artistid into a variable watch what happens:
insert into artist (name) values ('Pink Floyd');
select last_insert_id();
insert into albums (album) values ('Atom Heart Mother');
select last_insert_id();
As you see the value for last_insert_id is the value last inserted, not from a specific table, but the last table, so the value in the artistid is lost to us as far as using last_insert_id() to insert into the tracks table goes.
Finally I want to talk about the biggest misconception people have in the use of an auto_increment column. Actually it is two things. First people want to know the value of the auto_increment column before they use it and secondly people worry about gaps in the sequence when they have deleted rows. Both items are related and can be refuted by the work shown above.
First the misconception of needing to know the value of the auto_incrementing column before doing an insert. As you can see above, it is not necessary to know such a value. You enter your row in the primary table and then use last_increment_id() either directly, or saved in a variable, to do the insertions in the secondary table(s).
Secondly the notion of gaps. Say we have entered 10 rows starting at 1*, then after we delete row 5 some people want the next row inserted to use 5 for the auto increment column so there are no "gaps" in the sequence of numbers.
The problem with this is twofold: first the auto_increment number continues to increase in value (it will only error out when it has reached the maximum value that type of column can hold. For instance if you use tinyint that can hold a value between 1 and 127 the next row will produce an error if the auto_increment has reached 127, even if there are only 3 rows in the table. the solution there of course is to use a larger type such as smallint or int.)
The second, and more important thing, your application should not rely on there being no gaps in the sequence of numbers. your users should not know anything about the auto increment column, nor should you display it to them. The only reason you should use an auto increment column is when there are no other obvious candidate columns for your primary key. If you have a table using Social Insurance Numbers for instance, those should be your primary key. when all other columns in the table may have duplicates, then and only then should you consider creating a column as your primary key. At that point, the column is only so you don't have duplicate rows, and then you show the users the rest of your columns and not the primary key.
* you can assign the auto increment value to begin at whatever number you want. Just note it so when creating the table or adding the column as so:
create table mytesttable (id int auto_increment primary key) auto_increment = 140000;