I’ve found this article lacking. Like some other articles in this space, it introduces isolation levels through the lens of the phenomena described in the SQL standard, but I find that there’s a different, more intuitive approach.
I think it’s more tractable to define this problem space starting from the concept of (strict) serializability, which is really a generalization of the concept of thread safety. Every software engineer has an intuitive understanding of it. Lack of serializability can lead to execution-dependent behavior, which usually results in hard-to-diagnose bugs. Thus, all systems should strive towards serializability, and the database can be a tool in achieving it.
Various non-serializable levels of database transaction isolation are relaxations of the serializability guarantee, where the database no longer enforces the guarantee and it’s up to the database user to ensure it through other means.
The isolation phenomena are a useful tool for visualizing various corner cases of non-serializability, but they are not inherently tied to it. It's possible to achieve serializability while observing all of the SQL phenomena. For example, a Kubernetes cluster with carefully-written controllers can be serializable.
show comments
zadikian
Seems like a frequent surprise is that Postgres and MySQL don't default to serializable (so not fully I in ACID). They do read-committed. I didn't see this article mention that, but maybe I missed it. The article says read-committed provides "slightly" better performance, but it's been way faster in my experience. Forget where, but I think they said they chose this default for that reason.
Using read-committed ofc means having to keep locking details in mind. Like, UNIQUE doesn't just guard against bad data entry, it can also be necessary for avoiding race conditions. But now that I know, I'd rather do that than take the serializable performance hit, and also have to retry xacts and deal with the other caveats at the bottom of https://www.postgresql.org/docs/current/transaction-iso.html
show comments
interlocutor
A lot of database tools these days prioritize instant sharing of updates over transactions and ACID properties. Example: Airtable. As soon as you update a field the update shows up on your coworkers screen who also has the same table open. The downside of this is that Airtable doesn't do transactions. And the downside of not doing transactions is potentially dangerous data inconsistencies. More about that here: https://visualdb.com/blog/concurrencycontrol/
rishabhaiover
It's an absolute pleasure reading planetscale blogs. I'm curious about what tool is used to make these visualizations?
show comments
lasgawe
We built an entire project for a client-side project with millions of SQL rows and thousands of users without adding a single transaction. :/
show comments
cryptonector
In the section about serializable read TFA gets the `accounts` `balance` wrong.
show comments
jascha_eng
This actually used to be one of my favorite interview questions for backend engineers. Everyone has used transactions but depending on your seniority you'd understand it to different degrees.
And no I'd never expect people to know the isolation levels by heart, but if you know there are different ones and they behave differntly that's pretty good and tells me you are curious about how things work under the hood.
It goes into not only different isolation levels, but also some ambiguity in the traditional ACID definition.
I believe a 2nd edition is imminent.
chamomeal
never used planetscale but I’ve always liked their blog, and other content. One of the founders interviewed on the software engineering daily podcast and it was super interesting
zabzonk
I thought this was pretty good, not least because it attempts to explain isolation levels, something I always found pretty tricky when teaching SQL. Mind you, I was only teaching SQL, and so isolation, as part of C and C++ courses so that our clients could do useful stuff, but explaining what levels to use was always tuff.
unkulunkulu
> At this stage, it has nothing to do with xmin and xmax, but rather because other transactions cannot see uncommitted data
Am I missing something or this statement is incomplete? Also I find the visualization of commit weird, it “points to” the header of the table, but then xmax gets updated “behind the scenes”? Isnt xmax/xmin “the mechanism behind how the database knows what is committed/not committed”? Also, there could be subtransactions, which make this statement even more contradictory?
I enjoyed the visualizations and explanations otherwise, thanks!
show comments
Quarrelsome
I have learned about the beauty of predicate locks. That's such a sexy way of dealing with the issue instead of just blithely funneling all writes.
nkzd
Have you ever seen anyone changing transaction isolation levels in code? I think pessimistic or optimistic locking is preferred way to handle transaction concurrency.
show comments
nazwa
Is it just me, or are the final results of the deadlock visualisations incorrect? In both animations (mysql/pg), the final `SELECT balance from account...` queries appear to show the result of the two sessions, which have been terminated.
show comments
zzzeek
I think this is a great post to have but I'm going to make a critical usability suggestion:
* the videos should have "pause" and a "step at a time" control *
Even at the "half speed", without a deep knowledge of the context, the videos move way too fast for me to read the syntax that's invoking and line it up with the data on the left side. I (and im definitely not the only one) need to be able to sit on one step and stare at the whole thing without the latent anxiety of the state changing before I've had a chance to grok the whole thing.
this has nothing to do with familiarity with the concepts (read my profile). I literally need time to read all the words and connect them together mentally (ooh, just noticed this is pseudo-SQL syntax also, e.g. "select id=4", that probably added some load for me) without worrying they're going to change before watching things move.
I’ve found this article lacking. Like some other articles in this space, it introduces isolation levels through the lens of the phenomena described in the SQL standard, but I find that there’s a different, more intuitive approach.
I think it’s more tractable to define this problem space starting from the concept of (strict) serializability, which is really a generalization of the concept of thread safety. Every software engineer has an intuitive understanding of it. Lack of serializability can lead to execution-dependent behavior, which usually results in hard-to-diagnose bugs. Thus, all systems should strive towards serializability, and the database can be a tool in achieving it.
Various non-serializable levels of database transaction isolation are relaxations of the serializability guarantee, where the database no longer enforces the guarantee and it’s up to the database user to ensure it through other means.
The isolation phenomena are a useful tool for visualizing various corner cases of non-serializability, but they are not inherently tied to it. It's possible to achieve serializability while observing all of the SQL phenomena. For example, a Kubernetes cluster with carefully-written controllers can be serializable.
Seems like a frequent surprise is that Postgres and MySQL don't default to serializable (so not fully I in ACID). They do read-committed. I didn't see this article mention that, but maybe I missed it. The article says read-committed provides "slightly" better performance, but it's been way faster in my experience. Forget where, but I think they said they chose this default for that reason.
Using read-committed ofc means having to keep locking details in mind. Like, UNIQUE doesn't just guard against bad data entry, it can also be necessary for avoiding race conditions. But now that I know, I'd rather do that than take the serializable performance hit, and also have to retry xacts and deal with the other caveats at the bottom of https://www.postgresql.org/docs/current/transaction-iso.html
A lot of database tools these days prioritize instant sharing of updates over transactions and ACID properties. Example: Airtable. As soon as you update a field the update shows up on your coworkers screen who also has the same table open. The downside of this is that Airtable doesn't do transactions. And the downside of not doing transactions is potentially dangerous data inconsistencies. More about that here: https://visualdb.com/blog/concurrencycontrol/
It's an absolute pleasure reading planetscale blogs. I'm curious about what tool is used to make these visualizations?
We built an entire project for a client-side project with millions of SQL rows and thousands of users without adding a single transaction. :/
In the section about serializable read TFA gets the `accounts` `balance` wrong.
This actually used to be one of my favorite interview questions for backend engineers. Everyone has used transactions but depending on your seniority you'd understand it to different degrees.
And no I'd never expect people to know the isolation levels by heart, but if you know there are different ones and they behave differntly that's pretty good and tells me you are curious about how things work under the hood.
For all interested in this topic, I highly recommend the book Designing Data Intensive Applications https://www.goodreads.com/book/show/23463279-designing-data-....
It goes into not only different isolation levels, but also some ambiguity in the traditional ACID definition.
I believe a 2nd edition is imminent.
never used planetscale but I’ve always liked their blog, and other content. One of the founders interviewed on the software engineering daily podcast and it was super interesting
I thought this was pretty good, not least because it attempts to explain isolation levels, something I always found pretty tricky when teaching SQL. Mind you, I was only teaching SQL, and so isolation, as part of C and C++ courses so that our clients could do useful stuff, but explaining what levels to use was always tuff.
> At this stage, it has nothing to do with xmin and xmax, but rather because other transactions cannot see uncommitted data
Am I missing something or this statement is incomplete? Also I find the visualization of commit weird, it “points to” the header of the table, but then xmax gets updated “behind the scenes”? Isnt xmax/xmin “the mechanism behind how the database knows what is committed/not committed”? Also, there could be subtransactions, which make this statement even more contradictory?
I enjoyed the visualizations and explanations otherwise, thanks!
I have learned about the beauty of predicate locks. That's such a sexy way of dealing with the issue instead of just blithely funneling all writes.
Have you ever seen anyone changing transaction isolation levels in code? I think pessimistic or optimistic locking is preferred way to handle transaction concurrency.
Is it just me, or are the final results of the deadlock visualisations incorrect? In both animations (mysql/pg), the final `SELECT balance from account...` queries appear to show the result of the two sessions, which have been terminated.
I think this is a great post to have but I'm going to make a critical usability suggestion:
* the videos should have "pause" and a "step at a time" control *
Even at the "half speed", without a deep knowledge of the context, the videos move way too fast for me to read the syntax that's invoking and line it up with the data on the left side. I (and im definitely not the only one) need to be able to sit on one step and stare at the whole thing without the latent anxiety of the state changing before I've had a chance to grok the whole thing.
this has nothing to do with familiarity with the concepts (read my profile). I literally need time to read all the words and connect them together mentally (ooh, just noticed this is pseudo-SQL syntax also, e.g. "select id=4", that probably added some load for me) without worrying they're going to change before watching things move.
please add a step-at-a-time button!