MySQL command parsing: How to split by semicolon while ignoring semicolons inside quotes?

I’m working with a text editor that contains multiple SQL statements and need to parse them correctly. Here’s what I’m dealing with:

select * from users;
select count(*) from orders;
describe products;

Currently I’m using this approach to separate the commands:

string[] sqlCommands = textEditor.Content.Split(';');

This gives me the expected result:

sqlCommands[0] = "select * from users";
sqlCommands[1] = "select count(*) from orders";
sqlCommands[2] = "describe products";

This works perfectly for simple cases. However, I run into issues when dealing with INSERT statements that contain semicolons within the string values:

select * from users;
insert into products values ("item;name", "desc;ription", 100);
insert into orders values ("data;value", 25);

When I use the same splitting method, it incorrectly breaks apart the INSERT statements because it treats every semicolon as a command separator, even those inside quoted strings. I need a way to only split on semicolons that are actual statement terminators, not those that appear within string literals. What’s the best approach to handle this parsing challenge?

State machines are your friend here. I built something similar for a query analyzer and went with a simple state machine that tracks parsing context. You track states for normal parsing, inside single quotes, inside double quotes, and handle escape sequences properly. The key is differentiating between quote types and handling MySQL’s specific escaping rules. Basic flags break down with mixed quotes or backslash escapes. State machines handle edge cases like comments with semicolons and work with all MySQL string formats without getting confused by complex nesting.

Been dealing with this parsing nightmare for years in my SQL editor project. Everyone mentions character iteration and state machines, but here’s another angle - just use existing SQL parsing libraries instead of building your own. JSqlParser or ANTLR grammars handle all the edge cases like comments, multi-line strings, and MySQL’s weird syntax quirks. Yeah, they add dependency overhead, but they’re battle-tested against thousands of real SQL variations. If you’re set on manual parsing, remember MySQL supports both quote types and has bizarre rules about doubling quotes for escaping. Don’t forget semicolons inside comments should be ignored too, not just quoted strings.

I hit this exact problem building a database migration tool. Here’s what works: parse character by character and track whether you’re inside quotes with a boolean flag. Only treat semicolons as delimiters when the flag shows you’re outside quoted strings. Don’t forget escaped quotes - stuff like "escaped\"quote" will trip you up. Also watch out for mixing single and double quotes since SQL uses both. Yeah, going character by character sounds slower than regex, but it’s way more reliable and still performs great even on huge SQL files.

just tokenize it right - split on semicolons but skip them when you’re inside quotes. track whether you’re in a quote as you go through each character. pretty straightforward once you handle the escapes properly.

Regex works if you dont wanna loop through chars. Split on semis that aren’t in quotes using lookahead/lookbehind. But char-by-char is safer - SQL gets whacky with nested quotes and escaping.