LLMs (I refuse to call them AI, as there's no intelligence to be found) are simply random word sequence generators based on a trained probability model. Of course they're going to suck at math: they're not actually calculating anything, they're just dumping what their algorithm "thinks" is the most likely response to user input.
"The ability to speak does not make you intelligent" - Qui-Gon Jinn
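To make the "most likely response" point concrete, here's a toy sketch of picking the next token by weighted chance. The prompts, the probability numbers, and the names `NEXT_TOKEN_PROBS` and `sample_next_token` are all made up for illustration; a real model learns probabilities over a huge vocabulary, but the point is the same: nothing in here does arithmetic.

```python
import random

# Hypothetical probabilities a model might have learned for the token
# that follows each prompt. The numbers are invented for illustration.
NEXT_TOKEN_PROBS = {
    "2 + 2 =": {"4": 0.90, "5": 0.06, "22": 0.04},
    "17 * 23 =": {"391": 0.40, "381": 0.35, "401": 0.25},
}


def sample_next_token(prompt, rng):
    """Pick the next token by weighted chance; no arithmetic is performed."""
    probs = NEXT_TOKEN_PROBS[prompt]
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so repeated runs behave the same
for prompt in NEXT_TOKEN_PROBS:
    print(prompt, sample_next_token(prompt, rng))
```

A sum that shows up everywhere in text gets a lopsided distribution and usually comes out right, while a less common product in this made-up table is closer to a coin flip between plausible-looking numbers.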
Programming languages are structured and have rigid syntax, which suits an LLM well, so it spitting out working code for simple things is like it producing a sentence structured the way a normal person would write it.
Even if the code runs, it might not do what you're actually trying to do, or it might work while being inefficient.
I've heard the ChatGPT math problem was fixed in the new version by having it write Python code to solve the math problem and then giving the answer that comes out when the code is run.
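I don't know how ChatGPT actually implements this, but the general idea can be sketched roughly like so. The "generated" code is just a hard-coded string standing in for model output, and `run_generated_code` and the `result` variable convention are names I made up for the example.

```python
def run_generated_code(code):
    """Execute a snippet and return whatever it stored in `result`.

    Emptying the builtins only keeps this toy example tidy; it is NOT a
    real sandbox, so never run untrusted code this way.
    """
    namespace = {}
    exec(code, {"__builtins__": {}}, namespace)
    return namespace.get("result")


# Stand-in for what a model might write when asked "what is 17 * 23?"
generated_code = "result = 17 * 23"

answer = run_generated_code(generated_code)
print(f"17 * 23 = {answer}")  # the interpreter did the math, not the model
```

The model only has to produce the right expression, which is the kind of structured text it's decent at; the actual calculation is handed off to something that really calculates.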