Is AI Going to Replace Software Engineers?

How good is AI at coding?

Carl Brown, founder of the YouTube channel Internet of Bugs, has made a name for himself by holding AI claims up to scrutiny. So we asked the physicist and software developer to help us assess how good AI is at coding tasks.

Using our AI testing software, which simultaneously queries five leading AI models, Brown asked the models coding questions. He published the results on his YouTube channel and spoke with us about his work on our Ingredients video interview series. 

He found that they were not adept at simple tasks such as debugging code that cropped and resized an image. He also found that they struggled with an important part of software development: estimating the size and scope of a task. Four of the five models estimated that building a simple photo browser would be a task of the same size as building a replica of the TikTok app.

“Much of the job of a software developer, or at least a senior software developer, is actually talking to people,” Brown said.

Brown started his channel because his child, who is studying computer science, was worried that AI would make the field obsolete before their career even began. For now, Brown says he doesn’t believe the hype that AI models can perform at the level of a human programmer.

Ingredients
Hypothesis
Generative AI cannot replace software engineers, but it can do parts of the job.
Sample size
A dozen questions were posed to five AI models: OpenAI’s GPT-4, Anthropic’s Claude 3 Opus, Google’s Gemini, Mistral’s Mixtral, and Meta’s Llama 2.
Techniques
Posed three types of questions to models: ones that require recent coding knowledge, ones that have multiple solutions, and tasks that require planning.
Key findings
AI models often produced generic answers instead of solutions or plans tailored to the specific task at hand and, overall, fell short of what one would expect of a human software engineer.
Limitations
Questions were limited to those that someone who does not code would likely understand. The sample size was small and models may perform differently after updates.
