General Purpose AI Uses 20 to 30 Times More Energy than Task-Specific AI

In the past five years, Google’s electricity consumption has increased 186 percent, Microsoft’s has increased 186 percent, and Meta’s has increased 367 percent. We don’t know exactly how much of that surge is due to AI, but we do know that AI is a big part of it.

As we have been reporting in our series on AI and the climate, most recently in “How AI is Killing Climate Goals,” tech companies do not disclose the climate costs of their AI projects. And that makes it hard for lawmakers and the public to determine whether AI’s extraordinary energy use is worth it.

Much of what we know about AI’s carbon emissions comes from “Power Hungry Processing: Watts Driving the Cost of AI Deployment,” a study by researchers at Hugging Face, an AI platform, and Carnegie Mellon University. They measured carbon emissions from leading open-source AI models, a group that doesn’t include the best-known AIs such as OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude. But even those estimates show that we are entering an extremely energy-intensive era of computing.

In our Proof Ingredients video series, I interviewed one of the authors, Sasha Luccioni, research scientist and climate lead at Hugging Face, about generative AI’s extraordinary climate impacts. In our conversation, she explains how the climate impacts of general-purpose AI models are orders of magnitude higher than those of task-specific models, and calls for more transparency from the companies so that we can truly assess the climate impacts of AI.

A transcript of our conversation, edited for brevity and clarity, is below:

-----

Angwin: Will AI solve climate change — or make it worse? Predictions are flying around, but here at Proof News, we're actually interested in the facts on the ground right now.

I'm Julia Angwin, founder of Proof News. Welcome to our Proof Ingredients video series, where we talk to people who have proved things about how they did it.

In this segment, we're so lucky to have Sasha Luccioni here to inform us, amidst all of the hype and doomsday scenarios about AI that are swirling around. Sasha has actually done the hard work of analyzing the actual, true, on-the-ground climate costs of AI models that are up and running right now. Her numbers are the most solid information we have, since the companies that make AI aren't telling us that much.

Sasha has a PhD in artificial intelligence and is currently the climate lead for a tech company called Hugging Face, which hosts thousands of AI models and apps for the public. Sasha, thank you so much for joining us to talk about your groundbreaking work.

Luccioni: Thank you for having me. It's such an honor.

Angwin: It's great to have you here. And maybe you can tell us your background, how you got into artificial intelligence. You were probably thinking ahead.

Luccioni: Yeah, it's actually really funny. I kind of drifted towards AI. I started out in linguistics because I really like languages. And then I did cognitive science because I wanted to understand how people learn language and how they think. And then finally, I kind of picked up some programming, and I did an internship, and I was like, hey, this programming thing has its benefits. Like, the fact is that you can really kind of create something. It’s a very powerful way of interacting with the world. And so essentially, for my PhD, I decided to go into artificial intelligence. My thesis was on education and how to help people learn languages using AI.

And then when the real world happened after that, I started doing more applied research. And then after a couple of years of working in applied research, I had what I call a quarter-life crisis when I was like, 'What am I doing with my life?' I was like, I'm going to quit my job. I'm going to go plant trees. I'm going to, I don't know, live off the grid. And then my partner was like, well, maybe you can use these ten years of studies you did to do some good, some environmentally positive work. And that's how I started working in AI and climate change, and we actually created a community — the climate change AI community — that didn't exist before. And I've been working in this space for like six years now.

Angwin: You come from a long line of women scientists, is that right? Can you tell me?

Luccioni: Yeah, exactly. My great-grandmother was a researcher in geology, my grandmother was a chemist, and my mother has a PhD in statistics.

Angwin: So let's talk about your investigation that I was referring to. There's so much out there that is attempting to predict what will happen with AI, right? Like, will it solve climate change? There's a whole bunch of people who say, oh, it's going to magically figure out ways for us to solve it. And then there's all the people saying, oh, it's going to make things so much worse. 

All these projections. But the projections have to be based on something, and that something is largely unknown, because the companies aren't telling us a lot of details about how much electricity AI uses. So tell me what you set out to investigate and how you came up with your question.

Luccioni: Three years ago, when I joined Hugging Face, it was in the scope of the BigScience project that brought together, like, a thousand researchers and AI practitioners and legal scholars from all over the world to train the first open-source large language model, called BLOOM. I was in charge of the carbon footprint estimation work. And we did some really cool stuff because we actually had access to the training logs and we could go into the really granular details of energy usage and carbon emissions and stuff like that.

As part of that study, we essentially deployed the model — we put it up on a cloud compute server — and then people were querying it kind of organically, just interacting with it. We measured the energy usage and estimated carbon emissions based on where that energy was coming from. That was really interesting because we found that, depending on the number of requests we got, obviously the energy consumption varied, but also that there was always an amount of energy dedicated to keeping the model in memory. So even if no one was actually interacting with it, it still used energy.

And so that was kind of like the first inkling for me in terms of like, let's look at deployment, because all the previous carbon footprint estimation work and energy estimation work was really focused on the model training, because it's kind of a little bit more well defined. Like, you press a button and then the model training starts, and then you press stop, and then you can really calculate quite precisely how much energy was used and how many tons or kilograms or whatever of carbon emissions were emitted.

But deployment is a lot messier. People kept telling me that it's just impossible to estimate because it's almost intangible — there's all these different instances of models running in all these different places. And so every time I would ask a tech company how much energy an AI tool was using, they'd always be like, oh, well, it's complicated. We don't have the numbers. And so that's how we decided to do our study on AI model deployment.
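To make the idle-power observation above concrete, here is a minimal sketch, assuming an NVIDIA GPU with the pynvml bindings installed; it is not the BigScience team's actual setup, and the small, openly available bigscience/bloom-560m model stands in for the full BLOOM deployment. It samples the GPU's power draw with the model loaded but idle, then again while the model is generating:

```python
# Minimal sketch (not the BLOOM study's code): compare a GPU's power draw
# with a model sitting idle in memory vs. actively generating.
import threading
import time

import pynvml
from transformers import AutoModelForCausalLM, AutoTokenizer

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
watts = lambda: pynvml.nvmlDeviceGetPowerUsage(gpu) / 1000  # NVML reports milliwatts

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m").to("cuda")

time.sleep(5)  # let the card settle with the model loaded but idle
print(f"idle, model in memory: {watts():.0f} W")

inputs = tok("What year was Napoleon born?", return_tensors="pt").to("cuda")
done = threading.Event()

def generate():
    model.generate(**inputs, max_new_tokens=200)
    done.set()

threading.Thread(target=generate).start()
readings = [watts()]
while not done.is_set():  # sample power while generation runs
    readings.append(watts())
    time.sleep(0.1)
print(f"peak while generating: {max(readings):.0f} W")
```

The first reading reflects the standing cost of keeping the model in memory that Luccioni describes; the second captures the extra draw of actually serving a request.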

Angwin: And so what was your hypothesis going in? Were you testing how much electricity is used per query? What was the exact question that you were looking at?

Luccioni: It was kind of a broad question: How do we understand the energy usage of deploying AI models, and how does it vary? Essentially, we defined ten different tasks, because when I started playing around with models off the Hugging Face website — open-source models that people share — I realized that there are all these different ways in which models are used.

So I was like, well, maybe it doesn't really make sense to compare text-generation models with image-generation models, because they're kind of doing different things. And so we started thinking about how to make categories and how to test the energy consumption of different models. And essentially the idea was: can we figure out what impacts this energy consumption? Is it going to be the size of the models? Is it going to be the task that they're used for? Is it going to be whether they're generative models or discriminative models that are limited to a set number of categories? What are these different criteria, and can we actually measure them?

Angwin: And so then how did you measure them?

Luccioni: We defined ten different tasks, everything from text classification and image classification to summarization and question answering. For each of these tasks, we took popular models, and we used the same data set across as many models as we could get our hands on, I think over 100 models in total, and we compared how much energy was used. The idea was: if you use the same hardware, if you use the same data set, if you make sure that you keep all the variables stable, and only the model is different, can you really measure what differs in terms of energy consumption?

One of the most surprising things that we found was that if you have the same task, like answering a question, on one hand you can have a model that's only doing that task — it's called extractive question answering. So essentially, you have a text and you ask, 'What year was Napoleon born?' and it's going to find the year of Napoleon's birth in that text, like the Wikipedia entry. And on the other hand, you have a generative model, and you ask it what year Napoleon was born, and it's essentially generating the answer to your question. It's not really extracting it from any set document. It's actually, you know, 'inventing' it from its training data. There was a 20- or 30-fold difference in terms of energy usage for the same questions.

Angwin: 20, 30 times more energy used by the generative models?

Luccioni: Yeah. Typically generative models use orders of magnitude more energy for various reasons. On the one hand, it's really that the task is actually different because you're not extracting existing information, you're inventing, generating new information. But also, most generative models in this day and age are supposed to multitask. They're supposed to be able to do multiple things at once. So like both, you know, giving you recipes and answering questions and writing poems and captioning images, and all these different kinds of sub-tasks. And so they're typically bigger and they need more training data. And I think that also part of the issue is that we have these 'general purpose' models that are just inefficient compared to single-task models.
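The extractive-versus-generative comparison can be sketched with public models, using the open-source CodeCarbon library to track energy and emissions. This is a minimal illustration of the kind of controlled setup Luccioni describes, not the paper's exact models or code:

```python
# Same question, same hardware, only the model varies. Model names are
# small public examples from the Hugging Face Hub, not the paper's set.
from codecarbon import EmissionsTracker
from transformers import pipeline

context = "Napoleon Bonaparte, born in 1769 on Corsica, became Emperor of the French."

# Extractive QA copies the answer span out of the supplied text.
extractive = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
# A generative model writes the answer token by token from its training data.
generative = pipeline("text-generation", model="gpt2")

runs = {
    "extractive": lambda: extractive(question="What year was Napoleon born?", context=context),
    "generative": lambda: generative("Q: What year was Napoleon born?\nA:", max_new_tokens=8),
}

for name, run in runs.items():
    tracker = EmissionsTracker(project_name=name, log_level="error")
    tracker.start()
    for _ in range(100):  # repeat so the measurement rises above noise
        run()
    kg_co2 = tracker.stop()  # returns kg CO2eq; CodeCarbon also logs energy in kWh
    print(f"{name}: ~{kg_co2:.6f} kg CO2eq for 100 queries")
```

Holding the hardware and inputs fixed, as here, is what lets the energy difference be attributed to the models themselves.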

Angwin: So how much energy are we talking about, right? Like, when I do a query like that, is it the same as turning on my lights in my room or leaving a tea kettle running? Like, is there a way to say what the electricity usage is per query, or is that too small a unit?

Luccioni: No, it's not too small a unit, but we don't have the information. People have tried to estimate it, but Google or whoever hasn't released the numbers, so we don't know. I mean, what they say is that, depending on where you're located in the world, the actual model or algorithm that's serving your results can be in a different place. So it's hard to estimate exactly how much energy is being used. Essentially, we don't have these numbers.

For a single search, it's not that much. The estimate that I've seen was like 0.3 watt-hours of energy. But the thing is, how many Google queries are done every day, right? How fast does that add up?

Angwin: By comparison, when you do something like that with one of the generative AI models that you tested — it would be like three watt hours of energy?

Luccioni: According to people's estimates, it's around ten times more. If you take the smallest extractive question answering model and the biggest generative model, the difference is like 20 to 30 times more energy.

Angwin: Got it. Okay. So if it's three watt hours for a generative AI model to do a query, what does that look like in my real life? Like what do I do that uses — I'm not used to thinking of my daily life in terms of watt hours.

Luccioni: So something like, maybe, charging a smartphone. Not for the Google query, but for image generation: generating a high-quality image using one of the bigger image-generation models is essentially equivalent to half of a smartphone charge, just to give you an example of the orders of magnitude. So it's actually pretty significant. Whereas for text generation, it would be a lot less, because instead of generating the pixels of an image you're generating words, tokens. So it's a lot less.
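As a back-of-the-envelope illustration of "how fast does that add up," here is the arithmetic using the rough figures from this exchange, plus one outside assumption: the commonly cited estimate of roughly 8.5 billion Google searches per day.

```python
# Back-of-envelope only: every input is a rough estimate, not a measurement.
WH_PER_SEARCH = 0.3          # the per-search estimate cited above
GENERATIVE_MULTIPLIER = 10   # the rough generative multiplier cited above
SEARCHES_PER_DAY = 8.5e9     # commonly cited estimate (an assumption here)

conventional_gwh = WH_PER_SEARCH * SEARCHES_PER_DAY / 1e9  # Wh -> GWh
print(f"conventional search:   ~{conventional_gwh:.1f} GWh per day")
print(f"all-generative search: ~{conventional_gwh * GENERATIVE_MULTIPLIER:.1f} GWh per day")
```

That is the difference between roughly 2.5 and 25 gigawatt-hours a day, before counting image generation or model training.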

Angwin: So in your study, you looked at over 100 models and you got these estimates for all of their tasks. Overall, what was your conclusion about electricity usage? Was it higher than you expected?

Luccioni: It is really variable. I was actually pretty struck by the range. The most efficient task is text classification. Something that's done a lot, for example, is trying to classify either movie reviews or product reviews as being positive or negative. That's the most efficient task. 

If you compare image generation to that, it's thousands of times more energy. If you think about it, on one hand, you have positive / negative as the only two possible options for classifying a product review, whereas when you're generating an image you have the pixel space, however big that is. And for each pixel you have all of the choices of colors, right? Which could be 256, or sometimes it's like thousands of color choices. Essentially the number of possibilities is a lot bigger when you're generating an image versus just classifying a text. So it was a huge range. 

So for instance, if I'm using Meta's Llama it might be one thing, and if I'm using OpenAI it might be another thing. 

Angwin: And if I understand correctly, you actually couldn't test a lot of the closed-source models, right?

Luccioni: I couldn't test any of the closed-source models. These are all open-source ones. Essentially, Hugging Face, the company I work at, is a platform for sharing AI models, and a lot of big tech companies share models there, like Llama. But if you're using Google, for example, and they're doing some generative AI summary, that model won't be on Hugging Face, because that's a proprietary model.

Same thing for ChatGPT. There are some older versions of the GPT models on the Hugging Face hub, but the current ones, like GPT-4o or whichever one is being used now, are not available. So I can't test them. And it's really frustrating, because I got a lot of pushback, like, well, how about actual, real, live products? Real, live tools? What you studied is not what people use in real life. And you're like, yeah, but I can't access those models. I can't test those models.

Angwin: The companies actually could tell you without even running a test, they could just tell you exactly how much electricity is being used. But they do not.

Luccioni: Right. And also there's currently this weird dynamic because the companies that provide a lot of the AI compute and the solutions and stuff like that are actually the ones making the products. I mean, there's a bit of a conflict of interest, let's say. For example, you could, you know, use Microsoft Copilot, or you could also host your own AI model on one of their Azure compute instances. It's the same company. So I feel that like, they just really don't have any incentive to give you this information because —

Angwin:— because they control all the parts of the supply chain basically. It's Microsoft all the way down.

Luccioni: Exactly. And also they have this incentive to sell you more products. They put so much money into this, they build out these huge data centers that use all this energy. And now they need people to use these data centers.

Angwin: But it is kind of crazy, because I think we do have some inkling that this is causing energy use at these companies to skyrocket, right? Because we've seen the latest sustainability reports the companies publish. They don't necessarily say that it's because of AI, but it does look like their energy use is increasing exponentially.

Luccioni: Yeah. Microsoft and Google recently published their reports. And for Microsoft, I think the number was around 30% — their energy use went up 30% within the last year. They say it's partially due to AI, but it's not 100% due to AI. But what's interesting is that they're essentially saying: our climate targets, our net-zero ambitions, were so ambitious that we didn't expect AI to have such an impact on them. Essentially, they're surprised themselves, because most of these targets were set before generative AI was a thing.

So you have that on one hand, like, the numbers are there, but on the other hand, a lot of people from these companies keep saying that the problem is just going to solve itself. Like we're seeing the spike now, but either AI is going to help solve climate change or the efficiency is going to go up so much that it's not an issue anymore. Or we're going to have some magical energy solution that's going to magically be invented in the next year or so.

On the one hand, you do have the numbers. On the other hand, you have this like corporate messaging, which is just like, nothing to see here, why are you even talking about this?

Angwin: I find it so interesting because it appears that there's no evidence for it, right? It's magical thinking. They're like, oh, AI's going to solve climate change. And I'm like, well, just show me one little shred of anything that would lead me to believe that that's true. I mean, am I missing something? I don't understand those claims.

Luccioni: So, for me, they're two separate conversations. Climate change is not a single problem, and that's kind of the issue, right? There's a lot of different things to be solved. And it's true that AI can be a part of the puzzle in a lot of those things. For example, optimizing energy grids: you can use AI to dispatch power or to connect grids in ways that are more optimal, and that can actually save energy. AI can also be used to detect leaks in energy grids or do predictive maintenance — there's all these kinds of energy-based applications. There's also improved climate modeling. And there's all sorts of heating and cooling systems in buildings, for example, that are really inefficient. You can save a lot of money and energy by using AI and doing smart adaptive stuff.

So there are small specific questions that can be solved to some extent, or could be helped by AI, but it's not a panacea. It's not going to solve everything. And presenting it as, we need to be using this much energy because AI is going to solve climate change for me is just like — there's no logical link between these two things.

Angwin:  So let's talk about what we know and don't know based on your work. So, I think you put the first contours around the actual usage of these models. But there still remain a lot of unknowns. So tell me what we still don't know and what we need to know.

Luccioni: The unknowns are about proprietary models in general: each ChatGPT query, each Google search. I think we really need to know those numbers. And there's also the question of how many copies of these models are running, because ChatGPT is not going to be one model. It's probably thousands of them, or tens of thousands of them. Given the 10 million users they have every day, you need to have some number of copies of the model in order to reply. So we need the atomic, per-query number, but also the cumulative number.
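The "atomic versus cumulative" point is easy to see with invented numbers. In this sketch, every figure except the 10 million users mentioned above is a made-up assumption, not anything the companies have disclosed:

```python
# Hypothetical replica math: illustrative assumptions only.
import math

users_per_day = 10e6           # the figure mentioned above
queries_per_user = 5           # assumption
seconds_per_query = 2.0        # assumed generation latency
concurrent_per_replica = 8     # assumed batched requests per model copy
peak_to_average = 3            # assume peak traffic is ~3x the daily average

peak_qps = users_per_day * queries_per_user / 86_400 * peak_to_average
replicas = math.ceil(peak_qps * seconds_per_query / concurrent_per_replica)
print(f"~{replicas} model copies needed at peak")
# The per-query ("atomic") energy then multiplies across every copy, plus
# the idle cost of keeping each one loaded in memory.
```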

In the BigScience project, when we were doing the carbon footprint estimation, what I realized is that we were always looking at just the energy usage. But there's also the life cycle. What we proposed in that paper is that maybe we should look at the whole AI life cycle, because, for example, if you buy a tote bag or a pair of jeans, you can look it up and there's a life-cycle analysis of the water that was used, the cotton, the transportation, the dye. And I think that we should have similar things for AI, because on the one hand, of course, we have the energy, but we also have the water, we also have the metals that were used for making the hardware, we also have the transportation. We have all these different pieces of the puzzle that we essentially don't see if we're only looking at energy.

Angwin: Aren't you trying to put together some sort of rating for AI?

Luccioni: Yeah. It's still about energy use, but I've also been trying to get numbers about hardware. Most AI models are trained on what's called a GPU – a graphics processing unit, a type of computer chip that was initially invented for computer games. GPUs can do a lot of parallel processing at once, whereas traditional CPUs were very sequential: you have one thing running at a time. AI needs a lot of parallel compute, so people started using these GPUs.

Before, you would use one GPU, or eight GPUs. Now people are using something like 10,000 GPUs, and the amount of compute needed to train an AI model is now measured in millions of hours of GPU compute. And we don't have any numbers about where these chips are coming from and how many natural resources were used to create them.
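Rough arithmetic connects those two numbers. The cluster size, training duration, and per-GPU power draw below are illustrative assumptions, not figures from any specific training run:

```python
# Linking "10,000 GPUs" to "millions of GPU-hours" and to energy.
num_gpus = 10_000
training_days = 90
watts_per_gpu = 700   # ballpark draw for a modern datacenter accelerator

gpu_hours = num_gpus * training_days * 24     # = 21.6 million GPU-hours
energy_mwh = gpu_hours * watts_per_gpu / 1e6  # watt-hours -> MWh
print(f"{gpu_hours / 1e6:.1f}M GPU-hours, ~{energy_mwh:,.0f} MWh of energy")
```

That is on the order of 15,000 MWh before cooling and other data center overhead, and before counting the resources used to manufacture the chips themselves.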

People are like, oh, I'm going to build myself this massive cluster for my company or for my business or whatever. And then they buy, like, you know, 10,000 Nvidia GPUs and then like, there's no assessment of, like, the impact that has on the planet or how much energy was used to create these GPUs or the rare earth metals mined in terrible conditions and countries that are far away that people don't think about.

Angwin: Right. All of these aspects really blow my mind.

Luccioni: So with regards to the project that I'm working on, it's starting with energy, because that's where we have the most information. The idea is to develop energy star ratings for AI models, for people who are either training a model or picking a model to use, and for policymakers when they get around to legislating these models. The point is to give them an idea of how a given model measures up compared to other models in the same category.

For example, for question answering: How does the model that you're looking at compare to other models that can do question answering? The same thing for image generation, for voice, for speech-to-text, and all these different tasks. The goal is to establish ranges of minimum and maximum energy efficiency and then essentially show how models measure up.
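One toy version of such a rating, with entirely made-up model names and energy numbers, maps each model's measured energy per 1,000 queries onto a star scale within its task category:

```python
# Toy "energy star" rating: invented names and numbers, just to show the
# idea of rating models within one task category's observed range.
def star_rating(wh: float, best: float, worst: float, stars: int = 5) -> int:
    """Lower energy earns more stars, scaled to the category's range."""
    frac = (wh - best) / (worst - best)  # 0.0 = most efficient model seen
    return max(1, stars - round(frac * (stars - 1)))

category = {  # Wh per 1,000 question-answering queries (made-up data)
    "qa-model-small": 40.0,
    "qa-model-medium": 220.0,
    "qa-model-huge": 1200.0,
}
best, worst = min(category.values()), max(category.values())
for name, wh in category.items():
    print(f"{name:16} {'*' * star_rating(wh, best, worst)}")
```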

Angwin: That's so great. I would love to know that because, I mean, I feel like this sort of speaks to the fundamental question I have about generative AI, which is: Is it worth it? 

Is it worth all the electricity, the water usage, the metal usage, and so on? Because right now, I'm not getting a huge value out of it. I'm a writer, so I don't really want it to write for me, and even when I have it make suggestions, it's not great. Now, my daughter likes it for programming and finds it helpful. So I do feel like there are some uses, but overall I don't know that I could make an informed decision about it, because I don't understand what the cost is to the world.

Luccioni:  Yeah, and I honestly think there's no single answer to that question. I think that it's important to kind of disentangle the different usages. 

AI is such a broad term that sometimes people mean ChatGPT and sometimes they mean robots or whatever. I think it's important to have all these different categories and to say, well, for this category, here's the range. So actually, maybe in the grand scheme of things it's a lot of energy, but given the category it's relatively efficient.

Angwin: Well, I appreciate your work on quantifying and giving us at least some sense of how much energy is being used when we use these models. And I'm super excited to see your energy star ratings. I feel like it would be really nice if the companies were competing on transparency, each trying to have the model that was the most transparent.

Luccioni: Before, people would train models and then write research papers where they gave access to the code and all the details, sometimes even the amount of compute that they used and what kind of compute it was. So if I read the paper, I could estimate the energy usage based on those elements.

And then, since ChatGPT came out, essentially since November 2022, everyone has clamped down. Now, when a new large language model comes out, the amount of information provided is usually not enough to know the energy usage or where the data is coming from. Sometimes even the size of the model isn't divulged. So it's become a pretty secretive domain.

Angwin:  I hope you get them to release this information, because I think it's really important. We're all trying to make the best decisions we can about our own behavior and how it impacts the climate. And so if we could have information to make better decisions about the use of AI, that would be awesome.

Luccioni: Exactly. When the EPA developed the original Energy Star ratings in the early '90s, the way they presented it was not only about efficiency for climate change's sake but about really saving consumers money. So that's also an indirect part of the equation.

Angwin: Thank you for your work making AI more transparent for all.

Luccioni: Thank you. 
