In his final days in office, President Biden made a surprising move. He issued an executive order directing the Departments of Defense, Energy and the Interior to lease federal lands to companies seeking to build AI data centers.
“We will not let America be out-built when it comes to the technology that will define the future,” Biden said in a statement. And in his first days in office, President Trump indicated that he endorsed Biden’s move, saying, “I’d like to see federal lands opened up for data centers. I think they’re going to be very important.”
But, as we reported last month in our three-part series on the climate costs of AI, data centers come with environmental costs. And while the Biden administration's order requires developers to build clean energy sources to power the data centers, it says nothing about mitigating their massive water usage.
Vast amounts of water are needed to cool the computers that do the work of AI, particularly because data centers are often located in hot, dry climates to maximize opportunities to generate solar energy.
The leading scholar on AI and water usage is Shaolei Ren, an associate professor of electrical and computer engineering at UC Riverside. In our Proof Ingredients video series, I interviewed Ren about AI's extraordinary water usage.
He said that AI's water consumption already exceeds that of the biggest beverage companies in the world. And he suggested that we need climate labeling and transparency on AI so that users can make informed decisions about their AI usage.
A transcript of our conversation, edited for brevity and clarity, is below:
-----
Angwin: You may have heard that when you ask AI a question, it's like pouring a bottle of water down the drain. That's a popular comparison that's going around to explain how much water is used by AI. And that number comes from a study by researchers at UC Riverside and it's the starting point for a recent Proof investigation into AI and water usage.
I'm Julia Angwin, the founder of Proof News. And this is Proof Ingredients, a series where we talk with journalists, content creators and experts about what went into a great investigation they did. In this episode, we speak with Shaolei Ren, one of the authors of the AI water study. He's an associate professor of electrical and computer engineering at UC Riverside, and his work focuses on responsible AI for a sustainable and equitable future.
Welcome, Professor Ren.
Ren: Thank you. Thank you for having me.
Angwin: So we're here to talk about AI water usage. Specifically, you did a very important study that came out last year called 'Making AI Less Thirsty: Uncovering and Addressing the Secret Water Footprint of AI Models.' So can you tell us just generally what you were looking at in that study and what led you to want to look into the water impacts of AI?
Ren: We basically looked at how much water AI consumes. By the way, "water consumption" is a technical term: it means water withdrawal minus water discharge, essentially the portion of water that evaporates into the atmosphere.
We wanted to study how much water was consumed in training and running inference for some popular AI models like GPT-3. Our study found that if you have a conversation of 10 to 50 queries with ChatGPT, you're going to consume roughly 500 ml of water. And that consumption is evaporated water.
Why I started to work on this problem: I've always been interested in water and how we can save it because, when I was a little boy, I spent a couple of years of my childhood in a small town where water was rationed and we only had access to water for half an hour each day. So we had to use water wisely and think about every possible way to save it. Of course, the situation has changed and there is no longer any water rationing, but I've always been interested in how to save water.
When these large language models came out in recent years, people were amazed by how powerful, how capable they were. And some people started to look at the environmental impact and in particular, look at the carbon emissions. So I asked myself: how much water does AI consume? So that somehow motivated our study. And we just did some further research. And I was really shocked by the numbers.
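The 500 ml figure is per conversation, not per query. As a back-of-the-envelope sketch, assuming the water is spread evenly across the queries in a conversation (our simplification, not the study's method), the per-query range follows from a single division. The helper name `ml_per_query` is hypothetical.

```python
# Back-of-the-envelope: convert "500 ml per 10 to 50 queries" into ml per query.
# Assumes the water is spread evenly across queries (a simplification).
BOTTLE_ML = 500.0  # the study's headline figure for one conversation


def ml_per_query(queries_per_bottle: int, bottle_ml: float = BOTTLE_ML) -> float:
    """Average water per query if bottle_ml is shared by queries_per_bottle queries."""
    if queries_per_bottle <= 0:
        raise ValueError("queries_per_bottle must be positive")
    return bottle_ml / queries_per_bottle


# More queries per bottle means less water per query, and vice versa.
low = ml_per_query(50)
high = ml_per_query(10)
print(f"about {low:.0f} to {high:.0f} ml per query")  # about 10 to 50 ml per query
```

On this reading, a single query corresponds to roughly 2 to 10 percent of a 500 ml bottle, which is why "a bottle per query" overstates the estimate.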
Angwin: Yeah, I think everyone was shocked by the numbers you found. So let's walk through it slowly. I don't think it's obvious to most people why AI would use water. So let's just start at the very beginning. Essentially what you're talking about, as I understand it, is data centers where all the computers live that do the work of artificial intelligence. So explain what you're looking at when you're talking about AI and water use.
Ren: AI uses water throughout its whole lifecycle, through both direct and indirect water consumption. The direct water consumption we're talking about here is for cooling down the data center facilities. When you run a server for training or inference, there's a huge amount of energy consumption, and that energy is eventually converted into heat. So we have to move the heat from the servers to the outside environment.
And there are two steps. The first step is so-called server-level cooling, where you move the heat from the servers to an outdoor heat exchanger or to the data center facility. The second step is so-called facility-level cooling, where we move the heat from the data center facility to the outside atmosphere. Usually water evaporation is the most efficient way to move the heat from the facility to the outside environment, and in fact many of the large tech companies use water evaporation.
So, this is where direct water consumption occurs. So basically we're just moving the heat from the facility to the outside environment. Some people will always say, oh, we're sort of using a closed loop. It's circular. Well, that's actually within the data center. So inside of the data center you could use liquid or you could use air, and there should be no water consumption at all unless there's a leak. So the direct water consumption is really about moving the heat from the data center facility to the outside environment.
For indirect water consumption, there are also two pathways. The first is generating electricity, because when you use coal, nuclear, natural gas or even hydropower, there is a huge amount of water consumption. The second is the supply chain for making the AI chips: we need ultrapure water to clean and rinse the wafers to prevent contamination. That's the so-called Scope 3, or supply chain, water.
But in our study, we didn't look at the supply chain water. We're only looking at the direct water consumption and also the water consumption for generating electricity.
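Ren's point that evaporation is the most efficient way to move heat out of a facility can be made concrete with a rough physical estimate. This is our own illustration, not part of the study: assuming all server heat leaves the facility as latent heat of evaporation, and using an approximate latent heat of about 2.45 MJ per kilogram of water near room temperature, one kilowatt-hour of heat would evaporate roughly 1.5 liters of water. The function name `evaporated_liters` is hypothetical.

```python
# Rough physics sketch (not from the study): if a cooling tower rejected heat
# purely by evaporating water, how much water would a given amount of heat need?
KWH_TO_MJ = 3.6               # 1 kWh = 3.6 MJ (exact conversion)
LATENT_HEAT_MJ_PER_KG = 2.45  # approx. latent heat of vaporization of water near 20 C
KG_PER_LITER = 1.0            # approx. density of water


def evaporated_liters(heat_kwh: float) -> float:
    """Liters evaporated to reject heat_kwh of heat, assuming all of it leaves as latent heat."""
    return heat_kwh * KWH_TO_MJ / LATENT_HEAT_MJ_PER_KG / KG_PER_LITER


print(f"{evaporated_liters(1.0):.2f} L per kWh of heat")  # 1.47 L per kWh of heat
```

Real cooling towers also shed some heat without evaporating water, and they withdraw extra water that is later discharged, so treat this only as an order-of-magnitude figure for the evaporated (consumed) portion.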
Angwin: So what you're saying is that it's actually worse than what your study found?
Ren: Yeah, exactly.
Angwin: Because there's water usage even further along the cycle. But let's focus just on training and using AI. You make a distinction between water withdrawal and water consumption. Can you spell that out and explain why that matters?
Ren: Water withdrawal is simply taking water from a source. For example, when we take a shower, we withdraw a lot of water, probably a few tens of liters, or even more than 100 liters. But we're only withdrawing that water. We're not consuming much of it, because the water we take from the utility goes back into the sewer immediately, and that water can be reused after treatment in the short term.
Water consumption is the evaporated portion of water. That water goes somewhere else and cannot be reused in the short term. Of course, the water still stays on our planet, but it depends on where you are. Say you're in California, where the rainy season lasts just a few weeks a year. The evaporated water will come back, but there's a lot of uncertainty.
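Ren's definition, consumption equals withdrawal minus discharge, is a single subtraction. The sketch below applies it to his shower example and to an evaporative cooling tower. The liter values, including the assumption that about 80 percent of a tower's withdrawal evaporates, are made up for illustration, and `water_consumption` is a hypothetical helper, not anything from the study.

```python
def water_consumption(withdrawal_l: float, discharge_l: float) -> float:
    """Water consumption = withdrawal minus discharge (the portion not returned, e.g. evaporated)."""
    if discharge_l > withdrawal_l:
        raise ValueError("discharge cannot exceed withdrawal")
    return withdrawal_l - discharge_l


# Illustrative numbers only.
# A shower: lots withdrawn, nearly all of it goes down the drain to the sewer.
shower = water_consumption(withdrawal_l=100.0, discharge_l=98.0)
# An evaporative cooling tower: assume 80% of the withdrawal evaporates, the rest is discharged.
tower = water_consumption(withdrawal_l=1000.0, discharge_l=200.0)
print(f"shower consumes {shower} L, cooling tower consumes {tower} L")
# shower consumes 2.0 L, cooling tower consumes 800.0 L
```

Both cases can involve large withdrawals; only the evaporated share counts as consumption.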
Angwin: So basically what you're saying is that when you take this water and pour it out, some of it goes back into the ground and is reabsorbed, so that's not consumption. But the part that evaporates into the air is the part that's lost.
Ren: Yeah, and actually some companies even say this water consumption is considered permanently lost, although it's not really permanent.
Angwin: When you decided you wanted to study this, though, I'm assuming the companies weren't letting you into their data centers to measure the water usage. So can you talk about how you collected the data and how hard it was, because, as you discuss in the report, it was sort of a secret number?
Ren: Actually, getting the numbers was really challenging. That was probably the most challenging part, because the methodology is pretty straightforward. We've used it in many prior studies.
To get the numbers, we first looked at the energy consumption of GPT-3. Model developers usually do not disclose how much energy they use for inference, but GPT-3 was one of the few models whose developers did say something about its energy consumption for inference.
So we got the number from their own papers, and we also verified it against some third-party studies of other models of similar size. So I think the number is a fairly accurate reflection of the actual energy consumption in the data center.
Once we had the energy consumption number, we also needed the water efficiency number. Again, most of the large tech companies either do not talk about their water efficiency at all, or they only report their global, annualized water efficiency, which is not very useful for our estimate, because we really want to know the regional water consumption for each of the models.
But Microsoft was more transparent than some other companies. Microsoft discloses its regional water usage effectiveness (WUE), and even the figure for each data center location. So we had these two pieces of information, and we also looked at their power usage effectiveness (PUE). That's the energy overhead for cooling down the data centers.
We also checked the regional water intensity of electricity generation. Then we combined all this information in our formula to come up with the estimates.
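The inputs Ren lists (server energy, on-site water usage effectiveness, PUE and the regional water intensity of electricity) can be combined in a simple formula. The sketch below assumes one common form: direct water = E × WUE and electricity water = E × PUE × EWIF, where E is server energy and EWIF is our shorthand for the regional water intensity of electricity. The `Site` class, the function name and every numeric value are hypothetical placeholders, not the study's inputs or Microsoft's disclosed figures.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-location efficiency figures (illustrative, not disclosed data)."""
    name: str
    wue_l_per_kwh: float   # on-site water usage effectiveness, liters per kWh of server energy
    pue: float             # power usage effectiveness: total facility energy / server energy
    ewif_l_per_kwh: float  # water consumed per kWh of electricity generated in the region


def water_footprint_liters(server_energy_kwh: float, site: Site) -> tuple[float, float]:
    """Return (direct, indirect) water consumption in liters.

    Assumed model: direct = E * WUE; indirect = E * PUE * EWIF, because the grid
    must supply the facility's total energy (server energy times PUE).
    """
    direct = server_energy_kwh * site.wue_l_per_kwh
    indirect = server_energy_kwh * site.pue * site.ewif_l_per_kwh
    return direct, indirect


# Hypothetical site and a hypothetical 0.004 kWh of server energy for one response.
site = Site("example-region", wue_l_per_kwh=0.5, pue=1.2, ewif_l_per_kwh=3.0)
direct, indirect = water_footprint_liters(server_energy_kwh=0.004, site=site)
total_ml = (direct + indirect) * 1000
print(f"direct {direct * 1000:.1f} ml, indirect {indirect * 1000:.1f} ml, total {total_ml:.1f} ml")
# direct 2.0 ml, indirect 14.4 ml, total 16.4 ml
```

Keeping the direct and indirect parts separate makes it easy to see which one dominates at a given location, since both WUE and the electricity mix vary by region.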
Angwin: Basically, this whole study was about GPT-3 and comparable models. And so just so our audience understands, we're now on GPT-4.
Ren: Yeah, we're on GPT-4. It is a completely closed system. Nobody knows the exact numbers for its energy or water consumption. But some reports say GPT-4 is a lot bigger than GPT-3. It probably uses the so-called mixture-of-experts architecture, which means only a subset of the model is activated at any one time. That adds more uncertainty to any estimate for GPT-4. But I would personally expect the water and energy consumption of GPT-4 to be substantially higher than GPT-3's.
Angwin: So basically what you're describing is that you did a study of the early models and it's only going to be more consumption since then.
Using the information that you got, explain to me what your findings were. There's been this whole thing about the water bottle and how it's like one water bottle per query, but I think that's not quite right. Can you explain what I need to understand about my own AI usage?
Ren: Our estimate was based on the annualized water efficiency number for each location. It's not a global figure; it's for different locations. That's why you see a range of 10 to 50 queries per 500 ml: it depends on where the models are hosted. If they're hosted in Arizona, the number of queries is probably much lower. But if they're in Ireland, the number could go up. It really varies a lot.
Also, the number that we reported includes both the water consumed directly for cooling the data center facilities and the water for generating electricity. Depending on where you are, the water for generating electricity is usually a little higher than the direct water consumption, but the two can be comparable in some regions.
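The location effect Ren describes can be illustrated by turning per-query water into queries per 500 ml bottle. Everything here is hypothetical: the site labels and per-query figures are invented so that the two results land inside the study's 10 to 50 range, and they are not the study's per-location values. The helper `queries_per_bottle` is likewise hypothetical.

```python
BOTTLE_ML = 500.0

# Hypothetical per-query water (ml), split into direct cooling and electricity generation.
# Values are illustrative only; the hot, dry site is assumed to need more cooling water.
SITES = {
    "hot, dry site": {"direct": 20.0, "electricity": 25.0},
    "cool site": {"direct": 2.0, "electricity": 10.0},
}


def queries_per_bottle(parts: dict[str, float], bottle_ml: float = BOTTLE_ML) -> float:
    """How many queries one bottle covers, given per-query water broken down by component."""
    return bottle_ml / sum(parts.values())


for name, parts in SITES.items():
    total = sum(parts.values())
    print(f"{name}: {total:.0f} ml per query, about {queries_per_bottle(parts):.0f} queries per bottle")
# hot, dry site: 45 ml per query, about 11 queries per bottle
# cool site: 12 ml per query, about 42 queries per bottle
```

More water per query at a site means fewer queries per bottle, which is why the reported figure is a range rather than a single number.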
Angwin: But the problem is you don't know when you're running a query, which data center it's being sent to.
Ren: Actually, in a lot of cases, inference is hosted in data centers operated by third parties, not in data centers owned directly by Microsoft. It could be hosted in, say, a colocation data center. We based our study on Microsoft's own locations.
Angwin: In the paper it says Google, Microsoft and Meta's water usage for AI annually would be twice Denmark's annual water usage, correct?
Ren: That's actually for the global AI demand based on some projections of the energy usage in 2027. And also that refers to water withdrawal, not water consumption. Consumption is much less than water withdrawal.
The reason we only show water withdrawal is that there's no country-level water consumption data. From the CIA website, we could only get water withdrawal for each country. That's why we compared water withdrawal.
But if you look at water consumption, it's still a lot. It's probably comparable to some beverage companies. If you view global AI as a single company, its direct water consumption would easily exceed that of the biggest beverage company in the world.
Angwin: Which is Pepsi or Coca-Cola?
Ren: Yeah, it’s probably even bigger than those companies combined.
Angwin: They are selling pallets and pallets of this [holds up water bottle] and that is actually less than what AI is using. That's actually kind of terrifying.
Ren: I mean we’re looking at the global scale. It’s not for individual companies. But if you do look at the individual companies, some of them are already as big as a beverage company.
Angwin: I guess I really had not thought of AI as being in the beverage business. Your findings are really shocking, and your study is the only one I've seen on this topic. Is there a reason there hasn't been more research about AI and water use?
Ren: Generally, water research is about ten years behind carbon research. Around 2010, the tech companies started to work on carbon neutrality, but only in 2020 did they start to look at water. And most of the companies only look at the Scope 1 water, which is direct water consumption.
Data centers actually have very little direct carbon emission, almost zero, apart from a couple of hours of diesel generator testing each year. So nearly all the carbon for data centers comes from generating electricity, making the servers and constructing the data centers.
But when it comes to water, tech companies mostly focus on direct water consumption. A few companies do look at the indirect water consumption from their electricity generation, but very few.
Angwin: Yeah, it seems like there's a lot of variety in the reporting and level of reporting that they offer about this. Did you hear any feedback from the companies about whether your numbers were right or wrong or whether they disputed any of this?
Ren: No. I'm confident in the methodology, and I'm transparent about all the numbers we use. I trust my numbers. The study has been peer reviewed, and it will be published in Communications of the ACM in a few months.
Angwin: Every study has limitations. What are the limitations of your findings that we should keep in mind?
Ren: We only looked at annualized average values. In the summer, water consumption could be a lot higher than the number we present, and in the winter it could be lower.
Also, we assume each conversation is about 200 to 300 words or less, but in reality it could be shorter or longer. So there is a lot of variation that we're not taking into account.
So basically we're just averaging out those factors. If we could get more fine-grained data, we could refine our estimates and make them more accurate.
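To see why an annualized average can understate summer consumption, here is a sketch with invented monthly on-site WUE values for a single hypothetical site. The numbers are ours, chosen only so that summer months run higher than winter ones; they are not from the study or any company disclosure.

```python
from statistics import mean

# Hypothetical monthly on-site WUE values (liters per kWh) for one site, January to December.
# Illustrative only: summer months are set higher than winter months.
monthly_wue = [0.2, 0.2, 0.3, 0.4, 0.6, 0.9, 1.2, 1.2, 0.8, 0.5, 0.3, 0.2]

annual_avg = mean(monthly_wue)  # the kind of single annualized figure a study might use
peak = max(monthly_wue)         # hottest months
low = min(monthly_wue)          # coolest months

print(f"annual average {annual_avg:.2f} L/kWh, summer peak {peak} L/kWh, winter low {low} L/kWh")
print(f"peak month is about {peak / annual_avg:.1f}x the annual average")
# annual average 0.57 L/kWh, summer peak 1.2 L/kWh, winter low 0.2 L/kWh
# peak month is about 2.1x the annual average
```

With these made-up values, a query served in July would use roughly twice the water implied by the annual figure, and a query in January about a third of it.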
Angwin: One thing worth noting is that the companies provide no information about what types of queries they receive or how many. So there's really no information about how AI is being used, which makes these types of studies difficult.
Ren: Yeah. When you search for flights or get directions from point A to point B, some companies show you the carbon emissions, or the carbon emissions you saved. But when you look at their AI products or services, there's no such information: not for water, not for carbon, not for energy.
Angwin: What would you want more of from the companies to do better analysis?
Ren: We need more data from the companies so that we can help them improve their water efficiency. Universities have a large talent pool, and we are happy to help the technology industry improve its water efficiency and energy efficiency. If they give us more data, we can really test our ideas instead of just running simulations in a playground.
Also, I think they could keep users informed, because ultimately it's the users who are using the services. But companies want users to use more of their services, so letting users know the environmental impact of using the models could conflict with their business growth. That might be a concern.
But I truly believe that if you have that transparency, users can make more informed decisions, and this can help drive efficiency. About 15 years ago, no company disclosed its PUE, or power usage effectiveness, which is the energy efficiency measure for data center facilities. But now every company discloses it. This competition has turned out to be good, because it motivates those companies to become more efficient.
Angwin: So what you're saying is that if they had to show, for each query, how much water the response used, they would want to compete to make that number better.
Ren: Yeah.
Angwin: Has this changed your use of AI? Do you feel like it's the same as, like, letting the water run?
Ren: Of course, water is one factor. But besides water, I try to use AI models as little as I can. For example, if I'm trying to find some information, I usually use a search engine, because that uses a lot less energy than AI models.
Angwin: Thank you very much. Is there anything you would suggest to people who use AI about what they should think about? Should they take this water usage into consideration, or is it something we should leave to the companies to solve?
Ren: As I said, users do have a role. But we shouldn't be scared. We just need to be mindful about our usage. Let's not waste it.
Angwin: Thank you very much. This is really helpful and I really appreciate your contribution, because without your work, we wouldn't have known any of this.
Ren: Thank you. My pleasure.