Evaluating the Costs of Generative AI (2024)


Introduction

Generative AI is driving a transformative era, promising to revolutionize everything from creative tasks such as new product development, to customer engagement and operational efficiency (Eloundou et al. 2023). It can synthesize vast datasets to provide actionable insights, automate routine tasks, and support decision-making with unprecedented accuracy. For instance, one study of GitHub Copilot, a generative AI-powered coding assistant, found that developers using the tool completed tasks 55.8% faster than those who did not (Dohmke et al. 2023). Another study found that generative AI significantly boosts the performance of lower-performing individuals, reducing performance disparities and democratizing access to high-quality outputs (Dell'Acqua et al. 2023).

There are clear benefits to building a proprietary generative AI model from scratch. These include improved protection of sensitive data, since the model is trained only on trusted and verified sources; higher performance, because proprietary data can be used to fine-tune the model to specific tasks; and better control over infrastructure decisions such as response times, processing and scaling. Finally, there are strategic benefits: ownership of proprietary data and the development of stronger data analytics capabilities enable organizations to anticipate market trends and personalize the customer experience through renewed business models (IoD and LBS 2024).

However, building a generative AI model comes with significant costs. Training general-purpose generative AI models such as OpenAI’s GPT-3 requires substantial computational power and access to vast amounts of data, which can be prohibitively expensive for many organizations. For instance, GPT-3 has 175 billion parameters and cost OpenAI tens of millions of dollars in compute alone to train. Its training data amounted to “45TB of compressed plaintext before filtering and 570GB after filtering, roughly equivalent to 400 billion byte-pair-encoded tokens” (Brown et al., 2020).

Organizations seeking to build a proprietary generative AI model from scratch would likewise have to make significant investments. For example, Bloomberg used a roughly 700-billion-token corpus (a 363-billion-token dataset of English financial documents plus a 345-billion-token public dataset) to train a 50-billion-parameter model (Bloomberg Professional Services 2023). To put this into perspective, GPT-4o is currently priced at $5 per million input tokens and $15 per million output tokens (https://openai.com/pricing). At those prices, a corpus of 700 billion tokens would cost approximately $14 million if that volume of tokens were both submitted as input and generated as output.
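The arithmetic behind that estimate can be sketched as follows. This is a back-of-the-envelope calculation only, using the list prices above and assuming, purely for illustration, that the full 700 billion tokens flow through the API as both input and output:

```python
# Back-of-the-envelope estimate of API costs at GPT-4o list prices,
# assuming (for illustration only) 700 billion tokens of input and of output.
input_price_per_million = 5.0    # USD per 1M input tokens
output_price_per_million = 15.0  # USD per 1M output tokens
tokens = 700e9                   # 700 billion tokens

input_cost = tokens / 1e6 * input_price_per_million    # $3.5M
output_cost = tokens / 1e6 * output_price_per_million  # $10.5M
print(f"Total: ${input_cost + output_cost:,.0f}")      # ~$14,000,000
```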

While the Bloomberg model was found to “outperform existing open models of a similar size on financial tasks by large margins, while still performing on par or better on general NLP benchmarks” (Bloomberg Professional Services 2023), not all organizations will have the resources (including infrastructure, data, and human expertise and capabilities) to build a proprietary generative AI model from scratch. Most organizations will depend on third-party solutions such as those offered by OpenAI, Microsoft and Google. In fact, a recent global survey of executives found that 78% of organizations rely on third-party AI systems, most notably those provided by big tech companies (Renieris et al., 2023). This creates uneven competitive dynamics (IoD and LBS 2024), as better-resourced companies – those with superior access to compute power, proprietary data resources and robust, resilient infrastructures – will gain increasingly more market power over their competitors.

Indeed, companies like Microsoft are increasingly integrating hardware (Neural Processing Units, or NPUs) with software (Windows plus Copilot), delivering a step change in on-device processing power (40 trillion operations per second) – enough for a personal device to run multiple AI models and apps concurrently. Microsoft boasts that “90% of minutes spent on apps will be native”. Generative AI is no longer a cloud-only option; with this move it becomes an extension of edge computing and personal devices. For Microsoft users (especially enterprise ones) this becomes the default option, reinforced by high switching costs and lock-in effects.

So how should organizations ride the wave of digital transformation with generative AI? This article offers a detailed review of the costs involved in building or buying generative AI models within organizations and considers the strategic choices leaders should take to navigate this complex landscape.

Building Generative AI Models from Scratch:
Infrastructure, Data and Human Resource Cost Considerations

Infrastructure Costs

A generative AI model typically requires GPU memory of roughly twice the number of parameters in bytes, that is, about two bytes per parameter at 16-bit precision. For instance, the Bloomberg model, with 50 billion parameters, would need approximately 100GB of GPU memory just to hold the model. However, when prioritizing high availability and efficient parallelization, relying on a single GPU is not optimal. In such cases, a configuration comprising multiple GPUs is recommended to fully leverage parallel processing capabilities. In fact, the Bloomberg model was trained on 64 GPU clusters, each with eight Nvidia A100 GPUs (40GB variants), all running on Amazon’s AWS cloud. Each cluster cost approximately $10.32 per hour of use, for a total of roughly 1.3 million GPU hours (spread across the 512 GPUs, i.e., 64 clusters of eight GPUs each). By comparison, OpenAI reportedly used 25,000 Nvidia A100 GPUs. Servers with these GPUs draw about 6.5 kW each, resulting in an estimated 50 GWh of energy usage during training, at a cloud cost of approximately $1 per A100 GPU hour (see also Schwartz et al., 2020; Strubell et al., 2019). And this is just the infrastructure cost.
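These figures translate into a simple back-of-the-envelope estimate, sketched below using the numbers cited above. It assumes two bytes per parameter (16-bit precision) for holding the model weights and the approximate per-GPU-hour cloud rate; it ignores storage, networking, experimentation, failed runs, and the fact that training itself needs several times more memory per parameter for optimizer states and activations:

```python
# Rough infrastructure estimates using the BloombergGPT figures cited above.
params = 50e9          # 50 billion parameters
bytes_per_param = 2    # ~2 bytes per parameter at 16-bit precision (weights only)
memory_gb = params * bytes_per_param / 1e9
print(f"GPU memory to hold the model: ~{memory_gb:.0f} GB")  # ~100 GB
# Note: training requires considerably more memory per parameter
# (gradients, optimizer states, activations), hence the multi-GPU clusters.

gpu_hours = 1.3e6      # total A100 GPU hours cited for training
cost_per_gpu_hour = 1.0  # approximate cloud cost per A100 GPU hour
print(f"Compute cost for training: ~${gpu_hours * cost_per_gpu_hour:,.0f}")  # ~$1.3 million
```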

Data Costs

In addition, and more importantly, organizations need to decide on the dataset they will use to train the model, preprocess and augment that data, and then split it into training, validation and testing sets. Data preprocessing is the process of cleaning, transforming, and standardizing the data, such as removing outliers, missing values, or noise, encoding categorical variables, scaling numerical variables, or applying feature selection or extraction. Data augmentation is the process of creating new or synthetic data from existing data, such as flipping, rotating, cropping, or adding noise to images, or generating new sentences from existing text. Data preprocessing and augmentation can help to enhance the performance and robustness of a generative AI model and reduce the risk of overfitting or underfitting. Once these preparatory stages are complete, organizations need to split the data into training, validation and testing sets (a minimal sketch of such a split follows the list below):

  • The training dataset represents the diversity and complexity found in the real world to help the model learn the right patterns while minimizing errors or loss function.
    General purpose models are meant to cover many domains and to perform at a high level across a wide variety of tasks, from writing to software development, to illustration and statistical analysis, among others (Eloundou et al., 2023). As such, they require vast datasets from many different domains, usually found online (e.g., the Colossal Clean Crawled Corpus, the Pile, Wikipedia and Reddit). The challenge of using such vast web datasets is that (a) much of the data will be generic rather than domain or task specific, and (b) it will not be representative of diverse communities, perspectives and beliefs, but will instead be riddled with biases (Bender et al., 2021). Thus, data preprocessing and augmentation require significantly more human effort, time and cost.
    In contrast, smaller and more domain specific models like Bloomberg’s GPT rely on more reliable sources that are curated by domain experts (Wu et al., 2023).

  • The validation & testing dataset measures the accuracy of the model against unseen data. Validation is usually done by comparing the performance of the model against other models or benchmarks.
    For example, for general-purpose models like OpenAI’s GPT-4 a popular benchmark is the AI2 Reasoning Challenge (ARC) that utilizes a collection of 7,787 science exam questions in English multiple-choice format and plays a significant role in assessing the models' reasoning capabilities. Once again, there are problems with such generic benchmarks, one being that they are only measuring reasoning capabilities in English.
    In contrast, the Bloomberg model was validated on domain specific financial benchmarks, as well as a suite of Bloomberg-internal benchmarks that most accurately reflected the intended use cases (Wu et al., 2023).
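As referenced above, the split itself is straightforward once the data has been prepared. The sketch below shows one common way to divide a cleaned corpus into training, validation and test sets; the 80/10/10 proportions and the toy corpus are purely illustrative, and the right proportions depend on the size and diversity of the data:

```python
# A minimal, illustrative 80/10/10 split of a prepared dataset using scikit-learn.
from sklearn.model_selection import train_test_split

# Stand-in for a cleaned, preprocessed corpus of documents.
documents = [f"document {i}" for i in range(1000)]

# First carve off 20% as a holdout, then split that holdout in half.
train_docs, holdout_docs = train_test_split(documents, test_size=0.2, random_state=42)
val_docs, test_docs = train_test_split(holdout_docs, test_size=0.5, random_state=42)

print(len(train_docs), len(val_docs), len(test_docs))  # 800 100 100
```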

Although general purpose generative AI models require far more resources, time and effort to prepare and augment the appropriate datasets, domain specific models must also undergo the same data cleaning process prior to training, validating and testing the model. In domains that are highly regulated and deal with very sensitive datasets, the cost of data preparation, training, validation and testing increases further. In the healthcare sector, the costs are particularly high due to the need for specialized data and compliance with stringent regulations. For instance, developing a generative AI model for diagnostic purposes requires extensive data from medical records, imaging, and other sources, which must be anonymized and processed to ensure patient privacy (Constantinides 2023). The cost of proprietary data acquisition goes up in such highly regulated sectors because of stringent policies on accessing and processing personal data, such as GDPR and HIPAA. Complying with these policies necessitates collaboration across multidisciplinary teams, including data scientists, domain experts, and legal advisors. The costs are not only financial but also involve significant time investments and organizational coordination.

Human Resource Costs

Finally, while generative AI models are few-shot learners, that is, they have the capability to learn contextual nuances from only a few examples, human labelling, fine-tuning and prompting are required to correct errors, omissions and biases (Austin et al., 2021). For example, OpenAI used hundreds of experts from different domains to train GPT-4. The process involved prompting the model with a sample from the training dataset; a human labeller would then demonstrate the desired output, and the model would be fine-tuned on these demonstrations with supervised learning. Rewards were then used to optimize the model further through reinforcement learning with humans in the loop, and the cycle was repeated until the model achieved the expected performance levels. The cost of hiring technical experts (or the equivalent of using existing human resources) to train a generative AI model varies depending on the size of the model, the dataset and the target use cases, but it is a significant dimension of the overall cost nonetheless. Additionally, staff training is necessary to ensure effective use and maintenance of the generative AI system.
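To make the supervised step of this cycle concrete, the sketch below fine-tunes a small open model on a handful of hypothetical human-written demonstrations. This is a simplified illustration only, not OpenAI’s actual pipeline: it omits the reward modelling and reinforcement learning stages, uses GPT-2 purely as a small stand-in model, and the demonstration pairs are invented for the example:

```python
# Minimal sketch of supervised fine-tuning on human-labelled demonstrations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model, purely for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical (prompt, desired output) pairs written by human labellers.
demonstrations = [
    ("Summarise the Q2 earnings call:", "Revenue grew 8% year on year, driven by..."),
    ("Draft a polite payment reminder:", "Dear customer, our records show that..."),
]

model.train()
for prompt, target in demonstrations:
    # The model is trained to reproduce the labeller's demonstration.
    text = prompt + " " + target
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])  # causal LM loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice this loop runs over thousands of expert-written demonstrations, which is precisely where the human resource cost accumulates.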

Buying Open-Source vs Proprietary Generative AI Models

The other option would be to buy (license) a general-purpose generative AI model and then customize it according to organizational needs, enterprise systems and use cases. The buy option includes both open-source and proprietary generative AI models.

Proprietary Generative AI Models

More recently, PWC announced that it will provide OpenAI’s ChatGPT Enterprise to its 75,000 U.S. employees and 26,000 UK employees. Similarly, the Wharton School of the University of Pennsylvania announced that it will offer ChatGPT Enterprise licenses to MBA students as well as its faculty. Individual monthly subscriptions to GPT-4o are priced at $20, with enterprise subscriptions estimated at double that amount, subject to a minimum number of users on a 12-month contract, to account for extra security features, dedicated support from OpenAI and other controls for fine-tuning and tool management[1].

Microsoft has also made several announcements about rolling out Microsoft 365 with Copilot for both enterprises and universities. Licenses are priced at $30 per user per month with a prerequisite Microsoft 365 license, totalling approximately $60 per user per month. The difference between Microsoft 365 Copilot and other proprietary models is that Microsoft makes use of proprietary enterprise data from the Microsoft 365 suite of applications, including Word, Excel, and Teams. Microsoft claims that it maintains strict data privacy standards, ensuring customer data is not utilized for training Copilot’s models.

Proprietary models benefit from the substantial financial and technical resources and the expert support provided by the model creators and providers. These resources feed into extensive research, development, and continuous improvement of the models. As a result, organizations can rely on a robust, well-supported solution for their generative AI projects. So, a price of $40–$60 per user per month appears cheaper than building a generative AI model from scratch and having to worry about mounting infrastructure, data and human resource costs, as discussed earlier.

This will, however, depend on the number of users in an organization. Multinationals with hundreds of thousands of users, like PWC, will quickly see the monthly cost run into millions of dollars and may end up paying more than if they had built their own model, while also potentially becoming locked into proprietary models for years to come. With the source code being proprietary and inaccessible, customization and fine-tuning become challenging. This limitation can hinder the adaptability of the model to specific business needs.
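The scaling effect is easy to see with some simple arithmetic. The per-seat price and headcount below are illustrative figures drawn from the discussion above, not quoted contract terms:

```python
# Illustrative annual licensing cost at enterprise scale.
price_per_user_per_month = 60   # upper end of the $40-$60 range discussed above
users = 100_000                 # e.g., a large multinational deployment

annual_cost = price_per_user_per_month * users * 12
print(f"Annual licensing cost: ${annual_cost:,}")  # $72,000,000
```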

More importantly, not having control over the model challenges any understanding over the model’s inner workings (e.g., weights and parameters). “In many cases, details such as the size, make-up and provenance of the training data, the number of parameters, and the resources required to train the model are also not shared publicly. In essence, then, such models can only be probed in a black-box manner, which may not be sufficient to meet the transparency requirements for stakeholders” (Liao and Vaughan 2023). Enforcing transparency on proprietary model creators and providers will only be possible through regulation such as the EU AI Act.

Open-Source Generative AI Models

An alternative approach would be to license open-source generative AI models. Examples include Meta’s LLaMa 3, a 70-billion-parameter model trained on over 15 trillion tokens of publicly available data spanning various domains, including code, historical knowledge and multiple languages. Another is BLOOM, developed by the BigScience collaborative initiative, with 176 billion parameters and trained on 1.6TB of data spanning 46 natural languages and 13 programming languages. Yet another is Mistral AI, which offers three different open-source models: one with 7 billion parameters, another (a mixture-of-experts model) with 8 x 7 billion, and its most powerful with 8 x 22 billion parameters. Others include Google’s BERT and Falcon, developed by the Technology Innovation Institute of the United Arab Emirates. Companies like Databricks, Hugging Face, Mendix and ServiceNow provide many of these and other open-source models, while also helping organizations fine-tune them for domain-specific use cases.
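For organizations exploring this route, running an open model locally can be as simple as the sketch below. It is a minimal illustration using the Hugging Face transformers library; the checkpoint name is just one example of an open-weight model, and it assumes access to the weights on the Hugging Face Hub and enough GPU memory to host them:

```python
# Minimal sketch: generating text with a locally hosted open-source model.
# The checkpoint name is illustrative; swap in whichever licensed model you deploy.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example open-weight checkpoint
    device_map="auto",                           # place the model on available GPUs
)

prompt = "Summarise the key risks in our quarterly financial report:"
result = generator(prompt, max_new_tokens=150)
print(result[0]["generated_text"])
```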

Open-source generative AI models offer many advantages over the proprietary models discussed above, including greater control and the flexibility to modify and customize the models to specific organizational needs and use cases. The open nature of these models means that their underlying architecture and weights are accessible, enabling organizations to fine-tune them to domain-specific requirements. Open-source generative AI models also benefit from the collective wisdom and expertise of a large and diverse developer community, which can provide valuable insights, enhancements, and updates. This global open-source community is known for driving open innovation, collaboration and rapid adaptation to emerging technologies and trends, enabling organizations to stay ahead of the curve. Further, open-source generative AI models provide more transparency into their inner workings. This allows organizations to build trust with their customers by demonstrating that the AI systems they deploy are accountable and explainable, while ensuring ethical and regulatory compliance.

At the same time, this global open-source developer community may not always have the same level of resources as the large corporations developing proprietary models. There is also less coordination around the business objectives of specific organizations, which can lead to limitations in research and development, as well as in support for applying and operating open-source models. The development and improvement of open-source models rely heavily on community contributions; however, these contributions are not always consistent and reliable, which can be a challenge for organizations with specific deadlines or stringent requirements.

Furthermore, although the cost of fine-tuning an open-source generative AI model may initially look more attractive than doing the same with a proprietary model, it can accumulate and even spiral out of control. There are no license fees for these open-source models; however, organizations still have to bear the infrastructure, data and human resource costs described earlier. Depending on the size of the model, similar GPU and cloud costs apply. For example, an organization partnering with Databricks to deploy LLaMa 3 70B on the AWS cloud would incur slightly higher hourly costs than those incurred by Bloomberg in developing its own proprietary model (the hourly pricing comes to $14.85). Smaller models would cost less but, once again, there are costs related to tokens of input-output data pairs, fine-tuning, integration with existing systems and user training.
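A simple comparison illustrates how such hosting costs relate to per-seat licensing. The hourly rate is the Databricks-on-AWS figure mentioned above and the seat price is the illustrative $60 per user per month discussed earlier; real deployments would scale to multiple instances, vary with utilization and benefit from reserved-capacity discounts:

```python
# Illustrative comparison: one always-on hosted instance vs. per-seat licences.
hourly_rate = 14.85                    # LLaMa 3 70B instance on AWS via Databricks (cited above)
annual_hosting = hourly_rate * 24 * 365
print(f"Annual hosting (one instance): ${annual_hosting:,.0f}")  # ~$130,000

seat_price_per_year = 60 * 12          # illustrative $60/user/month proprietary licence
breakeven_seats = annual_hosting / seat_price_per_year
print(f"Roughly equivalent to {breakeven_seats:.0f} proprietary seats per year")  # ~181 seats
```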

A Hybrid Approach using an API Orchestration & App Development Framework

Recent data from startups to large public enterprises show that most organizations use a combination of open-source and proprietary models, with some being general-purpose generative AI applications like GPT-4o and others being more domain specific. For example, Matt Baker, Senior Vice President of AI Strategy at Dell, says that organizations are beginning to realize that general-purpose proprietary models like GPT-4o have little value for domain-specific applications; however, when combined with open-source models like LLaMa 2 and domain-specific proprietary data, their value tends to increase. Andrew Jardine, an executive at Hugging Face, has also commented that some use cases, such as customer support bots, tend to work well with general-purpose proprietary models like GPT-4o, which perform reasonably well across natural languages, whereas other use cases that involve sensitive customer data benefit more from open-source models like LLaMa 2.

API orchestration and app development frameworks such as LangChain and LlamaIndex are emerging as middleware, allowing organizations to call the best model for each task being performed, whether open-source or proprietary. LlamaIndex helps to combine data indexing and retrieval augmentation, for tasks like document search and content generation, while LangChain helps to build robust, adaptable applications across domains, including text generation, translation, and summarization. Both are modular and scalable, enabling organizations to customize both open-source and proprietary models with their proprietary data. Still, most organizations would not know where to start or how to apply such a hybrid approach.
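The routing idea at the heart of such a hybrid approach can be sketched in a few lines. The example below does not use the LangChain or LlamaIndex APIs themselves; it is a minimal, plain-Python router in which generic requests go to a hosted proprietary model via the OpenAI client, while requests flagged as sensitive stay on a locally hosted open-source model. The model names and the is_sensitive flag are illustrative assumptions:

```python
# Minimal sketch of a hybrid open-source / proprietary model router.
from openai import OpenAI
from transformers import pipeline

openai_client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
local_model = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

def answer(prompt: str, is_sensitive: bool) -> str:
    if is_sensitive:
        # Sensitive data never leaves the organization's own infrastructure.
        return local_model(prompt, max_new_tokens=200)[0]["generated_text"]
    # Generic tasks go to the hosted general-purpose model.
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(answer("Draft a meeting agenda for Monday.", is_sensitive=False))
```

In production, frameworks like LangChain or LlamaIndex would add the retrieval, prompt management and integration layers around this kind of routing logic.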

This is opening new opportunities for consulting companies like Accenture, IBM and McKinsey to act as service and system integrators, helping client organizations navigate this very complex landscape of technological solutions. Even enterprise system providers like SAP and Salesforce are acting as valued partners for organizations who have no knowledge or expertise on how to deploy generative AI applications across their business units. Although this only feeds into existing inequalities in the competitive dynamics for generative AI markets (IoD and LBS 2024), it is the default option for most organizations.

Conclusion and Recommendations

This article offers a comprehensive examination of the benefits and costs associated with building generative AI models within organizations. The analysis underscores the transformative potential of generative AI in enhancing operational efficiency, customer engagement, and creative tasks. While there are strategic advantages to building and buying generative AI models – both open-source and proprietary – there are substantial costs involved, including computational power, data acquisition, human resources and integration. Most organizations do not have the expertise to navigate this complex landscape on their own and will most often rely on third parties for advice and support.

Recommendations for Leaders

The analysis of the costs of generative AI points to the need for better collaboration and coordination between organizations. No single organization can navigate the complexity of the emerging landscape of generative AI by itself. Complexity can be reduced, and risks can be mitigated, if organizations synergistically combine efforts while at the same time protecting their unique assets, most notably data assets.

Three long-term strategic decisions are proposed. First, organizations should spend time and resources now to build their own proprietary generative AI model to fit their domain-specific business operations. In the short term, this would be costly and would generate little return on investment. However, in the long term, it would generate significant returns, especially as more data and use cases become integrated with the core model. Such a proprietary generative AI model could be built on smaller-parameter open-source models and then customized to unique requirements, while keeping control of proprietary organizational data.

Second, organizations should buy (license) general-purpose proprietary generative AI models like Microsoft Copilot for routine, generic tasks such as running meetings and writing reports. The benefit of doing so is that such general-purpose models leverage data from multiple similar organizations and generate optimizations at a higher scale than a smaller, domain-specific model would. Any data insights generated from these routine, generic tasks could be used to develop proprietary datasets that could then be fed into the customized model.

Third, an orchestration framework for integrating the buy and build decisions should be developed. Once again, this would require long-term thinking, including creating a pathway for generating data on the general-purpose model as outputs and then leveraging that data as inputs for the proprietary model, while also ensuring interoperability between existing systems.

These three long-term strategic decisions require several actions:

1. Building and Protecting Proprietary Datasets

Organizations should prioritize the creation of proprietary datasets that are relevant and uniquely beneficial to their strategic needs. Protecting these datasets involves implementing robust cybersecurity measures, encrypting data, and establishing strict access controls. Intellectual property rights should be rigorously defined and enforced to safeguard against competitive threats. Building a data strategy that defines which data should remain under the strict control of the organization and which can be shared with partners should be a priority for all organizations.

2. Forming Strategic Partnerships

To mitigate the high costs associated with building generative AI models, but also to avoid default options, organizations should seek partnerships with other firms in their sector and with technology providers. These partnerships can share the financial burden of model development and provide access to complementary skills and technologies. Collaboration agreements should emphasize shared goals, equitable sharing of costs, benefits, and clearly defined roles and responsibilities. Once again, data sharing agreements should be clearly defined in such partnerships in the context of ecosystem-wide joint value propositions.

3. Developing Digital Governance Frameworks

It is crucial for organizations to develop digital governance frameworks that facilitate the integration of generative AI with existing systems and operations. These frameworks should ensure that AI deployments are compliant with organizational policies and external regulations. Additionally, they should promote ethical AI use, ensuring models do not perpetuate bias or make opaque decisions.

4. Recruitment and Reskilling

As generative AI changes the skill requirements within the workforce, organizations must both attract new talent skilled in AI and data science and reskill existing employees to work alongside AI systems. Investment in continuous learning and development will be key, including workshops, courses, and hands-on projects to ensure staff remain at the cutting edge of AI technology and its applications. Building both short-term and long-term digital career pathways will provide the appropriate incentives for the workforce to keep engaging in innovative activities and new task development with generative AI models.

5. Building Ethical and Legal Frameworks

Organizations must establish frameworks that address the ethical implications of AI and ensure compliance with international data protection laws, such as GDPR, as well as the EU AI Act. This involves setting up oversight bodies to audit AI models regularly, developing clear usage policies, and maintaining transparency with all stakeholders about how AI systems operate and make decisions. This transparency will build trust among users and customers, ensuring the ethical deployment of AI technologies.

By focusing on these strategies, leaders can harness the full potential of generative AI to drive innovation and maintain competitive advantage while navigating the complexities of ethical, legal, and operational challenges.

References

Austin, J., Odena, A., Nye, M., Bosma, M., Michalewski, H., Dohan, D., Jiang, E., Cai, C., Terry, M., Le, Q. and Sutton, C., 2021. Program synthesis with large language models. https://arxiv.org/abs/2108.07732

Bender, E.M., Gebru, T., McMillan-Major, A. and Shmitchell, S., 2021, March. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM conference on fairness, accountability, and transparency. 610-623. https://dl.acm.org/doi/pdf/10.1145/3442188.3445922

Bloomberg Professional Services 2023. Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance. https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. and Agarwal, S., 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, pp.1877-1901. http://arxiv.org/pdf/2005.14165

Constantinides, P., 2023. Digital Transformation in Healthcare: An Ecosystem Approach. Taylor & Francis. https://www.amazon.co.uk/Digital-Transformation-Healthcare-Ecosystem-Approach/dp/1032171111

Dell'Acqua, F., McFowland, E., Mollick, E.R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F. and Lakhani, K.R., 2023. Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Technology & Operations Mgt. Unit Working Paper, 24-013. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4573321

Dohmke, T., Iansiti, M. and Richards, G., 2023. Sea Change in Software Development: Economic and Productivity Analysis of the AI-Powered Developer Lifecycle. http://arxiv.org/pdf/2306.15033

Eloundou, T., Manning, S., Mishkin, P. and Rock, D., 2023. GPTs are GPTs: An early look at the labor market impact potential of large language models. http://arxiv.org/pdf/2303.10130

IoD and LBS. 2024. Assessing the expected impact of generative AI on the UK competitive landscape. Institute of Directors and London Business School Policy Paper. https://www.iod.com/app/uploads/2024/05/IoD-LBS-Policy-Paper-Assessing-the-expected-impact-of-generative-AI-on-the-UK-competitive-landscape-90514166d3cf6e8f4ee9211073a9ae30.pdf

Liao, Q.V. and Vaughan, J.W., 2023. AI transparency in the age of LLMs: A human-centered research roadmap. https://arxiv.org/abs/2306.01941

Renieris, E., Kiron, D., and Mills, S. (2023). Building Robust RAI Programs as Third-Party AI Tools Proliferate. MIT Sloan Management Review and Boston Consulting Group, June 2023. https://sloanreview.mit.edu/projects/building-robust-rai-programs-as-third-party-ai-tools-proliferate/

Strubell, E., Ganesh, A. and McCallum, A., 2019. Energy and policy considerations for deep learning in NLP. https://arxiv.org/abs/1906.02243

Schwartz, R., Dodge, J., Smith, N.A. and Etzioni, O., 2020. Green AI. Communications of the ACM, 63(12), pp.54-63. https://dl.acm.org/doi/pdf/10.1145/3381831

Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D. and Mann, G., 2023. BloombergGPT: A large language model for finance. https://arxiv.org/html/2303.17564v3

[1] These are estimates based on information available online, but enterprise subscriptions are quoted on a case-by-case basis. https://openai.com/chatgpt/enterprise/
