Reimagining the Data Center for the Age of Generative AI


The field of artificial intelligence (AI) has witnessed significant advances in recent years, with the rise of generative AI capturing the attention of enterprises across industries. One notable example is ChatGPT, a chatbot built on OpenAI’s GPT series of large language models (LLMs). This powerful chatbot has transformed content generation, enabling users to obtain answers to complex questions and to automate tasks such as writing software code and producing marketing copy. However, effectively harnessing the potential of generative AI within data centers poses unique challenges. In this article, we will explore how organizations can meet the demands of generative AI and reimagine their data centers.

Building a large language model (LLM), like GPT-3 or GPT-4, is a resource-intensive process that involves multiple steps. The initial training phase requires substantial computing power, often necessitating hundreds or even thousands of expensive GPUs stacked together in data centers for several weeks or even months. For instance, training the BLOOM model, an open-source alternative to GPT-3 with 176 billion parameters, took 117 days on a 384-GPU cluster, or around 120 GPU-years. As models grow in size, training and retraining them requires ever more GPUs.
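The GPU-years figure follows from simple arithmetic: the total GPU-time consumed is the number of GPUs running in parallel multiplied by the wall-clock duration of the run. A quick sketch using the BLOOM figures above:

```python
# Back-of-the-envelope estimate of total GPU-time for a training run,
# using the BLOOM figures cited above; any other run can be checked
# by changing the two inputs.

def gpu_years(num_gpus: int, wall_clock_days: float) -> float:
    """Total GPU-time in GPU-years: GPUs busy in parallel x elapsed time."""
    return num_gpus * wall_clock_days / 365.0

print(f"{gpu_years(384, 117):.0f} GPU-years")  # ~123, i.e. roughly 120 GPU-years
```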

Another illustration of the computing demands of large-scale language models is Google’s 540-billion-parameter PaLM model, which Google trained on 6,144 TPU v4 chips, highlighting the large hardware investment required. Additionally, firms may lack access to experts in cutting-edge training techniques and tools, such as Microsoft DeepSpeed and Nvidia Megatron-LM.

After the model has been trained, performing inference on it requires the continued use of GPUs, which raises the cost further. Consider a deployment of just 500 Nvidia DGX A100 multi-GPU systems, which are priced at $199,000 each and frequently used for LLM training and inference: the hardware alone would cost about $100 million. The power consumption and thermal output of these servers add to the total cost of ownership. This level of investment in data center infrastructure presents a major hurdle for companies that are not AI-focused enterprises but want to employ LLMs to accelerate particular business use cases.
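The same kind of quick arithmetic makes the scale of the hardware bill concrete (unit price and system count from the example above; power, cooling, and staffing are excluded):

```python
# Rough hardware-only cost of a 500-system DGX A100 deployment.
unit_price_usd = 199_000      # list price per Nvidia DGX A100 system
num_systems = 500

total = unit_price_usd * num_systems
print(f"${total:,}")          # $99,500,000 -- about $100 million before power and cooling
```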

Rather than building LLMs from scratch, organizations can adopt a more cost-effective approach by fine-tuning existing open-source LLMs to suit their specific use cases. Fine-tuning involves training the model on internal corporate data, such as proprietary documents and customer emails, to make it more relevant and attuned to the organization’s unique requirements.
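A minimal sketch of that workflow, using the open-source Hugging Face transformers and datasets libraries, might look like the following; the base model name and the path to the internal corpus are placeholders, and a production pipeline would add evaluation, checkpointing, and data cleaning:

```python
# Hedged sketch: fine-tune an open-source causal LLM on internal text.
# "gpt2" and "internal_docs.txt" are stand-ins for a real base model
# and a real corporate corpus.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "gpt2"  # placeholder; swap in any open-source LLM from the hub
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# One text file standing in for proprietary documents and customer emails.
dataset = load_dataset("text", data_files={"train": "internal_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-llm", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # mlm=False makes the collator produce next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-llm")
```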

For example, BloombergGPT, a 50-billion-parameter LLM developed by Bloomberg, illustrates what is possible when an organization holds vast amounts of domain data: Bloomberg trained this model from scratch, leveraging its extensive finance-related data. However, only a limited number of organizations possess comparable volumes of high-quality data, making fine-tuning the more viable option for most enterprises. The Hugging Face hub, a repository of open-source models, offers over 250,000 models for various natural language processing, computer vision, and audio tasks. Organizations can leverage these pre-existing models as a starting point for their projects, significantly reducing the time, budget, and effort required.
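Starting from a hub model can be as simple as the sketch below; the checkpoint name is illustrative, and any of the hub’s open models for the relevant task could be substituted:

```python
# Hedged sketch: pull an existing open-source model from the Hugging Face
# hub and run it directly, rather than training anything from scratch.
from transformers import pipeline

# "distilbert-base-uncased-finetuned-sst-2-english" is an illustrative
# checkpoint; browse https://huggingface.co/models for alternatives.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("The new data center design cut our inference costs."))
```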

If an organization does decide to build an LLM from scratch, starting small and utilizing managed cloud infrastructure and machine learning (ML) services can be a more cost-effective and scalable option. Cloud-hosted ML infrastructure allows organizations to focus on developing the technology itself, rather than worrying about hardware provisioning and maintenance. As the architecture of the solution matures, transitioning to local hosting becomes a more feasible option.

In addition to Nvidia GPUs, cloud providers also offer a variety of training options from AMD and Intel, as well as custom accelerators such as Google TPUs and AWS Trainium. This versatility lets businesses choose the hardware that best suits their particular requirements. However, when local laws or regulations forbid the use of cloud services, on-premises deployment of accelerated hardware, such as GPUs, becomes the default option.
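On the software side, frameworks make it straightforward to target whichever accelerator is present. The PyTorch sketch below is a simplified illustration, not a provider-specific recipe; it covers the common GPU-or-CPU case and falls back gracefully:

```python
# Hedged sketch: pick the best available accelerator in PyTorch.
# TPU and Trainium are reached through vendor plugins (torch_xla,
# AWS Neuron) not shown here; this covers the common GPU/CPU case.
import torch

def pick_device() -> torch.device:
    """Prefer a CUDA GPU (Nvidia, or AMD via ROCm builds), else CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(16, 4).to(device)      # toy model for illustration
x = torch.randn(8, 16, device=device)
print(device, model(x).shape)
```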

To support the successful adoption of LLMs, enterprises must engage in strategic planning before investing in GPUs, specialist expertise, or cloud partnerships. Creating a clear plan calls for collaboration among technical decision-makers, business executives, and subject matter experts. This collaboration should focus on understanding the business case for LLM adoption and identifying present and foreseeable workload requirements. By carefully evaluating these factors, organizations can choose the right technology, reuse pre-existing models, and find the right partners for their AI journey.

The rapidly evolving landscape of AI/ML necessitates a forward-thinking approach to technology adoption. Future-proofing solutions require a deep understanding of the technologies and hardware involved, as well as a realistic assessment of their potential benefits. Organizations should avoid getting caught up in the hype and instead invest time in comprehending the technologies in question. Working with stakeholders and subject matter experts, they can identify the areas where integration with AI technologies can yield tangible benefits.

Generative AI technology, exemplified by the rise of ChatGPT, has opened up new possibilities for content generation and automation. However, harnessing the potential of generative AI in data centers requires careful consideration of cost, resources, and strategic planning. Building LLMs from scratch can be a costly affair, necessitating significant investments in computing resources and expertise. In contrast, fine-tuning existing open-source LLMs offers a more practical and cost-effective approach for most organizations. Leveraging managed cloud infrastructure and ML services provides scalability and flexibility, while strategic planning ensures informed decision-making and successful integration of LLMs into business processes. By reimagining their data centers and adopting a forward-thinking approach, organizations can unlock the full potential of generative AI and drive innovation in their respective industries.

First reported by VentureBeat.