“AI datacenters will be built next to energy production sites that can produce gigawatt-scale, low-cost, low-emission electricity continuously. Basically, next to nuclear power plants. The advantage is that there is no need for expensive and wasteful long-distance distribution infrastructure.” /@Yann LeCun/
Securitiex on Substack is a Darwinex Zero Affiliate Partner

Join Darwinex Zero and transform your strategy into an investable index with a certified track record (use the code DZ20OFF for 20% off).
<> rumor
Broadcom, TSMC Weigh Possible Intel Deals That Would Split Storied Chip Maker
https://www.wsj.com/tech/broadcom-tsmc-eye-possible-intel-deals-that-would-split-storied-chip-maker-966b143b : [excerpt] Broadcom has interest in Intel’s chip-design business, while TSMC is looking at the company’s factories
☑️ #44 Feb 17, 2025
The problem is money can't save Intel
@LivyResearch: Here are some of my preliminary thoughts on the $INTC $TSM rumors:
1. What would incentivize TSMC to invest in Intel? Trump + Tariffs.
Intel's technology and capacity on the whole remain inferior to TSMC's, so IFS isn't really needed in the first place. Management was clear on this when asked:
"Are we interested to acquire one of IDM's fab? The answer is no. Okay. No, not at all." - C.C. Wei
Intel's American capacity isn't going to help TSMC much either in terms of derisking from geopolitical tension exposure.
Here's why -- TSMC has 4nm in Arizona, with 3nm there coming online soon. More importantly, TSMC is one of the few (if not the only) scaled foundries in the world with a competitive advantage in being able to rapidly retool its existing equipment to the most advanced nodes. Retooling 5nm to 3nm is exactly what it's been doing in Taiwan. Not only has this helped TSMC scale new tech deployments rapidly and address incoming demand, it's also allowed it to operate on an optimized ROI schedule.
And why is this important now? Because the Taiwanese government has just approved TSMC to build its most advanced tech overseas (this was previously prohibited). That means Arizona capacity can be rapidly retooled. Instead of taking 4-6 years for green/brownfield investments, TSMC could get to 2nm in significantly less time.
Hence, IFS doesn't really appeal to TSMC at all. Not in terms of tech, nor in terms of capacity, since leveraging/retooling them to address TSMC's current customers' needs isn't going to be any easier/cheaper than converting Arizona and building more LT capacity outside of Taiwan.
What TSMC does need and likely can't immediately address is a solution to bypass impending tariffs.
Details on Trump's plans on this end remain slim. But perhaps the administration is trying to get TSMC in on IFS as a requirement for bypassing them. This is really the only thing I could think of that would incentivize TSMC to take a mere 20% stake in Intel today. Otherwise, TSMC would likely want to control Intel instead, since that would minimize the risk of ceding LT tech leadership/market share to IFS; or alternatively, take on the risk of tariffs and pass them on to end customers (since it has pricing power) instead of taking on the challenge that is turning around/managing IFS.
2. The US needs TSMC
If money could solve IFS and the US' current national security concerns around advanced tech, then we probably wouldn't be hearing these TSMC investment rumors.
The problem is money can't save Intel.
What the US needs is homegrown advanced chip manufacturing tech and capacity right now. Intel doesn't have that (18A's external manufacturing capabilities are still TBD; advanced packaging is still an inferior choice compared with TSMC's CoWoS).
TSMC can make up for the manufacturing tech part, even though the majority of its capacity's still in Taiwan (which is exposed to geopolitical risks). So there's incentive for the Trump admin to get TSMC to invest in Intel and give US access to its tech.
But again, why would TSMC share the secret recipe to its business' success with the US/Intel for just a 20% stake in IFS? Even if Trump is really planning aggressive tariffs on the company, TSMC's industry leadership by wide margins should give it enough leverage to share these costs with its customers, including $NVDA, $AMD and others... At a minimum, TSMC would need a merger with IFS. That way it secures itself as an American/US-protected company, effectively removing the geopolitical risk overhang. And the US would gain access to advanced manufacturing tech -- this would perhaps be the better win-win situation.
3. Any Intel buyout/investment isn't going to be as value-accretive as you think
There isn't much in IFS for TSMC that it doesn't already have. IFS' lifetime deal value can't compare to what TSMC already has. And neither 18A nor Intel's advanced packaging capabilities will drive material synergies for TSMC.
As a result, TSMC is unlikely to pay a premium for IFS. If a deal does move forward, it'd likely benefit TSMC investors more than Intel's. A TSMC-Intel tie-up could help it derisk from the current geopolitical overhang and unlock significant pent-up value for the stock (and maybe even incentivize Buffett to come back?).
⚡️
@satna79: Value your views. It is not clear if $TSMC is clearly superior to $INTC though. See this post from an Intel engineer: https://linkedin.com/posts/josephbonetti_recent-articles-about-a-possible-intel-foundry-activity-7297014118887079936-aZ9G Intel lacks two things: capital and external customers for running a profitable foundry business. Manufacturing is expensive to run in the US compared to Taiwan. That is where US gov subsidies and tariffs could make a difference.
⚡️
@satna79: I meant $tsmc tech may not be clearly superior to $intc, especially given progress on 18A. The above post also talks about $intc's head start with the next technology using $asml's latest tools.
⚡️
@LivyResearch: It’s been an ongoing debate on whether $INTC 18A is the leading edge node. Intel claims it is, but $TSM thinks the superiority only applies to internal Intel products. Either way, my point goes beyond that. As a whole, TSMC’s manufacturing nodes and advanced packaging capabilities still lead. It’s more than just having the most advanced node, but instead the entire package and scalability that makes TSMC the superior choice.
+ Related content:
@LivyResearch: The 20% stake would imply significant influence instead of control, and assuage fears that a $TSM $INTC tie-up would contradict national security interests. Yet it'd allow the US to tap into TSMC's tech advancements and some of the capacity -- which is something that simply giving money to Intel today can't build. But significant influence alone likely won't satisfy TSMC. What's in TSMC's interest is protection from geopolitical risks -- the company's biggest turnoff. If they were to partner up with ownership involved, both would likely have to look into a merger.
@dnystedt: TSMC may end up holding a 20% stake in Intel’s foundry business in a cash or technology deal in which a consortium of companies, possibly including Qualcomm, Broadcom and other US chip design giants, join to inject capital into Intel, Taiwan media report, citing foreign media and unnamed supply chain sources. $INTC $TSM #semiconductors #semiconductor https://money.udn.com/money/story/5612/8552308
wsj.com (2/15/25): [Excerpt] Broadcom, TSMC Weigh Possible Intel Deals That Would Split Storied Chip Maker. Broadcom has interest in Intel’s chip-design business, while TSMC is looking at the company’s factories.
Intel’s rivals Taiwan Semiconductor Manufacturing Co. and Broadcom are each eyeing potential deals that would break the American chip-making icon in two.
Broadcom has been closely examining Intel’s chip-design and marketing business, according to people familiar with the matter. It has informally discussed with its advisers making a bid but would likely only do so if it finds a partner for Intel’s manufacturing business, the people said.
🙂
☑️ #43 Feb 14, 2025
Is an ASIC cheaper?
@Jukanlosreve: Morgan Stanley: Despite ASICs, Nvidia will continue to maintain a dominant market share.
The ASIC category is neither superior nor inferior to commercial GPUs—it is simply another means to achieve the same result.
Over the past six months, momentum in the AI sector has clearly shifted toward custom silicon. Nvidia has remained stagnant while AMD has underperformed sharply. Nvidia’s $3 trillion market capitalization is supported by over $32 billion in quarterly AI revenue, whereas AVGO’s $1.1 trillion market cap is based on $3.2 billion in quarterly AI revenue. Clearly, the market has judged that the growth potential of ASICs will be several times that of commercial GPUs.
Will ASICs outperform commercial solutions in the long term? While there are various possible outcomes, our view is that—unless something changes—the incumbent, namely Nvidia, will continue to hold a dominant market share.
The ASIC category is neither superior nor inferior to commercial GPUs. It is merely an alternative means to achieve the same result. We would evaluate an ASIC the same way we evaluate a chip from AMD, a chip from Intel, or any chip from a startup—in the context of the price/performance of that silicon relative to the value the incumbent (primarily Nvidia) can deliver.
The development budget for an ASIC is typically less than $1 billion, and in some cases, much lower. This contrasts with our assumption that Nvidia will invest approximately $16 billion in R&D this year alone. With that funding, Nvidia can maintain a 4–5‑year development cycle by running three design teams sequentially—each with an 18–24‑month architectural cadence—delivering innovation over a five‑year span. In addition, they invest billions in interconnect technologies to boost rack‑scale and cluster‑scale performance, and by being present in every cloud worldwide (subject to U.S. Department of Commerce approval), any investment in improving the Nvidia ecosystem propagates throughout the global ecosystem.
We have examined several theories regarding whether ASICs can outperform commercial solutions. Could a custom chip be superior because it is better suited for a narrower set of applications? Sometimes, yes—and in such cases, it becomes one of the most compelling use cases for customization. The clearest example is Google’s successful TPU. Google invented modern LLM transformer technology and directed Broadcom to develop a chip optimized for that technology, while at the time Nvidia was developing GPUs optimized for vision models using convolutional neural networks. To date, the TPU is the most prominent example of a cloud customer gaining a clear advantage through a customized solution, resulting in over $8 billion in TPU revenue for AVGO.
However, an important point is that Nvidia is expanding its share of Google’s spending in 2025. Part of this is attributable to investments in the cloud sector, where commercial products tend to outperform ASICs. Yet the customization aspect of the TPU may be less valuable than it once was. Nvidia is also optimizing for transformer models, and in fact, the largest training and inference clusters are not currently highly customized. Instead, the greater benefits of customization going forward seem more likely to manifest in legacy workloads. While Nvidia’s focus on training multimodal AGI models may be overkill for some older applications, it will be difficult to outperform Nvidia when it comes to delivering high‑end training capabilities. In particular, for Google, while purchases of Nvidia products are expected to roughly double this year, TPU growth is anticipated to be more modest—partly due to Google’s investments in enterprise cloud, but also because Nvidia’s LLM transformer performance is exceptionally strong even for internal workloads.
Is an ASIC cheaper? This is the most frequently heard argument. Yes, a given ASIC may not reach the performance levels of the H100 until three years after its debut, but it costs only $3,000, whereas the H100 is priced at $20,000—resulting in a lower total cost of ownership. But does this make sense? Is it really that easy to compete with Nvidia merely by producing a $3,000 AI chip? We have seen countless startups, priced at only a fraction of Nvidia’s level, fail to gain a foothold in this space. Moreover, Intel has struggled for over a decade—even after acquiring companies that were already shipping—while pricing below cost. AMD’s first few generations were non-starters until it broke through with the MI300 last year. If a $3,000 chip were all that was needed, why wouldn’t every competitor simply produce a $3,000 chip?
Meanwhile, Nvidia has introduced several lower-priced chips aimed at legacy inference applications. Two years ago, we reported enthusiastically on the L4, L40, and other announcements. However, Nvidia has also discovered that there is a gravitational pull toward the most expensive cards.
Why is that? While the processor itself may be cheaper, system costs can be higher. The cluster cost for ASICs can be materially higher than that of Nvidia—which has built a 72‑GPU NVLINK domain using copper—because ASICs use more expensive optical technology. Other major cost components are at similar levels. High‑bandwidth memory costs are the same—and in fact, Nvidia is likely favored by its monopsony purchasing power for the latest HBM versions. The same applies to CoWoS; because many ASICs use smaller dies with larger stacks, the CoWoS cost can be higher than that of Nvidia. Of course, Nvidia’s wafer costs might be higher due to reticle‑limited dies, but overall, Nvidia delivers exceptional value.
Furthermore, the software aspect also places a burden on customers, as it is a challenging and time‑consuming task. Ease of use when managing software changes and running various workloads is crucial, and minimizing “software developer hours” is often an overlooked element in total cost of ownership (TCO) calculations. For instance, Databricks—a Trainium customer—anticipates that it will take “weeks or months to get the system up and running” (link reference). Deployment delays can put customers even further behind Nvidia, which is equipped with the widely used CUDA (Compute Unified Device Architecture) SDK.
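[A back-of-envelope sketch of the TCO argument above. The $3,000 and $20,000 chip prices come from the text; the relative performance, system-overhead multipliers, engineering hours and hourly rate are illustrative assumptions, not vendor data.]
```python
# Back-of-envelope TCO sketch for the "$3,000 ASIC vs $20,000 GPU" argument above.
# Chip prices are from the text; every other number is an illustrative assumption.

def cost_per_effective_unit(chip_price, chips, rel_perf,
                            system_overhead, sw_hours, hourly_rate=150.0):
    """Rough cost per unit of delivered performance for a cluster.

    rel_perf        : per-chip performance relative to the GPU baseline (1.0 = parity)
    system_overhead : multiplier for interconnect/optics/HBM/packaging on top of chip cost
    sw_hours        : software bring-up effort, the often-ignored TCO term
    """
    hardware = chip_price * chips * system_overhead
    software = sw_hours * hourly_rate
    return (hardware + software) / (chips * rel_perf)

# GPU baseline: $20,000 chip, copper scale-up keeps system overhead lower, mature software.
gpu = cost_per_effective_unit(20_000, chips=1_000, rel_perf=1.0,
                              system_overhead=1.6, sw_hours=2_000)

# ASIC: $3,000 chip, but assume ~0.3x per-chip performance, pricier optical scale-up,
# and months of software bring-up (per the Trainium anecdote above).
asic = cost_per_effective_unit(3_000, chips=1_000, rel_perf=0.3,
                               system_overhead=2.2, sw_hours=20_000)

print(f"GPU  cost per effective unit: ${gpu:,.0f}")   # ~$32,300
print(f"ASIC cost per effective unit: ${asic:,.0f}")  # ~$32,000
# Under these assumptions the $3,000 chip lands at roughly the same delivered-performance
# cost as the $20,000 GPU; small changes to rel_perf or overheads flip the answer either way.
```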
Over the years, we have observed 20 to 25 alternatives to Nvidia’s products. Initially, there was enthusiasm based on price and potential performance that drove early deployments, but ultimately customers gravitated back to Nvidia, which has the most mature ecosystem. Consequently, alternative products are often put on hold or sometimes disappear from the market altogether. TPU and Trainium, as well as AMD’s MI300, are notable exceptions—but even in those cases, strong investments in Nvidia’s ecosystem continue this year.
This is not to say that cheaper processors are entirely irrelevant; rather, historically, cheaper processors have often failed to garner as much market attention as initially expected.
Could the ASIC relationship give ASIC vendors additional visibility that commercial products do not possess? We disagree. Every hyperscaler investing in ASIC technology intends to use it. ASIC development incurs significant costs because there is an expectation of returns. Therefore, there is a possibility that the SAM analysis is correct. However, the technology must ultimately be more cost‑effective than commercial solutions when delivered; if not, purchasing will often pivot back to the best‑in‑class commercial chips.
One exasperated cloud executive recently remarked, “Every two years, our ASIC team delivers technology that is 2–3 years behind Nvidia. It is economically not that useful.” Of course, this is not everyone’s view, nor is it the objective at the outset of the design process. However, it is a fairly common complaint among cloud vendors, who—even while considering it an investment in the future—are not entirely convinced that adopting inferior ASICs will form the foundation of a differentiated long‑term strategy.
Put another way—why does AVGO’s SAM analysis attract far more attention than AMD’s predictions of “tens of billions in annual revenue potential” for products like the MI400? AMD’s forecasts are also based on conversations with potential customers, yet the investor response is, “How can we be confident in AMD’s 2026 product (MI400) compared to Nvidia’s product (Rubin)?” Indeed, we cannot be sure. Nevertheless, for some reason, ASICs are expected to be competitive with Rubin in 2026.
However, AMD’s investments should not be taken lightly. The scale of AMD’s investments across the ecosystem tends to far exceed that of ASIC vendors. This year, AMD has completed two acquisitions of AI software assets. One of these—the acquisition of ZT Systems—involved acquiring a major server ODM, divesting the ODM business while retaining key engineering talent related to rack‑ and cluster‑scale computing. Once in possession of such assets, AMD can deploy them across multiple cloud environments, which drives third‑party support and accelerates ecosystem development. Can ASIC designers within the cloud replicate this? Are ASIC chip suppliers willing to replicate it? It is possible—and their networking expertise may help—but it is by no means a given. AMD has secured dominant technology in the server ecosystem through such iterative investments, with the progression from the early struggles of Naples to Rome and Genoa serving as a success story (though Intel’s struggles have clearly contributed as well).
So why is it presumed that AMD will fail to meet its stretch goals while ASICs will succeed? In our view, AMD has just as much potential to achieve its SAM as its ASIC competitors, yet investors treat it as a “show-me” story. So why hasn’t ASIC become a “show-me” story as well? In our view, ASICs should also be expected to deliver; this does not mean that either technology offers a guaranteed solution, but rather that the challenges of competing with Nvidia should not be taken lightly.
It is difficult to say that the ASIC relationship provides suppliers with a more fixed market share, as nearly every ASIC we know of faces direct competition. For example, MediaTek (as reported by Charlie Chan) is expected to challenge Broadcom within the Google TPU ecosystem, and Alchip (also reported by Charlie Chan) is anticipated to challenge Marvell within the Amazon (as reported by Brian Nowak) Trainium ecosystem. Recent reports indicate that Meta (also reported by Brian Nowak) is considering the acquisition of a Korean ASIC design firm—suggesting that its relationship with Broadcom is not exclusive—and Meta is also investing significantly in AMD. Bytedance is Broadcom’s third ASIC customer, but there is also the possibility that they have alternative ASIC designs from Chinese vendors in the pipeline due to export control issues. In fact, export controls on AI-related shipments to China have lowered the technological threshold to 80% of the state‑of‑the‑art, creating a highly open competitive environment. OpenAI could become a major ASIC user, but reports indicate that they are already working with multiple suppliers.
Nevertheless, Broadcom and Marvell possess enviable capabilities in this field that will undoubtedly drive growth. Success begets success, yet there are clearly challenges within the ASIC domain as well.
We expect that in 2025, Nvidia and AMD will outperform ASIC competitors—especially in the second half of the year. For 2024, Nvidia’s processor revenue is projected to be approximately $98 billion, AMD’s around $5 billion, AVGO’s about $8 billion, and the combined revenue of Alchip/MRVL roughly $2 billion. This implies that commercial silicon accounts for roughly 90% of the market share, while ASICs constitute about 10%. We expect the 90% share for commercial products to increase slightly this year.
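[A quick check of the ~90/10 split implied by the 2024 revenue estimates quoted above; the figures are the report's round numbers, not audited results.]
```python
# Quick check of the ~90/10 split implied by the 2024 revenue estimates above.
commercial = {"NVDA": 98, "AMD": 5}        # $B, commercial GPU silicon
asic = {"AVGO": 8, "Alchip+MRVL": 2}       # $B, custom ASIC silicon

total = sum(commercial.values()) + sum(asic.values())          # 113
print(f"commercial: {sum(commercial.values()) / total:.1%}")   # ~91.2%
print(f"ASIC:       {sum(asic.values()) / total:.1%}")         # ~8.8%
```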
The two major users of ASICs are likely to further increase the proportion of spending on Nvidia. AVGO’s largest customer is Google, which drives most of its 2024 revenue, and we expect that in the 2025 fiscal year, Nvidia will grow 50–100% more than TPU. Marvell is driven by Amazon, and while Amazon’s ASIC purchases are expected to double from approximately $2 billion to about $4 billion this year, Amazon’s purchases of Nvidia products are also projected to more than double beyond that level. Other ASIC customers, such as Meta and Bytedance, are still in the early stages. In our view, Nvidia’s revenue momentum in the second half will be significantly stronger than the builds from ASIC or AMD.
Beyond 2026, ASICs could grow much faster. However, this largely depends on the value delivered by ASICs versus what Nvidia or AMD can offer. According to AVGO’s SAM analysis, Google, Meta, and Bytedance could together represent between $60 billion and $90 billion in revenue opportunity in the 2027 fiscal year with clusters numbering in the millions—which is entirely plausible, though that technology has yet to be delivered.
Yet, with the market discounting such significant market share gains for ASICs—for instance, Nvidia generates 10 times the AI revenue of AVGO this quarter while its market cap is only three times as high—doesn’t the onus fall on the ASIC side to prove its case? And if we consider the Blackwell ramp in the second half of 2025 as the major story, wouldn’t the market capitalization that shifted from Nvidia and AMD to AVGO/MRVL over the past six months swing back?
Does this contradict our earlier work on the ASIC opportunity? In the deep‑dive report on the ASIC opportunity led by Charlie Chan last year (link here), we forecasted that the total addressable market (TAM) for AI ASICs would grow from $12 billion in 2024 to $30 billion in 2027—a figure considerably lower than what is being discounted in the market now (for example, AVGO’s SAM for 2027, based mainly on ASICs for three out of five potential customers, amounts to $60–90 billion). Our view remains unchanged, but market expectations have shifted dramatically.
In particular, Charlie Chan maintains a constructive view on Asian peers such as Alchip and MediaTek, as they are securing projects at the expense of their U.S. competitors. According to the same report, Asian vendors like Alchip (with the AWS 3nm project) and MediaTek (with Google’s 3nm TPU v7 for training) are expected to expand their market share in 2026.
So what should we do regarding U.S. AI semiconductor stocks? Simply put, buy Nvidia. The stock remains our top pick and is currently trading at a significant discount compared to AVGO/MRVL. We believe that this discount will not last long. Nvidia’s biggest short‑term risk is U.S. export controls, which are equally problematic for AVGO. In the long term, the greatest risk is not competition but a slowdown in investment—which we forecast to occur around mid‑2026, though we continue to push that timeframe out as visibility remains strong.
Meanwhile, we continue to maintain a bearish view on AVGO while closely monitoring expectations. Frankly, feedback on this note will play a crucial role in shaping investor expectations for AVGO relative to the 2027 SAM analysis. This year, AVGO AI remains in a transitional phase—with one fully penetrated customer (Google) that has slowed due to a product transition and two new customers that are still small. The company’s three‑year SAM analysis was partly presented to explain the short‑term slowdown and the strong ongoing investment, but the trajectory—from $3.2 billion per quarter to $18 billion over 12 quarters—could raise short‑term expectations even further.
We are sidelining AMD, MRVL, ALAB, MU, etc. We are not fundamentally negative on any AI‑related stocks, but we do have concerns about short‑term momentum shifts—particularly as the multiple premiums for MRVL and AVGO far exceed those of Nvidia. While AMD’s valuation is becoming increasingly attractive, the MI350 does not seem revolutionary, and the MI400 remains somewhat of an unknown. We have confidence in the execution of both AMD and Nvidia. ALAB stock is so volatile that it is difficult to make a persistent valuation call, but at current levels, we maintain a constructive view. Management has indicated that Nvidia will be the strongest driver of demand in 202X, yet Nvidia’s data center—which returns twice the operating margins of its peers—continues to record the lowest EV-to-AI revenue ratio in our analysis.
Source: Morgan Stanley Research. $NVDA $MRVL $AVGO
🙂
☑️ #42 Feb 6, 2025
The trend of increasing HBM payload in ASICs
@Jukanlosreve: HBM Capacity & Total Demand Outlook by AI Chip - Samsung Securities
+ Related content
sagaciousresearch.com (2021): [Excerpt] JEDEC adopted High Bandwidth Memory as an industry standard in October 2013. The 1st generation of HBM had four dies and two 128-bit channels per die, or 1,024 bits. Four stacks enable access to 16 GB of total memory and 4,096 bits of memory width, eight times that of a 512-bit GDDR5 memory interface. HBM with a four-die stack running at 500 MHz can produce more than 100 GB/sec of bandwidth per stack – much greater than 32-bit GDDR5 memory.
The 2nd generation, HBM2, was accepted by JEDEC in January 2016. It increases the signaling rate to 2 Gb/sec while keeping the same 1,024-bit width per stack. A package could drive 256 GB/sec per stack, with a maximum potential capacity of 64 GB at 8 GB per stack. HBM2E, the enhanced version of HBM2, increased the signaling rate to 2.5 Gb/sec per pin and up to 307 GB/sec of bandwidth per stack.
On January 27, 2022, JEDEC formally announced the 3rd generation HBM3 standard.
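[The per-stack bandwidth figures quoted above follow directly from interface width times per-pin signaling rate; a quick worked calculation using the excerpt's own numbers:]
```python
# The per-stack bandwidth figures quoted above follow from:
# interface width (bits) x per-pin signaling rate (Gb/s) / 8 bits per byte.

def hbm_stack_bandwidth_gb_s(width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s."""
    return width_bits * pin_rate_gbps / 8

print(hbm_stack_bandwidth_gb_s(1024, 1.0))  # HBM1: 500 MHz DDR -> 1 Gb/s per pin -> 128 GB/s ("more than 100")
print(hbm_stack_bandwidth_gb_s(1024, 2.0))  # HBM2: 2 Gb/s per pin -> 256 GB/s per stack
print(hbm_stack_bandwidth_gb_s(1024, 2.4))  # HBM2E at ~2.4 Gb/s matches the quoted 307 GB/s
                                            # (the full 2.5 Gb/s per pin would give 320 GB/s)
```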
perplexity.ai: Who owns the most HBM patents?
Intel owns the most High Bandwidth Memory (HBM) patents, with approximately 1950 patent families related to HBM technology [1]. This is about three times more than Samsung, which holds the second-largest number of HBM patent families at around 650 [1]. NVIDIA ranks third in HBM patent ownership [1]. Other significant players in the HBM patent landscape include:
TSMC
Cambricon
Huawei
Micron
IBM
AMD
Pure Storage
SK Hynix
u/jklre: I had a meeting today with a really interesting hardware company as part of my work activities. They make ASICs specifically for LLM inference. They are solely focused on the datacenter server market, but I brought up making a consumer PCIe card and/or a dev board like a Raspberry Pi (or even something as small as a Google Coral TPU). They seemed very interested in this market but were not sure that it would catch on. What would you guys think about this? An inference ASIC card that eats up a lot less power (100-200W), can host local models, and gives near-Groq levels of performance. Any thoughts?
🙂
☑️ #41 Feb 5, 2025
The market cap of Intel and best-in-class companies
@lithos_graphein: Great point from @clausaasholm:
"The best-in-class companies in each of the areas of the triad are incredibly valuable. Just unlocking a fraction of this value will represent an opportunity for Intel to boost its low market cap."
⚡️
@OutspokenGeek: In theory. The problem is the "best-in-class" part. Those always have best-in-class management.
Intel now has the "worst-in-school" management considering the bumbling fools at the board who have led to the current state of affairs.
🙂
☑️ #40 Feb 3, 2025
Convert CUDA to CUNN
@rwang07: REPORT: 910C Chip Achieves 60% of H100 Performance, DeepSeek Confirms.
According to real-world testing by the DeepSeek team, Huawei’s 910C chip (SMIC 7nm Process Node) has delivered an unexpectedly strong performance in inference tasks.
The report cites experts who predict that:
Long-term, as AI models increasingly converge around Transformer architectures, the importance of CUDA and PyTorch compilers may decline.
DeepSeek’s involvement could significantly reduce dependency on NVIDIA, leading to substantial cost savings.
However, achieving stable long-term training remains a key challenge for Chinese chips.
Link below:
+ Related content:
mp.weixin.qq.com (user: AGI Hunt): [Translated from CN] Huawei chips may tear open the iron curtain of computing power!
According to the latest news released by DeepSeek, the performance of Huawei's Ascend 910C chip has reached 60% of NVIDIA H100!
This number may exceed everyone's expectations.
Measurements by the DeepSeek team show that the Huawei 910C chip performs unexpectedly well on inference tasks. Moreover, if hand-written CUNN kernels are optimized, performance can be improved further!
This achievement is of great significance.
Keep in mind that Huawei achieved this under strict restrictions. This not only shows the strength of China's chip manufacturing, but also suggests that the AI chip landscape may undergo major changes.
DeepSeek's support brings key advantages to Huawei chips:
Support for Huawei Ascend chips from day one.
An independently maintained PyTorch repository that converts CUDA to CUNN with just one line of code (see the sketch after this list).
Large headroom for performance optimization; higher performance can be achieved through customized tuning.
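[The "one line of code" claim presumably refers to PyTorch's device abstraction: Huawei ships an Ascend plugin, torch_npu, that registers an "npu" device on top of its CANN stack (what the translation calls "CUNN"), so in principle only the device string changes. A minimal, hypothetical sketch; this is not DeepSeek's published workflow.]
```python
# Hypothetical sketch of the "one line" CUDA -> Ascend switch via PyTorch's device
# abstraction. Assumes Huawei's torch_npu plugin (on top of CANN) is installed;
# this is an illustration, not DeepSeek's published workflow.
import torch
import torch_npu  # noqa: F401  registers the "npu" device type with PyTorch

device = torch.device("npu:0")  # the "one line": previously torch.device("cuda:0")

model = torch.nn.Linear(4096, 4096).to(device)  # placeholder model, not a DeepSeek checkpoint
x = torch.randn(8, 4096, device=device)
with torch.no_grad():
    y = model(x)
print(y.shape, y.device)
```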
Even more noteworthy: the chip is manufactured by China's SMIC on a 7nm process, uses chiplet packaging, and integrates 53 billion transistors.
In the book Chip War, the author Chris Miller describes in detail the complex supply chain of the global chip industry:
The ultraviolet lithography systems of Europe's ASML are key to chip manufacturing.
The precision optics of Germany's Zeiss are indispensable.
The open markets of Europe, the United States and Southeast Asia keep the supply chain running smoothly.
And China is excluded from this supply chain.
On the usual path, it was expected to take China 15 years to establish production facilities comparable to TSMC's.
But now Huawei has shown that, even under heavy restrictions, it can still break through the siege.
Although @immanuelg pointed out that "Transformer will not be the ultimate architecture of AGI", as model architectures converge on the Transformer, the value of hand-optimizing specific operators is greatly amplified.
This also supports @manjimanga's observation that China may leapfrog traditional process-node competition through AI-driven design.
But training is still NVIDIA's absolute home ground.
Yuchen Jin put it frankly:
"The stability of long training runs is the biggest difficulty for Chinese chips."
This involves deep integration between a chip's underlying architecture and the software stack, and CUDA's 20-year ecosystem moat will not collapse easily.
Some experts predict:
As the AI model architecture converges to Transformer, the importance of CUDA and PyTorch compilers will be reduced.
The addition of the DeepSeek team will significantly reduce dependence on NVIDIA and save costs.
The field of training is still a challenge. The stability of long-term training is the key to the breakthrough of Chinese chips.
Huawei's breakthrough may prove one thing: under the iron curtain of computing power, having no retreat is the best way out.
🙂
☑️ #39 Feb 2, 2025
CLAIM: “Chinese algorithm boosts Nvidia GPU performance 800-fold in science computing”
@BrianRoemmele: Analysis of Reported GPU Performance Breakthrough
Once again the world is surprised by a potential AI development out of China. This could be important. So let’s do a deep dive analysis on what is known.
A recent South China Morning Post article titled "Chinese algorithm boosts Nvidia GPU performance 800-fold in science computing" claims a revolutionary advancement in GPU-accelerated scientific computing. While independent verification of the method is pending, we examine plausible technical approaches that could enable such gains, contextualized within established GPU optimization research.
Key Optimization Strategies
1. Memory Efficiency
- *Data locality enhancements*: Techniques like memory tiling (partitioning data into GPU-friendly blocks) or advanced sparse matrix formats (e.g., CSR5 variants) could reduce latency by maximizing cache reuse.
- *Bandwidth mitigation*: Minimizing global memory access via register/shared memory prioritization, particularly for matrix operations or iterative solvers.
2. Execution Overhead Reduction
- *Kernel fusion*: Merging sequential kernels into single operations to eliminate launch delays and preserve intermediate data in fast memory (see the sketch after this list).
- *Asynchronous workflows*: Overlapping computation with data transfers using CUDA streams to mask latency.
3. Workload Distribution
- Adaptive load balancing across GPU cores to address irregular computations (e.g., sparse simulations), potentially leveraging warp-level parallelism for finer-grained task allocation.
4. Domain-Specific Tuning
Tailoring algorithms to exploit structural patterns in physics simulations (fluid dynamics, quantum modeling) or linear algebra, such as precomputing symmetric elements or optimizing for sparsity.
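[A back-of-envelope model of why items 1 and 2 above (data locality and kernel fusion) matter for memory-bound GPU work; all numbers are illustrative assumptions, not measurements of the claimed 800x result.]
```python
# Back-of-envelope model of why data locality and kernel fusion matter for
# memory-bound kernels. Illustrative assumptions only, not measurements.

def elementwise_chain_time_ms(n_elems, n_ops, bytes_per_elem=4,
                              hbm_bw_gb_s=3000, fused=False):
    """Estimated runtime (ms) of a chain of n_ops elementwise kernels over n_elems
    values, assuming each kernel is memory-bound (compute time negligible)."""
    if fused:
        # One fused kernel: read the input once from HBM, write the output once.
        bytes_moved = 2 * n_elems * bytes_per_elem
    else:
        # Each unfused kernel re-reads its input from HBM and writes its result back.
        bytes_moved = 2 * n_elems * bytes_per_elem * n_ops
    return bytes_moved / (hbm_bw_gb_s * 1e9) * 1e3

n = 1_000_000_000  # 1B FP32 elements (~4 GB)
unfused = elementwise_chain_time_ms(n, n_ops=8, fused=False)
fused = elementwise_chain_time_ms(n, n_ops=8, fused=True)
print(f"unfused: {unfused:.1f} ms  fused: {fused:.1f} ms  speedup: ~{unfused / fused:.0f}x")
# Fusion alone buys roughly n_ops x; order-of-magnitude gains beyond that require
# shrinking the data actually touched (tiling, sparse formats, reuse in shared memory).
```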
Critical Considerations
The reported 800× gain likely applies to niche, memory-bound applications rather than general workloads. Historical precedents like Volkov and Demmel’s dense linear algebra optimizations (2008) show architecture-aware redesigns can yield order-of-magnitude gains. However, three caveats remain:
1. Benchmarks may reflect highly optimized edge cases.
2. Lack of peer-reviewed methodology limits technical assessment.
3. Compatibility with evolving GPU architectures (e.g., tensor cores) requires evaluation.
The reported GPU performance breakthrough, if independently validated, could reshape AI development and deployment. Accelerating GPU-bound operations by 800× would directly impact AI training efficiency. Training large language models (LLMs) or diffusion models, which currently take weeks or months on GPU clusters, could be reduced to days or hours.
This compression of training timelines would lower computational costs, democratizing access to advanced AI tools for smaller organizations and researchers. Faster iteration cycles would also allow rapid experimentation with model architectures, hyperparameters, and training strategies, potentially accelerating breakthroughs in fields like reinforcement learning or multimodal AI.
Real-time inference applications could benefit significantly. Autonomous systems like self-driving cars, surgical robots, or industrial automation rely on low-latency AI inference. An 800× improvement in GPU throughput might enable complex models (e.g., vision transformers) to process high-resolution sensor data with minimal delay, enhancing safety and responsiveness. Edge devices, which often sacrifice model complexity for power efficiency, might gain the ability to run larger models locally without relying on cloud computing—critical for applications in remote or latency-sensitive environments.
Scaling next-generation AI models could become more feasible. Current hardware limitations cap the practical size of models like trillion-parameter LLMs. By addressing memory bottlenecks and computational inefficiencies, this optimization might enable even larger architectures with improved reasoning or generative capabilities. Sparse models, which reduce redundant computations in neural networks, could see disproportionate gains, making them more viable for robotics or resource-constrained deployments.
Many scientific workflows, such as protein folding predictions (e.g., AlphaFold) or climate modeling, combine AI with traditional high-performance computing (HPC). Faster GPUs could tighten feedback loops between simulations and AI-driven analysis, enabling real-time optimization of experiments or drug discovery pipelines. For instance, molecular dynamics simulations accelerated by this method might allow AI systems to iteratively design and test compounds in virtual environments, drastically shortening development cycles.
Hardware-software co-design trends could intensify. If the optimization is tightly coupled to Nvidia’s architecture, it might reinforce Nvidia’s dominance in the AI accelerator market despite growing competition. Frameworks like PyTorch or TensorFlow might integrate similar optimizations, reshaping low-level compiler tools (e.g., CUDA, Triton) and encouraging algorithm designers to prioritize GPU-specific efficiency. However, risks remain. Specialized gains might unevenly benefit niche applications, such as sparse linear algebra, while offering minimal improvements for dense matrix operations common in LLMs. Overhyping unverified claims could divert research efforts toward dead-end optimizations.
This breakthrough could act as a catalyst for AI advancements but requires rigorous validation. Peer-reviewed technical details are essential to assess its generality, particularly for applications beyond scientific computing. If proven robust, it would lower barriers to cutting-edge AI research while posing new challenges in hardware-software alignment and sustainable scaling. Organizations with early access to such optimizations could gain significant competitive advantages, potentially widening existing gaps in AI innovation capacity.
---
References
1. SCMP: *Chinese algorithm boosts Nvidia GPU performance 800-fold in science computing*
2. Luo et al., *CSR5 sparse matrix format*, IEEE Trans. Comput. (2017)
3. Volkov & Demmel, *GPU dense linear algebra*, Supercomputing (2008)
4. Owens et al., *GPU computing survey*, Proc. IEEE (2007)
🙂
☑️ #38 Jan 30, 2025 🟡 emailed
Substack Reads weekly stack: DeepSeek’s stories
+ Related content:
Substack Reads > About: [Excerpt] Substack Reads is a weekly roundup of the greatest essays, art, and ideas created right here on Substack. Posts are recommended by staff and readers, and curated and edited from Substack’s U.K. outpost with writer Hannah Ray, and guest editors, published on Saturdays.
🙂
☑️ #37 Jan 29, 2025 🟥 0-shot prompting
[Topic] Commoditized AI’s industrial lobbies
deepinfra.com (Llama 3.3 70B Turbo & DeepSeek-V3): [Sources not verified. Do your own research] Based on the events of Monday, January 27, 2025, and the reaction of the American stock market to the new reasoning model offered by DeepSeek, I'll provide some insights:
Several industrial lobbies are likely interested in commoditized-AI data centers. The significant decline in the Independent Power & Renewables (-23.20%) and Electrical Equipment (-11.05%) industries suggests that the market is concerned about the potential disruption caused by commoditized-AI data centers. This could lead to increased interest from industrial lobbies, such as those representing traditional energy and equipment manufacturers, to invest in or partner with companies that offer commoditized-AI data center solutions. These lobbies may see commoditized-AI data centers as a way to reduce costs, increase efficiency, and stay competitive in a rapidly changing market.
[DeepSeek-V3] The sharp declines in those industries suggest a shift in focus toward more standardized, cost-effective AI infrastructure. Commoditized-AI data centers reduce the need for specialized, high-margin engineering and equipment, which could explain the negative performance of these sectors.
Industrial lobbies, especially those representing energy, construction, and technology, may see commoditized data centers as a way to lower costs and scale AI applications more broadly. This aligns with the trend toward efficiency and scale in AI deployment, rather than bespoke, high-cost solutions.
Possibly, commercial AI may be inevitably commoditized more than optimized. The market reaction to DeepSeek's new reasoning model may indicate that investors are betting on the widespread adoption of commoditized-AI solutions, which could lead to a decrease in the value of optimized AI solutions. The decline in the Semiconductor & Equipment (-13.85%) industry, which is a key player in the development of optimized AI solutions, supports this hypothesis. As commoditized-AI solutions become more prevalent, the demand for high-performance, customized AI solutions may decrease, leading to a shift in focus towards more affordable and accessible commoditized-AI solutions.
[DeepSeek-V3] Commoditization seems likely to outpace optimization in the near term. The market reaction to DeepSeek's model suggests that the focus is shifting toward scale and accessibility rather than cutting-edge performance. The decline in Semiconductor & Equipment (-13.85%) may indicate a reduced emphasis on specialized hardware for AI optimization, as commoditized solutions become more dominant.
However, optimization will remain critical for niche applications (e.g., healthcare, defense, or autonomous systems) where performance is paramount. But for broad commercial use, commoditized AI (offering "good enough" performance at lower costs) will likely dominate due to its potential for widespread adoption and integration into existing systems.
Not necessarily, algorithm innovations may still be important, even in a scenario where AI-commoditized real estate developments become prominent. While the market reaction suggests that commoditized-AI solutions are gaining traction, it's unlikely that algorithm innovations will become less important. Instead, the focus may shift towards developing more efficient, scalable, and affordable algorithms that can be deployed in commoditized-AI environments. The Construction & Engineering (-15.57%) industry decline may be related to the potential disruption caused by AI-commoditized real estate developments, but it's unlikely to diminish the importance of algorithm innovations. In fact, the demand for innovative algorithms may increase as companies seek to differentiate themselves in a commoditized-AI market.
[DeepSeek-V3] In the short term, real estate developments may overshadow algorithmic innovations, especially if commoditized-AI data centers become the norm. The significant drop in Construction & Engineering suggests a possible oversupply or reduced demand for new, specialized infrastructure, but the long-term trend may still favor real estate for AI data center expansion.
Keep in mind that these are hypothetical scenarios, and the actual market reaction may be driven by a complex array of factors. However, based on the provided information, it appears that the market is responding to the potential disruption caused by commoditized-AI solutions, and this may have significant implications for various industries and companies involved in the AI ecosystem.
+ Related content:
S&P Composite 1500 Index (Top 4 Industries - laggards; Daily Index performance based on Total Return Jan-27-2025):
-23.20% Independent Power & Renewables
-15.57% Construction & Engineering
-13.85% Semiconductor & Equipment
-11.05% Electrical Equipment
🙂
☑️ #38 Jan 27, 2025
Chip independence is basically a national focus at this point in China
@Dorialexander: I feel this should be a much bigger story: DeepSeek has trained on Nvidia H800s but is running inference on the new homegrown Chinese chips made by Huawei, the 910C.
⚡️
[2/6] The 910Cs are an alternative to the H100 and have just been released. Chip independence is basically a national focus at this point in China: it’s extremely hard to reconstruct one of the most complex industrial chains in the world, but they have high incentives for it.
[3/6] For now, the objective is not fully attained. Export restrictions affect the lithography machines that create GPU chips in the first place. The 910C is (slightly) less performant and, even more importantly, does not yet come with a good interconnect, which is critical for training.
[4/6] Yet Huawei is catching up. The 910C was made primarily with inference in mind, and you don’t need to connect that many GPUs for that. And the next chip, the 920C, is aiming for B200 performance (the current Nvidia flagship).
[5/6] It seems DeepSeek is now contemplating training their V4 model on 32k 910Cs. Even though Huawei struggles to meet demand, this would be such a strong symbolic statement that they will certainly prioritize delivering them. And DeepSeek is probably the one team that can make the interconnect work.
[6/6] If this ever happens, it changes the US position in the global AI market overnight. The local market may be sheltered for some time by tariffs plus a forced sale à la TikTok. Elsewhere, well, if we have competing Chinese labs breaking prices for both models and chips…
+ Related content:
@EleanorOlcott: Huawei is working with AI groups, including DeepSeek, to make Ascend AI chips work for inference. Beijing told Chinese big tech to buy domestic AI chips to wean reliance on Nvidia. Huawei has sent teams of engineers to customers to help run Nvidia-trained models on Ascend.
[2/2] Read more from @zijing_wu and myself in these two pieces:
ft.com (1/22/25): [Excerpt] TikTok owner ByteDance plans to spend $12bn on AI chips in 2025.
Chinese company seeks growth from new technology as social media business comes under pressure in US.
ft.com (1/20/25): [Excerpt] Huawei seeks to grab market share in AI chips from Nvidia in China.
Tech giant pushes its artificial intelligence chips as hardware of choice for ‘inference’ tasks.
ctee.com.tw (1/21/25): [Excerpt] [Translated from CN] At the end of December last year, the third phase of the Big Fund invested in the establishment of the Huaxinding New Equity Investment Fund and the Guotuji New Equity Investment Fund, with investments of 93 billion and 71 billion yuan respectively. With the launch of these funds, the third phase of the Big Fund is expected to raise more external money and strengthen capital investment in chip enterprises. Earlier analysis pointed out that the third phase of the Big Fund is actively strengthening the local supply chain in response to the possibility of stricter rules after the inauguration of President-elect Trump.
Mainland media reported on the 20th that the above-mentioned partnership was established on the 17th. The partners are Guozhi Investment (Shanghai) Private Equity Fund Management Co., Ltd. and the third phase of the Big Fund, and the executive partner is Guozhi Investment.
According to the available information, Guozhi Investment was established in November 2024. It is a member of Shanghai Guosheng Group, with a state-owned background and a registered capital of 100 million yuan. The third phase of the Big Fund was registered on May 24, 2024, with a registered capital of 344 billion yuan, more than the 138.72 billion yuan of the first phase and the 204.15 billion yuan of the second phase combined.
A number of banks recently released reports pointing out that the third phase of the Big Fund aims to guide social capital into multi-channel financing support for the semiconductor industry, for the development of the entire semiconductor industry chain.
As for the specific investment areas of the third phase of the Big Fund, industry insiders believe that, in addition to the previous "chokepoint" fields of semiconductor manufacturing, equipment, materials and parts, the rise of AI technology over the past two years means that closely related key AI semiconductor fields, such as computing chips and high-bandwidth memory chips, may also become a new investment focus.
Huaxin Securities said that computing chips and memory chips will become the key focus of the industrial chain. In addition to continuing to invest in semiconductor equipment, materials and so on, the third phase of the Big Fund is more likely to focus on areas such as HBM chips as its key investment direction.
Xingye Securities said that although the new AI control regulations introduced by the United States impose strict export controls on some countries and regions, they also promote the self-reliance of domestic computing power.
🙂
☑️ #37 Jan 27, 2025
Why buy the cow when you can get the milk for free?
[Probably synthetic data powered by an unidentified LLM model]
@GavinSBaker: 1) DeepSeek r1 is real, with important nuances. Most important is the fact that r1 is so much cheaper and more efficient to inference than o1, not the $6m training figure. r1 costs 93% less to *use* than o1 per API call, can be run locally on a high-end workstation, and does not seem to have hit any rate limits, which is wild. Simple math is that every 1b active parameters requires 1 gb of RAM in FP8, so r1 requires 37 gb of RAM. Batching massively lowers costs and more compute increases tokens/second, so there are still advantages to inference in the cloud. Would also note that there are true geopolitical dynamics at play here and I don’t think it is a coincidence that this came out right after “Stargate.” RIP, $500 billion - we hardly even knew you.
Real:
It is/was the #1 download in the relevant App Store category. Obviously ahead of ChatGPT; something neither Gemini nor Claude was able to accomplish.
It is comparable to o1 from a quality perspective, although it lags o3.
There were real algorithmic breakthroughs that led to it being dramatically more efficient both to train and inference. Training in FP8, MLA and multi-token prediction are significant.
It is easy to verify that the r1 training run only cost $6m. While this is literally true, it is also *deeply* misleading.
Even their hardware architecture is novel and I will note that they use PCI-Express for scale up.
Nuance:
The $6m does not include “costs associated with prior research and ablation experiments on architectures, algorithms and data” per the technical paper. “Other than that Mrs. Lincoln, how was the play?” This means that it is possible to train an r1 quality model with a $6m run *if* a lab has already spent hundreds of millions of dollars on prior research and has access to much larger clusters. Deepseek obviously has way more than 2048 H800s; one of their earlier papers referenced a cluster of 10k A100s. An equivalently smart team can’t just spin up a 2000 GPU cluster and train r1 from scratch with $6m. Roughly 20% of Nvidia’s revenue goes through Singapore. 20% of Nvidia’s GPUs are probably not in Singapore despite their best efforts.
There was a lot of distillation - i.e. it is unlikely they could have trained this without unhindered access to GPT-4o and o1.
As @altcap pointed out to me yesterday, kinda funny to restrict access to leading edge GPUs and not do anything about China’s ability to distill leading edge American models - obviously defeats the purpose of the export restrictions. Why buy the cow when you can get the milk for free?
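[A worked version of the "1 GB per 1B active parameters in FP8" rule of thumb in the thread above. The 37B active / 671B total parameter counts are from DeepSeek's V3/R1 technical reports; note that 37 GB is the per-token working set, while the full model still has to be resident (or paged) somewhere.]
```python
# Worked version of the "1 GB per 1B active parameters in FP8" rule of thumb above.
# Parameter counts are from DeepSeek's V3/R1 technical reports; the rest is arithmetic.

def weight_bytes_gb(params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Memory for the weights alone, in GB (FP8 = 1 byte per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

active_b = 37    # parameters activated per token (MoE routing)
total_b = 671    # total parameters in the mixture-of-experts model

print(f"active weights per token: ~{weight_bytes_gb(active_b):.0f} GB")  # ~37 GB
print(f"all weights resident:     ~{weight_bytes_gb(total_b):.0f} GB")   # ~671 GB
# 37 GB is the per-token working set behind the tweet's figure; serving the model
# without paging experts still needs the full ~671 GB of FP8 weights plus KV cache.
```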
⚡️
@GavinSBaker: 2) Conclusions:
Lowering the cost to train will increase the ROI on AI.
There is no world where this is positive for training capex or the “power” theme in the near term.
The biggest risk to the current “AI infrastructure” winners across tech, industrials, utilities and energy is that a distilled version of r1 can be run locally at the edge on a high end work station (someone referenced a Mac Studio Pro). That means that a similar model will run on a superphone in circa 2 years. If inference moves to the edge because it is “good enough,” we are living in a very different world with very different winners - i.e. the biggest PC and smartphone upgrade cycle we have ever seen. Compute has oscillated between centralization and decentralization for a long time.
ASI is really, really close and no one really knows what the economic returns to superintelligence will be. If a $100 billion reasoning model trained on 100k plus Blackwells (o5, Gemini 3, Grok 4) is curing cancer and inventing warp drives, then the returns to ASI will be really high and training capex and power consumption will steadily grow; Dyson Spheres will be back to being best explanation for Fermi’s paradox. I hope the returns to ASI are high - would be so awesome.
This is all really good for the companies that *use* AI: software, internet, etc.
From an economic perspective, this massively increases the value of distribution and *unique* data - YouTube, Facebook, Instagram and X.
American labs are likely to stop releasing their leading edge models to prevent the distillation that was so essential to r1, although the cat may already be entirely out of the bag on this front, i.e. r1 may be enough to train r2, etc.
Grok-3 looms large and might significantly impact the above conclusions. This will be the first significant test of scaling laws for pre-training arguably since GPT-4. In the same way that it took several weeks to turn v3 into r1 via RL, it will likely take several weeks to run the RL necessary to improve Grok-3’s reasoning capabilities. The better the base model, the better the reasoning model should be, as the three scaling laws are multiplicative - pre-training, RL during post-training and test-time compute during inference (a function of the RL). Grok-3 has already shown it can do tasks beyond o1 - see the Tesseract demo - how far beyond is going to be important. To paraphrase an anonymous Orc from “The Two Towers,” meat might be back on the menu very shortly. Time will tell and “when the facts change, I change my mind.”
🙂
☑️ #36 Jan 27, 2025 🟥 Zero-shot prompting
[Topic] Algorithmic progress vs AI-optimized data centers and the Oracle’s ability to build large-scale data center campuses
lmarena.ai (DeepSeek R1): [Sources not verified. Do your own research] The expansion of Oracle's data center campuses through the Stargate project remains critically necessary despite advancements in AI algorithms. Here's a structured analysis of the key factors supporting this conclusion:
Algorithmic Efficiency vs. Computational Demand
Efficiency Gains: While improved algorithms reduce computational needs per task, they also enable more complex AI capabilities (e.g., AGI/ASI), which inherently require exponentially greater resources. Efficiency often drives demand, as seen historically with Jevons Paradox, where lower costs per unit of computation lead to increased overall usage.
Scalability of AI Applications: As AI models become more capable, their deployment across industries (healthcare, finance, autonomous systems) will expand, necessitating infrastructure to handle a surge in demand.
AGI/ASI Requirements
Unprecedented Compute Needs: AGI/ASI systems, if realized, will likely demand vast, specialized computational resources for tasks like continuous learning, real-time reasoning, and multimodal processing. Tailored data centers are essential to meet these requirements, which current infrastructure may not support.
Market and Strategic Positioning
Competitive Edge: Oracle’s partnership with NVIDIA, Microsoft, and OpenAI positions it to dominate the cloud market for AI. Delaying expansion risks ceding ground to competitors like AWS or Google Cloud, which are also investing heavily in AI infrastructure.
Vertical Integration: Stargate’s ecosystem (Arm’s energy-efficient chips, NVIDIA GPUs, Microsoft’s cloud expertise) creates a synergistic platform optimized for cutting-edge AI, attracting enterprise clients and researchers.
Timeline and Infrastructure Lead Times
Long-Term Investment: Data center construction takes years, and AI progress is accelerating. Proactive expansion ensures readiness for future breakthroughs, avoiding bottlenecks that could hinder AGI/ASI development.
Future-Proofing: Even if algorithms advance, infrastructure built today can incorporate modular, upgradable designs to adapt to new technologies (e.g., quantum computing, advanced cooling systems).
Sustainability and Regulatory Factors
Energy Efficiency: New data centers can integrate renewable energy and advanced cooling solutions, addressing environmental concerns while scaling compute capacity—a critical factor as AI’s carbon footprint draws scrutiny.
Conclusion
Algorithmic progress enhances AI capabilities but does not negate the need for robust infrastructure. Instead, it amplifies the potential applications and complexity of AI systems, driving greater demand for specialized data centers. Oracle’s Stargate project is a strategic move to capture market share, support AGI/ASI development, and ensure scalability in an AI-driven future. The $500 billion investment reflects a long-term vision where infrastructure and innovation are interdependent, not mutually exclusive.
+ Related content:
Don’t Worry About the Vase (1/24/25): Stargate AI-1 (Abilene Clean Campus / Stargate Site 1, Texas)
@siliconomy: Assuming the founding partners of Stargate can find an initial $100B in their combined cookie jars and that the investment is immediate, as indicated (according to rumours, it is already underway), this is the Capital Expenditures of the Top AI Companies.
The six largest companies driving AI investments are expected to have Capital Expenditures of $330B in 2025, representing 41% growth over 2024.
Adding Stargate to the equation increases the 2025 capex by 30%, representing a CapEx growth of 84% over 2024.
While the AI demand is considered insatiable, eventually, there has to be some method to the madness. Japanese and Arab banks are not investing in US sovereign AI for philanthropic reasons. There needs to be an ROI and, with that, a business model.
I cannot remember when a $100B corporate investment was announced without a business model. Read more here: https://lnkd.in/d6G-t_S9
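As a quick back-of-the-envelope check, the figures quoted above can be reproduced in a few lines of Python; the $330B, 41% and +30% inputs come from the post, and nothing else is assumed:

```python
# Back-of-the-envelope check of the capex figures quoted above.
capex_2025_top6 = 330e9          # USD, top six AI spenders, 2025 estimate
growth_2025 = 0.41               # 41% growth over 2024

capex_2024 = capex_2025_top6 / (1 + growth_2025)        # implied 2024 base, ~$234B
capex_2025_with_stargate = capex_2025_top6 * 1.30       # Stargate adds ~30%
implied_growth = capex_2025_with_stargate / capex_2024 - 1

print(f"Implied 2024 capex: ${capex_2024 / 1e9:.0f}B")
print(f"2025 capex incl. Stargate: ${capex_2025_with_stargate / 1e9:.0f}B")
print(f"Growth over 2024: {implied_growth:.0%}")        # ~83-84%, matching the post
```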
@WhiteHouse (1/21/25): President Trump has unveiled a $500 BILLION American AI investment alongside tech leaders Larry Ellison, Masayoshi Son, and Sam Altman, set to create 100,000 American jobs almost immediately! America is leading the way into the future! 🇺🇸
@bigcountryhomepage7078 (10/30/24): Construction has come a long way out at the Lancium clean compute campus from its conception and announcement over three years ago. Company leadership, Abilene, Taylor County, and state leadership taking time Tuesday to come together and celebrate the unofficial milestone amid the infrastructure of what will one day be a driving force of technological innovation. Large groups coming to tour the campus and hear what's in store later on down the road.
lancium.com (10/15/24): [Excerpt] Crusoe, Blue Owl Capital and Primary Digital Infrastructure Enter $3.4 billion Joint Venture for AI Data Center Development.
The Crusoe AI data center is a build-to-suit, two-building data center constructed to industry leading efficiency and reliability standards, and capable of supporting high energy density IT applications. The project is 100% long-term leased to a Fortune 100 hyperscale tenant with occupancy expected to begin in 1H 2025. Supported by Blue Owl’s investment and developed by Crusoe, the project will incorporate an innovative DC design, sharpened to support cutting edge AI workloads at an industry-leading scale. The design will be optimized for direct-to-chip liquid cooling and will also accommodate air cooling. At completion, the data center will be able to operate up to 100,000 GPUs on a single integrated network fabric, advancing the frontier of data center design and scale for AI training and inference workloads.
According to the announcement, 10 buildings are currently being built and will later be expanded to 20. The campus covers an area of 1,100 acres and reportedly has secured 1.2 GW of power, with 200 MW deployed as of January 2025.
🙂
☑️ #36 Jan 24, 2025
Meta is building a 2GW+ datacenter
@zuck: This will be a defining year for AI. In 2025, I expect Meta AI will be the leading assistant serving more than 1 billion people, Llama 4 will become the leading state of the art model, and we'll build an AI engineer that will start contributing increasing amounts of code to our R&D efforts. To power this, Meta is building a 2GW+ datacenter that is so large it would cover a significant part of Manhattan.
⚡️
@zuck: We'll bring online ~1GW of compute in '25 and we'll end the year with more than 1.3 million GPUs. We're planning to invest $60-65B in capex this year while also growing our AI teams significantly, and we have the capital to continue investing in the years ahead. This is a massive effort, and over the coming years it will drive our core products and business, unlock historic innovation, and extend American technology leadership. Let's go build!
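For scale, a naive ratio of the two figures in the post (the quoted $60-65B capex range and the 1.3 million GPUs); capex covers land, buildings, power and networking as well as accelerators, so this is an implied ratio, not a GPU price estimate:

```python
# Naive scale check: planned 2025 capex divided by the expected year-end GPU fleet.
capex_range_usd = (60e9, 65e9)   # planned 2025 capex, low and high end
gpu_fleet = 1.3e6                # GPUs expected by end of 2025

for capex in capex_range_usd:
    print(f"${capex / 1e9:.0f}B capex -> ${capex / gpu_fleet:,.0f} of capex per GPU in the fleet")
```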
+ Related content:
datacenters.atmeta.com: Data Centers > Locations
Meta's new 2GW+ datacenter will be located in Richland Parish, Louisiana (pdf). This facility is set to become Meta's largest data center to date and is designed to play a vital role in accelerating the company's AI progress.
$10 billion+ investment
2024 break ground
5,000+ skilled trade workers on site at peak construction
500+ operational jobs supported once completed
🙂
☑️ #35 Jan 24, 2025 🟥 Zero-shot prompting
[Topic] Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data.
@grok: [Sources not verified. Do your own research] Determining the exact associated costs for prior research and ablation experiments related to DeepSeek-V3 involves a degree of speculation due to the complexities and often proprietary nature of such research expenses.
However, based on general industry practices and available information, here's a theoretical scheme for estimating these additional costs:
Personnel Costs:
Research Teams: Developing models like DeepSeek-V3 involves teams of highly skilled researchers, engineers, and data scientists. Their salaries can be substantial, especially in tech hubs where AI talent is in high demand. Assuming a team of 50-100 researchers at an average annual cost of $200,000-$300,000 per person for a couple of years, this could easily reach into the tens of millions:
For example, if we consider 75 researchers working for 2 years at $250,000/year, this would be approximately $37.5 million for personnel alone.
Infrastructure and Compute Costs:
Preliminary Experiments: Before the final training run, numerous smaller experiments are conducted. These experiments might use less compute but are still significant. If we assume these experiments collectively consume about 20-30% of the GPU hours used in the final training phase:
Given the 2.788M GPU hours at $2 per hour for the final training, preliminary experiments could cost around $1.1 to $1.7 million in compute.
Data Acquisition and Management:
Data Costs: Collecting, cleaning, and labeling large datasets require resources. The cost here can vary widely based on whether data is generated synthetically or bought/annotated. If we estimate the cost of data management and acquisition at 10% of the total project cost (excluding final training), this could add:
Approximately another $3 to $5 million, considering the project's scale.
Software Development and Tools:
Custom Software: Developing specialized algorithms, frameworks, and tools for training efficiency involves significant software engineering efforts. This might include costs for licensing software, developing custom solutions, or even hardware-software co-design. An estimate here could be:
Around $2 to $3 million for software development over the project's lifecycle.
Miscellaneous and Overheads:
Other Expenses: This includes office space, utilities, travel, conferences, and other indirect costs. Assuming these are around 5-10% of the total personnel and infrastructure costs:
An additional $2 to $4 million could be a reasonable estimate.
Putting these together, we can theoretically estimate the costs associated with prior research and ablation experiments:
Personnel: $37.5 million
Preliminary Compute: $1.1 - $1.7 million
Data: $3 - $5 million
Software Development: $2 - $3 million
Miscellaneous: $2 - $4 million
Total estimated additional costs: Approximately $45.6 to $51.2 million.
Please note, this is a very rough estimate and actual costs could be higher or lower depending on numerous factors like the intensity of experimentation, the scale of the team, the efficiency of the research process, and the specific economic conditions in the region where the research is conducted. These numbers are based on typical industry benchmarks and the specifics of DeepSeek's operations might differ.
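A minimal Python sketch reproducing the arithmetic above; the team size, salaries and percentage ranges are the assumptions stated in the estimate itself, not figures disclosed by DeepSeek:

```python
# Reconstruction of the rough estimate above. All inputs are assumptions from the text.
GPU_HOURS_FINAL = 2.788e6        # H800 GPU hours for the official training run
GPU_PRICE = 2.0                  # assumed USD per GPU hour

final_training = GPU_HOURS_FINAL * GPU_PRICE                      # ~$5.576M, the paper's figure

personnel = 75 * 2 * 250_000                                      # 75 researchers, 2 years, $250k/yr
prelim_compute = (0.20 * final_training, 0.30 * final_training)   # 20-30% of the final run
data = (3e6, 5e6)
software = (2e6, 3e6)
misc = (2e6, 4e6)

low = personnel + prelim_compute[0] + data[0] + software[0] + misc[0]
high = personnel + prelim_compute[1] + data[1] + software[1] + misc[1]

print(f"Official training run: ${final_training / 1e6:.3f}M")
print(f"Estimated additional R&D costs: ${low / 1e6:.1f}M to ${high / 1e6:.1f}M")
```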
+ Related content:
arxiv.org (pdf): [Excerpt] DeepSeek-v3 Technical Report.
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks.
github.com (pdf): [Excerpt] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.
In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ GRPO (Shao et al., 2024) as the RL framework to improve model performance in reasoning.
🙂
☑️ #34 Dec 27, 2024
TSMC Arizona
spectrum.ieee.org: [Excerpt] The U.S. will start manufacturing advanced chips.
A TSMC fab will open in Arizona in 2025, a test of the CHIPS Act.
In late October 2024, the company announced that yields at the Arizona plant were 4 percent higher than those at plants in Taiwan, a promising early sign of the fab’s efficiency. The current fab is capable of operating at the 4-nanometer node, the process used to make Nvidia’s most advanced GPUs. A second fab, set to be operational in 2028, plans to offer 2- or 3-nm-node processes. Both 4-nm and more advanced 3-nm chips began high-volume production at other TSMC fabs in 2022, while the 2-nm node will begin volume production in Taiwan this year. In the future, the company also has plans to open a third fab in the United States that will use more advanced technology.

+ Related content:
tsmc.com: [Excerpt] TSMC Arizona. In the city of Phoenix, TSMC Arizona will represent the world’s most advanced semiconductor technology in the United States.
@CNBC (12/13/24): TSMC’s New Arizona Fab! Apple Will Finally Make Advanced Chips In The U.S.
nist.gov > TSMC Arizona > Key Facts.
🙂
☑️ #33 Dec 16, 2024
The news of Intel Corp
(1) Intel’s foundry business
siliconangle.com: [Excerpt] A bold plan to spin out and revive Intel’s foundry business.
This special Breaking Analysis outlines a bold plan for spinning out Intel’s foundry business, relying on multi-stakeholder investments from tech giants, private equity and government funding, coupled with strategic partnerships with industry leaders such as Taiwan Semiconductor Manufacturing Corp. or possibly Samsung Electronics Co. Ltd — because only TSMC (or Samsung) has the necessary expertise to design, build and operate modern foundries and get to profitability in a reasonable timeframe.
(2) Altera chip unit
bloomberg.com (update; 12/19/24): [Excerpt] Intel said to shortlist suitors for Altera chip unit: Intel Corporation has narrowed down a list of potential acquirers for its Altera chip division and is inviting formal offers in January, as reported by local news outlets on Thursday.
Among the interested parties are private equity firms like Francisco Partners and Silver Lake Management, which are competing alongside Lattice Semiconductor Corp. in the second bidding round for Altera. The division is known for its expertise in designing low-power programmable chips.
The tech giant has curated a shortlist of prospective buyers for its Altera segment as part of Intel's broader strategy to optimize its business operations. According to sources familiar with the matter, the move aims to enhance focus on its primary sectors, which include cutting-edge semiconductor technology and artificial intelligence.
+ Related content:
altera.com: [Excerpt] Altera, an Intel Company, provides leadership programmable solutions that are easy-to-use and deploy in applications from the cloud to the edge, offering limitless AI possibilities. Our end-to-end broad portfolio of products including FPGAs, CPLDs, Intellectual Property, development tools, System on Modules, SmartNICs and IPUs provide the flexibility to accelerate innovation.
Our innovation of programmable logic started in 1983 in Silicon Valley. In 1984, Altera unveiled the world’s first programmable logic device capable of being programmed, erased, and reprogrammed altering the future of innovation.
cnbc.com (10/17/24): [Excerpt] Intel seeks billions for minority stake in Altera business, sources say.
A representative for Intel declined to comment. The sale process represents an abrupt change from Intel’s prior commentary on Altera. As recently as last month, CEO Pat Gelsinger said that Intel’s leadership considered the business to be a core part of Intel’s future.
Intel has previously said it could look to monetize Altera business through an IPO, possibly as soon as 2026. But the idea of taking strategic or private equity investment would be a marked acceleration of those plans.
thedigitalinsider.com (3/1/24): Intel Eyes AI Applications for Its Programmable Chip Unit Altera – Technology Org. Intel has high hopes in the field of artificial intelligence for its recently established standalone programmable chip unit Altera.
The unit, known officially as “Altera, an Intel company,” became independent at the beginning of the year. Experts highlight the versatility of the programmable FPGA chips designed by Altera, emphasizing their utility in AI applications and computing tasks that lie between specialized processors developed by cloud computing giants like Amazon and general-purpose AI chips produced by Nvidia.
As the hardware landscape for AI evolves, CEO Sandra Rivera sees programmable chips occupying a significant and expanding niche market. The parent company, Intel, intends to conduct a stock offering for Altera within the next two to three years.
The market estimation for programmable chips in 2023 ranged from $8 to $10 billion, though the exact potential remains uncertain due to a lack of comprehensive third-party data. Rivera believes the opportunity for programmable chips is more extensive than commonly acknowledged, attributing this to the myriad ways these chips assist in various stages of the AI workflow.
Intel’s current line of programmable chips, Agilex, is manufactured by Intel Foundry, the company’s contract manufacturing arm.
Rivera did not disclose whether Intel Foundry would produce the upcoming Agilex 3 chips, but she hinted at favorable treatment for the company’s business units, potentially receiving a “friends and family discount” based on their volumes. An Intel spokesperson later clarified that Intel Foundry’s customers may receive advantageous pricing depending on anticipated volumes, with Intel business units being the largest customer(s) of Intel Foundry at present.
intel.com (4/8/24): [Excerpt] Intel and Altera Announce Edge and FPGA Offerings for AI at Embedded World. New edge-optimized processors and FPGAs bring AI everywhere across edge computing markets including retail, industrial and healthcare.
@dnystedt (10/24/23): Intel plans to spin out its programmable chip (FPGA) unit from Jan. 1, 2024 and IPO within 2-3 years, media report, noting Intel paid US$16.7 billion for it in the 2015 Altera acquisition. Intel’s Sandra Rivera said it is increasingly using Intel’s fabs to produce the chips, rather than Taiwan (TSMC). $INTC $TSM https://reuters.com/technology/intel-operate-its-programmable-solutions-group-standalone-business-2023-10-03/
🙂
☑️ #32 Dec 16, 2024
Enhanced, refined, second guessed, changed, altered, revised, revisited and had holes punched thru it at every turn
@BrentM_SpaceX: Our process has been enhanced, refined, second guessed, changed, altered, revised, revisited and had holes punched thru it at every turn. That said, every turn we made the road got straighter and now our process is good BUT….. (you guessed it) still not good enough. The best can only get better and colossus is cooking so calling all (s).
Join the @xai team now to help turn sci-fi into reality and bring exceptional talent to an exceptional team. *Do not apply if you’re not hardcore. *Apply if you have exceptional skills, traits and willingness to get better every single day.
🙂
☑️ #31 Dec 14, 2024
5,381
@EugeneNg_VCap: US data centres.
🙂
☑️ #30 Dec 12, 2024
TSMC 2nm (n2) technology
spectrum.ieee.org: [Excerpt] TSMC Lifts the Curtain on Nanosheet Transistors. And Intel shows how far these devices could go.
N2 is “the fruit of more than four years of labor,” Geoffrey Yeap, TSMC vice president of R&D and advanced technology told engineers at IEDM. Today’s transistor, the FinFET, has a vertical fin of silicon at its heart. Nanosheet or gate-all-around transistors have a stack of narrow ribbons of silicon instead.
The difference not only provides better control of the flow of current through the device, it also allows engineers to produce a larger variety of devices, by making wider or narrower nanosheets. FinFETs could only provide that variety by multiplying the number of fins in a device—such as a device with one or two or three fins. But nanosheets give designers the option of gradations in between those, such as the equivalent of 1.5 fins or whatever might suit a particular logic circuit better.
Called Nanoflex, TSMC’s tech allows different logic cells built with different nanosheet widths on the same chip. Logic cells made from narrow devices might make up general logic on the chip, while those with broader nanosheets, capable of driving more current and switching faster, would make up the CPU cores.
+ Related content:
tsmc.com: [Excerpt] TSMC 2nm (N2) technology development is on track and made good progress. N2 technology features the company’s first generation of nanosheet transistor technology with full-node strides in performance and power consumption. Volume production is expected in 2025.
Major customers completed 2nm IP design and started silicon validation. TSMC also developed low resistance RDL (redistribution layer), super high performance metal-insulator-metal (MiM) capacitors to further boost performance.
TSMC N2 technology will be the most advanced technology in the semiconductor industry in both density and energy efficiency, when it is introduced in 2025. N2 technology, with leading nanosheet transistor structure will deliver full-node performance and power benefits, to address the increasing need for energy-efficient computing. With our strategy of continuous enhancements, N2 and its derivatives will further extend our technology leadership well into the future.
🙂
☑️ #29 Dec 11, 2024
Ayar Labs, with Investments from AMD, Intel Capital, and NVIDIA, Secures $155 million to Address Urgent Need for Scalable, Cost-Effective AI Infrastructure
ayarlabs.com: [Excerpt] Advent Global Opportunities and Light Street Capital lead Series D to accelerate high volume manufacturing of Ayar Labs’ in-package optical interconnects.
AI infrastructure is projected to see more than $1 trillion in investments over the next decade, highlighting the critical need for solutions that eliminate bottlenecks created by traditional copper interconnects and pluggable optics.
Ayar Labs has developed the industry’s first in-package optical I/O solution to replace electrical I/O that is standards-based, commercial-ready, and optimized for AI training and inference. Optical I/O allows customers to maximize the compute efficiency and performance of their AI infrastructure, while reducing costs and power consumption, to dramatically improve profitability metrics for AI applications.
🙂
☑️ #28 Dec 10, 2024
The only US-based manufacturer of memory chips
whitehouse.gov: [Excerpt] Today, thanks to our historic legislation, the Department of Commerce has finalized one of its largest awards to date with Micron Technology, the only U.S. based manufacturer of memory chips. This more than $6.1 billion investment in Clay, NY and Boise, ID supports the construction of several state-of-the-art memory chips facilities as part of Micron’s total $125 billion investment over the next few decades, creating at least 20,000 jobs by the end of the decade. These investments will help the U.S. grow its share of advanced memory manufacturing from nearly 0% today to 10% over the next decade.
+ Related content:
micron.com > Memory: [Excerpt] Transforming how the world uses information.
Micron offers a rich portfolio of innovative memory solutions that are enabling advances across a wide range of markets and industries. From CXL memory, to high-speed GDDR6, to DDR5 and NVDIMMs, Micron memory products deliver fast data access speeds, reduce power consumption, and lower the space requirements.
DRAM components
DRAM modules
Low-power DRAM components
High-bandwidth memory
Graphics memory
CXL memory
micron.com (updated June 2024): How DRAM changed the world.
🙂
☑️ #27 Dec 10, 2024
Bezos-backed U.S. chip designer plans foray into Tokyo
asia.nikkei.com: [Excerpt] Tenstorrent eyes outsourcing to Rapidus to meet demand for cutting-edge devices.
TOKYO -- U.S. chip designer Tenstorrent will launch a business in Japan to design cutting-edge semiconductors, Nikkei has learned. The company backed by Amazon founder Jeff Bezos has seen an increase in demand from such clients as self-driving technology companies and data center operators. It will also consider outsourcing the semiconductor production to Japanese chipmaker Rapidus.
Founded in 2016, Tenstorrent is headed by CEO Jim Keller who has designed chips at Apple, Advanced Micro Devices and Tesla. In December, it announced an investment of $693 million from Bezos and other sponsors.
+ Related content:
rapidus.inc: [Excerpt] Japanese development and manufacturing capabilities, with global collaboration. Rapidus develops and manufactures the world’s most advanced logic semiconductors.
tenstorrent.com: [Excerpt] We bring together experts in the field of computer architecture, ASIC design, advanced systems, and neural network compilers to build the next generation of computing.
🙂
☑️ #26 Dec 9, 2024
Data centers are driving US power demand to hard-to-reach heights
Big Tech wants massive amounts of energy to fuel their AI ambitions. That could strain utilities, ratepayers, and efforts to decarbonize the grid.
This rapid demand growth is not spread evenly across the country. “Most of the load growth is occurring in the Dallas–Fort Worth region, in the Northern Virginia region, and in the Atlanta region,” Wilson said — regions where data-center developers are seeking gigawatts of power for projects they hope to build as quickly as possible.

+ Related content:
gridstrategiesllc.com (December 2024): Strategic Industries Surging: Driving US Power Demand.
🙂
☑️ #25 Dec 5, 2024
Supermicro is here to support xAI's massive 10-fold expansion of the Colossus supercomputer in Memphis
@charlesliang: Supermicro is here to support xAI's massive 10-fold expansion of the Colossus supercomputer in Memphis with over 1 million GPUs by establishing local operations/production, validation, service and support. With our optimized datacenter building blocks (DCBBS) and ambient temperature direct liquid cooling (DLC), Supermicro green computing provides the best AI performance propelling the future of AI alongside @NVIDIA while reducing datacenter power, space, water and cost. Stay tuned for more datacenter-scale innovations! #AI #Supercomputing #SMCI
⚡️
@AGItechgonewild: Awesome, Charles! You have legendary expertise in building data centers that is super appreciated!
🙂
☑️ #24 Dec 5, 2024
Insatiable AI Data Centers
@EugeneNg_VCap: What Data Center growth could look like over the next 5 years.
🙂
☑️ #23 Dec 4, 2024
Episode 64: NVIDIA's Josh Parker on AI and Energy Efficiency
Special Competitive Studies Project: NVIDIA's Josh Parker joins Jeanne Meserve for a conversation on the energy efficiency of AI, the role of data centers, and the potential of renewable energy.
🙂
☑️ #22 Dec 4, 2024
27 data centers and counting
facebook.com: [Excerpt] Hello, Louisiana!
We are excited to announce that Richland Parish, Louisiana, will be home to Meta’s newest data center — our 23rd data center in the United States and 27th in the world. This custom-designed 4 million-square-foot campus will be our largest data center to date. It will play a vital role in accelerating our AI progress.
Once completed, the Richland Parish Data Center will represent an investment of more than $10 billion in Louisiana and will support over 500 operational jobs. We are also investing over $200 million in local infrastructure improvements.
+ Related content:
datacenters.atmeta.com (Other posts in 2024):
11/14/24 online Tennessee - Gallatin
8/29/24 <work in progress> South Carolina - Aiken
7/2/24 <work in progress> Wyoming - Cheyenne
5/2/24 <work in progress> Alabama - Montgomery
3/14/24 <work in progress> Minnesota - Rosemount
1/25/24 <work in progress> Indiana - Jeffersonville
@StockMKTNewz: Facebook $META just unveiled plans to invest $10 Billion to set up an AI data center in Louisiana, in what would be its largest data center in the world - Reuters.
🙂
☑️ #21 Dec 2, 2024 🔴 rumor
OpenAI Cloud and datacenters
ft.com: [Excerpt] OpenAI targets 1bn users in next phase of growth. ChatGPT-maker aims for big boost from new AI products, Apple partnership and infrastructure investment.
She [Sarah Friar] added: “We’re in a massive growth phase, it behoves us to keep investing. We need to be on the frontier on the model front. That is expensive.”
To achieve its goals, OpenAI plans to invest in building clusters of data centres in parts of the US midwest and south-west, according to Chris Lehane, OpenAI’s new policy chief.
This push to build its own AI infrastructure follows a similar strategy by Big Tech rivals such as Google and Amazon. Lehane said “chips, data and energy” are the critical resources required to succeed in the AI race.
🙂
☑️ #20 Nov 27, 2024
Green light?
@Supermicro_SMCI: Green means go. With all the green LED lights illuminated on NVIDIA NVLink Switches, our end-to-end liquid-cooled NVIDIA GB200 cluster is fully connected and ready to handle demanding AI workloads.
Learn more about our AI data center: https://hubs.la/Q02ZJ_Yz0
+ Related content:
ir.supermicro.com (last update: 11/20/24): Super Micro Computer Inc. Nasdaq Non-Compliance Update.
sec.gov (11/13/24) > NT 10-Q (Late filing notice) > Notification of Late Filing:
PART III — NARRATIVE
Super Micro Computer, Inc. (the “Company”) is unable to file its Quarterly Report on Form 10-Q for the period ended September 30, 2024 (the “Q1 2025 Form 10-Q”) in a timely manner without unreasonable effort or expense. As previously announced, the Company has been unable to file its Annual Report on Form 10-K for the period ended June 30, 2024 (the “2024 Form 10-K,” and together with the Q1 2025 Form 10-Q, the “Delinquent Reports”).
As previously disclosed, in response to information that was brought to the attention of the Audit Committee of the Company’s Board of Directors, the Board of Directors formed a committee (the “Special Committee”) to review certain of the Company’s internal controls and other matters (the “Review”). As disclosed in the Company’s Form 8-K filed on October 30, 2024, prior to the completion of the Review, the Company’s independent registered public accounting firm at the time (the “Former Firm”) resigned. The Company is diligently working to select an independent registered public accounting firm (a “Successor Firm”).
The Special Committee has completed its investigation based on a set of initial concerns raised by the Former Firm. The Special Committee has other work that is ongoing, but expects the Review to be completed soon. Additional time is also needed for: (i) the Company to select and engage a Successor Firm, (ii) the Company’s management to complete its assessment of the effectiveness of its internal controls over financial reporting as of June 30, 2024, and (iii) the Successor Firm to conduct its audit of the financial statements to be incorporated in the 2024 Form 10-K and conduct its audit of the Company’s internal controls over financial reporting as of June 30, 2024. Further additional time is needed for the Company to prepare the Q1 2025 Form 10-Q and for the Successor Firm to review the interim financial statements to be included in the Q1 2025 Form 10-Q. The Q1 2025 Form 10-Q cannot be completed and filed until the 2024 Form 10-K is completed and filed.
As a result of the foregoing, the Company needs additional time to finalize the financial statements and related disclosures to be filed as part of the Delinquent Reports.
🙂
☑️ #19 Nov 27, 2024
Take a peek inside the 4 levels of an Intel fab
@intelnews: Semiconductor factories, commonly known as "fabs," are manufacturing marvels — producing thousands of complex computer chips every hour, every day. Take a peek inside the 4 levels of an Intel fab, and learn how #semiconductors are brought to life: intel.com/content/www/us/en/newsroom/tech101/manufacturing-101-how-semiconductor-factory-works.html
+ Related content:
intel.com: [Excerpts] How a Semiconductor Factory Works. Discover how semiconductors come to life in some of the largest, most complex factories in the world.
What does it take to build a fab?
An Intel semiconductor factory, or “fab,” is a manufacturing marvel. Every hour, every day, the 70-foot-tall structure produces thousands of computer chips, the most complex products on Earth and each not much bigger than a fingernail.
A typical fab includes 1,200 multimillion-dollar tools and 1,500 pieces of utility equipment. It costs about $10 billion and takes three to five years and 6,000 construction workers to complete. Three of the fab’s four levels support the clean room, the home to chip production.
An inside look at the four levels of an Intel fab
First level: Interstitial and fan deck. The fan deck houses systems that keep the air in the clean room particle-free and precisely maintained at the right temperature and humidity for production.
Second level: Clean room level. A clean room is made up of more than 1,200 factory tools that take pizza-size silicon wafers and eventually turn them into hundreds of computer chips. Clean room workers wear bunny suits to keep lint, hair and skin flakes off the wafers. Fun fact: Clean rooms are usually lit with yellow lights. They are necessary in photolithography to prevent unwanted exposure of photoresist to light of shorter wavelengths.
Third level: Clean sub fab level. The clean sub fab contains thousands of pumps, transformers, power cabinets and other systems that support the clean room. Large pipes called “laterals” carry gases, liquids, waste and exhaust to and from production tools. Workers don’t wear bunny suits here, but they do wear hard hats, safety glasses, gloves and shoe covers.
Fourth level: Utility level. Electrical panels that support the fab are located here, along with the “mains” — large utility pipes and ductwork that feed up to the lateral pipes in the clean sub fab. Chiller and compressor systems also are placed here. Workers who monitor the equipment on this level wear street clothes, hard hats and safety glasses.
Where are Intel’s fabs?
Intel is upgrading or expanding new fabs in Oregon, Israel, Arizona, New Mexico and Ireland. New fabs are under construction in Ohio and planned for Germany.
🙂
☑️ #18 Nov 26, 2024
The story of ASML
@deedydas: A single fairly unknown Dutch company makes maybe the most expensive and complex non-military device ($200M) that builds on 40 years of Physics and has a monopoly responsible for all AI advancement today.
Here's the story of ASML, the company powering Moore's Law..
1/9
⚡️
2/9 ASML's extreme ultraviolet (EUV) machines are engineering marvels.
They shoot molten tin droplets 50,000x/s with a 25kW laser turning it into plasma as hot as the sun's surface to create 13.5nm UV light —so energetic it's absorbed by air itself.
⚡️
3/9 Each $200M machine contains mirrors that are the smoothest objects humans have ever created.
They're made with layers of molybdenum/silicon, each just a few atoms thick. If you scaled one to the size of Germany, its largest imperfection would be 1mm high.
⚡️
4/9 This light goes through the mirrors onto moving 300mm silicon wafers at highway speeds (~1m/s) with precision better than the width of a SINGLE SILICON ATOM (0.2nm).
That's like hitting a target in SF from NYC with the accuracy of a human hair.
⚡️
5/9 TSMC's 4nm process for NVIDIA H100 needs ~15 EUV layers (+80 DUV layers).
Each layer must align within nanometers. One machine processes ~100 wafers/hr. Cost? About $150K of chips per hour.
Other techniques cannot get the quality + throughput + cost to this level.
⚡️
6/9 Why monopoly?
The supplier network:
Zeiss (optics)
Trumpf (lasers)
VDL (frames)
40 years of co-development, 40,000 patents, 700+ suppliers. They own 24.9% of Zeiss's semiconductor div.
Replication would take decades + $100B+.
⚡️
7/9 The complexity is astounding.
Each machine ships in 40 containers and takes 4 months to install. The supply chain spans 700+ companies. 100K+ parts per machine, 40K patents protecting it.
One missing component = global semiconductor disruption.
⚡️
8/9 Only three companies can run cutting-edge EUV:
TSMC (that makes GPUs for Nvidia)
Samsung
Intel.
ASML machines are the only way to make chips dense enough for modern AI. Each H100 has 80B transistors. The next gen will need >100B.
Impossible without EUV.
⚡️
9/9 Rich Sutton's "The Bitter Lesson" is that general methods that leverage computation and Moore's Law are the most effective for advancing AI research.
In the iceberg of AI technology, while LLMs are at the top, ASML is at the murky depths. It has kept Moore's Law alive.
+ Related content:
@Edelweiss_Cap (update; 12/15/24): $ASML Quite fresh from the press. Looks interesting.
@ASMLcompany (11/28/24): From early design to volume manufacturing, our computational lithography software enables chipmakers to optimize the chip patterning process. Layer by layer, chip by chip, wafer by wafer, computational lithography is helping push microchip technology to new limits.
incompleteideas.net (3/13/19): [Excerpt] The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. Most AI research has been conducted as if the computation available to the agent were constant (in which case leveraging human knowledge would be one of the only ways to improve performance) but, over a slightly longer time than a typical research project, massively more computation inevitably becomes available. Seeking an improvement that makes a difference in the shorter term, researchers seek to leverage their human knowledge of the domain, but the only thing that matters in the long run is the leveraging of computation. These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other. There are psychological commitments to investment in one approach or the other. And the human-knowledge approach tends to complicate methods in ways that make them less suited to taking advantage of general methods leveraging computation. There were many examples of AI researchers' belated learning of this bitter lesson, and it is instructive to review some of the most prominent.
🙂
☑️ #17 Nov 22, 2024
AI customers crave speed more than price in high-bandwidth memory
bloomberg.com: [Excerpt] High-bandwidth memory chips can add value with each new generation, boosting profits for SK Hynix, Micron Technology and Samsung Electronics, as data-transfer speed is a top competitive factor in artificial-intelligence computing. Many AI customers prize high speed more than low price in high bandwidth memory chips because time to market is critical.
DRAM speed crucial to AI development
The performance of DRAM and HBM chips is key to development of artificial intelligence, so the added value that can be offered by DRAM, especially its HBM subset, might rise over the long term. DRAM data-transfer speeds haven’t kept up with the calculation speeds of central processing units.
🙂
☑️ #16 Nov 19, 2024
Analog AI Startup Aims to Lower Gen AI's Power Needs
spectrum.ieee.org: [Excerpt] Sagence emerges from stealth promising Llama 2 at 10 percent power.
The core power-savings prowess for analog AI comes from two fundamental advantages: It doesn’t have to move data around and it uses some basic physics to do machine learning’s most important math.
That math problem is multiplying vectors and then adding up the result, called multiply and accumulate. Early on, engineers realized that two foundational rules of electrical engineering did the same thing, more or less instantly. Ohm’s Law—voltage multiplied by conductance equals current—does the multiplication if you use the neural network’s “weight” parameters as the conductances. Kirchhoff’s Current Law—the sum of the currents entering and exiting a point is zero—means you can easily add up all those multiplications just by connecting them to the same wire. And finally, in analog AI, the neural network parameters don’t need to be moved from memory to the computing circuits—usually a bigger energy cost than computing itself—because they are already embedded within the computing circuits.
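A toy numerical sketch of the principle described above, assuming an idealized one-to-one mapping of weights to conductances and activations to voltages; it illustrates the physics-as-math idea only and is not Sagence's actual circuit:

```python
import numpy as np

# Analog multiply-accumulate in spirit: Ohm's law (I = G * V) does the multiply,
# Kirchhoff's current law (currents on a shared wire sum) does the accumulate.
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 8))      # one layer: 4 output "wires", 8 inputs
activations = rng.normal(size=8)       # input vector

conductances = weights                 # weights stored as conductances (idealized)
voltages = activations                 # activations applied as input voltages

currents = conductances * voltages     # Ohm's law, element-wise multiplication
outputs = currents.sum(axis=1)         # Kirchhoff's law: currents sum per output wire

assert np.allclose(outputs, weights @ activations)   # identical to a digital matmul
print(outputs)
```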
+ Related content:
sagence-ai.com: [Excerpt] New class of highly efficient AI inference machine. Unprecedented combination of highest performance at lowest power with economics that match costs to value, scaling from data center to edge applications.
llama.com: [Excerpt] Llama 2: open source, free for research and commercial use. We're unlocking the power of these large language models. Our latest version of Llama – Llama 2 – is now accessible to individuals, creators, researchers, and businesses so they can experiment, innovate, and scale their ideas responsibly.
🙂
☑️ #15 Nov 19, 2024
Micron in a very brutal market
@techfund: Former director at Micron gives a brief overview of the history of the competitive landscape in the memory industry (Tegus):
It's been a very brutal market. When Micron started as a start-up almost 40 years ago, at that time, Intel was still producing memory. You had the Japanese companies like Hitachi, Toshiba, all in the memory business, and a number of the European companies like Philips were involved in memory. Over roughly a 20-year period from the late 1980s to the 2000s, there was lots of consolidation.
During that period of time, Micron was able to absorb a lot of the weaker memory companies to begin to get more capacity and more technology. In particular, they absorbed a lot of initially Japanese companies that the Japanese government was forcing to merge because they were performing poorly.
Later, a number of European companies found themselves in a similar position; they just could not compete, especially with the rise of the Korean companies, first Samsung and then Hynix. Right now it's really a troika of those three. About 10 years ago, the Chinese government, realizing that memory and storage is a critical part of its supply chain, began to aggressively fund the growth of a number of China-based memory and storage companies.
Those companies began a bit behind the main three, but they've been pretty aggressively catching up and are also looking at ways they can continue to grow. They've been somewhat hindered in the last couple of years by the lack of access to more modern fab technology for memory. They're still a growing force.
🙂
☑️ #14 Nov 18, 2024
IBM Expands its AI Accelerator Offerings; Announces Collaboration with AMD
newsroom.ibm.com: [Excerpt] IBM Cloud to Deploy AMD Instinct™ MI300X Accelerators to Support Performance for Generative AI Workloads and HPC Applications.
For generative AI inferencing workloads, IBM plans to enable support for AMD Instinct MI300X accelerators within IBM's watsonx AI and data platform, providing watsonx clients with additional AI infrastructure resources for scaling their AI workloads across hybrid cloud environments. Additionally, Red Hat Enterprise Linux AI and Red Hat OpenShift AI platforms can run Granite family large language models (LLMs) with alignment tooling using InstructLab on MI300X accelerators.
IBM Cloud with AMD Instinct MI300X accelerators are expected to be generally available in the first half of 2025. Stay tuned for more updates from AMD and IBM in the coming months.
To learn more about IBM’s GPU and Accelerator offerings, visit: https://www.ibm.com/cloud/gpu
+ Related content:
amd.com: [Excerpt] AMD Instinct™ MI300X Accelerators
AMD Instinct™ MI300X accelerators are designed to deliver leadership performance for Generative AI workloads and HPC applications.
amd.com (12/6/23): [Excerpt] AMD Delivers Leadership Portfolio of Data Center AI Solutions with AMD Instinct MI300 Series.
🙂
☑️ #13 Nov 15, 2024
Statement from President Joe Biden on Final CHIPS Award for TSMC
whitehouse.gov: [Excerpt] Today’s final agreement with TSMC – the world’s leading manufacturer of advanced semiconductors – will spur $65 billion of private investment to build three state-of-the-art facilities in Arizona and create tens of thousands of jobs by the end of the decade. This is the largest foreign direct investment in a greenfield project in the history of the United States. The first of TSMC’s three facilities is on track to fully open early next year, which means that for the first time in decades an American manufacturing plant will be producing the leading-edge chips used in our most advanced technologies – from our smartphones, to autonomous vehicles, to the data centers powering artificial intelligence.
+ Related content:
whitehouse.gov (8/9/22): FACT SHEET: CHIPS and Science Act Will Lower Costs, Create Jobs, Strengthen Supply Chains, and Counter China.
pr.tsmc.com: Multimedia Gallery - Fabs Inside.
🙂
☑️ #12 Nov 15, 2024
The Influence of Bell Labs
Construction Physics: [Excerpt] There seem to have been several modes of influence. For one, the transistor itself ushered in a new world of semiconductor devices. Semiconductors were a product of deep, scientific understanding of the physical nature of matter. Anyone who wanted to compete in the new market by developing their own semiconductor-based products would need to acquire the relevant scientific expertise.
The transistor also demonstrated that “basic” research could result in enormously successful, world-changing products. The world had just seen the amazing power of scientific research in wartime achievements like radar and the atomic bomb, and the transistor showed that the fruits of such research weren’t limited to enormous government projects. The success of nylon, another successful product that was the result of basic research by DuPont, also reinforced this perception.
🙂
☑️ #11 Nov 14, 2024
GPU supply constraints: no problem!
AI Snack Bytes: Kai-Fu Lee (author of AI Superpowers) says GPU supply constraints are forcing Chinese AI companies to innovate, meaning they can train a frontier model for $3 million, contrasted with GPT-5's $1 billion, and deliver inference costs of 10c/million tokens, 1/30th of what an American company charges.
Export Controls don’t work.
+ Related content:
aisuperpowers.com : [Excerpt] AI Superpowers. China, Silicon Valley and the new world order.
The United States has long been the leader in Artificial Intelligence. But Dr. Kai-Fu Lee—one of the world’s most respected experts on AI—reveals that China has caught up to the US at an astonishingly rapid pace. As Sino-American competition in AI heats up, Lee envisions China and the US forming a powerful duopoly in AI. In this provocative new book, he outlines the upheaval of traditional jobs, how the suddenly unemployed will find new ways of making their lives meaningful, and how the Chinese and American governments will have to cope with the changing economic landscape... Read more
🙂
☑️ #10 Nov 10, 2024
The AI Semiconductor Landscape
Generative Value: [Excerpt] An overview of the technology, market, and trends in AI semiconductors.
Why are GPUs so good for AI?
The base unit of most AI models is the neural network, a series of layers with nodes in each layer. These neural networks represent scenarios by weighing each node to most accurately represent the data it's being trained on.
Once the model is trained, new data can be given to the model, and it can predict what the outputted data should be (inference).
This “passing through of data” requires many, many small calculations in the form of matrix multiplications [(one layer, its nodes, and weights) times (another layer, its nodes, and weights)].
This matrix multiplication is a perfect application for GPUs and their parallel processing capabilities.
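A minimal sketch of that "passing through of data", with hypothetical layer sizes, showing that a single inference step reduces to one large matrix multiplication plus a nonlinearity, which is exactly the kind of work GPUs parallelize well:

```python
import numpy as np

rng = np.random.default_rng(0)

batch = rng.normal(size=(32, 512))      # 32 inputs, 512 features each
weights = rng.normal(size=(512, 256))   # one layer with 256 nodes
bias = np.zeros(256)

# The core operation: matrix multiply, add bias, apply a nonlinearity (ReLU).
hidden = np.maximum(batch @ weights + bias, 0.0)
print(hidden.shape)                     # (32, 256)
```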
🙂
☑️ #9 Nov 8, 2024
You might put a GPU in a liquid bath and then by doing that, you're able to cool that whole component much better
@techfund: [Excerpt] Microsoft datacenter architect on the four phases of liquid cooling, mentioning an 80% reduction in GPU failure rates with liquid cooling from phase two onward:
We actually work with a lot of liquid cooling. There's four phases when it comes to liquid cooling. You've got Phase one, which is direct-to-chip, and that's where basically you're running liquid or refrigerant across the chip or across the chill plate in order to basically put that cold plate and cool that individual component. Phase two is where you actually immerse that entire component.
You might put a GPU in a liquid bath and then by doing that, you're able to cool that whole component much better. Phase three is where you actually immerse the entire rack. You put not just the GPUs, but also the cabling, well, the motherboards and the storage and everything like that. Phase four is where you actually immerse the entire data center. We've done that in the Orkney Islands with Project Natick that we did out there, and we learned a lot from that, too.
Honestly, we work with a lot of those vendors to see what they're doing with from a liquid cooling standpoint. A lot of what we found is that when you use immersion cooling, which is Phase two, it actually reduces failure rates of components by 80% on average. We've been working with hard drive vendors, with memory vendors, to see what they can do to do immersion cooling, and the reason being is that when we order GPUs, they're going to burn out over a period of time, but we could guarantee that it's going to reduce failures by about 80%.
Therefore, we don't need to have extra equipment sitting around just for, in case hardware fails, we can use that additional four GPUs and deploy those out and therefore, serve that demand out there. We haven't had any issues with regard to liquid cooling. We were actually the proponents of liquid cooling with the H100s, and we were the ones who told NVIDIA that they need to do liquid cooling on Blackwell and on Hopper, and that's a big reason why they built out liquid cooling.
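An illustrative sketch of the spares argument in that quote; the 80% reduction is the figure quoted above, while the fleet size and the 9% baseline annual failure rate are hypothetical numbers chosen only for illustration:

```python
# Hypothetical spares calculation around the quoted 80% failure-rate reduction.
fleet_size = 10_000
baseline_annual_failure_rate = 0.09                  # assumed air-cooled baseline (illustrative)
immersion_failure_rate = baseline_annual_failure_rate * (1 - 0.80)   # 80% reduction (quoted)

failures_air = fleet_size * baseline_annual_failure_rate
failures_immersion = fleet_size * immersion_failure_rate

print(f"Expected annual failures, air-cooled: {failures_air:.0f}")
print(f"Expected annual failures, immersion:  {failures_immersion:.0f}")
print(f"GPUs no longer needed as standby spares: {failures_air - failures_immersion:.0f}")
```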
🙂
☑️ #8 Nov 8, 2024 🟠 opinion, work in progress
@mackhawk: NEW: What does Trump’s win mean for the Chips Act?
It’s extremely unlikely that the program goes away
Companies, already anxious to finalize awards, are now more anxious
Republicans are eyeing reforms of what they consider social provisions
bloomberg.com/news/articles/2024-11-08/trump-s-win-sets-off-race-to-complete-chips-act-subsidy-deals
⚡️
@mackhawk: This story is based on interviews with around 20 people in government, industry, policy shops, etc.
All of them said that despite Trump/Johnson’s comments on the campaign trail, the Chips Act is here to stay. It aligns with Trump priorities & has broad bipartisan support:
x.com/mackhawk/status/1854705261069426899
(1) (2)
⚡️
@mackhawk: What could change are rules re: environment, labor, etc. Rs already discussing pursuing such reforms in budget reconciliation.
Companies’ concern is not that such measures would meaningfully affect how much $ they get. It’s that any changes could mean a delay — & they want $ ASAP
⚡️
@mackhawk: The best way to prevent a delay is signing a contract. Only one small company has, but more will soon.
TSMC & GlobalFoundries are basically there. Intel, Samsung & Micron have outstanding issues.
Others are in earlier stages, including those still negotiating preliminary deals.
(1) (2)
⚡️
@mackhawk: Anyway, there’s more detail in the story, but the main takeaway bears repeating: There are lots of Biden programs that Rs will try to tear down. The Chips Act really isn’t likely to be one of them. Evergreen reminder that my Signal is mackhawk.71
🙂
☑️ #7 Nov 7, 2024
Rainwater Could Help Satisfy AI’s Water Demands
scientificamerican.com: [Excerpt] A few dozen ChatGPT queries cost a bottle’s worth of water. Tech firms should consider simpler solutions, such as harvesting rainwater, to meet AI’s needs.
Data centers require massive amounts of water for liquid cooling systems to absorb and dissipate the heat generated by servers. Researchers at the University of California, Riverside, have found that between five and 50 ChatGPT requests can consume up to 500 milliliters of water (close to the amount in a 16-ounce bottle). Those gulps add up. Google used 20 percent more water in 2022 compared to 2021 as it ramped up AI development. Microsoft's water use rose by 34 percent over the same period. By 2027 the amount of water AI uses in one year worldwide is projected to be on par with what a small European nation consumes. Worse, large numbers of data centers are located in water-stressed regions. Recently, a Google-owned data center in The Dalles, Oregon commanded one third of the city’s water supply amid drought conditions.
🙂
☑️ #6 Nov 7, 2024
500 petabytes
scmp.com: [Excerpt] The popular lifestyle social platform Xiaohongshu spent a full year moving 500 petabytes of data to China’s largest cloud provider.
Alibaba Group Holding’s cloud computing arm is now home to 500,000 terabytes worth of data from Chinese lifestyle platform Xiaohongshu after what the companies called the largest data migration ever, a case that could enhance the leading position of one of the country’s largest tech firms in the domestic cloud market.
The migration of the 500-petabyte “data lake” – a repository that stores, processes and secures large amounts of structured and unstructured data – started last November and took a year to complete, with 1,500 staff members from Xiaohongshu working with teams at Alibaba, according to a statement from Alibaba Cloud. Alibaba owns the South China Morning Post.
🙂
☑️ #5 Nov 7, 2024
OpenAI’s comments to the NTIA on data center growth, resilience, and security
openai.com: [Excerpt] This comment was submitted in response to a request for information from the National Telecommunications and Information Administration (NTIA).
OpenAI recently engaged outside experts to forecast potential job gains and GDP growth that would result from building 5GW data centers in a sampling of states. We found that constructing and operating a single 5GW data center could create or support about 40,000 jobs – in construction and maintenance, restaurants and retail, and other industries that would serve the new workers — and contribute between $17 billion and $20 billion to a state’s GDP.
These numbers highlight the importance of getting data center policy right. To that end, we provide the following feedback to help policymakers invest in AI infrastructure growth, resilience and security.
Read OpenAI’s full comment letter here.
+ Related content:
ntia.gov: [Excerpt] [Translated] The Department of Commerce’s National Telecommunications and Information Administration is the President’s principal advisor on information technology and telecommunications policy. In this role, the agency will help develop the policies necessary to verify that AI systems work as they claim – and without causing harm. Our initiative will help build an ecosystem of AI audits, assessments, certifications, and other policies to support AI system assurance and create earned trust.
🙂
☑️ #4 Nov 7, 2024 🟠 opinion
4th Revolution
EXISTO Substack: [Original text] What is needed and how long will this fourth industrial revolution last?
Answering that would take something like a Monte Carlo simulation, and predicting it would be about as reliable as a fairground raffle.
Neither Perplexity nor SearchGPT is prepared for the occasion, because they copy Wikipedia and beyond, and all those information "portals" will end up naming Klaus Schwab.
So here we will name it as well.
But reaching Artificial General Intelligence (AGI) is probably the nearest stage on that unpredictable calendar.
For AGI (and later ASI) to become a planetary reality will take an impressive amount of computing, and we will know we have reached one of those stages when predicting trends and patterns is as difficult as typing on a keyboard.
What is certain about the term "fourth industrial revolution", which some insist on crediting to Schwab, determined as he was to turn it into the slogan of his book of the same title, is that it requires a large volume of raw materials, as well as data processing and computation in industrial quantities never seen before.
If the first industrial revolution seems easy to understand when you read about the steam engines that industrialized commodities such as cotton, with cities like Liverpool embracing the cutting-edge technologies of the day and consigning to the encyclopedias cities like Antwerp, its textile guilds and the wool trade on the banks of the Scheldt, then it is very likely that this process will repeat itself.
Cities that were, or are, the centers of the universe will be forgotten, and new urban environments will attract the talent and technologies needed to make an industrial revolution great again. And materials and resources that were once abundant will become scarce and in high demand.
Data centers, which with your permission we will henceforth call datacenters, are becoming one of the cornerstones of this gigantic and unfinished industrial process, currently in its fourth revolution (phase). The English-language Substack written by Jamin Ball details the eleven components datacenters need to succeed in their mission of accelerating the fourth phase.
"Real Estate" and cement
I would add the amount of cement that will be needed to turn those industrial properties, designed to bring computing capacity closer to the points of demand, into physical reality.
This last point, intrinsic to the real estate activity that the global expansion of data centers requires, gets less attention, but run the numbers and you will be surprised.
Water and electricity? Another time.
🙂
☑️ #3 Nov 8, 2024 🟠 opinion
Intel Corporation is Too Big To Fail
EXISTO Substack: [Original text] NVIDIA Corporation today joins the "Dow Jones Industrial Average", replacing Intel Corporation.
The identity crisis Intel Corporation is suffering is not an isolated case; it roughly resembles other recent crossings of the valley of indifference that some of the leading industries in the United States, and their most prominent corporations, seem to be making.
General Electric, which managed to survive the worst years its accounting books can remember and seems to have reinvented itself in recent years (it cannot be allowed to fall); Boeing, lurching from one situation to another in permanent damage-control mode, CEO after CEO; United States Steel Corporation (the X on the New York Stock Exchange since 1901 and a genuine history book), which may fall into the hands of Nippon Steel if the regulators see no objection; and now Intel Corporation: all are good examples of the depth of that valley.
Great organizations born of a culture of innovation, and so large that it is better not to let them fall. Much as we humans do with great cultural icons, we let them advance along the timeline as if they guarded treasures to be passed on to the next generations. In general, corporate icons of this kind only close their doors when their debts become too unbearable and it is better to carve them up and distribute the pieces.
There is no room for sentimentality here.
Searching for the causes of Intel Corporation's current situation takes time and analysis, and each financial thesis you read will reveal new nuances. The deeper you dig into the causes of its financial situation, the more intricate the summary becomes.
A little about Intel's history
Intel Corporation emerged as a corollary of what happened between Mountain View and Palo Alto, and its stock ticker, INTC, was probably the most widely followed on Nasdaq until the consecration of companies such as Tesla and Nvidia.
Intel Corporation was listed in 1971, the same year the Nasdaq electronic market appeared. The multinational had been born a few years earlier, in 1968, under the name NM Electronics Inc., and can be considered one of the collateral effects of William Bradford Shockley's particular way of directing and managing a business, or so the sources of the time tell it.
Shockley, raised in Palo Alto, organized his new business, Shockley Semiconductor Laboratory, in an unconventional way; without knowing it, by establishing his research laboratory in Mountain View in 1955 with the financial backing of Beckman Instruments, he became an unexpected accelerator of today's "Silicon Valley".
Those were the years of the silicon fever that, decades later, would bring so much good to the American West.
But we will save that story for later.
🙂
☑️ #2 Oct 31, 2024
Speed, scale and reliability: 25 years of Google data-center networking evolution
cloud.google.com: [Excerpt] Rome wasn’t built in a day, and neither was Google’s network. But 25 years in, we’ve built out network infrastructure with scale and technical sophistication that’s nothing short of remarkable.
It’s all the more impressive because in the beginning, Google’s network infrastructure was relatively simple. But as our user base and the demand for our services grew exponentially, we realized that we needed a network that could handle an unprecedented scale of data and traffic, and that could adapt to dynamic traffic patterns as our workloads changed over time. This ignited a 25-year journey marked by numerous engineering innovations and milestones, ultimately leading to our current fifth-generation Jupiter data center network architecture, which now scales to 13 Petabits/sec of bisectional bandwidth. To put this data rate in perspective, this network could support a video call (@1.5 Mb/s) for all 8 billion people on Earth!
2015 - Jupiter, the first Petabit network.
2022 - Enabling 6 Petabit per second.
2023 - 13 Petabit per second network.
2024 and beyond - Extreme networking in the age of AI.
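A quick back-of-the-envelope check of the video-call claim in the excerpt above; this is just arithmetic on the figures Google quotes, not additional data:

```python
# Sanity check: 13 Pb/s of bisectional bandwidth vs. a 1.5 Mb/s video call
# for all 8 billion people on Earth (figures as quoted in the excerpt).
people = 8e9            # world population
per_call_bps = 1.5e6    # 1.5 Mb/s per video call
jupiter_bps = 13e15     # 13 Petabits/sec

total_demand_bps = people * per_call_bps
print(f"Aggregate video-call demand: {total_demand_bps / 1e15:.1f} Pb/s")   # -> 12.0 Pb/s
print(f"Fits within Jupiter's 13 Pb/s: {total_demand_bps <= jupiter_bps}")  # -> True
```

Eight billion simultaneous 1.5 Mb/s streams come to roughly 12 Pb/s, just under the 13 Pb/s figure, so the claim holds up.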
+ Related content:
u/cletus: This mentions Jupiter generations, which I think is about 10-15 years old at this point. It doesn't really talk about what existed before so it's not really 25 years of history here. I want to say "Watchtower" was before Jupiter? but honestly it's been about a decade since I read anything about it.
Google's DC networking is interesting because of how deeply integrated it is into the entire software stack. Click on some of the links and you'll see it mentions SDN (Software Defined Network). This is so Borg instances can talk to each other within the same service at high throughput and low latency. 8-10 years ago this was (IIRC) 40Gbps connections. It's probably 100Gbps now but that's just a guess.
But the networking is also integrated into global services like traffic management to handle, say, DDoS attacks.
Anyway, from reading this it doesn't sound like Google is abandoning their custom TPU silicon (ie it talks about the upcoming A3 Ultra and Trillium). So where does NVidia ConnectX fit in? AFAICT that's just the NIC they're plugging into Jupiter. That's probably what enables (or will enable) 100Gbps connections between servers. Yes, 100GbE optical NICs have existed for a long time. I would assume that NVidia produce better ones in terms of price, performance, size, power usage and/or heat produced.
Disclaimer: Xoogler. I didn't work in networking though.
u/Cavisne: For the past few years there has been a weird situation where Google and AWS have had worse GPUs than smaller providers like Coreweave + Lambda Labs. This is because they didn't want to buy into Nvidia's proprietary InfiniBand stack for GPU-GPU networking, and instead wanted to make it work on top of their own Ethernet-based (but still pretty proprietary) stack.
The outcome was really bad GPU-GPU latency & bandwidth between machines. My understanding is ConnectX is Nvidia's supported (and probably still very profitable) way for these hyperscalers to use their proprietary networks without buying InfiniBand switches and without paying the latency cost of moving bytes from the GPU to the CPU.
🙂
☑️ #0 Oct 30, 2024
Google's Trillium TPUs accelerating AI
@googlecloud: To deliver the next frontier of models and enable you to do the same, we’re excited to announce Trillium, our sixth-generation TPU, the most performant and most energy-efficient TPU to date.
More than a decade ago, Google recognized the need for a first-of-its-kind chip for machine learning. In 2013, we began work on the world’s first purpose-built AI accelerator, TPU v1, followed by the first Cloud TPU in 2017. Without TPUs, many of Google’s most popular services — such as real-time voice search, photo object recognition, and interactive language translation, along with the state-of-the-art foundation models such as Gemini, Imagen, and Gemma — would not be possible.
Trillium TPUs achieve an impressive 4.7X increase in peak compute performance per chip compared to TPU v5e. We doubled the High Bandwidth Memory (HBM) capacity and bandwidth, and also doubled the Interchip Interconnect (ICI) bandwidth over TPU v5e. Additionally, Trillium is equipped with third-generation SparseCore, a specialized accelerator for processing ultra-large embeddings common in advanced ranking and recommendation workloads. Trillium TPUs make it possible to train the next wave of foundation models faster and serve those models with reduced latency and lower cost. Critically, our sixth-generation TPUs are also our most sustainable: Trillium TPUs are over 67% more energy-efficient than TPU v5e.
https://cloud.google.com/blog/product...
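For readers who want to see what those relative figures imply, here is a minimal sketch that normalizes everything against TPU v5e. The baseline values are illustrative placeholders, not published v5e specifications, and "over 67% more energy-efficient" is read here as at least 1.67x performance per watt:

```python
# Relative Trillium vs. TPU v5e figures, using the multipliers quoted above.
# Baselines are normalized to 1.0 (illustrative, not real v5e spec values).
v5e_peak_compute = 1.0
v5e_perf_per_watt = 1.0

trillium_peak_compute = 4.7 * v5e_peak_compute      # "4.7X increase in peak compute"
trillium_perf_per_watt = 1.67 * v5e_perf_per_watt   # "over 67% more energy-efficient"

print(f"Trillium peak compute vs. v5e: {trillium_peak_compute:.1f}x")
print(f"Trillium perf/watt vs. v5e:    {trillium_perf_per_watt:.2f}x")

# If peak compute rises 4.7x while perf/watt rises about 1.67x, power draw at
# peak would be roughly 4.7 / 1.67, i.e. about 2.8x the v5e baseline.
print(f"Implied relative power at peak: {trillium_peak_compute / trillium_perf_per_watt:.1f}x")
```

The implied ~2.8x power figure is only a consequence of the two quoted multipliers under these assumptions, not a number Google has published.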
+ Related content:
cloud.google.com [Excerpt] Powerful infrastructure innovations for your AI-first future. This year we’ve made enhancements throughout the AI Hypercomputer stack to improve performance, ease of use, and cost efficiency for customers, and today at the App Dev & Infrastructure Summit, we’re pleased to announce:
Our sixth-generation TPU, Trillium, is now available in preview
A3 Ultra VMs powered by NVIDIA H200 Tensor Core GPUs will be available in preview next month
Hypercompute Cluster, our new highly scalable clustering system, will be available starting with A3 Ultra VMs
C4A VMs, based on Axion, our custom Arm processors, are now generally available
AI workload-focused enhancements to Jupiter, our data center network, and Titanium, our host offload capability
Hyperdisk ML, our AI/ML-focused block storage service, is generally available
blog.google (10/30/24): [Excerpt] Ask a Techspert: What's the difference between a CPU, GPU and TPU?
What do the different acronyms stand for?
Even though CPUs, GPUs and TPUs are all processors, they're progressively more specialized. CPU stands for Central Processing Unit. These are general-purpose chips that can handle a diverse range of tasks. Similar to your brain, some tasks may take longer if the CPU isn’t specialized in that area.
Then there’s the GPU, or Graphics Processing Unit. GPUs have become the workhorse of accelerated compute tasks, from graphic rendering to AI workloads. They’re what’s known as a type of ASIC, or application-specific integrated circuit. Integrated circuits are generally made using silicon, so you might hear people refer to chips as "silicon” — they’re the same thing (and yes, that’s where the term “Silicon Valley” comes from!). In short, ASICs are designed for a single, specific purpose.
The TPU, or Tensor Processing Unit, is Google’s own ASIC. We designed TPUs from the ground up to run AI-based compute tasks, making them even more specialized than CPUs and GPUs. TPUs have been at the heart of some of Google’s most popular AI services, including Search, YouTube and DeepMind’s large language models.
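To make the "progressively more specialized" point concrete, here is a minimal sketch using JAX (my choice of library, not something the excerpt mentions) showing how the same high-level Python code is compiled by XLA for whichever backend is present, whether CPU, GPU, or TPU:

```python
# Minimal sketch: the same jitted matmul runs on CPU, GPU, or TPU, depending on
# which backend JAX detects (requires the `jax` package; on a Cloud TPU VM the
# device list would show TpuDevice entries, on a laptop just CpuDevice).
import time
import jax
import jax.numpy as jnp

print("Available devices:", jax.devices())

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (4096, 4096))
b = jax.random.normal(key, (4096, 4096))

matmul = jax.jit(lambda x, y: x @ y)   # XLA compiles this for the local backend

matmul(a, b).block_until_ready()       # first call includes compilation
t0 = time.perf_counter()
matmul(a, b).block_until_ready()
print(f"4096x4096 matmul: {time.perf_counter() - t0:.4f} s")
```

On a CPU-only machine this simply falls back to the CPU backend; on a GPU or TPU the same unchanged code runs far faster, which is the kind of specialization the excerpt is describing.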
🙂
🔊 Silicon’s Playlist
Project PLATEAU
The Initiative of Digital Twin in Japan: mlit.go.jp/en/toshi/daisei/plateau_en_2.html