Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card
Delivers industry-leading performance efficiency and enables 700B-parameter models on a single PCIe card — without GPU clusters or intensive cooling
HSINCHU and NEW YORK and SAN FRANCISCO, April 23, 2026 /PRNewswire/ -- Ahead of COMPUTEX 2026, Skymizer Taiwan Inc., a pioneer in AI inference solutions, today previewed a major advancement in on-premise AI deployment with its HTX301 inference chip, which integrates HyperThought™ — a software/hardware co-design platform first introduced at COMPUTEX 2025. The HTX301 is the first reference chip of the HyperThought IP, which defines a long-horizon architecture for AI inference. This first silicon delivers superior performance efficiency while dramatically simplifying the infrastructure required for ultra-large model inference.
Dismantling the GPU Monopoly on Ultra-Large Model Inference
Deploying ultra-large models on-premise has historically required massive GPU clusters, high-speed interconnects like NVLink/NVSwitch, and intensive cooling systems — resulting in prohibitive cost and operational complexity.
For the first time in the industry, Skymizer is eliminating those requirements.
With a single PCIe card — powered by six HTX301 chips and 384 GB of memory — enterprises can now run 700B-parameter model inference locally at just ~240W per card.
HyperThought is architected for flexible scaling across different form factors — packaged as an SoC or card, from edge to mini data center. Scaling from 1 chip to 6 chips on a single card, with memory capacity ranging from 32 GB to 384 GB, HyperThought serves models from 4B to 700B parameters — letting enterprises right-size their deployment to actual workload requirements without over-provisioning.
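The memory arithmetic behind this right-sizing can be sketched as follows. The bit-widths are assumptions for illustration (the release does not state a quantization scheme), and the figures cover model weights only, not KV cache or activations.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in decimal GB."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# Assuming 4-bit quantized weights, a 700B-parameter model needs roughly
# 350 GB of weight memory, within the 384 GB of a fully populated card,
# while a 4B model at 8 bits needs about 4 GB, well under a 32 GB chip.
print(weight_memory_gb(700, 4))  # 350.0
print(weight_memory_gb(4, 8))    # 4.0
```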
"Inference has become the dominant AI workload, and infrastructure needs to reflect that reality."
"The era of needing superscalar GPU clusters for ultra-large LLMs is over. HyperThought shifts AI from hyperscaler-only complexity to single-card simplicity for every enterprise."
— William Wei, Chief Marketing Officer, Skymizer
Eliminating the Hidden Tax on Enterprise AI
On-prem inference eliminates the per-token spending anxiety that has become the silent tax on enterprise AI adoption. Cloud-based inference forces teams to ration queries and throttle agents. HyperThought removes that constraint: once deployed, enterprises run unlimited inference at a fixed infrastructure cost.
The result: enterprises gain data privacy, low latency, and full operational control — without the infrastructure burden of GPU clusters.
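The fixed-cost point can be made concrete with a toy break-even calculation. Every number below is hypothetical; the release quotes no pricing for the card or for cloud inference.

```python
def breakeven_tokens(fixed_cost_usd: float,
                     cloud_usd_per_million_tokens: float) -> float:
    """Token volume at which a fixed on-prem cost equals cloud per-token spend.

    Both inputs are hypothetical illustration values, not quoted prices.
    """
    return fixed_cost_usd / cloud_usd_per_million_tokens * 1_000_000

# With a hypothetical $20,000 deployment and $2 per million cloud tokens,
# the fixed cost is recovered after 10 billion generated tokens.
print(breakeven_tokens(20_000, 2.0))  # 10000000000.0
```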
HyperThought complements existing GPU infrastructure rather than replacing it. By offloading decode-heavy inference from GPUs, enterprises improve overall cluster utilization and power efficiency.
Powering Agentic AI Workflows Across the Enterprise
HyperThought and the HTX301 are designed for the agentic AI workloads that are rapidly becoming the backbone of enterprise automation. Combined with agent harness frameworks such as OpenClaw, the HTX301 delivers the inference throughput these systems demand, with full data sovereignty and deterministic latency.
This enables agentic workflows and automation across industries and sectors, including:
- Financial services: compliance, fraud detection, portfolio reasoning
- Healthcare & life sciences: clinical decision support, drug interaction analysis
- Manufacturing: predictive maintenance, quality inspection
- Legal & professional services: contract review, confidential knowledge retrieval
- Government & defense: sovereign AI, classified analysis
- Retail: service automation, inventory reasoning
- Software engineering: private code copilots, autonomous CI/CD
- Semiconductor & IC design: on-prem RTL copilots, verification agents, design-knowledge retrieval over proprietary IP
Spotlight: On-Prem AI Coding. AI-assisted coding is already table stakes for modern software teams, and demand is accelerating fastest in domains where source code is the company's crown jewel. IC design houses cannot send proprietary RTL to cloud-based assistants without risking exposure of multi-billion-dollar silicon IP; software companies face the same calculus with confidential codebases and customer data. HTX301 delivers the throughput needed to run private code copilots, RTL generators, and verification agents entirely on-premise — eliminating cloud-exposure risk while preserving the full productivity gains of AI-assisted engineering.
Beyond agentic workloads, a single HTX301 chip supports on-device inference — transcription, translation, visual understanding, and multimodal AI — across edge servers, AI workstations, smart NAS systems, and intelligent endpoints.
Powered by LISA™ and HyperThought™
HyperThought is powered by LISA™ (Language Instruction Set Architecture), Skymizer's proprietary, language-centric ISA optimized for transformer inference. LISA drives performance, power efficiency, and scalability from edge devices to enterprise clusters.
The on-prem HTX301 card shares the same LISA architectural foundation as HyperThought's on-device LPU — one ISA, one deployment workflow, edge to data center.
Prefill/Decode Disaggregation: The HyperThought P/D Strategy
LLM inference consists of two fundamentally different phases: prefill (processing the input prompt, compute-bound) and decode (generating tokens one at a time, memory-bandwidth-bound). GPU-centric infrastructure forces both onto the same silicon, stranding either compute or bandwidth at any given moment. HyperThought disaggregates these phases by design.
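The two phases can be sketched with a toy generation loop. This is an illustrative stand-in, not Skymizer's stack: the `ToyModel`, its "KV cache" of raw tokens, and the generation rule are all invented placeholders for the real attention kernels.

```python
class ToyModel:
    """Trivial stand-in; a real transformer runs matrix kernels here."""

    def attend_all(self, prompt_tokens):
        # Prefill: one batched pass over the entire prompt. Compute-bound,
        # because every prompt token is processed in large matrix multiplies.
        return list(prompt_tokens)  # toy "KV cache": just the token history

    def next_token(self, kv_cache):
        # Decode: each step re-reads the entire KV cache to emit one token.
        # In a real model this is dominated by memory bandwidth, not FLOPs.
        return sum(kv_cache) % 7  # arbitrary toy generation rule

def prefill(model, prompt_tokens):
    return model.attend_all(prompt_tokens)

def decode(model, kv_cache, n_new_tokens):
    generated = []
    for _ in range(n_new_tokens):
        tok = model.next_token(kv_cache)
        kv_cache.append(tok)  # cache grows by one entry per step
        generated.append(tok)
    return generated
```

The asymmetry is visible in the shapes: prefill touches all prompt tokens at once, while decode is a strictly sequential loop whose cost per step is reading the ever-growing cache.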
Hardware Stack — Decode-First Silicon. The HTX301 is purpose-built for decode — the memory-bandwidth-intensive token generation that dominates real-world inference latency. Existing GPUs handle compute-dense prefill; HTX301 cards handle decode, each matched to the phase it serves best.
Software Stack — Unified P/D Orchestration. Skymizer's unified software stack — KV-cache manager, phase-aware scheduler, and dynamic placement engine — orchestrates prefill and decode pools, carrying KV-cache state across nodes and rebalancing P:D ratios in real time as workloads shift.
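A minimal sketch of phase-aware placement, under stated assumptions: the function, pool names, and round-robin policy below are invented for illustration and are not Skymizer's actual scheduler.

```python
def place_requests(requests, prefill_pool, decode_pool):
    """Route each (request_id, phase) pair to a worker in the matching pool.

    Prefill requests go to compute-optimized workers (e.g. existing GPUs);
    decode requests go to bandwidth-optimized workers (e.g. HTX301 cards).
    Round-robin within each pool; purely illustrative.
    """
    placement = {}
    counters = {"prefill": 0, "decode": 0}
    for request_id, phase in requests:
        pool = prefill_pool if phase == "prefill" else decode_pool
        placement[request_id] = pool[counters[phase] % len(pool)]
        counters[phase] += 1
    return placement

print(place_requests(
    [("r1", "prefill"), ("r2", "decode"), ("r3", "decode")],
    ["gpu-0", "gpu-1"],
    ["htx-0", "htx-1"],
))  # {'r1': 'gpu-0', 'r2': 'htx-0', 'r3': 'htx-1'}
```

A production scheduler would additionally migrate KV-cache state from the prefill worker to the decode worker and adjust pool sizes as the P:D mix shifts, which is the rebalancing the release describes.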
"Purpose-built decode hardware paired with an intelligent software stack that orchestrates every inference workload — that's how you disaggregate P/D at scale."
— Luba Tang, Chief Technology Officer, Skymizer
Defining the Next Era of AI Deployment
As models surge from billions to trillions of parameters, the industry's dependence on brute-force GPU scaling is hitting a wall. Skymizer is built to move past it — combining deep compiler expertise with decode-optimized silicon to define the next era of AI infrastructure.
Details on HyperThought's extended platform roadmap will be shared at Skymizer's press conference at COMPUTEX 2026.
Request early access to HTX301: skymizer.ai/press
About Skymizer Taiwan Inc.
Founded in 2013, Skymizer is an AI inference company. Its flagship HyperThought platform pairs a compiler-driven software stack with transformer-optimized hardware to deliver high-efficiency inference across on-device, edge, and on-prem environments.
SOURCE Skymizer Taiwan Inc.