Simple Matter of Software (SMOS)? The Technology and the Business of Integrating Servers – and other systems – from Standard Components, presented February 2, 2002 at the SAN-1 Workshop at the 8th IEEE International Conference on High Performance Computer Architecture held in Cambridge, MA.
Trends in System Area Networking, presented February 12, 2002 at the IEEE International Conference on Network Computing Applications held in Cambridge, MA.
Apsara: The Quest for the Perfect Server for Network Computing Applications, presented April 17, 2003 at the IEEE International Conference on Network Computing Applications held in Cambridge, MA.
Servers from Heaven, first presented at the University of California, Riverside, on May 20, 2002. (Expanded version presented to NonStop Division management on May 29, 2002. Co-presented with Sam Fineberg at an HP Labs seminar on August 15, 2002.) Latest presentation at Ohio State University on October 11, 2002.
Abstract:
This talk begins with an analysis of server-side architecture trends for processors, memory, I/O (storage and networking), compilers, language run-time systems and operating systems. I will spend some time characterizing the workload of future -- web service -- applications. The talk will conclude by identifying key technical problems, opportunities and early solutions. In the process, I will outline the Apsara architecture for shared programmable infrastructure. Apsara makes several departures from traditional methods of server design:
1. It extends the concept of System Area Networking by making network components -- interfaces, switches and routers -- more "system like", as well as by managing and scaling the system in a more "network like" fashion.
2. It extends the concept of a Programmable Computer by opening up more of the system to application programs. Numerous examples of this will be shown.
3. It eliminates the distinction between memory and I/O. Three key techniques used to accomplish this are (3a) the replacement of the conventional driver-adapter model of I/O with an "I/O as a service" model best exemplified by iSCSI; (3b) the adoption of RDMA-enabled networking; and (3c) extensive use of "memory-semantic" communications across the board. Several examples of this will be shown as well.
Along the way, I will argue why the Apsara architecture makes business sense, project possible cost and performance benefits vis-à-vis conventional architecture, and analyze its technological feasibility.
Accommodating Availability and Scalability in the Design of Low-Latency Switched Interconnects: Powerful New Topologies from Multi-Fabric Design (MFD)
(Expanded version presented at HP Labs Computer Systems Colloquium on December 11, 2002. Early work presented at Illinois Computer Affiliates Program (ICAP) meeting, Urbana, IL, April 1999 and at Compaq Western Research Lab, July 1998.)
EXTENDED ABSTRACT:
A new class of interconnection networks, produced by Multi-Fabric Design (MFD), is described. The resulting topologies exhibit some of the best latency parameters of any topology at a given network size. As latency-critical Tier 3 applications move to the UDC, there is a potential for using some of these ideas to design low-latency interconnects without sacrificing either scale or availability.
The core concept of MFD is to create bushiness at the periphery of a network through the use of multiple network interfaces at end nodes. The basic network-design methodology is summarized in a few steps below. I will illustrate MFD through certain generic multi-fabric topologies, as well as two very interesting special cases: (i) two asymmetric fabrics; and (ii) crossbar-only interconnects.
MFD in 4 steps:
Step 1. The starting point is a combinatorial design, generally a BIBD (Balanced Incomplete Block Design) -- 2-(v,b,r,k,lambda) -- where small values of r are preferred. (v items are grouped into b blocks of size k such that each item is in exactly r blocks and each set of 2 items, i.e. each pair, appears together in exactly lambda blocks.)
Step 2. (optional) The logical design of Step 1 is partitioned, if it is a partitionable BIBD. Graph-theoretic techniques are used when b=2; combinatorial techniques, when b>2.
Step 3. Each mathematical "item" from the previous steps is mapped into a "class." A class may either be a singleton computer node or may have internal structure. If the latter, the "class switches" may be shared between the different fabrics that the class connects into. Classes may also be assembled from disjoint subclasses, interconnectivity between which is deferred until Step 4. Recursive application of MFD is optional.
Step 4. The "blocks" from Steps 1 and 2 -- a.k.a. logical fabrics -- are mapped into physical fabrics. Each fabric is therefore partial, in that not all the nodes of the topology are reachable through it. A fabric may be as simple as either a single link between a pair of classes or a singleton switch that connects all of the links that need to be connected. Generally, it is a network, possibly designed through recursive application of MFD.
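As a small illustration of Steps 1 and 4, the sketch below checks that a candidate set of blocks is a BIBD and then lists, for each node, the logical fabrics it attaches to. The Fano plane, a 2-(7,7,3,3,1) design, is used only as a convenient example; it is not taken from the talk, and the names FANO_BLOCKS, bibd_parameters and fabrics_per_node are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumption: the Fano plane, chosen only as a small example design.
# Items 0..6 play the role of nodes; each block is one logical fabric.
FANO_BLOCKS = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6},
    {2, 3, 6}, {2, 4, 5},
]


def bibd_parameters(v, blocks):
    """Return (v, b, r, k, lam) if blocks form a BIBD on items 0..v-1.

    Raises ValueError when block sizes, replication counts or pair
    counts are not uniform, i.e. when the blocks are not a BIBD.
    """
    if any(item not in range(v) for blk in blocks for item in blk):
        raise ValueError("block contains an item outside 0..v-1")

    sizes = {len(blk) for blk in blocks}
    if len(sizes) != 1:
        raise ValueError("blocks differ in size")
    k = sizes.pop()

    # r: number of blocks (fabrics) each item (node) belongs to.
    replication = Counter(item for blk in blocks for item in blk)
    reps = {replication[item] for item in range(v)}
    if len(reps) != 1:
        raise ValueError("items appear in different numbers of blocks")
    r = reps.pop()

    # lam: number of blocks shared by each pair of items.
    pair_counts = Counter(
        pair for blk in blocks for pair in combinations(sorted(blk), 2)
    )
    lams = {pair_counts[pair] for pair in combinations(range(v), 2)}
    if len(lams) != 1:
        raise ValueError("pairs appear together in different numbers of blocks")
    lam = lams.pop()

    return v, len(blocks), r, k, lam


def fabrics_per_node(v, blocks):
    """Map each node to the indices of the logical fabrics containing it."""
    return {
        node: [f for f, blk in enumerate(blocks) if node in blk]
        for node in range(v)
    }


if __name__ == "__main__":
    v, b, r, k, lam = bibd_parameters(7, FANO_BLOCKS)
    print(v, b, r, k, lam)                    # 7 7 3 3 1
    print(fabrics_per_node(7, FANO_BLOCKS)[0])  # [0, 1, 2]
```

In this example each node needs r = 3 network interfaces, one per fabric it joins, which is why small values of r are attractive. The parameters also satisfy the standard counting identities b·k = v·r and r·(k-1) = lambda·(v-1).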
Notes: If class sharing is used in Step 3, then the resulting topology will have fewer physical fabrics than logical ones. When there are only two physical fabrics but b>2, the special case of asymmetric fabrics occurs. Asymmetric fabrics are superior to the two identical fabrics used in NonStop systems today. Otherwise, when classes are implemented using singleton nodes in Step 3, and when singleton crossbar switches are used to realize physical fabrics in Step 4, the special case of crossbar-only interconnects (COIs) occurs. COI topologies uniquely extend the size of the largest system in which every pair of nodes is interconnected via a single crossbar switch.
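The crossbar-only case can be sketched in the same spirit. Assuming a design with lambda = 1, singleton nodes, and one crossbar per block, every pair of nodes shares exactly one switch, and the counting identity r·(k-1) = lambda·(v-1) gives v = r·(k-1) + 1 nodes for k-port crossbars and r interfaces per node. The code below is an illustrative reading of that note, not the talk's own construction: COI_SWITCHES reuses the Fano plane as an example, and the function names are hypothetical. The bound is only a counting bound; a design achieving it need not exist for every (k, r).

```python
from itertools import combinations

# Assumption: seven 3-port crossbars wired according to the Fano plane.
# Each set lists the nodes attached to one crossbar switch.
COI_SWITCHES = [
    {0, 1, 2}, {0, 3, 4}, {0, 5, 6},
    {1, 3, 5}, {1, 4, 6},
    {2, 3, 6}, {2, 4, 5},
]


def shared_switches(v, switches):
    """Map every node pair (a, b), a < b, to the switches joining them."""
    return {
        (a, b): [s for s, members in enumerate(switches)
                 if a in members and b in members]
        for a, b in combinations(range(v), 2)
    }


def is_single_hop(v, switches):
    """True when every pair of nodes is joined by exactly one crossbar."""
    return all(len(s) == 1 for s in shared_switches(v, switches).values())


def coi_node_bound(ports_per_switch, interfaces_per_node):
    """Node count implied by r(k-1) = v-1 when lambda = 1.

    This is a counting bound only; it does not prove that a design
    with these parameters exists.
    """
    return interfaces_per_node * (ports_per_switch - 1) + 1


if __name__ == "__main__":
    routes = shared_switches(7, COI_SWITCHES)
    print(is_single_hop(7, COI_SWITCHES))  # True
    print(routes[(4, 5)])                  # [6]
    print(coi_node_bound(3, 3))            # 7
```

Here a single 3-port crossbar could join only 3 nodes, whereas seven such crossbars and three interfaces per node join 7 nodes while keeping every pair one switch apart.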
Instruction mix, Cycle breakdowns, Cache and TLB miss rates for a B-Tree Application on Itanium 2 + preliminary Java results for SpecJBB on gcj for Linux compared-and-contrasted with corresponding numbers from the record-breaking SpecJBB on HotSpot for HP-UX, co-presented (with Rahul Nim) to the architects and developers of Yosemite, the IA-64 based NonStop server platform, November 14, 2002.
StarBurst 3 Router Programming, presented to the architects of the StarBurst 3 switch, March 2002.
Impact of Large-Memory IA-64 Servers on Enterprise Storage Strategy, presented at the roadmap planning meeting between ESS and ISS HA teams, Houston, December 2001.
An InfiniBand Management Framework for Windows, presented at ISS Group’s internal IB Management Workshop, Houston, June 2001.
A Product Vision Encompassing InfiniBand Switches, Routers and LAN Bridges or The Slippery Slope of Switching, presented at ISS Group’s internal IB Management Workshop, Houston, January 2001.
InfiniBand Application Working Group Session (Introduction to Services; Networked Services Framework; Console Service Protocol), presented and chaired the session at the InfiniBand Developers Conference, Las Vegas, NV, October 2000.
The Architecture of Commodity Clusters: Looking Beyond Beowulf, presented at the Numerical Aerospace Simulation (NAS) Facility, NASA Ames Research Center, Mountain View, CA, March 1999.
[Pre-1998 work not listed.]