HPC networks are generally quite different from the internet or typical local area networks, both in the hardware and the software involved. The software and hardware layers in HPC typically need to provide more stringent guarantees for data delivery. Routing between nodes in a supercomputer, for example, is usually statically determined.
Topology (how nodes are interconnected) is one of the major performance concerns in the design of HPC networks. A topology must take into account various factors, including packaging constraints and wire length. Most importantly, it must provide good performance for a wide array of applications (unless the system is application-specific); that is, a good physical network topology will accommodate many application topologies.
A topology should be chosen for good path diversity, i.e., there should be more than one minimal path between any given source and destination.
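As a small illustration of path diversity (not from any particular system), consider a 2D mesh: a minimal route between two nodes is any interleaving of the required X-steps and Y-steps, so the number of distinct minimal paths is the binomial coefficient C(dx + dy, dx). The sketch below just computes that count; the coordinates are hypothetical.

```c
/* Path diversity illustration: in a 2D mesh, the number of distinct minimal
 * paths between two nodes that are dx hops apart in X and dy hops apart in Y
 * is C(dx + dy, dx), since a minimal route is any interleaving of dx X-steps
 * and dy Y-steps. Illustrative sketch only. */
#include <stdio.h>
#include <stdlib.h>

/* Count minimal paths between (x1,y1) and (x2,y2) in a 2D mesh. */
static unsigned long long minimal_paths(int x1, int y1, int x2, int y2)
{
    int dx = abs(x2 - x1), dy = abs(y2 - y1);
    unsigned long long paths = 1;
    /* Compute C(dx + dy, dx) incrementally to avoid huge factorials. */
    for (int i = 1; i <= dx; i++)
        paths = paths * (dy + i) / i;
    return paths;
}

int main(void)
{
    /* Adjacent nodes have a single minimal path; farther pairs have many. */
    printf("(0,0) -> (1,0): %llu minimal path(s)\n", minimal_paths(0, 0, 1, 0));
    printf("(0,0) -> (3,2): %llu minimal path(s)\n", minimal_paths(0, 0, 3, 2));
    return 0;
}
```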
High-performance networks are typically statically routed. Adaptive routing is rare because of its performance cost and the risk of introducing deadlock (and livelock); it is typically used only for fault tolerance.
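One common example of a static, deadlock-free scheme is deterministic dimension-order (XY) routing in a 2D mesh: packets are routed fully in X first, then in Y, so all packets obey the same turn restrictions and no cyclic channel dependency can form. The sketch below is illustrative; the port names and coordinates are made up.

```c
/* Sketch of deterministic dimension-order (XY) routing in a 2D mesh:
 * every packet is routed fully in the X dimension first, then in Y.
 * Because all packets follow the same turn discipline, the routing
 * function is deadlock-free. Illustrative only. */
#include <stdio.h>

enum port { PORT_LOCAL, PORT_XPLUS, PORT_XMINUS, PORT_YPLUS, PORT_YMINUS };

/* Pick the output port at node (x, y) for a packet headed to (dst_x, dst_y). */
static enum port xy_route(int x, int y, int dst_x, int dst_y)
{
    if (dst_x > x) return PORT_XPLUS;   /* correct X first */
    if (dst_x < x) return PORT_XMINUS;
    if (dst_y > y) return PORT_YPLUS;   /* then correct Y */
    if (dst_y < y) return PORT_YMINUS;
    return PORT_LOCAL;                  /* arrived: eject to the local node */
}

int main(void)
{
    static const char *names[] = { "local", "x+", "x-", "y+", "y-" };
    /* Walk a packet from (0,0) to (2,3), printing the port taken at each hop. */
    int x = 0, y = 0;
    while (x != 2 || y != 3) {
        enum port p = xy_route(x, y, 2, 3);
        printf("at (%d,%d) take %s\n", x, y, names[p]);
        if (p == PORT_XPLUS) x++;
        else if (p == PORT_XMINUS) x--;
        else if (p == PORT_YPLUS) y++;
        else if (p == PORT_YMINUS) y--;
    }
    printf("arrived at (%d,%d)\n", x, y);
    return 0;
}
```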
HPC network hardware tends to be custom designed; the top-tier systems even have proprietary interconnects, with custom software and hardware. One example is the new Cray Gemini interconnect. System boards carry custom ASIC router chips, which connect to the processors over links such as PCIe or HyperTransport.
A big design point here is the radix of the router, i.e., its number of I/O ports; with recent technology advances, high-radix routers have become more popular. Other questions include how many input/output buffers to provide per port (i.e., virtual channels), what interconnect to use between the ports (crossbar, bus, etc.), and whether the router stages (route computation, arbitration, input buffer allocation) are pipelined. On-chip routers typically forward a flit (flow control digit) in 2 cycles; they also speculate on the route computation, which can cause stalls in the worst case.
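A back-of-the-envelope sketch of how these parameters interact is below; the numbers are hypothetical and do not describe any particular router.

```c
/* Rough sketch of the router design parameters discussed above
 * (radix, virtual channels, buffering, pipeline depth). The values
 * here are illustrative assumptions, not real hardware specs. */
#include <stdio.h>

struct router_params {
    int radix;              /* number of I/O ports */
    int vcs_per_port;       /* virtual channels (buffers) per input port */
    int flit_bytes;         /* flit (flow control digit) size */
    int buf_depth_flits;    /* depth of each VC buffer, in flits */
    int pipeline_stages;    /* e.g. route compute, arbitration, traversal */
};

int main(void)
{
    struct router_params r = { 64, 4, 16, 8, 2 };   /* assumed values */

    /* Crossbar size and total input buffering implied by these choices. */
    int total_buffer = r.radix * r.vcs_per_port * r.buf_depth_flits * r.flit_bytes;
    printf("crossbar: %d x %d ports\n", r.radix, r.radix);
    printf("input buffering: %d bytes\n", total_buffer);
    printf("per-hop router latency: %d cycles (plus link traversal)\n",
           r.pipeline_stages);
    return 0;
}
```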
Large-scale parallel processing usually involves message passing, since the big machines tend to be designed in the distributed-memory fashion. Most of this message passing is done through libraries, the most prominent of which is, without a doubt, MPI (the Message Passing Interface). MPI is just that, an interface; there are many implementations, e.g. Open MPI and MPICH2. Recently there has been a push for distributed shared memory, i.e., something like a shared-memory programming model with message passing going on behind the curtains (UPC, Chapel). Languages with this model cooked in are often called PGAS (Partitioned Global Address Space) languages.
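A minimal point-to-point MPI example in C is sketched below: rank 0 sends one integer to rank 1 over the standard MPI_Send/MPI_Recv calls. It assumes some MPI implementation (e.g. Open MPI or MPICH2) is installed; compile with mpicc and launch with mpirun.

```c
/* Minimal MPI point-to-point sketch: rank 0 sends an integer to rank 1.
 * Build with `mpicc` and run with `mpirun -np 2 ./a.out`. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        int payload = 42;
        /* Explicit message passing: ranks share no memory. */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}
```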
MPI