Bruce Eckel's Thinking in Java Contents | Prev | Next

Datagrams

The examples you’ve seen so far use the Transmission Control Protocol (TCP, also known as stream-based sockets), which is designed for reliability: as long as the connection stays up, the data arrives intact and in order. TCP retransmits lost data and delivers bytes in the order they are sent, while the underlying IP network can route packets around a router that goes down. All this control and reliability comes at a cost: TCP has a high overhead.

There’s a second protocol, called User Datagram Protocol (UDP), which doesn’t guarantee that the packets will be delivered and doesn’t guarantee that they will arrive in the order they were sent. It’s called an “unreliable protocol” (TCP is a “reliable protocol”), which sounds bad, but because it’s much faster it can be useful. There are some applications, such as an audio signal, in which it isn’t so critical if a few packets are dropped here or there but speed is vital. Or consider a time-of-day server, where it really doesn’t matter if one of the messages is lost. Also, some applications might be able to fire off a UDP message to a server and can then assume, if there is no response in a reasonable period of time, that the message was lost.

The support for datagrams in Java has the same feel as its support for TCP sockets, but there are significant differences. With datagrams, you put a DatagramSocket on both the client and server, but there is no analog of the ServerSocket that waits around for a connection. That’s because there is no “connection,” but instead a datagram just shows up. Another fundamental difference is that with TCP sockets, once you’ve made the connection you don’t need to worry about who’s talking to whom anymore; you just send the data back and forth through conventional streams. However, with datagrams, the datagram packet must know where it came from and where it’s supposed to go. That means you must know these things for each datagram packet that you load up and ship off.

A DatagramSocket sends and receives the packets, and the DatagramPacket contains the information. When you’re receiving a datagram, you need only provide a buffer in which the data will be placed; the information about the Internet address and port number where the information came from will be automatically initialized when the packet arrives through the DatagramSocket. So the constructor for a DatagramPacket to receive datagrams is:

DatagramPacket(buf, buf.length)

in which buf is an array of byte. Since buf is an array, you might wonder why the constructor couldn’t figure out the length of the array on its own. I wondered this, and can only guess that it’s a throwback to C-style programming, in which of course arrays can’t tell you how big they are.

You can reuse a receiving datagram; you don’t have to make a new one each time. Every time you reuse it, the data in the buffer is overwritten.

The maximum size of the buffer is restricted only by the allowable datagram packet size, which limits it to slightly less than 64 Kbytes (65,507 bytes of data when UDP is carried over IPv4). However, in many applications you’ll want it to be much smaller, certainly when you’re sending data. Your chosen packet size depends on what you need for your particular application.

When you send a datagram, the DatagramPacket must contain not only the data, but also the Internet address and port where it will be sent. So the constructor for an outgoing DatagramPacket is:

DatagramPacket(buf, length, inetAddress, port)

This time, buf (which is a byte array) already contains the data that you want to send out. The length might be the length of buf, but it can also be shorter, indicating that you want to send only that many bytes. The other two arguments are the Internet address where the packet is going and the destination port within that machine. [64]
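To make the two constructors concrete, here is a minimal sketch that builds one packet of each kind without opening any socket. The class and method names (PacketConstructors, forReceiving, forSending) are invented for illustration, and InetAddress.getLoopbackAddress( ) assumes Java 7 or later.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the book's examples.
class PacketConstructors {
  // A packet for receiving: only the buffer and its usable length.
  static DatagramPacket forReceiving(int capacity) {
    byte[] buf = new byte[capacity];
    return new DatagramPacket(buf, buf.length);
  }

  // A packet for sending: the first 'length' bytes of buf, plus
  // the destination address and port.
  static DatagramPacket forSending(
      byte[] buf, int length, InetAddress dest, int port) {
    return new DatagramPacket(buf, length, dest, port);
  }

  public static void main(String[] args) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    byte[] data = "hello!!!".getBytes(StandardCharsets.US_ASCII);

    DatagramPacket in = forReceiving(1000);
    // Send only the first 5 bytes ("hello") of the 8-byte buffer:
    DatagramPacket out = forSending(data, 5, loopback, 1711);

    System.out.println("receive capacity: " + in.getLength());
    System.out.println("bytes to send: " + out.getLength());
    System.out.println("destination port: " + out.getPort());
  }
}
```

The port 1711 is simply the one the server example later in this section listens on; any valid port would do for the sketch.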

You might think that the two constructors create two different objects: one for receiving datagrams and one for sending them. Good OO design would suggest that these should be two different classes, rather than one class with different behavior depending on how you construct the object. This is probably true, but fortunately the use of DatagramPackets is simple enough that you’re not tripped up by the problem, as you can see in the following example. This example is similar to the MultiJabberServer and MultiJabberClient example for TCP sockets. Multiple clients will send datagrams to a server, which will echo them back to the same client that sent the message.

To simplify the creation of a DatagramPacket from a String and vice-versa, the example begins with a utility class, Dgram, to do the work for you:

//: Dgram.java
// A utility class to convert back and forth
// between Strings and DatagramPackets.
import java.net.*;

public class Dgram {
  public static DatagramPacket toDatagram(
    String s, InetAddress destIA, int destPort) {
    // Deprecated in Java 1.1, but it works:
    byte[] buf = new byte[s.length() + 1];
    s.getBytes(0, s.length(), buf, 0);
    // The correct Java 1.1 approach, but it's
    // Broken (it truncates the String):
    // byte[] buf = s.getBytes();
    return new DatagramPacket(buf, buf.length, 
      destIA, destPort);
  }
  public static String toString(DatagramPacket p){
    // The Java 1.0 approach:
    // return new String(p.getData(), 
    //  0, 0, p.getLength());
    // The Java 1.1 approach:
    return 
      new String(p.getData(), 0, p.getLength());
  }
} ///:~ 

The first method of Dgram takes a String, an InetAddress, and a port number and builds a DatagramPacket by copying the contents of the String into a byte buffer and passing the buffer into the DatagramPacket constructor. Notice the “+1” in the buffer allocation – this was necessary to prevent truncation. The getBytes( ) method of String is a special operation that copies the chars of a String into a byte buffer. This method is now deprecated; Java 1.1 has a “better” way to do this but it’s commented out here because it truncates the String. So you’ll get a deprecation message when you compile it under Java 1.1, but the behavior will be correct. (This bug might be fixed by the time you read this.)

The Dgram.toString( ) method shows both the Java 1.0 approach and the Java 1.1 approach (which is different because there’s a new kind of String constructor).
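On current JDKs the truncation problem described above should not arise, and the usual way to avoid both the deprecated call and platform-dependent encodings is to name the character set explicitly. The following is a sketch of that alternative, not a replacement for the Dgram class used by the rest of the examples. The class name DgramUtf8 is invented, and the choice of UTF-8 is an assumption; StandardCharsets requires Java 7 or later.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical variant of Dgram that names its charset explicitly.
class DgramUtf8 {
  static DatagramPacket toDatagram(
      String s, InetAddress destIA, int destPort) {
    // Encode the whole String; no extra "+1" byte is allocated.
    byte[] buf = s.getBytes(StandardCharsets.UTF_8);
    return new DatagramPacket(buf, buf.length, destIA, destPort);
  }

  static String toString(DatagramPacket p) {
    // Decode only the bytes that belong to this packet.
    return new String(p.getData(), p.getOffset(),
      p.getLength(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    DatagramPacket p = toDatagram("Client #0, message #0",
      InetAddress.getLoopbackAddress(), 1711);
    System.out.println(toString(p));
  }
}
```

Because the sender and receiver both state the encoding, the round trip does not depend on the default character set of either machine.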

Here is the server for the datagram demonstration:

//: ChatterServer.java
// A server that echoes datagrams
import java.net.*;
import java.io.*;
import java.util.*;

public class ChatterServer {
  static final int INPORT = 1711;
  private byte[] buf = new byte[1000];
  private DatagramPacket dp = 
    new DatagramPacket(buf, buf.length);
  // Can listen & send on the same socket:
  private DatagramSocket socket;

  public ChatterServer() {
    try {
      socket = new DatagramSocket(INPORT);
      System.out.println("Server started");
      while(true) {
        // Block until a datagram appears:
        socket.receive(dp);
        String rcvd = Dgram.toString(dp) +
          ", from address: " + dp.getAddress() +
          ", port: " + dp.getPort();
        System.out.println(rcvd);
        String echoString = 
          "Echoed: " + rcvd;
        // Extract the address and port from the
        // received datagram to find out where to
        // send it back:
        DatagramPacket echo = 
          Dgram.toDatagram(echoString,
            dp.getAddress(), dp.getPort());
        socket.send(echo);
      }
    } catch(SocketException e) {
      System.err.println("Can't open socket");
      System.exit(1);
    } catch(IOException e) {
      System.err.println("Communication error");
      e.printStackTrace();
    }
  }
  public static void main(String[] args) {
    new ChatterServer();
  }
} ///:~ 

The ChatterServer contains a single DatagramSocket for receiving messages, instead of creating one each time you’re ready to receive a new message. The single DatagramSocket can be used repeatedly. This DatagramSocket has a port number because this is the server and the client must have an exact address where it wants to send the datagram. It is given a port number but not an Internet address because it resides on “this” machine, so it knows what its Internet address is (in this case, the default localhost). In the infinite while loop, the socket is told to receive( ), whereupon it blocks until a datagram shows up, and then sticks it into our designated receiver, the DatagramPacket dp. The packet is converted to a String along with information about the Internet address and port where the packet came from. This information is displayed, and then an extra string is added to indicate that it is being echoed back from the server.

Now there’s a bit of a quandary. As you will see, the messages might come from many different Internet addresses and port numbers; that is, the clients can reside on any machine. (In this demonstration they all reside on the localhost, but the port number for each client is different.) To send a message back to the client that originated it, you need to know that client’s Internet address and port number. Fortunately, this information is conveniently packaged inside the DatagramPacket that carried the message, so all you have to do is pull it out using getAddress( ) and getPort( ), which are used to build the DatagramPacket echo that is sent back through the same socket that’s doing the receiving. In addition, when the socket sends the datagram, the Internet address and port of this machine travel with it, so that when the client receives the message, it can use getAddress( ) and getPort( ) to find out where the datagram came from. The only time that getAddress( ) and getPort( ) don’t tell you where a datagram came from is when you call them on a datagram that you created for sending, in which case they return the destination address and port that you gave to the constructor. This is an essential part of datagrams: you don’t need to keep track of where a message came from because it’s always stored inside the received datagram. In fact, the most reliable way to program is if you don’t try to keep track, but instead always extract the address and port from the datagram in question (as is done here).
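The reply-to-sender idea can be seen in isolation in the following sketch, which performs one request and one echo inside a single thread over the loopback interface. It is an illustration under assumptions rather than part of the ChatterServer example: the class name EchoOnce, the two-second timeouts, and the use of ephemeral ports (port 0) are all choices made for the sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical single-threaded echo over loopback.
class EchoOnce {
  static String roundTrip(String message) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // Port 0 asks the system to pick a free port for each socket.
    try (DatagramSocket server = new DatagramSocket(0, loopback);
         DatagramSocket client = new DatagramSocket(0, loopback)) {
      // Give up rather than block forever if a datagram is lost:
      server.setSoTimeout(2000);
      client.setSoTimeout(2000);

      byte[] out = message.getBytes(StandardCharsets.UTF_8);
      client.send(new DatagramPacket(
        out, out.length, loopback, server.getLocalPort()));

      DatagramPacket received =
        new DatagramPacket(new byte[1000], 1000);
      server.receive(received);
      String text = new String(received.getData(), 0,
        received.getLength(), StandardCharsets.UTF_8);

      // The reply is addressed using only what the received
      // packet reports about its sender:
      byte[] echo =
        ("Echoed: " + text).getBytes(StandardCharsets.UTF_8);
      server.send(new DatagramPacket(echo, echo.length,
        received.getAddress(), received.getPort()));

      DatagramPacket reply =
        new DatagramPacket(new byte[1000], 1000);
      client.receive(reply);
      return new String(reply.getData(), 0,
        reply.getLength(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTrip("hello"));
  }
}
```

Note that the server side never stores the client’s port anywhere; it reads the address and port back out of the packet it just received.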

To test this server, here’s a program that makes a number of clients, all of which fire datagram packets to the server and wait for the server to echo them back.

//: ChatterClient.java
// Tests the ChatterServer by starting multiple 
// clients, each of which sends datagrams.
import java.lang.Thread;
import java.net.*;
import java.io.*;

public class ChatterClient extends Thread {
  // Can listen & send on the same socket:
  private DatagramSocket s;
  private InetAddress hostAddress;
  private byte[] buf = new byte[1000];
  private DatagramPacket dp = 
    new DatagramPacket(buf, buf.length);
  private int id;

  public ChatterClient(int identifier) {
    id = identifier;
    try {
      // Auto-assign port number:
      s = new DatagramSocket();
      hostAddress = 
        InetAddress.getByName("localhost");
    } catch(UnknownHostException e) {
      System.err.println("Cannot find host");
      System.exit(1);
    } catch(SocketException e) {
      System.err.println("Can't open socket");
      e.printStackTrace();
      System.exit(1);
    } 
    System.out.println("ChatterClient starting");
  }
  public void run() {
    try {
      for(int i = 0; i < 25; i++) {
        String outMessage = "Client #" +
          id + ", message #" + i;
        // Make and send a datagram:
        s.send(Dgram.toDatagram(outMessage,
          hostAddress, 
          ChatterServer.INPORT));
        // Block until it echoes back:
        s.receive(dp);
        // Print out the echoed contents:
        String rcvd = "Client #" + id +
          ", rcvd from " + 
          dp.getAddress() + ", " + 
          dp.getPort() + ": " +
          Dgram.toString(dp);
        System.out.println(rcvd);
      }
    } catch(IOException e) {
      e.printStackTrace();
      System.exit(1);
    }
  }
  public static void main(String[] args) {
    for(int i = 0; i < 10; i++)
      new ChatterClient(i).start();
  }
} ///:~ 

ChatterClient is created as a Thread so that multiple clients can be made to bother the server. Here you can see that the receiving DatagramPacket looks just like the one used for ChatterServer. In the constructor, the DatagramSocket is created with no arguments since it doesn’t need to advertise itself as being at a particular port number. The Internet address used for this socket will be “this machine” (for the example, localhost) and the port number will be automatically assigned, as you will see from the output. This DatagramSocket, like the one for the server, will be used both for sending and receiving.

The hostAddress is the Internet address of the host machine you want to talk to. The one part of the program in which you must know an exact Internet address and port number is the part in which you make the outgoing DatagramPacket. As is always the case, the host must be at a known address and port number so that clients can originate conversations with the host.

Each thread is given a unique identification number (although the port number automatically assigned to the thread would also provide a unique identifier). In run( ), a message String is created that contains the thread’s identification number and the message number this thread is currently sending. This String is used to create a datagram that is sent to the host at its address; the port number is taken directly from a constant in ChatterServer. Once the message is sent, receive( ) blocks until the server replies with an echoing message. All of the information that’s shipped around with the message allows you to see that what comes back to this particular thread is derived from the message that originated from it. In this example, even though UDP is an “unreliable” protocol, you’ll normally see that all of the datagrams get where they’re supposed to. (This is usually true for localhost and LAN situations, but you might begin to see some failures for non-local connections.)

When you run this program, you’ll see that each of the threads finishes, which means that each of the datagram packets sent to the server is turned around and echoed to the correct recipient; otherwise one or more threads would hang, blocking until their input shows up.
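If you would rather have a thread notice a lost datagram than hang, a receive timeout is one option. The sketch below uses DatagramSocket.setSoTimeout( ), which makes receive( ) throw a SocketTimeoutException when nothing arrives in time. The class name ReceiveWithTimeout, the helper tryReceive( ), and the 200-millisecond wait are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

// Hypothetical helper showing a bounded wait for a datagram.
class ReceiveWithTimeout {
  // Waits up to timeoutMillis for a datagram. Returns true if one
  // was stored in 'into', false if the wait timed out.
  static boolean tryReceive(DatagramSocket socket,
      DatagramPacket into, int timeoutMillis) throws IOException {
    socket.setSoTimeout(timeoutMillis);
    try {
      socket.receive(into);
      return true;
    } catch (SocketTimeoutException e) {
      return false; // Treat as "message or reply was lost"
    }
  }

  // Listens on a fresh loopback socket that nobody sends to.
  static String demo() {
    try (DatagramSocket s =
           new DatagramSocket(0, InetAddress.getLoopbackAddress())) {
      DatagramPacket dp = new DatagramPacket(new byte[1000], 1000);
      return tryReceive(s, dp, 200) ? "received" : "timed out";
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A client built this way could resend its message a few times when tryReceive( ) returns false, instead of blocking indefinitely.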

You might think that the only right way to, for example, transfer a file from one machine to another is through TCP sockets, since they’re “reliable.” However, because of the speed of datagrams they can actually be a better solution. You simply break the file up into packets and number each packet. The receiving machine takes the packets and reassembles them; a “header packet” tells the machine how many to expect and any other important information. If a packet is lost, the receiving machine sends a datagram back telling the sender to retransmit.
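The numbering and reassembly step can be sketched without any networking at all. In the following illustration each packet carries an 8-byte header (a sequence number and the total packet count) in front of its payload. That layout is an assumption of the sketch and a simplification of the separate “header packet” described above; the class name FileChunks and its methods are likewise invented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical packet layout: [int seq][int total][payload bytes]
class FileChunks {
  static final int HEADER_BYTES = 8;

  // Breaks data into numbered packets of at most payloadSize bytes.
  static List<byte[]> split(byte[] data, int payloadSize) {
    int total = (data.length + payloadSize - 1) / payloadSize;
    List<byte[]> packets = new ArrayList<>();
    for (int seq = 0; seq < total; seq++) {
      int from = seq * payloadSize;
      int len = Math.min(payloadSize, data.length - from);
      ByteBuffer b = ByteBuffer.allocate(HEADER_BYTES + len);
      b.putInt(seq).putInt(total).put(data, from, len);
      packets.add(b.array());
    }
    return packets;
  }

  // Sequence numbers not yet received; these are the ones the
  // receiver would ask the sender to retransmit.
  static List<Integer> missing(List<byte[]> received, int total) {
    boolean[] seen = new boolean[total];
    for (byte[] p : received)
      seen[ByteBuffer.wrap(p).getInt(0)] = true;
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < total; i++)
      if (!seen[i]) result.add(i);
    return result;
  }

  // Puts payloads back in order, whatever order they arrived in.
  static byte[] reassemble(List<byte[]> received, int total) {
    byte[][] slots = new byte[total][];
    for (byte[] p : received) {
      int seq = ByteBuffer.wrap(p).getInt(0);
      slots[seq] = Arrays.copyOfRange(p, HEADER_BYTES, p.length);
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] s : slots) {
      if (s == null)
        throw new IllegalStateException("packet missing");
      out.write(s, 0, s.length);
    }
    return out.toByteArray();
  }
}
```

Each array returned by split( ) would become the buffer of one outgoing DatagramPacket. The receiver would call missing( ) after a timeout to decide which sequence numbers to request again, and reassemble( ) once that list is empty.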


[64] TCP and UDP port numbers are kept separate. That is, you can simultaneously run a TCP server and a UDP server on port 8080 without interference.
