Voice over IP (VoIP) is the term for the technology used to carry digitised voice over an IP data network. VoIP requires two classes of protocols. The first is a signaling protocol, such as SIP, H.323 or MGCP, that sets up, disconnects and controls calls and telephony features. The second carries the speech packets, and that role is filled by the Real-Time Transport Protocol (RTP). RTP is an IETF standard, first published as RFC 1889 in 1996 and later updated by RFC 3550, and H.323 adopted it for carrying media. RTP works with any signaling protocol and is the protocol most commonly used by IP PBX vendors.
An IP phone or softphone generates a voice packet every 10, 20, 30 or 40ms, depending on the vendor's implementation. The 10 to 40ms of digitised speech can be uncompressed, compressed or even encrypted; RTP carries it the same way regardless. As you have probably figured out, it takes many packets to carry one word.
The shorter the packet, the shorter the delay
End-to-end (phone-to-phone) delay needs to be limited. The shorter the packet creation delay, the more network delay the VoIP call can tolerate. A short packet also does less harm when it is lost, because less speech goes missing. Short packets require more bandwidth, however, because the fixed per-packet overhead (discussed below) is paid more often. Longer packets that contain more speech bytes reduce the bandwidth requirement, but they add construction delay and are harder to recover from if lost. Many vendors have chosen 20 or 30ms packets.
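The delay side of this trade-off can be put in numbers. The sketch below is illustrative only: it assumes a 150ms one-way delay budget (the figure commonly cited from ITU-T G.114) and a hypothetical fixed 40ms allowance for jitter buffering and codec processing. Neither number comes from this article; the point is simply that every millisecond spent building a packet is a millisecond the network cannot use.

```python
# Sketch: network delay left over once packet creation delay is subtracted
# from a one-way delay budget.
# ASSUMPTIONS (not from the article): 150 ms one-way budget, and a
# hypothetical 40 ms allowance for jitter buffer + codec processing.
ONE_WAY_BUDGET_MS = 150
OTHER_DELAY_MS = 40  # hypothetical allowance


def network_delay_allowance(packet_ms: int) -> int:
    """Return the one-way network delay (ms) still available for a given packet size."""
    return ONE_WAY_BUDGET_MS - OTHER_DELAY_MS - packet_ms


if __name__ == "__main__":
    for packet_ms in (10, 20, 30, 40):
        print(f"{packet_ms} ms packets leave {network_delay_allowance(packet_ms)} ms for the network")
```

Under these assumptions, moving from 40ms to 10ms packets frees 30ms of the budget for the network.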
RTP packet format
The RTP header carries the time stamp and sequence number of the digitised speech sample (20 or 30ms of a word) and identifies the content of each voice packet. Its payload type field identifies the compression technique (if there is one) used in the packet. The RTP packet format for VoIP over Ethernet is shown below, in transmission order:
Ethernet header | IP header | UDP header | RTP header | Digitised voice | Ethernet trailer
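To make the RTP header concrete, here is a minimal sketch that packs and unpacks the 12-byte fixed RTP header defined in RFC 3550 (version, marker, payload type, sequence number, time stamp and SSRC stream identifier), assuming no padding, header extension or contributing sources. The helper names and the SSRC value are hypothetical. The example assumes G.711 µ-law, which RFC 3551 assigns static payload type 0 with an 8000 Hz clock, so each 20ms packet advances the time stamp by 160.

```python
import struct


def build_rtp_header(seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Pack the 12-byte fixed RTP header (RFC 3550) with no CSRCs or extension."""
    version = 2
    byte0 = version << 6                      # padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # "!" = network byte order, standard sizes: 1 + 1 + 2 + 4 + 4 = 12 bytes
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data: bytes) -> dict:
    """Unpack the fixed header fields from the first 12 bytes of an RTP packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": ts, "ssrc": ssrc}


if __name__ == "__main__":
    SSRC = 0x12345678  # hypothetical stream identifier
    # Three consecutive 20 ms G.711 packets: seq +1, time stamp +160 each
    for i in range(3):
        hdr = build_rtp_header(seq=1000 + i, timestamp=160 * i, ssrc=SSRC)
        print(hdr.hex(), parse_rtp_header(hdr))
```

The receiver uses the sequence number to detect lost or reordered packets and the time stamp to play the speech out at the right moment.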
RTP can also be carried over frame relay, ATM, PPP and other networks; only the outer data link header and trailer change with the network. The digitised voice field and the RTP, UDP and IP headers remain the same.
Each of these packets contains part of a digitised spoken word. The packet rate is 50 packets per second for 20ms voice samples and 33.3 packets per second for 30ms samples, and the voice packets are transmitted at these fixed rates. The digitised voice field can contain as few as 10 bytes of compressed voice or as many as 320 bytes of uncompressed voice.
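The packet rate and the size of the voice field follow directly from the packet interval and the codec bit rate. In this sketch the codecs are assumptions chosen for illustration: G.711 at 64kbit/s stands in for uncompressed voice and G.729 at 8kbit/s for compressed voice. With those two, the 10-byte and 320-byte extremes mentioned above fall out of the arithmetic.

```python
def packets_per_second(packet_ms: float) -> float:
    """One packet is sent every packet_ms milliseconds."""
    return 1000 / packet_ms


def payload_bytes(codec_kbps: float, packet_ms: float) -> int:
    """Voice bytes per packet: kbit/s x ms = bits, then 8 bits per byte."""
    return round(codec_kbps * packet_ms / 8)


if __name__ == "__main__":
    # Assumed example codecs (not named in the article)
    codecs = {"G.711 (64 kbit/s)": 64, "G.729 (8 kbit/s)": 8}
    for packet_ms in (10, 20, 30, 40):
        rate = packets_per_second(packet_ms)
        sizes = ", ".join(f"{name}: {payload_bytes(kbps, packet_ms)} B"
                          for name, kbps in codecs.items())
        print(f"{packet_ms} ms -> {rate:.1f} packets/s; {sizes}")
```

For example, G.729 at 10ms gives 10 voice bytes per packet and G.711 at 40ms gives 320.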
The UDP header carries the sending and receiving port numbers for the call. The IP header carries the sending and receiving IP addresses plus other control information. The Ethernet header carries the LAN MAC addresses of the sending and receiving devices, and the Ethernet trailer carries a frame check sequence used for error detection. When the packet enters a WAN, the Ethernet header and trailer are replaced with a frame relay, ATM or PPP header and trailer.
'Shipping and handling'
In reality, there is no Voice over IP. It is really voice over RTP, over UDP, over IP and usually over Ethernet. The headers and trailers are required fields for the networks to carry the packets. The header and trailer overhead can be called the shipping and handling cost.
The RTP (12 bytes), UDP (8 bytes) and IPv4 (20 bytes) headers add 40 bytes. The Ethernet header and trailer account for another 18 bytes, for a total of at least 58 bytes of overhead before there are any voice bytes in the packet. This shipping overhead can range from 20% to 80% of the bandwidth consumed over the LAN and WAN. Many RTP implementations carry voice unencrypted, or rely on encryption facilities the vendor has provided itself. An IP PBX vendor may instead offer the standardised secure version of RTP (SRTP).
Shorter packets have higher overhead, because the same 58 bytes of headers and trailer surround fewer voice bytes. As the voice field grows with longer packets, the percentage of overhead decreases, and so does the bandwidth needed to carry the same speech. In other words, bigger packets are more efficient than smaller packets.
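A short calculation shows the effect. This sketch uses the 58 bytes of overhead given above (12 RTP + 8 UDP + 20 IPv4 + 14 Ethernet header + 4 Ethernet trailer) and, as simplifying assumptions, ignores the Ethernet preamble, inter-frame gap and any VLAN tag. The voice field sizes assume G.711 (160 bytes per 20ms) and G.729 (20 bytes per 20ms), which are illustrative choices rather than figures from the article.

```python
RTP_UDP_IP = 12 + 8 + 20          # 40 bytes of RTP, UDP and IPv4 headers
ETHERNET = 14 + 4                 # Ethernet header + frame check sequence trailer
OVERHEAD = RTP_UDP_IP + ETHERNET  # 58 bytes per packet


def overhead_share(voice_bytes: int) -> float:
    """Fraction of each frame spent on headers and trailer."""
    return OVERHEAD / (OVERHEAD + voice_bytes)


def ethernet_kbps(voice_bytes: int, packet_ms: float) -> float:
    """One-direction LAN bandwidth for a call (preamble and inter-frame gap ignored)."""
    frame_bits = (OVERHEAD + voice_bytes) * 8
    return frame_bits * (1000 / packet_ms) / 1000


if __name__ == "__main__":
    # Assumed codecs: G.711 = 8 bytes/ms, G.729 = 1 byte/ms
    for name, bytes_per_ms in (("G.711", 8), ("G.729", 1)):
        for packet_ms in (20, 30):
            voice = bytes_per_ms * packet_ms
            print(f"{name} {packet_ms} ms: {voice} voice bytes, "
                  f"{overhead_share(voice):.0%} overhead, "
                  f"{ethernet_kbps(voice, packet_ms):.1f} kbit/s")
```

Under these assumptions a 20ms G.711 call needs about 87.2kbit/s on the LAN with roughly 27% overhead, while a 20ms G.729 call needs 31.2kbit/s but spends about 74% of it on overhead.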
Cisco created a header compression technique that is now standardised as RTP header compression (cRTP, RFC 2508). It compresses the combined RTP, UDP and IP headers, reducing that overhead from 40 bytes to between 4 and 6 bytes. The bandwidth consumed by compressed voice packets can be reduced by nearly 60%. The technique has less value for large uncompressed voice packets, where the headers are a smaller share of each packet. Header compression is not recommended for LAN implementations, because a LAN typically has more than enough bandwidth for voice calls. It should be considered for WAN links, where bandwidth is limited and much more expensive.
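The sketch below compares WAN bandwidth with and without header compression. It takes 4 bytes, the low end of the range quoted above, as the compressed header size, and assumes a hypothetical 6 bytes of WAN link-layer header and trailer; the real figure depends on the link protocol. The codec payloads (G.729 at 20 voice bytes per 20ms, G.711 at 160) are again illustrative assumptions.

```python
LINK_BYTES = 6  # hypothetical WAN link-layer header + trailer size


def wan_kbps(voice_bytes: int, packet_ms: float, ip_udp_rtp_bytes: int) -> float:
    """One-direction WAN bandwidth for a call with the given header size."""
    bits = (voice_bytes + ip_udp_rtp_bytes + LINK_BYTES) * 8
    return bits * (1000 / packet_ms) / 1000


def crtp_saving(voice_bytes: int, packet_ms: float, compressed_bytes: int = 4) -> float:
    """Fractional bandwidth saved by shrinking the 40-byte RTP/UDP/IP headers."""
    before = wan_kbps(voice_bytes, packet_ms, 40)
    after = wan_kbps(voice_bytes, packet_ms, compressed_bytes)
    return 1 - after / before


if __name__ == "__main__":
    for name, voice in (("G.729, 20 ms", 20), ("G.711, 20 ms", 160)):
        print(f"{name}: {wan_kbps(voice, 20, 40):.1f} -> "
              f"{wan_kbps(voice, 20, 4):.1f} kbit/s "
              f"({crtp_saving(voice, 20):.0%} saved)")
```

With these assumptions the compressed-voice call drops from 26.4 to 12.0kbit/s (about 55% saved), while the uncompressed call saves only about 17%, which matches the point that cRTP pays off mainly for small, compressed packets on WAN links.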
How voice compression technology works and how it reduces bandwidth requirements will be discussed in the next tip: "How voice compression saves bandwidth."
About the author: Gary Audin has more than 40 years of computer, communications and security experience. He has planned, designed, specified, implemented and operated data, LAN and telephone networks. These included local area, national and international networks, as well as VoIP and IP convergent networks in the U.S., Canada, Europe, Australia and Asia.
This was first published in June 2007