VoIP bandwidth fundamentals

SearchVoIP ANZ Staff

Bandwidth requirements for Voice over IP can be a tricky beast to tame until you look at the method and factors involved. This guide investigates what bandwidth means for VoIP, how to calculate bandwidth consumption for a VoIP network and how bandwidth can be saved by using voice compression.

What about bandwidth for VoIP?

Voice over IP (VoIP) is the technology used to carry digitised voice over an IP data network. VoIP requires two classes of protocols: a signalling protocol such as SIP, H.323 or MGCP that is used to set up, disconnect and control the calls and telephony features; and a protocol to carry speech packets. The Real-Time Transport Protocol (RTP) carries the speech packets. RTP is an IETF standard introduced in 1995 when H.323 was standardised. RTP will work with any signalling protocol, and it is the protocol most commonly used by IP PBX vendors.

An IP phone or softphone generates a voice packet every 10, 20, 30 or 40ms, depending on the vendor's implementation. The 10 to 40ms of digitised speech can be uncompressed, compressed and even encrypted; RTP carries it the same way in every case. Because each packet holds only a fraction of a second of speech, it takes many packets to carry one word.

The shorter the packet, the shorter the delay

End-to-end (phone-to-phone) delay needs to be limited. The shorter the packet creation delay, the more network delay the VoIP call can tolerate. Shorter packets cause less of a problem if the packet is lost. Short packets require more bandwidth, however, because of increased packet overhead (this is discussed below). Longer packets that contain more speech bytes reduce the bandwidth requirements but produce a longer construction delay and are harder to fix if lost. Many vendors have chosen 20 or 30ms size packets.

RTP packet format

The RTP header carries the time stamp and sequence number for the digitised speech sample (20 or 30ms of a word) and identifies the content of each voice packet. The content descriptor defines the compression technique (if there is one) used in the packet. The RTP packet format for VoIP over Ethernet is shown below.

Ethernet Trailer | Digitised Voice | RTP Header | UDP Header | IP Header | Ethernet Header

RTP can be carried on frame relay, ATM, PPP and other networks with only the far right header and left trailer varying by protocol. The digitised voice field, RTP, UDP and IP headers remain the same.

Each of these packets will contain part of a digitised spoken word. The packet rate is 50 packets per second for 20ms and 33.3 packets per second for 30ms voice samples. The voice packets are transmitted at these fixed rates. The digitised voice field can contain as few as 10 bytes of compressed voice or as many as 320 bytes of uncompressed voice.
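The packet rates and payload sizes quoted above follow directly from the sample time and the codec bit rate. The following is a minimal Python sketch of that arithmetic, assuming the 64 Kbps (G.711) and 8 Kbps (G.729) voice bit rates used later in this guide; the function names are illustrative only.

```python
# Sketch: packet rate and voice payload size for a given sample time.
# Function names are illustrative; they do not come from any vendor tool.

def packets_per_second(sample_ms):
    """Packets sent each second when one packet carries sample_ms of speech."""
    return 1000 / sample_ms


def payload_bytes(codec_bps, sample_ms):
    """Voice bytes per packet: codec bit rate times sample time, in bytes."""
    # bits/s * ms / (1000 ms per second * 8 bits per byte)
    return codec_bps * sample_ms // 8000


for codec, bps in (("G.711", 64_000), ("G.729", 8_000)):
    for ms in (10, 20, 30, 40):
        print(f"{codec} {ms}ms: {packets_per_second(ms):.1f} packets/s, "
              f"{payload_bytes(bps, ms)} voice bytes per packet")
```

Under these assumptions, 10ms of 8 Kbps compressed voice gives the 10-byte minimum and 40ms of 64 Kbps uncompressed voice gives the 320-byte maximum mentioned above.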

The UDP header carries the sending and receiving port numbers for the call. The IP header carries the sending and receiving IP addresses for the call plus other control information. The Ethernet header carries the LAN MAC addresses of the sending and receiving devices. The Ethernet trailer is used for error detection purposes. The Ethernet header and trailer are replaced with a frame relay, ATM or PPP header and trailer when the packet enters a WAN.

'Shipping and handling'

In reality, there is no Voice over IP. It is really voice over RTP, over UDP, over IP and usually over Ethernet. The headers and trailers are required fields for the networks to carry the packets. The header and trailer overhead can be called the shipping and handling cost.

The RTP, UDP and IP headers together add 40 bytes. The Ethernet header and trailer account for another 18 bytes, for a total of at least 58 bytes of overhead before there are any voice bytes in the packet. This overhead can range from 20% to 80% of the bandwidth consumed over the LAN and WAN. Many implementations of RTP have no encryption, or the vendor has provided its own encryption facilities. An IP PBX vendor may offer a standardised secure version of RTP (SRTP).

Shorter packets have higher overhead. On Ethernet, the same 58 bytes of overhead accompany the voice bytes whatever the size of the voice field. As the voice field gets larger with longer packets, the percentage of overhead decreases, and therefore the needed bandwidth decreases. In other words, bigger packets are more efficient than smaller packets.
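To see how the overhead share shrinks as the voice field grows, here is a rough Python sketch. It assumes the 40-byte RTP/UDP/IP figure and the 18-byte Ethernet figure given above; the two payload sizes (20 and 160 bytes) are chosen only as examples.

```python
# Sketch: share of each Ethernet packet that is "shipping and handling".
# Assumes 40 bytes of RTP/UDP/IP headers plus 18 bytes of Ethernet overhead.

RTP_UDP_IP_BYTES = 40
ETHERNET_BYTES = 18


def overhead_percent(voice_bytes):
    """Percentage of the packet taken up by headers and trailer."""
    overhead = RTP_UDP_IP_BYTES + ETHERNET_BYTES
    return 100 * overhead / (overhead + voice_bytes)


for voice_bytes in (20, 160):  # example payload sizes only
    print(f"{voice_bytes} voice bytes: "
          f"{overhead_percent(voice_bytes):.1f}% overhead")
```

With these inputs, the small compressed payload spends roughly three quarters of each packet on overhead, while the larger payload spends closer to a quarter.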

Header compression

Cisco created a header compression technique, now a standard, called RTP header compression (cRTP). It compresses the RTP, UDP and IP headers, reducing their overhead from 40 bytes to between 4 and 6 bytes. The bandwidth consumption for compressed voice packets can be reduced by nearly 60%. The technique has less value for large uncompressed voice packets, where the headers are a smaller share of each packet. Header compression is not recommended for LAN implementations because there is typically more than enough bandwidth for voice calls. It should be considered for WAN implementations, where bandwidth is limited and much more expensive.
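The effect of header compression on a single packet can be estimated with a short Python sketch. It assumes a 4-byte compressed header and 6 bytes of PPP or Frame Relay overhead, the figures used in the table notes later in this guide; the exact saving depends on which overheads are counted.

```python
# Sketch: per-packet saving from RTP header compression on a WAN link.
# Assumptions: 40-byte RTP/UDP/IP headers shrink to 4 bytes, and the
# Layer 2 (PPP or Frame Relay) overhead is 6 bytes per packet.

FULL_HEADER_BYTES = 40
COMPRESSED_HEADER_BYTES = 4
LAYER2_BYTES = 6


def crtp_saving_percent(voice_bytes):
    """Percentage reduction in packet size when the headers are compressed."""
    before = voice_bytes + FULL_HEADER_BYTES + LAYER2_BYTES
    after = voice_bytes + COMPRESSED_HEADER_BYTES + LAYER2_BYTES
    return 100 * (before - after) / before


print(f"20-byte compressed payload:    {crtp_saving_percent(20):.1f}% smaller")
print(f"160-byte uncompressed payload: {crtp_saving_percent(160):.1f}% smaller")
```

Under these assumptions the small compressed payload shrinks by more than half, while the large uncompressed payload shrinks by less than a fifth.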

Calculating bandwidth consumption for VoIP

The bandwidth needed for VoIP transmission will depend on a few factors: the compression technology, packet overhead, network protocol used and whether silence suppression is used. This tip investigates the first three considerations. Silence suppression will be covered in a later tip.

There are two primary strategies for improving IP network performance for voice: allocate more VoIP bandwidth (reduce utilisation) or implement QoS.

How much bandwidth to allocate depends on:

  • Packet size for voice (10 to 320 bytes of digital voice)
  • CODEC and compression technique (G.711, G.729, G.723.1, G.722, proprietary)
  • Header compression (RTP + UDP + IP), which is optional
  • Layer 2 protocols, such as point-to-point protocol (PPP), Frame Relay and Ethernet
  • Silence suppression/voice activity detection

Calculating the bandwidth for a VoIP call is not difficult once you know the method and the factors to include. The chart below, "Calculating one-way voice bandwidth," demonstrates the overhead calculation for 20 and 40 byte compressed voice (G.729) being transmitted over a Frame Relay WAN connection. Twenty bytes of G.729 compressed voice is equal to 20 ms of a word. Forty bytes of G.729 compressed voice is equal to 40 ms of a word.
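The method can be sketched in a few lines of Python: add the headers to the voice payload, convert to bits and multiply by the packet rate. The sketch assumes 40 bytes of RTP/UDP/IP headers and 6 bytes of Frame Relay overhead per packet, the same figures used in the notes to the table in this guide; the function name and defaults are illustrative.

```python
# Sketch: one-way bandwidth for a single voice stream.
# Assumptions: 40 bytes of RTP/UDP/IP headers and 6 bytes of
# Frame Relay overhead per packet, with no header compression.

def one_way_bps(voice_bytes, sample_ms, header_bytes=40, layer2_bytes=6):
    """Bits per second for one direction of a call."""
    packet_bits = (voice_bytes + header_bytes + layer2_bytes) * 8
    packets_per_second = 1000 / sample_ms
    return packet_bits * packets_per_second


# G.729: 20 bytes carry 20 ms of speech, 40 bytes carry 40 ms.
print(one_way_bps(20, 20))  # 20-byte payload, 50 packets per second
print(one_way_bps(40, 40))  # 40-byte payload, 25 packets per second
```

With these inputs the 20-byte case comes to 26,400 bps and the 40-byte case to 17,200 bps, each for one direction of the call.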

The results of this method of calculation are contained in the next table, "Packet voice transmission requirements." The table demonstrates these points:

  • Bandwidth requirements fall when compression is used (G.729 versus G.711).
  • Bandwidth requirements fall when longer packets are used, because the overhead per packet is spread over more voice bytes.
  • Even though the voice compression is an 8 to 1 ratio, the bandwidth reduction is about 3 or 4 to 1. The overhead negates some of the voice compression bandwidth savings.
  • Compressing the RTP, UDP and IP headers (cRTP) is most valuable when the packet also carries compressed voice.

Packet voice transmission requirements
(Bandwidth per voice channel, in Kbps)

Codec  | Voice bit rate | Sample time | Voice payload | Packets per second | Ethernet (RTP) | PPP or Frame Relay (RTP) | PPP or Frame Relay (cRTP)
G.711  | 64 Kbps        | 20 msec     | 160 bytes     | 50                 | 87.2 Kbps      | 82.4 Kbps                | 68.0 Kbps
G.711  | 64 Kbps        | 30 msec     | 240 bytes     | 33.3               | 79.4 Kbps      | 76.2 Kbps                | 66.6 Kbps
G.711  | 64 Kbps        | 40 msec     | 320 bytes     | 25                 | 75.6 Kbps      | 73.2 Kbps                | 66.0 Kbps
G.729A | 8 Kbps         | 20 msec     | 20 bytes      | 50                 | 31.2 Kbps      | 26.4 Kbps                | 12.0 Kbps
G.729A | 8 Kbps         | 30 msec     | 30 bytes      | 33.3               | 23.4 Kbps      | 20.2 Kbps                | 10.7 Kbps
G.729A | 8 Kbps         | 40 msec     | 40 bytes      | 25                 | 19.6 Kbps      | 17.2 Kbps                | 10.0 Kbps

Note: RTP assumes 40 octets of RTP/UDP/IP overhead per packet.
Compressed RTP (cRTP) assumes 4 octets of RTP/UDP/IP overhead per packet.
Ethernet overhead adds 18 octets per packet.
PPP/Frame Relay overhead adds 6 octets per packet.

This table provided courtesy of Michael Finneran.
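The point about the 8 to 1 voice compression yielding only a 3 or 4 to 1 bandwidth saving can be checked with a short Python sketch. It assumes the Ethernet case from the table notes (40 bytes of RTP/UDP/IP headers plus 18 bytes of Ethernet overhead) and the G.711 and G.729 payload sizes listed in the table; the helper name is illustrative.

```python
# Sketch: bandwidth ratio of G.711 to G.729 over Ethernet.
# Assumes 40 bytes of RTP/UDP/IP headers and 18 bytes of Ethernet overhead.

OVERHEAD_BYTES = 40 + 18


def ethernet_bps(voice_bytes, sample_ms):
    """One-way bits per second for a single voice stream over Ethernet."""
    return (voice_bytes + OVERHEAD_BYTES) * 8 * 1000 / sample_ms


# (sample time in ms, G.711 payload bytes, G.729 payload bytes)
for sample_ms, g711_bytes, g729_bytes in ((20, 160, 20), (30, 240, 30), (40, 320, 40)):
    ratio = ethernet_bps(g711_bytes, sample_ms) / ethernet_bps(g729_bytes, sample_ms)
    print(f"{sample_ms} ms: payload ratio 8.0 to 1, bandwidth ratio {ratio:.2f} to 1")
```

The payload ratio stays at 8 to 1 in every row, but because both codecs carry the same fixed overhead, the bandwidth ratio comes out between roughly 2.8 and 3.9 to 1.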

Because packet size, voice compression and header compression all vary between designs, it is difficult to state a single bandwidth figure for a continuous-speech voice call. The IP PBX or IP phone vendor should be able to provide tables like the one above for their products. Many vendors have selected 30 ms for the payload size of their VoIP implementations. A good rule of thumb is to reserve 24 Kbps of IP network bandwidth per call for 8 Kbps (G.729-like) compressed voice. If G.711 is used, reserve 80 Kbps per call.
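As a rough illustration of the rule of thumb, the Python sketch below estimates how many simultaneous calls fit into a given amount of bandwidth set aside for voice. The 512 Kbps figure is a hypothetical example, not a recommendation, and the calculation ignores signalling traffic and any headroom a real design would keep.

```python
# Sketch: simultaneous calls that fit in bandwidth reserved for voice,
# using the per-call rule-of-thumb figures from the text.
# The 512 Kbps voice allocation below is a hypothetical example.

RESERVE_KBPS = {"G.729-like (8 Kbps voice)": 24, "G.711": 80}


def calls_supported(voice_kbps, per_call_kbps):
    """Whole number of calls that fit in the reserved voice bandwidth."""
    return voice_kbps // per_call_kbps


voice_kbps = 512  # hypothetical allocation
for codec, per_call in RESERVE_KBPS.items():
    print(f"{codec}: {calls_supported(voice_kbps, per_call)} calls "
          f"in {voice_kbps} Kbps")
```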

If silence suppression/voice activity detection is used, the bandwidth consumption may drop 50% -- to 8 Kbps total per VoIP call. But the assumption that everyone will alternate between voice and silence without conflicting with each other is not always realistic. Silence suppression will be discussed in a later tip.

Most enterprise designers do not perform these calculations. The vendor provides the necessary information. The designer does have some freedom, such as selecting the compression technique for voice payloads and headers, and may be able to vary the packet size.

How can voice compression save bandwidth?

The Public Switched Telephone Network (PSTN) started with the transmission of analog speech. This worked well for decades until the areas under city streets became saturated with copper cables, one copper pair per call. Starting in the 1950s, AT&T Bell Labs developed a technique to carry more voice calls over copper wire. They developed digitised voice technology through which 24 digital calls can be carried on two pairs of copper wire, thereby increasing the carrying capacity of the cables twelvefold. The voice is digitised into streams of 64,000 bps per call. The technology is called a T1 circuit and the bandwidth for the 24 calls is 1.544 Mbps. This worked well for domestic connections. The T1 technology then became the mechanism for long-distance domestic transmission.

Most of the early voice compression technologies were designed for undersea cables, where bandwidth was limited and expensive. Voice