Real-time Transport Protocol (RFC 3550)


What you will learn on this page


This lesson explains the Real-time Transport Protocol (RTP) and how it fits into the SMPTE ST 2110 stack.

Why isn't UDP enough?   • RTP Packets   • Secure RTP   • Maximum Transmission Unit   • Mapping MPEG Transport Streams to RTP  


RTP (Real-time Transport Protocol) is one of the most fundamental elements in modern IP-based media systems, including SMPTE ST 2110, VoIP, streaming, and conferencing. It’s defined in RFC 3550, published by the IETF in 2003 as the successor to the original RTP specification (RFC 1889, 1996), and is standardized as part of the Internet protocol suite.


RTP provides end-to-end transport for real-time data, such as audio, video, and ancillary data.

It is not responsible for guaranteed delivery. That’s intentional. Instead, it focuses on sequencing, timing, and synchronization of media streams.

It’s usually paired with UDP as its underlying transport because UDP allows continuous packet flow without retransmission delays.

Why isn't UDP enough?

Bottom Line   Short answer: UDP moves packets; RTP makes media usable.
UDP deliberately lacks almost everything real-time media needs.

What UDP provides (and why it’s not enough)
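UDP gives you only source and destination ports, a length field, and a checksum.

UDP does NOT provide ordering, timing, loss detection, payload identification, or stream identification.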

For real-time media, that’s a problem.

What UDP is missing — and how RTP fixes it

Capability       | UDP | RTP
Transport        | ✅  | ✅ (over UDP)
Ordering         | ❌  | ✅
Timing           | ❌  | ✅
Loss detection   | ❌  | ✅
Payload type     | ❌  | ✅
Stream ID        | ❌  | ✅
A/V sync         | ❌  | ✅ (via RTCP)
Media awareness  | ❌  | ✅

Why not just “add this in the application”?
Bottom Line   Because:
 • Everyone would do it differently
 • Interoperability would collapse
 • Monitoring and tooling would be impossible

RTP is a standardized media contract on top of UDP.

UDP delivers packets; RTP delivers time-based media.

RTP (Real-time Transport Protocol) is usually placed at OSI Layer 5 (Session), but in practice it is most accurately described as operating between the Transport layer (Layer 4) and the Application layer (Layer 7).

Does RTP reside in Layer 5 (Session)?
RTP provides session management features for real-time media streams, such as stream identification (SSRC), packet sequencing, and media timing.

These functions align with the OSI Session layer's responsibilities: establishing, managing, and terminating sessions/dialogues between applications.

Bottom Line   The actual ST 2110 payload resides in Layer 6 (Presentation), so RTP is the “shim” ST 2110 uses to make UDP packets behave like real-time media.

Quick Summary
• OSI model (academic/theoretical) → Layer 5 (Session)
• TCP/IP model (real-world/practical) → Application layer (above UDP)
• Most precise modern answer → Layer 5, because RTP manages real-time media sessions rather than being a pure end-user application like HTTP or SMTP.

So if you're studying for networking certifications (CCNA, CompTIA Network+, etc.) or discussing OSI layers strictly, say: RTP operates at Layer 5 (Session layer).


RTP Packets


Packetization: Encapsulates media (like video frames or audio samples) into packets that fit within network MTUs.

Version (V) 2 bits
Indicates the RTP version. Currently version 2 is used everywhere.

Padding (P) 1 bit
If set, extra padding bytes are added at the end of the packet (useful for alignment).

Extension (X) 1 bit
Indicates the presence of an optional extension header that follows the CSRC list (for extra metadata).

CSRC Count (CC) 4 bits
Specifies how many Contributing Source (CSRC) identifiers are included. Usually 0 unless mixing multiple sources (like in conferencing).

Marker (M) 1 bit
A flag that can mark special events such as the end of a video frame (the last packet of the frame) or the start of an audio talk-spurt.

Payload Type (PT) 7 bits
Identifies the format of the media payload (e.g., PCM audio, H.264, JPEG XS). Both sender and receiver must agree on this mapping; dynamic payload types (96–127) are typically signaled in SDP.

Sequence Number 16 bits
Increments by 1 for each RTP packet sent. Used by the receiver to detect packet loss or out-of-order arrival.

Timestamp 32 bits
Represents the sampling instant of the first byte in the payload. Used to synchronize playback and align media streams (e.g., audio/video lip-sync).

How Timestamps and Sequence Numbers Work Together
Sequence Number: increments per packet (packet order integrity).
Timestamp: increases by the number of samples or frame duration (playback timing).

Example for audio, with 160 samples per packet:
Sequence: 1001, 1002, 1003 → identifies packet order.
Timestamp: 48000, 48160, 48320 → advances by 160 ticks of the 48 kHz sample clock per packet.
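
The progression in the audio example can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names, the SSRC value, and payload type 97 are assumptions made for the example, and the 160-sample step is taken from the timestamps shown above.

```python
import struct

SAMPLES_PER_PACKET = 160  # assumption: matches the 48000 -> 48160 step in the example


def build_rtp_header(seq, timestamp, ssrc, payload_type=97, marker=False):
    """Pack a 12-byte RTP fixed header (version 2, no padding, extension, or CSRCs)."""
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)


def header_sequence(first_seq, first_ts, count, ssrc=0x12345678):
    """Yield (seq, timestamp, header) for consecutive packets of one stream."""
    for i in range(count):
        seq = (first_seq + i) & 0xFFFF                          # wraps at 16 bits
        ts = (first_ts + i * SAMPLES_PER_PACKET) & 0xFFFFFFFF   # wraps at 32 bits
        yield seq, ts, build_rtp_header(seq, ts, ssrc)


for seq, ts, header in header_sequence(1001, 48000, 3):
    print(seq, ts, header.hex())
```

The sequence number advances by one per packet while the timestamp advances by the number of samples carried, so the two counters wrap independently.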

In ST 2110 systems, PTP (IEEE 1588) provides the common timebase, so timestamps from different senders are accurate and synchronized.

Synchronization source (SSRC): A unique ID per stream, ensuring receivers can distinguish between multiple media sources.

Contributing source (CSRC): Used when a stream is mixed (e.g., multiple talkers in a conference).

Minimal control overhead: Only 12 bytes of base header — very efficient for high-rate video/audio.

SSRC (Synchronization Source) 32 bits
Unique ID chosen by the sender to identify the stream. Receivers use it to distinguish multiple concurrent RTP sources.

CSRC List (Optional) 0–15 entries, 32 bits each
Lists contributing sources if the payload is a mix (e.g., an audio mixer combining multiple microphones).
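
The field layout described above can be decoded with Python's struct module. The following is a sketch under stated assumptions: the RtpHeader type, the parse_rtp_header name, and the sample bytes are made up for illustration, and the header extension and payload are not interpreted.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed header plus the optional CSRC list."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                         # CSRC count, low 4 bits of byte 0
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = struct.unpack(f"!{cc}I", packet[12:end])
    return RtpHeader(
        version=b0 >> 6, padding=bool(b0 & 0x20), extension=bool(b0 & 0x10),
        csrc_count=cc, marker=bool(b1 & 0x80), payload_type=b1 & 0x7F,
        sequence=seq, timestamp=ts, ssrc=ssrc, csrcs=csrcs)


# Hypothetical packet header: V=2, M=1, PT=97, seq=1001, ts=48000
sample = bytes.fromhex("80e103e90000bb80deadbeef")
print(parse_rtp_header(sample))
```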

RTP Header Extension
Where it lives: Immediately after the fixed 12-byte RTP header and any CSRC entries, and only if the X bit (Extension) in the header is set to 1.

Purpose: Used to carry optional metadata that doesn’t fit into the standard header.

Structure:
Profile-specific ID (16 bits): Identifies the meaning/format of the extension (e.g., the one-byte header format of RFC 8285 uses 0xBEDE).
Length (16 bits): Number of 32-bit words in the extension data, not counting this 4-byte extension header.
Extension Data: The custom information itself.

In 2110 systems, the extension is often used for flow timing metadata or ancillary synchronization.
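
A receiver can locate the extension by checking the X bit and skipping the CSRC list. The sketch below assumes the generic RFC 3550 extension layout (16-bit profile ID, 16-bit length in 32-bit words, then data); the function name and the profile ID 0x1234 are hypothetical and not taken from any standard.

```python
import struct


def parse_header_extension(packet: bytes):
    """Return (profile_id, extension_data, payload_offset), or None if X is clear."""
    b0 = packet[0]
    if not b0 & 0x10:                       # X bit clear: no extension present
        return None
    offset = 12 + 4 * (b0 & 0x0F)           # skip fixed header and CSRC list
    profile_id, length_words = struct.unpack("!HH", packet[offset:offset + 4])
    data_start = offset + 4
    data_end = data_start + 4 * length_words
    if len(packet) < data_end:
        raise ValueError("packet truncated inside the header extension")
    return profile_id, packet[data_start:data_end], data_end


# Hypothetical packet: X bit set, one 32-bit word of extension data
example = (bytes.fromhex("90600001000000000000002a")
           + struct.pack("!HH", 0x1234, 1)
           + b"\x01\x02\x03\x04"
           + b"payload")
print(parse_header_extension(example))
```

The returned offset marks where the payload begins, so the caller can slice the media data without re-deriving the header length.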

Header end


Payload Variable length (up to MTU limit)
The actual media data — such as video frame segments, audio samples, or ancillary data. The type and interpretation are defined by the Payload Type (PT) field.
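
To make the MTU limit concrete, here is a rough payload-budget calculation. It assumes IPv4 without options, a standard 1500-byte Ethernet MTU, and an RTP header with no CSRC entries or extension; the function names are illustrative. Real payload formats such as ST 2110-20 add their own payload headers and alignment rules, which this sketch ignores.

```python
IP_HEADER = 20    # IPv4 header without options
UDP_HEADER = 8
RTP_HEADER = 12   # fixed RTP header, no CSRCs or extension


def max_rtp_payload(mtu: int = 1500) -> int:
    """Bytes left for media data in one packet after IP, UDP, and RTP headers."""
    return mtu - IP_HEADER - UDP_HEADER - RTP_HEADER


def split_payload(data: bytes, mtu: int = 1500):
    """Cut a block of media data into chunks that each fit one RTP packet."""
    size = max_rtp_payload(mtu)
    if size <= 0:
        raise ValueError("MTU too small to carry any payload")
    return [data[i:i + size] for i in range(0, len(data), size)]


print(max_rtp_payload())                            # 1500 - 20 - 8 - 12 = 1460
print([len(c) for c in split_payload(bytes(4000))])
```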

RTP Padding (P bit)
Where it lives: At the end of the packet (payload section).

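Purpose: Used when the payload must be padded to a specific byte boundary. For example, some encryption algorithms require fixed block sizes, and several RTP packets may be carried in one lower-layer protocol data unit.

How it works: When the P bit is set, the last byte of the packet holds the number of padding bytes to ignore, including itself.

Example:
[Payload Data][xx][xx][03]
The final byte (03) means there are 3 padding bytes in total: two filler bytes plus the count byte itself.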
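
Removing padding on the receive side can be sketched as follows. In RFC 3550 the final count byte is itself part of the padding, which is what the code relies on. The function name is an assumption, and the sketch ignores CSRC entries and header extensions when sanity-checking the length.

```python
def strip_padding(packet: bytes) -> bytes:
    """Return the packet without trailing padding (unchanged if P is clear)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed header")
    if not packet[0] & 0x20:           # P bit clear: nothing to remove
        return packet
    pad_len = packet[-1]               # count includes this final byte
    if pad_len == 0 or pad_len > len(packet) - 12:
        raise ValueError("invalid padding length")
    return packet[:-pad_len]


# Hypothetical packet: P bit set, 4 payload bytes, 3 padding bytes
padded = bytes([0xA0, 0x60]) + bytes(10) + b"DATA" + b"\x00\x00\x03"
print(strip_padding(padded)[12:])
```

The returned bytes still carry the original header, including the P bit, so a caller that forwards the packet would also need to clear that bit.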

RTP Control Protocol (RTCP)
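RFC 3550 also defines a companion protocol: RTCP (RTP Control Protocol). RTCP carries sender and receiver reports alongside the media stream, covering packet loss, jitter, and the mapping between RTP timestamps and wall-clock time.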

How RTP Works in Real Time

  1. Sender:
    1. Captures video/audio samples.
    2. Encodes them (e.g., H.264, PCM).
    3. Splits them into RTP packets with sequence numbers and timestamps.
    4. Sends over UDP to a multicast or unicast destination.
  2. Network:
    1. Switches forward packets based on multicast groups (e.g., 239.x.x.x).
    2. Some may be duplicated for redundancy (ST 2022-7).
  3. Receiver:
    1. Receives RTP packets.
    2. Uses sequence numbers to reorder packets and detect loss.
    3. Uses timestamps to rebuild timing and clock synchronization.
    4. Decodes the payload and plays it out.
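
The receiver's reordering and loss-detection step can be sketched like this. It is a simplified illustration with hypothetical function names: it works on a small batch of already-received sequence numbers, whereas a real receiver would use a bounded jitter buffer and a playout deadline.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a in 16-bit sequence space (handles wraparound)."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_find_gaps(seqs):
    """Return (ordered sequence numbers, missing sequence numbers)."""
    if not seqs:
        return [], []
    base = seqs[0]
    # Sort by distance from the first packet seen, so 65535 -> 0 stays in order.
    ordered = sorted(set(seqs), key=lambda s: seq_delta(s, base))
    missing = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = seq_delta(cur, prev)
        missing.extend((prev + k) & 0xFFFF for k in range(1, gap))
    return ordered, missing


print(reorder_and_find_gaps([65534, 0, 65535, 2]))
```

Comparing sequence numbers by signed 16-bit distance rather than by raw value is what keeps ordering correct when the counter wraps from 65535 back to 0.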

RTP Is Transport, Not Reliability
RTP does not retransmit lost packets, guarantee delivery, or guarantee in-order arrival.

This makes it perfect for live, real-time content, where a missing frame is better than a delayed one.

SMPTE ST 2110 uses RTP as the encapsulation layer for all essence types (video, audio, metadata).

ST 2022-7 can be used alongside it for redundancy, sending duplicate RTP streams over separate network paths.

RTP (RFC 3550) is a lightweight, real-time transport protocol designed for synchronization, sequencing, and timing of audio/video data across IP networks. It is the backbone of modern IP broadcast and streaming systems.

Secure RTP

SRTP Master Key Identifier (MKI)
Where it lives: In Secure RTP (SRTP) packets, optionally appended after the encrypted payload but before the authentication tag.

Purpose: Identifies which cryptographic key was used to encrypt the RTP payload. This is important when multiple SRTP keys are in use, for example, during rekeying events or multi-party conferences.

Structure (optional, variable length):
[Encrypted Payload][MKI][Auth Tag]

MKI length is negotiated during session setup (via SDP). It tells the receiver which master key to use to decrypt the payload. Used mainly in environments where keys are rotated frequently for security compliance.

Authentication Fields (SRTP Auth Tag)
Where it lives: At the end of the SRTP packet — after the payload and optional MKI.

Purpose: Provides integrity and authenticity for the RTP packet. It ensures that the packet wasn’t tampered with and came from a trusted sender.

How it works: Calculated using an HMAC (Hash-based Message Authentication Code) algorithm like HMAC-SHA1.
Common tag lengths: 80 bits (10 bytes) or 32 bits (4 bytes).
The receiver recomputes the tag and compares it — mismatch means discard.

Example structure (end of packet):
[Encrypted Payload][MKI (optional)][Auth Tag]
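
The tag check described above can be sketched with Python's hmac module. This is deliberately simplified and is not a working SRTP implementation: real SRTP (RFC 3711) derives the session authentication key from the master key and includes the rollover counter in the authenticated data, both of which are omitted here. The function names and the key are made up for illustration.

```python
import hashlib
import hmac


def split_srtp_trailer(packet: bytes, mki_len: int = 0, tag_len: int = 10):
    """Split a packet into (authenticated portion, MKI, auth tag)."""
    if len(packet) < mki_len + tag_len:
        raise ValueError("packet shorter than its MKI and tag")
    tag_start = len(packet) - tag_len
    mki_start = tag_start - mki_len
    return packet[:mki_start], packet[mki_start:tag_start], packet[tag_start:]


def compute_tag(auth_key: bytes, body: bytes, tag_len: int = 10) -> bytes:
    """HMAC-SHA1 over the body, truncated to the negotiated tag length."""
    return hmac.new(auth_key, body, hashlib.sha1).digest()[:tag_len]


def verify(packet: bytes, auth_key: bytes, mki_len: int = 0, tag_len: int = 10) -> bool:
    """Recompute the tag and compare in constant time; False means discard."""
    body, _mki, tag = split_srtp_trailer(packet, mki_len, tag_len)
    return hmac.compare_digest(tag, compute_tag(auth_key, body, tag_len))


key = b"k" * 20                                   # hypothetical session auth key
body = b"rtp-header-and-encrypted-payload"
packet = body + b"\x00\x01" + compute_tag(key, body)
print(verify(packet, key, mki_len=2))
```

Using hmac.compare_digest avoids leaking how many tag bytes matched through timing differences.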

Putting It All Together
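Here’s how these optional fields stack up in the full RTP/SRTP packet:
[RTP Header (12 bytes)][CSRC List (optional)][Header Extension (optional)][Encrypted Payload][Padding (optional)][MKI (optional)][Auth Tag]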


Maximum Transmission Unit



Mapping MPEG Transport Streams to RTP

 


UPDATED
2/21/26
V260221-1.0