<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Audio/Video | Ziyang Lin</title><link>https://ziyanglin.netlify.app/en/tags/audio/video/</link><atom:link href="https://ziyanglin.netlify.app/en/tags/audio/video/index.xml" rel="self" type="application/rss+xml"/><description>Audio/Video</description><generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><lastBuildDate>Thu, 26 Jun 2025 01:00:00 +0000</lastBuildDate><image><url>https://ziyanglin.netlify.app/img/icon-192.png</url><title>Audio/Video</title><link>https://ziyanglin.netlify.app/en/tags/audio/video/</link></image><item><title>WebRTC Technical Guide: Web-Based Real-Time Communication Framework</title><link>https://ziyanglin.netlify.app/en/post/webrtc-documentation/</link><pubDate>Thu, 26 Jun 2025 01:00:00 +0000</pubDate><guid>https://ziyanglin.netlify.app/en/post/webrtc-documentation/</guid><description>&lt;h2 id="1-introduction">1. Introduction&lt;/h2>
&lt;p>WebRTC (Web Real-Time Communication) is an open-source technology that enables real-time voice and video communication in web browsers. It allows direct peer-to-peer (P2P) audio, video, and data sharing between browsers without requiring any plugins or third-party software.&lt;/p>
&lt;p>The main goal of WebRTC is to provide high-quality, low-latency real-time communication, making it easy for developers to build rich communication features into web applications.&lt;/p>
&lt;h3 id="core-advantages">Core Advantages&lt;/h3>
&lt;ul>
&lt;li>&lt;strong>Cross-platform and browser compatibility&lt;/strong>: WebRTC is an open standard developed by the W3C and IETF and is supported by all major browsers (Chrome, Firefox, Safari, Edge).&lt;/li>
&lt;li>&lt;strong>No plugins required&lt;/strong>: Users can use real-time communication features directly in their browsers without downloading or installing any extensions.&lt;/li>
&lt;li>&lt;strong>Peer-to-peer communication&lt;/strong>: When the network allows it, media and data travel directly between users, which reduces both server bandwidth and latency.&lt;/li>
&lt;li>&lt;strong>High security&lt;/strong>: Encryption is mandatory for all WebRTC communication (SRTP for media, DTLS for key exchange and data channels), protecting the confidentiality and integrity of everything exchanged.&lt;/li>
&lt;li>&lt;strong>High-quality audio and video&lt;/strong>: WebRTC includes advanced signal processing components like echo cancellation, noise suppression, and automatic gain control to provide excellent audio/video quality.&lt;/li>
&lt;/ul>
&lt;h2 id="2-core-concepts">2. Core Concepts&lt;/h2>
&lt;p>WebRTC consists of several key JavaScript APIs that work together to enable real-time communication.&lt;/p>
&lt;h3 id="21-rtcpeerconnection">2.1. &lt;code>RTCPeerConnection&lt;/code>&lt;/h3>
&lt;p>&lt;code>RTCPeerConnection&lt;/code> is the core interface of WebRTC, responsible for establishing and managing connections between two peers. Its main responsibilities include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Media negotiation&lt;/strong>: Handling parameters for audio/video codecs, resolution, etc.&lt;/li>
&lt;li>&lt;strong>Network path discovery&lt;/strong>: Finding the best connection path through the ICE framework.&lt;/li>
&lt;li>&lt;strong>Connection maintenance&lt;/strong>: Managing the connection lifecycle, including establishment, maintenance, and closure.&lt;/li>
&lt;li>&lt;strong>Data transmission&lt;/strong>: Handling the actual transmission of audio/video streams (SRTP) and data channels (SCTP/DTLS).&lt;/li>
&lt;/ul>
&lt;p>An &lt;code>RTCPeerConnection&lt;/code> object represents a WebRTC connection from the local computer to a remote peer.&lt;/p>
&lt;h3 id="22-mediastream">2.2. &lt;code>MediaStream&lt;/code>&lt;/h3>
&lt;p>The &lt;code>MediaStream&lt;/code> API represents streams of media content. A &lt;code>MediaStream&lt;/code> object can contain one or more media tracks (&lt;code>MediaStreamTrack&lt;/code>), which can be:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Audio tracks&lt;/strong> (&lt;code>kind&lt;/code> is &lt;code>"audio"&lt;/code>): for example, audio captured from a microphone.&lt;/li>
&lt;li>&lt;strong>Video tracks&lt;/strong> (&lt;code>kind&lt;/code> is &lt;code>"video"&lt;/code>): for example, video captured from a camera.&lt;/li>
&lt;/ul>
&lt;p>Developers typically use the &lt;code>navigator.mediaDevices.getUserMedia()&lt;/code> method to obtain a local &lt;code>MediaStream&lt;/code>, which prompts the user to authorize access to their camera and microphone. The obtained stream can then be added to an &lt;code>RTCPeerConnection&lt;/code> for transmission to the remote peer.&lt;/p>
&lt;h3 id="23-rtcdatachannel">2.3. &lt;code>RTCDataChannel&lt;/code>&lt;/h3>
&lt;p>In addition to audio and video, WebRTC can transmit arbitrary application data, text or binary, between peers through the &lt;code>RTCDataChannel&lt;/code> API. Typical uses include:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>File sharing&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Real-time text chat&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Online game state synchronization&lt;/strong>&lt;/li>
&lt;li>&lt;strong>Remote desktop control&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>The &lt;code>RTCDataChannel&lt;/code> API is modeled on the WebSocket API (a &lt;code>send()&lt;/code> method and &lt;code>message&lt;/code> events). Unlike WebSocket, however, it lets each channel choose between reliable and partially reliable delivery, and between ordered and unordered delivery, to match the application's needs. Messages are carried by SCTP (Stream Control Transmission Protocol), which runs on top of DTLS, so all channel data is encrypted.&lt;/p>
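&lt;p>In the browser, the delivery mode is set through the options object passed to &lt;code>createDataChannel()&lt;/code>. &lt;code>ordered&lt;/code> controls ordering, and either &lt;code>maxRetransmits&lt;/code> or &lt;code>maxPacketLifeTime&lt;/code> (never both) relaxes reliability. The sketch below copies those field names into a local &lt;code>ChannelOptions&lt;/code> interface so it runs outside a browser. &lt;code>describeMode&lt;/code> is a hypothetical helper, not part of the API.&lt;/p>

```typescript
// Local mirror of a few RTCDataChannelInit fields (the browser dictionary
// passed to createDataChannel); redeclared so this runs outside a browser.
interface ChannelOptions {
  ordered?: boolean;          // default true
  maxRetransmits?: number;    // give up after this many retransmissions
  maxPacketLifeTime?: number; // or give up after this many milliseconds
}

// Hypothetical helper: describe the delivery mode an options object requests.
function describeMode(opts: ChannelOptions): string {
  if (opts.maxRetransmits !== undefined) {
    if (opts.maxPacketLifeTime !== undefined) {
      // The browser's createDataChannel() also throws a TypeError here.
      throw new TypeError("maxRetransmits and maxPacketLifeTime are mutually exclusive");
    }
  }
  const partial = opts.maxRetransmits !== undefined || opts.maxPacketLifeTime !== undefined;
  const ordering = opts.ordered === false ? "unordered" : "ordered";
  return `${partial ? "partially reliable" : "reliable"}, ${ordering}`;
}

// Example presets: chat or file transfer vs. fast-changing game state.
const chatOptions: ChannelOptions = {}; // defaults: reliable and ordered
const gameOptions: ChannelOptions = { ordered: false, maxRetransmits: 0 };

console.log(describeMode(chatOptions)); // reliable, ordered
console.log(describeMode(gameOptions)); // partially reliable, unordered
```

&lt;p>In a page, the same objects would go straight to the browser API, e.g. &lt;code>pc.createDataChannel("game", gameOptions)&lt;/code>, where &lt;code>pc&lt;/code> is an existing &lt;code>RTCPeerConnection&lt;/code>. &lt;code>maxRetransmits: 0&lt;/code> suits state that is refreshed constantly, where a late packet is worse than a lost one.&lt;/p>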
&lt;h2 id="3-connection-process-in-detail">3. Connection Process in Detail&lt;/h2>
&lt;p>Establishing a WebRTC connection is a complex multi-stage process involving signaling, session description, and network path discovery.&lt;/p>
&lt;h3 id="31-signaling">3.1. Signaling&lt;/h3>
&lt;p>The WebRTC API itself does not include a signaling mechanism. Signaling is the exchange of metadata that peers need before they can communicate, and developers must choose or implement their own signaling channel. Common choices include WebSocket and plain HTTP requests (for example, XMLHttpRequest).&lt;/p>
&lt;p>The signaling server acts as an intermediary that relays three types of information between the two clients:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Session control messages&lt;/strong>: Used to open or close communication.&lt;/li>
&lt;li>&lt;strong>Network configuration&lt;/strong>: Information about the client's IP address and port.&lt;/li>
&lt;li>&lt;strong>Media capabilities&lt;/strong>: Codecs and resolutions supported by the client.&lt;/li>
&lt;/ol>
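&lt;p>WebRTC leaves the message format to the application. As an illustration only, the three categories above could map onto a small JSON vocabulary like the hypothetical &lt;code>SignalMessage&lt;/code> type below, with ICE candidates simplified to their candidate string. &lt;code>parseSignal&lt;/code> rejects malformed input before it reaches the peer connection.&lt;/p>

```typescript
// Hypothetical wire format; WebRTC itself does not define one.
type SignalMessage =
  | { type: "join" | "leave"; room: string }   // session control
  | { type: "candidate"; candidate: string }   // network configuration (ICE)
  | { type: "offer" | "answer"; sdp: string }; // media capabilities (SDP)

// Parse an incoming JSON string; return null for anything malformed.
function parseSignal(raw: string): SignalMessage | null {
  let msg: any;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  switch (msg.type) {
    case "join":
    case "leave":
      return typeof msg.room === "string" ? { type: msg.type, room: msg.room } : null;
    case "candidate":
      return typeof msg.candidate === "string" ? { type: "candidate", candidate: msg.candidate } : null;
    case "offer":
    case "answer":
      return typeof msg.sdp === "string" ? { type: msg.type, sdp: msg.sdp } : null;
    default:
      return null;
  }
}

console.log(parseSignal('{"type":"offer","sdp":"v=0"}')); // { type: 'offer', sdp: 'v=0' }
console.log(parseSignal('{"type":"join"}'));              // null (no room)
```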
&lt;p>This process typically follows these steps:&lt;/p>
&lt;ol>
&lt;li>Client A sends a &amp;ldquo;request call&amp;rdquo; message to the signaling server.&lt;/li>
&lt;li>The signaling server forwards this request to client B.&lt;/li>
&lt;li>Client B agrees to the call.&lt;/li>
&lt;li>Afterward, clients A and B exchange SDP and ICE candidates through the signaling server until they find a viable connection path.&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-mermaid">sequenceDiagram
participant ClientA as Client A
participant SignalingServer as Signaling Server
participant ClientB as Client B
ClientA-&amp;gt;&amp;gt;SignalingServer: Initiate call request (join room)
SignalingServer-&amp;gt;&amp;gt;ClientB: Forward call request
ClientB--&amp;gt;&amp;gt;SignalingServer: Accept call
SignalingServer--&amp;gt;&amp;gt;ClientA: B has joined
loop Offer/Answer &amp;amp; ICE Exchange
ClientA-&amp;gt;&amp;gt;SignalingServer: Send SDP Offer / ICE Candidate
SignalingServer-&amp;gt;&amp;gt;ClientB: Forward SDP Offer / ICE Candidate
ClientB-&amp;gt;&amp;gt;SignalingServer: Send SDP Answer / ICE Candidate
SignalingServer-&amp;gt;&amp;gt;ClientA: Forward SDP Answer / ICE Candidate
end
&lt;/code>&lt;/pre>
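&lt;p>The server side of this flow can be very small, because the relay never needs to understand SDP or ICE. The sketch below assumes a room-based design and replaces WebSocket connections with plain callbacks so it runs on its own. &lt;code>SignalingRelay&lt;/code> and its message shapes are hypothetical.&lt;/p>

```typescript
// Delivery callback standing in for "send over this client's WebSocket".
type Deliver = (from: string, payload: string) => void;

// Hypothetical in-memory signaling relay (not part of WebRTC): it only
// routes opaque strings between the members of a room.
class SignalingRelay {
  // room name -> peer id -> delivery callback
  private rooms: { [room: string]: { [peerId: string]: Deliver } } = {};

  join(room: string, peerId: string, deliver: Deliver): void {
    if (!this.rooms[room]) this.rooms[room] = {};
    const members = this.rooms[room];
    // Notify peers already in the room (the "B has joined" step).
    for (const other of Object.keys(members)) {
      members[other]("server", JSON.stringify({ type: "peer-joined", peerId }));
    }
    members[peerId] = deliver;
  }

  // Forward SDP offers/answers and ICE candidates to everyone else in the room.
  send(room: string, from: string, payload: string): void {
    const members = this.rooms[room] || {};
    for (const peerId of Object.keys(members)) {
      if (peerId !== from) members[peerId](from, payload);
    }
  }
}

// Usage: two clients whose "sockets" just record what they receive.
const inboxA: string[] = [];
const inboxB: string[] = [];
const relay = new SignalingRelay();
relay.join("demo", "A", (from, p) => inboxA.push(`${from}: ${p}`));
relay.join("demo", "B", (from, p) => inboxB.push(`${from}: ${p}`));
relay.send("demo", "A", JSON.stringify({ type: "offer", sdp: "v=0" }));
console.log(inboxA); // [ 'server: {"type":"peer-joined","peerId":"B"}' ]
console.log(inboxB); // [ 'A: {"type":"offer","sdp":"v=0"}' ]
```

&lt;p>A production server would add authentication, room size limits, cleanup when a socket closes, and a secure transport such as WSS.&lt;/p>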
&lt;h3 id="32-session-description-protocol-sdp">3.2. Session Description Protocol (SDP)&lt;/h3>
&lt;p>SDP (Session Description Protocol) is a standard text format for describing multimedia sessions. It does not carry media itself; it describes the parameters of the session. A session description includes:&lt;/p>
&lt;ul>
&lt;li>Session unique identifier and version.&lt;/li>
&lt;li>Media types (audio, video, data).&lt;/li>
&lt;li>Codecs used (e.g., VP8, H.264, Opus).&lt;/li>
&lt;li>Network transport information (IP addresses and ports).&lt;/li>
&lt;li>Bandwidth information.&lt;/li>
&lt;/ul>
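&lt;p>SDP is line-oriented text with one &lt;code>type=value&lt;/code> field per line. &lt;code>o=&lt;/code> carries the session identifier and version, each &lt;code>m=&lt;/code> line starts a media section, and &lt;code>a=rtpmap&lt;/code> attributes name the codecs. The hypothetical &lt;code>summarizeSdp&lt;/code> below extracts those items from an abbreviated, illustrative offer. Real browser SDP contains many more lines (ICE credentials, fingerprints, codec parameters).&lt;/p>

```typescript
interface MediaSection {
  kind: string;     // "audio", "video" or "application"
  port: number;
  codecs: string[]; // encoding/clock-rate[/channels] from a=rtpmap
}

interface SdpSummary {
  sessionId: string;
  sessionVersion: string;
  media: MediaSection[];
}

// Hypothetical helper: pull session id/version, media sections and codecs out of SDP text.
function summarizeSdp(sdp: string): SdpSummary {
  const summary: SdpSummary = { sessionId: "", sessionVersion: "", media: [] };
  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf("o=") === 0) {
      // o=username sess-id sess-version nettype addrtype address
      const fields = line.slice(2).split(" ");
      summary.sessionId = fields[1];
      summary.sessionVersion = fields[2];
    } else if (line.indexOf("m=") === 0) {
      // m=media port proto format-list
      const [kind, port] = line.slice(2).split(" ");
      summary.media.push({ kind, port: Number(port), codecs: [] });
    } else if (line.indexOf("a=rtpmap:") === 0) {
      // a=rtpmap:payload-type encoding; belongs to the latest m= section
      const current = summary.media[summary.media.length - 1];
      if (current) current.codecs.push(line.split(" ")[1]);
    }
  }
  return summary;
}

// Abbreviated, illustrative offer (CRLF line endings, as SDP specifies).
const exampleSdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96 102",
  "a=rtpmap:96 VP8/90000",
  "a=rtpmap:102 H264/90000",
].join("\r\n");

const parsedSdp = summarizeSdp(exampleSdp);
console.log(parsedSdp.sessionId, parsedSdp.sessionVersion); // 4611731400430051336 2
console.log(parsedSdp.media.map((m) => `${m.kind}: ${m.codecs.join(", ")}`));
```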
&lt;p>WebRTC uses the &lt;strong>Offer/Answer model&lt;/strong> to exchange SDP information:&lt;/p>
&lt;ol>
&lt;li>The &lt;strong>Caller&lt;/strong> creates an &lt;strong>Offer&lt;/strong> SDP describing the communication parameters it wants to use and sends it to the callee through the signaling server.&lt;/li>
&lt;li>The &lt;strong>Callee&lt;/strong> receives the Offer and creates an &lt;strong>Answer&lt;/strong> SDP describing the communication parameters it can support, sending it back to the caller through the signaling server.&lt;/li>
&lt;li>Once the caller applies the Answer with &lt;code>setRemoteDescription()&lt;/code>, both parties have agreed on the session parameters.&lt;/li>
&lt;/ol>
&lt;pre>&lt;code class="language-mermaid">sequenceDiagram
participant Caller
participant SignalingServer as Signaling Server
participant Callee
Caller-&amp;gt;&amp;gt;Caller: createOffer()
Caller-&amp;gt;&amp;gt;Caller: setLocalDescription(offer)
Caller-&amp;gt;&amp;gt;SignalingServer: Send Offer
SignalingServer-&amp;gt;&amp;gt;Callee: Forward Offer
Callee-&amp;gt;&amp;gt;Callee: setRemoteDescription(offer)
Callee-&amp;gt;&amp;gt;Callee: createAnswer()
Callee-&amp;gt;&amp;gt;Callee: setLocalDescription(answer)
Callee-&amp;gt;&amp;gt;SignalingServer: Send Answer
SignalingServer-&amp;gt;&amp;gt;Caller: Forward Answer
Caller-&amp;gt;&amp;gt;Caller: setRemoteDescription(answer)
&lt;/code>&lt;/pre>
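&lt;p>The browser tracks each side's position in this exchange through &lt;code>RTCPeerConnection.signalingState&lt;/code>. The simplified model below uses three of its real state names (&lt;code>stable&lt;/code>, &lt;code>have-local-offer&lt;/code>, &lt;code>have-remote-offer&lt;/code>) and omits provisional answers and rollback. The &lt;code>OfferAnswerModel&lt;/code> class is hypothetical. It only mimics the ordering rules, throwing wherever a browser would reject with an &lt;code>InvalidStateError&lt;/code>.&lt;/p>

```typescript
// Simplified model of RTCPeerConnection.signalingState: provisional answers
// and rollback are left out. Not the browser's implementation.
type SignalingState = "stable" | "have-local-offer" | "have-remote-offer";
type SdpType = "offer" | "answer";

class OfferAnswerModel {
  state: SignalingState = "stable";

  setLocalDescription(type: SdpType): void {
    this.state = this.next("local", type);
  }

  setRemoteDescription(type: SdpType): void {
    this.state = this.next("remote", type);
  }

  private next(side: "local" | "remote", type: SdpType): SignalingState {
    if (type === "offer") {
      // In this simplified model a new offer may only start from stable.
      if (this.state === "stable") return side === "local" ? "have-local-offer" : "have-remote-offer";
    } else {
      // An answer must come from the opposite side of the pending offer.
      const pending = side === "local" ? "have-remote-offer" : "have-local-offer";
      if (this.state === pending) return "stable";
    }
    // Where a browser would reject with an InvalidStateError.
    throw new Error(`InvalidStateError: cannot apply ${side} ${type} in state ${this.state}`);
  }
}

// The call flow from the diagram above.
const caller = new OfferAnswerModel();
const callee = new OfferAnswerModel();
caller.setLocalDescription("offer");   // caller: have-local-offer
callee.setRemoteDescription("offer");  // callee: have-remote-offer
callee.setLocalDescription("answer");  // callee: stable
caller.setRemoteDescription("answer"); // caller: stable
console.log(caller.state, callee.state); // stable stable
```

&lt;p>Applying a step out of order, such as an answer while still in &lt;code>stable&lt;/code>, throws. This is why the order of calls in the diagram matters.&lt;/p>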
&lt;h3 id="33-interactive-connectivity-establishment-ice">3.3. Interactive Connectivity Establishment (ICE)&lt;/h3>
&lt;p>Since most devices are behind NAT (Network Address Translation) or firewalls and don't have public IP addresses, establishing direct P2P connections becomes challenging. ICE (Interactive Connectivity Establishment) is a framework specifically designed to solve this problem.&lt;/p>
&lt;p>The ICE workflow is as follows:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Gather candidate addresses&lt;/strong>: Each client collects its network address candidates from different sources:
&lt;ul>
&lt;li>&lt;strong>Host candidates&lt;/strong>: The device's own IP addresses on its local network interfaces.&lt;/li>
&lt;li>&lt;strong>Server-reflexive candidates&lt;/strong>: The device's public IP address and port as seen by a STUN server.&lt;/li>
&lt;li>&lt;strong>Relayed candidates&lt;/strong>: Addresses allocated on a TURN server. If no direct path works, all traffic is forwarded through the TURN server.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Exchange candidates&lt;/strong>: Clients exchange their collected ICE candidate lists through the signaling server.&lt;/li>
&lt;li>&lt;strong>Connectivity checks&lt;/strong>: Each client pairs its local candidates with the remote candidates and sends STUN binding requests over each pair (connectivity checks, informally called &amp;ldquo;pings&amp;rdquo;) to find out which paths actually work.&lt;/li>
&lt;li>&lt;strong>Select the best path&lt;/strong>: Among the pairs that pass their checks, the ICE agent nominates one, normally the highest-priority pair, and media starts flowing over it. Direct paths (host or server-reflexive) rank above relayed ones because they add the least latency and use no relay bandwidth.&lt;/li>
&lt;/ol>
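&lt;p>ICE turns &amp;ldquo;prefer direct paths&amp;rdquo; into arithmetic by giving every candidate a numeric priority. RFC 8445 recommends priority = 2&lt;sup>24&lt;/sup> × type preference + 2&lt;sup>8&lt;/sup> × local preference + (256 - component ID). Its recommended type preferences are 126 for host, 110 for peer-reflexive, 100 for server-reflexive, and 0 for relayed candidates. The sketch below implements that formula; the function and type names are illustrative.&lt;/p>

```typescript
// "prflx" (peer-reflexive) candidates are discovered during the checks themselves.
type CandidateType = "host" | "prflx" | "srflx" | "relay";

// Type preferences recommended by RFC 8445.
const TYPE_PREFERENCE: { [type: string]: number } = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * typePref + 2^8 * localPref + (256 - componentId)
// localPref: 0..65535 (65535 when the agent has a single interface);
// componentId: 1 for RTP, the only component once RTP and RTCP share a port.
function candidatePriority(type: CandidateType, localPref = 65535, componentId = 1): number {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPref * 2 ** 8 + (256 - componentId);
}

for (const t of ["host", "srflx", "relay"] as CandidateType[]) {
  console.log(t, candidatePriority(t));
}
// host 2130706431
// srflx 1694498815
// relay 16777215
```

&lt;p>Browsers vary the local preference per network interface, so priorities in real SDP may differ. The type preference dominates, however, so host candidates still outrank server-reflexive ones, which outrank relayed ones.&lt;/p>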
&lt;pre>&lt;code class="language-mermaid">graph TD
subgraph Client A
A1(Start) --&amp;gt; A2{Gather Candidates};
A2 --&amp;gt; A3[Local Address];
A2 --&amp;gt; A4[STUN Address];
A2 --&amp;gt; A5[TURN Address];
end
subgraph Client B
B1(Start) --&amp;gt; B2{Gather Candidates};
B2 --&amp;gt; B3[Local Address];
B2 --&amp;gt; B4[STUN Address];
B2 --&amp;gt; B5[TURN Address];
end
A2 --&amp;gt; C1((Signaling Server));
B2 --&amp;gt; C1;
C1 --&amp;gt; A6(Exchange Candidates);
C1 --&amp;gt; B6(Exchange Candidates);
A6 --&amp;gt; A7{Connectivity Checks};
B6 --&amp;gt; B7{Connectivity Checks};
A7 -- STUN Request --&amp;gt; B7;
B7 -- STUN Response --&amp;gt; A7;
A7 --&amp;gt; A8(Select Best Path);
B7 --&amp;gt; B8(Select Best Path);
A8 --&amp;gt; A9((P2P Connection Established));
B8 --&amp;gt; B9((P2P Connection Established));
&lt;/code>&lt;/pre>
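&lt;p>For the connectivity checks, each local candidate is paired with each remote candidate and the pairs are checked in priority order. RFC 8445 defines the pair priority as 2&lt;sup>32&lt;/sup> × min(G, D) + 2 × max(G, D) + (G > D ? 1 : 0). Here G is the candidate priority on the controlling agent's side and D the one on the controlled agent's side. The sketch below orders pairs by that rule. It omits the component and address-family matching and the pruning that a real agent performs, and the &lt;code>Cand&lt;/code> shape and sample addresses are made up for illustration.&lt;/p>

```typescript
// Hypothetical minimal candidate shape for this sketch.
interface Cand {
  address: string;  // "ip:port"
  priority: number; // candidate priority, at most 2^31 - 1
}

interface CandidatePair {
  local: Cand;
  remote: Cand;
}

// RFC 8445 pair priority: 2^32 * min(G, D) + 2 * max(G, D) + (G > D ? 1 : 0).
// The sum can exceed 2^53, so compare its terms in sequence instead;
// since 2 * max(G, D) + 1 stays below 2^32, this ordering is identical.
function pairKey(pair: CandidatePair, controlling: boolean): number[] {
  const g = controlling ? pair.local.priority : pair.remote.priority;
  const d = controlling ? pair.remote.priority : pair.local.priority;
  return [Math.min(g, d), Math.max(g, d), g > d ? 1 : 0];
}

// Pair every local candidate with every remote one, highest priority first.
function orderPairs(locals: Cand[], remotes: Cand[], controlling: boolean): CandidatePair[] {
  const pairs: CandidatePair[] = [];
  for (const local of locals) {
    for (const remote of remotes) pairs.push({ local, remote });
  }
  return pairs.sort((a, b) => {
    const ka = pairKey(a, controlling);
    const kb = pairKey(b, controlling);
    for (const i of [0, 1, 2]) {
      if (ka[i] !== kb[i]) return kb[i] - ka[i];
    }
    return 0;
  });
}

const localCands: Cand[] = [
  { address: "192.168.1.10:50000", priority: 2130706431 }, // host
  { address: "203.0.113.5:61000", priority: 1694498815 },  // server-reflexive
];
const remoteCands: Cand[] = [
  { address: "198.51.100.7:62000", priority: 1694498815 }, // server-reflexive
  { address: "192.0.2.20:3478", priority: 16777215 },      // relayed
];
const orderedPairs = orderPairs(localCands, remoteCands, true);
for (const p of orderedPairs) console.log(`${p.local.address} -> ${p.remote.address}`);
// 192.168.1.10:50000 -> 198.51.100.7:62000
// 203.0.113.5:61000 -> 198.51.100.7:62000
// 192.168.1.10:50000 -> 192.0.2.20:3478
// 203.0.113.5:61000 -> 192.0.2.20:3478
```

&lt;p>Comparing the three terms in sequence, rather than computing the sum, avoids floating-point rounding. In this example both pairs that end at the relayed candidate sort last.&lt;/p>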
&lt;h2 id="4-nat-traversal-stun-and-turn">4. NAT Traversal: STUN and TURN&lt;/h2>
&lt;p>To achieve P2P connections, WebRTC heavily relies on STUN and TURN servers to solve NAT-related issues.&lt;/p>
&lt;h3 id="41-stun-servers">4.1. STUN Servers&lt;/h3>
&lt;p>STUN (Session Traversal Utilities for NAT) servers are very lightweight, with a simple task: telling a client behind NAT what its public IP address and port are.&lt;/p>
&lt;p>When a WebRTC client sends a request to a STUN server, the server checks the source IP and port of the request and returns them to the client. This way, the client knows &amp;ldquo;what it looks like on the internet&amp;rdquo; and can share this public address as an ICE candidate with other peers.&lt;/p>
&lt;p>Using STUN servers is the preferred approach for establishing P2P connections because they are only needed during the connection establishment phase and don't participate in actual data transmission, resulting in minimal overhead.&lt;/p>
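&lt;p>In the STUN protocol (RFC 5389), the server returns this public address in an XOR-MAPPED-ADDRESS attribute. The port is XORed with the top 16 bits of the fixed magic cookie 0x2112A442, and an IPv4 address with the whole cookie. This obfuscation keeps NATs that rewrite IP addresses inside packet payloads from corrupting the value. The IPv4-only helpers below sketch that encoding; they are not a STUN client. With 192.0.2.1:32853 they should reproduce the sample IPv4 response values in RFC 5769.&lt;/p>

```typescript
// STUN magic cookie (RFC 5389); the port is XORed with its top 16 bits.
const MAGIC_COOKIE = 0x2112a442;

// "a.b.c.d" -> unsigned 32-bit integer
function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Unsigned 32-bit integer -> "a.b.c.d"
function intToIpv4(n: number): string {
  return [24, 16, 8, 0].map((shift) => (n >>> shift) % 256).join(".");
}

// XOR is its own inverse, so xorPort both encodes and decodes.
function xorPort(port: number): number {
  return port ^ (MAGIC_COOKIE >>> 16);
}

function xorAddress(ip: string): number {
  return (ipv4ToInt(ip) ^ MAGIC_COOKIE) >>> 0; // >>> 0 keeps the result unsigned
}

function decodeXorAddress(x: number): string {
  return intToIpv4((x ^ MAGIC_COOKIE) >>> 0);
}

console.log(xorPort(32853).toString(16));                   // a147
console.log(xorAddress("192.0.2.1").toString(16));          // e112a643
console.log(decodeXorAddress(0xe112a643), xorPort(0xa147)); // 192.0.2.1 32853
```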
&lt;h3 id="42-turn-servers">4.2. TURN Servers&lt;/h3>
&lt;p>In some network environments, however, peers cannot connect directly even when they know their public addresses. A symmetric NAT, for example, assigns a different public mapping for each destination, so the address a client learned from the STUN server does not work for traffic to or from the remote peer. This is where TURN (Traversal Using Relays around NAT) servers come in.&lt;/p>
&lt;p>A TURN server relays traffic. When a direct connection fails, both clients connect to the TURN server, which forwards all audio, video, and data between them. This is no longer true P2P communication, but it ensures that a connection can be established even under the worst network conditions.&lt;/p>
&lt;p>Using TURN servers increases latency and server bandwidth costs, so they are typically used as a last resort.&lt;/p>
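&lt;p>In the browser, STUN and TURN servers are supplied through the &lt;code>iceServers&lt;/code> list of the configuration passed to &lt;code>new RTCPeerConnection()&lt;/code>. Setting &lt;code>iceTransportPolicy&lt;/code> to &lt;code>"relay"&lt;/code> forces all traffic through TURN. The sketch below types that configuration with local interfaces so it runs outside a browser. The server hostnames and credentials are placeholders, and &lt;code>hasTurnFallback&lt;/code> is a hypothetical helper.&lt;/p>

```typescript
// Mirrors the shape of the browser's RTCConfiguration / RTCIceServer
// dictionaries; hostnames and credentials below are placeholders.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy?: "all" | "relay";
}

const config: PeerConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: ["turn:turn.example.org:3478?transport=udp", "turns:turn.example.org:443?transport=tcp"],
      username: "demo-user",
      credential: "demo-secret",
    },
  ],
  // "all" (the default) lets ICE use host and STUN paths when they work;
  // "relay" forces every path through TURN, handy for testing the fallback.
  iceTransportPolicy: "all",
};

// Hypothetical check: does this configuration include a TURN fallback at all?
function hasTurnFallback(cfg: PeerConfig): boolean {
  return cfg.iceServers.some((server) => {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    return urls.some((u) => /^turns?:/.test(u));
  });
}

console.log(hasTurnFallback(config)); // true
```

&lt;p>Without a TURN entry, users behind restrictive NATs simply fail to connect. TURN credentials are therefore usually issued per session by the application's own backend rather than hard-coded into the page.&lt;/p>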
&lt;h2 id="5-security">5. Security&lt;/h2>
&lt;p>Security is a core principle of WebRTC's design: encryption is mandatory for all communication and cannot be disabled.&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Signaling security&lt;/strong>: Because the WebRTC standard does not define a signaling protocol, protecting signaling is the application's responsibility; signaling messages should be sent over secure WebSocket (WSS) or HTTPS.&lt;/li>
&lt;li>&lt;strong>Media encryption&lt;/strong>: All audio/video streams use &lt;strong>SRTP (Secure Real-time Transport Protocol)&lt;/strong> for encryption. SRTP prevents eavesdropping and content tampering by encrypting and authenticating RTP packets.&lt;/li>
&lt;li>&lt;strong>Data encryption&lt;/strong>: All &lt;code>RTCDataChannel&lt;/code> data is encrypted with &lt;strong>DTLS (Datagram Transport Layer Security)&lt;/strong>, an adaptation of TLS to datagram transports such as UDP that provides equivalent security guarantees.&lt;/li>
&lt;/ul>
&lt;p>Key exchange happens automatically through a DTLS handshake while the &lt;code>RTCPeerConnection&lt;/code> is being established, and the SRTP keys for media are derived from that handshake (DTLS-SRTP). As a result, a secure channel is in place before any media or data is exchanged.&lt;/p>
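&lt;p>The keys themselves never travel over the signaling channel. The SDP instead carries a fingerprint of each endpoint's DTLS certificate (the &lt;code>a=fingerprint&lt;/code> attribute). Each side checks that the certificate presented during the handshake hashes to that value, which ties the encrypted session to the signaling exchange. Browsers perform this check internally, and the sketch below only illustrates the comparison. The certificate digest is assumed to be computed elsewhere, and both helper functions are hypothetical.&lt;/p>

```typescript
// Hypothetical helper: read the a=fingerprint attribute from SDP text.
function parseFingerprint(sdp: string): { algorithm: string; value: string } | null {
  const match = /^a=fingerprint:(\S+) ([0-9A-Fa-f:]+)\s*$/m.exec(sdp);
  return match ? { algorithm: match[1].toLowerCase(), value: match[2].toUpperCase() } : null;
}

// Hypothetical helper: compare it with the digest of the certificate seen in
// the DTLS handshake (a hex string, assumed to be computed elsewhere).
function fingerprintMatches(sdp: string, algorithm: string, certDigestHex: string): boolean {
  const fp = parseFingerprint(sdp);
  if (!fp || fp.algorithm !== algorithm.toLowerCase()) return false;
  // "0a1b2c..." -> "0A:1B:2C:..." to match the SDP notation
  const expected = (certDigestHex.toUpperCase().match(/../g) || []).join(":");
  return fp.value === expected;
}

// Truncated placeholder digest; a real sha-256 fingerprint has 32 bytes.
const demoSdp = "v=0\r\na=fingerprint:sha-256 0A:1B:2C:3D\r\n";
console.log(fingerprintMatches(demoSdp, "sha-256", "0a1b2c3d")); // true
console.log(fingerprintMatches(demoSdp, "sha-256", "0a1b2c3e")); // false
```

&lt;p>This is also why signaling needs a secure transport. An attacker who could rewrite the fingerprint in transit could substitute their own certificate and sit in the middle of the encrypted session.&lt;/p>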
&lt;h2 id="6-practical-application-cases">6. Practical Application Cases&lt;/h2>
&lt;p>With its powerful features, WebRTC has been widely applied in various scenarios:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Video conferencing systems&lt;/strong>: Such as Google Meet, Jitsi Meet, etc., allowing multi-party real-time audio/video calls.&lt;/li>
&lt;li>&lt;strong>Online education platforms&lt;/strong>: Enabling remote interactive teaching between teachers and students.&lt;/li>
&lt;li>&lt;strong>Telemedicine&lt;/strong>: Allowing doctors to conduct video consultations with patients remotely.&lt;/li>
&lt;li>&lt;strong>P2P file sharing&lt;/strong>: Using &lt;code>RTCDataChannel&lt;/code> for fast file transfers between browsers.&lt;/li>
&lt;li>&lt;strong>Cloud gaming and real-time games&lt;/strong>: Providing low-latency transmission of player input and game state.&lt;/li>
&lt;li>&lt;strong>Online customer service and video support&lt;/strong>: Businesses providing real-time video support services to customers through web pages.&lt;/li>
&lt;/ul>
&lt;h2 id="7-conclusion">7. Conclusion&lt;/h2>
&lt;p>WebRTC is a revolutionary technology that brings real-time communication directly into the browser and greatly lowers the barrier to building rich media applications. Its three core APIs, &lt;code>RTCPeerConnection&lt;/code>, &lt;code>MediaStream&lt;/code>, and &lt;code>RTCDataChannel&lt;/code>, work together with an application-provided signaling channel, ICE-based NAT traversal, and mandatory encryption. Together they form a complete, robust, and secure real-time communication solution.&lt;/p>
&lt;p>As network technology develops and 5G becomes more widespread, WebRTC's application scenarios will become even broader, with its potential in emerging fields such as IoT, augmented reality (AR), and virtual reality (VR) gradually becoming apparent. For developers looking to integrate high-quality, low-latency communication features into their applications, WebRTC is undoubtedly one of the most worthwhile technologies to focus on and learn about today.&lt;/p></description></item></channel></rss>