Building a C# Softphone With Call Recording — Step-by-Step Tutorial
This tutorial walks you through designing and implementing a C# softphone application that can make and receive VoIP calls and record call audio. It covers architecture, required components, key implementation steps, recording handling, and deployment considerations. Code examples target .NET 6 or later (including .NET 7 and 8) and favor clarity over production hardening; adapt them to your environment, licensing, and security needs.
What you’ll build
A desktop softphone that:
- Registers to a SIP server (or PBX)
- Makes and receives audio calls using RTP (via a SIP session)
- Streams microphone audio and plays incoming audio to speakers
- Records each call to a local file (WAV or compressed format)
- Provides a simple UI for dialing, answering, and playing back recordings
Prerequisites
- Familiarity with C# and .NET (recommended .NET 6+).
- Basic understanding of SIP and RTP concepts.
- A SIP account or test PBX (Asterisk, FreeSWITCH, another standard SIP server, or a cloud SIP provider).
- A code editor (Visual Studio / VS Code) and NuGet package management.
- Microphone and speakers (or virtual audio devices) on the development machine.
Architecture overview
High-level components:
- UI layer: dialer, call controls, recording management.
- SIP signaling: SIP registration, INVITE handling, BYE, etc.
- Media engine: capture microphone audio, encode/packetize RTP, decode/play incoming RTP.
- Recording subsystem: write raw PCM or encoded audio to files with timestamps/metadata.
- Storage/management: organize recordings, with optional cloud upload or encryption at rest.
Key choices:
- Use an existing SIP/media stack vs. implementing SIP and RTP yourself. For reliability and speed, prefer a maintained library (see options below).
- Choose audio formats: raw PCM/WAV is simplest; Opus offers a better quality-to-size ratio and is royalty-free, but requires an encoder/decoder library and extra codec handling.
- Decide whether to record locally only or also stream to a remote storage.
Recommended libraries (open-source and commercial options):
- PJSIP / PJSUA2 (C library with C# bindings): robust, widely used, supports codecs and NAT traversal.
- SIPSorcery (C#): pure C# SIP and RTP stack — easy to integrate into .NET apps.
- Ozeki VoIP SDK (commercial, C#): high-level, includes recording and GUI components.
- Oai-SIP / Linphone SDKs: other alternatives with native libraries and bindings.
- NAudio (C#): audio capture/playback and WAV handling — useful for recording and audio I/O.
- Opus.NET or Concentus (C# Opus): if you want Opus codec handling.
For this tutorial we’ll use:
- SIPSorcery for SIP/RTP signaling and sessions (pure C#, simplifies integration).
- NAudio for microphone/speaker capture and WAV file writing (SIPSorcery handles SIP and RTP; NAudio manages system audio devices and WAV formatting).
- Concentus (optional) for Opus if you want compressed recordings.
Install via NuGet:
```shell
dotnet add package SIPSorcery
dotnet add package NAudio
dotnet add package Concentus   # optional, for Opus
```
Step 1 — Project setup
- Create a .NET desktop project (WPF, WinForms, or console for testing). Example (console):
```shell
dotnet new console -n SoftphoneWithRecording
cd SoftphoneWithRecording
dotnet add package SIPSorcery
dotnet add package NAudio
```
- Add configuration for the SIP account (SIP URI, username, password, SIP server, and STUN/TURN servers if needed). Store secrets securely and avoid hardcoding them in production.
Example appsettings.json keys:
- sipServer
- username
- password
- displayName
- localAudioPort
- recordingsPath
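The keys above could be bound to a small settings type. This is a minimal sketch assuming the keys sit at the top level of appsettings.json and are read with System.Text.Json; the SoftphoneSettings record and the SettingsLoader helper are hypothetical names. In production, load the password from a secret store rather than from the file.

```csharp
using System;
using System.Text.Json;

// Hypothetical settings type mirroring the appsettings.json keys listed above.
public sealed record SoftphoneSettings(
    string SipServer,
    string Username,
    string Password,
    string DisplayName,
    int LocalAudioPort,
    string RecordingsPath);

public static class SettingsLoader
{
    // Case-insensitive matching so "sipServer" in JSON binds to SipServer.
    private static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    public static SoftphoneSettings Parse(string json)
    {
        var settings = JsonSerializer.Deserialize<SoftphoneSettings>(json, Options)
            ?? throw new InvalidOperationException("appsettings.json is empty.");
        // Missing keys arrive as null/0, so validate the ones the app cannot run without.
        if (string.IsNullOrWhiteSpace(settings.SipServer))
            throw new InvalidOperationException("sipServer is required.");
        return settings;
    }
}
```

Usage would be along the lines of `var settings = SettingsLoader.Parse(File.ReadAllText("appsettings.json"));`.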
Step 2 — Initialize SIP transport and register
Using SIPSorcery, set up SIP transport, user agent and register to your SIP provider.
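Example (simplified):
```csharp
using SIPSorcery.SIP;
using SIPSorcery.SIP.App;

// username, password and sipServer come from your configuration
var sipTransport = new SIPTransport();
// To receive incoming calls, add a listening channel, e.g.
// sipTransport.AddSIPChannel(new SIPUDPChannel(new IPEndPoint(IPAddress.Any, 5060)));
var userAgent = new SIPUserAgent(sipTransport, null);

// Fifth argument: registration expiry in seconds
var registerClient = new SIPRegistrationUserAgent(sipTransport, username, password, sipServer, 120);
// Subscribe to RegistrationSuccessful / RegistrationFailed to track status
registerClient.Start(); // not awaitable: registration and refresh run in the background
```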
Handle incoming call events:
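```csharp
userAgent.OnIncomingCall += async (ua, req) =>
{
    Console.WriteLine("Incoming call from " + req.Header.From.FromURI);
    // Accept the call and pass control to media setup, for example:
    // var uas = ua.AcceptCall(req);
    // await ua.Answer(uas, rtpSession); // rtpSession: your configured media session
};
```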
Note: real apps need to manage SIP timers, re-registration, and error handling.
Step 3 — Media: RTP and audio I/O
SIPSorcery provides RTP session helpers (for example, the RTPSession class). The basic flow:
- Negotiate SDP during SIP INVITE/200 OK exchange; agree on codecs and ports.
- Create an RTP session bound to local UDP ports that will receive/send audio.
- Capture microphone audio, package into RTP and send to remote RTP endpoint.
- Receive incoming RTP, decode, and play via speakers.
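To make "package into RTP" concrete, here is a minimal sketch of the fixed 12-byte RTP header defined in RFC 3550, assuming a PCMU payload (static payload type 0, 8000 Hz clock) and 20 ms frames of 160 samples. SIPSorcery's RTP session builds these headers for you; the RtpPacketizer class is hypothetical and only illustrates the layout. RFC 3550 recommends random initial sequence numbers and timestamps, which is why they are constructor parameters here.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper that prepends an RFC 3550 RTP header to encoded audio.
public sealed class RtpPacketizer
{
    private ushort _sequence;
    private uint _timestamp;
    private readonly uint _ssrc;
    private readonly byte _payloadType;

    public RtpPacketizer(uint ssrc, byte payloadType = 0, ushort initialSequence = 0, uint initialTimestamp = 0)
    {
        _ssrc = ssrc;
        _payloadType = payloadType;
        _sequence = initialSequence;
        _timestamp = initialTimestamp;
    }

    // samplesPerFrame advances the timestamp in RTP clock units (160 for 20 ms of 8 kHz PCMU).
    public byte[] Packetize(ReadOnlySpan<byte> payload, uint samplesPerFrame, bool marker = false)
    {
        var packet = new byte[12 + payload.Length];
        packet[0] = 0x80;                                                   // V=2, P=0, X=0, CC=0
        packet[1] = (byte)((marker ? 0x80 : 0x00) | (_payloadType & 0x7F)); // marker bit + payload type
        BinaryPrimitives.WriteUInt16BigEndian(packet.AsSpan(2), _sequence); // network byte order
        BinaryPrimitives.WriteUInt32BigEndian(packet.AsSpan(4), _timestamp);
        BinaryPrimitives.WriteUInt32BigEndian(packet.AsSpan(8), _ssrc);
        payload.CopyTo(packet.AsSpan(12));

        unchecked
        {
            _sequence++;                  // wraps from 65535 to 0
            _timestamp += samplesPerFrame;
        }
        return packet;
    }
}
```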
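Example: simplified RTP and audio capture with NAudio (G.711 μ-law)
```csharp
using System;
using NAudio.Codecs;
using NAudio.Wave;

// Capture 8 kHz, 16-bit mono PCM to match the negotiated codec (PCMU/PCMA use 8000 Hz)
var waveIn = new WaveInEvent { WaveFormat = new WaveFormat(8000, 16, 1), BufferMilliseconds = 20 };
waveIn.DataAvailable += (s, a) =>
{
    // Only the first a.BytesRecorded bytes of a.Buffer are valid
    var ulaw = new byte[a.BytesRecorded / 2];
    for (int i = 0; i < ulaw.Length; i++)
        ulaw[i] = MuLawEncoder.LinearToMuLawSample(BitConverter.ToInt16(a.Buffer, i * 2));
    SendRtpPacket(ulaw);
};
waveIn.StartRecording();

// Play incoming audio through the default output device
var waveProvider = new BufferedWaveProvider(new WaveFormat(8000, 16, 1));
var waveOut = new WaveOutEvent();
waveOut.Init(waveProvider);
waveOut.Play();

// Call this when an RTP packet with a PCMU payload arrives
void OnRtpPacketReceived(byte[] payload)
{
    var pcm = new byte[payload.Length * 2];
    for (int i = 0; i < payload.Length; i++)
    {
        short sample = MuLawDecoder.MuLawToLinearSample(payload[i]);
        pcm[2 * i] = (byte)sample;            // little-endian low byte
        pcm[2 * i + 1] = (byte)(sample >> 8); // high byte
    }
    waveProvider.AddSamples(pcm, 0, pcm.Length);
}

// Hypothetical helper: hand the encoded frame to your RTP session
// (with SIPSorcery, e.g. rtpSession.SendAudio((uint)encoded.Length, encoded))
void SendRtpPacket(byte[] encoded) { }
```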
Codec considerations:
- For simplicity, start with G.711 (PCMU/PCMA): it is royalty-free and simple to encode and decode. SIPSorcery includes G.711 helpers.
- For better quality and efficiency, use Opus; you’ll need encoding/decoding libraries and must adjust sample rates (Opus uses a 48 kHz RTP clock) and packetization.
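G.711 μ-law is simple enough to implement directly, which is handy for unit tests or when you want to avoid a codec dependency. The sketch below follows the classic segment-based μ-law algorithm (bias 0x84, clip at 32635) that public-domain G.711 implementations use; NAudio's NAudio.Codecs.MuLawEncoder and MuLawDecoder provide equivalent per-sample conversions. The G711MuLaw class name is hypothetical.

```csharp
using System;

// Hypothetical helper: G.711 μ-law (PCMU) encode/decode for 16-bit PCM samples.
public static class G711MuLaw
{
    private const int Bias = 0x84;   // 132, added before the segment lookup
    private const int Clip = 32635;  // largest magnitude that still fits in 15 bits after biasing

    public static byte Encode(short pcm)
    {
        int sample = pcm;
        int sign = (sample >> 8) & 0x80;     // 0x80 for negative samples
        if (sign != 0) sample = -sample;
        if (sample > Clip) sample = Clip;
        sample += Bias;

        // Segment (exponent) = position of the highest set bit among bits 7..14.
        int exponent = 7;
        for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (sample >> (exponent + 3)) & 0x0F;
        return (byte)~(sign | (exponent << 4) | mantissa); // μ-law bytes are stored inverted
    }

    public static short Decode(byte ulaw)
    {
        int u = ~ulaw & 0xFF;
        int sign = u & 0x80;
        int exponent = (u >> 4) & 0x07;
        int mantissa = u & 0x0F;
        // Reconstruct the middle of the quantization step, then remove the bias.
        int magnitude = (((mantissa << 3) + Bias) << exponent) - Bias;
        return (short)(sign != 0 ? -magnitude : magnitude);
    }

    // Encodes a little-endian PCM16 buffer (as delivered by NAudio's WaveInEvent) into μ-law bytes.
    public static byte[] EncodeBuffer(ReadOnlySpan<byte> pcm16)
    {
        var output = new byte[pcm16.Length / 2];
        for (int i = 0; i < output.Length; i++)
            output[i] = Encode((short)(pcm16[2 * i] | (pcm16[2 * i + 1] << 8)));
        return output;
    }
}
```

Each 16-bit sample becomes one byte, which is why a 20 ms frame at 8 kHz (320 bytes of PCM) becomes a 160-byte PCMU payload.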
Step 4 — Recording calls
Two common approaches:
- Record from the audio engine (mix of send and receive) to a single file per call.
- Record separate streams (local mic and remote audio) for later mixing or per-party files.
Simplest: mix incoming and outgoing PCM in memory and write to a WAV file in real time.
Using NAudio WaveFileWriter:
```csharp
using System;
using System.IO;
using NAudio.Wave;

// recordingsPath comes from configuration
string recordingPath = Path.Combine(recordingsPath, $"call_{DateTime.Now:yyyyMMdd_HHmmss}.wav");
var waveFormat = new WaveFormat(8000, 16, 1);
using var writer = new WaveFileWriter(recordingPath, waveFormat);

// When microphone DataAvailable fires:
void MicDataAvailable(object s, WaveInEventArgs a)
{
    writer.Write(a.Buffer, 0, a.BytesRecorded);
}

// When incoming audio has been decoded to PCM:
void OnIncomingPcm(byte[] pcm, int offset, int count)
{
    writer.Write(pcm, offset, count);
}
```
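This naive approach writes buffers from both streams one after another into the same file, so the recording alternates between chunks of local and remote audio and runs roughly twice as long as the call. To produce a single coherent mono or stereo file, mix the two streams sample by sample. Example mixing approach: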
- Convert both streams to float arrays, sum samples, clamp to [-1.0, 1.0], convert back to PCM16.
- For stereo output, consider writing local audio to left channel and remote to right channel, or mix to mono.
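For the stereo option, a minimal sketch is to interleave two aligned mono PCM16 frames so that local audio lands in the left channel and remote audio in the right, then write the result with a 2-channel format (for example `new WaveFormat(8000, 16, 2)` in NAudio). It assumes both frames share a sample rate and cover the same time span; the StereoInterleaver name is hypothetical.

```csharp
using System;

public static class StereoInterleaver
{
    // Interleaves two mono little-endian PCM16 buffers into one stereo buffer with the
    // layout [L0 lo, L0 hi, R0 lo, R0 hi, L1 lo, ...]. If one side is shorter
    // (e.g. a late RTP packet), its missing samples are left as zero (silence).
    public static byte[] Interleave(ReadOnlySpan<byte> localMono, ReadOnlySpan<byte> remoteMono)
    {
        int samples = Math.Max(localMono.Length, remoteMono.Length) / 2;
        var stereo = new byte[samples * 4];
        for (int i = 0; i < samples; i++)
        {
            int src = i * 2, dst = i * 4;
            if (src + 1 < localMono.Length)
            {
                stereo[dst] = localMono[src];
                stereo[dst + 1] = localMono[src + 1];
            }
            if (src + 1 < remoteMono.Length)
            {
                stereo[dst + 2] = remoteMono[src];
                stereo[dst + 3] = remoteMono[src + 1];
            }
        }
        return stereo;
    }
}
```

Keeping the parties on separate channels makes it easy to mute or transcribe one side later, at the cost of double the file size of a mono mix.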
Mixing example (mono):
```csharp
// Sums one local and one remote PCM16 sample, saturating instead of wrapping on overflow.
short MixSamples(short localSample, short remoteSample)
{
    int mixed = localSample + remoteSample;
    if (mixed > short.MaxValue) mixed = short.MaxValue;
    if (mixed < short.MinValue) mixed = short.MinValue;
    return (short)mixed;
}
```
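MixSamples works on one sample pair; in the audio callbacks you usually mix whole frames. A minimal sketch of the same saturating mix over byte buffers, assuming both are little-endian PCM16 mono at the same sample rate and cover the same window (for example 20 ms); the PcmMixer name is hypothetical.

```csharp
using System;
using System.Buffers.Binary;

public static class PcmMixer
{
    // Sums two aligned PCM16 frames sample by sample, clamping to the 16-bit range
    // (the same saturation as MixSamples). A shorter input is treated as silence.
    public static byte[] MixPcm16(ReadOnlySpan<byte> local, ReadOnlySpan<byte> remote)
    {
        int length = Math.Max(local.Length, remote.Length) & ~1; // whole samples only
        var mixed = new byte[length];
        for (int i = 0; i < length; i += 2)
        {
            int a = i + 1 < local.Length ? BinaryPrimitives.ReadInt16LittleEndian(local.Slice(i)) : 0;
            int b = i + 1 < remote.Length ? BinaryPrimitives.ReadInt16LittleEndian(remote.Slice(i)) : 0;
            int sum = Math.Clamp(a + b, short.MinValue, short.MaxValue);
            BinaryPrimitives.WriteInt16LittleEndian(mixed.AsSpan(i), (short)sum);
        }
        return mixed;
    }
}
```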
Better approach: use NAudio’s MixingSampleProvider to mix ISampleProvider inputs (one per party) and feed the result to a single WaveFileWriter.
Using MixingSampleProvider:
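```csharp
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

var pcmFormat = new WaveFormat(8000, 16, 1);
// One buffer per party: AddSamples from waveIn.DataAvailable and from the RTP decoder
var mic = new BufferedWaveProvider(pcmFormat) { DiscardOnBufferOverflow = true };
var remote = new BufferedWaveProvider(pcmFormat) { DiscardOnBufferOverflow = true };

// Inputs must match the mixer's sample rate and channel count; ToSampleProvider converts PCM16 to float
var mixer = new MixingSampleProvider(WaveFormat.CreateIeeeFloatWaveFormat(8000, 1));
mixer.AddMixerInput(mic.ToSampleProvider());
mixer.AddMixerInput(remote.ToSampleProvider());

using var writer = new WaveFileWriter(recordingPath, pcmFormat);
var frame = new float[160];                    // 20 ms at 8 kHz; repeat on a timer during the call
int read = mixer.Read(frame, 0, frame.Length);
writer.WriteSamples(frame, 0, read);           // converts floats to the writer's 16-bit PCM
```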
Important: ensure timestamps and sample rates match; drop or resample as needed.
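Step 5 — Recording formats and metadata
Recording formats:
- WAV (PCM16): simplest, ideal for legal/compliance recording and easy playback, but large (about 57.6 MB per hour at 8 kHz, 16-bit mono).
- Opus in an Ogg container (Ogg Opus): smaller files at high quality; you’ll need to encode with Opus and wrap the packets in Ogg.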
- MP3/AAC: requires encoding libraries and possibly licensing.
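NAudio's WaveFileWriter handles the WAV container for you, but the format itself is simple: a 44-byte RIFF header followed by raw samples, with two size fields that can only be filled in once the call ends (which is why the header must be finalized on hang-up). A minimal base-class-library sketch for PCM16, assuming a seekable stream and recordings under 2 GB; the Pcm16WavWriter name is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical minimal PCM16 WAV writer: header first, sizes patched on Dispose.
public sealed class Pcm16WavWriter : IDisposable
{
    private readonly BinaryWriter _writer;
    private int _dataBytes;

    public Pcm16WavWriter(Stream seekableStream, int sampleRate, short channels)
    {
        _writer = new BinaryWriter(seekableStream, Encoding.ASCII, leaveOpen: true);
        short blockAlign = (short)(channels * 2);        // bytes per sample frame
        _writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        _writer.Write(0);                                // RIFF chunk size, patched later
        _writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        _writer.Write(Encoding.ASCII.GetBytes("fmt "));
        _writer.Write(16);                               // fmt chunk size for PCM
        _writer.Write((short)1);                         // audio format 1 = PCM
        _writer.Write(channels);
        _writer.Write(sampleRate);
        _writer.Write(sampleRate * blockAlign);          // byte rate
        _writer.Write(blockAlign);
        _writer.Write((short)16);                        // bits per sample
        _writer.Write(Encoding.ASCII.GetBytes("data"));
        _writer.Write(0);                                // data chunk size, patched later
    }

    public void Write(ReadOnlySpan<byte> pcm16)
    {
        _writer.Write(pcm16);
        _dataBytes += pcm16.Length;
    }

    public void Dispose()
    {
        // Finalize: patch the RIFF size (file length minus 8) and the data chunk size.
        _writer.Seek(4, SeekOrigin.Begin);
        _writer.Write(36 + _dataBytes);
        _writer.Seek(40, SeekOrigin.Begin);
        _writer.Write(_dataBytes);
        _writer.Seek(0, SeekOrigin.End);
        _writer.Flush();
        _writer.Dispose();
    }
}
```

If the process crashes before Dispose runs, the size fields stay zero; many players can still recover such files, but finalizing in a `finally` block or `using` statement avoids the problem.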
Metadata to store:
- Caller/callee SIP URIs, display names.
- Call start/stop timestamps, duration.
- Call direction (inbound/outbound).
- Call-ID (the SIP Call-ID header value) for traceability.
- Optional notes or tags.
Store metadata in a sidecar JSON file and, optionally, embed key fields in the filename, for example call_20250828_132501_from_alice_to_bob+123456.wav with call_20250828_132501.json (metadata) alongside it.
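A minimal sketch of the sidecar approach, assuming System.Text.Json and the fields listed above. The CallRecordingMetadata record, the RecordingNames helper, and the sanitizing rule (anything other than letters, digits, '+', '-' and '.' becomes '_') are hypothetical choices; sanitizing matters because SIP URIs contain characters such as ':' that Windows does not allow in file names. This sketch uses the same base name for the audio and JSON files so they sort together.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.Json;

// Hypothetical metadata record matching the fields listed above.
public sealed record CallRecordingMetadata(
    string CallId,            // SIP Call-ID header value
    string Direction,         // "inbound" or "outbound"
    string FromUri,
    string FromDisplayName,
    string ToUri,
    string ToDisplayName,
    DateTimeOffset StartedAt,
    DateTimeOffset EndedAt,
    string[] Tags)
{
    public double DurationSeconds => (EndedAt - StartedAt).TotalSeconds;
}

public static class RecordingNames
{
    // Keeps letters, digits, '+', '-', '.'; replaces everything else with '_'.
    public static string Sanitize(string value) =>
        new string(value.Select(c => char.IsLetterOrDigit(c) || c is '+' or '-' or '.' ? c : '_').ToArray());

    public static string BaseName(DateTime start, string from, string to) =>
        "call_" + start.ToString("yyyyMMdd_HHmmss", CultureInfo.InvariantCulture)
        + "_from_" + Sanitize(from) + "_to_" + Sanitize(to);

    // Writes <baseName>.json into the recordings directory and returns its path.
    public static string WriteSidecar(string directory, string baseName, CallRecordingMetadata metadata)
    {
        string path = Path.Combine(directory, baseName + ".json");
        var options = new JsonSerializerOptions { WriteIndented = true };
        File.WriteAllText(path, JsonSerializer.Serialize(metadata, options), Encoding.UTF8);
        return path;
    }
}
```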
Step 6 — UI and user flow
Core UI elements:
- Registration status (connected/registered).
- Dial pad, destination input, call button.
- Incoming call popup with Accept/Decline.
- In-call controls: mute, hold, transfer, record on/off, hang up.
- Recordings list with play, download, delete, and metadata.
UX considerations:
- Visual indication when recording (red dot).
- Allow automatic recording per policy or user toggle.
- Secure access to recordings (encryption, access control).
- Warning/consent messages as required by law.
Step 7 — NAT traversal, security, and reliability
NAT and firewall:
- Use STUN/TURN for clients behind NAT. SIPSorcery supports STUN. TURN might be necessary for restrictive networks (requires a TURN server).
- Use ICE for media negotiation where possible.
Security:
- Use TLS for SIP signaling (SIPS).
- Use SRTP for media encryption (SIPSorcery supports SRTP).
- Protect stored recordings: encrypt at rest, secure filesystem permissions.
- Securely store SIP credentials and rotate periodically.
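For encryption at rest, one option in the base class library is AES-GCM, which also detects tampering. A minimal sketch assuming .NET 8 (for the `AesGcm(key, tagSizeInBytes)` constructor) and a 256-bit key obtained from a key store such as DPAPI, a platform keychain, or a cloud KMS; key management itself is out of scope. The RecordingCrypto name and the nonce | tag | ciphertext file layout are hypothetical choices. It encrypts whole files in memory, so very long recordings would need a chunked scheme.

```csharp
using System;
using System.Security.Cryptography;

public static class RecordingCrypto
{
    private const int NonceSize = 12; // 96-bit nonce, the standard size for GCM
    private const int TagSize = 16;   // 128-bit authentication tag

    // Output layout: nonce || tag || ciphertext.
    public static byte[] Encrypt(byte[] key, byte[] plaintext)
    {
        var output = new byte[NonceSize + TagSize + plaintext.Length];
        var nonce = output.AsSpan(0, NonceSize);
        RandomNumberGenerator.Fill(nonce);                 // fresh random nonce for every file
        using var aes = new AesGcm(key, TagSize);
        aes.Encrypt(nonce, plaintext, output.AsSpan(NonceSize + TagSize), output.AsSpan(NonceSize, TagSize));
        return output;
    }

    // Throws CryptographicException if the data was modified or the key is wrong.
    public static byte[] Decrypt(byte[] key, byte[] encrypted)
    {
        if (encrypted.Length < NonceSize + TagSize)
            throw new CryptographicException("Encrypted recording is truncated.");
        var plaintext = new byte[encrypted.Length - NonceSize - TagSize];
        using var aes = new AesGcm(key, TagSize);
        aes.Decrypt(
            encrypted.AsSpan(0, NonceSize),
            encrypted.AsSpan(NonceSize + TagSize),
            encrypted.AsSpan(NonceSize, TagSize),
            plaintext);
        return plaintext;
    }
}
```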
Reliability:
- Re-register periodically; handle registration failures gracefully.
- Use jitter buffers for RTP to handle network variability.
- Implement reconnection logic and logging.
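A jitter buffer holds a few packets before playback so that late or reordered packets can be put back in sequence. The sketch below is a deliberately simple fixed-depth buffer keyed by RTP sequence number, with wraparound-aware ordering; the SimpleJitterBuffer name and the default depth of 3 packets (60 ms of 20 ms frames) are hypothetical choices, and production buffers usually adapt their depth and conceal lost packets. It is not thread-safe, so guard it with a lock if the network and audio threads both use it.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical fixed-depth jitter buffer for one RTP stream (not thread-safe).
public sealed class SimpleJitterBuffer
{
    private readonly Dictionary<ushort, byte[]> _packets = new();
    private readonly int _depth;
    private ushort? _nextSequence; // null until playout has started

    public SimpleJitterBuffer(int depth = 3) => _depth = depth;

    // True if sequence a is later than b, treating the 16-bit space as circular.
    private static bool IsNewer(ushort a, ushort b) => a != b && (ushort)(a - b) < 0x8000;

    public void Add(ushort sequence, byte[] payload)
    {
        // Drop packets whose playout slot has already passed.
        if (_nextSequence is ushort next && sequence != next && !IsNewer(sequence, next))
            return;
        _packets[sequence] = payload;
    }

    // Call once per frame interval (e.g. every 20 ms). Returns false while still filling.
    // Once playing, returns true and sets payload to the next in-order packet, or to null
    // if it is missing (play silence or apply packet loss concealment instead).
    public bool TryDequeue(out byte[]? payload)
    {
        payload = null;
        if (_nextSequence is null)
        {
            if (_packets.Count < _depth) return false;
            _nextSequence = Oldest();
        }

        ushort sequence = _nextSequence.Value;
        if (_packets.Remove(sequence, out var found)) payload = found;
        _nextSequence = unchecked((ushort)(sequence + 1));
        return true;
    }

    private ushort Oldest()
    {
        bool first = true;
        ushort oldest = 0;
        foreach (var sequence in _packets.Keys)
        {
            if (first || IsNewer(oldest, sequence)) oldest = sequence;
            first = false;
        }
        return oldest;
    }
}
```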
Step 8 — Testing and debugging
- Use a test PBX (Asterisk/FreeSWITCH) locally for controlled calls.
- Use SIP test accounts from cloud providers for real-world testing.
- Capture SIP and RTP traffic with Wireshark for protocol-level debugging.
- Log SIP messages, SDP, and error events.
- Test with different codecs and network conditions (packet loss, delay).
Step 9 — Legal and compliance notes
- Recording laws vary by jurisdiction. Always obtain required consent before recording calls.
- Log consent events and store them with the recording metadata.
- For regulated industries (finance, healthcare), ensure storage and access controls meet regulations (e.g., encryption, retention policies, audit logs).
Example: simplified end-to-end flow (outline)
- Start SIP transport and register.
- When outgoing call requested, create SIP INVITE with SDP listing supported audio codecs and local RTP port.
- On 200 OK, complete SIP handshake and start RTP session.
- Start capturing microphone audio, encode to chosen codec, and send RTP.
- On receiving RTP, decode and play via speaker.
- Simultaneously feed both mic and remote PCM into a mixer and write to a WAV file until call ends.
- On BYE, stop RTP, close the audio streams, finalize the WAV header, and save the metadata.
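SIPSorcery generates the SDP body for you, but it helps to recognize what the offer in the flow above contains when reading SIP traces in Wireshark. The sketch below builds a minimal RFC 4566 audio offer listing PCMU (payload type 0), PCMA (8), and RFC 4733 telephone-event for DTMF on payload type 101, a commonly used dynamic value. The SdpOffer name, the example address, and the session ID are placeholders.

```csharp
using System.Text;

public static class SdpOffer
{
    // Builds a minimal SDP audio offer. SDP lines must end with CRLF.
    public static string Build(string localIp, int rtpPort, long sessionId)
    {
        var sdp = new StringBuilder();
        sdp.Append("v=0\r\n");
        sdp.Append($"o=- {sessionId} {sessionId} IN IP4 {localIp}\r\n"); // origin: user, session id, version
        sdp.Append("s=-\r\n");
        sdp.Append($"c=IN IP4 {localIp}\r\n");                          // where we expect to receive RTP
        sdp.Append("t=0 0\r\n");
        sdp.Append($"m=audio {rtpPort} RTP/AVP 0 8 101\r\n");            // codecs in order of preference
        sdp.Append("a=rtpmap:0 PCMU/8000\r\n");
        sdp.Append("a=rtpmap:8 PCMA/8000\r\n");
        sdp.Append("a=rtpmap:101 telephone-event/8000\r\n");
        sdp.Append("a=fmtp:101 0-16\r\n");                               // DTMF 0-9, *, #, A-D, flash
        sdp.Append("a=ptime:20\r\n");                                    // 20 ms packetization
        sdp.Append("a=sendrecv\r\n");
        return sdp.ToString();
    }
}
```

The remote side answers with the subset it supports; the first codec listed in its m= line is the one both ends will use.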
Example resources and next steps
- SIPSorcery docs and GitHub for examples and RTP helpers.
- NAudio docs for advanced audio capture/playback and mixing.
- Concentus (Opus) for modern codec support.
- Asterisk/FreeSWITCH for local PBX testing.
Conclusion
This tutorial provided an end-to-end roadmap to build a C# softphone with call recording: choose a SIP/media stack, manage audio capture and playback, implement recording (mixing streams), and handle NAT/security/compliance. Start with G.711 and WAV for simplicity, then iterate to add Opus, SRTP, and resilient network handling.