Skip to content

fix(sls): restore macOS build + security, concurrency & build hardening, tests, docs#14

Merged
datagutt merged 34 commits into
mainfrom
advisor/execute-all
Jun 21, 2026
Merged

fix(sls): restore macOS build + security, concurrency & build hardening, tests, docs#14
datagutt merged 34 commits into
mainfrom
advisor/execute-all

Conversation

@datagutt

@datagutt datagutt commented Jun 21, 2026

Copy link
Copy Markdown
Member

Summary

Restores the macOS build (regressed by the eventfd change in 0167cc38) and lands a batch of audited fixes across security, memory/concurrency, correctness, build reproducibility, tooling, and docs. Also introduces the project's first automated test suite.

Work is organised as plans under plans/. Each plan was executed and reviewed independently; this PR is the combined result.

What's included

Build portability (000)

  • Portable epoll wakeup: eventfd on Linux, self-pipe elsewhere, so the core builds on macOS again. Linux behaviour is unchanged.
  • Portable pthread_t log formatting (sls_tid helper), ::bind qualification, and macOS link libraries (std::filesystem via libc++, CoreFoundation/Security for the TLS cert loader).

Verification baseline (001)

  • doctest harness + CTest, 47 unit tests covering streamid parsing, the recycle-array ring buffer, the auth-reject cache, bitrate limiting, and the audio gap filler.
  • SLS_BUILD_TESTS, SLS_SANITIZE (ASan/UBSan), and SLS_TSAN build options.
  • GitHub Actions CI matrix (debug, asan/ubsan, tsan) building the SRT fork and running ctest.

Security (002)

  • Bound the SPS/PPS copy in the TS parser to the destination buffer (prevents a heap overflow reachable from publisher data).
  • Negative-cache malformed player-key webhook responses so a misbehaving auth backend cannot exhaust the deferred-accept slots.
  • Constant-time comparison for the control-plane API key.
  • Null-check strdup on the accept path, clamp the webhook max_players override, key the auth-reject cache on the canonical streamid, and reject URL-significant characters in streamid components.

Memory and concurrency (003)

  • disconnect_stream now kicks the publisher via the existing cross-thread atomic flag instead of freeing the SRT socket from the HTTP thread (fixes a use-after-free).
  • Stop leaking stat_info_t on every accepted role.
  • Make the recycle-array byte counter atomic, take a write lock in setSize, and fix the reload-loop iterator invalidation.
  • Non-silent IPv6 handling in the IP ACL (see Notes), and socket/addrinfo cleanup on the libsrt_setup error paths.

Correctness and performance (004)

  • Fix the TSFileTimeReader loop-wrap (wrong buffer), a NULL-deref guard, and malformed log calls.
  • is_exist takes a read lock; per-packet map lookups avoid a std::string allocation via transparent comparators; per-packet trace calls compile out in release builds.

Build and supply chain (005)

  • Pin the SRT belabox fork by commit, pin the Alpine base image, pin vendored submodules to releases where tags exist, and fix the misleading submodule branch label.
  • Remove dead code (sls_format, the av_* helpers; sls_mkdir_p now uses std::filesystem).
  • Add .clang-format, .clang-tidy, and -Wextra -Wshadow.

Docs (006)

  • README points to the belabox SRT fork and lists prerequisites; new CONFIGURATION.md catalogs the IRL directives; ChangeLog.md fork history backfilled; new CLAUDE.md.

Architecture design plans (007)

  • Design/spike plans 008..012 (MapData lock rework, listener-handler decomposition, common.cpp/manager split, relay-manager dedup, worker affinity). No production code in this PR; these are scoped for follow-up.

Verification

Clean build green on macOS against the belabox SRT fork; ctest passes (47 tests), ASan/UBSan clean, TSan compiles, and srt_server runs. CI will exercise the same on Linux once this runs.

Notes for review

  • Auth fail-open is intentional and unchanged: publisher-auth on backend 5xx/transport failure stays fail-open by design (a circuit-breaker was deliberately not added).
  • IPv6 ACL limitation: sls_ip_access_t stores only an IPv4 address, so specific IPv6 allow/deny rules are not enforced. IPv6 peers match only the wildcard entry and otherwise hit the default path, with a warning logged. Documented in CONFIGURATION.md; full IPv6 matching is a follow-up.
  • Two latent bugs surfaced during the design spikes and recorded (not fixed here): CSLSManager::start() does not roll back partially-initialised listeners on failure (plan 010), and the puller connect_loop may not retry its only configured upstream after a failed sweep (plan 011).
  • Public repo: this PR describes the security fixes above. Merge promptly if the disclosure window matters.

Summary by CodeRabbit

  • New Features

    • Added GitHub Actions CI pipeline for automated testing across debug, address sanitizer, and thread sanitizer configurations.
    • Added comprehensive unit test suite for core functionality.
    • Enhanced API security with constant-time authorization checks.
  • Bug Fixes

    • Fixed cross-thread teardown race condition in stream disconnection.
    • Fixed socket and resource leaks in initialization error paths.
    • Improved input validation for configuration parameters and webhook responses.
    • Fixed IPv6 peer detection and ACL matching behavior.
  • Documentation

    • Added detailed configuration reference and contributor onboarding guides.
    • Updated build/test instructions and dependency documentation.
    • Added release history and architecture planning documents.
  • Chores

    • Updated build toolchain with code formatting and static analysis configurations.
    • Refreshed Docker build pinning and Alpine base image versions.
    • Updated submodule dependencies.
    • Enhanced compiler warnings and sanitizer support.

advisor and others added 30 commits June 21, 2026 02:30
eventfd is linux-only and broke the macos build. wrap the linux path
behind __linux__ and use a self-pipe (read end registered with the srt
epoll, write end used by wake()) on other platforms. public interface
unchanged: wake_fd() still returns the epoll-registered read end and
drain_wake_fd() reads it.
pthread_t is an integer on linux but an opaque pointer on macos; spdlog
/ fmt refuses to format raw non-void pointers. add a sls_tid() helper in
common.hpp that c-style casts through uintptr_t (compiles for both
integer and pointer pthread_t) and use it at every spdlog call site that
formats m_th_id.
common.hpp has 'using namespace std;', which pulls std::bind into every
translation unit. on macos clang, name lookup picks the std template over
the posix bind() syscall and the build fails. qualify the one syscall
site with ::bind.
libstdc++fs is a linux/libstdc++ standalone; libc++ on macos ships
std::filesystem in the main library, so the explicit link breaks on
mac. guard it to non-apple/non-windows. add corefoundation + security
frameworks on apple so cpp-httplib's tls system-cert loader can link.
vendor doctest v2.4.11 as a single header in lib/doctest. wire
SLS_BUILD_TESTS / SLS_SANITIZE / SLS_TSAN cmake options and a
tests/ subdirectory that builds sls_tests against sls_core. add
unit coverage for sls_sid, SLSRecycleArray (incl. wrap and reader
overrun), AuthRejectCache (incl. ttl boundary), SLSBitrateLimit
(OK/violation/disconnect transitions and window aging) and
SLSAudioGapFiller (frame_pts_duration math, PTS wrap, MAX_PTS_GAP
and MAX_GAP_FRAMES bounds, ADTS detect). 43 test cases register.
three-config matrix (debug, asan-ubsan, tsan) on ubuntu-22.04. each
job builds the irlserver/srt belabox fork (cached across runs) per
the Dockerfile pattern, configures with SLS_BUILD_TESTS=ON, builds
and runs ctest. vendored submodules and the SRT install are cached
so steady-state ci spends its time on the actual build + tests.
A publisher-crafted H.264 NAL larger than 188 bytes could overflow
ti->sps / ti->pps in sls_parse_spspps. Reject (do not store a truncated
SPS) at all four copy sites (two SPS, two PPS, including the last-nal
block). Uses sizeof(ti->sps)/sizeof(ti->pps) so the bound tracks the
buffer.
When the webhook returned HTTP 200 with a body that failed JSON parse,
lacked stream_id, or carried an unusable stream_id, the three early
returns skipped cache_negative(), so reconnects re-fired the webhook
and parked in deferred-accept until the pending slot pool drained.
Insert a short-TTL negative cache entry in each case, matching the
transport-failure / non-200 handling above.

Also clamp the webhook-supplied max_players_per_stream: -1 (unlimited)
and any non-negative value are accepted, anything < -1 is dropped.
== short-circuits on the first differing byte, leaking the matched
prefix length on the /stats and /disconnect API-key checks. Add a
file-local sls_ct_equal that iterates over the longer string and ORs
in every difference, and use it at both comparison sites.
strdup can return NULL under memory pressure; passing that into strtok
crashes the listener thread on every accept. Mirror the surrounding
close-and-return error pattern when the allocation fails.
The negative auth cache was keyed on the raw wire streamid, so byte
variants of the same logical stream (trailing whitespace, reordered
k/v fields, bare vs. #!:: form) hashed to different entries and let a
misbehaving client bypass the rejection. Add sls_canonical_sid_key
which reduces a streamid to h/sls_app/r when all three parse, and use
it on both sides: record_failure in SLSRole and is_blocked in the
publisher listen callback. Falls back to the raw streamid when input
does not parse so behavior is unchanged for unparseable input.
sls_is_safe_name only blocked path separators and control bytes, so a
player-chosen stream_name containing ?, #, &, =, %, or space was
concatenated verbatim into the outbound srt:// URL built by
SLSPullerManager, splicing query parameters into the relay leg. Add
those six characters to the per-character rejection. Deliberately omit
':' and '@' (URL-significant only in the authority, not the query)
to avoid locking out opaque identifiers operators may already accept.
Make CSLSRecycleArray::m_nDataCount std::atomic<int64_t> so the put() increment and the bFirst-snapshot read are correctly synchronised with concurrent get() readers; widen SLSRecycleArrayID::nDataCount to match. Take the read lock around the bFirst snapshot so the (write head, byte counter) pair is captured atomically. setSize() now takes a write lock (it reallocates the buffer) and resets m_nDataCount so the counter stays meaningful across realloc. Drop the stray arg in the oversized-len error log so it no longer prints '*** LOG ERROR *** invalid format specifier'.
The previous loop incremented an iterator that erase() had already invalidated whenever a manager was retired, which is undefined behaviour as soon as ≥2 managers retire in one pass. Switch to the standard erase/advance idiom.
Calling publisher_role->close() from the HTTP control thread deleted m_srt while the owning worker was still dereferencing it on the data path — a use-after-free reachable from the admin /disconnect endpoint. Use the existing request_kick() handoff so teardown runs on the socket-owning worker, and upgrade the kick flag store/load to release/acquire so state set up before the kick is visible to the worker before it sees the flag.
sls_ip_access_t only carries an IPv4 ip_address, so the old loop matched IPv6 peers only against the wildcard entry — specific deny rules silently never applied to them. Extract a shared helper used by both the publisher and player ACL paths: IPv6 peers honour wildcard rules, otherwise fall through with a one-shot warning to the documented default. Add CSLSSrt::libsrt_is_ipv6_peer() so the matcher can tell IPv4 from IPv6 peers.
Each srt_setsockopt failure between srt_create_socket() and srt_bind() returned SLS_ERROR without releasing the SRT socket or the addrinfo, leaking an SRT socket and a small heap allocation per failed listener setup. Route every sockopt-failure exit through a single cleanup lambda.
AuthRejectCache stores expiry at wall-clock second resolution (time(nullptr)) and is_blocked() returns expiry > now, so any sleep < TTL + 1s can land on either side of a second boundary. The previous tests slept 1100ms with TTL=1 (or TTL=2 then re-checked after 1100ms past refresh), which flaked roughly 1 in 6 runs. Sleep TTL + 1.5s for any post-expiry CHECK_FALSE, and assert the post-refresh CHECK(is_blocked) immediately after record_failure with no sleep in between.
- use rts_data instead of caller buffer when refilling after loop reopen
- fix inverted null-guard (&& -> ||) in generate_rts_file
- fix spdlog placeholder/argument mismatches in get error paths
- CSLSMapData::is_exist now takes a read lock (was write); body only finds
- give the per-key maps a transparent comparator (std::less<>) so the hot
  put/get paths look up by std::string_view{key} without constructing a
  std::string per packet
Convert per-packet spdlog::trace(...) call sites in RecycleArray put/get,
Role read/write handlers, and Group worker egress to SPDLOG_TRACE(...) so
their format-argument evaluation can be stripped by the compile-time
active level. Pin SPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_INFO in CMakeLists so
trace/debug arguments are not evaluated in release builds. Non-hot trace
sites (epoll add/remove, destructors) are left as spdlog::trace.
cpp-httplib -> v0.48.0, nlohmann/json -> v3.12.0. Also drop the
misleading branch = 1.9.2 line from .gitmodules for spdlog (pins are
the source of truth). spdlog stays on its current commit because the
irlserver fork carries no release tags. CxxUrl stays on its current
commit (7 commits past v0.3 with cmake fixes the build relies on).
sls_format was a stub returning empty string and had no callers.
The av_malloc/av_free/av_strdup/av_str_replace/av_tolower/av_strncasecmp
family existed only to back sls_mkdir_p; that function now uses
std::filesystem::create_directories with an explicit permissions(0755)
pass to preserve the prior mkdir(2) mode under arbitrary umasks.
Advisory baseline only. The tree is not reformatted in this change to
keep future diffs reviewable. -Wextra and -Wshadow are enabled on all
three compiler-flag branches without -Werror, so the existing backlog
surfaces (~10 new warnings) without breaking the build.
advisor-exec and others added 3 commits June 21, 2026 05:00
spike deliverables from plan 007:
- 008: cslsmapdata per-entry routing (lock rework via shared_ptr handle)
- 009: decompose csllistener::handler with test seam + phased extraction
- 010: split common.cpp into per-concern modules + vectorise slsmanager
- 011: dedup pullermanager/pushermanager via base-class scaffolding
- 012: worker affinity to eliminate the 10ms cross-worker forwarding gap

each plan is staged: characterization tests first, prototype/measurement,
then the refactor. each names its STOP conditions and open questions.
@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@datagutt, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 48 minutes and 10 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f6d4b4c-0cdc-4c28-a779-44c0842b77cd

📥 Commits

Reviewing files that changed from the base of the PR and between bff489a and 87d86f4.

📒 Files selected for processing (2)
  • src/core/SLSListenerConfig.cpp
  • src/core/util.hpp

Walkthrough

This PR establishes a verification baseline and implements multiple improvement plans for the IRL SRT server fork. It adds a GitHub Actions CI pipeline, macOS portability for the epoll wake mechanism, seven categories of targeted fixes (security hardening, concurrency/memory correctness, hot-path performance, socket leak cleanup, iterator UB, and TSFileTimeReader correctness), a doctest/CTest unit-test harness with five new test suites, documentation rewrites, and thirteen engineering plan documents.

Changes

IRL SRT Server improvements

Layer / File(s) Summary
Build system, CI pipeline, and macOS portability
.clang-format, .clang-tidy, CMakeLists.txt, .github/workflows/ci.yml, Dockerfile, .gitmodules, lib/cpp-httplib, lib/json, src/core/CMakeLists.txt, src/core/SLSEpollThread.hpp, src/core/SLSEpollThread.cpp, src/core/SLSThread.cpp
Adds sanitizer/test CMake options, spdlog compile-level definition, .clang-format/.clang-tidy configs, a three-matrix (debug/asan-ubsan/tsan) CI workflow with SRT dependency and lib/ caching, Alpine image pinning and SRT commit pinning in the Dockerfile, submodule metadata cleanup, platform-conditional sls_core link deps (CoreFoundation/Security on Apple), and an eventfd-vs-pipe portable wake mechanism in CSLSEpollThread.
Security hardening: bounds, safe-name, cleanup, and TSFileTimeReader
src/core/common.hpp, src/core/common.cpp, src/core/SLSSrt.cpp, src/core/SLSSrt.hpp, src/core/SLSListenerHandler.cpp, src/core/TSFileTimeReader.cpp, src/core/TCPRole.cpp
Adds SPS/PPS NAL bounds-checked copies, tightens sls_is_safe_name against URL-significant characters, removes dead sls_format, replaces sls_mkdir_p with std::filesystem, adds sls_tid thread-id formatter, null-checks strdup on the accept path, consolidates SRT socket/addrinfo cleanup into a setup_fail lambda, exposes libsrt_is_ipv6_peer(), fixes null-before-strlen in generate_rts_file, and corrects the loop-wrap buffer write and log formatting in TSFileTimeReader.
IP ACL refactor and listener handler memory safety
src/core/SLSListenerHandler.cpp, src/core/SLSListenerAuth.cpp
Introduces AclMatch enum and sls_check_ip_acl helper with IPv4/IPv6-aware semantics replacing inline ACL loops in publisher and player paths; switches stat_info_t from heap to stack in both paths; adds negative caching for malformed webhook JSON responses; and range-validates max_players_per_stream overrides.
SID canonicalization and auth-reject cache keying
src/core/sls_sid.hpp, src/core/sls_sid.cpp, src/core/SLSRole.cpp
Adds sls_canonical_sid_key normalizing stream IDs to h/sls_app/r form; updates the publisher listen callback and check_http_passed negative-cache path to use the canonical key; tightens request_kick/get_state to memory_order_release/acquire.
Ring buffer atomics, MapData hot-path, disconnect kick, iterator UB, and constant-time auth
src/core/SLSRecycleArray.hpp, src/core/SLSRecycleArray.cpp, src/core/SLSMapData.hpp, src/core/SLSMapData.cpp, src/core/SLSManager.cpp, src/srt-live-server.cpp, src/core/SLSGroup.cpp, src/core/SLSRole.cpp
Converts m_nDataCount to std::atomic<int64_t> with write-lock in setSize(); switches CSLSMapData to std::less<> transparent comparator with std::string_view lookups and is_exist read-lock; replaces publisher_role->close() with request_kick() in disconnect_stream; fixes SIGHUP reload iterator UB; adds a constant-time string comparison helper used on /stats and /disconnect auth; converts hot-path trace calls to SPDLOG_TRACE.
Doctest/CTest test harness and unit tests
tests/CMakeLists.txt, tests/test_main.cpp, tests/test_recycle_array.cpp, tests/test_auth_reject_cache.cpp, tests/test_bitrate_limit.cpp, tests/test_audio_gap_filler.cpp, tests/test_sls_sid.cpp
Adds CMake wiring for sls_tests linked against sls_core/doctest/srt and registered with CTest; and five test suites covering ring-buffer behaviors, auth-reject cache TTL/refresh, bitrate limiter violation state machine, audio gap filler PTS/ADTS detection, and SID parsing/validation/canonicalization.
Documentation and engineering plans
README.md, CLAUDE.md, CONFIGURATION.md, ChangeLog.md, plans/README.md, plans/000-macos-portability.md, plans/001-verification-baseline.md, ..., plans/012-worker-affinity.md
Rewrites README for the IRL fork, adds CONFIGURATION.md directive reference and CLAUDE.md contributor orientation, backfills ChangeLog.md with fork history, and adds thirteen engineering plans (macOS portability, verification baseline, security quick wins, memory/concurrency, correctness/perf, build supply chain, documentation, architecture refactors, MapData lock rework, listener decomposition, common split, relay dedup, worker affinity) with a status-tracking README.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • irlserver/irl-srt-server#7: Directly related—both modify SLSListenerAuth.cpp player-key webhook response handling, including negative-cache insertion and max_players_per_stream validation.
  • irlserver/irl-srt-server#8: Directly related—both touch the per-key max_players_per_stream override logic and player-key authentication paths in SLSListener.
  • irlserver/irl-srt-server#12: Directly related—both implement SLSAudioGapFiller and wire it into CSLSMapData audio-gap fill/stats, and this PR adds the corresponding unit tests.

Poem

🐇 Hoppity-hop through the code we go,
Fixing races with atomic flow,
Safe-name guards and constant-time keys,
Epoll pipes on macOS, if you please!
Tests for every ring and SID,
A plan for all the things we did. 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: restoring macOS build support, security hardening, concurrency fixes, build improvements, test infrastructure, and documentation updates.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch advisor/execute-all

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@datagutt datagutt merged commit a1dd80c into main Jun 21, 2026
6 of 7 checks passed
@datagutt datagutt deleted the advisor/execute-all branch June 21, 2026 22:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant