fix(sls): restore macOS build + security, concurrency & build hardening, tests, docs#14
Conversation
eventfd is linux-only and broke the macos build. wrap the linux path behind __linux__ and use a self-pipe (read end registered with the srt epoll, write end used by wake()) on other platforms. public interface unchanged: wake_fd() still returns the epoll-registered read end and drain_wake_fd() reads it.
pthread_t is an integer on linux but an opaque pointer on macos; spdlog / fmt refuses to format raw non-void pointers. add a sls_tid() helper in common.hpp that c-style casts through uintptr_t (compiles for both integer and pointer pthread_t) and use it at every spdlog call site that formats m_th_id.
common.hpp has 'using namespace std;', which pulls std::bind into every translation unit. on macos clang, name lookup picks the std template over the posix bind() syscall and the build fails. qualify the one syscall site with ::bind.
libstdc++fs is a linux/libstdc++ standalone; libc++ on macos ships std::filesystem in the main library, so the explicit link breaks on mac. guard it to non-apple/non-windows. add corefoundation + security frameworks on apple so cpp-httplib's tls system-cert loader can link.
vendor doctest v2.4.11 as a single header in lib/doctest. wire SLS_BUILD_TESTS / SLS_SANITIZE / SLS_TSAN cmake options and a tests/ subdirectory that builds sls_tests against sls_core. add unit coverage for sls_sid, SLSRecycleArray (incl. wrap and reader overrun), AuthRejectCache (incl. ttl boundary), SLSBitrateLimit (OK/violation/disconnect transitions and window aging) and SLSAudioGapFiller (frame_pts_duration math, PTS wrap, MAX_PTS_GAP and MAX_GAP_FRAMES bounds, ADTS detect). 43 test cases register.
three-config matrix (debug, asan-ubsan, tsan) on ubuntu-22.04. each job builds the irlserver/srt belabox fork (cached across runs) per the Dockerfile pattern, configures with SLS_BUILD_TESTS=ON, builds and runs ctest. vendored submodules and the SRT install are cached so steady-state ci spends its time on the actual build + tests.
A publisher-crafted H.264 NAL larger than 188 bytes could overflow ti->sps / ti->pps in sls_parse_spspps. Reject (do not store a truncated SPS) at all four copy sites (two SPS, two PPS, including the last-nal block). Uses sizeof(ti->sps)/sizeof(ti->pps) so the bound tracks the buffer.
When the webhook returned HTTP 200 with a body that failed JSON parse, lacked stream_id, or carried an unusable stream_id, the three early returns skipped cache_negative(), so reconnects re-fired the webhook and parked in deferred-accept until the pending slot pool drained. Insert a short-TTL negative cache entry in each case, matching the transport-failure / non-200 handling above. Also clamp the webhook-supplied max_players_per_stream: -1 (unlimited) and any non-negative value are accepted, anything < -1 is dropped.
== short-circuits on the first differing byte, leaking the matched prefix length on the /stats and /disconnect API-key checks. Add a file-local sls_ct_equal that iterates over the longer string and ORs in every difference, and use it at both comparison sites.
strdup can return NULL under memory pressure; passing that into strtok crashes the listener thread on every accept. Mirror the surrounding close-and-return error pattern when the allocation fails.
The negative auth cache was keyed on the raw wire streamid, so byte variants of the same logical stream (trailing whitespace, reordered k/v fields, bare vs. #!:: form) hashed to different entries and let a misbehaving client bypass the rejection. Add sls_canonical_sid_key which reduces a streamid to h/sls_app/r when all three parse, and use it on both sides: record_failure in SLSRole and is_blocked in the publisher listen callback. Falls back to the raw streamid when input does not parse so behavior is unchanged for unparseable input.
sls_is_safe_name only blocked path separators and control bytes, so a player-chosen stream_name containing ?, #, &, =, %, or space was concatenated verbatim into the outbound srt:// URL built by SLSPullerManager, splicing query parameters into the relay leg. Add those six characters to the per-character rejection. Deliberately omit ':' and '@' (URL-significant only in the authority, not the query) to avoid locking out opaque identifiers operators may already accept.
Make CSLSRecycleArray::m_nDataCount std::atomic<int64_t> so the put() increment and the bFirst-snapshot read are correctly synchronised with concurrent get() readers; widen SLSRecycleArrayID::nDataCount to match. Take the read lock around the bFirst snapshot so the (write head, byte counter) pair is captured atomically. setSize() now takes a write lock (it reallocates the buffer) and resets m_nDataCount so the counter stays meaningful across realloc. Drop the stray arg in the oversized-len error log so it no longer prints '*** LOG ERROR *** invalid format specifier'.
The previous loop incremented an iterator that erase() had already invalidated whenever a manager was retired, which is undefined behaviour as soon as ≥2 managers retire in one pass. Switch to the standard erase/advance idiom.
Calling publisher_role->close() from the HTTP control thread deleted m_srt while the owning worker was still dereferencing it on the data path — a use-after-free reachable from the admin /disconnect endpoint. Use the existing request_kick() handoff so teardown runs on the socket-owning worker, and upgrade the kick flag store/load to release/acquire so state set up before the kick is visible to the worker before it sees the flag.
sls_ip_access_t only carries an IPv4 ip_address, so the old loop matched IPv6 peers only against the wildcard entry — specific deny rules silently never applied to them. Extract a shared helper used by both the publisher and player ACL paths: IPv6 peers honour wildcard rules, otherwise fall through with a one-shot warning to the documented default. Add CSLSSrt::libsrt_is_ipv6_peer() so the matcher can tell IPv4 from IPv6 peers.
Each srt_setsockopt failure between srt_create_socket() and srt_bind() returned SLS_ERROR without releasing the SRT socket or the addrinfo, leaking an SRT socket and a small heap allocation per failed listener setup. Route every sockopt-failure exit through a single cleanup lambda.
AuthRejectCache stores expiry at wall-clock second resolution (time(nullptr)) and is_blocked() returns expiry > now, so any sleep < TTL + 1s can land on either side of a second boundary. The previous tests slept 1100ms with TTL=1 (or TTL=2 then re-checked after 1100ms past refresh), which flaked roughly 1 in 6 runs. Sleep TTL + 1.5s for any post-expiry CHECK_FALSE, and assert the post-refresh CHECK(is_blocked) immediately after record_failure with no sleep in between.
- use rts_data instead of caller buffer when refilling after loop reopen - fix inverted null-guard (&& -> ||) in generate_rts_file - fix spdlog placeholder/argument mismatches in get error paths
- CSLSMapData::is_exist now takes a read lock (was write); body only finds
- give the per-key maps a transparent comparator (std::less<>) so the hot
put/get paths look up by std::string_view{key} without constructing a
std::string per packet
Convert per-packet spdlog::trace(...) call sites in RecycleArray put/get, Role read/write handlers, and Group worker egress to SPDLOG_TRACE(...) so their format-argument evaluation can be stripped by the compile-time active level. Pin SPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_INFO in CMakeLists so trace/debug arguments are not evaluated in release builds. Non-hot trace sites (epoll add/remove, destructors) are left as spdlog::trace.
cpp-httplib -> v0.48.0, nlohmann/json -> v3.12.0. Also drop the misleading branch = 1.9.2 line from .gitmodules for spdlog (pins are the source of truth). spdlog stays on its current commit because the irlserver fork carries no release tags. CxxUrl stays on its current commit (7 commits past v0.3 with cmake fixes the build relies on).
sls_format was a stub returning empty string and had no callers. The av_malloc/av_free/av_strdup/av_str_replace/av_tolower/av_strncasecmp family existed only to back sls_mkdir_p; that function now uses std::filesystem::create_directories with an explicit permissions(0755) pass to preserve the prior mkdir(2) mode under arbitrary umasks.
Advisory baseline only. The tree is not reformatted in this change to keep future diffs reviewable. -Wextra and -Wshadow are enabled on all three compiler-flag branches without -Werror, so the existing backlog surfaces (~10 new warnings) without breaking the build.
spike deliverables from plan 007: - 008: cslsmapdata per-entry routing (lock rework via shared_ptr handle) - 009: decompose csllistener::handler with test seam + phased extraction - 010: split common.cpp into per-concern modules + vectorise slsmanager - 011: dedup pullermanager/pushermanager via base-class scaffolding - 012: worker affinity to eliminate the 10ms cross-worker forwarding gap each plan is staged: characterization tests first, prototype/measurement, then the refactor. each names its STOP conditions and open questions.
|
Warning Review limit reached
More reviews will be available in 48 minutes and 10 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughThis PR establishes a verification baseline and implements multiple improvement plans for the IRL SRT server fork. It adds a GitHub Actions CI pipeline, macOS portability for the epoll wake mechanism, seven categories of targeted fixes (security hardening, concurrency/memory correctness, hot-path performance, socket leak cleanup, iterator UB, and TSFileTimeReader correctness), a doctest/CTest unit-test harness with five new test suites, documentation rewrites, and thirteen engineering plan documents. ChangesIRL SRT Server improvements
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
Restores the macOS build (regressed by the
eventfdchange in0167cc38) and lands a batch of audited fixes across security, memory/concurrency, correctness, build reproducibility, tooling, and docs. Also introduces the project's first automated test suite.Work is organised as plans under
plans/. Each plan was executed and reviewed independently; this PR is the combined result.What's included
Build portability (000)
eventfdon Linux, self-pipe elsewhere, so the core builds on macOS again. Linux behaviour is unchanged.pthread_tlog formatting (sls_tidhelper),::bindqualification, and macOS link libraries (std::filesystemvia libc++, CoreFoundation/Security for the TLS cert loader).Verification baseline (001)
SLS_BUILD_TESTS,SLS_SANITIZE(ASan/UBSan), andSLS_TSANbuild options.ctest.Security (002)
strdupon the accept path, clamp the webhookmax_playersoverride, key the auth-reject cache on the canonical streamid, and reject URL-significant characters in streamid components.Memory and concurrency (003)
disconnect_streamnow kicks the publisher via the existing cross-thread atomic flag instead of freeing the SRT socket from the HTTP thread (fixes a use-after-free).stat_info_ton every accepted role.setSize, and fix the reload-loop iterator invalidation.libsrt_setuperror paths.Correctness and performance (004)
is_existtakes a read lock; per-packet map lookups avoid astd::stringallocation via transparent comparators; per-packet trace calls compile out in release builds.Build and supply chain (005)
sls_format, theav_*helpers;sls_mkdir_pnow usesstd::filesystem)..clang-format,.clang-tidy, and-Wextra -Wshadow.Docs (006)
CONFIGURATION.mdcatalogs the IRL directives;ChangeLog.mdfork history backfilled; newCLAUDE.md.Architecture design plans (007)
008..012(MapData lock rework, listener-handler decomposition, common.cpp/manager split, relay-manager dedup, worker affinity). No production code in this PR; these are scoped for follow-up.Verification
Clean build green on macOS against the belabox SRT fork;
ctestpasses (47 tests), ASan/UBSan clean, TSan compiles, andsrt_serverruns. CI will exercise the same on Linux once this runs.Notes for review
sls_ip_access_tstores only an IPv4 address, so specific IPv6 allow/deny rules are not enforced. IPv6 peers match only the wildcard entry and otherwise hit the default path, with a warning logged. Documented inCONFIGURATION.md; full IPv6 matching is a follow-up.CSLSManager::start()does not roll back partially-initialised listeners on failure (plan 010), and the pullerconnect_loopmay not retry its only configured upstream after a failed sweep (plan 011).Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores