Security#15
Conversation
a player/preview/probe pointed at the ingest port with a valid publish streamid registers as a publisher (classification is by streamid host/app only; SRT exposes no reliable direction). it sends no media but takes over the key via last-writer-wins, grey-screening the real broadcaster. valid authed key, so rotating the key is the only operator workaround today. two guards: - first-data probation: reap an accepted publisher that delivers no media by (negotiated latency + publisher_first_data_grace, default 3000ms). the latency term is required because SRT TSBPD holds the first packet for the full RCVLATENCY window, so a high-latency encoder (moblin ~3000ms, belabox 2000-3000ms) must not be reaped while still buffering. - active-incumbent guard: do not evict a CSLSPublisher that is actively delivering; refuse the newcomer instead. scoped via is_takeover_protected() so puller/relay incumbents stay evictable (a local publisher can still take over a pulled stream). the pure reap decision is lifted into sls_idle.hpp (sls_should_reap_role) with unit tests in test_idle.cpp. new conf directive publisher_first_data_grace (0 = default 3000ms, -1 = disabled) documented in CONFIGURATION.md and sls.conf.
(cherry picked from commit a6faf62)
(cherry picked from commit e905e32)
(cherry picked from commit 6458f6e)
(cherry picked from commit 5ae2687)
(cherry picked from commit 29eb140)
(cherry picked from commit 00c6ebd)
(cherry picked from commit 1512a8a)
…drinfo (cherry picked from commit 169e16c)
(cherry picked from commit c006fc3)
(cherry picked from commit 3e2b79d)
the spspps-injection gate checked pat_len twice instead of pat_len && pmt_len, so a stream with pat but no pmt could still trip the gate. ported from ceralive eadfb12 (typo half only; check_ts_info return change deferred to the length-driven parser port).
two of the three medium findings from ceralive 610eb37 (idle-reap half deferred to the listener refactor batch): - CSLSLock now retries EAGAIN briefly then aborts on any rwlock/mutex lock failure instead of silently running the critical section unlocked (which would race a reader's get() against a writer's realloc -> uaf). - AuthRejectCache enforces a 50000-entry ceiling, evicting the entry closest to expiry, so a streamid-rotating flood cannot grow the map unbounded.
(cherry picked from commit c350a7c)
completes ceralive eadfb12: check_ts_info iterated and broke when sps/pps/pat/ pmt were all parsed but still returned SLS_ERROR unconditionally, so the caller's check_spspps-ok info log never fired. return SLS_OK in that case. behaviour is otherwise unchanged (the caller gates only a log line on it).
new optional directive to restrict the stats/disconnect API to a specific bind address (e.g. 127.0.0.1 for loopback-only). default keeps the historical all-interfaces behavior, and ?publisher=<key> stats reads stay unauthenticated (holding the stream key already implies access). adapted from ceralive 1f2c7c2, dropping that commit's publisher api-key gate and loopback/cors-off defaults.
replaces the hardcoded 100 MB / 128000-packet per-socket window (a pre-auth memory flood amplifier) with a value derived from the configured bitrate ceiling and max latency, scaling SRTO_FC at 1024 packets/MB and clamping to 8..100 MB. unlike ceralive 948b23e's flat 8 MB default, this keeps high-latency bonding streams (belabox/moblin) from starving while still bounding pre-auth memory. rcv_buf_mb pins an explicit size; new rcv_sizing_max_bitrate_kbps / rcv_sizing_max_latency_ms tune the derivation. applies to listener and relay sockets via the shared sls_derive_rcv_buf_mb helper.
…onnect UAF (cherry picked from commit 590be71)
…r (UAF) (cherry picked from commit 0da2287)
…ory caps (cherry picked from commit af3185e)
idle-reap half of ceralive 610eb37: a role accepted by a listener but not adopted by a worker within HANDOFF_ADMISSION_TTL (5s, because the worker is at worker_connections) would pin its socket/ring forever. CSLSGroup::idle_check now calls SLSRoleList::reap_unadopted to drop them. complements cd4e827's first-data probation (which reaps adopted-but-silent publishers): different list, different trigger. the commit's peer-IP reject-cache keying and rate-limiter ctx were dropped (entangled with the skipped per-IP rate limit).
(cherry picked from commit 540b411)
(cherry picked from commit 00ddf47)
…ror) (cherry picked from commit 7adc57a)
ported from ceralive bc8c7d2, adapted: pin the alpine:3.21 base image by digest in both build stages for reproducible builds, and add a security checklist header to sls.conf. corrected the control-plane note to match our defaults (HTTP API binds all interfaces and CORS is '*' by default; http_bind_addr restricts to loopback). dropped the trivy/sbom build-check.yml workflow and dependabot config (not present in this repo) and kept apk upgrade (no trivy gate to catch base CVEs).
… string (cherry picked from commit 3c613ef)
…os worker-confinement Todo 16 — cross-thread stats scalar audit (m_data_len, m_data_pos, m_need_reconnect). Thread trace: m_data_len / m_data_pos: referenced only in SLSRole.cpp — initialized in the ctor, then written and read solely by the owning worker (handler_write_data / update_egress_arming). The stats/HTTP thread never reads them (the /stats path uses m_kbitrate, libsrt SRT_TRACEBSTATS and the atomic ring/backpressure counters). Genuinely worker-confined -> left as plain int with a comment; keeping them off the atomic path avoids cost on the handler_write_data hot loop (constraint: no locking/atomics added there). m_need_reconnect: set once in the CSLSRelay ctor (true) / base ctor (false) on the relay-building thread (listener/manager via create_relay, or a worker on reconnect), and read on the owning worker via is_reconnect() in CSLSGroup::check_invalid_sock. Writer and reader are distinct threads -> std::atomic<bool>, relaxed (advisory flag, set before role publication), matching the file's existing cross-thread relay flags (m_kick_requested, m_relay_manager). A standalone TSan harness confirms the methodology: a plain scalar shared across threads is flagged as a data race; the same field as std::atomic with relaxed ordering is clean. Stats JSON shape unchanged (todo 37 fixture green).
Todo 17. CSLSGroup::check_invalid_sock() refreshed the published stats every m_stat_post_interval by first clearing m_stat_info under m_mutex_stat, then push_back'ing one role at a time, each push re-taking the lock. The reader, CSLSGroup::get_stat_info(), takes the same m_mutex_stat — so a /stats poll landing mid-refresh observed m_stat_info empty (right after clear) or half-filled (mid-loop), i.e. a stats snapshot missing roles. Fix: collect the pass into a worker-local std::vector<stat_info_t>, then swap it into m_stat_info under a single m_mutex_stat hold AFTER the role iteration. The reader now always sees the previous complete snapshot until the new one is swapped in atomically. The lock is never held across the role iteration or the per-role get_stat_info() call (constraint honored). Stats JSON shape unchanged (same stat_info_t fields; todo 37 fixture green). TSan ctest green (CERALIVE/srt reorderfreeze prefix).
…last_read_time
Todo 18.
Reconcile: the bFirst snapshot of (m_nWritePos, m_nDataCount) is already taken
under the read lock in get() (SLSRecycleArray.cpp), so a concurrent put() can
no longer tear the pair. This commit LOCKS that correct behavior with a
characterization test and tightens one memory ordering — it does NOT change the
overrun-detection math or the single-writer contract.
(a) New doctest case "bFirst snapshot stays consistent under concurrent
put/get": one writer thread hammers put() while N readers take their first
snapshot concurrently and then drain. Under TSan it is a race detector for
the snapshot — moving the m_nWritePos read out of the read lock would flag
against put()'s write-locked update. Functionally it asserts every aligned
get() stays non-negative and CHUNK-aligned. doctest macros are not
thread-safe, so threads record into atomics and all CHECKs run on the main
thread after join.
(b) m_last_read_time: the reader store in get() is now release and the
idle-check load in get_last_read_time() is acquire (was relaxed on both), so
the group idle/timeout thread observing a fresh timestamp happens-after the
reader progress that produced it. The constructor's initial store stays
relaxed (pre-publication, no observer).
TSan ctest green against the CERALIVE/srt reorderfreeze prefix; recycle_array
suite 7/7 incl the new concurrent case.
… assert/guard Roles are already std::shared_ptr<CSLSRole> (SLSRoleList). Re-derived the cross-thread teardown story from HEAD and recorded the thread of every invalid_srt() caller: during normal operation every delete m_srt runs on the one worker that owns the socket; a cross-thread kick (publisher takeover, /disconnect) only flips m_kick_requested and the owner performs the teardown in get_state(). No real cross-thread delete m_srt exists, so no runtime mutex is added. Added a debug-only single-owner tripwire: an std::atomic<std::thread::id> m_invalidating_tid and a compare_exchange in invalid_srt() that asserts no second thread is concurrently inside the delete path (the double-free a broken model would cause). Compiled out in release (NDEBUG) — zero production cost. Verified by a TSan connect/kick/disconnect storm (worker_threads>0): the tripwire fired 0 times across 20 teardowns + 8 cross-thread kicks. Evidence: .omo/evidence/task-19-srt-quality-hardening.log.
Guard the bitrate divide m_kbitrate = m_stat_bitrate_datacount*8 / d at both SLSRole.cpp sites (handler_read_data, handler_write_data): add d > 0 to the 'if (d >= m_stat_bitrate_interval)' condition. m_stat_bitrate_interval is an int that can be <=0 (misconfig / unset default) and a non-monotonic clock can make d zero/negative, either of which divided by zero. A scratch test forcing interval<=0 trips UBSan 'division by zero' on the original form and is clean on the guarded form. Did NOT touch the hash-mode div in SLSMapRelay.cpp (3e2b79d). Add defensive bounds asserts after the ring wrap arithmetic in CSLSRecycleArray::get() (copy_data_len and nReadPos in range) so a bad ring index trips in debug; the existing runtime clamp still corrects it in release. The optional strdup->std::string streamid split is deferred: strtok collapses consecutive '/' delimiters and the existing strlcpy+trim+part_count semantics depend on that, so the swap is not provably behavior-identical without the characterization test the task gates it on. ASan + ctest ring/profile suites green. Evidence: .omo/evidence/task-24-srt-quality-hardening.log.
ported from ceralive 3522af0 (code only; the stats-shape.json golden fixture and its test were not ported). exposes the SRT-native pktSentNAKTotal/pktRecvNAKTotal /pktRetransTotal/pktRcvRetrans on publisher stats — useful loss/retransmit visibility independent of the FEC profile work that motivated them upstream.
…tf truncation
Replace all 10 role-name sprintf(m_role_name, "literal") sites with
snprintf(m_role_name, sizeof(m_role_name), "literal") across SLSPuller,
SLSClient, SLSPublisher, SLSPusher, SLSPlayer, SLSRelay and SLSListenerCore
(6 sites). grep -rn 'sprintf(' src/ is now empty. The literals all fit, so the
formatted output is byte-identical.
SLSListenerHandler: after the negotiated-latency libsrt_getsockopt, validate the
optlen (SRTO_RCVLATENCY/PEERLATENCY are int32) — a width libsrt did not write
back as sizeof(int) is treated as a failed read and falls back to latency_min,
so a partial/garbage value is never trusted. The '<host>/<app>' routing key now
checks the snprintf return and rejects the connection on truncation rather than
routing on a clipped key. No formatted output content changed.
ctest profile suites + ASan green. Evidence:
.omo/evidence/task-23-srt-quality-hardening.log.
…eady fixed) The teardown ordering (0da2287) is unchanged: uninit() still detaches child relays first, drops the publisher from the maps, then frees manager-then-SRI. Net-new is the ALLOC side: wrap m_dynamic_pusher_sri (SLS_RELAY_INFO) and m_dynamic_pusher_manager (CSLSPusherManager) in std::unique_ptr. new -> std::make_unique; set_relay_conf(m_dynamic_pusher_sri.get()); the two delete X; X=NULL teardown lines become reset() in the SAME order (manager then SRI), preserving the detach-before-reset sequence. Declaration order keeps the manager before the SRI so reverse-order member destruction frees the manager (which reads the SRI) first. The single-spawn guard (!m_dynamic_pusher_manager) is preserved. Verified sole ownership on HEAD: neither manager destructor frees m_sri, so the unique_ptr swap is a safe drop-in. Forward-declared SLS_RELAY_INFO in the header (the old struct-ptr form declared it implicitly). Compiles under ASan; no publisher/pusher leak across startup+shutdown. Evidence: .omo/evidence/task-21-srt-quality-hardening.log.
Add a clang-only SLS_FUZZ CMake flavor and the first fuzz harness, which feeds arbitrary bytes straight into sls_parse_ts_info (PAT/PMT/PES path) and the standalone sls_parse_pmt_for_audio entry. Because libFuzzer poisons the bytes past the exact-sized input, any regression to an out-of-bounds read trips AddressSanitizer — this locks the length-driven TS parser hardening (c350a7c). The harness adds no parser logic. SLS_FUZZ: - clang-only (FATAL_ERROR otherwise; libFuzzer is a Clang feature), - mutually exclusive with SLS_SANITIZE/SLS_TSAN/SLS_COVERAGE, - excluded from the exploit-mitigation block like the other sanitizers, - instruments the whole tree with -fsanitize=fuzzer-no-link,address, undefined (no main injected, so srt_server/srt_client still link); each tests/fuzz target adds -fsanitize=fuzzer for the driver + main. UBSan is left recoverable, matching the existing SLS_SANITIZE policy: ASan stays fatal (the memory-safety lock), while a pre-existing benign signed left-shift in the FFmpeg-derived ff_parse_pes_pts (common.cpp:620, computed in int before the int64_t cast) only prints. It is documented in tests/fuzz/ubsan_suppressions.txt and the parser is intentionally left unchanged. Seeds: committed corpus/ts/* built by gen_seed_corpus.py, whose TS packets mirror the test_ts_parser.cpp builders byte-for-byte. Verified: cmake -S . -B build-fuzz -DSLS_FUZZ=ON -DCMAKE_CXX_COMPILER=clang++ builds the target; a 30s run over the seed corpus does 4.57M execs, exit 0, no crashes.
Feed arbitrary bytes (null-terminated as a C string, exactly as a streamid arrives on the SRT handshake) into sls_parse_streamid, the pre-accept safety gate sls_validate_sid_format, sls_canonical_sid_key, and the peer-scoped sls_reject_cache_key. This locks the streamid validation contract the publisher/player listen callbacks depend on: path separators, control bytes, bare "."/"..", and url-significant characters stay rejected, and byte-variants of one logical streamid keep collapsing to one cache key. The harness adds no validation logic. Seeds: corpus/streamid/* (valid std/bare/whitespace, path-traversal, control-byte, url-injection, missing-key) generated by gen_seed_corpus.py. Verified: 30s run over the seed corpus does 196k execs, exit 0, no crashes.
Feed arbitrary text into the config-input parsers that sit on the operator boundary: the listen port-list parser sls_parse_port_list (primary target), the strtok-based tokenizer sls_conf_string_split behind portlist/ipset/ upstreams, and the scalar/string value setters (int/double/bool/string/ portlist/upstreams), each into a correctly-typed, offset-0 destination so the strtol/strtod range checks and the memcpy bounds run on a real buffer. This locks the validation contract: reversed ranges, out-of-range, and non-numeric specs stay rejected; no setter writes out of bounds. The harness adds no parser logic. sls_parse_port_list dedups via a linear std::find per port, so one VALID range like "1-65535" expands O(n^2) — a single unit costs ~tens of seconds of pure CPU. That is an algorithmic property of operator-trusted, parse-once config (not a network-boundary input), and the parser is off-limits to change here, so the harness upper-bounds the expansion itself (port_expansion_within_bound, >8192 ports => skip both expanding entry points) and keeps its signal on the validation + memory-safety paths. Seeds: corpus/conf/* (single/list/range/dedupe/whitespace, reversed-range, out-of-range, non-numeric, dash-only, bool/int/double/quoted/upstreams) generated by gen_seed_corpus.py. Verified: 30s run over the seed corpus does 7002 execs, slowest unit 0s, exit 0, no crashes; the prior 58s worst-case unit is now 0.08s.
that peer-scoped reject-cache key (ceralive 610eb37) was not ported to this fork (it was entangled with the per-IP rate limit we skipped). the harness still fuzzes sls_parse_streamid, sls_validate_sid_format and sls_canonical_sid_key.
from ceralive 7ffa1e9 (CMakeLists part; the clang-tidy CI job itself landed with the combined ci workflow). emits compile_commands.json so clang-tidy/clangd have a compilation database. caller can still override with -DCMAKE_EXPORT_COMPILE_COMMANDS=OFF.
(cherry picked from commit ed2796d)
… already shared_ptr) Replace the four raw new[]/delete[] map arrays (CSLSMapData, CSLSMapPublisher, two CSLSMapRelay) with std::vector<>, and the raw new/delete CSLSRoleList *container* with std::unique_ptr, so cleanup is exception-safe. Role ELEMENTS were already std::shared_ptr<CSLSRole>; element ownership is untouched. The map element types hold a CSLSMutex (non-movable), so start() assigns a freshly constructed vector (default-insert, no element moves) rather than resize(); the vectors are sized once and never resized, so the &m_map_*[i] pointers handed to listeners stay stable for the manager's lifetime. stop() clear()s the vectors / reset()s the rolelist (after the worker loop, which holds raw pointers into them); the dtor frees the rest by RAII. ASan startup+shutdown: map arrays + rolelist leak-clean. (The residual conf.cpp:695 new-delete-type-mismatch is a pre-existing config-system non-virtual-dtor bug, reproduced on a pre-change build; not in this diff.) Evidence: .omo/evidence/task-20-srt-quality-hardening.log. (cherry picked from commit 19acb52)
… the 169e16c fix) Convert the manual setup_fail lambda (which 169e16c added to plug the fd/ addrinfo leak) to RAII so a FUTURE early-return cannot regress it: - addrinfo owned by std::unique_ptr<addrinfo, decltype(&freeaddrinfo)> - the SRT socket owned by a small SrtFdGuard scope-guard Each sockopt-failure branch now just returns SLS_ERROR; the guards release on scope exit. The successful-bind path is unchanged in behavior: s->fd = fd then fd_guard.armed = false hands the live socket to s->fd, and ai_guard frees the addrinfo at return exactly as the old explicit freeaddrinfo did. srt_profile + listener_profile ctests (which bind real L1/L2/L3 listeners and read the sockopts back) PASS -> success path intact. ASan with a forced bind failure (privileged port as non-root): no fd/addrinfo leak. Evidence: .omo/evidence/task-22-srt-quality-hardening.log.
ported from ceralive bd2d017 (the FEC profile/e2e harness it sat alongside was not ported). stats_snapshot.sh runs a real srt loopback (one publisher + player) and diffs the normalized /stats, /healthz and /disconnect JSON against committed fixtures, locking the control-plane API contract. fixtures were regenerated against this fork's server, so they capture our shape: the NAK/retransmit counters, no FEC profile fields, and the open ?publisher= read. added a CI job to run it against the irlserver/srt belabox fork.
…s is the type set in API-socket-options.md
the client-supplied stream name is spliced into the upstream streamid of pull/push relay urls. unencoded, a streamid carrying a query (e.g. a legacy auth token) lets a publisher inject relay socket options (streamid, passphrase, latency) into our leg. encode it at the sink so the token stays a single opaque value; cxxurl decodes it, so the upstream still receives the literal id and normal names are unchanged. update sid-format tests to match: '?'/'=' are now allowed in a component (handled at the relay sink), while &/#/%/space stay rejected.
|
Important Review skippedToo many files! This PR contains 153 files, which is 3 over the limit of 150. To get a review, narrow the scope: Upgrade to a paid plan to raise the limit. ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (5)
📒 Files selected for processing (153)
You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
No description provided.