Conversation
…torch#1002) Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@gmail.com> Co-authored-by: Nicolas Hug <nicolashug@meta.com>
Co-authored-by: Molly Xu <mollyxu@fb.com>
Co-authored-by: Molly Xu <mollyxu@fb.com>
Co-authored-by: Molly Xu <mollyxu@fb.com>
Co-authored-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@gmail.com> Co-authored-by: Nicolas Hug <nicolashug@meta.com>
Co-authored-by: Molly Xu <mollyxu@fb.com> Closes meta-pytorch#1009
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Differential Revision: D87109526 Pull Request resolved: meta-pytorch#1059
Co-authored-by: Molly Xu <mollyxu@fb.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates TorchCodec’s packaging/build/test infrastructure (including FFmpeg variant handling and wheel testing), introduces new core decoding/encoding capabilities (notably NVDEC cache controls, transforms, and WAV decoding), and refreshes documentation/examples to match the new APIs and hosting.
Changes:
- Added multi-FFmpeg-version CMake package config and refactored FFmpeg target creation utilities.
- Expanded core/runtime functionality (stable-ABI refactors, decoder/encoder enhancements, NVDEC cache capacity API, transforms/rotation/resize/crop plumbing, WAV decoder support).
- Updated CI workflows, docs theme and site links, examples, benchmarks, and added new test/reference resources.
Reviewed changes
Copilot reviewed 128 out of 154 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| test/resources/test_non_zero_start.mp4.stream0.all_frames_info.json | Adds new reference frame timing data for tests |
| test/resources/sine_16ch_s16.wav.stream0.all_frames_info.json | Adds new audio reference frame timing data |
| test/resources/nasa_13013_rotated.mp4.stream0.all_frames_info.json | Adds rotated-video reference timing data |
| test/generate_reference_resources.py | Improves frame generation filtergraph construction and adds new NASA reference generators |
| test/conftest.py | Adds ffmpeg-cli marker and fbcode collection behavior |
| test/MetadataTest.cpp | Adds C++ unit tests for StreamMetadata fallback logic |
| test/CMakeLists.txt | Bumps C++ standard and wires up new C++ test target |
| src/torchcodec/transforms/init.py | Exposes decoder transform classes in public module |
| src/torchcodec/share/cmake/TorchCodec/ffmpeg_versions.cmake | Adds CMake helpers to define FFmpeg-versioned targets |
| src/torchcodec/share/cmake/TorchCodec/TorchCodecConfig.cmake | Adds install-time CMake package config for consumers |
| src/torchcodec/samplers/_time_based.py | Modernizes typing and makes timestamp tensors float64 for precision |
| src/torchcodec/samplers/_index_based.py | Modernizes typing with PEP 604 unions |
| src/torchcodec/samplers/_common.py | Switches to collections.abc and PEP 604 union types |
| src/torchcodec/encoders/_audio_encoder.py | Modernizes typing (PEP 604) in encoder API |
| src/torchcodec/decoders/_wav_decoder.py | Adds a Python WAV-only decoder binding |
| src/torchcodec/decoders/_decoder_utils.py | Adds NVDEC cache capacity API wrappers; removes generic decoder factory from this layer |
| src/torchcodec/decoders/_audio_decoder.py | Refactors AudioDecoder creation through core decoder utils |
| src/torchcodec/decoders/init.py | Exposes new CUDA/NVDEC cache APIs and CpuFallbackStatus |
| src/torchcodec/_samplers/video_clip_sampler.py | Typing updates and core API import refactor |
| src/torchcodec/_internally_replaced_utils.py | Adds robust shared-library loading across FFmpeg variants with tracebacks |
| src/torchcodec/_frame.py | Typing refactor to collections.abc and PEP 604 unions |
| src/torchcodec/_core/pybind_ops.cpp | Adjusts include paths for in-tree headers |
| src/torchcodec/_core/fetch_and_expose_non_gpl_ffmpeg_libs.cmake | Adds Linux aarch64 FFmpeg fetch support and reuses shared ffmpeg target helper |
| src/torchcodec/_core/_decoder_utils.py | Introduces core-level decoder factories (audio/video/wav) and transform spec handling |
| src/torchcodec/_core/init.py | Updates exported ops and adds NVDEC cache/wav functions |
| src/torchcodec/_core/WavDecoder.h | Adds C++ WAV decoder interface |
| src/torchcodec/_core/ValidationUtils.h | Adds uint64->streampos validation API |
| src/torchcodec/_core/ValidationUtils.cpp | Implements streampos validation and uses stable ABI checks |
| src/torchcodec/_core/Transform.h | Expands transform system (crop center mode, rotation, validation signature changes) |
| src/torchcodec/_core/Transform.cpp | Implements crop validation updates and adds rotation transform logic |
| src/torchcodec/_core/SwScale.h | Adds SwScale wrapper for two-pass conversion+resize pipeline |
| src/torchcodec/_core/SwScale.cpp | Implements SwScale conversion pipeline |
| src/torchcodec/_core/StreamOptions.h | Expands encoding options and adopts stable ABI device type |
| src/torchcodec/_core/StableABICompat.h | Adds stable-ABI utilities/helpers and visibility macros |
| src/torchcodec/_core/NVDECCacheConfig.h | Adds global NVDEC cache capacity config interface |
| src/torchcodec/_core/NVDECCacheConfig.cpp | Implements NVDEC cache capacity config with eviction |
| src/torchcodec/_core/NVDECCache.h | Refactors NVDEC cache to LRU multimap and stable device types |
| src/torchcodec/_core/NVDECCache.cpp | Implements LRU eviction and cross-device eviction helpers |
| src/torchcodec/_core/NVCUVIDRuntimeLoader.cpp | Converts checks to stable-ABI-safe macro and adjusts includes |
| src/torchcodec/_core/Metadata.h | Extends metadata model and adds computed fallback methods and color metadata |
| src/torchcodec/_core/Metadata.cpp | Implements StreamMetadata fallback/computed methods |
| src/torchcodec/_core/Frame.h | Converts outputs to torch::stable::Tensor and stable device types |
| src/torchcodec/_core/Frame.cpp | Updates tensor allocation to stable ABI APIs |
| src/torchcodec/_core/FilterGraph.h | Renames filter context struct to FiltersConfig |
| src/torchcodec/_core/FilterGraph.cpp | Refactors filtergraph setup, adds more robust checks and naming |
| src/torchcodec/_core/FFMPEGCommon.h | Adds UniqueAVDictionary, rotation extraction, and SwsConfig changes |
| src/torchcodec/_core/Encoder.h | Refactors encoders to stable ABI tensors; adds MultiStreamEncoder API |
| src/torchcodec/_core/DeviceInterface.h | Refactors device interface to stable ABI types and adds encoding hooks |
| src/torchcodec/_core/DeviceInterface.cpp | Updates device parsing and stable tensor conversion, improves checks |
| src/torchcodec/_core/CudaDeviceInterface.h | Updates interface for stable ABI and encoding support |
| src/torchcodec/_core/CpuDeviceInterface.h | Adds audio init/flush, SwScale, encoding conversion hooks, and config caching |
| src/torchcodec/_core/Cache.h | Refactors cache to stable ABI device and checks |
| src/torchcodec/_core/CUDACommon.h | Refactors CUDA helpers to stable ABI tensor/device and cudaStream_t |
| src/torchcodec/_core/BetaCudaDeviceInterface.h | Refactors to stable ABI and adds rotation support surface |
| src/torchcodec/_core/AVIOTensorContext.h | Refactors tensor I/O to stable ABI and exports visibility |
| src/torchcodec/_core/AVIOTensorContext.cpp | Updates stable tensor reads/writes and uses stableCat |
| src/torchcodec/_core/AVIOFileLikeContext.h | Adjusts includes for in-tree header layout |
| src/torchcodec/_core/AVIOFileLikeContext.cpp | Converts checks to stable-ABI-safe macro |
| src/torchcodec/_core/AVIOContextHolder.h | Adjusts includes for in-tree header layout |
| src/torchcodec/_core/AVIOContextHolder.cpp | Converts checks to stable-ABI-safe macro and improves errors |
| src/torchcodec/init.py | Exposes encoders/transforms at package top-level |
| setup.py | Adds env flag for Homebrew rpath handling and adjusts version file logic |
| pyproject.toml | Bumps minimum Python to 3.10 and updates tooling/pytest config |
| packaging/remove_src.sh | Adds helper script to ensure tests run against installed wheel |
| packaging/install_torchcodec_wheel.sh | Adds helper script to locate/install built wheel artifacts |
| packaging/install_test_dependencies.sh | Adds helper script for Python test dependencies |
| packaging/install_pytorch.sh | Adds helper script to install PyTorch from appropriate channel |
| packaging/install_ffmpeg.sh | Adds helper script to install FFmpeg via conda-forge after asserting absence |
| mypy.ini | Enables follow_untyped_imports |
| examples/encoding/audio_encoding.py | Adds sphinx-gallery thumbnail metadata |
| examples/decoding/sampling.py | Updates Optional typing to PEP 604 |
| examples/decoding/parallel_decoding.py | Typing modernization and adds sphinx-gallery thumbnail metadata |
| examples/decoding/file_like.py | Adds sphinx-gallery thumbnail metadata |
| examples/decoding/custom_frame_mappings.py | Adds thumbnail metadata and reformats ffprobe command |
| examples/decoding/basic_example.py | Updates Optional typing to PEP 604 |
| examples/decoding/basic_cuda_example.py | Updates README links and adds CPU fallback detection section |
| examples/decoding/audio_decoding.py | Adds thumbnail metadata |
| examples/decoding/approximate_mode.py | Documentation wording improvements and adds thumbnail metadata |
| docs/source/index.rst | Updates links/structure, adds new examples and API ref entrypoints |
| docs/source/conf.py | Migrates to pytorch_sphinx_theme2 and adds sitemap/mermaid/version switcher config |
| docs/source/api_ref_transforms.rst | Adds transforms API reference page |
| docs/source/api_ref_encoders.rst | Adds VideoEncoder to encoder API docs |
| docs/source/api_ref_decoders.rst | Adds metadata summaries, NVDEC cache APIs, and CpuFallbackStatus docs |
| docs/source/api_ref.rst | Adds central API Reference toctree |
| docs/source/_templates/layout.html | Removes old theme layout override |
| docs/source/_static/css/custom_torchcodec.css | Removes custom CSS for old theme |
| docs/requirements.txt | Updates doc build dependencies for new Sphinx/theme stack |
| benchmarks/encoders/benchmark_encoders.py | Adds encoder benchmarking script |
| benchmarks/decoders/memprofile_decoders.py | Updates imports split between _core and _core.ops |
| benchmarks/decoders/gpu_benchmark.py | Refactors to use direct _core/_core.ops imports |
| benchmarks/decoders/benchmark_transforms.py | Updates transforms benchmark to use VideoDecoder transforms interface |
| benchmarks/decoders/benchmark_decoders_library.py | Refactors imports split between _core and _core.ops |
| README.md | Updates docs links/FFmpeg guidance/compat table and CPU/CUDA install instructions |
| CONTRIBUTING.md | Updates FFmpeg install guidance and adds local docs serving instructions |
| CMakeLists.txt | Defines LINUX variable globally for internal builds |
| .pre-commit-config.yaml | Adds pyupgrade hook for py310+ in src/test |
| .github/workflows/windows_wheel.yaml | Refactors wheel test job to use new packaging scripts and removes src/ |
| .github/workflows/reference_resources.yaml | Refactors reference generation job to use packaging scripts |
| .github/workflows/macos_wheel.yaml | Refactors wheel test job to use packaging scripts and removes src/ |
| .github/workflows/linux_wheel.yaml | Refactors wheel test jobs, adds third-party-interface job, uses packaging scripts |
| .github/workflows/linux_cuda_aarch64_wheel.yaml | Adds new workflow for Linux CUDA aarch64 wheel build/test |
| .github/workflows/lint.yaml | Updates actions versions and uses packaging scripts for PyTorch install |
| .github/workflows/docs.yaml | Removes previous docs workflow |
| .github/workflows/cpp_tests.yaml | Updates actions versions and uses packaging scripts for PyTorch install |
| .github/workflows/build_ffmpeg.yaml | Adds Linux aarch64 FFmpeg build job |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if in_fbcode(): | ||
| # fbcode doesn't like skipping tests, so instead we just don't collect the test | ||
| # so that they don't even "exist", hence the continue statements. | ||
| if needs_ffmpeg_cli or has_skip_marker or skipif_condition_is_true: | ||
| continue |
There was a problem hiding this comment.
In pytest_collection_modifyitems(), using continue does not remove the item from the items list, so the test will still be collected and executed. To actually avoid collection, filter items in-place (e.g., rebuild a list of kept items and assign to items[:]) or add an unconditional skip marker in fbcode rather than continuing.
| if (LINUX) | ||
| if (ffmpeg_major_version EQUAL 4) | ||
| set(library_file_names libavutil.so.56 libavcodec.so.58 libavformat.so.58 libavdevice.so.58 libavfilter.so.7 libswscale.so.5 libswresample.so.3) | ||
| elseif (ffmpeg_major_version EQUAL 5) | ||
| set(library_file_names libavutil.so.57 libavcodec.so.59 libavformat.so.59 libavdevice.so.59 libavfilter.so.8 libswscale.so.6 libswresample.so.4) | ||
| elseif (ffmpeg_major_version EQUAL 6) | ||
| set(library_file_names libavutil.so.58 libavcodec.so.60 libavformat.so.60 libavdevice.so.60 libavfilter.so.9 libswscale.so.7 libswresample.so.4) | ||
| elseif (ffmpeg_major_version EQUAL 7) | ||
| set(library_file_names libavutil.so.59 libavcodec.so.61 libavformat.so.61 libavdevice.so.61 libavfilter.so.10 libswscale.so.8 libswresample.so.5) | ||
| elseif (ffmpeg_major_version EQUAL 8) | ||
| set(library_file_names libavutil.so.60 libavcodec.so.62 libavformat.so.62 libavdevice.so.62 libavfilter.so.11 libswscale.so.9 libswresample.so.6) | ||
| endif() |
There was a problem hiding this comment.
LINUX is not a standard CMake platform variable for downstream consumers, so on Linux this will often evaluate false and fall through to the 'unsupported OS' fatal error path. Use CMake’s built-in platform conditionals instead (e.g., if(UNIX AND NOT APPLE) for Linux) or CMAKE_SYSTEM_NAME checks inside this helper, since this file is intended to be used from an installed package config.
| if (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|arm64|ARM64") | ||
| set( | ||
| platform_url | ||
| ${base_url}/linux_aarch64 | ||
| ) | ||
|
|
||
| set( |
There was a problem hiding this comment.
This CMake logic contains a duplicated if (CMAKE_SYSTEM_PROCESSOR MATCHES ...) block (one starting at line 18 and another at line 24) without a clearly corresponding endif() for the first one in the shown hunk. As written, this is very likely to produce a CMake parse error or unintended nesting/branching. Remove the duplicate block and ensure the if/else/endif structure is balanced and only sets platform_url once.
| if (CMAKE_SYSTEM_PROCESSOR MATCHES "aarch64|arm64|ARM64") | |
| set( | |
| platform_url | |
| ${base_url}/linux_aarch64 | |
| ) | |
| set( | |
| set( |
| Rotation rotationFromDegrees(std::optional<double> degrees) { | ||
| if (!degrees.has_value()) { | ||
| return Rotation::NONE; | ||
| } | ||
| // Round to nearest multiple of 90 degrees | ||
| int rounded = static_cast<int>(std::round(*degrees / 90.0)) * 90; |
There was a problem hiding this comment.
std::round requires including <cmath>. This file’s includes in the diff do not show <cmath>, which can cause compilation failures depending on transitive includes. Add an explicit #include <cmath> in this translation unit.
| default: | ||
| return ""; | ||
| } |
There was a problem hiding this comment.
The default branch silently returns an empty filter string, which can hide unsupported/invalid Rotation values and makes failures harder to diagnose. Prefer a checked default (e.g., STD_TORCH_CHECK(false, ...)) so unexpected enum values fail loudly.
| default: | |
| return ""; | |
| } | |
| } | |
| STD_TORCH_CHECK( | |
| false, "Unknown rotation: ", static_cast<int>(rotation_)); |
| 0, | ||
| config_.inputHeight, | ||
| dstPointers, | ||
| dstLinesizes); |
There was a problem hiding this comment.
The resize swscale pass result is not validated. If sws_scale returns a value different from config_.outputHeight, callers may get partially converted output without an error. Consider adding a STD_TORCH_CHECK validating the returned height matches the expected output height when needsResize_ is true.
| dstLinesizes); | |
| dstLinesizes); | |
| STD_TORCH_CHECK( | |
| colorConvertedHeight == config_.outputHeight, | |
| "Resize swscale pass failed: colorConvertedHeight != config_.outputHeight: ", | |
| colorConvertedHeight, | |
| " != ", | |
| config_.outputHeight); |
| # the content of `version.txt` plus some suffix like "+cpu" or "+cu112". | ||
| # See | ||
| # https://github.com/pytorch/test-infra/blob/61e6da7a6557152eb9879e461a26ad667c15f0fd/tools/pkg-helpers/pytorch_pkg_helpers/version.py#L113 | ||
| version = version.replace("+cpu", "") | ||
| with open(_ROOT_DIR / "version.txt", "w") as f: | ||
| f.write(f"{version}") | ||
| else: | ||
| with open(_ROOT_DIR / "version.txt") as f: | ||
| version = f.readline().strip() | ||
| try: |
There was a problem hiding this comment.
The +cpu suffix stripping is now only applied in the else: branch but not in the branch that writes version.txt. This creates inconsistent version normalization depending on whether version.txt already exists. If the intent is to always strip +cpu, apply the same normalization in both branches (before writing and after reading).
|
|
||
|
|
||
| class WavDecoder: | ||
| # TODO: Docstrings |
There was a problem hiding this comment.
WavDecoder is introduced as a new public-facing decoder class but has an explicit 'TODO: Docstrings' and no class/method docstrings. Please add at least a class docstring describing supported WAV constraints (e.g., uncompressed only), what metadata contains, and the return type/shape/dtype of get_all_samples() to make the API self-documenting.
| # TODO: Docstrings | |
| """Decode audio from a WAV file. | |
| This decoder supports a single audio stream from WAV input backed by the | |
| native WAV decoder. It is intended for standard, uncompressed WAV audio | |
| data (for example PCM-style WAV files). | |
| Args: | |
| source: Path to the WAV file to decode, as a string or ``pathlib.Path``. | |
| Attributes: | |
| metadata (AudioStreamMetadata): Metadata for the decoded audio stream. | |
| The populated fields are: | |
| - ``sample_rate``: sampling rate in Hz. | |
| - ``num_channels``: number of audio channels. | |
| - ``sample_format``: sample format reported by the decoder. | |
| - ``duration_seconds``: decoded stream duration in seconds. | |
| - ``stream_index``: stream index in the container. | |
| - ``codec``: codec name reported for the WAV audio stream. | |
| - ``bit_rate``: bit rate reported by the decoder. | |
| - ``duration_seconds_from_header``: duration derived from the WAV | |
| header when available. | |
| - ``begin_stream_seconds``: stream start time in seconds. | |
| - ``begin_stream_seconds_from_header``: always ``None`` for WAV, | |
| because the format does not provide stream start time metadata. | |
| Methods: | |
| get_all_samples(): Returns all decoded audio samples as a | |
| ``torch.Tensor`` with shape ``[num_channels, num_samples]`` and | |
| dtype ``torch.float32``. | |
| """ |
| ) | ||
| transformed_frames = decoder.get_frames_played_at(pts_seconds).data | ||
| assert len(transformed_frames) == len(pts_seconds) | ||
| return transformed_frames.data |
There was a problem hiding this comment.
Here transformed_frames is already a Tensor (because of the .data on line 77). Returning transformed_frames.data uses the deprecated/unsafe .data attribute and is redundant. Return transformed_frames directly to avoid surprising autograd semantics and keep the benchmark representative of normal tensor usage.
| return transformed_frames.data | |
| return transformed_frames |
No description provided.