Harden commit trailer subprocess handling and align trailer I/O paths#2125
Harden commit trailer subprocess handling and align trailer I/O paths#2125
Conversation
Agent-Logs-Url: https://github.com/gitpython-developers/GitPython/sessions/1a855cb6-0111-4f52-b48d-46417aec5bde Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gitpython-developers/GitPython/sessions/1a855cb6-0111-4f52-b48d-46417aec5bde Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gitpython-developers/GitPython/sessions/3d3e7ffc-d3af-478e-9c6c-128731cdd102 Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR refactors how commit trailers are generated/parsed via git interpret-trailers, aiming to harden subprocess handling and to better align trailer processing with commit message encoding behavior.
Changes:
- Centralize
git interpret-trailersinvocation in a newCommit._interpret_trailers()helper and reuse it from both trailer parsing and trailer application. - Adjust trailer parsing (
trailers_list) and trailer application (create_from_tree) to use the shared helper. - Add a test that exercises trailer creation/parsing when
i18n.commitencodingis set to a non-default encoding (ISO-8859-1).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
git/objects/commit.py |
Introduces _interpret_trailers() helper and updates trailer parse/apply paths to use it. |
test/test_commit.py |
Adds coverage for trailer handling when i18n.commitencoding is configured to ISO-8859-1. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
git/objects/commit.py
Outdated
| ) | ||
| trailer: str = proc.communicate(str(self.message).encode())[0].decode("utf8") | ||
| trailer = trailer.strip() | ||
| trailer = self._interpret_trailers(self.repo, self.message, ["--parse"]).strip() |
There was a problem hiding this comment.
trailers_list calls _interpret_trailers without passing/using the commit's actual encoding (self.encoding). Since commit messages can be configured via i18n.commitencoding, trailer parsing should encode/decode using that encoding to avoid mis-decoding or UnicodeDecodeError for non-UTF-8 commit messages. Consider passing self.encoding through to _interpret_trailers (or have the helper derive the encoding from the commit/repo config).
| trailer = self._interpret_trailers(self.repo, self.message, ["--parse"]).strip() | |
| proc = Popen( | |
| [Git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", "--parse"], | |
| stdin=PIPE, | |
| stdout=PIPE, | |
| stderr=PIPE, | |
| cwd=self.repo.working_tree_dir or self.repo.git_dir, | |
| ) | |
| stdout, _stderr = proc.communicate(self.message.encode(self.encoding)) | |
| finalize_process(proc) | |
| trailer = stdout.decode(self.encoding).strip() |
git/objects/commit.py
Outdated
| def _interpret_trailers(cls, repo: "Repo", message: Union[str, bytes], trailer_args: Sequence[str]) -> str: | ||
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | ||
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | ||
| cmd, | ||
| as_process=True, | ||
| istream=PIPE, | ||
| ) | ||
| message_bytes = message if isinstance(message, bytes) else message.encode(cls.default_encoding, errors="strict") | ||
| stdout_bytes, _ = proc.communicate(message_bytes) | ||
| finalize_process(proc) | ||
| return stdout_bytes.decode(cls.default_encoding, errors="strict") |
There was a problem hiding this comment.
_interpret_trailers always encodes/decodes using cls.default_encoding (UTF-8) even when the repo/commit uses a different i18n.commitencoding. This can break trailer handling for commits whose bytes are not valid UTF-8 (e.g., ISO-8859-1), especially if callers pass bytes. Recommend adding an explicit encoding parameter (defaulting to the repo’s configured commit encoding) and using it consistently for both .encode() and .decode().
| stdout_bytes, _ = proc.communicate(str(message).encode()) | ||
| finalize_process(proc) | ||
| message = stdout_bytes.decode("utf8") | ||
| message = cls._interpret_trailers(repo, str(message), trailer_args) |
There was a problem hiding this comment.
create_from_tree has already computed conf_encoding from i18n.commitencoding, but trailer application calls _interpret_trailers without using it. To keep trailer I/O aligned with how the commit will be serialized, pass conf_encoding (or have _interpret_trailers derive it from the repo) when applying trailers.
| message = cls._interpret_trailers(repo, str(message), trailer_args) | |
| message = cls._interpret_trailers( | |
| repo, | |
| str(message), | |
| trailer_args, | |
| conf_encoding=conf_encoding, | |
| ) |
Agent-Logs-Url: https://github.com/gitpython-developers/GitPython/sessions/519084d5-d5e2-4486-a9cc-5c258e596e13 Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
Agent-Logs-Url: https://github.com/gitpython-developers/GitPython/sessions/519084d5-d5e2-4486-a9cc-5c258e596e13 Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| stdout_bytes, _ = proc.communicate(str(message).encode()) | ||
| finalize_process(proc) | ||
| message = stdout_bytes.decode("utf8") | ||
| message = cls._interpret_trailers(repo, str(message), trailer_args) |
There was a problem hiding this comment.
In create_from_tree(), trailers are applied via _interpret_trailers() without passing the repo-configured commit encoding (conf_encoding). This means interpret-trailers I/O is always UTF-8 even when the commit will be serialized using a different encoding, which is inconsistent with trailers_list() and can produce different on-disk bytes. Pass encoding=conf_encoding when calling _interpret_trailers here (and avoid double str() conversion if possible).
| message = cls._interpret_trailers(repo, str(message), trailer_args) | |
| if not isinstance(message, str): | |
| message = str(message) | |
| message = cls._interpret_trailers(repo, message, trailer_args, encoding=conf_encoding) |
git/objects/commit.py
Outdated
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | ||
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | ||
| cmd, | ||
| as_process=True, | ||
| istream=PIPE, | ||
| ) | ||
| message_bytes = message if isinstance(message, bytes) else message.encode(encoding, errors="strict") | ||
| stdout_bytes, _ = proc.communicate(message_bytes) | ||
| finalize_process(proc) | ||
| return stdout_bytes.decode(encoding, errors="strict") |
There was a problem hiding this comment.
_interpret_trailers() starts the subprocess before encoding message to bytes. If message.encode(...) raises (e.g., UnicodeEncodeError under a legacy commitencoding), the git process is left running until GC/AutoInterrupt finalization. Compute message_bytes before launching the process, or wrap subprocess lifetime in try/finally to ensure it is waited/terminated on all failure paths.
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | |
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | |
| cmd, | |
| as_process=True, | |
| istream=PIPE, | |
| ) | |
| message_bytes = message if isinstance(message, bytes) else message.encode(encoding, errors="strict") | |
| stdout_bytes, _ = proc.communicate(message_bytes) | |
| finalize_process(proc) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| message_bytes = message if isinstance(message, bytes) else message.encode(encoding, errors="strict") | |
| cmd = [repo.git.GIT_PYTHON_GIT_EXECUTABLE, "interpret-trailers", *trailer_args] | |
| proc: Git.AutoInterrupt = repo.git.execute( # type: ignore[call-overload] | |
| cmd, | |
| as_process=True, | |
| istream=PIPE, | |
| ) | |
| try: | |
| stdout_bytes, _ = proc.communicate(message_bytes) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| finally: | |
| finalize_process(proc) |
Agent-Logs-Url: https://github.com/gitpython-developers/GitPython/sessions/3cc0bd6d-d54d-4299-9a18-1576c2a91c12 Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| stdout_bytes, _ = proc.communicate(message_bytes) | ||
| return stdout_bytes.decode(encoding, errors="strict") | ||
| finally: | ||
| finalize_process(proc) |
There was a problem hiding this comment.
In _interpret_trailers, proc.communicate() captures stderr but it’s discarded, and then finalize_process(proc) may raise GitCommandError without including the subprocess stderr (pipes may already be drained/closed). Capture stderr_bytes from communicate() and pass it into finalize_process(proc, stderr=stderr_bytes) so failures surface with a useful error message.
| try: | |
| stdout_bytes, _ = proc.communicate(message_bytes) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| finally: | |
| finalize_process(proc) | |
| stderr_bytes = None | |
| try: | |
| stdout_bytes, stderr_bytes = proc.communicate(message_bytes) | |
| return stdout_bytes.decode(encoding, errors="strict") | |
| finally: | |
| finalize_process(proc, stderr=stderr_bytes) |
| stdout_bytes, _ = proc.communicate(str(message).encode()) | ||
| finalize_process(proc) | ||
| message = stdout_bytes.decode("utf8") | ||
| message = cls._interpret_trailers(repo, str(message), trailer_args) |
There was a problem hiding this comment.
create_from_tree reads i18n.commitencoding into conf_encoding, but the trailer application path calls _interpret_trailers(...) without passing that encoding (so it always encodes/decodes as UTF-8). To keep trailer I/O aligned with the configured commit encoding (and avoid byte-length differences affecting wrapping/formatting), pass encoding=conf_encoding when calling _interpret_trailers here.
| message = cls._interpret_trailers(repo, str(message), trailer_args) | |
| message = cls._interpret_trailers(repo, str(message), trailer_args, encoding=conf_encoding) |
Uh oh!
There was an error while loading. Please reload this page.