[Bug]: Paper2Code pipeline generates 0 files: LLM returns OpenRouter HTML instead of analysis, file_tree corrupted with <!DOCTYPE html> #134

### Do you need to file an issue?

- [x] I have searched the existing issues and this bug is not already filed.
- [x] I believe this is a legitimate bug, not just a question or feature request.

### Describe the bug

# Bug Report: Paper2Code Pipeline Generates 0 Files (file_tree contains OpenRouter HTML)

## Description

When running the paper-to-code pipeline (`main_cli.py --file paper.pdf`), the planning phase produces a valid YAML plan with expected file structure, but the implementation phase generates **0 files**. The `code_implementation_report.txt` shows `files_completed: 0, total_files: 0` and `results.file_tree` contains raw HTML from the OpenRouter website (`<!DOCTYPE html><html lang="en">...`) instead of a parsed file tree.

## Environment

- **DeepCode version**: v1.2.0 (cloned from HKUDS/DeepCode, commit `840bf2fc866e04c8161f28a8a80e36ee87ee923c`)
- **Python**: 3.14.5
- **OS**: Windows
- **Config**: Custom `deepcode_config.json` with DeepSeek V4 Pro (default) + OpenRouter FREE agents

## Steps to Reproduce

1. Clone DeepCode: `git clone https://github.com/HKUDS/DeepCode.git`
2. Configure `deepcode_config.json` with valid API keys
3. Run:
```bash
python cli/main_cli.py --file paper.pdf --no-plan-review --optimized --verbose
```
4. Check results:
```bash
ls deepcode_lab/tasks/*/generate_code/  # Empty
cat deepcode_lab/tasks/*/code_implementation_report.txt
```
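
Since the environment is Windows, where `ls` and `cat` may not be available, the same check can be sketched in Python. This is a minimal sketch that assumes the `deepcode_lab/tasks/*/generate_code/` layout shown above; the function name `summarize_tasks` is hypothetical and not part of DeepCode.

```python
from pathlib import Path


def summarize_tasks(lab_root: str = "deepcode_lab") -> dict[str, int]:
    """Map each task directory name to the number of files under generate_code/."""
    counts: dict[str, int] = {}
    tasks_dir = Path(lab_root) / "tasks"
    if not tasks_dir.is_dir():
        return counts
    for task_dir in sorted(p for p in tasks_dir.iterdir() if p.is_dir()):
        code_dir = task_dir / "generate_code"
        # Count regular files recursively; a missing directory counts as zero.
        files = [p for p in code_dir.rglob("*") if p.is_file()] if code_dir.is_dir() else []
        counts[task_dir.name] = len(files)
    return counts


if __name__ == "__main__":
    for name, n in summarize_tasks().items():
        print(f"{name}: {n} generated file(s)")
```

A task that hit this bug would be expected to report 0 generated files.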

## Expected Behavior

The pipeline should:
1. Analyze the paper content
2. Generate a proper YAML code plan with file structure
3. Create the file tree (directories and files) in `generate_code/`
4. Implement code files with actual content

## Actual Behavior

1. The planning phase makes 3 attempts, all of which return **101,547 chars** of HTML from `https://openrouter.ai` (the OpenRouter website).
2. The `usage` field shows `prompt_tokens: 0, completion_tokens: 0` across all attempts, which suggests that no real LLM call happened.
3. Planning falls back to `coerce_text_to_minimal_plan()` which wraps the HTML in a valid YAML plan with **hardcoded file names** (README.md, src/main.py, src/pipeline.py, tests/test_pipeline.py).
4. The `planner_analysis` field in the YAML plan contains the raw HTML.
5. Implementation phase (`create_file_structure`) sends this plan to the structure generator LLM.
6. The LLM receives a prompt contaminated with HTML, fails to parse it, and returns empty results.
7. `code_implementation_report.txt`: `files_completed: 0, total_files: 0, abort_reason: 'all planned files implemented'`

## Key Log Snippets

### planning_attempts.jsonl (all 3 attempts):
```json
{
  "attempt": 1,
  "result_chars": 101547,
  "usage": {"prompt_tokens": 0, "completion_tokens": 0},
  "tools_used": []
}
```
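
To spot this pattern without reading the log by hand, something like the following could scan the attempts file. It is a sketch under two assumptions: the file holds one JSON object per line, and the field names match the snippet above. `find_suspect_attempts` is a hypothetical helper, not an existing DeepCode function.

```python
from __future__ import annotations

import json
from pathlib import Path


def find_suspect_attempts(jsonl_path: str | Path) -> list[dict]:
    """Return attempts that produced output while reporting zero token usage."""
    suspects = []
    for line in Path(jsonl_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        usage = record.get("usage") or {}
        no_tokens = not usage.get("prompt_tokens") and not usage.get("completion_tokens")
        # Non-empty output with zero usage hints that no model produced the text.
        if no_tokens and record.get("result_chars", 0) > 0:
            suspects.append(record)
    return suspects
```

In the failing run described here, all 3 attempts would be flagged.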

### code_implementation_report.txt (results.file_tree):
```
'<!DOCTYPE html><html lang="en"><head>...OpenRouter...The model "chat/completions" is not available...</html>'
```

### Console output:
```
⚠️  Last line suspicious: '"...$L38","18",{}]]\n"])</script></body></html>'
📋 Required sections: 0/5
📏 Content length: 101547 chars
?? Output completeness score: 0.15/1.0
```

## Root Cause Analysis

The bug has **two layers**:

### Layer 1: LLM call returning HTML instead of analysis

The planning agent is configured to use a specific provider/model (e.g., `planning` agent → `deepseek` / `deepseek-v4-pro`), but the actual LLM call returns HTML from OpenRouter's website. The HTML content contains:

> `"The model 'chat/completions' is not available"`

This indicates that the **literal string `chat/completions`** (the API endpoint path) is being treated as the model name instead of the configured model. One likely cause is a string interpolation bug in which `model_name` is overwritten with the URL path somewhere in the provider resolution chain; the exact location has not been confirmed.

**Evidence**: The 101,547-char HTML is identical across all 3 attempts, each with `prompt_tokens=0`, so no tokens were consumed and the request most likely never reached a model. The result appears to come from a cached or pre-resolved error state, though this has not been verified.

### Layer 2: Fallback mechanism embeds HTML into valid YAML plan

When the LLM returns invalid content (HTML instead of YAML), the fallback function `coerce_text_to_minimal_plan()` in `workflows/planning_runtime.py:174` wraps the **raw HTML** into a structurally valid YAML plan:

```python
payload = {
    "file_structure": {
        "root": "generate_code",
        "files": [
            {"path": "README.md", "purpose": "..."},
            {"path": "src/main.py", "purpose": "..."},
            {"path": "src/pipeline.py", "purpose": "..."},
            {"path": "tests/test_pipeline.py", "purpose": "..."},
        ],
    },
    "implementation_strategy": {
        "planner_analysis": summary or "Planner did not return usable analysis.",
        # ^^^ This contains raw HTML from OpenRouter
    },
}
```

The downstream `create_file_structure()` in `code_implementation_workflow.py:228` then passes this HTML-contaminated plan to the structure generator LLM, which fails to understand it and produces no files.
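
The effect can be illustrated with a simplified stand-in for the fallback. This is not the real `coerce_text_to_minimal_plan()`; both functions below are hypothetical and use a plain dict instead of YAML. The point is only that a shape-level check passes even though the analysis field is HTML.

```python
def minimal_plan_sketch(text: str) -> dict:
    """Simplified stand-in for the fallback: wrap any text in a fixed plan."""
    summary = (text or "").strip()
    return {
        "file_structure": {
            "root": "generate_code",
            "files": [{"path": "README.md"}, {"path": "src/main.py"}],
        },
        "implementation_strategy": {
            "planner_analysis": summary or "Planner did not return usable analysis.",
        },
    }


def is_structurally_valid(plan: dict) -> bool:
    """Shape-only check: the kind of validation that HTML slips through."""
    files = plan.get("file_structure", {}).get("files")
    analysis = plan.get("implementation_strategy", {}).get("planner_analysis")
    return isinstance(files, list) and bool(files) and isinstance(analysis, str)


html = '<!DOCTYPE html><html lang="en"><body>not a plan</body></html>'
plan = minimal_plan_sketch(html)
print(is_structurally_valid(plan))  # the plan passes the shape check
print(plan["implementation_strategy"]["planner_analysis"][:15])  # yet it starts with HTML
```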

## Suggested Fix

### Fix 1: Detect and reject HTML responses in the planning loop
In `workflows/agent_orchestration_engine.py`, before processing the LLM result (around line 710), add a check:

```python
# Detect an HTML response (OpenRouter website returned instead of LLM content).
# The comparison is case-insensitive and ignores leading whitespace.
head = result.lstrip()[:200].lower()
if head.startswith("<!doctype html") or "<html" in head:
    attempt_record["error"] = "LLM returned HTML instead of analysis content"
    append_planning_attempt(paper_dir, attempt_record)
    raise RuntimeError("LLM returned HTML: provider/model resolution issue")
```
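
If the check is needed in more than one place, it could be factored into a small helper. The sketch below is one possible shape; `looks_like_html` is a hypothetical name. It anchors the match at the start of the text so that a legitimate plan that merely mentions `<html>` is not rejected.

```python
import re

# Matches a document that begins with an HTML doctype or an <html> tag.
_HTML_START = re.compile(r"^\s*(?:<!doctype\s+html|<html[\s>])", re.IGNORECASE)


def looks_like_html(text: object) -> bool:
    """Return True when text appears to be an HTML page rather than LLM output."""
    if not isinstance(text, str):
        return False
    # Drop a leading byte-order mark, which some HTTP bodies carry.
    return bool(_HTML_START.match(text.lstrip("\ufeff")))
```

Anchoring trades recall for precision: HTML preceded by other text would not be caught, so the threshold may need tuning against real responses.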

### Fix 2: Sanitize planner_analysis in coerce_text_to_minimal_plan
In `workflows/planning_runtime.py:174`, strip or reject HTML content before embedding it:

```python
from pathlib import Path


def coerce_text_to_minimal_plan(text: str, *, paper_dir: str | Path) -> str:
    summary = (text or "").strip()
    # Reject HTML content (case-insensitive) so it is never embedded in the plan
    if summary.lower().startswith(("<!doctype html", "<html")):
        summary = "Planner returned HTML instead of analysis. Possible model/provider resolution error."
    # ...rest of function
```

### Fix 3: Debug the provider/model resolution
The likely root cause is that `model` resolves to `chat/completions` instead of the configured model name. Investigate `core/config.py` and `core/providers/openai_compat.py` to find where the model name gets replaced by the URL endpoint path, and check whether `apiBase` is being incorrectly used as the `model` parameter.
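
While debugging, a guard like the following could be called right before the request is sent, so a bad resolution fails loudly instead of producing HTML. This is a heuristic sketch: `check_resolved_model` and `ENDPOINT_PATHS` are hypothetical names, the list of endpoint fragments is illustrative, and how DeepCode actually passes `model` and `apiBase` to the provider is an assumption.

```python
from urllib.parse import urlparse

# Endpoint path fragments that should never appear as a model name (illustrative list).
ENDPOINT_PATHS = ("chat/completions", "completions", "embeddings")


def check_resolved_model(model: str, api_base: str) -> list[str]:
    """Return human-readable problems with a resolved (model, api_base) pair."""
    problems: list[str] = []
    name = (model or "").strip().strip("/")
    if not name:
        problems.append("model is empty")
        return problems
    if name.startswith(("http://", "https://")):
        problems.append(f"model {name!r} is a URL, not a model name")
    if name in ENDPOINT_PATHS:
        problems.append(f"model {name!r} looks like an API endpoint path")
    # A model equal to the tail of the base URL path suggests the two were swapped or merged.
    base_path = urlparse(api_base or "").path.strip("/")
    if base_path and base_path.endswith(name):
        problems.append(f"model {name!r} duplicates the tail of api_base {api_base!r}")
    return problems
```

An empty list means none of these heuristics fired; it does not prove the model name is valid for the provider.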

## Workaround

Use the `--chat` mode instead of `--file` for code generation, which uses a different code path (`execute_chat_based_planning_pipeline` → `run_chat_planning_agent`) that doesn't trigger this bug:

```bash
python cli/main_cli.py --chat "Implement your requirements here"
```

## Additional Notes

- The `defaults` agent configuration in `deepcode_config.json` must use `provider: "deepseek"` with model `deepseek-v4-pro` (not `deepseek/deepseek-v4-flash:free` via OpenRouter) for the planning phase to work correctly.
- The chat pipeline works because it uses `attach_workflow_llm()` with phase `"planning"` which resolves differently than the paper pipeline's `_generate_plan_with_single_agent()`.

