Describe the bug
mcp-server-fetch hard-codes use_readability=True when calling readabilipy.simple_json.simple_json_from_html_string in extract_content_from_html (server.py, ~line 36). This silently shells out to a bundled Node.js script that imports Mozilla's @mozilla/readability. Node.js is not declared as a dependency in pyproject.toml, not mentioned in the README, and there is no fallback or timeout: if the Node subprocess hangs (e.g., due to an unrelated npm config issue, a slow Node startup, or Node being absent), fetch_url blocks indefinitely. The MCP client eventually times out and the agent receives a Connection lost: Timed out while waiting for response to ClientRequest error with no actionable detail.
To Reproduce
Steps to reproduce the behavior:
- Install in a Python-only environment: pip install mcp-server-fetch (no Node.js / npm configuration assumed).
- Spawn the server via stdio and call the fetch tool against any HTML page, e.g. https://en.wikipedia.org/wiki/Main_Page. Minimal harness:
import os, asyncio, time
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(
        command="python3",
        args=["-m", "mcp_server_fetch", "--ignore-robots-txt"],
        env=dict(os.environ),
    )
    async with stdio_client(params) as (r, w):
        async with ClientSession(r, w) as s:
            await s.initialize()
            t0 = time.time()
            try:
                res = await asyncio.wait_for(
                    s.call_tool("fetch", {"url": "https://en.wikipedia.org/wiki/Main_Page", "max_length": 2000}),
                    timeout=70,
                )
                print(f"OK in {time.time()-t0:.1f}s; isError={res.isError}")
            except asyncio.TimeoutError:
                print(f"TIMEOUT after {time.time()-t0:.1f}s")

asyncio.run(main())
- Observe: the harness times out at 70s. Probing inside fetch_url via stderr logging shows that the HTTP request itself completes in ~0.3s with status 200 and a 224 KB body; the hang is entirely inside readabilipy.simple_json.simple_json_from_html_string(html, use_readability=True).
- Workaround that confirms the diagnosis: in the installed package's mcp_server_fetch/server.py (the package location is resolvable via python3 -c "import mcp_server_fetch, inspect; print(inspect.getfile(mcp_server_fetch))"), edit extract_content_from_html (~line 36) and change use_readability=True to use_readability=False:
ret = readabilipy.simple_json.simple_json_from_html_string(
    html, use_readability=False  # was True
)
The probe then completes in ~0.6–1.2s with markdown content. (use_readability=False uses readabilipy's pure-Python lxml/regex path with no Node dependency.)
Expected behavior
One of the following:
- pip install mcp-server-fetch should not silently introduce a runtime requirement on Node.js. Either declare it explicitly in pyproject.toml / README, or default to use_readability=False (Python-only).
- If Node.js is required, the absence or misbehavior of Node should fail fast with a clear error rather than blocking on a subprocess with no timeout.
- A user-facing flag (e.g., --no-readability-js / --readability-backend=python) so operators can opt out without monkey-patching the installed package.
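A minimal sketch of how the opt-out flag and the fail-fast check could fit together. The flag name --readability-backend and the helpers build_parser and resolve_use_readability are hypothetical; none of them exist in mcp-server-fetch today. Only argparse and shutil.which from the standard library are assumed.

```python
import argparse
import shutil


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mcp-server-fetch")
    # Hypothetical flag: not part of the current mcp-server-fetch CLI.
    parser.add_argument(
        "--readability-backend",
        choices=["python", "node"],
        default="python",  # Python-only by default, so no hidden Node requirement
        help="Backend used to simplify HTML (hypothetical option)",
    )
    return parser


def resolve_use_readability(backend: str, which=shutil.which) -> bool:
    """Return the value to pass as use_readability, failing fast if Node is missing."""
    if backend == "python":
        return False
    if which("node") is None:
        # Clear, immediate error instead of a later hang or opaque timeout.
        raise RuntimeError(
            "--readability-backend=node was requested, but no 'node' executable is on PATH"
        )
    return True
```

The resolved boolean would then be passed through to extract_content_from_html instead of the hard-coded True. The which parameter exists only so the check can be exercised without depending on the host's PATH.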
Logs
Instrumentation used in mcp_server_fetch/server.py to produce the trace below — only _trace(...) calls were added; everything else is the upstream code:
async def fetch_url(url: str, user_agent: str, force_raw: bool = False) -> Tuple[str, str]:
    from httpx import AsyncClient, HTTPError
    import time, os

    def _trace(msg):
        with open("/tmp/fetch_trace.log", "a") as f:
            f.write(f"{time.time():.3f} {msg}\n")

    _trace(f"[fetch_url] entered url={url} HTTPS_PROXY={os.environ.get('HTTPS_PROXY')!r}")
    t0 = time.time()
    async with AsyncClient() as client:
        _trace(f"[fetch_url] client created at +{time.time()-t0:.1f}s")
        try:
            response = await client.get(
                url,
                follow_redirects=True,
                headers={"User-Agent": user_agent},
                timeout=30,
            )
            _trace(f"[fetch_url] got response status={response.status_code} at +{time.time()-t0:.1f}s")
        except HTTPError as e:
            _trace(f"[fetch_url] HTTPError after +{time.time()-t0:.1f}s: {e!r}")
            raise McpError(ErrorData(code=INTERNAL_ERROR, message=f"Failed to fetch {url}: {e!r}"))
        if response.status_code >= 400:
            raise McpError(ErrorData(
                code=INTERNAL_ERROR,
                message=f"Failed to fetch {url} - status code {response.status_code}",
            ))
        _trace(f"[fetch_url] reading response.text at +{time.time()-t0:.1f}s")
        page_raw = response.text
        _trace(f"[fetch_url] page_raw len={len(page_raw)} at +{time.time()-t0:.1f}s")
    content_type = response.headers.get("content-type", "")
    is_page_html = (
        "<html" in page_raw[:100] or "text/html" in content_type or not content_type
    )
    _trace(f"[fetch_url] is_page_html={is_page_html} ct={content_type!r}")
    if is_page_html and not force_raw:
        _trace(f"[fetch_url] calling extract_content_from_html at +{time.time()-t0:.1f}s")
        extracted = extract_content_from_html(page_raw)
        _trace(f"[fetch_url] extract done len={len(extracted)} at +{time.time()-t0:.1f}s")
        return extracted, ""
    _trace(f"[fetch_url] returning raw at +{time.time()-t0:.1f}s")
    return (
        page_raw,
        f"Content type {content_type} cannot be simplified to markdown, but here is the raw content:\n",
    )
Trace output (Node installed in container, npm config emits unrelated warnings on startup):
[fetch_url] entered url=https://en.wikipedia.org/wiki/Main_Page HTTPS_PROXY='http://...:3128'
[fetch_url] client created at +0.0s
[fetch_url] got response status=200 at +0.3s
[fetch_url] reading response.text at +0.3s
[fetch_url] page_raw len=223644 at +0.3s
[fetch_url] is_page_html=True ct='text/html; charset=UTF-8'
[fetch_url] calling extract_content_from_html at +0.3s
<no further trace; harness times out at 70s>
npm warnings appear on stderr at the start of each call (npm warn Unknown user config "chromedriver_cdnurl", etc.), confirming a Node subprocess is being launched.
After the workaround (use_readability=False), 5 consecutive runs:
[fetch_url] calling extract_content_from_html at +0.3-0.9s
[fetch_url] extract done len=12671 at +0.6-1.2s
Additional context
- Versions: mcp-server-fetch (current PyPI), readabilipy (current PyPI), Python 3.11, Linux x86_64.
- The Node call originates in readabilipy/simple_json.py → readabilipy/javascript/ExtractArticle.js, invoked via subprocess.check_output with no timeout.
- The MCP client wraps the hang as Connection lost: Timed out while waiting for response to ClientRequest. Waited 60.0 seconds. This is opaque from the agent's perspective.
- Concrete suggestions:
a. Default use_readability=False, or surface as a CLI flag.
b. Probe shutil.which("node") at startup; warn or fail clearly when use_readability=True is requested without Node.
c. Pass a timeout= to the underlying subprocess so a wedged Node is recoverable.
d. Document the Node requirement in README.md and add it to extras (pip install "mcp-server-fetch[readability]").