Skip to content

Add canonical link tags for item pages and bitstream downloads #5172

Draft
bram-atmire wants to merge 1 commit into DSpace:main from bram-atmire:feat/4509-canonical-links



@bram-atmire commented Feb 27, 2026

Member

References

Description

Adds <link rel="canonical"> to item page HTML heads and a Link HTTP header with rel="canonical" to bitstream download responses, improving SEO by declaring preferred URLs.

Instructions for Reviewers

List of changes in this PR:

  • New SeoConfig interface (src/config/seo-config.interface.ts) with canonical.items and canonical.bitstreams boolean settings (both default to true)
  • HeadTagService: injects LinkHeadService, adds <link rel="canonical"> pointing to the UUID-based item view URL (via getItemPageRoute(item, true)), and removes it on every route change. This covers both simple (/items/{uuid}) and full (/items/{uuid}/full) views; the canonical always points to the simple view. Entity pages (/entities/{type}/{uuid}) get their own canonical. Any custom URL (dspace.customurl) is deliberately ignored, because a custom URL can change with the item's metadata and is therefore not a stable canonical reference; the canonical always uses the UUID-based route. Handle URLs (/handle/{p}/{s}) already 301-redirect, so no canonical is needed there.
  • BitstreamDownloadPageComponent: adds <link rel="canonical"> to the HTML <head> (works for both SSR and client-side rendering) and appends rel="canonical" to the Link HTTP header (SSR only). Implements OnDestroy to clean up the tag.
  • Tests added for both HeadTagService and BitstreamDownloadPageComponent covering: tag added for Items, not added for non-Item DSOs, not added for non-DSO pages, removed on route change, not added when the config setting is false, a UUID-based canonical even when a custom URL is set, and cleanup on destroy.
  • Note for reviewers: rebased onto current main. Canonical URLs are built from the configured ui.baseUrl via HardRedirectService.getBaseUrl() (main recently replaced getCurrentOrigin() with getBaseUrl() so the Host header is no longer trusted).
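The canonical-URL rule for item pages can be summarised as a small pure function. This is only a sketch to make the rule easy to review: the actual change uses getItemPageRoute(item, true) and HardRedirectService.getBaseUrl(), whereas the ItemLike type and the helper names below are hypothetical. It shows that /full is never part of the canonical, entity pages keep their /entities/{type}/{uuid} route, and a custom URL is ignored.

```typescript
// Hypothetical, framework-free sketch of the canonical rule described above.
// ItemLike, joinUrl, canonicalItemRoute and buildItemCanonicalUrl are
// illustrative names, not DSpace APIs.

interface ItemLike {
  uuid: string;
  /** Entity route segment (e.g. 'publication'); undefined for plain items. */
  entityType?: string;
  /** Value of dspace.customurl, if any. Deliberately ignored below. */
  customUrl?: string;
}

/** Joins the configured base URL and a route without doubling slashes. */
function joinUrl(baseUrl: string, route: string): string {
  return baseUrl.replace(/\/+$/, '') + '/' + route.replace(/^\/+/, '');
}

/**
 * UUID-based simple-view route: /entities/{type}/{uuid} for entities,
 * /items/{uuid} otherwise. The /full suffix is never included.
 */
function canonicalItemRoute(item: ItemLike): string {
  if (item.entityType) {
    return `/entities/${encodeURIComponent(item.entityType)}/${item.uuid}`;
  }
  return `/items/${item.uuid}`;
}

function buildItemCanonicalUrl(baseUrl: string, item: ItemLike): string {
  // customUrl is intentionally not consulted: it can change with metadata,
  // so it is not a stable canonical reference.
  return joinUrl(baseUrl, canonicalItemRoute(item));
}
```

Keeping the rule in one pure function makes it straightforward to cover the custom-URL regression case with a plain unit test.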

How to test:

  1. Build with SSR: npm run build:ssr && npm run serve:ssr
  2. Visit an item page, view page source — confirm <link rel="canonical" href="https://your-host/items/{uuid}"> is present in the <head>
  3. Visit the full item page (/items/{uuid}/full), view source — canonical should still point to /items/{uuid} (without /full)
  4. Visit an entity page (/entities/{type}/{uuid}) — canonical should point to /entities/{type}/{uuid}
  5. curl -i https://localhost:4000/bitstreams/{uuid}/download — confirm the Link header contains rel="canonical" (use -i for a GET; -I issues a HEAD request, which does not render the SSR response and so does not emit the header)
  6. To test the config toggle, set seo.canonical.items: false or seo.canonical.bitstreams: false in your config and verify the canonical tags are no longer added
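To check step 5 in a script rather than by eye, the raw Link header value printed by curl -i can be parsed and searched for rel="canonical". The helper below is a hypothetical sketch, not part of this PR, and it assumes a simplified header: no commas inside URLs and no quoted parameter values containing commas or semicolons. A full RFC 8288 parser handles more cases.

```typescript
// Hypothetical test helper: parse a Link header value into (url, rels) pairs
// and look up the target of a given relation type.

interface ParsedLink {
  url: string;
  rels: string[];
}

function parseLinkHeader(value: string): ParsedLink[] {
  const links: ParsedLink[] = [];
  for (const part of value.split(',')) {
    const match = part.trim().match(/^<([^>]*)>(.*)$/);
    if (!match) {
      continue; // not a link-value; skip it
    }
    const [, url, params] = match;
    const relParam = params
      .split(';')
      .map((p) => p.trim())
      .find((p) => /^rel\s*=/i.test(p));
    const relValue = relParam
      ? relParam.replace(/^rel\s*=\s*/i, '').replace(/^"|"$/g, '')
      : '';
    // rel may hold several space-separated types; relation types are case-insensitive.
    const rels = relValue.split(/\s+/).filter((r) => r.length > 0).map((r) => r.toLowerCase());
    links.push({ url, rels });
  }
  return links;
}

function findLinkByRel(value: string, rel: string): string | undefined {
  return parseLinkHeader(value).find((l) => l.rels.includes(rel.toLowerCase()))?.url;
}
```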

New configuration (in config.yml or environment config):

seo:
  canonical:
    items: true        # default: true
    bitstreams: true   # default: true
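The two toggles map naturally onto a small TypeScript interface with defaults applied to whatever is read from config.yml. This is an illustrative sketch only: the field names follow the PR description (seo.canonical.items, seo.canonical.bitstreams, both defaulting to true), but DEFAULT_SEO_CONFIG and resolveSeoConfig are hypothetical and do not reflect how DSpace's config loader actually merges values.

```typescript
// Illustrative shape of the SEO settings plus a hypothetical default-merging helper.

interface SeoConfig {
  canonical: {
    /** Emit <link rel="canonical"> on item and entity pages. */
    items: boolean;
    /** Emit the canonical tag and Link header for bitstream downloads. */
    bitstreams: boolean;
  };
}

const DEFAULT_SEO_CONFIG: SeoConfig = {
  canonical: { items: true, bitstreams: true },
};

/** Partial override, as it might be read from config.yml. */
type SeoConfigOverride = { canonical?: Partial<SeoConfig['canonical']> };

function resolveSeoConfig(override?: SeoConfigOverride): SeoConfig {
  // Unset keys fall back to the defaults (both true).
  return {
    canonical: { ...DEFAULT_SEO_CONFIG.canonical, ...(override?.canonical ?? {}) },
  };
}
```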

Checklist

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@lgeggleston added the component: SEO (Search Engine Optimization) and new feature labels Mar 2, 2026
@lgeggleston moved this to 🏗 In Progress in DSpace 11.0 Release Mar 2, 2026
@lgeggleston moved this from 🏗 In Progress to 🙋 Needs Reviewers Assigned in DSpace 11.0 Release Mar 11, 2026
@github-actions


Hi @bram-atmire,
Conflicts have been detected against the base branch.
Please resolve these conflicts as soon as you can. Thanks!

@tschammnut


Hi Bram!

I tested the code from this PR and I can confirm that points 1 to 6 are working as you described, apart from the following two observations I made:

  • curl -I … did not work for me: issued this way, curl does not report the Link header.
    It did work when I used curl -i …, showing the Link header as expected, but in this case, the Angular server logged an error:
ERROR TypeError: Cannot read properties of undefined (reading 'pipe')
    at TranslateService2.get

I can't say whether this is simply caused by my local setup or whether it is a genuine bug in this PR.

  • When custom URLs are defined for an Item, the custom URL is given as the canonical link instead of the entity link.

I would really love to see this PR being accepted and ideally backported to DSpace 9x because we are facing a lot of user requests regarding indexing on Google Scholar and it looks like this PR would solve a large portion of our indexing problems.

Is there anything I can do to support this development effort?

@bram-atmire force-pushed the feat/4509-canonical-links branch from 869a3cb to 7d4cc8c June 19, 2026 19:42
@bram-atmire

Member Author

Thanks for the thorough testing @tschammnut, and great to hear this would help with your Google Scholar indexing. I've pushed an update (rebased onto current main) addressing your feedback:

1. curl -I vs curl -i

You're right, and this is expected behaviour rather than a bug. -I issues a HEAD request while -i issues a GET. The Link header is only emitted during the SSR render of a GET, which is the same behaviour as the existing signposting Link header. Search-engine crawlers issue GET requests, so they will see it. I've corrected the test instructions in the PR description to use curl -i.
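For reviewers, the header behaviour can be pictured as a pure function that merges a rel="canonical" entry into whatever Link value (for example, signposting) is already present. This is a hedged sketch, not the component's code: withCanonicalLink is a hypothetical name, and the isServerGet flag stands in for "we are rendering server-side for a GET request", which is an assumption about how the caller gates it.

```typescript
// Hypothetical sketch: append a canonical link-value to an existing Link header.

function withCanonicalLink(
  existing: string | undefined,
  canonicalUrl: string,
  isServerGet: boolean,
): string | undefined {
  if (!isServerGet) {
    // No SSR render of a GET (e.g. a HEAD request): nothing is added.
    return existing;
  }
  const canonical = `<${canonicalUrl}>; rel="canonical"`;
  if (!existing || existing.trim() === '') {
    return canonical;
  }
  // The Link header carries comma-separated link-values (RFC 8288), so the
  // canonical entry sits alongside any signposting links already set.
  return `${existing}, ${canonical}`;
}
```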

2. Custom URLs as canonical

Good catch. A custom URL (dspace.customurl) can change when the item's metadata changes, so it isn't a stable canonical reference. I've changed the canonical to always use the UUID-based route (/items/{uuid} or /entities/{type}/{uuid}), ignoring any custom URL, and added a regression test for it.

3. TypeError: Cannot read properties of undefined (reading 'pipe') at TranslateService.get

I wasn't able to tie this to the canonical code. The bitstream download component doesn't call TranslateService.get directly, so the error originates elsewhere in the download/redirect flow, and a plain GET to the download URL would trigger it with or without this PR. Could you confirm whether you also see this error on a clean main without this PR? That will tell us whether it's pre-existing and should be tracked separately, rather than something introduced here.

On backporting: I'm planning to bring this to dspace-10_x and dspace-9_x as well after it lands on main, precisely because of the Scholar indexing use case you describe.

Adds <link rel="canonical"> to item page HTML heads and a Link HTTP
header with rel="canonical" to bitstream download responses. This helps
search engines identify the preferred URL for each page, preventing
self-competition between /items/{uuid}, /entities/{type}/{uuid}, and
/items/{uuid}/full routes.

Both features are configurable via seo.canonical.items and
seo.canonical.bitstreams settings (enabled by default).

Refs DSpace#4509

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bram-atmire force-pushed the feat/4509-canonical-links branch from 7d4cc8c to 044e3a8 June 19, 2026 19:45
@bram-atmire

Member Author

> I would really love to see this PR being accepted and ideally backported to DSpace 9x because we are facing a lot of user requests regarding indexing on Google Scholar and it looks like this PR would solve a large portion of our indexing problems.
> Is there anything I can do to support this development effort?

Encouragement like this is often all it takes ;)


Labels

component: SEO (Search Engine Optimization), new feature

Projects

Status: 🙋 Needs Reviewers Assigned

Development

Successfully merging this pull request may close these issues.

SEO: DSpace should be able to declare its item pages and bitstream links as canonical

3 participants