Commit b52e2cc

JesperDramsch, claude, and pre-commit-ci[bot] authored
Add YouTube channel extraction and display support (#338)
* fix: exclude YouTube from Mastodon detection, add YouTube as conference link

  YouTube URLs with /@channel patterns were incorrectly matched as Mastodon profiles. This fixes the extractor to properly identify YouTube links and adds YouTube as a first-class conference field for enrichment and display.

  - Fix extract_links_from_url to detect YouTube before generic /@username
  - Add youtube field to Conference schema, validation, and data model
  - Add YouTube display on conference detail pages
  - Add tests for YouTube extraction and Mastodon disambiguation

  https://claude.ai/code/session_0154a8RdG7M2nj83zPWVodgZ

* feat: add Bluesky and YouTube display to conference templates

  Bluesky and YouTube data was being tracked but never shown to users. Add display links to conference detail pages, summary pages, and the index listing row using Font Awesome brand icons.

  https://claude.ai/code/session_0154a8RdG7M2nj83zPWVodgZ

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* fix(enrich-tba): use exact domain matching for YouTube/Twitter detection

  CodeQL flagged the substring checks ("youtube.com" in domain) as incomplete URL sanitization: a host like evil-youtube.com.attacker could match. Replace with a _domain_matches helper that accepts an exact host or a proper subdomain, and reuse it for Twitter/X. Also collapses the combined Twitter/YouTube skip condition, which tripped Ruff E501, into a readable form.

  https://claude.ai/code/session_0154a8RdG7M2nj83zPWVodgZ

* chore: bump pyupgrade to v3.21.2 for Python 3.14 compatibility

  pyupgrade v3.15.2 crashes on Python 3.14 with a TypeError from tokenize.cookie_re; it passes a str where newer CPython expects a bytes pattern. pre-commit.ci runs on Python 3.14, so the hook was failing on every PR regardless of the diff. Bumping to v3.21.2 picks up the upstream fix.

  https://claude.ai/code/session_0154a8RdG7M2nj83zPWVodgZ

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
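The str/bytes mismatch described in the pyupgrade entry can be illustrated in isolation. This is only a sketch of the general failure mode, not the pyupgrade or CPython code: the pattern below is a simplified, hypothetical stand-in for an encoding-cookie regex.

```python
import re

# Simplified, hypothetical encoding-cookie pattern, compiled from bytes.
cookie_re = re.compile(rb"coding[:=]\s*([-\w.]+)")

line = "# -*- coding: utf-8 -*-"

try:
    # A str subject against a bytes pattern is rejected by the re module.
    cookie_re.search(line)
    outcome = "matched"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

# Encoding the subject first makes the pattern and subject types agree.
found = cookie_re.search(line.encode("ascii"))

print(outcome)
print(found.group(1))
```

Because the error comes from a type mismatch rather than from the file being checked, a hook hitting it would fail regardless of the diff, which matches the symptom the commit message reports.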
1 parent 2929531 commit b52e2cc

10 files changed

Lines changed: 152 additions & 9 deletions

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ repos:
 - --force-single-line-imports
 - --profile black
 - repo: https://github.com/asottile/pyupgrade # Upgrade Python syntax
-rev: v3.15.2
+rev: v3.21.2
 hooks:
 - id: pyupgrade
 args:

_includes/head.html

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@
 twitter: {{ conf.twitter | jsonify }},
 mastodon: {{ conf.mastodon | jsonify }},
 bluesky: {{ conf.bluesky | jsonify }},
+youtube: {{ conf.youtube | jsonify }},
 location: {{ conf.location | jsonify }},
 extra_places: {{ conf.extra_places | jsonify }},
 workshop_deadline: {{ conf.workshop_deadline | jsonify }},

_includes/index_conf_title_row.html

Lines changed: 3 additions & 0 deletions
@@ -23,6 +23,9 @@
 {% elsif conf.twitter %}
 <a title="Twitter" href="https://twitter.com/{{conf.twitter}}" target="_blank" rel="noopener noreferrer"><img src="/static/img/407-twitter.svg" alt="Twitter" width="14" height="14" /></a>
 {% endif %}
+{% if conf.bluesky %}
+<a title="Bluesky" href="{{conf.bluesky}}" target="_blank" rel="noopener noreferrer"><i class="fa-brands fa-bluesky" style="width:14px;height:14px;" aria-hidden="true"></i></a>
+{% endif %}
 </span>
 </div>
 </div>

_layouts/conference.html

Lines changed: 12 additions & 0 deletions
@@ -162,6 +162,18 @@ <h2 id="conf-subtitle">a.k.a. {{page.alt_name}} {{page.year}}</h2>
 <a id="conf-mastodon" target="_blank" rel="noopener noreferrer" href="{{page.mastodon}}">Mastodon</a>
 </div>
 {% endif %}
+{% if page.bluesky %}
+<div>
+<i class="fa-brands fa-bluesky" style="width:16px;height:16px;" aria-hidden="true"></i>
+<a id="conf-bluesky" target="_blank" rel="noopener noreferrer" href="{{page.bluesky}}">Bluesky</a>
+</div>
+{% endif %}
+{% if page.youtube %}
+<div>
+<i class="fa-brands fa-youtube" style="width:16px;height:16px;" aria-hidden="true"></i>
+<a id="conf-youtube" target="_blank" rel="noopener noreferrer" href="{{page.youtube}}">YouTube</a>
+</div>
+{% endif %}
 </div>
 </div>
 <div id="conf-deadlines" class="row">

_layouts/summary.html

Lines changed: 12 additions & 0 deletions
@@ -71,6 +71,18 @@ <h1>
 <a id="conf-mastodon" target="_blank" rel="noopener noreferrer" href="{{confs[0].mastodon}}">Mastodon</a>
 </div>
 {% endif %}
+{% if confs[0].bluesky %}
+<div>
+<i class="fa-brands fa-bluesky" style="width:16px;height:16px;" aria-hidden="true"></i>
+<a id="conf-bluesky" target="_blank" rel="noopener noreferrer" href="{{confs[0].bluesky}}">Bluesky</a>
+</div>
+{% endif %}
+{% if confs[0].youtube %}
+<div>
+<i class="fa-brands fa-youtube" style="width:16px;height:16px;" aria-hidden="true"></i>
+<a id="conf-youtube" target="_blank" rel="noopener noreferrer" href="{{confs[0].youtube}}">YouTube</a>
+</div>
+{% endif %}
 </div>
 </div>
 <div id="all_confs">

tests/test_youtube_extraction.py

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
+"""Tests for YouTube link extraction and Mastodon/YouTube disambiguation."""
+
+import sys
+from pathlib import Path
+from unittest.mock import patch
+
+sys.path.append(str(Path(__file__).parent.parent / "utils"))
+
+from enrich_tba import extract_links_from_url
+
+
+class TestYouTubeExtraction:
+    """Test YouTube link detection in extract_links_from_url."""
+
+    @patch("enrich_tba.get_all_links")
+    def test_youtube_channel_detected(self, mock_links):
+        """YouTube /@channel links are detected as youtube, not mastodon."""
+        mock_links.return_value = [
+            "https://www.youtube.com/@PyConUS",
+        ]
+        result = extract_links_from_url("https://pycon.org")
+        assert "youtube" in result
+        assert result["youtube"] == "https://www.youtube.com/@PyConUS"
+        assert "mastodon" not in result
+
+    @patch("enrich_tba.get_all_links")
+    def test_youtube_channel_url_without_at(self, mock_links):
+        """YouTube channel links without @ are detected."""
+        mock_links.return_value = [
+            "https://www.youtube.com/channel/UCMjMBMGt0WP2usFilILnbcA",
+        ]
+        result = extract_links_from_url("https://pycon.org")
+        assert "youtube" in result
+        assert "mastodon" not in result
+
+    @patch("enrich_tba.get_all_links")
+    def test_youtube_not_mistaken_for_mastodon(self, mock_links):
+        """YouTube /@username must not end up in mastodon field."""
+        mock_links.return_value = [
+            "https://www.youtube.com/@EuroPython",
+            "https://fosstodon.org/@europython",
+        ]
+        result = extract_links_from_url("https://europython.eu")
+        assert result.get("youtube") == "https://www.youtube.com/@EuroPython"
+        assert result.get("mastodon") == "https://fosstodon.org/@europython"
+
+    @patch("enrich_tba.get_all_links")
+    def test_youtu_be_short_link(self, mock_links):
+        """Short youtu.be links are detected as youtube."""
+        mock_links.return_value = [
+            "https://youtu.be/abc123",
+        ]
+        result = extract_links_from_url("https://pycon.org")
+        assert "youtube" in result
+        assert "mastodon" not in result
+
+    @patch("enrich_tba.get_all_links")
+    def test_mastodon_still_works(self, mock_links):
+        """Mastodon links on known instances still detected correctly."""
+        mock_links.return_value = [
+            "https://fosstodon.org/@pycon",
+        ]
+        result = extract_links_from_url("https://pycon.org")
+        assert "mastodon" in result
+        assert result["mastodon"] == "https://fosstodon.org/@pycon"
+        assert "youtube" not in result
+
+    @patch("enrich_tba.get_all_links")
+    def test_generic_mastodon_still_works(self, mock_links):
+        """Generic /@username on unknown instances still detected as mastodon."""
+        mock_links.return_value = [
+            "https://social.example.org/@pyconf",
+        ]
+        result = extract_links_from_url("https://pyconf.org")
+        assert "mastodon" in result
+        assert "youtube" not in result
+
+    @patch("enrich_tba.get_all_links")
+    def test_youtube_first_seen_wins(self, mock_links):
+        """Only the first YouTube link is kept."""
+        mock_links.return_value = [
+            "https://www.youtube.com/@PyConUS",
+            "https://www.youtube.com/@AnotherChannel",
+        ]
+        result = extract_links_from_url("https://pycon.org")
+        assert result["youtube"] == "https://www.youtube.com/@PyConUS"
+
+    @patch("enrich_tba.get_all_links")
+    def test_all_social_links_extracted(self, mock_links):
+        """YouTube, Mastodon, and Bluesky can all be extracted together."""
+        mock_links.return_value = [
+            "https://bsky.app/profile/pycon.org",
+            "https://www.youtube.com/@PyConUS",
+            "https://fosstodon.org/@pycon",
+        ]
+        result = extract_links_from_url("https://pycon.org")
+        assert "bluesky" in result
+        assert "youtube" in result
+        assert "mastodon" in result

utils/enrich_tba.py

Lines changed: 20 additions & 7 deletions
@@ -50,7 +50,7 @@
 MAX_CONTENT_LENGTH = 15000 # Max characters per conference website
 
 # Field type categorization for validation
-URL_FIELDS = {"sponsor", "finaid", "mastodon", "bluesky", "cfp_link"}
+URL_FIELDS = {"sponsor", "finaid", "mastodon", "bluesky", "youtube", "cfp_link"}
 DATE_FIELDS = {"cfp", "workshop_deadline", "tutorial_deadline"}
 TIMEZONE_FIELD = "timezone"
 
@@ -267,6 +267,11 @@ def get_all_links(url: str) -> list[str]:
         return []
 
 
+def _domain_matches(domain: str, hosts: tuple[str, ...]) -> bool:
+    """Return True if domain equals one of hosts or is a subdomain of one."""
+    return any(domain == h or domain.endswith(f".{h}") for h in hosts)
+
+
 # Known Mastodon instances (common ones in tech/Python community)
 MASTODON_INSTANCES = {
     "mastodon.social",
@@ -334,22 +339,30 @@ def extract_links_from_url(url: str) -> dict[str, str]:
     for link in links:
         link_lower = link.lower()
         parsed_link = urlparse(link)
+        link_domain = parsed_link.netloc.lower()
+
+        is_youtube = _domain_matches(link_domain, ("youtube.com", "youtu.be"))
+        is_twitter = _domain_matches(link_domain, ("twitter.com", "x.com"))
 
         # Bluesky - always bsky.app/profile/
         if "bluesky" not in seen_types and "bsky.app/profile/" in link_lower:
             found["bluesky"] = link
             seen_types.add("bluesky")
             logger.debug(f" Found bluesky: {link}")
 
+        # YouTube - youtube.com/@channel or youtu.be links
+        elif "youtube" not in seen_types and is_youtube:
+            found["youtube"] = link
+            seen_types.add("youtube")
+            logger.debug(f" Found youtube: {link}")
+
         # Mastodon - /@username pattern on known instances or any instance
-        # Exclude Twitter/X which don't use /@, but guard against edge cases
+        # Exclude Twitter/X and YouTube which also use /@username patterns
         elif "mastodon" not in seen_types and "/@" in link:
-            domain = parsed_link.netloc.lower()
-
-            # Skip Twitter/X domains (exact host or subdomains only)
-            if domain == "twitter.com" or domain.endswith((".x.com", ".twitter.com")) or domain == "x.com":
+            # Skip Twitter/X and YouTube domains
+            if is_twitter or is_youtube:
                 pass
-            elif domain in MASTODON_INSTANCES or "mastodon" in domain or "toot" in domain:
+            elif link_domain in MASTODON_INSTANCES or "mastodon" in link_domain or "toot" in link_domain:
                 found["mastodon"] = link
                 seen_types.add("mastodon")
                 logger.debug(f" Found mastodon: {link}")
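To see why the substring check was replaced, the two approaches can be compared side by side. The sketch below copies the matching rule from _domain_matches under a local name; the example links are hypothetical and chosen only to show where the checks disagree.

```python
from urllib.parse import urlparse


def domain_matches(domain: str, hosts: tuple) -> bool:
    """Same rule as the helper in the diff: exact host or a dot-separated subdomain."""
    return any(domain == h or domain.endswith(f".{h}") for h in hosts)


YOUTUBE_HOSTS = ("youtube.com", "youtu.be")

# Hypothetical links for illustration only.
links = [
    "https://www.youtube.com/@PyConUS",        # genuine subdomain
    "https://youtu.be/abc123",                 # exact host
    "https://evil-youtube.com.attacker/@fake", # look-alike host
    "https://notyoutube.com/@fake",            # suffix without a dot boundary
]

results = {}
for link in links:
    host = urlparse(link).netloc.lower()
    # Old style: substring test, which matches anywhere in the host.
    substring_hit = any(h in host for h in YOUTUBE_HOSTS)
    # New style: exact host or proper subdomain only.
    exact_hit = domain_matches(host, YOUTUBE_HOSTS)
    results[host] = (substring_hit, exact_hit)
    print(f"{host:28} substring={substring_hit!s:5} exact={exact_hit}")
```

One caveat worth keeping in mind: netloc can carry a port or userinfo (for example youtube.com:443), which this rule would not match. The hostname attribute of the urlparse result strips those parts, so it may be the safer input if such links are expected.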

utils/schema.yml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
 twitter: BestConfEver # Twitter handle of conference (Optional)
 mastodon: https://mastodon.social/@bconf # Mastodon handle of conference (Optional)
 bluesky: https://bsky.app/@bconf # Bluesky handle of conference (Optional)
+youtube: https://www.youtube.com/@bconf # YouTube channel of conference (Optional)
 sub: PY # Type of conference (see or add _data/types.yml)
 note: Important # In case there are extra notes about the conference (Optional)
 location: # Geolocation for inclusion in map

utils/tidy_conf/schema.py

Lines changed: 2 additions & 1 deletion
@@ -72,6 +72,7 @@ class Conference(BaseModel):
     twitter: str | None = None
     mastodon: HttpUrl | None = None
     bluesky: str | None = None
+    youtube: HttpUrl | None = None
     sub: str
     note: str | None = None
     location: list[Location] | None = None
@@ -121,7 +122,7 @@ def validate_title(cls, v):
             return re.sub(r"\b(19|20)\d{2}\b", "", v).strip()
         return v
 
-    @field_serializer("link", "cfp_link", "sponsor", "finaid", "mastodon")
+    @field_serializer("link", "cfp_link", "sponsor", "finaid", "mastodon", "youtube")
     def ser_url(self, value):
         return str(value)
127128

utils/tidy_conf/validation.py

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@
     "twitter",
     "mastodon",
     "bluesky",
+    "youtube",
     "location",
     "extra_places",
 ]
