PDF Link Collector Test Page

Tests all URL patterns the PdfLinkCollector handles. Each section uses its own PDF file, except where a URL is deliberately repeated to exercise deduplication (sections 5 and 9).

Collected URLs

Click "Run Collector" to scan the page.

Compare Results

Paste the JSON array from the collector response below, then click Compare.
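
The comparison step can be reproduced outside the page. The following is a minimal sketch, assuming the collector response is a JSON array of URL strings; the function name `compare_results` is hypothetical and not part of PdfLinkCollector.

```python
import json


def compare_results(expected, pasted_json):
    """Return (missing, unexpected) URL lists, each sorted for stable output."""
    actual = set(json.loads(pasted_json))  # the pasted collector response
    wanted = set(expected)
    missing = sorted(wanted - actual)      # expected but not collected
    unexpected = sorted(actual - wanted)   # collected but not expected
    return missing, unexpected


if __name__ == "__main__":
    expected = [
        "http://localhost:8111/pdfs/test-embed.pdf",
        "http://localhost:8111/pdfs/test-object.pdf",
    ]
    pasted = json.dumps([
        "http://localhost:8111/pdfs/test-embed.pdf",
        "http://localhost:8111/page.html",
    ])
    missing, unexpected = compare_results(expected, pasted)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

A run passes when both lists come back empty.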


1. Standard <a href> links

sample-accessible.pdf (relative) MATCH
sample-basic.pdf (relative) MATCH
sample-university.pdf (relative) MATCH
External PDF (absolute) MATCH
HTML page NO MATCH
Regular URL NO MATCH
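
As a rough illustration of the rule this section tests, the sketch below resolves an href against the page URL and accepts it only when the resolved path ends in `.pdf`. The page location `http://localhost:8111/index.html`, the case-insensitive comparison, and the name `resolve_pdf_href` are assumptions for the example, not details taken from PdfLinkCollector.

```python
from urllib.parse import urljoin, urlsplit

# Assumed location of the test page; only the host and port appear in the expected results.
PAGE_URL = "http://localhost:8111/index.html"


def resolve_pdf_href(href, base=PAGE_URL):
    """Return the absolute URL if its path ends in .pdf, otherwise None."""
    absolute = urljoin(base, href)          # relative hrefs become absolute
    path = urlsplit(absolute).path          # query string and fragment are excluded
    if path.lower().endswith(".pdf"):
        return absolute
    return None


if __name__ == "__main__":
    for href in ("pdfs/sample-basic.pdf",
                 "https://pdfobject.com/pdf/sample-3pp.pdf",
                 "page.html"):
        print(href, "->", resolve_pdf_href(href))
```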

2. PDFs in query parameters

?file=document.pdf MATCH
?name=report.pdf MATCH
'pdf' in value but no .pdf extension NO MATCH
'pdf' in path but no .pdf extension NO MATCH
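
One way to express the query-parameter rule is to inspect each parameter value rather than the raw URL string. This is a sketch under that assumption; `has_pdf_query_value` is a hypothetical name, and the `?type=pdf` URL in the usage lines is made up to stand in for the "'pdf' in value" case.

```python
from urllib.parse import parse_qsl, urlsplit


def has_pdf_query_value(url):
    """True when any query parameter value ends with the .pdf extension."""
    query = urlsplit(url).query
    # parse_qsl yields (name, value) pairs; only the value is inspected.
    return any(value.lower().endswith(".pdf") for _, value in parse_qsl(query))


if __name__ == "__main__":
    for url in ("http://localhost:8111/download?file=document.pdf",
                "http://localhost:8111/api/fetch?name=report.pdf&token=abc",
                "http://localhost:8111/view?type=pdf",
                "http://localhost:8111/pdf-handler?id=1"):
        print(url, "->", has_pdf_query_value(url))
```

Checking values individually is what keeps `pdf` appearing as a bare value, or as part of the path, from counting as a match.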

3. False positive prevention

.pdf.exe NO MATCH
.pdfx NO MATCH
.pdf.bak NO MATCH
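
These three cases separate an anchored extension check from a substring check. The sketch below contrasts the two; both function names are hypothetical, and the anchored pattern is only one possible way to get the behaviour this section expects.

```python
import re

# \Z anchors at the very end of the string, so nothing may follow ".pdf".
PDF_SUFFIX = re.compile(r"\.pdf\Z", re.IGNORECASE)


def naive_contains_pdf(name):
    """Substring check: wrongly accepts report.pdf.exe, report.pdfx, report.pdf.bak."""
    return ".pdf" in name.lower()


def ends_with_pdf(name):
    """Anchored check: accepts only names whose final extension is .pdf."""
    return PDF_SUFFIX.search(name) is not None


if __name__ == "__main__":
    for name in ("report.pdf", "report.pdf.exe", "report.pdfx", "report.pdf.bak"):
        print(name, naive_contains_pdf(name), ends_with_pdf(name))
```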

4. Protocol filtering

mailto: NO MATCH
javascript: NO MATCH
ftp: NO MATCH
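
Protocol filtering can be sketched as an allow-list on the URL scheme. The page only states that `mailto:`, `javascript:` and `ftp:` are rejected; limiting the allow-list to `http` and `https` is an assumption made for this example, as is the name `has_allowed_scheme`.

```python
from urllib.parse import urlsplit

# Assumed allow-list; the test page only names the schemes that must be rejected.
ALLOWED_SCHEMES = {"http", "https"}


def has_allowed_scheme(absolute_url):
    """True when the already-resolved URL uses an allowed scheme."""
    return urlsplit(absolute_url).scheme.lower() in ALLOWED_SCHEMES


if __name__ == "__main__":
    for url in ("http://localhost:8111/pdfs/sample-basic.pdf",
                "mailto:someone@example.com?subject=file.pdf",
                "javascript:void(0)",
                "ftp://example.com/file.pdf"):
        print(url, "->", has_allowed_scheme(url))
```

The check is meant to run after relative hrefs have been resolved, since a relative href has no scheme of its own.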

5. Hash dedup (same URL, different hashes → 1 result)

#page=1 DEDUPED
#page=5 DEDUPED
no hash DEDUPED

All three links resolve to the same URL once the hash is removed, so they should produce one result (already counted in section 1).
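
Hash deduplication amounts to stripping the fragment before using the URL as a set key. The following sketch assumes the three links point at sample-university.pdf, which is inferred from the "deduped from 4 refs" note in the summary; `dedupe_ignoring_fragment` is a hypothetical name.

```python
from urllib.parse import urldefrag


def dedupe_ignoring_fragment(urls):
    """Return URLs with fragments removed, keeping first-seen order without repeats."""
    seen = {}  # dict preserves insertion order
    for url in urls:
        without_hash = urldefrag(url).url  # drops "#page=1" and similar
        seen.setdefault(without_hash, None)
    return list(seen)


if __name__ == "__main__":
    base = "http://localhost:8111/pdfs/sample-university.pdf"  # assumed target
    print(dedupe_ignoring_fragment([base + "#page=1", base + "#page=5", base]))
```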

6. Embedded PDF via <embed>

MATCH — test-embed.pdf

7. PDF via <object data>

MATCH — test-object.pdf

8. PDF in <iframe src>

MATCH — test-iframe.pdf

9. Iframe with inner HTML (same-origin recursion)

MATCH — recurses into iframe, finds test-area.pdf inside

10. Image map <area href>

PDF via area

MATCH — test-area.pdf via <area>
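
Sections 1 and 6 to 10 differ only in which element and attribute carries the URL. The sketch below gathers candidate URLs from static HTML using Python's standard `html.parser`; the `URL_ATTRIBUTES` mapping and the function names are illustrative assumptions. It does not reproduce the same-origin iframe recursion from section 9, which requires a live DOM.

```python
from html.parser import HTMLParser

# Element -> attribute that holds the URL, mirroring the sections on this page.
URL_ATTRIBUTES = {
    "a": "href",
    "area": "href",
    "embed": "src",
    "iframe": "src",
    "object": "data",
}


class CandidateUrlParser(HTMLParser):
    """Collects (tag, url) pairs from the elements listed in URL_ATTRIBUTES."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRIBUTES.get(tag)
        if wanted is None:
            return  # e.g. <img src> is deliberately ignored
        for name, value in attrs:
            if name == wanted and value:
                self.candidates.append((tag, value))


def collect_candidates(html):
    """Parse an HTML string and return the candidate (tag, url) pairs in document order."""
    parser = CandidateUrlParser()
    parser.feed(html)
    parser.close()
    return parser.candidates


if __name__ == "__main__":
    sample = ('<embed src="pdfs/test-embed.pdf">'
              '<object data="pdfs/test-object.pdf">fallback</object>'
              '<map><area href="pdfs/test-area.pdf"></map>'
              '<iframe src="pdfs/test-iframe.pdf"></iframe>')
    for tag, url in collect_candidates(sample):
        print(tag, url)
```

The candidates would still need to be resolved, filtered, and deduplicated before they are comparable to the expected results.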

Expected Results Summary

URL | Source | Expected
http://localhost:8111/pdfs/sample-accessible.pdf | <a href> | Match
http://localhost:8111/pdfs/sample-basic.pdf | <a href> | Match
http://localhost:8111/pdfs/sample-university.pdf | <a href> (deduped from 4 refs) | Match
https://pdfobject.com/pdf/sample-3pp.pdf | <a href> | Match
http://localhost:8111/download?file=document.pdf | <a href> query param | Match
http://localhost:8111/api/fetch?name=report.pdf&token=abc | <a href> query param | Match
http://localhost:8111/pdfs/test-embed.pdf | <embed src> | Match
http://localhost:8111/pdfs/test-object.pdf | <object data> | Match
http://localhost:8111/pdfs/test-iframe.pdf | <iframe src> | Match
http://localhost:8111/pdfs/test-area.pdf | <area href> + inner iframe | Match
page.html, example.com, pdf-handler, etc. | various | No match
report.pdf.exe, report.pdfx, etc. | various | No match
mailto:, javascript:, ftp: | various | No match

Total expected: 10 unique PDF URLs