There are many reasons you might need to find every URL on a website, but your exact goal will determine what you're looking for. For example, you might want to:
Identify every indexed URL to analyze issues like cannibalization or index bloat
Collect current and historic URLs Google has seen, especially for site migrations
Find all 404 URLs to recover from post-migration errors
In each scenario, a single tool won't give you everything you need. Unfortunately, Google Search Console isn't exhaustive, and a "site:example.com" search is limited and hard to extract data from.
In this post, I'll walk you through some tools to build your URL list before deduplicating the data with a spreadsheet or Jupyter Notebook, depending on your site's size.
Old sitemaps and crawl exports
If you're looking for URLs that recently disappeared from the live site, there's a chance someone on your team saved a sitemap file or a crawl export before the changes were made. If you haven't already, check for these files; they can often give you what you need. But if you're reading this, you probably didn't get that lucky.
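If you do turn up an old XML sitemap, pulling its URLs into a flat list takes just a few lines. Here's a minimal sketch assuming a local copy saved as sitemap.xml that uses the standard sitemap namespace:

```python
import xml.etree.ElementTree as ET

# Standard namespace used by most sitemap.xml files
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

tree = ET.parse("sitemap.xml")  # assumed local copy of the old sitemap
urls = [loc.text.strip() for loc in tree.getroot().findall(".//sm:loc", NS)]

with open("sitemap_urls.txt", "w") as f:
    f.write("\n".join(urls))

print(f"Extracted {len(urls)} URLs")
```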
Archive.org
Archive.org is a useful tool for SEO tasks, funded by donations. If you search for a domain and select the "URLs" option, you can access up to 10,000 listed URLs.
However, there are a few limitations:
URL limit: You can only retrieve up to 10,000 URLs, which is inadequate for larger sites.
Quality: Many URLs may be malformed or reference resource files (e.g., images or scripts).
No export option: There isn't a built-in way to export the list.
To get around the lack of an export button, use a browser scraping plugin like Dataminer.io. Still, these constraints mean Archive.org may not provide a complete solution for larger sites. Also, Archive.org doesn't indicate whether Google indexed a URL, but if Archive.org saw it, there's a good chance Google did too.
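As an alternative to scraping the interface, you can query the Wayback Machine's CDX API directly. The sketch below pulls archived URLs for a domain; the domain is a placeholder, and you may want to loosen or tighten the filters for your own site:

```python
import requests

# Wayback Machine CDX API: returns one archived capture per row
params = {
    "url": "example.com/*",      # placeholder domain
    "output": "json",
    "collapse": "urlkey",        # one row per unique URL
    "fl": "original",            # only return the original URL
    "filter": "statuscode:200",  # skip redirects and errors
}
resp = requests.get("http://web.archive.org/cdx/search/cdx", params=params, timeout=60)
rows = resp.json()

# First row is the header when output=json
urls = {row[0] for row in rows[1:]}
print(f"Found {len(urls)} archived URLs")
```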
Moz Pro
While you'd typically use a link index to find external sites linking to you, these tools also discover URLs on your own site in the process.
How to use it:
Export your inbound links in Moz Pro to get a quick and easy list of target URLs from your site. If you're dealing with a massive website, consider using the Moz API to export data beyond what's manageable in Excel or Google Sheets.
It's important to note that Moz Pro doesn't confirm whether URLs are indexed or discovered by Google. However, since most sites apply the same robots.txt rules to Moz's bots as they do to Google's, this approach generally works well as a proxy for Googlebot's discoverability.
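As a rough illustration of the API route, the sketch below queries Moz's Links API with Python. The endpoint, payload fields, and response shape shown here are assumptions based on Moz's v2 Links API and may differ for your account, so check the current documentation before relying on it:

```python
import requests

ACCESS_ID = "your-access-id"      # placeholder Moz API credentials
SECRET_KEY = "your-secret-key"

# Assumed v2 Links API endpoint and payload; verify against Moz's docs
payload = {
    "target": "example.com",      # placeholder domain
    "target_scope": "root_domain",
    "limit": 50,                  # page through results for large sites
}
resp = requests.post(
    "https://lsapi.seomoz.com/v2/links",
    json=payload,
    auth=(ACCESS_ID, SECRET_KEY),
    timeout=60,
)
resp.raise_for_status()

# Inspect the JSON to pull out the target URLs on your site;
# field names vary, so print a record first to see the structure.
for record in resp.json().get("results", []):
    print(record)
```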
Google Search Console
Google Search Console offers several valuable sources for building your list of URLs.
Links reports:
Similar to Moz Pro, the Links section provides exportable lists of target URLs. However, these exports are capped at 1,000 URLs each. You can apply filters for specific pages, but since filters don't apply to the export, you might have to rely on browser scraping tools, which are limited to 500 filtered URLs at a time. Not ideal.
Performance → Search Results:
This export gives you a list of pages receiving search impressions. While the export is limited, you can use the Google Search Console API for larger datasets. There are also free Google Sheets plugins that simplify pulling more extensive data.
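If you go the API route, a minimal query for pages with impressions looks something like the sketch below. It assumes you've already created OAuth credentials with Search Console access and saved them to token.json; the property URL and dates are placeholders:

```python
from googleapiclient.discovery import build
from google.oauth2.credentials import Credentials

# Assumes OAuth credentials with Search Console read access already exist
creds = Credentials.from_authorized_user_file("token.json")
service = build("searchconsole", "v1", credentials=creds)

body = {
    "startDate": "2024-01-01",
    "endDate": "2024-03-31",
    "dimensions": ["page"],
    "rowLimit": 25000,  # far beyond the UI export limit
}
response = service.searchanalytics().query(
    siteUrl="https://www.example.com/",  # placeholder property
    body=body,
).execute()

pages = [row["keys"][0] for row in response.get("rows", [])]
print(f"Retrieved {len(pages)} pages with impressions")
```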
Indexing → Pages report:
This section provides exports filtered by issue type, though these are also limited in scope.
Google Analytics
The Engagement → Pages and screens default report in GA4 is an excellent source for collecting URLs, with a generous limit of 100,000 URLs.
Even better, you can apply filters to create distinct URL lists, effectively surpassing the 100k limit. For example, if you want to export only blog URLs, follow these steps:
Step 1: Add a segment to the report
Step 2: Click "Create a new segment."
Step 3: Define the segment with a narrower URL pattern, such as URLs containing /blog/
Note: URLs found in Google Analytics may not be discoverable by Googlebot or indexed by Google, but they still offer useful insights.
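If clicking through segments in the UI gets tedious, the GA4 Data API can pull the same page paths programmatically. This is a minimal sketch assuming a service account with access to the property; the property ID, date range, and /blog/ filter are placeholders:

```python
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
    FilterExpression, Filter,
)

# Assumes GOOGLE_APPLICATION_CREDENTIALS points at a service-account key
client = BetaAnalyticsDataClient()

request = RunReportRequest(
    property="properties/123456789",  # placeholder GA4 property ID
    date_ranges=[DateRange(start_date="2024-01-01", end_date="2024-03-31")],
    dimensions=[Dimension(name="pagePath")],
    metrics=[Metric(name="screenPageViews")],
    dimension_filter=FilterExpression(
        filter=Filter(
            field_name="pagePath",
            string_filter=Filter.StringFilter(
                match_type=Filter.StringFilter.MatchType.CONTAINS,
                value="/blog/",
            ),
        )
    ),
    limit=100000,
)
response = client.run_report(request)

paths = [row.dimension_values[0].value for row in response.rows]
print(f"Retrieved {len(paths)} blog paths")
```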
Server log files
Server or CDN log files are perhaps the ultimate tool at your disposal. These logs capture an exhaustive list of every URL path requested by users, Googlebot, or other bots during the recorded period.
Considerations:
Data size: Log files can be huge, and many sites only retain the last two months of data.
Complexity: Analyzing log files can be challenging, but a variety of tools are available to simplify the process.
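For a quick pass without specialist tooling, a few lines of Python can extract the unique request paths from a standard access log. This sketch assumes a combined-format log saved as access.log; real log formats vary, so adjust the regex to match yours:

```python
import re
from collections import Counter

# Matches the request portion of a common/combined log line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

paths = Counter()
with open("access.log", encoding="utf-8", errors="replace") as f:
    for line in f:
        match = REQUEST_RE.search(line)
        if match:
            paths[match.group(1)] += 1

print(f"{len(paths)} unique paths requested")
for path, hits in paths.most_common(20):
    print(hits, path)
```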
Combine, and good luck
Once you've collected URLs from these sources, it's time to combine them. If your site is small enough, use Excel; for larger datasets, use tools like Google Sheets or a Jupyter Notebook. Make sure all URLs are consistently formatted, then deduplicate the list.
And voilà, you now have a comprehensive list of current, old, and archived URLs. Good luck!
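If you take the notebook route, a short pandas sketch can handle the merge and cleanup. It assumes each source has been saved as a one-URL-per-line text file; the filenames and normalization rules are placeholders to adapt to your site:

```python
import pandas as pd

# Placeholder filenames: one URL per line, exported from each source above
sources = ["sitemap_urls.txt", "archive_urls.txt", "gsc_urls.txt", "ga4_urls.txt", "log_urls.txt"]

frames = [pd.read_csv(path, header=None, names=["url"]) for path in sources]
urls = pd.concat(frames, ignore_index=True)

# Normalize formatting so near-duplicates collapse together
# (adjust if your site treats trailing slashes or protocols differently)
urls["url"] = urls["url"].str.strip().str.rstrip("/")

urls = urls.drop_duplicates().sort_values("url")
urls.to_csv("all_urls.csv", index=False)
print(f"{len(urls)} unique URLs written to all_urls.csv")
```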