About webCrawler
Welcome! This might be the project I enjoyed making the most. It is a Golang CLI application that crawls the HTML of any single website and generates a report of its internal links.
Tech stack: Golang | HTML
Click here to visit this project's GitHub page!
Learning goals
- Get hands-on practice with local Go development and tooling
- Practice making HTTP requests in Go
- Learn how to parse HTML with Go (see the sketch after this list)
- Practice unit testing
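Go's standard library does not ship an HTML parser, so the parsing goal above meant reaching for the golang.org/x/net/html package. The sketch below shows the general shape of that step: walk the parsed node tree, collect every anchor's href, and resolve relative links against the base URL. The function name and signature here are illustrative assumptions, not necessarily what the repo uses.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"

	"golang.org/x/net/html"
)

// getURLsFromHTML collects every <a href="..."> in an HTML document and
// resolves relative links against the base URL. Illustrative sketch; the
// real function in the repo may be named or shaped differently.
func getURLsFromHTML(htmlBody, rawBaseURL string) ([]string, error) {
	doc, err := html.Parse(strings.NewReader(htmlBody))
	if err != nil {
		return nil, fmt.Errorf("couldn't parse HTML: %w", err)
	}
	baseURL, err := url.Parse(rawBaseURL)
	if err != nil {
		return nil, fmt.Errorf("couldn't parse base URL: %w", err)
	}

	var urls []string
	var walk func(*html.Node)
	walk = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, attr := range n.Attr {
				if attr.Key == "href" {
					// Turn relative hrefs into absolute internal URLs.
					if href, err := url.Parse(attr.Val); err == nil {
						urls = append(urls, baseURL.ResolveReference(href).String())
					}
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			walk(c)
		}
	}
	walk(doc)
	return urls, nil
}

func main() {
	body := `<html><body><a href="/posts">posts</a></body></html>`
	urls, _ := getURLsFromHTML(body, "https://example.com")
	fmt.Println(urls) // [https://example.com/posts]
}
```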
Demo
In the video next to this, you will see me using the web crawler. I am using the URL of Lane Wagner's blog; Lane is the creator of Boot.dev, where I gained all my software knowledge. The first number I enter, 2, limits the number of goroutines the program can use, so it does not consume all my memory. If no limit is entered, the default is 3. The next number, 30, limits how many pages are retrieved from the base URL. Since the program works concurrently, it would otherwise try to load every single page under the base URL, which is too much data to wait for and too much for a regular laptop to handle. The default for this value is 5.
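Under the hood, that goroutine limit is the classic buffered-channel semaphore pattern: each crawl goroutine sends a token into the channel before doing work and receives it back when done, so at most N crawls ever run at once, while a mutex-guarded page map enforces the page cap. Here is a minimal sketch of that wiring, assuming a config struct of my own invention; the names in the actual repo may differ, and the fetching step is elided.

```go
package main

import (
	"fmt"
	"sync"
)

// config carries the crawl state shared across goroutines.
// Field names are illustrative assumptions, not the repo's exact ones.
type config struct {
	pages              map[string]int // normalized URL -> visit count
	maxPages           int            // stop once this many pages are recorded
	mu                 *sync.Mutex    // guards pages
	concurrencyControl chan struct{}  // buffered channel used as a semaphore
	wg                 *sync.WaitGroup
}

func (cfg *config) crawlPage(rawCurrentURL string) {
	cfg.concurrencyControl <- struct{}{} // acquire a slot; blocks at the limit
	defer func() {
		<-cfg.concurrencyControl // release the slot
		cfg.wg.Done()
	}()

	cfg.mu.Lock()
	if len(cfg.pages) >= cfg.maxPages {
		cfg.mu.Unlock()
		return // page cap reached, stop spawning work
	}
	cfg.pages[rawCurrentURL]++
	firstVisit := cfg.pages[rawCurrentURL] == 1
	cfg.mu.Unlock()
	if !firstVisit {
		return // already crawled this page
	}

	// Elided: fetch rawCurrentURL, extract its links (see the parsing
	// sketch above), then for each internal link:
	//   cfg.wg.Add(1)
	//   go cfg.crawlPage(link)
}

func main() {
	cfg := &config{
		pages:              make(map[string]int),
		maxPages:           30, // second number entered in the demo
		mu:                 &sync.Mutex{},
		concurrencyControl: make(chan struct{}, 2), // first number entered in the demo
		wg:                 &sync.WaitGroup{},
	}
	cfg.wg.Add(1)
	go cfg.crawlPage("https://wagslane.dev")
	cfg.wg.Wait()
	fmt.Println(cfg.pages)
}
```

A buffered channel keeps the limiting logic to a few lines and lets the recursive crawl stay simple: spawning a goroutine per link is fine, because each one just parks on the semaphore until a slot frees up.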