i'm starting work on a search engine

I’ve decided to try and code a web search engine.

No, there’s not much reason behind it, but I figure I’ll learn a ton along the way and might end up with something cool. I’m under no obligation to finish the project and have no deadline for it, so it’s something I can pick up and work on whenever I feel like I need something to do.

progress so far

So far, I’ve been working on the crawler, the piece of code that goes around the web, finds cool pages, and saves information about them.

I’m writing it entirely as bash scripts, which has been a challenging but rewarding choice for the task.

At the moment, the crawler collects the following data about each page:

  • title
  • description
  • author
  • language
  • links to web feeds
  • keywords
  • whether they use JavaScript
  • whether they have ads
  • whether they have invasive tracking/analytics

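As a rough illustration (a hypothetical sketch, not the actual crawler code), here’s how a few of these fields might be pulled out of a saved HTML page using bash and standard text tools. The function names `page_title`, `page_language`, `uses_javascript`, and `feed_links` are made up for this example, and the regex approach is a simplifying assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: a regex-based sketch of extracting a few page fields
# from HTML on stdin. Not the real crawler; it will miss edge cases such as
# single-quoted attributes, unusual attribute orders, or tags inside comments.

# Collapse the page onto one line so tags split across lines still match.
flatten() {
  tr '\n\r' '  '
}

# Print the text inside the first <title>...</title>, or nothing.
page_title() {
  flatten | grep -oiE '<title[^>]*>[^<]*</title>' | head -n 1 \
    | sed -E 's/<[^>]*>//g'
}

# Print the value of the lang attribute on the <html> tag, if any.
page_language() {
  flatten | grep -oiE '<html[^>]*>' | head -n 1 \
    | grep -oiE 'lang="[^"]*"' | head -n 1 \
    | sed -E 's/^[^"]*"//; s/"$//'
}

# Print "yes" if the page contains a <script> tag, otherwise "no".
uses_javascript() {
  if grep -qi '<script'; then echo yes; else echo no; fi
}

# Print the href of each <link> tag that advertises an RSS or Atom feed.
feed_links() {
  flatten | grep -oiE '<link[^>]*>' \
    | grep -iE 'application/(rss|atom)\+xml' \
    | grep -oiE 'href="[^"]*"' \
    | sed -E 's/^[^"]*"//; s/"$//'
}

# Example:
#   curl -s https://example.com/ > page.html
#   page_title < page.html
#   uses_javascript < page.html
```

Each helper reads the page from stdin, so they compose with whatever fetched the page. Regex matching keeps the sketch dependency-free, at the cost of being fragile on messy real-world HTML.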
So far I’ve crawled over 25,000 pages, which gives me a starting index to develop the rest of the search engine against.

The crawler definitely needs some refining, but it’s pretty good for a first effort. I’ll likely write more posts about the different challenges I ran into along the way.

plans

I do plan to open source the search engine and crawler if I get far enough, but not yet: I don’t want people taking my unpolished crawler, with all its potential issues, and scraping the web with it.

Also, I haven’t decided on a name for the search engine or crawler. I’m brainstorming, but if you have ideas, shoot me a message!

I also don’t know yet what programming language I’ll use for the search engine’s backend. If you have suggestions, I’m open to hearing them.

conclusion

I’ve had fun so far, which is most of what matters to me for this project. I’ll keep trucking and keep you posted; a number of people have told me they’re interested in the final result.



If you like the work I do, please consider supporting me on Liberapay!
