Course Project

 

This project counts for 30% of your grade.

 

In this project, you will explore the free software available online for building search engines. Much of this software can be found at online open-source software resources, or you can use search engines to locate it.

 

Using this software, you will crawl the web to build a 1000-document database and index it. You will then build a query engine that accesses the index and returns ranked results. You must also provide an interface that lets others query your database.
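As a rough illustration of the index-and-rank step described above, here is a minimal sketch of an inverted index with a simple TF-IDF style score. The function names (tokenize, build_index, search) and the scoring formula are assumptions chosen for illustration; they are not a required design, and the open-source packages you find will do this in their own ways.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase the text and split it into alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # docs maps a document id to its text.
    # The index maps each term to {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query, top_k=10):
    # Score each document by summing tf * idf over the query terms.
    # The idf form log(1 + N/df) is one illustrative choice among many.
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

For example, calling build_index on a dictionary of crawled page texts and then search(index, len(docs), "solar") would return (doc_id, score) pairs, best match first. A real project would typically add stop-word removal, stemming, and length normalization.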

 

Your 1000-document database must cover an “interesting” topic and should be approved by the instructor. Please see the instructor for examples. Students will be permitted to use already crawled collections but must crawl them themselves to build their index. Part of this exercise is to crawl for “your” collection.


Please exercise good judgement in what you crawl and respect the robots exclusion protocol (robots.txt).
DO NOT CRAWL SEARCH ENGINES.
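
One way to honor robots.txt rules is to check each URL before fetching it. The sketch below uses Python's standard urllib.robotparser module. The helper name make_checker and the user agent string "CourseProjectBot" are hypothetical, and the rules are parsed from a string here only so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser


def make_checker(robots_txt, user_agent):
    # Parse the text of a robots.txt file and return a function that
    # reports whether the given user agent may fetch a URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


# Hypothetical rules: everything is allowed except paths under /private/.
rules = "User-agent: *\nDisallow: /private/\n"
may_fetch = make_checker(rules, "CourseProjectBot")
```

In an actual crawl you would more likely call set_url() with the site's robots.txt address and then read() to load the live rules, and skip any URL for which can_fetch() returns False.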

 

Your topic must be decided by Oct 10.

 

For this project, you need to:

 

1.      Provide evidence of the crawled documents, the built index, and the query engine.

2.      Provide an interface that the instructor can test and use on the web.

3.      Explain what you did, why you did it, and how you did it.

 

A professional-quality hardcopy report describing the above is due on the first day of the exam schedule. The report must be submitted as a hardcopy to receive full credit.

 

Students must provide a link to their search engine query interface, and the interface must remain available for the entire exam week.

 

*************************************************