Attributions




Jack's Attribution:

Over the course of the quarter I implemented the following:

The crawler itself was simple to modify once the right location in the code base was found. Some further modifications were made to allow multiple instances to run from one installation of Heritrix.

The client/server system came next. It was designed so that we could easily crawl more seeds, log the results, and distribute them to the image processor. I designed the architecture and coded this system.

Indexing the webcams was made rather easy through the use of Lucene. Lucene was designed to be easy to incorporate into a project, and it was. This code was tightly integrated into the location algorithms, which, while poor software engineering, was quick to write and is still quite comprehensible thanks to the simplicity of Lucene.

The location algorithms were the most interesting challenge that I took on this quarter. Using the location data that Kia scraped, I was able to identify location names parsed from the HTML. Determining how best to derive a location hypothesis was challenging, especially since many websites link to webcams in different cities and countries. I feel that my results are good given that no learning algorithm is used.
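
One simple way to turn matched location names into a single hypothesis is a frequency vote. The following is only a minimal sketch under that assumption, not the algorithm actually used in the project; the class name LocationVote, the method bestGuess, and the lower-cased set of known place names are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: picks the known place name mentioned most often on a page.
final class LocationVote {
    private LocationVote() {}

    /**
     * Returns the known place name that occurs most often in the tokens,
     * or null if no token matches. Names in knownPlaces are assumed to be
     * lower case. On a tie, the name that reached the top count first wins.
     */
    static String bestGuess(List<String> tokens, Set<String> knownPlaces) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String token : tokens) {
            String name = token.toLowerCase();
            if (!knownPlaces.contains(name)) {
                continue; // not a known location name
            }
            int count = counts.merge(name, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = name;
            }
        }
        return best;
    }
}
```

A page that links to webcams in many cities would produce near-equal counts here, which is exactly the case the paragraph above describes as hard; a real implementation would need extra evidence to break such ties.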

In the last week of the course I became enthralled with the idea of incorporating a Google Maps hack into our project. Information on how to accomplish this is available on the web, but it is not completely coherent. After many hours I finally got it to work, and in the process I solved a few bugs but created a new one. It appears that the current Google code does not support markers being placed on both sides of the zero-degree longitude line. I fixed this somewhat, but introduced a less visible scrolling bug. I still consider this less annoying than the original behavior.

Besides being enjoyable to implement, the map shows off our collection of webcams far better than would otherwise be possible, which makes our project seem significantly more impressive.

I also administer the web server and wrote several of the JSP pages, which are based heavily on the code provided with Lucene and 'Google Standalone'.

The camera icon on the maps was provided by Josh Day, a good friend of mine.

Jim's Attribution:

Over the course of the seven weeks, Jack and I met daily to work on the project. Of the modules listed on the architecture diagram, I implemented the following:

Of these modules, the image processor was the most time-consuming to implement. Since the image processor is the main module used to classify a webpage as a webcam page, a great deal of time was spent on it to ensure precision. The most challenging tasks involved fixing synchronization problems and unexpected timeout deadlocks that come with multithreaded programming. The original implementation processed only about 100 webpages per hour. The final version, with four priority queues, had about ten times the throughput.
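
The text does not say how the four priority queues were organized. As a minimal sketch, assuming they act as four priority levels that worker threads drain from highest to lowest, the scheduling could look like the following; the class PriorityLevels and its methods are hypothetical and are not taken from the project code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical scheduler: one lock-free FIFO queue per priority level.
final class PriorityLevels {
    private final List<ConcurrentLinkedQueue<String>> levels = new ArrayList<>();

    PriorityLevels(int levelCount) {
        for (int i = 0; i < levelCount; i++) {
            levels.add(new ConcurrentLinkedQueue<>());
        }
    }

    /** Enqueues a URL at the given level; level 0 is served first. */
    void submit(int level, String url) {
        levels.get(level).add(url);
    }

    /**
     * Returns the next URL from the highest-priority non-empty level,
     * or null if every level is empty. Safe to call from several worker
     * threads because each queue's poll() is atomic and takes no locks.
     */
    String poll() {
        for (ConcurrentLinkedQueue<String> queue : levels) {
            String url = queue.poll();
            if (url != null) {
                return url;
            }
        }
        return null;
    }
}
```

Because no thread ever holds a lock while waiting on another queue, this arrangement avoids the kind of deadlock mentioned above, at the cost of workers having to retry or sleep when poll() returns null.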

The thumbnail generator and file system was the next most time-consuming module I worked on. The first implementation of the thumbnail generator was sequential, and it took roughly four and a half hours to archive the thumbnails. In the next implementation, I made it a multithreaded program. Of course, this introduced more problems into the system due to unprotected critical sections and deadlocks. Even after most of the synchronization problems were fixed, the efficiency did not improve much over the sequential implementation. As a third attempt at optimization, the program was divided into sections run by separate processes. This improved the throughput greatly: I was able to archive all the thumbnail images in under an hour.

After hearing at a progress meeting how cool it would be to have an animated history, I took up the challenge and implemented the GIF generator. The code wasn't too difficult to implement once the right source files were found. The most complex part was integrating the GIF generator into the front end.

I spent the last week developing the front end of our search engine with Jack. Writing JSP pages was actually quite interesting once I got past the learning curve.

Kiarash's Attribution:

My main areas were information gathering and indexing. The following is a more in-depth list of my responsibilities.

Sources used.

* - Indicates that a modified version was used, as opposed to simply using the API, code, or data provided by the website as of 6/05/05.