Wednesday, April 21, 2010

Main Project : WhiZearch

My Academic Main Project : WhiZearch.

Duration : 4 months
Key technologies used :
1. Cluster management technology - Apache Hadoop 0.20.1
2. Map-Reduce parallel programming framework
3. Semantic Web technologies such as RDF and SPARQL

Abstract: WhiZearch is a web-based search engine meant for the research community as well as students who wish to find details about the various publications and journals scattered in an unstructured manner across the web. It also helps the end user find all available information about a specific author/researcher, be it the author’s works or personal details such as mail ID, phone number, address, interests and educational profile.
A login module with a search history feature is also included.

WhiZearch uses Semantic Web technologies like RDF and SPARQL for data storage, processing and retrieval, and it also uses the power of the Map-Reduce programming framework on a Hadoop distributed platform in the back end. These features set it apart from other available web services that offer similar functionality to the end user.

DBLP online data and bibliography details in BibTeX format are used to create RDF models of publication and author details. In the map function, all RDF entries matching the user query are emitted as key-value pairs; in the reduce function, RDF querying is done over all entries that share a common key.
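
The map and reduce steps described above can be illustrated with a small, self-contained Java sketch. This is not the project's actual code: it uses no Hadoop or Jena classes, represents each RDF entry as a plain (subject, predicate, object) string array, and replaces the SPARQL query in the reduce step with a simple predicate filter. The class name, the method names, the choice of the subject as the common key and the sample data are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the map/reduce flow; not Hadoop API code.
class MapReduceSketch {

    // Map step (plus grouping): keep every triple whose object mentions the
    // query term and group it under its subject, which serves as the key.
    static Map<String, List<String[]>> mapAndGroup(List<String[]> triples, String query) {
        String needle = query.toLowerCase(Locale.ROOT);
        Map<String, List<String[]>> grouped = new TreeMap<>();
        for (String[] t : triples) {
            if (t[2].toLowerCase(Locale.ROOT).contains(needle)) {
                grouped.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
            }
        }
        return grouped;
    }

    // Reduce step: for one key, "query" its triples. A real system would run
    // SPARQL here; this stand-in just selects objects of the given predicate.
    static List<String> reduce(List<String[]> group, String predicate) {
        List<String> out = new ArrayList<>();
        for (String[] t : group) {
            if (t[1].equals(predicate)) {
                out.add(t[2]);
            }
        }
        return out;
    }

    // Runs map, grouping and reduce in sequence on a single machine.
    static Map<String, List<String>> search(List<String[]> triples, String query, String predicate) {
        Map<String, List<String>> results = new TreeMap<>();
        for (Map.Entry<String, List<String[]>> e : mapAndGroup(triples, query).entrySet()) {
            List<String> values = reduce(e.getValue(), predicate);
            if (!values.isEmpty()) {
                results.put(e.getKey(), values);
            }
        }
        return results;
    }

    // Made-up sample data standing in for RDF entries built from DBLP/BibTeX.
    static List<String[]> sampleTriples() {
        return Arrays.asList(
            new String[] {"pub1", "title", "Hadoop in Practice"},
            new String[] {"pub1", "author", "A. Kumar"},
            new String[] {"pub2", "title", "Semantic Web Primer"},
            new String[] {"pub3", "title", "Scaling Hadoop Clusters"},
            new String[] {"pub3", "keyword", "hadoop"});
    }

    public static void main(String[] args) {
        System.out.println(search(sampleTriples(), "hadoop", "title"));
    }
}
```

With the sample data, a search for "hadoop" restricted to the "title" predicate returns the titles of pub1 and pub3. On a real Hadoop cluster the grouping by key would be done by the framework's shuffle phase between the map and reduce tasks rather than by a TreeMap in one process.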

Other technologies used : Eclipse EE IDE for Java/J2EE development, Jena APIs for the Semantic Web implementation, Servlets, JSP, Tomcat server, etc.

My main project has been a good platform to learn about the latest cluster management technology, Hadoop, as well as state-of-the-art Semantic Web technologies like RDF and SPARQL. We tested our web service on a Hadoop cluster with one NameNode and 5 blade servers added as DataNodes.
We used the Prezi tool for the final project presentation. It went really great. :-)

Bye,
Narayanan
