java.lang.Object
  edu.psu.ist.youseer.Worker
public class Worker implements java.lang.Runnable
Description: This is the basic unit of execution. Each worker is responsible for parsing a document and generating the corresponding Solr document. During processing, the SubmitterDocument is passed to the CustomeExtractor to check whether the user has implemented any specific extraction functions.
Copyright: Copyright Madian Khabsa @ Penn State(c) 2009
Company: Penn State
| Field Summary | |
|---|---|
| private SubmitterDocument | doc |
| private ARCSubmitter | parent |
| Constructor Summary | |
|---|---|
| Worker(ARCSubmitter parent, SubmitterDocument doc) | |
| Method Summary | | |
|---|---|---|
| java.lang.String | GenerateDocument() | Generates the Solr document for the processed ARC record using the tags from the configuration file. |
| private static java.lang.String | getTitle(net.htmlparser.jericho.Source source) | Extracts the title from a text document using the Jericho parser. |
| private boolean | InsertToDB(java.lang.String result) | Inserts a log entry into the database recording that the current document was not submitted to the index. |
| boolean | ProcessBinaryDocument() | Processes the binary document: converts it to plain text using Apache Tika, then extracts the title of the file. |
| boolean | ProcessTextDocument() | Processes the text document: extracts the title and strips the HTML tags. |
| void | run() | |
| java.lang.String | sendPostCommand(java.lang.String command, java.lang.String url) | Sends a POST request to the server. Courtesy of Grant Ingersoll @ IBM. |
| java.lang.String | StripHTML(java.lang.String rawString) | Strips the HTML tags from the text. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
private ARCSubmitter parent
private SubmitterDocument doc
| Constructor Detail |
|---|
public Worker(ARCSubmitter parent,
SubmitterDocument doc)
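Since Worker exposes run() from java.lang.Runnable, one plausible way to drive many workers is a fixed-size thread pool. The sketch below is an illustration only: ARCSubmitter and SubmitterDocument are not shown on this page, so `DemoWorker`, `processAll`, the `String` document type, the shared counter, and the 30-second timeout are all hypothetical stand-ins rather than the project's actual scheduling code.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerPoolSketch {

    // Hypothetical stand-in for Worker: one Runnable per document.
    static final class DemoWorker implements Runnable {
        private final AtomicInteger processed; // stands in for state owned by the parent ARCSubmitter
        private final String doc;              // stands in for SubmitterDocument

        DemoWorker(AtomicInteger processed, String doc) {
            this.processed = processed;
            this.doc = doc;
        }

        @Override
        public void run() {
            // A real worker would parse the document and build the Solr document here.
            if (!doc.isEmpty()) {
                processed.incrementAndGet();
            }
        }
    }

    // Runs one worker per document on a fixed-size pool and returns how many were processed.
    static int processAll(List<String> docs, int threads) {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String doc : docs) {
            pool.execute(new DemoWorker(processed, doc));
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("<html>a</html>", "<html>b</html>"), 2));
    }
}
```

Passing the shared state through the constructor mirrors the `Worker(parent, doc)` signature above: each task carries its own document and a reference back to whatever coordinates the batch.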
| Method Detail |
|---|
public void run()
run in interface java.lang.Runnable
private boolean InsertToDB(java.lang.String result)
result - String The exception error message
public boolean ProcessTextDocument()
public boolean ProcessBinaryDocument()
public java.lang.String GenerateDocument()
doc - SubmitterDocument
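To make the idea of "generating a Solr document" concrete, here is a minimal sketch that builds a Solr XML `<add>` command from field name/value pairs. It is not the body of GenerateDocument(): the class `SolrDocSketch`, its helpers, and the field names `id` and `title` are assumptions for illustration, and the real method takes its tags from a configuration file that is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrDocSketch {

    // Escapes the characters that are special in XML text and attribute values.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds a Solr XML <add> command containing a single <doc>.
    static String toAddCommand(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
               .append(escapeXml(e.getValue()))
               .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical field names; the real tags come from the configuration file.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "http://example.org/page.html");
        fields.put("title", "Tom & Jerry");
        System.out.println(toAddCommand(fields));
    }
}
```

Escaping matters because titles and body text extracted from crawled pages routinely contain `&` and `<`, which would otherwise make the update message malformed.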
public java.lang.String StripHTML(java.lang.String rawString)
rawString - String
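The following sketch illustrates what tag stripping and title extraction amount to. Note the substitution: Worker uses the Jericho parser, whereas this example uses plain `java.util.regex` so that it runs without extra libraries. The class `HtmlTextSketch` and its methods are hypothetical, and a regex approach is cruder than a real HTML parser (for example, it does not decode character entities).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlTextSketch {

    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("<(script|style)[^>]*>.*?</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the whitespace-collapsed text of the first <title> element, or null if there is none.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).replaceAll("\\s+", " ").trim() : null;
    }

    // Drops script/style blocks, then every remaining tag, and collapses whitespace.
    static String stripHtml(String html) {
        String noScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        String noTags = TAG.matcher(noScripts).replaceAll(" ");
        return noTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title> My\n Page </title></head>"
                + "<body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractTitle(html));
        System.out.println(stripHtml(html));
    }
}
```

Tags are replaced with a space rather than removed outright so that words separated only by markup (such as adjacent table cells) do not run together.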
public java.lang.String sendPostCommand(java.lang.String command,
java.lang.String url)
throws java.lang.Exception
command - String the command to be sent
url - String the URL of the server
java.lang.Exception
private static java.lang.String getTitle(net.htmlparser.jericho.Source source)
source - Source
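For the sendPostCommand(command, url) method documented above, a POST of an XML command can be sketched with `java.net.HttpURLConnection` from the standard library. This is an assumption about the general technique, not the method's actual body: the class `PostSketch`, the `text/xml` content type, and the handling of non-200 responses are illustrative choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PostSketch {

    // Reads the whole stream and decodes it as UTF-8.
    static String readBody(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Posts the command as a UTF-8 XML body and returns the response body.
    static String post(String command, String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true); // we are going to send a request body
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(command.getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Server returned HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readBody(in);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PostSketch "<commit/>" <update-url>; the URL is supplied by the caller.
        if (args.length == 2) {
            System.out.println(post(args[0], args[1]));
        }
    }
}
```

Raising an exception on a non-200 status gives the caller a message it can log, which fits the InsertToDB(result) parameter described as "the exception error message".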