A B C D E F G I L M O P R S T U V W

A

ARCSubmitter - Class in edu.psu.ist.youseer
Title: ARCSubmitter
ARCSubmitter() - Constructor for class edu.psu.ist.youseer.ARCSubmitter
 

B

ByteContent - Variable in class edu.psu.ist.youseer.SubmitterDocument
The binary content of any non-text document

C

CACHE - Variable in class edu.psu.ist.youseer.SubmitterConfig
The Solr field that will store the location of the cached version of this file
CacheFolder - Variable in class edu.psu.ist.youseer.ARCSubmitter
The virtual path that will have the ARC files in it
CacheFolder - Variable in class edu.psu.ist.youseer.SubmitterConfig
The path that will have the ARC files under it
Config - Variable in class edu.psu.ist.youseer.ARCSubmitter
Configuration object
ContainingFile - Variable in class edu.psu.ist.youseer.SubmitterDocument
The absolute path of the ARC file that contains this document
Count - Variable in class edu.psu.ist.youseer.ARCSubmitter
Number of documents submitted so far

D

DatabaseProvider - Variable in class edu.psu.ist.youseer.SubmitterConfig
Database provider, read from the XML configuration file
DataType - Variable in class edu.psu.ist.youseer.SubmitterDocument
i.e.
DBConnectionString - Variable in class edu.psu.ist.youseer.SubmitterConfig
The database connection string, read from the XML configuration file
doc - Variable in class edu.psu.ist.youseer.Worker
 
DOCUMENT_TEXT - Variable in class edu.psu.ist.youseer.SubmitterConfig
The Solr field that will store the content

E

edu.psu.ist.youseer - package edu.psu.ist.youseer
 
Extractor - Class in edu.psu.ist.youseer
 
Extractor() - Constructor for class edu.psu.ist.youseer.Extractor
 

F

FILE_TYPE - Variable in class edu.psu.ist.youseer.SubmitterConfig
The Solr field that will store the file type of the document
FlushIndexedDocs() - Method in class edu.psu.ist.youseer.ARCSubmitter
Inserts all the processed URLs (ARC records) into the database.

G

GenerateCustomeData(SubmitterDocument) - Static method in class edu.psu.ist.youseer.Extractor
 
GenerateCustomeData1(SubmitterDocument) - Static method in class edu.psu.ist.youseer.Extractor
 
GenerateDocument() - Method in class edu.psu.ist.youseer.Worker
Generates the Solr document for the processed ARC record using the tags from the configuration file
getByteContent() - Method in class edu.psu.ist.youseer.SubmitterDocument
 
getContainingFile() - Method in class edu.psu.ist.youseer.SubmitterDocument
 
getDataType() - Method in class edu.psu.ist.youseer.SubmitterDocument
 
getOffset() - Method in class edu.psu.ist.youseer.SubmitterDocument
 
getRawTextContent() - Method in class edu.psu.ist.youseer.SubmitterDocument
 
getStrippedTextContent() - Method in class edu.psu.ist.youseer.SubmitterDocument
 
getTitle() - Method in class edu.psu.ist.youseer.SubmitterDocument
 
getTitle(Source) - Static method in class edu.psu.ist.youseer.Worker
Extracts the title from a text document using the Jericho parser
getUrl() - Method in class edu.psu.ist.youseer.SubmitterDocument
 

I

IndexedDocs - Variable in class edu.psu.ist.youseer.ARCSubmitter
List of documents that have been indexed but not yet inserted into the database
IndexedTypes - Variable in class edu.psu.ist.youseer.SubmitterConfig
The set of all indexable file types, populated from the XML configuration file
IndexURL - Variable in class edu.psu.ist.youseer.SubmitterConfig
The URL of the index
InsertToDB(File) - Method in class edu.psu.ist.youseer.ARCSubmitter
Inserts this file into the database when the submitter completes processing all its records
InsertToDB(String) - Method in class edu.psu.ist.youseer.Worker
Inserts a log entry into the database recording that the current document was not submitted to the index
IsIndexed(File) - Method in class edu.psu.ist.youseer.ARCSubmitter
Checks whether the file has already been submitted to the index
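
The IndexedDocs and FlushIndexedDocs() entries above describe a buffer of indexed documents that is later written to the database. A minimal sketch of that buffer-then-flush pattern follows. The class name, the flush threshold, and the Consumer sink that stands in for the database are all assumptions made for illustration; none of them are taken from ARCSubmitter itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffers indexed URLs and hands them to a sink in batches.
final class IndexedDocsBufferSketch {
    private final List<String> indexedDocs = new ArrayList<>();
    private final int flushThreshold;            // assumed batch size
    private final Consumer<List<String>> sink;   // stands in for the database insert

    IndexedDocsBufferSketch(int flushThreshold, Consumer<List<String>> sink) {
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    // Records one indexed URL and flushes once the buffer reaches the threshold.
    synchronized void add(String url) {
        indexedDocs.add(url);
        if (indexedDocs.size() >= flushThreshold) {
            flush();
        }
    }

    // Passes a copy of the buffered URLs to the sink, clears the buffer,
    // and returns how many URLs were flushed.
    synchronized int flush() {
        int flushed = indexedDocs.size();
        if (flushed > 0) {
            sink.accept(new ArrayList<>(indexedDocs));
            indexedDocs.clear();
        }
        return flushed;
    }

    // Number of URLs indexed but not yet flushed.
    synchronized int pending() {
        return indexedDocs.size();
    }
}
```

Batching in this way keeps the number of database round trips small; a final flush() call is still needed once processing ends so that a partially filled buffer is not lost.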

L

LINE_SEP - Static variable in class edu.psu.ist.youseer.ARCSubmitter
Line separator

M

main(String[]) - Static method in class edu.psu.ist.youseer.ARCSubmitter
 

O

OFFSET - Variable in class edu.psu.ist.youseer.SubmitterConfig
The Solr field that will store the offset within the ARC file
Offset - Variable in class edu.psu.ist.youseer.SubmitterDocument
The offset within the ARC file
OrgiginalPart - Variable in class edu.psu.ist.youseer.ARCSubmitter
The root folder of the ARC files
OriginalPart - Variable in class edu.psu.ist.youseer.SubmitterConfig
The absolute path of the root folder that contains the ARC files

P

parent - Variable in class edu.psu.ist.youseer.Worker
 
ProcessBinaryDocument() - Method in class edu.psu.ist.youseer.Worker
Processes the binary document, converts it to plain text using Apache Tika, and then extracts the title of the file
ProcessFolder(String) - Method in class edu.psu.ist.youseer.ARCSubmitter
Processes a folder full of ARC files, or subfolders containing ARC files.
ProcessTextDocument() - Method in class edu.psu.ist.youseer.Worker
Processes the text document, extracts the title, and strips the HTML tags
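
ProcessTextDocument() above mentions stripping HTML tags from a text document. The sketch below shows one simple way to do that with regular expressions. It is an illustrative assumption, not the Worker implementation (the index only says that title extraction uses the Jericho parser), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: regex-based tag stripping for simple HTML input.
final class StripHtmlSketch {
    // Matches whole <script>...</script> and <style>...</style> blocks.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");
    // Matches runs of whitespace, including line breaks.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the text with tags removed and whitespace collapsed to single spaces.
    static String stripHtml(String html) {
        if (html == null) {
            return "";
        }
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

A regex approach like this does not decode character entities and can be confused by malformed markup, so a real HTML parser is usually the safer choice for crawled pages.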

R

RawTextContent - Variable in class edu.psu.ist.youseer.SubmitterDocument
The text content of the file before stripping
ReadBinaryDocument(ARCRecord, int, int) - Method in class edu.psu.ist.youseer.ARCSubmitter
Reads the content of the ARC record from the ARC file
ReadConfigFile(String) - Method in class edu.psu.ist.youseer.SubmitterConfig
Reads the SubmitterConfig.xml file, and populates the data of this class.
ReadTextDocument(ARCRecord, int) - Method in class edu.psu.ist.youseer.ARCSubmitter
Reads a text document from the ARC record
run() - Method in class edu.psu.ist.youseer.Worker
 

S

sendPostCommand(String, String) - Method in class edu.psu.ist.youseer.ARCSubmitter
Sends a POST request to the server. Courtesy of Grant Ingersoll @ IBM.
sendPostCommand(String, String) - Method in class edu.psu.ist.youseer.Worker
Sends a POST request to the server. Courtesy of Grant Ingersoll @ IBM.
setByteContent(byte[]) - Method in class edu.psu.ist.youseer.SubmitterDocument
 
setDataType(String) - Method in class edu.psu.ist.youseer.SubmitterDocument
 
setRawTextContent(String) - Method in class edu.psu.ist.youseer.SubmitterDocument
 
setStrippedTextContent(String) - Method in class edu.psu.ist.youseer.SubmitterDocument
 
setTitle(String) - Method in class edu.psu.ist.youseer.SubmitterDocument
 
setupDBConnection() - Method in class edu.psu.ist.youseer.ARCSubmitter
Sets up the database connection and creates the mandatory tables
StripHTML(String) - Method in class edu.psu.ist.youseer.Worker
Strips the HTML tags from the text.
StrippedTextContent - Variable in class edu.psu.ist.youseer.SubmitterDocument
The file content after stripping the HTML tags
SubmitterConfig - Class in edu.psu.ist.youseer
 
SubmitterConfig() - Constructor for class edu.psu.ist.youseer.SubmitterConfig
 
SubmitterConfig(String, String, String, String, String, String) - Constructor for class edu.psu.ist.youseer.SubmitterConfig
 
SubmitterDocument - Class in edu.psu.ist.youseer
 
SubmitterDocument(String, String, String, int) - Constructor for class edu.psu.ist.youseer.SubmitterDocument
 
SubmitterDocument(String, String, String, String, int) - Constructor for class edu.psu.ist.youseer.SubmitterDocument
 
SubmitterDocument(String, byte[], String, String, int) - Constructor for class edu.psu.ist.youseer.SubmitterDocument
 

T

threadExecutor - Variable in class edu.psu.ist.youseer.ARCSubmitter
 
threadsCount - Variable in class edu.psu.ist.youseer.ARCSubmitter
The number of threads for processing the documents, default is 1
TITLE - Variable in class edu.psu.ist.youseer.SubmitterConfig
The Solr field that will store the title
Title - Variable in class edu.psu.ist.youseer.SubmitterDocument
The title of the document
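
The threadExecutor and threadsCount entries above, together with the WaitQueue entry under W, suggest a fixed-size thread pool fed from a queue of waiting jobs. A minimal sketch of that arrangement with java.util.concurrent follows. The class name, the runJobs helper, and the trivial counting job are assumptions for illustration; the actual field types in ARCSubmitter are not stated in this index.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a fixed pool of worker threads draining a job queue.
final class WorkerPoolSketch {
    // Runs jobCount trivial jobs on threadsCount threads and returns how many completed.
    static int runJobs(int threadsCount, int jobCount) {
        // Jobs that cannot start immediately wait here until a thread is free.
        BlockingQueue<Runnable> waitQueue = new LinkedBlockingQueue<>();
        ThreadPoolExecutor threadExecutor = new ThreadPoolExecutor(
            threadsCount, threadsCount, 0L, TimeUnit.MILLISECONDS, waitQueue);

        AtomicInteger count = new AtomicInteger();
        for (int i = 0; i < jobCount; i++) {
            // Each job stands in for one Worker processing one document.
            threadExecutor.execute(count::incrementAndGet);
        }

        // Stop accepting jobs and wait for the queued ones to finish.
        threadExecutor.shutdown();
        try {
            threadExecutor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return count.get();
    }
}
```

With threadsCount set to 1, matching the documented default, the jobs run one at a time in submission order; larger values let several documents be processed concurrently.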

U

URL - Variable in class edu.psu.ist.youseer.ARCSubmitter
URL of the index
URL - Variable in class edu.psu.ist.youseer.SubmitterConfig
The Solr field that will store the URL
Url - Variable in class edu.psu.ist.youseer.SubmitterDocument
The URL of the document

V

ValidateConfig() - Method in class edu.psu.ist.youseer.SubmitterConfig
Validates this object after populating it from the XML configuration file.

W

WaitQueue - Variable in class edu.psu.ist.youseer.ARCSubmitter
Queue containing the waiting jobs in the thread pool
Worker - Class in edu.psu.ist.youseer
Title:
Worker(ARCSubmitter, SubmitterDocument) - Constructor for class edu.psu.ist.youseer.Worker
 
