A
ARCSubmitter
- Class in
edu.psu.ist.youseer
Title: ARCSubmitter
ARCSubmitter()
- Constructor for class edu.psu.ist.youseer.
ARCSubmitter
B
ByteContent
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
The binary content of any non-text document
C
CACHE
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The Solr field that will store the location of the cached version of this file
CacheFolder
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
The virtual path that will have the ARC files in it
CacheFolder
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The path that will have the ARC files under it
Config
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
Configuration object
ContainingFile
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
The absolute path of the ARC file that contains this document
Count
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
Number of documents submitted so far
D
DatabaseProvider
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
Database provider, read from the XML configuration file
DataType
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
i.e.
DBConnectionString
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The database connection string, read from the XML configuration file
doc
- Variable in class edu.psu.ist.youseer.
Worker
DOCUMENT_TEXT
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The Solr field that will store the content
E
edu.psu.ist.youseer
- package edu.psu.ist.youseer
Extractor
- Class in
edu.psu.ist.youseer
Extractor()
- Constructor for class edu.psu.ist.youseer.
Extractor
F
FILE_TYPE
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The Solr field that will store the file type of the document
FlushIndexedDocs()
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Inserts all the processed URLs (ARC records) into the database.
G
GenerateCustomeData(SubmitterDocument)
- Static method in class edu.psu.ist.youseer.
Extractor
GenerateCustomeData1(SubmitterDocument)
- Static method in class edu.psu.ist.youseer.
Extractor
GenerateDocument()
- Method in class edu.psu.ist.youseer.
Worker
Generates a Solr document for the processed ARC record using the tags from the configuration file
getByteContent()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
getContainingFile()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
getDataType()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
getOffset()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
getRawTextContent()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
getStrippedTextContent()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
getTitle()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
getTitle(Source)
- Static method in class edu.psu.ist.youseer.
Worker
Extracts the title from a text document using the Jericho parser
getUrl()
- Method in class edu.psu.ist.youseer.
SubmitterDocument
I
IndexedDocs
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
List of documents that have been indexed but not yet inserted into the database
IndexedTypes
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The set of all indexable file types, populated from the XML configuration file
IndexURL
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The URL of the index
InsertToDB(File)
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Inserts this file into the database when the submitter completes processing all its records
InsertToDB(String)
- Method in class edu.psu.ist.youseer.
Worker
Inserts a log entry into the database recording that the current document was not submitted to the index
IsIndexed(File)
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Checks whether the file has already been submitted to the index
L
LINE_SEP
- Static variable in class edu.psu.ist.youseer.
ARCSubmitter
Line separator
M
main(String[])
- Static method in class edu.psu.ist.youseer.
ARCSubmitter
O
OFFSET
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The Solr field that will store the offset within the ARC file
Offset
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
The offset within the ARC file
OrgiginalPart
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
The root folder of the ARC files
OriginalPart
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The absolute path of the root folder that contains the ARC files
P
parent
- Variable in class edu.psu.ist.youseer.
Worker
ProcessBinaryDocument()
- Method in class edu.psu.ist.youseer.
Worker
Processes the binary document: converts it to plain text using Apache Tika, then extracts the title of the file
ProcessFolder(String)
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Processes a folder full of ARC files, or subfolders containing ARC files.
ProcessTextDocument()
- Method in class edu.psu.ist.youseer.
Worker
Processes the text document: extracts the title and strips the HTML tags
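The `ProcessFolder(String)` entry above describes walking a folder that holds ARC files directly or in subfolders. The following is a minimal sketch of such a recursive walk, not the actual implementation. The class `ArcFolderWalkSketch`, the helper `findArcFiles`, and the `.arc` / `.arc.gz` extension filter are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArcFolderWalkSketch {

    // Hypothetical helper: returns every ARC file found under rootFolder,
    // descending into subfolders. The extension filter is an assumption.
    static List<Path> findArcFiles(String rootFolder) throws IOException {
        try (Stream<Path> paths = Files.walk(Paths.get(rootFolder))) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase();
                        return name.endsWith(".arc") || name.endsWith(".arc.gz");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Walk the folder given on the command line, or the current folder.
        String root = args.length > 0 ? args[0] : ".";
        for (Path arcFile : findArcFiles(root)) {
            System.out.println(arcFile.toAbsolutePath());
        }
    }
}
```

Sorting the paths keeps the processing order deterministic between runs, which can make it easier to resume after an interruption.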
R
RawTextContent
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
The text content of the file before stripping
ReadBinaryDocument(ARCRecord, int, int)
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Reads the content of the ARC record from the ARC file
ReadConfigFile(String)
- Method in class edu.psu.ist.youseer.
SubmitterConfig
Reads the SubmitterConfig.xml file, and populates the data of this class.
ReadTextDocument(ARCRecord, int)
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Reads a text document from the ARC record
run()
- Method in class edu.psu.ist.youseer.
Worker
S
sendPostCommand(String, String)
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Sends a POST request to the server. Courtesy of Grant Ingersoll @ IBM.
sendPostCommand(String, String)
- Method in class edu.psu.ist.youseer.
Worker
Sends a POST request to the server. Courtesy of Grant Ingersoll @ IBM.
setByteContent(byte[])
- Method in class edu.psu.ist.youseer.
SubmitterDocument
setDataType(String)
- Method in class edu.psu.ist.youseer.
SubmitterDocument
setRawTextContent(String)
- Method in class edu.psu.ist.youseer.
SubmitterDocument
setStrippedTextContent(String)
- Method in class edu.psu.ist.youseer.
SubmitterDocument
setTitle(String)
- Method in class edu.psu.ist.youseer.
SubmitterDocument
setupDBConnection()
- Method in class edu.psu.ist.youseer.
ARCSubmitter
Sets up the database connection and creates the mandatory tables
StripHTML(String)
- Method in class edu.psu.ist.youseer.
Worker
Strips the HTML tags from the text.
StrippedTextContent
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
The file content after stripping the HTML tags
SubmitterConfig
- Class in
edu.psu.ist.youseer
SubmitterConfig()
- Constructor for class edu.psu.ist.youseer.
SubmitterConfig
SubmitterConfig(String, String, String, String, String, String)
- Constructor for class edu.psu.ist.youseer.
SubmitterConfig
SubmitterDocument
- Class in
edu.psu.ist.youseer
SubmitterDocument(String, String, String, int)
- Constructor for class edu.psu.ist.youseer.
SubmitterDocument
SubmitterDocument(String, String, String, String, int)
- Constructor for class edu.psu.ist.youseer.
SubmitterDocument
SubmitterDocument(String, byte[], String, String, int)
- Constructor for class edu.psu.ist.youseer.
SubmitterDocument
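The `StripHTML(String)` and `getTitle(Source)` entries describe extracting a title and removing HTML tags, which the index attributes to the Jericho parser. As a rough illustration of the idea only, the sketch below uses plain regular expressions instead of Jericho. The class `TextDocumentSketch` and its method names are hypothetical, and a regex approach is far less robust than a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextDocumentSketch {

    private static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", FLAGS);
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("<(script|style)[^>]*>.*?</\\1>", FLAGS);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the trimmed content of the first <title> element, or "" if none.
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    // Drops script/style blocks, replaces remaining tags with spaces,
    // and collapses runs of whitespace.
    static String stripHtml(String html) {
        String withoutScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        String withoutTags = TAG.matcher(withoutScripts).replaceAll(" ");
        return withoutTags.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title> Hello </title></head>"
                + "<body><p>Some <b>bold</b> text</p></body></html>";
        System.out.println(extractTitle(html));
        System.out.println(stripHtml(html));
    }
}
```

Note that this sketch leaves the title text in the stripped output and does not decode character entities such as `&amp;`.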
T
threadExecutor
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
threadsCount
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
The number of threads for processing the documents, default is 1
TITLE
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The Solr field that will store the title
Title
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
The title of the document
U
URL
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
URL of the index
URL
- Variable in class edu.psu.ist.youseer.
SubmitterConfig
The Solr field that will store the URL
Url
- Variable in class edu.psu.ist.youseer.
SubmitterDocument
The URL of the document
V
ValidateConfig()
- Method in class edu.psu.ist.youseer.
SubmitterConfig
Validates this object after populating it from the XML configuration file.
W
WaitQueue
- Variable in class edu.psu.ist.youseer.
ARCSubmitter
Queue containing the waiting jobs in the thread pool
Worker
- Class in
edu.psu.ist.youseer
Title:
Worker(ARCSubmitter, SubmitterDocument)
- Constructor for class edu.psu.ist.youseer.
Worker
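The `threadsCount`, `threadExecutor`, `WaitQueue`, and `Worker.run()` entries suggest a fixed-size thread pool whose pending jobs sit in a queue. The sketch below shows one way such a pool could be wired with `java.util.concurrent`. It is an assumption about the design, not the actual `ARCSubmitter` code: the class `WorkerPoolSketch`, the method `processAll`, and the counter standing in for real document processing are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WorkerPoolSketch {

    // Runs one job per document on a fixed pool of threadsCount threads and
    // returns how many jobs completed. Each job only increments a counter;
    // a real worker would process and submit a document here.
    static int processAll(int threadsCount, int documents) throws InterruptedException {
        BlockingQueue<Runnable> waitQueue = new LinkedBlockingQueue<>();
        ThreadPoolExecutor threadExecutor = new ThreadPoolExecutor(
                threadsCount, threadsCount, 0L, TimeUnit.MILLISECONDS, waitQueue);
        AtomicInteger count = new AtomicInteger();

        for (int i = 0; i < documents; i++) {
            // Jobs beyond threadsCount wait in waitQueue until a thread is free.
            threadExecutor.execute(() -> count.incrementAndGet());
        }

        // Stop accepting new jobs, then wait for the queued ones to finish.
        threadExecutor.shutdown();
        if (!threadExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(4, 100));
    }
}
```

An unbounded queue keeps submission simple but lets pending jobs accumulate in memory; a bounded queue with a rejection policy would be the usual alternative when documents are large.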