C D F G H I K M O P R S U V W Z 

C

CONTENT_COLUMN_FAMILY - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant CONTENT_COLUMN_FAMILY.
CONTENT_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant CONTENT_COLUMN_NAME.
CONTENT_LENGTH_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
 
CONTENT_SIZE_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
 
CONTENT_TYPE_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
 
createKey(String, String) - Static method in class org.archive.io.hbase.Keying
Makes a key out of passed URI for use as row name or column qualifier.
createRowKeyFromUrl(String) - Static method in class org.archive.io.hbase.HBaseWriter
This is a stub method and is here to allow extension/overriding for custom content parsing, data manipulation and to populate new columns.
createUrlFromRowKey(String) - Static method in class org.archive.io.hbase.HBaseWriter
 
CURI_COLUMN_FAMILY - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant CURI_COLUMN_FAMILY.

D

DEFAULT_MAX_FILE_SIZE_IN_BYTES - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant DEFAULT_MAX_FILE_SIZE_IN_BYTES.
defaultHbaseTableNameSpace - Static variable in class org.archive.io.hbase.HBaseParameters
 
DOMAIN_NAME_DELIMITER - Static variable in class org.archive.io.hbase.Keying
 

F

FETCH_ANNOTATIONS_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
 
FETCH_ANNOTATIONS_VALUE_DELIMITER - Static variable in class org.archive.io.hbase.HBaseParameters
 
FETCH_ATTEMPTS_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
 
FETCH_DURATION_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
 

G

generateWriterPool(AtomicInteger) - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
getByteArrayFromInputStream(ReplayInputStream, int) - Static method in class org.archive.io.hbase.HBaseWriter
Read the ReplayInputStream and write it to the given BatchUpdate with the given column.
getContentColumnFamily() - Method in class org.archive.io.hbase.HBaseParameters
Gets the content column family.
getContentColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the content column name.
getContentLengthColumnName() - Method in class org.archive.io.hbase.HBaseParameters
 
getContentSizeColumnName() - Method in class org.archive.io.hbase.HBaseParameters
 
getContentTypeColumnName() - Method in class org.archive.io.hbase.HBaseParameters
 
getCuriColumnFamily() - Method in class org.archive.io.hbase.HBaseParameters
Gets the curi column family.
getDefaultMaxFileSize() - Method in class org.archive.modules.writer.HBaseWriterProcessor
Gets the default max file size.
getDefaultMaxFileSizeInBytes() - Method in class org.archive.io.hbase.HBaseParameters
Gets the default max file size in bytes.
getDefaultStorePaths() - Method in class org.archive.modules.writer.HBaseWriterProcessor
Gets the default store paths.
getFetchAnnotationsColumnName() - Method in class org.archive.io.hbase.HBaseParameters
 
getFetchAnnotationsValueDelimiter() - Method in class org.archive.io.hbase.HBaseParameters
 
getFetchAttmptsColumnName() - Method in class org.archive.io.hbase.HBaseParameters
 
getFetchDurationColumnName() - Method in class org.archive.io.hbase.HBaseParameters
 
getHbaseParameters() - Method in class org.archive.io.hbase.HBaseWriter
Gets the HBase parameters.
getHbaseParameters() - Method in class org.archive.modules.writer.HBaseWriterProcessor
Gets the hbase parameters.
getHbaseTableName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the hbase table name.
getHTable() - Method in class org.archive.io.hbase.HBaseWriter
Gets the HTable client.
getIpColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the ip column name.
getIsSeedColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the is-seed column name.
getMetadata() - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
getPathFromSeedColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the path from seed column name.
getRecordIDGenerator() - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
getRequestColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the request column name.
getSerializer() - Method in class org.archive.io.hbase.HBaseParameters
Gets the serializer.
getUrlColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the url column name.
getViaColumnName() - Method in class org.archive.io.hbase.HBaseParameters
Gets the via column name.
getZkPort() - Method in class org.archive.io.hbase.HBaseParameters
Gets the zk port.
getZkQuorum() - Method in class org.archive.io.hbase.HBaseParameters
Gets the zk quorum.
getZookeeperClientPortKey() - Method in class org.archive.io.hbase.HBaseParameters
Gets the zookeeper client port key.

H

HBaseParameters - Class in org.archive.io.hbase
Configures the values of the column family/qualifier used for the crawl.
HBaseParameters() - Constructor for class org.archive.io.hbase.HBaseParameters
 
HBaseWriter - Class in org.archive.io.hbase
HBaseWriter implementation.
HBaseWriter(AtomicInteger, WriterPoolSettings, HBaseParameters) - Constructor for class org.archive.io.hbase.HBaseWriter
Instantiates a new HBase writer.
HBaseWriterPool - Class in org.archive.io.hbase
The Class HBaseWriterPool.
HBaseWriterPool(AtomicInteger, WriterPoolSettings, int, int, HBaseParameters) - Constructor for class org.archive.io.hbase.HBaseWriterPool
Instantiates a new HBase writer pool.
HBaseWriterProcessor - Class in org.archive.modules.writer
A Heritrix 3 processor that writes to Hadoop HBase.
HBaseWriterProcessor() - Constructor for class org.archive.modules.writer.HBaseWriterProcessor
 

I

initializeCrawlTable(Configuration, String) - Method in class org.archive.io.hbase.HBaseWriter
Creates the crawl table in HBase.
innerProcessResult(CrawlURI) - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
IP_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant IP_COLUMN_NAME.
IS_SEED_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant IS_SEED_COLUMN_NAME.
isMd5Key() - Method in class org.archive.io.hbase.HBaseParameters
Checks whether the MD5 key option is enabled.
isOnlyProcessNewRecords() - Method in class org.archive.io.hbase.HBaseParameters
Checks whether only new records are processed.
isOnlyWriteNewRecords() - Method in class org.archive.io.hbase.HBaseParameters
Checks whether only new records are written.
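
The index does not say what the MD5 key option does. One plausible reading, stated here as an assumption, is that the row key becomes the MD5 hex digest of the URL, which gives fixed-length keys that spread evenly across regions. A minimal sketch of that idea follows; `Md5KeySketch` and `md5Hex` are hypothetical names and are not part of this API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration only; not the HBaseWriter implementation. */
final class Md5KeySketch {
    private Md5KeySketch() {}

    /** Returns the 32-character lowercase hex MD5 digest of the URL's UTF-8 bytes. */
    public static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }
}
```

A hashed key cannot be turned back into a URL, so under this assumption the URL itself would have to be stored in a column if it is needed later.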

K

Keying - Class in org.archive.io.hbase
Utility for creating HBase-friendly keys.
Keying() - Constructor for class org.archive.io.hbase.Keying
 
keyToUri(String, String) - Static method in class org.archive.io.hbase.Keying
Reverses the createKey(String, String) transform.
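
The Keying entries above describe a reversible URI-to-key transform together with a reverseHostname helper. The sketch below shows one common way such a transform can work: reversing the host labels so that pages from the same domain sort next to each other. It is an assumption-laden illustration, not the library's code: `RowKeySketch`, `reverseHost` and `swapHost` are hypothetical names, and the second String parameter of the real createKey and keyToUri methods is not modelled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not org.archive.io.hbase.Keying. */
final class RowKeySketch {
    private RowKeySketch() {}

    /** Reverses dot-separated host labels, e.g. "www.example.com" becomes "com.example.www". */
    public static String reverseHost(String host) {
        List<String> labels = Arrays.asList(host.split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    /**
     * Replaces the host part of a URL with its reversed form and leaves the rest untouched.
     * Reversal is its own inverse, so the same method maps a URL to a key and a key back to a URL.
     * Simplification: user-info ("user@host") is not handled.
     */
    public static String swapHost(String url) {
        int start = url.indexOf("://");
        if (start < 0) {
            throw new IllegalArgumentException("No scheme in: " + url);
        }
        start += 3;
        int end = start;
        // The host ends at the first port, path, query or fragment delimiter.
        while (end < url.length() && "/:?#".indexOf(url.charAt(end)) < 0) {
            end++;
        }
        return url.substring(0, start)
                + reverseHost(url.substring(start, end))
                + url.substring(end);
    }
}
```

With this scheme, `swapHost("http://www.example.com/a")` and `swapHost("http://mail.example.com/b")` both start with `http://com.example.`, so a scan over that prefix covers the whole domain.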

M

makeWriter() - Method in class org.archive.io.hbase.HBaseWriterPool
 
modifyPut(HBaseParameters, CrawlURI, String, Put, RecordingOutputStream, RecordingInputStream) - Method in class org.archive.modules.writer.HBaseWriterProcessor
This is a stub method and is here to allow extension/overriding for custom content parsing, data manipulation and to populate new columns.

O

org.archive.io.hbase - package org.archive.io.hbase
Provides an HBase writer for Heritrix.
org.archive.modules.writer - package org.archive.modules.writer
 

P

PATH_FROM_SEED_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant PATH_FROM_SEED_COLUMN_NAME.

R

REFERER_URL_SCHEME - Static variable in class org.archive.io.hbase.Keying
 
REQUEST_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant REQUEST_COLUMN_NAME.
reverseHostname(String) - Static method in class org.archive.io.hbase.Keying
 

S

serialize(byte[]) - Method in class org.archive.io.hbase.HBaseWriter
 
serialize(byte[]) - Method in interface org.archive.io.hbase.Serializer
Implement if you want to serialize bytes in a custom manner.
Serializer - Interface in org.archive.io.hbase
The Interface Serializer.
serialVersionUID - Static variable in class org.archive.modules.writer.HBaseWriterProcessor
The Constant serialVersionUID.
setContentColumnFamily(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the content column family.
setContentColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the content column name.
setContentLengthColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
 
setContentSizeColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
 
setContentTypeColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
 
setCuriColumnFamily(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the curi column family.
setDefaultMaxFileSizeInBytes(long) - Method in class org.archive.io.hbase.HBaseParameters
Sets the default max file size in bytes.
setFetchAnnotationsColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
 
setFetchAnnotationsValueDelimiter(String) - Method in class org.archive.io.hbase.HBaseParameters
 
setFetchAttmptsColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
 
setFetchDurationColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
 
setHbaseParameters(HBaseParameters) - Method in class org.archive.modules.writer.HBaseWriterProcessor
Sets the hbase parameters.
setHbaseTableName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the hbase table name.
setIpColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the ip column name.
setIsSeedColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the is-seed column name.
setMd5Key(boolean) - Method in class org.archive.io.hbase.HBaseParameters
Sets whether the MD5 key option is enabled.
setOnlyProcessNewRecords(boolean) - Method in class org.archive.io.hbase.HBaseParameters
Sets whether only new records are processed.
setOnlyWriteNewRecords(boolean) - Method in class org.archive.io.hbase.HBaseParameters
Sets whether only new records are written.
setPathFromSeedColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the path from seed column name.
setRequestColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the request column name.
setSerializer(Serializer) - Method in class org.archive.io.hbase.HBaseParameters
Sets the serializer.
setupPool(AtomicInteger) - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
setUrlColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the url column name.
setViaColumnName(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the via column name.
setZkPort(int) - Method in class org.archive.io.hbase.HBaseParameters
Sets the zk port.
setZkQuorum(String) - Method in class org.archive.io.hbase.HBaseParameters
Sets the zk quorum.
shouldProcess(CrawlURI) - Method in class org.archive.modules.writer.HBaseWriterProcessor
 
shouldWrite(CrawlURI) - Method in class org.archive.modules.writer.HBaseWriterProcessor
Whether the given CrawlURI should be written to archive files.
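
The Serializer entry in this section says to implement serialize(byte[]) for custom byte serialization, with the implementation supplied through setSerializer(Serializer). The sketch below shows what a GZIP-compressing implementation might look like. The return type of the real method is not shown in this index, so the sketch declares its own stand-in interface `ByteSerializer` that returns byte[]; that interface, `GzipSerializerSketch` and `decompress` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Stand-in for org.archive.io.hbase.Serializer; the byte[] return type is an assumption. */
interface ByteSerializer {
    byte[] serialize(byte[] bytes);
}

/** Hypothetical serializer that GZIP-compresses content before it is stored. */
final class GzipSerializerSketch implements ByteSerializer {
    @Override
    public byte[] serialize(byte[] bytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Inverse of serialize, included so the round trip can be checked. */
    public static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Any reader of the table would need the matching inverse step, which is why the sketch keeps `decompress` next to `serialize`.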

U

URL_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant URL_COLUMN_NAME.

V

VIA_COLUMN_NAME - Static variable in class org.archive.io.hbase.HBaseParameters
The Constant VIA_COLUMN_NAME.

W

write(HBaseWriterProcessor, CrawlURI, String, RecordingOutputStream, RecordingInputStream, long) - Method in class org.archive.io.hbase.HBaseWriter
Write the crawled output to the configured HBase table.
write(CrawlURI) - Method in class org.archive.modules.writer.HBaseWriterProcessor
Write to HBase.

Z

ZK_PORT - Static variable in class org.archive.io.hbase.HBaseParameters
Default options.
ZOOKEEPER_CLIENT_PORT - Static variable in class org.archive.io.hbase.HBaseParameters
The ZOOKEEPER client port.

Copyright © 2007–2014. All rights reserved.