C
D
G
H
I
M
O
P
R
S
U
V
W
Z
C
CONTENT_COLUMN_FAMILY
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant CONTENT_COLUMN_FAMILY.
CONTENT_COLUMN_NAME
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant CONTENT_COLUMN_NAME.
CURI_COLUMN_FAMILY
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant CURI_COLUMN_FAMILY.
D
DEFAULT_MAX_FILE_SIZE_IN_BYTES
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant DEFAULT_MAX_FILE_SIZE_IN_BYTES.
G
getByteArrayFromInputStream(ReplayInputStream, int)
- Method in class org.archive.io.hbase.
HBaseWriter
Read the ReplayInputStream and write it to the given BatchUpdate with the given column.
getClient()
- Method in class org.archive.io.hbase.
HBaseWriter
Gets the HTable client.
getContentColumnFamily()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the content column family.
getContentColumnName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the content column name.
getCuriColumnFamily()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the curi column family.
getDefaultMaxFileSizeInBytes()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the default max file size in bytes.
getHbaseOptions()
- Method in class org.archive.io.hbase.
HBaseWriter
Gets the hbase options.
getHbaseParameters()
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
Gets the hbase parameters.
getHbaseTableName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the hbase table name.
getIpColumnName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the ip column name.
getIsSeedColumnName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the is-seed column name.
getMetadata()
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
getPathFromSeedColumnName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the path from seed column name.
getRecordIDGenerator()
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
getRequestColumnName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the request column name.
getSerializer()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the serializer.
getUrlColumnName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the url column name.
getViaColumnName()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the via column name.
getZkPort()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the zk port.
getZkQuorum()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the zk quorum.
getZookeeperClientPortKey()
- Method in class org.archive.io.hbase.
HBaseParameters
Gets the zookeeper client port key.
H
HBaseParameters
- Class in
org.archive.io.hbase
Configures the values of the column family/qualifier used for the crawl.
HBaseParameters()
- Constructor for class org.archive.io.hbase.
HBaseParameters
HBaseWriter
- Class in
org.archive.io.hbase
HBase implementation.
HBaseWriter(AtomicInteger, WriterPoolSettings, HBaseParameters)
- Constructor for class org.archive.io.hbase.
HBaseWriter
Instantiates a new HBaseWriter.
HBaseWriterPool
- Class in
org.archive.io.hbase
The Class HBaseWriterPool.
HBaseWriterPool(AtomicInteger, WriterPoolSettings, int, int, HBaseParameters)
- Constructor for class org.archive.io.hbase.
HBaseWriterPool
Instantiates a new HBaseWriterPool.
HBaseWriterProcessor
- Class in
org.archive.modules.writer
A Heritrix 3 processor that writes to Hadoop HBase.
HBaseWriterProcessor()
- Constructor for class org.archive.modules.writer.
HBaseWriterProcessor
I
initializeCrawlTable(Configuration, String)
- Method in class org.archive.io.hbase.
HBaseWriter
Creates the crawl table in HBase.
innerProcessResult(CrawlURI)
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
IP_COLUMN_NAME
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant IP_COLUMN_NAME.
IS_SEED_COLUMN_NAME
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant IS_SEED_COLUMN_NAME.
isMd5Key()
- Method in class org.archive.io.hbase.
HBaseParameters
Checks whether the md5 key flag is set.
isOnlyProcessNewRecords()
- Method in class org.archive.io.hbase.
HBaseParameters
Checks whether only new records are processed.
isOnlyWriteNewRecords()
- Method in class org.archive.io.hbase.
HBaseParameters
Checks whether only new records are written.
M
makeWriter()
- Method in class org.archive.io.hbase.
HBaseWriterPool
O
org.archive.io.hbase
- package org.archive.io.hbase
Provides an HBase writer for Heritrix.
org.archive.modules.writer
- package org.archive.modules.writer
P
PATH_FROM_SEED_COLUMN_NAME
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant PATH_FROM_SEED_COLUMN_NAME.
processContent(Put, ReplayInputStream, int)
- Method in class org.archive.io.hbase.
HBaseWriter
This is a stub method and is here to allow extension/overriding for custom content parsing, data manipulation and to populate new columns.
R
REQUEST_COLUMN_NAME
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant REQUEST_COLUMN_NAME.
S
serialize(byte[])
- Method in class org.archive.io.hbase.
HBaseWriter
serialize(byte[])
- Method in interface org.archive.io.hbase.
Serializer
Implement if you want to serialize bytes in a custom manner.
Serializer
- Interface in
org.archive.io.hbase
The Interface Serializer.
setContentColumnFamily(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the content column family.
setContentColumnName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the content column name.
setCuriColumnFamily(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the curi column family.
setDefaultMaxFileSizeInBytes(long)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the default max file size in bytes.
setHbaseParameters(HBaseParameters)
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
Sets the hbase parameters.
setHbaseTableName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the hbase table name.
setIpColumnName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the ip column name.
setIsSeedColumnName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the is-seed column name.
setMd5Key(boolean)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the md5 key flag.
setOnlyProcessNewRecords(boolean)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets whether only new records are processed.
setOnlyWriteNewRecords(boolean)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets whether only new records are written.
setPathFromSeedColumnName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the path from seed column name.
setRequestColumnName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the request column name.
setSerializer(Serializer)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the serializer.
setupPool(AtomicInteger)
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
setUrlColumnName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the url column name.
setViaColumnName(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the via column name.
setZkPort(int)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the zk port.
setZkQuorum(String)
- Method in class org.archive.io.hbase.
HBaseParameters
Sets the zk quorum.
shouldProcess(CrawlURI)
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
shouldWrite(CrawlURI)
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
Whether the given CrawlURI should be written to archive files.
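The Serializer entry in this section says only "Implement if you want to serialize bytes in a custom manner." As a rough illustration of what such an implementation could look like, the sketch below gzip-compresses the bytes. It does not use the real library: the `Serializer` interface here is a local stand-in for org.archive.io.hbase.Serializer, and its `byte[]` return type is an assumption, since this index lists only the parameter type. The names `GzipSerializer`, `decompress`, and `SerializerSketch` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Small demo: compress some bytes and check that they round-trip.
class SerializerSketch {
    public static void main(String[] args) {
        Serializer serializer = new GzipSerializer();
        byte[] original = "<html>example page body</html>".getBytes(StandardCharsets.UTF_8);
        byte[] stored = serializer.serialize(original);
        byte[] restored = GzipSerializer.decompress(stored);
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}

// Hypothetical stand-in for org.archive.io.hbase.Serializer.
// ASSUMPTION: the real method returns byte[]; the index shows only serialize(byte[]).
interface Serializer {
    byte[] serialize(byte[] bytes);
}

// Illustrative custom serializer: gzip-compress the bytes before they are stored.
class GzipSerializer implements Serializer {
    @Override
    public byte[] serialize(byte[] bytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Helper for readers of the stored data (not part of the interface).
    // Uses InputStream.readAllBytes, which requires Java 9 or later.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the real library, an implementation of this kind would presumably be registered through the setSerializer(Serializer) setter listed in this section; how HBaseWriter applies it is not described in this index.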
U
URL_COLUMN_NAME
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant URL_COLUMN_NAME.
V
VIA_COLUMN_NAME
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant VIA_COLUMN_NAME.
W
write(CrawlURI, String, RecordingOutputStream, RecordingInputStream)
- Method in class org.archive.io.hbase.
HBaseWriter
Write the crawled output to the configured HBase table.
write(CrawlURI, long, InputStream)
- Method in class org.archive.modules.writer.
HBaseWriterProcessor
Write to HBase.
Z
ZK_PORT
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant ZK_PORT; one of the default options.
ZOOKEEPER_CLIENT_PORT
- Static variable in class org.archive.io.hbase.
HBaseParameters
The Constant ZOOKEEPER_CLIENT_PORT.
Copyright © 2007-2012. All Rights Reserved.