org.archive.io.hbase
Class HBaseParameters

java.lang.Object
  extended by org.archive.io.hbase.HBaseParameters
All Implemented Interfaces:
org.archive.io.ArchiveFileConstants

public class HBaseParameters
extends Object
implements org.archive.io.ArchiveFileConstants

Configures the column family and qualifier names used for the crawl. Also provides a full set of default values that match the previous Heritrix2 implementation. Intended to be configured through the Spring framework, either inline within the HBaseWriterProcessor bean or as a named bean that is referenced later.

 <bean id="hbaseParameterSettings" class="org.archive.io.hbase.HBaseParameters">
 	<!-- These settings are required -->
 	<property name="zkQuorum" value="localhost" />
 	<property name="hbaseTableName" value="crawl" />
 
 	<!-- This should reflect your installation, but 2181 is the default -->
 	<property name="zkPort" value="2181" />
 
 	<!-- All other settings are optional -->
 	<property name="onlyProcessNewRecords" value="false" />
 	<property name="onlyWriteNewRecords" value="false" />
 	<property name="contentColumnFamily" value="newcontent" />
 	<!-- Overwrite more options here -->
 </bean>
 
 <bean id="hbaseWriterProcessor" class="org.archive.modules.writer.HBaseWriterProcessor">
 	<property name="hbaseParameters">
 		 <ref bean="hbaseParameterSettings"/> 
 	</property>
 </bean>
 
 <bean id="dispositionProcessors" class="org.archive.modules.DispositionChain">
 	<property name="processors">
 		 <list>
 			<ref bean="hbaseWriterProcessor"/>
 			<!-- other references -->
 		</list>
 	 </property>
 </bean>
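
The example above uses the named-bean form. The inline form mentioned in the description could look roughly like the following sketch, which nests the parameters as a Spring inner bean. It reuses only the class names, property names, and values shown above; treat it as an illustration rather than a tested configuration.

```xml
<bean id="hbaseWriterProcessor" class="org.archive.modules.writer.HBaseWriterProcessor">
	<property name="hbaseParameters">
		<!-- Inner bean: no id needed, since nothing else references it -->
		<bean class="org.archive.io.hbase.HBaseParameters">
			<!-- Required settings, same values as the named-bean example -->
			<property name="zkQuorum" value="localhost" />
			<property name="hbaseTableName" value="crawl" />
			<property name="zkPort" value="2181" />
		</bean>
	</property>
</bean>
```

The inline form keeps the settings next to the processor that uses them. The named-bean form is the better fit when the same settings need to be referenced from more than one place.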
 
 

See Also:
org.archive.modules.writer.HBaseWriterProcessor for a full example

Field Summary
static String CONTENT_COLUMN_FAMILY
          The Constant CONTENT_COLUMN_FAMILY.
static String CONTENT_COLUMN_NAME
          The Constant CONTENT_COLUMN_NAME.
static String CURI_COLUMN_FAMILY
          The Constant CURI_COLUMN_FAMILY.
static long DEFAULT_MAX_FILE_SIZE_IN_BYTES
          The Constant DEFAULT_MAX_FILE_SIZE_IN_BYTES.
static String IP_COLUMN_NAME
          The Constant IP_COLUMN_NAME.
static String IS_SEED_COLUMN_NAME
          The Constant IS_SEED_COLUMN_NAME.
static String PATH_FROM_SEED_COLUMN_NAME
          The Constant PATH_FROM_SEED_COLUMN_NAME.
static String REQUEST_COLUMN_NAME
          The Constant REQUEST_COLUMN_NAME.
static String URL_COLUMN_NAME
          The Constant URL_COLUMN_NAME.
static String VIA_COLUMN_NAME
          The Constant VIA_COLUMN_NAME.
static int ZK_PORT
          The Constant ZK_PORT (the default ZooKeeper port).
static String ZOOKEEPER_CLIENT_PORT
          The ZOOKEEPER_CLIENT_PORT.
 
Fields inherited from interface org.archive.io.ArchiveFileConstants
ABSOLUTE_OFFSET_KEY, CDX, CDX_FILE, CDX_LINE_BUFFER_SIZE, CRLF, DATE_FIELD_KEY, DEFAULT_DIGEST_METHOD, DOT_COMPRESSED_FILE_EXTENSION, DUMP, GZIP_DUMP, HEADER, INVALID_SUFFIX, LENGTH_FIELD_KEY, MIMETYPE_FIELD_KEY, NOHEAD, OCCUPIED_SUFFIX, READER_IDENTIFIER_FIELD_KEY, RECORD_IDENTIFIER_FIELD_KEY, SINGLE_SPACE, TYPE_FIELD_KEY, URL_FIELD_KEY, VERSION_FIELD_KEY
 
Constructor Summary
HBaseParameters()
           
 
Method Summary
 String getContentColumnFamily()
          Gets the content column family.
 String getContentColumnName()
          Gets the content column name.
 String getCuriColumnFamily()
          Gets the curi column family.
 long getDefaultMaxFileSizeInBytes()
          Gets the default max file size in bytes.
 String getHbaseTableName()
          Gets the hbase table name.
 String getIpColumnName()
          Gets the ip column name.
 String getIsSeedColumnName()
          Gets the is-seed column name.
 String getPathFromSeedColumnName()
          Gets the path from seed column name.
 String getRequestColumnName()
          Gets the request column name.
 Serializer getSerializer()
          Gets the serializer.
 String getUrlColumnName()
          Gets the url column name.
 String getViaColumnName()
          Gets the via column name.
 int getZkPort()
          Gets the zk port.
 String getZkQuorum()
          Gets the zk quorum.
 String getZookeeperClientPortKey()
          Gets the zookeeper client port key.
 boolean isMd5Key()
          Checks whether md5Key is set.
 boolean isOnlyProcessNewRecords()
          Checks whether only new records are processed.
 boolean isOnlyWriteNewRecords()
          Checks whether only new records are written.
 void setContentColumnFamily(String contentColumnFamily)
          Sets the content column family.
 void setContentColumnName(String contentColumnName)
          Sets the content column name.
 void setCuriColumnFamily(String curiColumnFamily)
          Sets the curi column family.
 void setDefaultMaxFileSizeInBytes(long defaultMaxFileSizeInBytes)
          Sets the default max file size in bytes.
 void setHbaseTableName(String tableName)
          Sets the hbase table name.
 void setIpColumnName(String ipColumnName)
          Sets the ip column name.
 void setIsSeedColumnName(String isSeedColumnName)
          Sets the is-seed column name.
 void setMd5Key(boolean md5Key)
          Sets the md5 key.
 void setOnlyProcessNewRecords(boolean onlyProcessNewRecords)
          Sets whether only new records are processed.
 void setOnlyWriteNewRecords(boolean onlyWriteNewRecords)
          Sets whether only new records are written.
 void setPathFromSeedColumnName(String pathFromSeedColumnName)
          Sets the path from seed column name.
 void setRequestColumnName(String requestColumnName)
          Sets the request column name.
 void setSerializer(Serializer serializer)
          Sets the serializer.
 void setUrlColumnName(String urlColumnName)
          Sets the url column name.
 void setViaColumnName(String viaColumnName)
          Sets the via column name.
 void setZkPort(int port)
          Sets the zk port.
 void setZkQuorum(String quorum)
          Sets the zk quorum.
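
Under the standard JavaBeans naming convention that Spring follows, each setter listed above maps to a property whose name is the setter name without the "set" prefix and with a lowercase first letter. The fragment below is a sketch of that mapping for a few of the setters. The values are placeholders chosen for illustration only; they are not the class defaults, which this page does not list.

```xml
<bean id="hbaseParameterSettings" class="org.archive.io.hbase.HBaseParameters">
	<!-- setZkQuorum(String) and setHbaseTableName(String) -->
	<property name="zkQuorum" value="localhost" />
	<property name="hbaseTableName" value="crawl" />

	<!-- setMd5Key(boolean): placeholder value -->
	<property name="md5Key" value="true" />

	<!-- setCuriColumnFamily(String) and setUrlColumnName(String): placeholder values -->
	<property name="curiColumnFamily" value="mycuri" />
	<property name="urlColumnName" value="myurl" />
</bean>
```

getZookeeperClientPortKey() has no matching setter in the summary above, so that value cannot be set this way.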
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ZK_PORT

public static final int ZK_PORT
The Constant ZK_PORT (the default ZooKeeper port).

See Also:
Constant Field Values

CONTENT_COLUMN_FAMILY

public static final String CONTENT_COLUMN_FAMILY
The Constant CONTENT_COLUMN_FAMILY.

See Also:
Constant Field Values

CONTENT_COLUMN_NAME

public static final String CONTENT_COLUMN_NAME
The Constant CONTENT_COLUMN_NAME.

See Also:
Constant Field Values

CURI_COLUMN_FAMILY

public static final String CURI_COLUMN_FAMILY
The Constant CURI_COLUMN_FAMILY.

See Also:
Constant Field Values

IP_COLUMN_NAME

public static final String IP_COLUMN_NAME
The Constant IP_COLUMN_NAME.

See Also:
Constant Field Values

PATH_FROM_SEED_COLUMN_NAME

public static final String PATH_FROM_SEED_COLUMN_NAME
The Constant PATH_FROM_SEED_COLUMN_NAME.

See Also:
Constant Field Values

IS_SEED_COLUMN_NAME

public static final String IS_SEED_COLUMN_NAME
The Constant IS_SEED_COLUMN_NAME.

See Also:
Constant Field Values

VIA_COLUMN_NAME

public static final String VIA_COLUMN_NAME
The Constant VIA_COLUMN_NAME.

See Also:
Constant Field Values

URL_COLUMN_NAME

public static final String URL_COLUMN_NAME
The Constant URL_COLUMN_NAME.

See Also:
Constant Field Values

REQUEST_COLUMN_NAME

public static final String REQUEST_COLUMN_NAME
The Constant REQUEST_COLUMN_NAME.

See Also:
Constant Field Values

DEFAULT_MAX_FILE_SIZE_IN_BYTES

public static final long DEFAULT_MAX_FILE_SIZE_IN_BYTES
The Constant DEFAULT_MAX_FILE_SIZE_IN_BYTES.

See Also:
Constant Field Values

ZOOKEEPER_CLIENT_PORT

public static String ZOOKEEPER_CLIENT_PORT
The ZOOKEEPER_CLIENT_PORT.

Constructor Detail

HBaseParameters

public HBaseParameters()
Method Detail

getZkQuorum

public String getZkQuorum()
Gets the zk quorum.

Returns:
the zk quorum

setZkQuorum

public void setZkQuorum(String quorum)
Sets the zk quorum.

Parameters:
quorum - the new zk quorum

getZkPort

public int getZkPort()
Gets the zk port.

Returns:
the zk port

setZkPort

public void setZkPort(int port)
Sets the zk port.

Parameters:
port - the new zk port

getHbaseTableName

public String getHbaseTableName()
Gets the hbase table name.

Returns:
the hbase table name

setHbaseTableName

public void setHbaseTableName(String tableName)
Sets the hbase table name.

Parameters:
tableName - the new hbase table name

getContentColumnFamily

public String getContentColumnFamily()
Gets the content column family.

Returns:
the content column family

setContentColumnFamily

public void setContentColumnFamily(String contentColumnFamily)
Sets the content column family.

Parameters:
contentColumnFamily - the new content column family

getContentColumnName

public String getContentColumnName()
Gets the content column name.

Returns:
the content column name

setContentColumnName

public void setContentColumnName(String contentColumnName)
Sets the content column name.

Parameters:
contentColumnName - the new content column name

getCuriColumnFamily

public String getCuriColumnFamily()
Gets the curi column family.

Returns:
the curi column family

setCuriColumnFamily

public void setCuriColumnFamily(String curiColumnFamily)
Sets the curi column family.

Parameters:
curiColumnFamily - the new curi column family

getIpColumnName

public String getIpColumnName()
Gets the ip column name.

Returns:
the ip column name

setIpColumnName

public void setIpColumnName(String ipColumnName)
Sets the ip column name.

Parameters:
ipColumnName - the new ip column name

getPathFromSeedColumnName

public String getPathFromSeedColumnName()
Gets the path from seed column name.

Returns:
the path from seed column name

setPathFromSeedColumnName

public void setPathFromSeedColumnName(String pathFromSeedColumnName)
Sets the path from seed column name.

Parameters:
pathFromSeedColumnName - the new path from seed column name

getIsSeedColumnName

public String getIsSeedColumnName()
Gets the is-seed column name.

Returns:
the is-seed column name

setIsSeedColumnName

public void setIsSeedColumnName(String isSeedColumnName)
Sets the is-seed column name.

Parameters:
isSeedColumnName - the new is-seed column name

getViaColumnName

public String getViaColumnName()
Gets the via column name.

Returns:
the via column name

setViaColumnName

public void setViaColumnName(String viaColumnName)
Sets the via column name.

Parameters:
viaColumnName - the new via column name

getUrlColumnName

public String getUrlColumnName()
Gets the url column name.

Returns:
the url column name

setUrlColumnName

public void setUrlColumnName(String urlColumnName)
Sets the url column name.

Parameters:
urlColumnName - the new url column name

getRequestColumnName

public String getRequestColumnName()
Gets the request column name.

Returns:
the request column name

setRequestColumnName

public void setRequestColumnName(String requestColumnName)
Sets the request column name.

Parameters:
requestColumnName - the new request column name

getZookeeperClientPortKey

public String getZookeeperClientPortKey()
Gets the zookeeper client port key.

Returns:
the zookeeper client port key

getSerializer

public Serializer getSerializer()
Gets the serializer.

Returns:
the serializer

setSerializer

public void setSerializer(Serializer serializer)
Sets the serializer.

Parameters:
serializer - the new serializer

isMd5Key

public boolean isMd5Key()
Checks whether md5Key is set.

Returns:
true if md5Key is set

setMd5Key

public void setMd5Key(boolean md5Key)
Sets the md5Key flag.

Parameters:
md5Key - the new md5Key value

isOnlyWriteNewRecords

public boolean isOnlyWriteNewRecords()
Checks whether only new records are written.

Returns:
true if only new records are written

setOnlyWriteNewRecords

public void setOnlyWriteNewRecords(boolean onlyWriteNewRecords)
Sets whether only new records are written.

Parameters:
onlyWriteNewRecords - true to write only new records

isOnlyProcessNewRecords

public boolean isOnlyProcessNewRecords()
Checks whether only new records are processed.

Returns:
true if only new records are processed

setOnlyProcessNewRecords

public void setOnlyProcessNewRecords(boolean onlyProcessNewRecords)
Sets whether only new records are processed.

Parameters:
onlyProcessNewRecords - true to process only new records

getDefaultMaxFileSizeInBytes

public long getDefaultMaxFileSizeInBytes()
Gets the default max file size in bytes.

Returns:
the default max file size in bytes

setDefaultMaxFileSizeInBytes

public void setDefaultMaxFileSizeInBytes(long defaultMaxFileSizeInBytes)
Sets the default max file size in bytes.

Parameters:
defaultMaxFileSizeInBytes - the new default max file size in bytes


Copyright © 2007-2012. All Rights Reserved.