Package org.alfresco.encoding
Class AbstractCharactersetFinder
java.lang.Object
org.alfresco.encoding.AbstractCharactersetFinder
- All Implemented Interfaces:
CharactersetFinder
- Direct Known Subclasses:
BomCharactersetFinder, GuessEncodingCharsetFinder
- Since:
- 2.1
- Author:
- Derek Hulley
-
Constructor Summary
Constructors
AbstractCharactersetFinder()
-
Method Summary
Modifier and Type | Method | Description
final Charset | detectCharset(byte[] buffer) | Attempt to detect the character set encoding for the given buffer.
final Charset | detectCharset(InputStream is) | Attempt to detect the character set encoding for the given input stream.
protected abstract Charset | detectCharsetImpl(byte[] buffer) | Worker method for implementations to override.
protected int | getBufferSize() | Some implementations may only require a few bytes to detect the stream type, whilst others may be more efficient with larger buffers.
void | setBufferSize(int bufferSize) | Set the maximum number of bytes to read ahead when attempting to determine the characterset.
-
Constructor Details
-
AbstractCharactersetFinder
public AbstractCharactersetFinder()
-
-
Method Details
-
setBufferSize
public void setBufferSize(int bufferSize)
Set the maximum number of bytes to read ahead when attempting to determine the characterset. Most characterset detectors are efficient and can process 8K of buffered data very quickly. Some may need to be constrained a bit.
- Parameters:
bufferSize - the number of bytes - default 8K.
-
detectCharset
public final Charset detectCharset(InputStream is)
Attempt to detect the character set encoding for the given input stream. The input stream will not be altered or closed by this method, and must therefore support marking. If the available input stream doesn't support marking, it can be wrapped in a BufferedInputStream. The current state of the stream will be restored before the method returns.
The input stream is checked to ensure that it supports marks, after which a buffer is extracted, leaving the stream in its original state.
- Specified by:
detectCharset in interface CharactersetFinder
- Parameters:
is - an input stream that must support marking
- Returns:
- Returns the encoding of the stream, or null if the encoding cannot be identified
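The mark, read and reset behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the Alfresco classes: MarkResetSketch and its peek helper are hypothetical names introduced only for illustration, and the real method may differ in detail.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MarkResetSketch {
    /**
     * Hypothetical helper (not part of the documented API): reads up to
     * bufferSize bytes from the stream and then rewinds it to where it started.
     */
    static byte[] peek(InputStream is, int bufferSize) throws IOException {
        if (!is.markSupported()) {
            throw new IllegalArgumentException(
                    "The stream must support marks; wrap it in a BufferedInputStream.");
        }
        is.mark(bufferSize);                 // remember the current position
        try {
            byte[] buffer = new byte[bufferSize];
            int total = 0;
            while (total < bufferSize) {     // read() may return fewer bytes than requested
                int n = is.read(buffer, total, bufferSize - total);
                if (n < 0) {
                    break;                   // end of stream
                }
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            is.reset();                      // restore the original stream state
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, world".getBytes(StandardCharsets.UTF_8);
        // Wrapping guarantees mark support even for streams that lack it.
        InputStream is = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] head = peek(is, 5);
        System.out.println(new String(head, StandardCharsets.UTF_8)); // prints "hello"
        System.out.println((char) is.read());                         // prints "h"
    }
}
```

The second print shows that the caller still sees the stream from its first byte after the buffer has been extracted.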
-
detectCharset
public final Charset detectCharset(byte[] buffer)
Description copied from interface: CharactersetFinder
Attempt to detect the character set encoding for the given buffer.
- Specified by:
detectCharset in interface CharactersetFinder
- Parameters:
buffer - the first n bytes of the character stream
- Returns:
- Returns the encoding of the buffer, or null if the encoding cannot be identified
-
getBufferSize
protected int getBufferSize()
Some implementations may only require a few bytes to detect the stream type, whilst others may be more efficient with larger buffers. In either case, the number of bytes actually present in the buffer cannot be enforced.
Only override this method if there is a very compelling reason to adjust the buffer size, and then consider handling the setBufferSize(int) method by issuing a warning. This will prevent users from setting the buffer size when it has no effect.
- Returns:
- Returns the maximum desired size of the buffer passed to the CharactersetFinder.detectCharset(byte[]) method.
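The advice about overriding getBufferSize() and warning in setBufferSize(int) could look roughly like the sketch below. BufferSizedSketch is a hypothetical stand-in for the documented accessors, not the Alfresco class, and the 8192-byte default is an assumption based on the documented "8K".

```java
/** Hypothetical stand-in for the documented buffer-size accessors; NOT the Alfresco class. */
abstract class BufferSizedSketch {
    private int bufferSize = 8192; // assumed value for the documented 8K default

    public void setBufferSize(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    protected int getBufferSize() {
        return bufferSize;
    }
}

/** A detector that only ever needs the first few bytes, so it fixes its buffer size. */
class FixedSizeSketch extends BufferSizedSketch {
    private static final int FIXED_SIZE = 4;

    @Override
    public void setBufferSize(int bufferSize) {
        // Warn instead of silently accepting a value that will never be used.
        System.err.println("Warning: buffer size is fixed at " + FIXED_SIZE
                + " bytes; ignoring requested size " + bufferSize);
    }

    @Override
    protected int getBufferSize() {
        return FIXED_SIZE;
    }

    public static void main(String[] args) {
        FixedSizeSketch finder = new FixedSizeSketch();
        finder.setBufferSize(1024);                 // logs a warning to stderr
        System.out.println(finder.getBufferSize()); // prints 4
    }
}
```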
-
detectCharsetImpl
protected abstract Charset detectCharsetImpl(byte[] buffer) throws Exception
Worker method for implementations to override. All exceptions will be reported and absorbed, and null returned.
The interface contract is that the data buffer must not be altered in any way.
- Parameters:
buffer - the buffer of data, no bigger than the requested best buffer size. This can, very efficiently, be turned into an InputStream using a ByteArrayInputStream.
- Returns:
- Returns the charset, or null if an accurate conclusion is not possible
- Throws:
Exception - Any exception, checked or not
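The worker-method contract can be sketched with a small stand-in hierarchy. FinderSketch and Utf8BomFinderSketch are hypothetical names, not the Alfresco classes; the sketch assumes only what is documented here: a final entry point that delegates to the worker, reports and absorbs any exception, and returns null on failure. The UTF-8 byte order mark check is an illustrative detection rule, not a description of BomCharactersetFinder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the documented contract; NOT the Alfresco class. */
abstract class FinderSketch {
    /** Entry point: delegates to the worker and absorbs any failure. */
    public final Charset detectCharset(byte[] buffer) {
        try {
            return detectCharsetImpl(buffer);
        } catch (Exception e) {
            // Report the problem, then fall back to "unknown".
            System.err.println("Charset detection failed: " + e);
            return null;
        }
    }

    /** Worker method; must not modify the buffer. */
    protected abstract Charset detectCharsetImpl(byte[] buffer) throws Exception;
}

/** Illustrative worker: recognises only the UTF-8 byte order mark (EF BB BF). */
class Utf8BomFinderSketch extends FinderSketch {
    @Override
    protected Charset detectCharsetImpl(byte[] buffer) {
        if (buffer.length >= 3
                && (buffer[0] & 0xFF) == 0xEF
                && (buffer[1] & 0xFF) == 0xBB
                && (buffer[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        return null; // no accurate conclusion possible
    }

    public static void main(String[] args) {
        FinderSketch finder = new Utf8BomFinderSketch();
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        System.out.println(finder.detectCharset(withBom));                  // prints UTF-8
        System.out.println(finder.detectCharset(new byte[] {'a', 'b'}));    // prints null
        System.out.println(finder.detectCharset(null));                     // prints null (exception absorbed)
    }
}
```

Passing null triggers a NullPointerException inside the worker, which the final entry point absorbs, so callers only ever see a charset or null.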
-