Package org.alfresco.encoding
Interface CharactersetFinder
All Known Implementing Classes:
AbstractCharactersetFinder, BomCharactersetFinder, GuessEncodingCharsetFinder
public interface CharactersetFinder
Interface for classes that are able to read a text-based input stream and determine the character encoding. There are quite a few libraries that do this, but none are perfect. It is therefore necessary to abstract the implementation so that these finders can be configured in as required.
Implementations should have a default constructor and be completely thread safe and stateless. This allows them to be constructed once and held indefinitely to do the detection work.
Where the encoding cannot be determined, it is left to the client to decide what to do. Some implementations may guess an encoding or fall back to a default; it is up to the implementation to specify the behaviour.
Since:
2.1
Author:
Derek Hulley
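As a rough illustration of why the detection is abstracted, the sketch below tries several stateless finders in order and leaves the final decision to the caller when none of them recognises the data. Everything in it is hypothetical: `Finder` is a local stand-in for the `byte[]` method of this interface, and `Utf8BomFinder`, `AsciiFinder` and `detect` are invented for the example and are not Alfresco classes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class FinderChainSketch {

    /** Local stand-in for CharactersetFinder.detectCharset(byte[]) (assumption). */
    public interface Finder {
        Charset detectCharset(byte[] buffer);
    }

    /** Hypothetical finder: recognises only the UTF-8 byte order mark EF BB BF. */
    public static final class Utf8BomFinder implements Finder {
        @Override
        public Charset detectCharset(byte[] buffer) {
            boolean hasBom = buffer.length >= 3
                    && (buffer[0] & 0xFF) == 0xEF
                    && (buffer[1] & 0xFF) == 0xBB
                    && (buffer[2] & 0xFF) == 0xBF;
            return hasBom ? StandardCharsets.UTF_8 : null;
        }
    }

    /** Hypothetical finder: reports US-ASCII when the buffer is non-empty and every byte is below 0x80. */
    public static final class AsciiFinder implements Finder {
        @Override
        public Charset detectCharset(byte[] buffer) {
            if (buffer.length == 0) {
                return null;
            }
            for (byte b : buffer) {
                if (b < 0) {          // bytes >= 0x80 are negative as signed Java bytes
                    return null;
                }
            }
            return StandardCharsets.US_ASCII;
        }
    }

    /** Returns the first non-null answer, or the caller's fallback if every finder returns null. */
    public static Charset detect(List<Finder> finders, byte[] buffer, Charset fallback) {
        for (Finder finder : finders) {
            Charset charset = finder.detectCharset(buffer);
            if (charset != null) {
                return charset;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // The finders hold no fields, so one instance of each can be shared between threads.
        List<Finder> chain = Arrays.asList(new Utf8BomFinder(), new AsciiFinder());
        byte[] latin1 = {(byte) 0xE9};
        System.out.println(detect(chain, latin1, StandardCharsets.ISO_8859_1));
    }
}
```

Because neither finder keeps any state, the same instances can be constructed once and reused for every call, which is the property the interface asks of its implementations.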
Method Summary
Modifier and Type    Method                           Description
Charset              detectCharset(byte[] buffer)     Attempt to detect the character set encoding for the given buffer.
Charset              detectCharset(InputStream is)    Attempt to detect the character set encoding for the given input stream.
Method Detail
detectCharset
Charset detectCharset(InputStream is)
Attempt to detect the character set encoding for the given input stream. The input stream will not be altered or closed by this method, and must therefore support marking. If the available input stream doesn't support marking, it can be wrapped with a BufferedInputStream. The current state of the stream will be restored before the method returns.
Parameters:
is - an input stream that must support marking
Returns:
the encoding of the stream, or null if the encoding cannot be identified
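The marking requirement can be sketched as follows. This is a minimal, hypothetical illustration and not the Alfresco implementation: the static `detectCharset` below only recognises a UTF-8 byte order mark, `readText` and `PEEK_LIMIT` are names invented for the example, and I/O failures are rethrown as `UncheckedIOException` purely to keep the sketch short.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MarkResetSketch {

    /** Number of leading bytes the hypothetical detector inspects. */
    private static final int PEEK_LIMIT = 3;

    /** Peeks at the head of the stream, then rewinds it so the caller sees it unchanged. */
    public static Charset detectCharset(InputStream is) {
        if (!is.markSupported()) {
            throw new IllegalArgumentException("Input stream must support marking");
        }
        is.mark(PEEK_LIMIT);
        try {
            byte[] head = new byte[PEEK_LIMIT];
            int total = 0;
            while (total < PEEK_LIMIT) {
                int n = is.read(head, total, PEEK_LIMIT - total);
                if (n < 0) {
                    break;            // stream is shorter than PEEK_LIMIT
                }
                total += n;
            }
            is.reset();               // restore the position recorded by mark()
            boolean utf8Bom = total == 3
                    && (head[0] & 0xFF) == 0xEF
                    && (head[1] & 0xFF) == 0xBB
                    && (head[2] & 0xFF) == 0xBF;
            return utf8Bom ? StandardCharsets.UTF_8 : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Wraps the stream if needed, detects the charset, then decodes the whole stream. */
    public static String readText(InputStream raw, Charset fallback) {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        Charset detected = detectCharset(in);
        Charset charset = detected != null ? detected : fallback;
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // PushbackInputStream reports markSupported() == false, so readText wraps it.
        InputStream raw = new PushbackInputStream(new ByteArrayInputStream(data));
        System.out.println(readText(raw, StandardCharsets.ISO_8859_1).length());
    }
}
```

Note that the bytes consumed during detection are read again by `readText`, which only works because the stream was rewound; the detected bytes (including any byte order mark) are still part of the content handed to the decoder.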
detectCharset
Charset detectCharset(byte[] buffer)
Attempt to detect the character set encoding for the given buffer.
Parameters:
buffer - the first n bytes of the character stream
Returns:
the encoding of the buffer, or null if the encoding cannot be identified
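When the content is already in memory, a caller might pass only a prefix of it to the finder. The sketch below is one possible shape for that, under stated assumptions: the `finder` parameter is a `java.util.function.Function` standing in for `detectCharset(byte[])`, and `firstBytes` and `decode` are hypothetical helper names. How many bytes a real finder needs is not specified here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;

public class BufferPrefixSketch {

    /** Copies at most n leading bytes; shorter content is returned whole. */
    public static byte[] firstBytes(byte[] content, int n) {
        return Arrays.copyOf(content, Math.min(n, content.length));
    }

    /**
     * Decodes the full content using whatever the finder reports for its first n bytes.
     * The finder argument is a stand-in for CharactersetFinder.detectCharset(byte[]).
     */
    public static String decode(byte[] content, int n,
                                Function<byte[], Charset> finder, Charset fallback) {
        Charset detected = finder.apply(firstBytes(content, n));
        // A null result means "could not be identified"; the client chooses what happens next.
        return new String(content, detected != null ? detected : fallback);
    }

    public static void main(String[] args) {
        byte[] content = "plain text".getBytes(StandardCharsets.US_ASCII);
        // This finder never identifies anything, so the fallback charset is used.
        System.out.println(decode(content, 4, buffer -> null, StandardCharsets.ISO_8859_1));
    }
}
```

A prefix cut at an arbitrary n can end in the middle of a multi-byte sequence, so a finder that validates byte sequences may need to tolerate a truncated final character.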