Class AbstractCharactersetFinder

java.lang.Object
org.alfresco.encoding.AbstractCharactersetFinder
All Implemented Interfaces:
CharactersetFinder
Direct Known Subclasses:
BomCharactersetFinder, GuessEncodingCharsetFinder

public abstract class AbstractCharactersetFinder extends Object implements CharactersetFinder
Since:
2.1
Author:
Derek Hulley
  • Constructor Details

    • AbstractCharactersetFinder

      public AbstractCharactersetFinder()
  • Method Details

    • setBufferSize

      public void setBufferSize(int bufferSize)
      Set the maximum number of bytes to read ahead when attempting to determine the characterset. Most characterset detectors are efficient and can process 8K of buffered data very quickly. Some may need to be constrained to a smaller buffer.
      Parameters:
      bufferSize - the maximum number of bytes to read ahead (default 8K)
    • detectCharset

      public final Charset detectCharset(InputStream is)
      Attempt to detect the character set encoding for the given input stream. The input stream will not be altered or closed by this method, and must therefore support marking. If the available input stream doesn't support marking, it can be wrapped in a BufferedInputStream.

      The current state of the stream will be restored before the method returns.

      The input stream is checked to ensure that it supports marks, after which a buffer is extracted, leaving the stream in its original state.

      Specified by:
      detectCharset in interface CharactersetFinder
      Parameters:
      is - an input stream that must support marking
      Returns:
      Returns the encoding of the stream, or null if encoding cannot be identified
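
      The mark, read and reset behaviour described for this method can be sketched with plain java.io classes. This is a minimal illustration, not the Alfresco implementation: the class MarkResetPeekSketch and its methods peek and ensureMarkSupported are hypothetical names, and throwing IllegalArgumentException for a stream without mark support is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; not part of the Alfresco API.
class MarkResetPeekSketch {

    /** Reads up to bufferSize bytes and then restores the stream position. */
    static byte[] peek(InputStream is, int bufferSize) throws IOException {
        if (!is.markSupported()) {
            // Assumption: reject streams that cannot be rewound.
            throw new IllegalArgumentException(
                    "The InputStream must support marks; wrap it in a BufferedInputStream.");
        }
        is.mark(bufferSize);
        try {
            byte[] buffer = new byte[bufferSize];
            int total = 0;
            while (total < bufferSize) {
                int read = is.read(buffer, total, bufferSize - total);
                if (read < 0) {
                    break; // end of stream before the buffer was filled
                }
                total += read;
            }
            // The buffer may hold fewer bytes than requested.
            return Arrays.copyOf(buffer, total);
        } finally {
            is.reset(); // the caller sees the stream in its original state
        }
    }

    /** Wraps the stream only when it does not already support marking. */
    static InputStream ensureMarkSupported(InputStream is) {
        return is.markSupported() ? is : new BufferedInputStream(is);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = ensureMarkSupported(
                new ByteArrayInputStream("hello world".getBytes(StandardCharsets.US_ASCII)));
        byte[] head = peek(in, 5);
        System.out.println(new String(head, StandardCharsets.US_ASCII));
        // The next read starts at the first byte again.
        System.out.println((char) in.read());
    }
}
```

      A caller holding a stream without mark support would pass it through a wrapper such as ensureMarkSupported before handing it to detectCharset(InputStream), and would keep using the wrapped stream afterwards.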
    • detectCharset

      public final Charset detectCharset(byte[] buffer)
      Description copied from interface: CharactersetFinder
      Attempt to detect the character set encoding for the given buffer.
      Specified by:
      detectCharset in interface CharactersetFinder
      Parameters:
      buffer - the first n bytes of the character stream
      Returns:
      Returns the encoding of the buffer, or null if encoding cannot be identified
    • getBufferSize

      protected int getBufferSize()
      Some implementations may only require a few bytes to detect the stream type, whilst others may be more efficient with larger buffers. In either case, the number of bytes actually present in the buffer cannot be enforced.

      Only override this method if there is a very compelling reason to adjust the buffer size, and then consider handling the setBufferSize(int) method by issuing a warning. This will prevent users from setting the buffer size when it has no effect.

      Returns:
      Returns the maximum desired size of the buffer passed to the CharactersetFinder.detectCharset(byte[]) method.
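
      The override advice above can be sketched as follows. The classes here are hypothetical stand-ins written for this illustration (SketchBufferedFinder mimics only the buffer-size accessors of the base class, with 8K assumed to mean 8192 bytes), and the warning is printed to System.err rather than through any particular logging framework.

```java
// Hypothetical demo entry point; not part of the Alfresco API.
class BufferSizeOverrideSketch {
    public static void main(String[] args) {
        SketchBufferedFinder finder = new SketchFixedWindowFinder();
        finder.setBufferSize(1024);                 // prints a warning, has no effect
        System.out.println(finder.getBufferSize()); // still the fixed window size
    }
}

/** Stand-in for the base class: only the buffer-size accessors are modelled. */
abstract class SketchBufferedFinder {
    private int bufferSize = 8192; // assumption: the 8K default expressed in bytes

    public void setBufferSize(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    protected int getBufferSize() {
        return bufferSize;
    }
}

/** A finder that only ever needs a fixed, small number of leading bytes. */
class SketchFixedWindowFinder extends SketchBufferedFinder {
    private static final int FIXED_SIZE = 4; // hypothetical requirement

    @Override
    public void setBufferSize(int bufferSize) {
        // Warn instead of silently accepting a value that is never used.
        System.err.println("setBufferSize(" + bufferSize + ") ignored: this finder always reads "
                + FIXED_SIZE + " bytes");
    }

    @Override
    protected int getBufferSize() {
        return FIXED_SIZE;
    }
}
```

      Overriding both methods together keeps the configuration honest: a caller who tries to tune the buffer is told that the setting has no effect.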
    • detectCharsetImpl

      protected abstract Charset detectCharsetImpl(byte[] buffer) throws Exception
      Worker method for implementations to override. Any exception thrown will be reported and absorbed, and null will be returned.

      The interface contract is that the data buffer must not be altered in any way.

      Parameters:
      buffer - the buffer of data no bigger than the requested best buffer size. This can, very efficiently, be turned into an InputStream using a ByteArrayInputStream.
      Returns:
      Returns the charset or null if an accurate conclusion is not possible
      Throws:
      Exception - Any exception, checked or not
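
      The relationship between the final detectCharset(byte[]) entry point and this worker method can be sketched with a small stand-in hierarchy. Everything below is hypothetical and written for illustration only: SketchFinderBase imitates the documented contract (exceptions reported and absorbed, null returned), SketchBomFinder is a simplified byte-order-mark check that ignores UTF-32, and neither reflects the actual source of BomCharactersetFinder.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo entry point; not part of the Alfresco API.
class DetectCharsetImplSketch {
    public static void main(String[] args) {
        byte[] utf8WithBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(new SketchBomFinder().detectCharset(utf8WithBom));
        System.out.println(new SketchFailingFinder().detectCharset(utf8WithBom));
    }
}

/** Stand-in for the base class: a final entry point that delegates to a worker. */
abstract class SketchFinderBase {
    public final Charset detectCharset(byte[] buffer) {
        try {
            return detectCharsetImpl(buffer);
        } catch (Exception e) {
            // Report and absorb: callers only ever see a charset or null.
            System.err.println("Charset detection failed: " + e);
            return null;
        }
    }

    protected abstract Charset detectCharsetImpl(byte[] buffer) throws Exception;
}

/** Simplified worker that recognises UTF-8 and UTF-16 byte order marks only. */
class SketchBomFinder extends SketchFinderBase {
    @Override
    protected Charset detectCharsetImpl(byte[] buffer) throws Exception {
        // The buffer is only read, never modified.
        if (startsWith(buffer, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(buffer, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(buffer, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return null; // no accurate conclusion is possible
    }

    private static boolean startsWith(byte[] buffer, int... prefix) {
        if (buffer.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((buffer[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}

/** Worker that always fails, to show that the entry point absorbs exceptions. */
class SketchFailingFinder extends SketchFinderBase {
    @Override
    protected Charset detectCharsetImpl(byte[] buffer) throws Exception {
        throw new IllegalStateException("simulated failure");
    }
}
```

      Keeping the public method final and the worker abstract means that error handling lives in one place, so an implementation only has to answer the question "which charset, if any?".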