Class BomCharactersetFinder

  • All Implemented Interfaces:
    CharactersetFinder

    public class BomCharactersetFinder
    extends AbstractCharactersetFinder
    Byte Order Marker encoding detection.
    Since:
    2.1
    Author:
    Pacific Northwest National Lab, Derek Hulley
    • Constructor Detail

      • BomCharactersetFinder

        public BomCharactersetFinder()
    • Method Detail

      • setBufferSize

        public void setBufferSize(int bufferSize)
        Description copied from class: AbstractCharactersetFinder
        Set the maximum number of bytes to read ahead when attempting to determine the characterset. Most characterset detectors are efficient and can process 8K of buffered data very quickly. Some may need to be constrained a bit.
        Overrides:
        setBufferSize in class AbstractCharactersetFinder
        Parameters:
        bufferSize - the number of bytes - default 8K.
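The read-ahead limit described above can be illustrated with a minimal sketch. It assumes a stream that supports mark/reset; the class name ReadAheadSketch and the method peek are hypothetical and are not part of the documented API, and this page does not state how the real class performs its read-ahead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration only; not part of the documented API. */
final class ReadAheadSketch {

    private ReadAheadSketch() {
    }

    /**
     * Reads at most bufferSize bytes from the stream and then rewinds it,
     * so that a later consumer still sees the stream from its start.
     */
    static byte[] peek(InputStream in, int bufferSize) throws IOException {
        if (bufferSize < 0) {
            throw new IllegalArgumentException("bufferSize must not be negative");
        }
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(bufferSize);                 // remember the current position
        byte[] buffer = new byte[bufferSize];
        int total = 0;
        while (total < bufferSize) {
            int n = in.read(buffer, total, bufferSize - total);
            if (n < 0) {
                break;                       // end of stream before the limit
            }
            total += n;
        }
        in.reset();                          // rewind to the marked position
        return Arrays.copyOf(buffer, total); // trim to the bytes actually read
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("abcdefgh".getBytes(StandardCharsets.US_ASCII));
        byte[] head = peek(in, 4);
        System.out.println(new String(head, StandardCharsets.US_ASCII));
    }
}
```

A smaller bufferSize bounds both the memory used and the work a slow detector has to do, at the cost of giving it less data to examine.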
      • detectCharsetImpl

        protected Charset detectCharsetImpl(byte[] buffer)
                                     throws Exception
        Searches only the Byte Order Marker, i.e. the first three bytes of the buffer, for a sign of the encoding.
        Specified by:
        detectCharsetImpl in class AbstractCharactersetFinder
        Parameters:
        buffer - the buffer of data no bigger than the requested best buffer size. This can, very efficiently, be turned into an InputStream using a ByteArrayInputStream.
        Returns:
        the charset, or null if an accurate conclusion is not possible
        Throws:
        Exception - Any exception, checked or not
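As an illustration of Byte Order Marker detection in general, the following minimal sketch inspects the leading bytes of a buffer and returns null when no marker is recognised. It is not the implementation of this class: the name BomSniffer is hypothetical, and the set of markers checked here (UTF-8, UTF-16BE, UTF-16LE) is an assumption, since this page does not list which encodings the real method recognises.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not part of the documented API. */
final class BomSniffer {

    private BomSniffer() {
    }

    /**
     * Returns the charset indicated by a leading Byte Order Marker,
     * or null if the buffer does not start with one of the markers below.
     * Assumption: only UTF-8 and UTF-16 markers are checked; UTF-32 is ignored.
     */
    static Charset detect(byte[] buffer) {
        if (buffer == null || buffer.length < 2) {
            return null;
        }
        // Mask to 0..255 because Java bytes are signed.
        int b0 = buffer[0] & 0xFF;
        int b1 = buffer[1] & 0xFF;
        // UTF-8 marker: EF BB BF
        if (buffer.length >= 3 && b0 == 0xEF && b1 == 0xBB && (buffer[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 big-endian marker: FE FF
        if (b0 == 0xFE && b1 == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        // UTF-16 little-endian marker: FF FE
        if (b0 == 0xFF && b1 == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return null; // no accurate conclusion is possible
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detect(utf8));
        System.out.println(detect("plain".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

Returning null rather than a default charset mirrors the contract described above, leaving the fallback decision to the caller.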