Class AbstractCharactersetFinder

    • Method Summary

      Modifier and Type Method Description
      java.nio.charset.Charset detectCharset(byte[] buffer)
      Attempt to detect the character set encoding for the given buffer.
      java.nio.charset.Charset detectCharset(java.io.InputStream is)
      Attempt to detect the character set encoding for the given input stream.
      protected abstract java.nio.charset.Charset detectCharsetImpl(byte[] buffer)
      Worker method for implementations to override.
      protected int getBufferSize()
      Some implementations may only require a few bytes to detect the stream type, whilst others may be more efficient with larger buffers.
      void setBufferSize(int bufferSize)
      Set the maximum number of bytes to read ahead when attempting to determine the character set.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • AbstractCharactersetFinder

        public AbstractCharactersetFinder()
    • Method Detail

      • setBufferSize

        public void setBufferSize(int bufferSize)
        Set the maximum number of bytes to read ahead when attempting to determine the character set. Most character set detectors are efficient and can process 8K of buffered data very quickly. Some may need a smaller limit.
        Parameters:
        bufferSize - the maximum number of bytes to read ahead; the default is 8K.
      • detectCharset

        public final java.nio.charset.Charset detectCharset(java.io.InputStream is)
        Attempt to detect the character set encoding for the given input stream. The input stream will not be altered or closed by this method, and must therefore support marking. If the available input stream doesn't support marking, it can be wrapped in a BufferedInputStream.

        The current state of the stream will be restored before the method returns.

        The input stream is checked to ensure that it supports marks, after which a buffer is extracted, leaving the stream in its original state.

        Specified by:
        detectCharset in interface CharactersetFinder
        Parameters:
        is - an input stream that must support marking
        Returns:
        Returns the encoding of the stream, or null if encoding cannot be identified
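
        The mark, read, and reset sequence described above can be sketched with standard java.io classes only. This is a minimal illustration, not the class's actual implementation; MarkResetPeek and peek are hypothetical names introduced for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MarkResetPeek {

    /**
     * Hypothetical helper: reads up to bufferSize bytes from the stream and
     * then rewinds it, so the caller sees the stream in its original state.
     */
    static byte[] peek(InputStream is, int bufferSize) throws IOException {
        if (!is.markSupported()) {
            throw new IllegalArgumentException("The InputStream must support marks");
        }
        is.mark(bufferSize);
        try {
            byte[] buffer = new byte[bufferSize];
            int total = 0;
            int read;
            // A single read() may return fewer bytes than requested, so loop.
            while (total < bufferSize
                    && (read = is.read(buffer, total, bufferSize - total)) != -1) {
                total += read;
            }
            // Trim the buffer if the stream held fewer bytes than requested.
            return Arrays.copyOf(buffer, total);
        } finally {
            // Rewind to the marked position whether or not the read succeeded.
            is.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, world".getBytes(StandardCharsets.UTF_8);
        // ByteArrayInputStream already supports marks; the wrapper is shown
        // because it is the suggested fix for streams that do not.
        InputStream is = new BufferedInputStream(new ByteArrayInputStream(data));

        byte[] head = peek(is, 5);
        System.out.println(new String(head, StandardCharsets.UTF_8));              // prints "hello"
        System.out.println(new String(is.readAllBytes(), StandardCharsets.UTF_8)); // prints "hello, world"
    }
}
```

        Because the stream is rewound in a finally block, the caller can read the full content afterwards even if the peek itself fails part-way.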
      • detectCharset

        public final java.nio.charset.Charset detectCharset(byte[] buffer)
        Description copied from interface: CharactersetFinder
        Attempt to detect the character set encoding for the given buffer.
        Specified by:
        detectCharset in interface CharactersetFinder
        Parameters:
        buffer - the first n bytes of the character stream
        Returns:
        Returns the encoding of the buffer, or null if encoding cannot be identified
      • getBufferSize

        protected int getBufferSize()
        Some implementations may only require a few bytes to detect the stream type, whilst others may be more efficient with larger buffers. In either case, the number of bytes actually present in the buffer cannot be enforced.

        Only override this method if there is a very compelling reason to adjust the buffer size, and then consider handling the setBufferSize(int) method by issuing a warning. This will prevent users from setting the buffer size when it has no effect.

        Returns:
        Returns the maximum desired size of the buffer passed to the CharactersetFinder.detectCharset(byte[]) method.
        See Also:
        setBufferSize(int)
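
        The override advice above might look roughly like the following sketch. FinderStandIn is a stand-in written for this example that mirrors only the two documented buffer-size methods; it is not the real base class, and FourByteFinder and the use of java.util.logging are likewise assumptions.

```java
import java.util.logging.Logger;

public class FixedBufferSketch {

    /** Stand-in for the documented base class; only the buffer-size methods are mirrored. */
    static abstract class FinderStandIn {
        private int bufferSize = 8192; // documented default of 8K

        public void setBufferSize(int bufferSize) {
            this.bufferSize = bufferSize;
        }

        protected int getBufferSize() {
            return bufferSize;
        }
    }

    /** Hypothetical finder that only ever needs the first 4 bytes. */
    static class FourByteFinder extends FinderStandIn {
        private static final Logger LOG = Logger.getLogger(FourByteFinder.class.getName());

        @Override
        protected int getBufferSize() {
            return 4;
        }

        @Override
        public void setBufferSize(int bufferSize) {
            // Warn instead of silently accepting a value that is never used.
            LOG.warning("setBufferSize(" + bufferSize
                    + ") has no effect; this finder always requests 4 bytes");
        }
    }

    public static void main(String[] args) {
        FourByteFinder finder = new FourByteFinder();
        finder.setBufferSize(1024);                 // logs a warning
        System.out.println(finder.getBufferSize()); // prints 4
    }
}
```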
      • detectCharsetImpl

        protected abstract java.nio.charset.Charset detectCharsetImpl(byte[] buffer)
                                                     throws java.lang.Exception
        Worker method for implementations to override. Any exception thrown is reported and absorbed, and null is returned to the caller.

        The interface contract is that the data buffer must not be altered in any way.

        Parameters:
        buffer - the buffer of data no bigger than the requested best buffer size. This can, very efficiently, be turned into an InputStream using a ByteArrayInputStream.
        Returns:
        Returns the charset or null if an accurate conclusion is not possible
        Throws:
        java.lang.Exception - Any exception, checked or not
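
        As an illustration of what a worker body could do, the sketch below identifies a charset from a byte order mark and returns null when no confident answer is possible. It is not taken from any shipped implementation; BomSketch, detectByBom, and startsWith are hypothetical names, and in a real subclass the logic would sit inside detectCharsetImpl(byte[]).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSketch {

    /**
     * Hypothetical worker body: returns a charset only when a byte order mark
     * identifies it, and null otherwise. The buffer is read, never modified.
     */
    static Charset detectByBom(byte[] buffer) {
        if (startsWith(buffer, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(buffer, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(buffer, 0xFF, 0xFE)) {
            // FF FE 00 00 is also the UTF-32LE mark, so treat it as ambiguous.
            return startsWith(buffer, 0xFF, 0xFE, 0x00, 0x00)
                    ? null
                    : StandardCharsets.UTF_16LE;
        }
        return null; // no byte order mark, so no accurate conclusion
    }

    /** True if the buffer begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] buffer, int... prefix) {
        if (buffer.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((buffer[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a'};
        byte[] plain = "plain".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectByBom(withBom)); // prints "UTF-8"
        System.out.println(detectByBom(plain));   // prints "null"
    }
}
```

        Returning null rather than a guess matches the documented contract for inconclusive input.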