Class TikaCharsetFinder

All Implemented Interfaces:
CharactersetFinder

public class TikaCharsetFinder extends AbstractCharactersetFinder
Uses Apache Tika as a fallback encoding detector.
Since:
3.4
Author:
Nick Burch
  • Constructor Details

    • TikaCharsetFinder

      public TikaCharsetFinder()
  • Method Details

    • detectCharsetImpl

      protected Charset detectCharsetImpl(byte[] buffer) throws Exception
      Specified by:
      detectCharsetImpl in class AbstractCharactersetFinder
      Throws:
      Exception
    • getThreshold

      public int getThreshold()
      Returns the confidence threshold, in the range 0-100, that a detected match must reach before it is accepted as a good match.
    • setThreshold

      public void setThreshold(int threshold)
      Sets the confidence threshold, in the range 0-100, at which a match is considered good enough. If detection does not reach the threshold, this finder declines, and either another finder handles the data or the fallback encoding is used.
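
      The threshold and decline behaviour described above can be modelled with a small, self-contained sketch. It is not the TikaCharsetFinder implementation and does not call Tika. The class ThresholdSketch, its two methods, and the example confidence values are hypothetical names and numbers chosen for illustration. Whether the real comparison against the threshold is inclusive is also an assumption here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: models "accept above a threshold, otherwise decline".
final class ThresholdSketch {
    private ThresholdSketch() {}

    /**
     * Returns the candidate charset if its confidence reaches the threshold,
     * or null to signal that this finder declines.
     */
    static Charset acceptOrDecline(Charset candidate, int confidence, int threshold) {
        if (threshold < 0 || threshold > 100) {
            throw new IllegalArgumentException("threshold must be in 0-100: " + threshold);
        }
        if (candidate == null) {
            return null; // nothing was detected at all
        }
        // Assumption: the comparison is inclusive; the real class may differ.
        return confidence >= threshold ? candidate : null;
    }

    /**
     * Walks the answers of several finders in order and returns the first
     * non-null one, or the fallback encoding if every finder declined.
     */
    static Charset firstAnswerOrFallback(List<Charset> answers, Charset fallback) {
        for (Charset answer : answers) {
            if (answer != null) {
                return answer;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        int threshold = 50; // arbitrary example value in the 0-100 range

        // A weak guess is declined, a strong guess is accepted.
        Charset weak = acceptOrDecline(StandardCharsets.ISO_8859_1, 20, threshold);
        Charset strong = acceptOrDecline(StandardCharsets.UTF_16LE, 80, threshold);

        // The first finder declined, so the second finder's answer is used.
        System.out.println(firstAnswerOrFallback(Arrays.asList(weak, strong), StandardCharsets.UTF_8));

        // Every finder declined, so the fallback encoding is used.
        System.out.println(firstAnswerOrFallback(Collections.singletonList(weak), StandardCharsets.UTF_8));
    }
}
```

      Raising the threshold makes the finder decline more often and hand the decision to later finders or to the fallback encoding. Lowering it makes the finder accept less certain guesses.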