public class TikaContentExtractor extends Object
| Modifier and Type | Class and Description |
|---|---|
| static class | TikaContentExtractor.TikaContent: Extracted content, metadata and media type container |
| Constructor and Description |
|---|
| TikaContentExtractor(List<Parser> parsers): Create new Tika-based content extractor using the provided parser instances. |
| TikaContentExtractor(List<Parser> parsers, Detector detector): Create new Tika-based content extractor using the provided parser instances. |
| TikaContentExtractor(Parser parser): Create new Tika-based content extractor using the provided parser instance. |
| TikaContentExtractor(Parser parser, boolean validateMediaType): Create new Tika-based content extractor using the provided parser instance and optional media type validation. |
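The four constructors can be exercised as in the minimal sketch below. It assumes the class lives in the `org.apache.cxf.jaxrs.ext.search.tika` package and that Tika's `AutoDetectParser` and `DefaultDetector` are on the classpath; the `ExtractorSetup` class and its method names are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor; // assumed package
import org.apache.tika.detect.DefaultDetector;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;

// Hypothetical helper class: one factory method per constructor in the summary.
public class ExtractorSetup {

    // Single parser, default media type validation behaviour.
    static TikaContentExtractor withSingleParser(Parser parser) {
        return new TikaContentExtractor(parser);
    }

    // Single parser with media type validation explicitly switched off.
    static TikaContentExtractor withoutValidation(Parser parser) {
        return new TikaContentExtractor(parser, false);
    }

    // Several parsers, relying on whatever detector the class uses by default.
    static TikaContentExtractor withParsers(List<Parser> parsers) {
        return new TikaContentExtractor(parsers);
    }

    // Several parsers plus an explicitly supplied detector.
    static TikaContentExtractor withDetector(List<Parser> parsers) {
        return new TikaContentExtractor(parsers, new DefaultDetector());
    }

    public static void main(String[] args) {
        List<Parser> parsers = Arrays.<Parser>asList(new AutoDetectParser());
        System.out.println(withSingleParser(new AutoDetectParser()) != null);
        System.out.println(withoutValidation(new AutoDetectParser()) != null);
        System.out.println(withParsers(parsers) != null);
        System.out.println(withDetector(parsers) != null);
    }
}
```

Passing a format-specific parser instead of `AutoDetectParser` narrows the set of documents the extractor accepts, which is where the `validateMediaType` flag becomes relevant.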
| Modifier and Type | Method and Description |
|---|---|
| TikaContentExtractor.TikaContent | extract(InputStream in): Extract the content and metadata from the input stream. |
| TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler): Extract the content and metadata from the input stream. |
| TikaContentExtractor.TikaContent | extract(InputStream in, ContentHandler handler, ParseContext context): Extract the content and metadata from the input stream. |
| TikaContentExtractor.TikaContent | extractMetadata(InputStream in): Extract the metadata only from the input stream. |
| SearchBean | extractMetadataToSearchBean(InputStream in): Extract the metadata only from the input stream. |
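A typical call sequence for `extract(InputStream)` and `extractMetadata(InputStream)` might look like the sketch below. Several things here are assumptions rather than facts stated on this page: the package name, the accessor names `getMediaType()` and `getMetadata()` on `TikaContent` (inferred from its description as a content, metadata and media type container), and the defensive `null` check. The `ExtractExample` class and its `describe` helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor; // assumed package
import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor.TikaContent;
import org.apache.tika.parser.AutoDetectParser;

// Hypothetical example class showing full extraction and metadata-only extraction.
public class ExtractExample {

    // Summarise a result; treats null as "nothing extracted" (defensive assumption).
    static String describe(TikaContent result) {
        if (result == null) {
            return "no content extracted";
        }
        // Accessor names are assumptions based on the container description.
        int entries = result.getMetadata().names().length;
        return result.getMediaType() + " with " + entries + " metadata entries";
    }

    public static void main(String[] args) throws IOException {
        TikaContentExtractor extractor = new TikaContentExtractor(new AutoDetectParser());
        byte[] document = "Hello, Tika".getBytes(StandardCharsets.UTF_8);

        // Content plus metadata; the stream is consumed, so open one per call.
        try (InputStream in = new ByteArrayInputStream(document)) {
            System.out.println(describe(extractor.extract(in)));
        }

        // Metadata only.
        try (InputStream in = new ByteArrayInputStream(document)) {
            System.out.println(describe(extractor.extractMetadata(in)));
        }
    }
}
```

`extractMetadataToSearchBean(InputStream)` follows the same calling pattern but returns a `SearchBean` rather than a `TikaContent`.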
public TikaContentExtractor(Parser parser)
Parameters:
parser - parser instance

public TikaContentExtractor(List<Parser> parsers)
Parameters:
parsers - parser instances

public TikaContentExtractor(List<Parser> parsers, Detector detector)
Parameters:
parsers - parser instances

public TikaContentExtractor(Parser parser, boolean validateMediaType)
Parameters:
parser - parser instance
validateMediaType - enable or disable media type validation

public TikaContentExtractor.TikaContent extract(InputStream in)
Parameters:
in - input stream to extract the content and metadata from

public TikaContentExtractor.TikaContent extractMetadata(InputStream in)
Parameters:
in - input stream to extract the metadata from

public SearchBean extractMetadataToSearchBean(InputStream in)
Parameters:
in - input stream to extract the metadata from

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler)
Parameters:
in - input stream to extract the content and metadata from
handler - custom ContentHandler

public TikaContentExtractor.TikaContent extract(InputStream in, ContentHandler handler, ParseContext context)
Parameters:
in - input stream to extract the content and metadata from
handler - custom ContentHandler
context - custom context
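
The overloads that take a custom ContentHandler and ParseContext can be sketched as follows. This assumes Tika's `BodyContentHandler` as the handler (its `toString()` returns the text it has collected) and an empty `ParseContext`; the package name of `TikaContentExtractor` is also an assumption, and the `HandlerExample` class with its `extractText` method is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.cxf.jaxrs.ext.search.tika.TikaContentExtractor; // assumed package
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class: extraction with a caller-supplied handler and context.
public class HandlerExample {

    // Runs the three-argument extract and returns whatever text the handler collected.
    static String extractText(TikaContentExtractor extractor, InputStream in) {
        BodyContentHandler handler = new BodyContentHandler(); // collects body text
        ParseContext context = new ParseContext();             // no custom settings
        extractor.extract(in, handler, context);
        return handler.toString();
    }

    public static void main(String[] args) throws IOException {
        TikaContentExtractor extractor = new TikaContentExtractor(new AutoDetectParser());
        byte[] document = "Hello, Tika".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(document)) {
            System.out.println(extractText(extractor, in));
        }
    }
}
```

Reading the text back from the handler, rather than from the returned `TikaContent`, keeps the sketch independent of how the extractor populates its result when a custom handler is supplied.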