Apache Tika

Tika
Developer: Apache Software Foundation
Stable release: 3.2.3 / 9 September 2025
Written in: Java
Operating system: Cross-platform
Type: Search and index API
License: Apache License 2.0
Website: tika.apache.org
Repository: Tika Repository

Apache Tika is a content detection and analysis framework written in Java and stewarded by the Apache Software Foundation. It detects and extracts metadata and text from over a thousand different file types. In addition to a Java library, it provides server and command-line editions suitable for use from other programming languages.
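To make the idea of content detection concrete, the following is a minimal sketch of one common technique: matching a file's leading "magic" bytes against a table of known signatures. It uses only the Java standard library and is not Tika's implementation. The class name `MagicSniffer`, its `detect` method, and the four-entry signature table are hypothetical and chosen purely for illustration; a real detector covers far more types and combines further signals.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a tiny magic-byte sniffer, NOT Tika's detector.
final class MagicSniffer {

    // One leading-byte signature and the media type it indicates.
    private static final class Signature {
        final byte[] magic;
        final String mediaType;

        Signature(byte[] magic, String mediaType) {
            this.magic = magic;
            this.mediaType = mediaType;
        }
    }

    // Hypothetical, deliberately small signature table.
    private static final List<Signature> SIGNATURES = new ArrayList<>();
    static {
        SIGNATURES.add(new Signature(
                "%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"));
        SIGNATURES.add(new Signature(
                new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png"));
        SIGNATURES.add(new Signature(
                "GIF8".getBytes(StandardCharsets.US_ASCII), "image/gif"));
        SIGNATURES.add(new Signature(
                new byte[] {'P', 'K', 3, 4}, "application/zip"));
    }

    // Returns true if data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the media type of the first matching signature,
    // or the generic binary type when nothing matches.
    static String detect(byte[] data) {
        for (Signature s : SIGNATURES) {
            if (startsWith(data, s.magic)) {
                return s.mediaType;
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        byte[] unknown = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(pdf));     // prints application/pdf
        System.out.println(detect(unknown)); // prints application/octet-stream
    }
}
```

In Tika's own Java library, the comparable entry point is the `org.apache.tika.Tika` facade, whose `detect` and `parseToString` methods return a media type and the extracted text respectively; using it requires the Tika library on the classpath, which the sketch above deliberately avoids.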