How to Install Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS

(December 30, 2018)

How can I install Apache Tika 1.20 on Ubuntu 18.04 / Ubuntu 16.04? Apache Tika is an open source toolkit that detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). Tika is useful for search engine indexing, content analysis, translation, and similar tasks.

What is new in Apache Tika 1.20

  • Upgrade to POI 4.0.1
  • Upgrade to PDFBox 2.0.13
  • Integrate/parameterize new angles handling in
  • Prevent content within <style> and <script/> elements from being written to the ToTextContentHandler
  • Switch child to parent communication to a shared memory-mapped file in tika-server’s -spawnChild mode
  • Bulk upgrade of dependencies
  • Upgrade jaxb-runtime and javax.activation
  • Improve language id efficiency in tika-eval
  • Remove duplication of notes in PPT slides
  • Upgrade sqlite “provided” dependency to 3.25.2

In this post, we will discuss the installation of Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS.

Apache Tika dependencies

To build and install Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS, you need:
  • Java Development Kit (JDK); a JRE alone cannot compile the sources
  • Apache Maven

We will install these dependencies before we can download and install Tika on Ubuntu 18.04 / Ubuntu 16.04.
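
Before going through the steps, it can help to see which of the two prerequisites are already present. The following is a minimal sketch, assuming both tools are reached through the PATH; have_cmd is a hypothetical helper name, not part of Tika or Ubuntu.

```shell
# Report whether a command is available on the PATH.
# have_cmd is a hypothetical helper name used only in this sketch.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check the two build prerequisites named above.
for tool in java mvn; do
    if have_cmd "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

Any tool reported as missing is covered by one of the steps below.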

Step 1: Update your Ubuntu system

Start by ensuring you’re running an updated Ubuntu Desktop / Server.

sudo apt update
sudo apt -y upgrade
sudo apt -y install wget curl vim

Step 2: Install Java on Ubuntu 18.04 / Ubuntu 16.04

As of Tika 1.19, building with Java 11 is supported. You can install Java 11 on Ubuntu 18.04 / Ubuntu 16.04 LTS using our previous guide below.

How to Install Java 11 on Ubuntu 18.04 /16.04 / Debian 9

To use Java 8 instead, install it with the commands below:

sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java8-set-default

Confirm the installed version of Java. On Java 8, use "java -version" instead, since the "--version" form was only added in Java 9:

$ java --version
java 11.0.1 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
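
If you want to check the version from a script, the major version can be read from the version string. This is a rough sketch that assumes the usual output of "java -version", where the first matching line contains version "x.y.z"; java_major and installed_java_version are hypothetical helper names.

```shell
# Map a Java version string to its major version.
# "1.8.0_191" -> 8 (legacy 1.x numbering), "11.0.1" -> 11.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Read the quoted version string of the installed Java.
# "java -version" prints to stderr, hence the 2>&1.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, java_major "$(installed_java_version)" prints the major version of whichever Java is first on the PATH.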

Step 3: Install Apache Maven

Install Apache Maven by following our guide:

Install Latest Apache Maven on Ubuntu 18.04 /16.04 / Debian 9
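
As an alternative to the guide, Maven can also be installed from the Ubuntu repositories. The sketch below assumes the packaged Maven version is recent enough for the Tika build, which this article does not verify; install_maven_from_apt and maven_version_of are hypothetical helper names.

```shell
# Install Maven from the Ubuntu repositories.
# Assumption: the packaged version is sufficient to build Tika.
install_maven_from_apt() {
    sudo apt update
    sudo apt -y install maven
}

# Extract the version number from "mvn -version" output,
# for example "Apache Maven 3.6.0" -> "3.6.0".
maven_version_of() {
    printf '%s\n' "$1" | sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p'
}
```

After running install_maven_from_apt, maven_version_of "$(mvn -version)" prints the installed Maven version.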

Step 4: Download and Install Apache Tika

Download the latest Apache Tika source release from the Downloads page, then set the version you downloaded:

export VER="1.20"
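
The download can also be done from the command line. The following is only a sketch: the archive URL and the -src.zip file name are assumptions about where Apache keeps this release, so confirm both on the Downloads page first. tika_src_url and download_tika are hypothetical helper names.

```shell
VER="${VER:-1.20}"

# Assumed archive location and file name; confirm on the Downloads page.
tika_src_url() {
    echo "https://archive.apache.org/dist/tika/tika-${1}-src.zip"
}

# Save the source archive as tika-${VER}.zip in the current directory.
download_tika() {
    wget -O "tika-${VER}.zip" "$(tika_src_url "$VER")"
}
```

Calling download_tika fetches the archive. It is saved as tika-${VER}.zip so that the unzip command in the next step finds it.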

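Unzip the downloaded file. If the archive was saved under a different name, such as tika-${VER}-src.zip, pass that name to unzip instead.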

unzip tika-${VER}

Change to the new folder and run mvn install:

cd tika-${VER}
mvn install
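
The full build also runs the project's test suite, which can take a long time. Below is a minimal sketch for skipping it, assuming you accept an untested build. -DskipTests is a standard Maven property; build_tika and tika_app_jar are hypothetical helper names, and the jar path assumes the usual layout of the tika-app module.

```shell
# Build all modules without running the test suite.
build_tika() {
    mvn install -DskipTests
}

# Path of the runnable tika-app jar, relative to the source directory.
# Assumption: the build places it under tika-app/target/.
tika_app_jar() {
    echo "tika-app/target/tika-app-${1}.jar"
}
```

Once build_tika has finished, ls -l "$(tika_app_jar "$VER")" should list the jar if the build produced it where this sketch expects.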

Wait for the build to finish, then test Tika from within its base directory.
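
A quick way to test the build is to run the tika-app jar against a file of your own. This is a sketch under the assumption that the build left the jar at tika-app/target/tika-app-<version>.jar; tika_app and tika_smoke_test are hypothetical helper names, while --detect, --text and --metadata are tika-app command-line options.

```shell
# Run the tika-app jar from the Tika source directory.
# Assumption: the jar is at tika-app/target/tika-app-<version>.jar.
tika_app() {
    java -jar "tika-app/target/tika-app-${VER:-1.20}.jar" "$@"
}

# Exercise three common modes on one input file.
tika_smoke_test() {
    tika_app --detect "$1"      # print the detected media type
    tika_app --text "$1"        # print the extracted plain text
    tika_app --metadata "$1"    # print the extracted metadata
}
```

For example, tika_smoke_test /path/to/document.pdf runs all three modes against a PDF of your choice.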
