How to Install Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS

(December 30, 2018)

How can I install Apache Tika 1.20 on Ubuntu 18.04 / Ubuntu 16.04? Apache Tika is an open source toolkit that detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). Tika is useful for search engine indexing, content analysis, translation, and similar tasks.

What is new in Apache Tika 1.20

  • Upgrade to POI 4.0.1
  • Upgrade to PDFBox 2.0.13
  • Integrate/parameterize new angles handling in PDFBox
  • Prevent content within <style> and <script> elements from being written in the ToTextContentHandler
  • Switch child to parent communication to a shared memory-mapped file in tika-server’s -spawnChild mode
  • Bulk upgrade of dependencies
  • Upgrade jaxb-runtime and javax.activation
  • Improve language id efficiency in tika-eval
  • Remove duplication of notes in PPT slides
  • Upgrade sqlite “provided” dependency to 3.25.2

In this post, we will discuss the installation of Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS.

Apache Tika dependencies

What you need to build and install Apache Tika on Ubuntu 18.04 / Ubuntu 16.04 LTS are:

  • Java Development Kit (JDK), since a JRE alone cannot compile the source
  • Apache Maven

We will install these dependencies first, then download and build Tika on Ubuntu 18.04 / Ubuntu 16.04.
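
Before going through the steps, it can help to see which of the two dependencies are already present. The following is a minimal sketch, assuming the binaries are on PATH under the names java and mvn; check_dep is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Prints "ok: <name>" and returns 0, or prints "missing: <name>" and returns 1.
check_dep() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check both build dependencies; keep going even if one is missing.
for dep in java mvn; do
    check_dep "$dep" || true
done
```

Anything reported as missing is covered by Step 2 or Step 3.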

Step 1: Update your Ubuntu system

Start by ensuring you’re running an updated Ubuntu Desktop / Server.

sudo apt update
sudo apt -y upgrade
sudo apt -y install wget curl vim unzip

Step 2: Install Java on Ubuntu 18.04 / Ubuntu 16.04

As of Tika 1.19, building with Java 11 is supported. You can install Java 11 on Ubuntu 18.04 / Ubuntu 16.04 LTS using our previous guide below.

How to Install Java 11 on Ubuntu 18.04 /16.04 / Debian 9

For Java 8, install it using the commands below:

sudo add-apt-repository ppa:webupd8team/java
sudo apt update
sudo apt install oracle-java8-set-default

Confirm the installed version of Java (on Java 8, use java -version instead, since the double-dash form was added in Java 9):

$ java --version
java 11.0.1 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
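
If a script needs to know which Java release is active, the major version can be pulled out of the version string. This is a rough sketch rather than a definitive check: java_major and installed_java_version are hypothetical helper names, and the parsing assumes the usual quoted version string on the first line of java -version output.

```shell
#!/bin/sh
# Extract the major Java version from a version string.
# Handles the legacy "1.8.0_191" scheme (major is 8) and the
# newer "11.0.1" scheme (major is 11).
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Read the quoted version string from the installed JVM.
# `java -version` writes to stderr and works on both Java 8 and Java 11.
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

# Only query the JVM if one is actually installed.
if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java_major "$(installed_java_version)")"
fi
```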

Step 3: Install Apache Maven

Install Apache Maven by following our guide:

Install Latest Apache Maven on Ubuntu 18.04 /16.04 / Debian 9

Step 4: Download and Install Apache Tika

Download the Apache Tika source release from the Downloads page. The commands below fetch version 1.20 from the Apache archive.

export VER="1.20"
wget https://archive.apache.org/dist/tika/tika-${VER}-src.zip
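
When the download is scripted, it may be worth skipping it if the archive is already on disk. The sketch below assumes the same URL pattern as the command above holds for the version you pass in (it is only confirmed here for 1.20); tika_src_url and fetch_tika_src are hypothetical helper names.

```shell
#!/bin/sh
# Build the archive URL for a given Tika source release.
# Assumption: the release follows the same naming pattern as 1.20.
tika_src_url() {
    echo "https://archive.apache.org/dist/tika/tika-$1-src.zip"
}

# Download the source archive into the current directory,
# unless a file with the expected name is already there.
fetch_tika_src() {
    ver="$1"
    file="tika-${ver}-src.zip"
    if [ -f "$file" ]; then
        echo "already downloaded: $file"
    else
        wget "$(tika_src_url "$ver")"
    fi
}
```

With VER exported as above, the call would be fetch_tika_src "$VER".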

Unzip the downloaded file.

unzip tika-${VER}-src.zip

Change to the new folder and run mvn install:

cd tika-${VER}
mvn install

Wait for the build to finish, then test Tika from within its base directory.
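
One way to run that test is through the command-line application jar. The sketch below assumes the Maven build places it at tika-app/target/tika-app-<version>.jar inside the base directory; tika_app_jar and tika_smoke_test are hypothetical helper names, and the file passed as the third argument is any document you want to try.

```shell
#!/bin/sh
# Print the path where the build is assumed to place the tika-app jar.
# $1 = Tika base directory, $2 = Tika version
tika_app_jar() {
    echo "$1/tika-app/target/tika-app-$2.jar"
}

# Smoke test: print the Tika version, then extract plain text from a file.
# $1 = Tika base directory, $2 = Tika version, $3 = document to parse
tika_smoke_test() {
    jar="$(tika_app_jar "$1" "$2")"
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar"
        return 1
    fi
    java -jar "$jar" --version
    java -jar "$jar" --text "$3"
}
```

From the base directory, a call might look like tika_smoke_test . "$VER" /path/to/some.pdf. If the jar is reported as not found, check the end of the Maven output for build failures.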

Reference:

http://tika.apache.org/1.20/gettingstarted.html