
Searching through vast amounts of unstructured data to find the information you need can be tedious and time-consuming. That's where Kendra, Amazon Web Services' (AWS) intelligent search service, comes in.

Kendra uses natural language processing (NLP) and machine learning (ML) to enable users to search through large document sets and find relevant information quickly and easily. In this post, we'll take a deep dive into Kendra to understand what it is, how it works, key features, use cases, and more.

What is Kendra?

Kendra is a highly accurate, easy-to-use enterprise search service from AWS. It uses NLP and ML techniques to deliver natural language search across your documents and data sources.

With Kendra, you don't need to know exact keywords or phrases to find relevant content. You can search in natural language as you would speak to another person, and Kendra will understand the context and intent behind your queries to deliver precise results.

Some key capabilities and benefits of Kendra include:

Natural language query understanding: Kendra goes beyond simple keyword matching. It understands the contextual meaning behind queries to return accurate results even for complex questions.

Content ingestion and indexing: Kendra can ingest content from a variety of sources like S3, file shares, databases, Salesforce, ServiceNow and more. It indexes the content automatically so it is searchable.

Customizable relevance tuning: Kendra provides tools like query clauses and synonyms to customize how results are ranked for relevance. This improves precision over time.

Consolidated search experience: Kendra enables you to search across multiple, siloed content repositories through a single search box.

Easy to use: Kendra provides a simple search interface and REST APIs. You don't need expertise in data science or ML to benefit from its intelligence.

Cost-effective: Kendra bills each index by the hour at a rate set by the edition you choose, plus any connector usage. There is no upfront commitment, and you can delete an index when you no longer need it.

In summary, Kendra takes care of the heavy lifting behind natural language search so you can focus on deriving insights from your content. Next, let's look at how Kendra is able to understand search queries and return accurate results.

How Kendra Understands Search Queries

Kendra leverages multiple NLP and ML techniques in combination to deeply understand the context and intent behind natural language search queries. This enables it to return highly relevant results even when a user does not enter the exact keywords.

Kendra architecture (Source: AWS)

The main components involved are:

Indexing

Kendra ingests content from approved data sources and indexes the documents for search. This includes:

  • Crawling and parsing: Extracting raw text, metadata, tables, images etc. from documents.
  • Entity detection: Identifying people, places, dates, quantities, events and more within the text.
  • Document classification: Categorizing documents into different types based on contents.

Kendra builds an inverted index from the extracted text and metadata, which allows fast lookups during search.
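
The index structure described above (usually called an inverted index) can be illustrated with a toy sketch in Python. This is not Kendra's implementation, which AWS does not publish; the function names and sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def lookup(index, term):
    """Return the sorted IDs of the documents that contain the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical sample documents.
docs = {
    "doc1": "refund policy for online orders",
    "doc2": "shipping policy and delivery times",
}
index = build_inverted_index(docs)
print(lookup(index, "policy"))  # ['doc1', 'doc2']
print(lookup(index, "refund"))  # ['doc1']
```

Because the index is keyed by term, a lookup touches only the documents that contain that term instead of scanning every document.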

Query understanding

When a user enters a search query, Kendra analyzes it to determine the intent using:

  • NLP algorithms: Tokenizing text, removing stopwords, stemming words, detecting entities etc.
  • Context inference: Understanding the context of entities within the query to narrow down intent.
  • Query expansions: Adding synonyms and relevant terms to widen the search scope.
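
To make these steps concrete, here is a deliberately simplified sketch of tokenizing, stopword removal, crude stemming and synonym expansion. It only illustrates the general techniques and says nothing about how Kendra implements them internally; the stopword list and synonym table are hypothetical.

```python
import re

# Hypothetical, tiny stopword list and synonym table for illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "on", "for", "of", "to", "me"}
SYNONYMS = {"refund": ["reimbursement"], "faq": ["question"]}


def normalize(term):
    """Very crude stemming: strip a trailing 's' from words longer than 3 letters."""
    return term[:-1] if len(term) > 3 and term.endswith("s") else term


def analyze_query(query):
    """Tokenize, drop stopwords, stem, then append synonyms of the remaining terms."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    terms = [normalize(t) for t in tokens if t not in STOPWORDS]
    expanded = list(terms)
    for term in terms:
        for synonym in SYNONYMS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(analyze_query("What are the top FAQs on refunds?"))
# ['top', 'faq', 'refund', 'question', 'reimbursement']
```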

Ranking results

Finally, Kendra ranks results by relevance before returning them to the user. This involves:

  • Term-matching: Matching query terms with indexed content.
  • Ranking algorithms: Weighing results based on term frequency, page rank models and more.
  • Personalization: Learning from user behavior and feedback to refine results.
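
As a rough illustration of term-frequency ranking, the sketch below scores documents with a smoothed TF-IDF formula. This is a classic textbook technique swapped in for illustration, not Kendra's actual ranking model; the documents and function name are hypothetical.

```python
import math
from collections import Counter


def rank(query_terms, docs):
    """Order document IDs by the summed TF-IDF score of the query terms, best first."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Document frequency: how many documents contain the term.
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df == 0:
                continue
            tf = counts[term] / max(len(tokens), 1)
            idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF
            score += tf * idf
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample documents.
docs = {
    "doc1": "refund policy refund window",
    "doc2": "shipping policy and delivery times",
    "doc3": "holiday opening hours",
}
print(rank(["refund", "policy"], docs))  # ['doc1', 'doc2', 'doc3']
```

Rare terms such as "refund" get a higher weight than common ones such as "policy", which is why doc1 comes out on top.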

By combining all of these techniques, Kendra is able to dig deep into the meaning behind queries and surface the most appropriate content even if the keywords don't exactly match.

Next, let's look at some common use cases where Kendra can add value.

Common Use Cases for Kendra

Kendra is designed to enhance search experiences across different industries and content repositories. Here are some common scenarios where Kendra can help:

Enterprise document search

Searching through large corporate document sets like PDFs, Word docs, PowerPoints and more is difficult with basic keyword search. Kendra simplifies this by enabling natural language queries across the content. Employees can find the documents they need faster.

Intranet and portal search

Companies often have multiple intranet sites and portals for internal tools and resources. Consolidating search across these siloed sites using Kendra makes it easier for employees to find information.

FAQ bots and chatbots

Kendra can power FAQ chatbots by understanding natural language questions and mapping them to curated answers. This improves self-service options for customers.

eCommerce search

Searching product catalogs, inventory databases, support docs and more is crucial for eCommerce businesses. Kendra can enhance on-site search to improve customer experience.

Media archives

Media organizations have massive archives of news stories, images and videos. Kendra enables journalists to quickly find relevant media assets for their reporting.

The common thread across these use cases is the need to find 'needles in a haystack' - locating the most relevant content from massive volumes of unstructured data. Kendra solves this efficiently through intelligent search.

Key Features of Kendra

Now that we've seen how Kendra works and potential use cases, let's look at some of its key features:

Natural language query

Kendra supports conversational natural language queries like "Show me all the documents from last quarter related to Company X" or "What are the top FAQs on refunds?". No need to enter exact keywords.
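
As a minimal sketch, a natural language query can be sent through the Query API using the AWS SDK for Python (boto3). The helper name, the index ID and the simplified row format are assumptions for illustration; the client is passed in so the function can be tried without AWS credentials.

```python
def search(client, index_id, question, page_size=5):
    """Run a natural language query and return simplified result rows.

    `client` is assumed to be a boto3 Kendra client: boto3.client("kendra").
    """
    response = client.query(IndexId=index_id, QueryText=question, PageSize=page_size)
    rows = []
    for item in response.get("ResultItems", []):
        rows.append({
            "type": item.get("Type"),  # e.g. ANSWER, QUESTION_ANSWER or DOCUMENT
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
            "confidence": item.get("ScoreAttributes", {}).get("ScoreConfidence"),
        })
    return rows


# Usage (requires boto3, AWS credentials and an existing index; the ID is a placeholder):
# import boto3
# kendra = boto3.client("kendra")
# for row in search(kendra, "YOUR-INDEX-ID", "What are the top FAQs on refunds?"):
#     print(row["confidence"], row["title"], row["uri"])
```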

35+ connectors

Ingest content from S3, file systems, SharePoint, Salesforce, ServiceNow, relational databases, news feeds and more. Kendra natively integrates with many data sources.

ML-powered relevance tuning

Use query clauses, synonyms and concept tagging to customize how Kendra ranks results. The ML model continuously learns from your feedback.

Role-based access control

Manage access to searches and data sources through IAM roles. Granular permissions can be set for groups/users.
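
IAM controls who may call the Kendra APIs. For per-user or per-group filtering of results, the Query API also accepts a UserContext parameter. The sketch below assumes your documents were indexed with access control lists; the helper name, user ID and group names are hypothetical.

```python
def search_as_user(client, index_id, question, user_id, groups=()):
    """Query with a user context so results can be filtered by document ACLs.

    `client` is assumed to be a boto3 Kendra client: boto3.client("kendra").
    """
    user_context = {"UserId": user_id}
    if groups:
        user_context["Groups"] = list(groups)
    response = client.query(
        IndexId=index_id, QueryText=question, UserContext=user_context
    )
    return [item.get("DocumentId") for item in response.get("ResultItems", [])]


# Usage (placeholders throughout):
# import boto3
# kendra = boto3.client("kendra")
# search_as_user(kendra, "YOUR-INDEX-ID", "salary bands", "jane@example.com", ["hr"])
```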

APIs and SDKs

Kendra provides SDKs for Python, Java, JavaScript, .NET and Go along with REST APIs, so you can integrate it into your apps.

On-demand or auto indexing

You can submit individual documents for one-off indexing or set up recurring crawls of data sources like S3. Kendra incrementally indexes updates.

Query suggestions

Kendra suggests queries as you type based on context, popularity and previous searches. This guides users to relevant results faster.
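
A type-ahead box could fetch suggestions with the GetQuerySuggestions API. This is a minimal sketch; the helper name and index ID are placeholders, and the response fields are read as we understand the boto3 response shape.

```python
def suggest(client, index_id, prefix, limit=5):
    """Return suggested query strings for the text typed so far.

    `client` is assumed to be a boto3 Kendra client: boto3.client("kendra").
    """
    response = client.get_query_suggestions(
        IndexId=index_id, QueryText=prefix, MaxSuggestionsCount=limit
    )
    return [s["Value"]["Text"]["Text"] for s in response.get("Suggestions", [])]


# Usage (placeholders throughout):
# import boto3
# kendra = boto3.client("kendra")
# print(suggest(kendra, "YOUR-INDEX-ID", "refu"))
```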

These rich features make it easier to get started with Kendra and customize it to your unique use cases.

Getting Started with Kendra

Kendra is available as a fully managed service within AWS. Let's briefly walk through how to get started:

Sign up for AWS: Create an AWS account if you don't have one already.

Create an index: An index is where your content is ingested and indexed by Kendra. You can create one from the console, the CLI or the SDKs.

Configure data sources: Connectors make it easy to ingest content from S3, SharePoint, databases and other sources.

Tune relevance: Use synonyms, clauses and filters to customize result rankings for your use case.

Search: Interact via console, SDK or REST APIs. No complex deployment needed.
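
The index and data source steps above can be sketched with boto3 as follows. Treat it as an outline under assumptions rather than a production script: the IAM role ARNs and bucket name are placeholders you would need to create first, the Developer Edition is chosen arbitrarily, and error handling is minimal.

```python
import time


def create_index_and_s3_source(client, name, index_role_arn, source_role_arn,
                               bucket, poll_seconds=30):
    """Create an index, wait until it is active, attach an S3 source and sync it.

    `client` is assumed to be a boto3 Kendra client: boto3.client("kendra").
    """
    index_id = client.create_index(
        Name=name, Edition="DEVELOPER_EDITION", RoleArn=index_role_arn
    )["Id"]

    # Index creation is asynchronous, so poll until the index is ready.
    while True:
        status = client.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            break
        if status == "FAILED":
            raise RuntimeError(f"index {index_id} failed to create")
        time.sleep(poll_seconds)

    source_id = client.create_data_source(
        IndexId=index_id,
        Name=f"{name}-s3",
        Type="S3",
        RoleArn=source_role_arn,
        Configuration={"S3Configuration": {"BucketName": bucket}},
    )["Id"]

    # Kick off the first crawl of the bucket.
    client.start_data_source_sync_job(Id=source_id, IndexId=index_id)
    return index_id, source_id


# Usage (all values are placeholders):
# import boto3
# kendra = boto3.client("kendra")
# create_index_and_s3_source(kendra, "docs", "arn:aws:iam::...:role/index-role",
#                            "arn:aws:iam::...:role/source-role", "my-bucket")
```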

Kendra offers a free tier to get started: the Developer Edition is free for up to 750 hours during the first 30 days. After that, you pay as you go for each hour an index is provisioned.

To dive deeper, refer to the Kendra developer guide. You can also check out AWS blogs like this tutorial on creating a sample index.

Summary

In this post, we took a comprehensive look at Kendra - AWS's intelligent enterprise search service. Key takeaways include:

  • Kendra uses NLP and ML to deliver natural language search capabilities with high accuracy.
  • It understands contextual meaning behind queries to return precise results even without exact keyword matches.
  • Common use cases include document search, portals, FAQs, eCommerce and media archives where finding relevant needles in large haystacks is important.
  • Key features include 35+ connectors, ML-powered relevance tuning, role-based access control, APIs/SDKs and more.
  • Kendra is available as a fully managed AWS service with a generous free tier to get started.

Kendra makes it easy to enhance search experiences across multiple use cases, improving business productivity and customer satisfaction. With its intelligent natural language capabilities, Kendra is undoubtedly a game-changer for enterprise search.

Frequently Asked Questions

1. What kind of content can I index into Kendra?

Kendra works best for searching unstructured text documents such as PDFs, Word and PowerPoint files, HTML pages and text files. Both the text content and metadata such as title and author are indexed.

Some key supported formats include:

  • PDF
  • Microsoft Office files like Word, Excel, PowerPoint
  • HTML
  • Text files like TXT, CSV
  • JSON
  • XML

Kendra does not index images or video directly. To make them searchable, index the text associated with them, such as alt text, captions, subtitles or transcripts.

2. How do I get data into Kendra?

Kendra provides native connectors for many popular data sources like S3, file systems, SharePoint, Salesforce, databases and more. You can simply point Kendra to your content repositories and it will automatically crawl, extract text and index the documents.

Kendra also provides APIs to submit documents programmatically one by one for indexing. This can be useful for real-time indexing scenarios.
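
For example, a single plain-text document can be submitted with the BatchPutDocument API. This is a minimal sketch: the helper name, document ID and text are hypothetical, and custom attribute keys are assumed to be defined on the index already (reserved keys such as _category work without extra setup).

```python
def index_text_document(client, index_id, doc_id, title, text, attributes=None):
    """Submit one plain-text document and return any failures Kendra reports.

    `client` is assumed to be a boto3 Kendra client: boto3.client("kendra").
    `attributes` is an optional dict of string metadata, e.g. {"_category": "policy"}.
    """
    document = {
        "Id": doc_id,
        "Title": title,
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            {"Key": key, "Value": {"StringValue": value}}
            for key, value in (attributes or {}).items()
        ],
    }
    response = client.batch_put_document(IndexId=index_id, Documents=[document])
    return response.get("FailedDocuments", [])


# Usage (placeholders throughout):
# import boto3
# kendra = boto3.client("kendra")
# failed = index_text_document(kendra, "YOUR-INDEX-ID", "refund-policy-v2",
#                              "Refund policy", "Refunds within 30 days",
#                              {"_category": "policy"})
```

An empty list means Kendra accepted the document; indexing itself still completes asynchronously.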

3. What is an index in Kendra?

An index is where the content being searched is ingested, processed and indexed by Kendra. You can create multiple indexes to separate different sets of documents, for example by department, geography or content type.

Each index has its own:

  • Connected data sources
  • Custom access policies
  • Relevance tuning configurations like synonyms

You query each index independently. Indexes make it easy to manage large document sets.

4. How do I tune search relevance in Kendra?

Kendra provides several tools to customize how search results are ranked such as:

Synonyms: Map terms to common concepts so singular vs. plural or acronyms don't impact relevance.

Query clauses: Boost results matching specific criteria like date ranges or fields.

Document metadata: Tag docs with custom attributes like location, type etc. to filter results.

Click analytics: Kendra logs queries and result clicks to learn relevance patterns.

Relevance tuning improves over time as Kendra ingests more data and user feedback.
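
User feedback can be sent back to Kendra with the SubmitFeedback API. The sketch below records a click and, optionally, an explicit relevance vote; the helper name is hypothetical, and the query ID and result ID are assumed to come from an earlier Query response (its QueryId and the Id of a result item).

```python
from datetime import datetime, timezone


def record_click(client, index_id, query_id, result_id, relevant=None):
    """Report a result click, plus an optional relevant / not relevant vote.

    `client` is assumed to be a boto3 Kendra client: boto3.client("kendra").
    """
    kwargs = {
        "IndexId": index_id,
        "QueryId": query_id,
        "ClickFeedbackItems": [
            {"ResultId": result_id, "ClickTime": datetime.now(timezone.utc)}
        ],
    }
    if relevant is not None:
        kwargs["RelevanceFeedbackItems"] = [
            {
                "ResultId": result_id,
                "RelevanceValue": "RELEVANT" if relevant else "NOT_RELEVANT",
            }
        ]
    client.submit_feedback(**kwargs)
    return kwargs


# Usage (placeholders throughout):
# import boto3
# kendra = boto3.client("kendra")
# record_click(kendra, "YOUR-INDEX-ID", "QUERY-ID", "RESULT-ID", relevant=True)
```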

5. How do I calculate the total cost for my use case?

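Kendra pricing is based on the number of hours each index is provisioned, at an hourly rate that depends on the index edition, plus any additional capacity units and connector usage. The free tier covers the Developer Edition for up to 750 hours during the first 30 days.

To calculate cost:

Choose an edition for each index and look up its hourly rate on the AWS pricing page.

Calculate index hours per month = number of indexes x hours each index is provisioned. An index is billed for as long as it exists, whether or not it is queried.

Add the cost of any extra storage or query capacity units and of connector syncs.

Total cost = (Hourly rate x Index hours per month) + Capacity unit charges + Connector charges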

Start small and scale up as your content libraries grow.

6. Is Kendra available globally?

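Not in every region. Kendra is available in a subset of AWS regions across North America, Europe and Asia Pacific, so check the AWS Regional Services list for current availability. You can deploy indexes in the supported region closest to your data sources and users for performance.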

Kendra builds redundancy automatically within the region to meet high availability requirements without any effort.

7. Can I use Kendra with only new documents going forward?

Yes, absolutely. Kendra has no requirement to bulk index existing historical content.

You can use Kendra only for new documents created or added to your repositories going forward. Kendra will incrementally index the latest content as it becomes available.

This applies to content from messaging queues, databases, API streams and other real-time sources.

8. Does Kendra support multiple languages?

Yes. Kendra supports English along with a number of other languages, including Spanish, Japanese, simplified Chinese and Korean. You specify a document's language when you index it and filter by language when you query; if no language is specified, Kendra defaults to English.

Even with English content, Kendra can understand many non-English entities like product names, company names, locations etc. automatically.

9. Can I use Kendra with my existing frontend?

Yes, Kendra provides APIs and SDKs for integration into any custom frontend like web or mobile apps built with popular frameworks.

You can use the Query API to search indexes and the BatchPutDocument API to submit new docs programmatically.

Kendra even provides a JavaScript library to embed search within existing web apps.

10. Is my data secure and private in Kendra?

Absolutely. Kendra has rigorous security protections like:

  • Encryption of data at rest and in transit.
  • Isolation of your data within region.
  • Granular IAM policies and private VPC connectivity.

Kendra is also SOC, ISO and PCI compliant. Data processed by Kendra is only used to provide the service.

So you can confidently use Kendra while meeting privacy requirements.

Rasheed Rabata is a solution- and ROI-driven CTO, consultant, and system integrator with experience in deploying data integrations, Data Hubs, Master Data Management, Data Quality, and Data Warehousing solutions. He has a passion for solving complex data problems. His career experience showcases his drive to deliver software and timely solutions for business needs.