Hey, I'm Marco and welcome to my newsletter!
As a software engineer, I created this newsletter to share first-hand knowledge of the development world. Each topic we explore aims to offer valuable insights, with the goal of inspiring and helping you on your journey.
In this episode, I'll explain how to build a search engine with Elasticsearch. We'll index short summaries of YouTube videos and build an app that lets you find those summaries by typing a sentence.
At the end of this episode, I'll show you a user interface application that I created. You can use it to see the project in action.
👋 Introduction
In the following sections, I'll describe the tools I use and walk you through each step, so you can reproduce it in your own projects!
1) Prerequisites
Node.js: for this tutorial you need Node.js and Yarn installed; I used the LTS version 20.9.0. If Node.js isn't on your machine yet, you can install it from the official website.
Heroku: in a previous episode, I explained step by step how to create an account and take advantage of the 1,000-hour execution plan.
Elasticsearch: we need a running instance for this project; I'll explain later how to get one for free on Heroku.
You can download all the code shown here for free from my GitHub repository: https://github.com/marcomoauro/fulltext-search-be
🔎 Elasticsearch
Elasticsearch is an open-source, distributed search and data analytics engine. It is designed to store, analyse and query large amounts of data in near real time, and it is known for its speed, scalability and flexibility. It is particularly useful for full-text search and for log and event analysis, and it is often deployed as part of the ELK stack (Elasticsearch, Logstash, Kibana).
Elasticsearch stores data as documents: JSON objects containing the fields to be indexed. Documents are organised into indexes, which can be thought of as logical groups of similar documents. Once the data has been indexed, it can be queried with the variety of search and analysis methods Elasticsearch provides.
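To build some intuition for how indexed documents become searchable, here is a toy in-memory inverted index in JavaScript. It is only a simplified sketch: Elasticsearch delegates indexing to Apache Lucene, whose analyzers, scoring and data structures are far more sophisticated. The tokenizer (lowercase, split on anything that isn't a letter or digit), the ranking by number of matched terms and the sample documents are all assumptions made for illustration.

```javascript
// Toy sketch for intuition only, not how Elasticsearch/Lucene is implemented.

// Assumed tokenizer: lowercase, then split on runs of non-letter/non-digit characters.
const tokenize = (text) =>
  text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);

// Inverted index: term -> Set of ids of the documents containing that term.
function buildInvertedIndex(docs) {
  const index = new Map();
  for (const [id, doc] of Object.entries(docs)) {
    for (const term of tokenize(doc.summary)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(id);
    }
  }
  return index;
}

// Return ids of documents containing at least one query term (OR semantics),
// ordered by how many distinct query terms each document contains.
function search(index, query) {
  const counts = new Map();
  for (const term of new Set(tokenize(query))) {
    for (const id of index.get(term) ?? []) {
      counts.set(id, (counts.get(id) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical sample documents.
const docs = {
  a: { summary: 'Bitcoin halving explained' },
  b: { summary: 'Cardano staking rewards' },
  c: { summary: 'What the Bitcoin ETF means for the halving' },
};
const index = buildInvertedIndex(docs);
console.log(search(index, 'bitcoin etf')); // [ 'c', 'a' ]
```

Looking up a term is a single map access instead of a scan over every document, which is the core reason inverted indexes make full-text search fast.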
Elasticsearch offers a wide range of functionalities, including:
Full-Text Search: Elasticsearch can run fast, efficient full-text searches over large datasets.
Horizontal Scalability: it is designed to scale horizontally, distributing data and queries across multiple nodes so it can handle high volumes of both.
Geographical Search: geo queries let you find and analyse location-based data.
Log Analysis: Elasticsearch is widely used to monitor and analyse large volumes of log data from servers, applications and other systems.
Semantic Search: with vector fields and k-nearest-neighbour (kNN) search, Elasticsearch can power search engines that match on the meaning of a query rather than only its exact words, returning more relevant results.
Aggregation and Analysis: aggregations compute statistics (counts, averages, histograms and more) over indexed data.
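As an example of aggregations, the sketch below builds a _search request body that counts summaries per author, assuming the youtube_summaries index defined later in this episode (where video_author is a keyword field; text fields are not aggregatable by default). The aggregation name summaries_per_author and both helper functions are hypothetical. You would POST the body to <ELASTIC_SEARCH_URL>/youtube_summaries/_search.

```javascript
// Hypothetical helper: a search body that returns no hits (size: 0) and a
// terms aggregation counting documents per value of the keyword field "video_author".
function buildAuthorCountsQuery(topN = 10) {
  return {
    size: 0,
    aggs: {
      summaries_per_author: {
        terms: { field: 'video_author', size: topN },
      },
    },
  };
}

// Hypothetical helper: extract [{ author, count }] from the aggregation part of a response.
function parseAuthorCounts(response) {
  return response.aggregations.summaries_per_author.buckets.map((bucket) => ({
    author: bucket.key,
    count: bucket.doc_count,
  }));
}

// Made-up response, trimmed to the aggregation part.
const fakeResponse = {
  aggregations: {
    summaries_per_author: {
      buckets: [
        { key: 'Author A', doc_count: 42 },
        { key: 'Author B', doc_count: 17 },
      ],
    },
  },
};

console.log(JSON.stringify(buildAuthorCountsQuery(5)));
console.log(parseAuthorCounts(fakeResponse));
```

Setting size: 0 skips returning the matching documents themselves, since only the bucket counts are needed.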
In short, Elasticsearch is a powerful data search and analysis engine that allows large datasets to be stored, analysed and queried in real time, offering a range of advanced features for full-text search, log analysis and more.
1) Host a free version with Heroku!
We can use the Bonsai Elasticsearch add-on. Its free plan lets us create indexes holding up to 35,000 documents, with a total size of 125 MB.
We will see in the next section how to add it to the Heroku project.
👨‍💻 Let's get down to practice
Let's move on to the implementation. These are the steps we need to take:
Creation of the server based on my backend template
Host Bonsai Elasticsearch add-on
Index creation
Seed with YouTube video summaries
API development: route, controller and model
Test and deploy the API
1) Creation of the server based on my backend template
Let's start from the Node.js backend template I built in a previous episode.
Create the project folder with:
mkdir fulltext-search-be
cd fulltext-search-be
Now copy the template into this folder, and remember to update the name key in the package.json file. The project tree should look like this:
You can safely remove controllers/newsletters.js, models/Newsletter.js and the GET /newsletters/:id API defined in router.js. They were introduced in the previous episode and we won't need them anymore.
Now create the .env file with the following environment variables:
MODE=development
NODE_ENV=production
PORT=80
ELASTIC_SEARCH_URL=
ELASTIC_SEARCH_INDEX_SUMMARIES=youtube_summaries
We'll set the value of ELASTIC_SEARCH_URL later.
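Until ELASTIC_SEARCH_URL is filled in, every search request would go to a malformed URL and fail with a less obvious error. A small startup check like the hypothetical loadElasticConfig below (not part of the author's template) fails fast with a clear message and strips trailing slashes from the URL, so the search endpoint never ends up with a double slash.

```javascript
// Hypothetical startup check, not part of the author's template.
// Reads the Elasticsearch settings from an env object (process.env by default)
// and throws if any of them is missing or empty.
function loadElasticConfig(env = process.env) {
  const required = ['ELASTIC_SEARCH_URL', 'ELASTIC_SEARCH_INDEX_SUMMARIES'];
  const missing = required.filter((name) => !env[name] || !env[name].trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  // Remove trailing slashes so `${url}/${index}/_search` never contains "//".
  const url = env.ELASTIC_SEARCH_URL.trim().replace(/\/+$/, '');
  const index = env.ELASTIC_SEARCH_INDEX_SUMMARIES.trim();
  return { url, index, searchEndpoint: `${url}/${index}/_search` };
}

const config = loadElasticConfig({
  ELASTIC_SEARCH_URL: 'https://example.com/',
  ELASTIC_SEARCH_INDEX_SUMMARIES: 'youtube_summaries',
});
console.log(config.searchEndpoint); // https://example.com/youtube_summaries/_search
```

You could call it once when the server starts, so a misconfigured deploy fails immediately instead of on the first request.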
2) Host Bonsai Elasticsearch add-on
Here's the plan: upload the project to GitHub, create a project on Heroku, connect it to the GitHub repository, and finally set the environment variables in the Settings tab. All these steps are described in a previous episode.
Once the Heroku project is created, we can add the Bonsai Elasticsearch add-on and select the free 'Sandbox' plan.
Once it's added, the BONSAI_URL environment variable is created automatically. It contains the address we need to connect to the Elasticsearch service and looks something like this:
https://and7ek5tnx:n5938dn40k@maple-694125431.eu-west-1.bonsaisearch.net
Our code reads the connection address from ELASTIC_SEARCH_URL, so let's create a new environment variable with that name and the same value as BONSAI_URL, and then add it to the .env file as well:
ELASTIC_SEARCH_URL=https://and7ek5tnx:n5938dn40k@maple-694125431.eu-west-1.bonsaisearch.net
This is an example value; use your own!
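Because the username and password are embedded in the URL, anything that prints ELASTIC_SEARCH_URL (logs, error messages) leaks them. A minimal sketch of a hypothetical redactUrl helper, relying only on the standard WHATWG URL class available globally in Node.js, masks the credentials before logging:

```javascript
// Hypothetical helper: replace the credentials embedded in a URL so it can be logged safely.
function redactUrl(rawUrl) {
  const url = new URL(rawUrl);
  if (url.username) url.username = '***';
  if (url.password) url.password = '***';
  return url.toString();
}

console.log(redactUrl('https://and7ek5tnx:n5938dn40k@maple-694125431.eu-west-1.bonsaisearch.net'));
// https://***:***@maple-694125431.eu-west-1.bonsaisearch.net/
```

Note that URL serialization adds a trailing slash when the path is empty, which is harmless for logging.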
3) Index creation
Next, we'll create a new index called youtube_summaries. This index will hold the summaries of our YouTube videos. We can create it by making a PUT request. You can use tools like Postman, or if you're like me, you can use the terminal command curl.
curl -XPUT 'https://and7ek5tnx:n5938dn40k@maple-694125431.eu-west-1.bonsaisearch.net/youtube_summaries' -H 'Content-Type: application/json' -d '
{
"mappings": {
"properties": {
"video_id": {
"type": "keyword"
},
"video_title": {
"type": "keyword"
},
"video_author": {
"type": "keyword"
},
"language_code": {
"type": "keyword"
},
"summary": {
"type": "text"
}
}
}
}'
To check if everything is set up correctly, you can go to <ELASTIC_SEARCH_URL>/_cat/indices in a web browser. The page should show something like this:
We mapped video_id, video_title, video_author and language_code as keyword fields. A keyword value is indexed as a single, unanalyzed term, exactly as provided: it is not lowercased or split into words. This lets us run exact-match searches on these fields, and also sort and aggregate on them.
We mapped the summary field as text, the type meant for longer free-form content such as the body of a document or a description. A text field is analyzed at index time: by default the standard analyzer splits it into individual terms (words) and lowercases them, and those terms are stored in an inverted index, which lets us search for and retrieve documents quickly and flexibly. This is the field we'll query with the sentence you type in.
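The difference between the two field types can be illustrated with a rough JavaScript approximation. This is a simplified model, not Elasticsearch code: the real standard analyzer follows Unicode word-boundary rules, so its tokens can differ in edge cases, and the helper names are hypothetical.

```javascript
// Simplified model of how each field type is indexed (hypothetical helpers).

// keyword: the whole value becomes one unmodified term.
const keywordTerms = (value) => [value];

// text: roughly what the standard analyzer does, lowercase and split into words.
const textTerms = (value) =>
  value.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);

// Searching a keyword field only matches the exact stored string.
const keywordMatches = (fieldValue, input) => keywordTerms(fieldValue).includes(input);

// A match query on a text field analyzes the input the same way and, with the
// default OR operator, matches when at least one term is shared.
const textMatches = (fieldValue, input) => {
  const terms = new Set(textTerms(fieldValue));
  return textTerms(input).some((term) => terms.has(term));
};

const title = 'Bitcoin Halving Explained';
console.log(keywordMatches(title, 'Bitcoin Halving Explained')); // true
console.log(keywordMatches(title, 'bitcoin halving explained')); // false
console.log(textMatches(title, 'HALVING')); // true
```

This is why a search like "halving" finds summaries regardless of capitalization, while searching video_author requires the exact stored value.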
4) Seed with YouTube video summaries
I added around 1,000 summaries of cryptocurrency YouTube videos to the index. I took them from the daily posts I publish in my Quickview newsletter.
Are you interested in the world of cryptocurrencies but don't have time to stay up-to-date? Sign up now to receive daily video summaries!
You can use this example curl command to add documents to the new index:
curl -XPOST 'https://and7ek5tnx:n5938dn40k@maple-694125431.eu-west-1.bonsaisearch.net/youtube_summaries/_doc' -H 'Content-Type: application/json' -d '{
"video_id": "<video_id>",
"video_title": "<video_title>",
"video_author": "<video_author>",
"language_code": "en",
"summary": "Lorem ipsum dolor sit amet, consectetur adipiscing..."
}'
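Indexing around 1,000 summaries one curl call at a time is slow. Elasticsearch's _bulk API accepts many operations in a single request as newline-delimited JSON (NDJSON): an action line followed by the document source, with a trailing newline at the end. The sketch below builds such a body; the helper name toBulkBody and the sample documents are hypothetical, and using video_id as the document _id is an assumption that makes re-running the seed idempotent (a POST to _doc, as above, generates a new id every time, so repeated runs create duplicates).

```javascript
// Hypothetical helper: build an NDJSON body for the _bulk API.
// Each document produces two lines: an "index" action and the document source.
function toBulkBody(indexName, docs) {
  const lines = [];
  for (const doc of docs) {
    // Assumption: video_id is used as _id, so re-seeding overwrites instead of duplicating.
    lines.push(JSON.stringify({ index: { _index: indexName, _id: doc.video_id } }));
    // JSON.stringify escapes newlines inside strings, so each object stays on one line.
    lines.push(JSON.stringify(doc));
  }
  // The bulk API requires the body to end with a newline.
  return lines.join('\n') + '\n';
}

const sampleDocs = [
  { video_id: 'vid1', video_title: 'Title 1', video_author: 'Author A', language_code: 'en', summary: 'First summary' },
  { video_id: 'vid2', video_title: 'Title 2', video_author: 'Author B', language_code: 'en', summary: 'Second summary' },
];
const bulkBody = toBulkBody('youtube_summaries', sampleDocs);
console.log(bulkBody);
```

Send the body with POST to <ELASTIC_SEARCH_URL>/_bulk and the header Content-Type: application/x-ndjson, then check the errors flag in the response, which is true when at least one operation failed.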
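You can then retrieve it through the _search API (newly indexed documents become searchable after the next index refresh, which by default usually happens within about a second):
curl -XPOST 'https://and7ek5tnx:n5938dn40k@maple-694125431.eu-west-1.bonsaisearch.net/youtube_summaries/_search?size=10000' -H 'Content-Type: application/json' \
--data '{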
"query": {
"match": {
"summary": "consectetur adipiscing"
}
}
}'
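By default a match query combines the analyzed terms with OR, so the query above returns summaries containing either "consectetur" or "adipiscing". To require every word, set operator to "and". The hypothetical helper below builds either variant. Note that Elasticsearch returns 10 hits unless size is set, and from + size cannot exceed index.max_result_window (10,000 by default), which is why size=10000 is the largest value the curl above can use without changing index settings.

```javascript
// Hypothetical helper: a match query on the "summary" field.
// operator "or" (the Elasticsearch default) matches any term; "and" requires all of them.
function buildSummaryQuery(search, { requireAllTerms = false, size = 10 } = {}) {
  return {
    size,
    query: {
      match: {
        summary: {
          query: search,
          operator: requireAllTerms ? 'and' : 'or',
        },
      },
    },
  };
}

console.log(JSON.stringify(buildSummaryQuery('consectetur adipiscing', { requireAllTerms: true })));
```

The "and" variant is usually a better fit when users type several specific words, while "or" favours recall and relies on relevance scoring to put the best matches first.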
5) API development: route, controller and model
We're going to build a new API that takes a text string as input and uses it to search the video summaries in the Elasticsearch index.
First, create a new Summary.js model in the /elasticsearch-models folder. It queries Elasticsearch directly through its HTTP API using axios:
import axios from 'axios';
import log from '../log.js';
export default class ESSummary {
video_id;
video_title;
video_author;
language_code;
value;
constructor(properties) {
// Copy every declared instance field from the given properties object.
Object.keys(this)
.filter((k) => typeof this[k] !== 'function')
.forEach((k) => {
this[k] = properties[k];
});
}
static fromESHit = (hit) => {
const doc = hit._source;
const summary = new ESSummary({
video_id: doc.video_id,
video_title: doc.video_title,
video_author: doc.video_author,
language_code: doc.language_code,
value: doc.summary,
});
return summary;
};
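static list = async (search) => {
log.info('Model::ESSummary::list', { search });
// With a search string, run a full-text match query on the analyzed "summary" field;
// otherwise fall back to match_all and return every document.
const query = search ? { match: { summary: search } } : { match_all: {} };
// Elasticsearch returns only the top 10 hits unless "size" is set;
// 10000 is the default upper limit (index.max_result_window).
const body = { size: 10000, query };
const { hits } = await ESSummary._callSearchEndpoint(body);
const summaries = hits.hits.map(ESSummary.fromESHit);
return summaries;
};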
static _callSearchEndpoint = async (body) => {
const { data } = await axios.post(`${process.env.ELASTIC_SEARCH_URL}/${process.env.ELASTIC_SEARCH_INDEX_SUMMARIES}/_search`, body, {
headers: { 'Content-Type': 'application/json' },
});
return data;
};
}
The ESSummary model exposes a static list method. Given a search string, it runs a match query on the summary field, which matches the summaries containing at least one of the words in the string, most relevant first; without a search string, it falls back to a match_all query.
Next, create a controller file called summaries.js in the /controllers folder. It connects the ESSummary model to the new API: its searchSummaries function takes the search string from the query string and passes it to the model's list method:
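fromESHit reads only _source, but each hit in a _search response also carries _score, the relevance score Elasticsearch uses to order results (highest first when no explicit sort is given). If you want to expose it, a variant could look like the hypothetical toScoredSummaries below; the sample response is made up and trimmed to the fields used, in the shape returned by Elasticsearch 7 and later (where hits.total is an object rather than a number).

```javascript
// Hypothetical variant of fromESHit that also keeps the relevance score.
function toScoredSummaries(response) {
  return response.hits.hits.map((hit) => ({
    video_id: hit._source.video_id,
    video_title: hit._source.video_title,
    video_author: hit._source.video_author,
    language_code: hit._source.language_code,
    value: hit._source.summary,
    score: hit._score,
  }));
}

// Made-up response in the Elasticsearch 7+ shape, trimmed to the relevant fields.
const sampleResponse = {
  hits: {
    total: { value: 1, relation: 'eq' },
    max_score: 1.73,
    hits: [
      {
        _index: 'youtube_summaries',
        _id: 'abc123',
        _score: 1.73,
        _source: {
          video_id: 'abc123',
          video_title: 'Halving recap',
          video_author: 'Some Author',
          language_code: 'en',
          summary: 'The bitcoin halving...',
        },
      },
    ],
  },
};

console.log(toScoredSummaries(sampleResponse));
```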
import log from "../log.js";
import ESSummary from "../elasticsearch-models/Summary.js";
export const searchSummaries = async ({search}) => {
log.info('Controller::Summaries::searchSummaries', {search});
const summaries = await ESSummary.list(search);
return summaries
}
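The controller forwards the raw query-string value to the model. A small normalization step, sketched below as a hypothetical normalizeSearch helper that is not in the repository, trims and collapses whitespace, treats an empty or non-string value as "no search" (so the model falls back to match_all) and caps the length at an arbitrary 200 characters. Depending on the query-string parser, a repeated ?search= parameter may arrive as an array, which this sketch also treats as "no search".

```javascript
// Hypothetical input normalization for the ?search= parameter.
const MAX_SEARCH_LENGTH = 200; // assumption: arbitrary cap

function normalizeSearch(raw) {
  // Ignore missing values and arrays produced by repeated parameters.
  if (typeof raw !== 'string') return undefined;
  const search = raw.trim().replace(/\s+/g, ' ');
  if (search === '') return undefined;
  return search.slice(0, MAX_SEARCH_LENGTH);
}

console.log(normalizeSearch('  bitcoin   halving ')); // bitcoin halving
```

In the controller you could then call ESSummary.list(normalizeSearch(search)). Rejecting bad input with a 400 error instead is an equally valid design choice.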
Finally, we need to set up the new API. To do this, we add the following code to the router.js file:
import {searchSummaries} from "./controllers/summaries.js";
router.get('/summaries', routeToFunction(searchSummaries));
6) Test and deploy the API
We are done!
We can start the server by launching:
yarn serve:development
Open http://localhost/summaries?search=bitcoin+halving in your browser. If everything went well, the response is an array of the summaries whose text matches the search string, most relevant first.
Push all the changes to GitHub. If you have enabled automatic deploys, your Heroku application will be updated shortly.
Here are some links to my backend on Heroku, so you can try the API yourself:
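The + in the URL is how form encoding represents a space, so the API receives the search string "bitcoin halving". The sketch below, using the standard URL and URLSearchParams classes, shows how a client could build these URLs and how the parameter is decoded back; summariesUrl is a hypothetical helper.

```javascript
// Hypothetical helper: build a /summaries URL; URLSearchParams encodes spaces as "+".
function summariesUrl(baseUrl, search) {
  const url = new URL('/summaries', baseUrl);
  url.searchParams.set('search', search);
  return url.toString();
}

console.log(summariesUrl('http://localhost', 'bitcoin halving'));
// http://localhost/summaries?search=bitcoin+halving

// Standard form decoding turns "+" back into a space.
console.log(new URLSearchParams('search=bitcoin+halving').get('search'));
// bitcoin halving
```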
https://fulltext-search-be-03ecfe105513.herokuapp.com/summaries?search=bitcoin+etf
https://fulltext-search-be-03ecfe105513.herokuapp.com/summaries?search=cardano+staking
https://fulltext-search-be-03ecfe105513.herokuapp.com/summaries?search=ledger+hardwallet
⭐️ Bonus: Try my application with UI!
I built a UI with Bubble, a low-code tool for creating graphical user interfaces.
You can use it to query my index here: https://fulltext-search-fe.bubbleapps.io/version-test
If you want to know more about Bubble and whether a low-code tool is worth using, subscribe to the newsletter: I'll write a post about it soon.
And that’s it for today! If you are finding this newsletter valuable, consider doing any of these:
🍻 Read with your friends — Implementing lives thanks to word of mouth. Share the article with someone who would like it.
📣 Provide your feedback — We welcome your thoughts! Please share your opinions or suggestions for improving the newsletter, your input helps us adapt the content to your tastes.
💬 Chat with me — If you have any doubts or curiosity, please write to me, I will be happy to answer you!
I wish you a great day! ☀️
Marco