Hey, I'm Marco and welcome to my newsletter!
As a software engineer, I created this newsletter to share my first-hand knowledge of the development world. Each topic we explore will provide valuable insights, with the goal of inspiring and helping you on your journey.
In this episode I want to bring you the first tutorial: how to build a system in Node.js that, starting from a YouTube video link, generates a summary using OpenAI's completions API, the same API that powers ChatGPT.
You can download all the code shown directly from my GitHub repository: https://github.com/marcomoauro/youtube-summarizer
1) 🏰 Architecture
The system architecture comprises two main steps:
Extracting text from YouTube videos
Generating text summaries
1) 📝 Extracting text from YouTube videos
This process involves extracting text from the video and utilizing it for summary generation. Various options were considered, including:
Employing a paid third-party API, such as Deepgram, to extract text from the video.
Utilizing the Speech-to-Text API from Microsoft Azure, which requires the audio file.
Leveraging the Speech-to-Text API from OpenAI, which also requires the audio file.
Scraping YouTube's automatic captioning, available in each video.
I chose scraping, the most challenging of the options, because implementing everything myself avoids the costs associated with third-party APIs for text extraction. Additionally, as a genuine enthusiast, I prefer this approach: most of my personal projects are built on this technique.
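To give an idea of what the scraping approach can look like, here is a minimal sketch (not the repository's actual code). It assumes the YouTube watch page still embeds a `captionTracks` JSON array and that a track's `baseUrl` returns the transcript as XML; these are unofficial page internals that YouTube can change at any time.

```javascript
// Sketch: scrape YouTube's automatic captions from the watch page.
// The page embeds a JSON blob listing caption tracks; each track's
// baseUrl returns the transcript as XML. This is NOT a stable API.

// Pull the caption track list out of the watch-page HTML.
function extractCaptionTracks(html) {
  const match = html.match(/"captionTracks":(\[.*?\])/);
  if (!match) return [];
  return JSON.parse(match[1]);
}

// Strip the <text> tags from the transcript XML and join the lines.
function captionXmlToText(xml) {
  return [...xml.matchAll(/<text[^>]*>([\s\S]*?)<\/text>/g)]
    .map((m) =>
      m[1]
        .replace(/&amp;/g, '&')
        .replace(/&#39;/g, "'")
        .replace(/&quot;/g, '"'),
    )
    .join(' ');
}

// Put it together (Node 18+ ships a global fetch).
async function getCaptionsForVideo(videoUrl) {
  const html = await (await fetch(videoUrl)).text();
  const tracks = extractCaptionTracks(html);
  if (tracks.length === 0) throw new Error('No captions found');
  const xml = await (await fetch(tracks[0].baseUrl)).text();
  return captionXmlToText(xml);
}
```

In a real implementation you would also pick the caption track by `languageCode` and handle videos with captions disabled.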
If you are interested in finding out the best practices for web scraping, sign up for the newsletter: I will be publishing a post about it soon.
2) 📄 Generating text summaries
After getting the captions, we feed them to OpenAI. The first challenge I faced was the limit on the maximum size of the text that the completions API can handle. This limit depends on the model used; for the GPT-3.5 Turbo model it is 4,096 tokens.
To overcome this limitation, I adopted a recursive approach. The text is divided into smaller parts, which are grouped and summarized independently; the process repeats until a single output text remains: the final summary, generated from the intermediate summaries.
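A minimal sketch of this recursive approach (not the repository's actual code): `summarizeChunk` stands in for the OpenAI call, and the token budget is approximated in characters, assuming roughly 4 characters per token.

```javascript
// Sketch: recursively "summarize the summaries" until the text fits
// in a single request. Tokens are approximated by characters here.

const MAX_CHARS = 4096 * 4; // rough char budget for a 4,096-token model

// Split text into pieces that each fit the model's context window.
function splitIntoChunks(text, maxChars = MAX_CHARS) {
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Summarize each chunk, concatenate the partial summaries, and
// recurse until the result fits in a single request.
async function summarizeRecursively(text, summarizeChunk, maxChars = MAX_CHARS) {
  if (text.length <= maxChars) return summarizeChunk(text);
  const chunks = splitIntoChunks(text, maxChars);
  const partials = await Promise.all(chunks.map(summarizeChunk));
  return summarizeRecursively(partials.join('\n'), summarizeChunk, maxChars);
}

// With the official SDK, summarizeChunk could look like this
// (requires an OPENAI_API_KEY environment variable):
//
// const OpenAI = require('openai');
// const client = new OpenAI();
// const summarizeChunk = async (text) => {
//   const res = await client.chat.completions.create({
//     model: 'gpt-3.5-turbo',
//     messages: [{ role: 'user', content: `Summarize:\n${text}` }],
//   });
//   return res.choices[0].message.content;
// };
```

Keeping `summarizeChunk` as a parameter makes the recursion easy to test without calling the API.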
2) 👨‍💻 Let's get down to practice
For this tutorial you need Yarn and Node.js installed; specifically, I used LTS version 20.9.0. If you don't have Node.js on your machine, you can install it from the official website.
1) Setup the project
Starting from my workspaces folder, I created the project folder and the npm package: