NOTICE:

In this newsletter, I want to share the DeepSeek model’s specification and the methods that helped its makers succeed in the AI race. I believe this newsletter teaches not only about DeepSeek but also some of the concepts in ML AND AI. If you don’t have time for this, you can skip it, delete it, or even UNSUBSCRIBE FROM THE NEWSLETTER.

But if this is what you want to know, or you want to stay updated on the AI technology in use around us, then this one is for you.

INTRODUCTION:

We all know that the open-source AI model named “DeepSeek” has made a significant impact on our world. This open-source LLM was launched on 20th January 2025, and it was created by a company called Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., which is located in Hangzhou, China.

EFFECT ON THE WORLD:

We all know the economic events that took place around the time DeepSeek was launched: Nvidia’s stock price fell sharply, and France hosted its AI summit shortly afterwards. This shows the power of AI not only on the SOFTWARE side but also on the economic front.

BUT HERE WE ARE NOT GOING TO LEARN ABOUT THE ECONOMIC EVENTS. INSTEAD, WE ARE GOING TO DIVE DEEP INTO THE DEEPSEEK MODEL’S SPECIFICATION AND THE METHODS THEY USED, AND ALL THE ML STUFF WILL BE DISCUSSED HERE IN A SIMPLE BUT EFFECTIVE WAY.

IF YOU ARE STILL READING, THEN YOU ARE GIVING ME THE TIME I NEED, AND I WILL GIVE MY BEST TO YOU.

LET’S GET STARTED.

DEEPSEEK-R1

BASIC KNOWLEDGE:

Before we start to understand DeepSeek-R1, which is an LLM (LARGE LANGUAGE MODEL), we should know how LLMs usually work. An LLM first learns from a huge amount of text by predicting the next word, and it is then usually refined with SUPERVISED LEARNING on example questions and answers. This learning process can cost millions or even tens of millions of dollars. After learning, the model answers the queries that we give it.

IF YOU ARE NOT AWARE OF SUPERVISED LEARNING, NO WORRIES, I CAN HELP.

SUPERVISED LEARNING:

This means teaching the model with labelled examples, much like we teach a child. We show the model a picture of an apple (the input) and teach it (train the model) that the picture is called “apple” (the correct answer). In the same way, we have to give an input and its correct answer for everything we want the model to learn.

Now we know how LLMs usually work, so we can jump into DeepSeek-R1.
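To make the apple example a little more concrete, here is a tiny sketch in Python. It is only a toy: the two features (redness and roundness), the numbers, and the function names `train` and `predict` are all made up by me for illustration, and a real LLM is trained very differently. The idea is the same, though: we give inputs together with their correct answers, and the model learns to answer for new inputs.

```python
# Toy supervised learning: learn from labelled examples, then predict.
# All features and numbers below are invented for illustration.

def train(examples):
    """Average the feature vectors of each label (a nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}

def predict(model, features):
    """Return the label whose average example is closest to the input."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))

# Each example is (input, correct answer): [redness, roundness] -> fruit name.
labelled_examples = [
    ([0.9, 0.9], "apple"),
    ([0.8, 0.8], "apple"),
    ([0.1, 0.2], "banana"),
    ([0.2, 0.1], "banana"),
]

model = train(labelled_examples)
prediction = predict(model, [0.85, 0.95])  # a fruit the model has never seen
print(prediction)
```

Notice that the model can only be as good as the labelled examples we give it, which is why collecting them gets expensive at scale.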

BACKGROUND:

DeepSeek-R1 is not the only model that the company has built. It has also built other models such as DeepSeek Coder, DeepSeek LLM, DeepSeek-V2, DeepSeek-Coder-V2, DeepSeek-V3, DeepSeek-R1-Zero and the newer Janus-Pro-7B.

But the stars of the show today are DEEPSEEK-R1-ZERO AND DEEPSEEK-R1.

WHAT IS THE DIFFERENCE:

We saw that other models rely on SUPERVISED LEARNING for training. DeepSeek-R1-Zero, however, is trained only with REINFORCEMENT LEARNING, with no supervised fine-tuning step first. DeepSeek-R1 is cold-started with a small amount of SUPERVISED LEARNING and is then also trained with REINFORCEMENT LEARNING. This helped the company build the model in a cost-efficient way. I believe the cost of making DeepSeek was about $6 million (the widely reported figure, which refers to training the DeepSeek-V3 base model), whereas its rival OpenAI reportedly spent around $100 million.

[Image: differences between DeepSeek and OpenAI]

Now we know the basic differences, so we can move on to the TECHNICAL STUFF:

SCENARIO:

Before DeepSeek was launched, the main problem was that a full POST-TRAINING pipeline for LLMs relied heavily on SUPERVISED LEARNING, which is not cost-effective because it needs large amounts of labelled data and a lot of computational power. So the DeepSeek team used REINFORCEMENT LEARNING to train the model. This method of training is reported to need significantly less computational power than other LLMs’ training methods.

REINFORCEMENT LEARNING:

Instead of giving the model all the knowledge it needs before training takes place, here we make the model learn on its own by trial and error. To be precise, when the model does its work correctly, we give it a reward, and this teaches the model that what it did is right. When the model doesn’t do its work properly, we give it a punishment or no reward, and so the model learns that what it did is wrong.
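Here is a small sketch of that trial-and-error loop in Python. It is a toy of my own and not DeepSeek’s training code: the question ("2 + 2"), the three candidate answers, and the names `reward_for` and `learn_by_trial_and_error` are assumptions made only for illustration. The model is never told the right answer. It just tries answers, gets +1 or -1, and slowly prefers whatever earned rewards.

```python
import random

def reward_for(answer):
    """Reward +1 for the right answer to "2 + 2", -1 (punishment) otherwise."""
    return 1.0 if answer == "4" else -1.0

def learn_by_trial_and_error(actions, steps=200, explore_rate=0.1,
                             step_size=0.5, seed=0):
    """Keep a running value estimate per answer and mostly pick the best one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    value = {action: 0.0 for action in actions}
    for _ in range(steps):
        if rng.random() < explore_rate:
            action = rng.choice(actions)  # explore: try something at random
        else:
            action = max(actions, key=lambda a: value[a])  # exploit best so far
        reward = reward_for(action)
        # Nudge the estimate for this answer towards the reward it just earned.
        value[action] += step_size * (reward - value[action])
    return value

values = learn_by_trial_and_error(["3", "4", "5"])
best_answer = max(values, key=values.get)
print(values, best_answer)
```

After a few tries, the wrong answers carry negative values and the right one carries a positive value, so the model keeps choosing it.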


DeepSeek used GRPO as its RL (REINFORCEMENT LEARNING) framework. GRPO stands for GROUP RELATIVE POLICY OPTIMIZATION, which is a powerful RL technique. It helped the DeepSeek-R1-Zero model improve significantly on reasoning benchmarks, scoring 86.7% on AIME 2024 when using majority voting over many sampled answers (71.0% on single attempts). AIME is a math competition that requires logical thinking and, to be honest, it is pretty tough. This score put DeepSeek on par with other leading LLMs such as Gemini and OpenAI’s models.
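The "group" part of GRPO can be sketched in a few lines. As I understand it, the model samples several answers to the same question, each answer gets a reward, and each reward is then compared with the average of its own group, so an answer is "good" only relative to its siblings. The sketch below shows just that scoring step, not the full training algorithm. The function name, the example rewards, the use of the population standard deviation, and the small `eps` that avoids division by zero are my own assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward against its own group: (reward - mean) / std."""
    average = mean(rewards)
    spread = pstdev(rewards)  # assumption: population standard deviation
    return [(r - average) / (spread + eps) for r in rewards]

# Four sampled answers to the same question: 1.0 = correct, 0.0 = wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat the group average get a positive score and are encouraged, while the rest get a negative score and are discouraged. If every answer in the group earns the same reward, all the scores are zero and there is nothing to learn from that group.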

OKAY, WE HAVE COME TO THE END.

Let’s review what we learned:

  1. Basic info about the company behind DeepSeek.

  2. What DeepSeek is and how it differs from other LLMs in the world.

  3. What supervised and reinforcement learning mean, in a simple way.

I LEARNED A LOT WHILE WRITING THIS EMAIL. I BELIEVE YOU ALSO FOUND SOME VALUE IN IT.

YOUR FEEDBACK IS HEARTILY WELCOME.

THANK YOU FOR SPENDING YOUR TIME WITH ME…..
