OpenAI o1, the Most Powerful Model Yet: A 10-Minute Hands-On PhD-Level AI Comparison Test | Back to Axton
0:00
Hello
0:01
Welcome back to Axton
0:02
OpenAI has just released the most powerful AI model ever
0:05
OpenAI o1
0:07
The new model has reached the level of a human PhD student
0:10
Why is the new model called o1?
0:14
Why not call it GPT-5?
0:15
This is because it has exceptional ability in complex reasoning tasks
0:20
OpenAI said
0:21
This represents a new level of AI capability
0:25
A new starting point
0:26
Resetting the counter back to 1
0:27
So, this series is named o1
0:30
Currently, all ChatGPT Plus or Team users
0:35
have already received updates
0:36
Here, we can choose the new model
0:39
But let's not rush
0:41
Let's first take a look at the official website's introduction to o1
0:44
Then we'll use a real-world problem to compare
0:47
the capabilities of o1 and GPT-4o
0:50
This is OpenAI's official website introduction
0:53
It's now called OpenAI o1-preview
0:55
It became available on September 12th
0:58
Right now as I'm recording this video
0:59
It has been 100% rolled out to
1:01
ChatGPT Plus and Team users
1:04
Okay, let's look at some of the key points
1:07
First, it's a preview version
1:08
It will have regular updates and improvements in the future
1:11
And the next version is under development
1:13
Okay, the key is how they work.
1:15
When training these new models,
1:18
the idea is to have them spend more time, like humans,
1:20
carefully thinking about problems.
1:22
Then they learn to try different strategies to solve problems,
1:25
and recognize their mistakes.
1:27
In particular, on benchmark tasks in a few areas,
1:29
namely physics, chemistry, and biology,
1:32
its performance is already similar to that of a PhD student.
1:35
That's remarkable.
1:36
I remember it used to be at the high school level.
1:39
It also excels in math and coding.
1:43
In a qualifying exam for the International Mathematical Olympiad,
1:46
GPT-4o, the model we're using now,
1:49
correctly solved 13% of the problems.
1:52
This new o1 model got 83%.
1:55
That's more than a six-fold improvement.
1:58
Also, its programming ability has improved significantly.
2:01
Its coding ability in Codeforces competitions has reached the 89th percentile.
2:05
Meaning it outperforms 89% of human competitors.
2:08
This is essentially at an expert, master level.
2:12
Of course, it's currently an early model preview.
2:15
It doesn't have some other additional features yet.
2:18
Like browsing the internet.
2:19
Uploading files and images.
2:21
So for now, in common cases,
2:23
GPT-4o is still the more practical choice.
2:27
But for complex reasoning tasks,
2:29
this is a pretty significant advance.
2:31
This is what we just mentioned.
2:33
Why did they name this series OpenAI o1?
2:37
It's because OpenAI believes
2:38
This represents a new level of AI capabilities.
2:41
A new starting point.
2:42
Okay, who is this new model for?
2:44
These enhanced reasoning abilities
2:46
are particularly useful in the following areas:
2:48
When dealing with complex problems in fields like science, coding, mathematics, and similar areas.
2:51
They are especially useful.
2:52
There are many examples here.
2:54
We can take a quick look.
2:55
This segment shows its cognitive abilities.
2:57
Its ability to analyze emotions.
2:59
This video tests o1's capabilities in economics.
3:03
It asks under what conditions
3:05
tariffs can improve a country's terms of trade.
3:07
This in turn increases domestic welfare, and so on.
3:10
This is a demonstration of research in genetics and heredity.
3:13
Some information about citrates.
3:15
This video shows o1's applications in quantum physics.
3:18
It asks a quantum physics related question.
3:21
Then o1 provides a detailed mathematical solution and derivation.
3:24
And it's correct.
3:25
So, these examples demonstrate applications in coding
3:27
economics, genetics, quantum physics, and so on.
3:32
Those interested can watch each one.
3:35
Each video is about two to three minutes long.
3:37
That's because the o1 series excels in coding.
3:40
To provide developers with more efficient solutions
3:43
OpenAI has also released a mini model.
3:46
o1 mini
3:48
A faster and cheaper model
3:50
Its main feature is that it is very effective in coding.
3:53
This is still very interesting.
3:54
As a smaller model
3:56
it is 80% cheaper than o1-preview.
3:58
So this is very suitable for coding tasks
4:01
that require reasoning
4:01
but don't require broad world knowledge.
4:04
So this should be a boon for programmers.
4:06
How do you use OpenAI's o1 model?
4:09
ChatGPT team users can already use it today.
4:13
Let's take a look.
4:14
OK.
4:15
This is my ChatGPT.
4:16
Once we're in,
4:17
We can see in this model list
4:19
besides the previous GPT-4o,
4:21
There's also o1 Preview here.
4:23
That is, the preview version of o1.
4:24
And o1 mini.
4:26
We now have both of these models.
4:28
And the previous GPT-4o mini and GPT-4
4:31
are all in this 'more models' section.
4:34
in the submenu.
4:35
So if you want to use the o1 Preview model
4:38
you can just select it in the conversation.
4:41
We'll do a test comparison later.
4:43
Let's finish reading this article first.
4:45
So its current limitations
4:47
The weekly rate limit for o1-preview is 30 messages.
4:51
That's 30 messages per week.
4:52
So, you still need to be careful about how you use it.
4:55
o1 mini offers 50 messages.
4:57
That's not a lot either.
4:58
I hope they can increase the limits without raising prices.
5:00
Hopefully that happens soon.
5:03
The new model can also be accessed through API.
5:06
But you need API usage tier 5 to do so.
5:08
Usage tier 5 has a certain spending threshold.
5:12
I'm only at tier 3 right now.
5:14
So I can't use the API for these two models right now.
5:16
The API is limited to 20 requests per minute.
5:20
And these APIs don't include:
5:22
function calling, streaming, system message support, and other features.
5:26
It's still a fairly basic API.
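Those constraints show up directly in the shape of a request. Below is a minimal sketch, assuming the standard Chat Completions request format; the model name matches the launch announcement, but treat the exact fields as illustrative rather than a definitive client.

```javascript
// Minimal sketch of a request body for o1-preview, reflecting the launch
// limitations described above: no system messages, no streaming, and no
// function calling. Actually sending it would still need an HTTP client
// and an API key; this only builds the payload.
function buildO1Request(userPrompt) {
  return {
    model: "o1-preview",
    // Only "user"/"assistant" roles: system messages are unsupported at launch.
    messages: [{ role: "user", content: userPrompt }],
    // Deliberately no `stream`, `tools`, or `functions` fields.
  };
}

console.log(JSON.stringify(buildO1Request("Hello, o1"), null, 2));
```

With a 20 requests-per-minute cap, any real client would also need simple throttling around whatever sends this payload.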
5:28
They also plan to allow:
5:30
free ChatGPT users to access o1 mini.
5:33
That seems pretty generous.
5:36
Besides these model updates,
5:39
they will also add web browsing, file uploads, images,
5:42
and other features to make it more practical.
5:44
They're trying to bring it closer to feature parity with GPT-4o.
5:47
They will also continue to develop and release models in the GPT series.
5:50
Besides OpenAI's o1 series,
5:52
that means the current GPT-4 series will likely continue to be developed.
5:56
Will there be GPT-5, GPT-6, GPT-7, GPT-8?
5:59
I don't know.
6:00
Next are more examples.
6:03
Coding.
6:04
It also has reasoning abilities.
6:05
This is a demonstration of decrypting fragmented Korean text.
6:09
Math.
6:10
Then logic puzzles.
6:12
This is also reasoning.
6:13
Coding.
6:14
Writing puzzles.
6:16
This is a Snake game.
6:18
Okay.
6:19
Let's pick one or two of these examples.
6:22
To test them in practice.
6:24
Compare the new model and GPT-4o to see where the difference lies.
6:29
Let's first look at its decryption of Korean.
6:32
The example gives a garbled Korean sentence.
6:36
Of course, I don't understand Korean.
6:37
So I can't tell what it is.
6:38
Let's see if we can copy it.
6:40
Hopefully the recognition won't be wrong.
6:42
Let's first look at the comparison.
6:47
Okay.
6:48
It basically combines Korean words.
6:50
Like a cipher.
6:52
If you understand Korean.
6:53
You should understand what this passage is saying.
6:56
But for AI models.
6:57
It will consider it an incorrect sentence.
7:00
It won't recognize it.
7:01
So, during testing, 4o will refuse to translate.
7:05
But o1, through step-by-step reasoning,
7:08
will finally translate this Korean text into correct English.
7:11
Let's just compare them.
7:13
Let's have it translated into Chinese.
7:15
Let's try it.
7:16
Okay.
7:16
Let's see if GPT-4o can do it first.
7:19
As it turns out, there's no problem.
7:20
GPT-4O can also translate correctly.
7:23
Even some expressions that are easy for Koreans to understand,
7:25
may not come across smoothly when translated directly into other languages.
7:29
In languages with different pronunciation rules,
7:30
certain expressions can be confusing.
7:32
This can lead to misinterpretations of the original intention.
7:34
Because the original text may contain spelling or language mixing errors,
7:37
some adjustments were made during translation
7:38
to make the sentences more fluent.
7:40
Okay.
7:40
So let's compare the new model.
7:43
Let's put them side by side.
7:44
Okay.
7:44
Here we choose o1 Preview.
7:46
Similarly,
7:47
please translate the following into Chinese.
7:49
Start thinking.
7:50
This is the result after it thought for 12 seconds.
7:53
First, it will decrypt it.
7:55
There's a mystery in Korean,
7:56
an encryption method that cannot be unlocked by ordinary means.
7:59
But Koreans can easily decode it with their language skills.
8:02
Then it converts it.
8:04
I'm thinking about all the vowel and consonant conversions.
8:07
How can the perceived meaning of the original text be changed?
8:09
To make it appear
8:10
visually different.
8:11
And then finally start decrypting.
8:13
I'm very interested in encrypting Korean text.
8:15
Combine consonants and vowels together.
8:17
This allows Koreans to read
8:18
while making it unreadable for others.
8:20
This method creates a blend of recognition and obfuscation.
8:23
Okay.
8:24
So, the final result is
8:26
that there's a type of text on Earth that no translation tool can translate
8:29
but Koreans can easily recognize.
8:31
Korean text encryption method
8:33
Through various transformations of vowels and consonants
8:35
it makes people, when paying attention,
8:36
see it as visually different.
8:39
This method can make the original text very confusing.
8:42
So, GPT-4o's translation
8:43
even though some expressions are easily understood by Koreans,
8:46
it might not be able to express them smoothly when translated into other languages.
8:50
In languages with different pronunciation rules,
8:51
certain expressions can be confusing.
8:54
This can lead to misinterpretation of the original intent.
8:56
Also, because I don't understand Korean at all,
8:57
I don't know which of the two is more accurate.
9:01
How big the difference is.
9:02
But at least it seems like 4o
9:04
didn't consider it a completely unreadable text.
9:07
Okay, next let's look at the coding comparison.
9:10
Since they say o1's coding is very powerful,
9:12
let's take a look at its coding.
9:13
Let's take a look at this example
9:15
(Music)
9:20
This is what the demo asked o1 to code
9:23
Let's translate it into Chinese first
9:25
Using HTML and JavaScript to write
9:27
Transformer's word attention
9:29
Interactive visualization code
9:30
Don't use any libraries
9:32
Use the example sentence
9:33
The quick brown fox
9:35
This is the English sentence
9:37
The quick brown fox
9:39
When hovering over a token
9:41
Visualize the edge with its thickness proportional to the attention score
9:45
The edge should be curved
9:47
It shouldn't overlap
9:48
Make sure the edge starts and ends at the center of each token
9:51
When clicking on a token, display the value of the attention score
9:55
Visualize it beautifully in a LaTeX-rendered vector notation
9:58
Put each token next to the score
10:00
Ensure good LaTeX rendering for it
10:03
Remove the attention score vector when clicking again
10:06
There's a 50-pixel vertical margin at the top
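For reference, here is a tiny sketch (not the code o1 generated) of the attention math such a visualization displays: each token's scores over the other tokens come from a softmax, and each score maps to an edge thickness. The raw similarity numbers below are made up for illustration; a real Transformer derives them from query/key dot products.

```javascript
const tokens = ["The", "quick", "brown", "fox"];

// Numerically stable softmax: subtract the max before exponentiating.
function softmax(xs) {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Toy attention row for one token: the raw similarity scores are invented
// here; a real model computes them from query/key dot products.
function attentionRow(rawScores) {
  return softmax(rawScores);
}

// Map a score in [0, 1] to a stroke width in pixels for the hover edges.
function edgeThickness(score, maxPx = 8) {
  return Math.max(1, score * maxPx);
}

const row = attentionRow([2.0, 0.5, 0.1, 1.0]); // scores from "The" to each token
console.log(tokens.map((t, i) => `${t}: ${row[i].toFixed(2)}`).join(", "));
```

The rest of the prompt (curved SVG edges, LaTeX-rendered score vectors, the 50-pixel top margin) is layout work on top of exactly this kind of score-to-thickness mapping.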
10:09
Okay, let's take a look at the demo results
10:11
This is a piece of code that was written
10:13
Then we directly look at how it runs
10:16
Then it's directly an HTML file
10:18
After writing it, let it run
10:20
Let's see the effect.
10:22
OK, this is the effect.
10:24
When the mouse hovers over different words,
10:26
it will also display a marker.
10:28
OK, let's try out the effect of this code.
10:31
Let's compare them.
10:32
First, let's see if GPT-4o can handle it.
10:34
Similarly, on the left is GPT-4o.
10:36
On the right is GPT-o1.
10:38
Just copy the English requirements from before.
10:40
This is its real speed, no acceleration.
10:42
We don't need to skip it either.
10:43
We can just compare the speed difference.
10:45
It's a long piece of code.
10:46
Let's copy it out.
10:49
Okay, we'll do it in an empty directory.
10:50
In this empty directory, Test-01-Coding.
10:52
Create a file.
10:54
Okay, that was the GPT-4o code.
10:56
Let's write in the GPT-4o code first.
11:00
GPT-4o.
11:01
Okay, copy it here.
11:07
Save.
11:08
Open.
11:10
The words are arranged here.
11:11
Clicking it does have a response.
11:12
But the response went over there.
11:14
Did you all see that?
11:15
There are lines of varying thickness.
11:17
The score didn't come out.
11:19
This is the direct execution result of the code written by GPT-4o.
11:22
Let's try it with the new o1.
11:25
Give it the same prompt.
11:27
Now it starts thinking.
11:29
It will tell you the entire thinking process step-by-step.
11:32
It's reasoning through how to calculate the positions.
11:36
Of course, we can also open it to see what it's doing.
11:38
It will show its entire reasoning process.
11:42
It will also test the code itself.
11:43
OK, it's figured it out.
11:45
It wrote it pretty quickly.
11:47
It's finished.
11:48
Let's copy it.
11:49
Okay, this is o1's.
11:51
Let's open o1 again.
11:59
Unfortunately...
11:59
It's not running correctly either.
12:01
It's just giving an error.
12:02
It flashed and then stopped working.
12:04
That's a bit of a flop.
12:06
That's disappointing.
12:08
Let's check our prompt for any errors.
12:10
We don't see any issues with the text itself.
12:13
It's just that the line breaks are slightly different.
12:14
Let's open a new conversation window to test and compare.
12:19
Let's try again.
12:21
Okay, we'll set up both again.
12:23
Here's o1's new conversation window.
12:26
We'll also open a new window for GPT-4o.
12:28
We'll adjust this slightly compared to what we saw earlier.
12:31
This is a line break, which is slightly different from the original text.
12:36
Then there's a space or a hyphen here.
12:39
This should be correct.
12:40
That's the prompt.
12:42
Let's try again.
12:44
First, give it to GPT-4o.
12:45
Then at the same time, I'll give it to o1.
12:48
GPT-4o has finished.
12:51
o1 has also finished writing.
12:53
The speed difference isn't that big.
12:55
It's not particularly slow.
12:56
Similarly, we copy the code to create a new file.
12:59
That means we create a file called GPT-4o1.html.
13:05
Then we copy this over.
13:15
Copy o1 over.
13:17
We'll call it o11.html.
13:21
Okay, two new ones.
13:23
Just to be safe.
13:24
This time, we'll open it with the Chrome browser.
13:26
Chrome browser.
13:27
This is the effect of GPT-4o.
13:30
When you click, LaTeX will appear.
13:33
But it's not rendered.
13:34
This curve is fixed.
13:37
Okay, let's open o1's result.
13:39
This time, it's much better.
13:42
This time, the effect is really good.
13:43
It can run normally.
13:45
There are also thickness variations.
13:46
It's just that the lines are on top.
13:48
Clicking works well, and the data is correct.
13:51
The data is also very good.
13:52
At least it wasn't a complete flop.
13:53
o1's coding skills are still strong.
13:56
Let's try the new prompt with Claude again.
13:59
Let's see if it's any better than before.
14:02
Claude's results this time are also better than before.
14:04
And the results are clearly better than GPT-4o's.
14:08
Although it's not perfectly aligned.
14:10
But the meaning is clear.
14:12
The click data is also there.
14:13
So Claude's level is still good.
14:16
So, that's a simple comparison of OpenAI's latest o1 model and GPT-4o.
14:19
From a coding perspective, o1 is still much stronger.
14:21
Okay, now let's talk about an interesting topic.
14:24
We know that most AI models currently have a subscription fee.
14:27
Their prices are usually around $20 per month.
14:30
This price isn't very cheap.
14:31
But to be honest, many people can still accept it.
14:34
They can afford it.
14:36
But a few days ago, we heard rumors.
14:39
They said OpenAI's new model might cost as much as $1000 per month.
14:40
Later, there were also rumors of $200 per month.
14:43
Fortunately, the newly released model didn't raise its price.
14:46
We can still afford it.
14:49
So the question arises.
14:52
It didn't increase in price.
14:54
We can still use it.
14:56
So, the question is...
14:57
...
14:58
Will there be super powerful AI models in the future?
15:01
But they are also super expensive.
15:03
Only the wealthy can afford them.
15:05
The answer is yes.
15:07
So for now, at least, everyone is using more or less the same AI tools.
15:11
We can say that we are still on a relatively equal starting line.
15:15
But this situation won't last forever.
15:18
Therefore, now is the perfect time for us to seize the opportunity.
15:21
While these powerful AI tools are still within our reach,
15:24
and we can still afford them,
15:26
we should diligently learn how to use them.
15:29
How can we integrate them into our work and lives?
15:32
This is why I have to emphasize
15:34
the importance of learning core AI skills.
15:36
We need to learn how to use these AI tools,
15:40
but more importantly, we need to learn how to think about AI.
15:43
How can we use AI to enhance our own abilities?
15:46
Visit the website axtonliu.ai
15:48
Visit the AI Elite Academy to learn about my two courses.
15:52
This won't take much of your time,
15:54
but it might open a new door for you.