An old proverb advises against asking the innkeeper whether the whisky is good: the conflict of interest is obvious. Yet today, when skills acquired through practical experience are certified, and the certification can carry legal value, the process often starts with a "self-assessment" by the candidate. The self-assessment gives the candidate and the examining committee a basis for examining the fundamental aspects of the evaluation in more depth and for reaching a balanced judgment, free from partisan interests.
With this in mind, we asked ChatGPT 5 to compare itself with the previous version 4.0, and to include in the evaluation an opinion from a different AI system independent of ChatGPT. The results are collected in the tables that follow and presented in this blog in the belief that they offer a useful preliminary summary and orientation tool, especially for novice scholars, but also for a wider, more knowledgeable audience wishing to delve deeper into the technicalities. They can also help colleagues (myself included) form a more informed opinion.
OpenAI's language models have made great strides in recent years, improving not only their ability to understand and generate text, but also their speed, accuracy, and ability to handle complex tasks. The tables below compare GPT-4.0 with GPT-5 (the current version), highlighting the main areas of improvement.
Table 1 – Technical features GPT-4.0 vs GPT-5.0

| Feature | ChatGPT 4.0 | ChatGPT 5.0 |
| --- | --- | --- |
| Architecture | Graph Neural Network with transformer-based attention mechanisms | Same, with enhanced transformer attention mechanisms |
| Context capacity | Up to ~25,000 words (~50,000 tokens) | Expanded memory up to ~50,000 words (~100,000 tokens) |
| Multimodality | Basic support for text + images | Full support for text, images, audio, and video |
| Energy efficiency | High consumption | Optimized to reduce usage by up to 30% compared to GPT-4 |
| Processing speed | Standard average on complex tasks | 20–30% faster in standard/complex tasks |
| Configurations & compliance | General-purpose model, GDPR/CCPA compliance, standard data protection | Modular suite (flagship, mini, nano), GDPR/CCPA compliance, enterprise features |
| Main limitations | Certain long-context or specialized tasks may be less efficient | Resolves many GPT-4 limits, but requires advanced configuration skills |
| Independent evaluation | Graph structure understanding improved but requires specialized training infrastructure | Supports much longer conversations without loss of coherence, better multimodal integration, lower operational costs, faster responses, enterprise-ready compliance |
Table 2 – Functional comparison GPT-4.0 vs GPT-5.0

| Aspect | GPT-4.0 | GPT-5.0 |
| --- | --- | --- |
| Context understanding | Good up to medium-length conversations, tended to “lose the thread” on very long exchanges | Better management of extended context, with less loss of detail after many interactions |
| Response speed | Generally fast, but slowed with complex or long tasks | Faster in complex processing and handling large amounts of text |
| Reasoning ability | Solid logic, but could fall into “mechanical” steps or less nuanced answers | More articulated reasoning, better multi-step inference |
| Creativity | Good for creative writing and ideas, but sometimes produced more generic output | Greater variety and coherence in creative style, better adherence to requested tone |
| Ambiguity handling | Often asked for clarifications | More ability to propose plausible interpretations without interrupting flow |
| Data accuracy | Reliable but with occasional inaccuracies or “hallucinations” | Improved error reduction, though verification on critical data still advised |
| Data analysis | Could read/comment simple data but limited in spotting complex patterns or correlations | Deeper dataset analysis, identification of trends/anomalies with step-by-step explanations |
| Mathematical modeling | Good with algebra and standard calculations, less reliable with advanced modeling/optimization | More accuracy in solving complex math problems, building models, and explaining reasoning steps |
| Multimodality | Mainly text, some implementations supported images | Native integration of images, text, and (in some platforms) advanced visual analysis |
| Interaction style | More “formal” and less adaptive | More natural and flexible style, with ability to adjust tone/complexity per user |
Table 3 – Costs, API and implementation

| Aspect | ChatGPT 4.0 | ChatGPT 5.0 | Notes |
| --- | --- | --- | --- |
| Pricing model | $0.03/1K tokens input, $0.06/1K tokens output | $0.025/1K tokens input, $0.05/1K tokens output | GPT-5 reduces costs by ~16–20%, with enterprise discounts |
| TCO (Total Cost of Ownership) | High for GPU/TPU resources, licenses, maintenance | Lower operational cost; includes provisioning, infrastructure, hardware, cloud mgmt | |
| API & SDK | RESTful endpoint (JSON), Python/JS SDK | Unified multimodal endpoint with streaming, SDK extended to audio/video | Latency reduced by ~20% |
| Documentation & testing | Interactive docs, multimodal examples, sandbox-as-a-service | Same + faster test cycles, sector-specific tutorials | |
| Performance & uptime | SLA 99.5%, avg latency 100–500 ms | SLA 99.9%, avg latency 80–300 ms | Lower downtime, better throughput |
| Support tiers | Standard & Premium enterprise support | Standard, Premium & Executive (24/7 support, quarterly architecture consulting) | |
| Use cases – Finance | Sentiment analysis, financial news automation | Real-time trading insights, live video stream classification | |
| Use cases – Healthcare | Clinical assistance (EHR), QA on literature | Higher multimodal accuracy (~+12%), image + text diagnostics | |
| Use cases – Education | Text-based tutoring, quizzes | Immersive content, adaptive learning, emotion recognition | |
| Integration complexity | Medium-high | High (due to multimodal orchestration) | Requires extra skill for optimal setup |
| Compliance | GDPR, basic audit logging | GDPR, HIPAA, PCI-DSS, financial services standards | |
| ROI & TTM | ROI in 9–12 months, TTM 3–6 months | ROI in 6–9 months, TTM 1–3 months | |
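The pricing and uptime rows of Table 3 can be checked with simple arithmetic. The sketch below is only illustrative: it takes the per-1K-token prices and SLA percentages exactly as quoted in the table (they are not verified against OpenAI's published pricing), and the function names and the 10,000-input / 2,000-output token workload are assumptions chosen for the example. Because both the input and the output price drop by the same fraction, the saving on these list prices is about 16.7% whatever the mix of tokens, which sits at the lower end of the "~16–20%" range given in the Notes column.

```python
# Per-1K-token prices in USD, copied from Table 3 (not independently verified).
PRICES = {
    "GPT-4.0": {"input": 0.03, "output": 0.06},
    "GPT-5.0": {"input": 0.025, "output": 0.05},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the table's list prices."""
    price = PRICES[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def relative_saving(old_cost: float, new_cost: float) -> float:
    """Fraction saved when moving from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost


def annual_downtime_hours(sla_percent: float, hours_per_year: float = 365 * 24) -> float:
    """Maximum downtime per year that an uptime SLA still permits."""
    return (1 - sla_percent / 100) * hours_per_year


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    old = request_cost("GPT-4.0", 10_000, 2_000)
    new = request_cost("GPT-5.0", 10_000, 2_000)
    print(f"GPT-4.0: ${old:.2f}  GPT-5.0: ${new:.2f}  saving: {relative_saving(old, new):.1%}")
    for sla in (99.5, 99.9):
        print(f"SLA {sla}%: up to {annual_downtime_hours(sla):.2f} h of downtime per year")
```

On these figures, the move from a 99.5% to a 99.9% SLA corresponds to a drop in permitted downtime from roughly 44 hours to roughly 9 hours per year.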
Table 4 – ChatGPT plans

| Plan | Price | Models & Access | Usage & Key Limits |
| --- | --- | --- | --- |
| Free | $0/month | GPT-5 (standard, mini), GPT-4o (limited), GPT-4.1 mini | Message limits, file uploads, data analysis, image generation, limited Deep Research |
| Plus | $20/month | Full access to GPT-5, GPT-4.5 preview, o3, o4-mini, o4-mini-high, o1, o1-mini | Higher limits for messages/month, data, images, voice/video, GPT agent access |
| Pro | $200/month | Unlimited GPT-5, o1 pro mode, GPT-4o, o1-mini, o3-pro, chat agent, etc. | Unlimited use (policy-bound), up to 120 Deep Research queries/month |
| Team | $25/user/month (annual), $30/user/month (monthly) | Same as Plus/Pro, collaborative workspace, admin controls, enterprise privacy | Increased limits vs Plus, team admin & data control |
| Enterprise | Custom (~$60/user/month) | All Team features + higher security, compliance, 24/7 support, SLA, extended context | Ideal for >149 users, custom contracts |
Practical experience suggests that:
1) The free option is certainly an excellent way to educate users and introduce them to the powerful new tools available, but it involves long waits punctuated by invitations to upgrade, not only for commercial reasons but presumably also to recoup the investments made in developing and implementing the systems;
2) The length of the texts, as well as the breadth of the databases used in data analysis, can indeed lead to some incompleteness or inconsistency issues;
3) Difficulties in translating texts (even very short ones) into images persist even in version 5.
Beyond this, we can only be grateful and pleased to have tools that can quantify variables for which, in the past, a metric could be obtained only by developing scales that were often not objective and in any case open to question.
Post Scriptum, August 17, 2025
Since this post was published, students and graduates of Sapienza University of Rome have informally reported errors in GPT-5 chat responses. Here are some examples (particularly for disciplines such as Geology and Psychology):
1) References to articles with incorrect DOIs, or whose authors are incorrect, or whose content is unrelated to the topic being discussed.
2) Real seismic events are interpreted in ways that conflict with current best practice and knowledge.
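For the first kind of error, a quick syntactic screen can catch DOIs that are malformed before anyone spends time looking them up. The sketch below is a minimal illustration, not a verification method: the function names are hypothetical, the pattern is the commonly recommended form for modern DOIs (the prefix "10.", a registrant code of 4 to 9 digits, a slash, and a suffix), and the sample strings are placeholders. A string that passes may still point to the wrong article or to no article at all, so authors, title, and content still have to be checked against the publisher's record.

```python
import re

# Commonly recommended shape of a modern DOI: "10.", 4-9 digit registrant code, "/", suffix.
# This is a syntactic check only; it cannot tell whether the DOI actually exists.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that often wrap a DOI in a reference list.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip surrounding whitespace and a leading resolver URL or 'doi:' label."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def looks_like_doi(raw: str) -> bool:
    """Return True if the string has the shape of a DOI (no network lookup is made)."""
    return bool(DOI_PATTERN.match(normalize_doi(raw)))


if __name__ == "__main__":
    # Placeholder strings for illustration, not references taken from the reports.
    for candidate in ["https://doi.org/10.1000/182", "doi:10.12/abc", "not a doi"]:
        print(candidate, "->", looks_like_doi(candidate))
```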