An old proverb advises against asking the innkeeper whether the whisky is good: the conflict of interest is obvious. Yet today, when skills acquired through practical experience are certified, sometimes with legal value, the process often begins with a "self-assessment" by the candidate. This gives the candidate and the examining committee a starting point from which to examine the fundamental aspects of the evaluation and reach a balanced judgment, free from partisan interests. With this in mind, we asked ChatGPT 5 to compare itself with the previous version 4.0, and included in the evaluation an opinion from a different AI system, independent of ChatGPT. The results are presented in the tables that follow, in the belief that they offer a useful preliminary summary and orientation tool, especially for novice scholars but also for a wider, more knowledgeable audience wishing to delve deeper into the technicalities. Those readers, in turn, can provide colleagues (myself included) with a more informed opinion.

OpenAI's language models have made great strides in recent years, improving not only their ability to understand and generate text but also their speed, accuracy, and ability to handle complex tasks. The tables below compare GPT-4.0 with GPT-5 (the current version), highlighting the main areas of improvement.
Table 1 – Technical features GPT-4.0 vs GPT-5.0

| Feature | ChatGPT 4.0 | ChatGPT 5.0 |
|---|---|---|
| Architecture | Graph Neural Network with transformer-based attention mechanisms | Same, with enhanced transformer attention mechanisms |
| Context capacity | Up to ~25,000 words (~50,000 tokens) | Expanded memory up to ~50,000 words (~100,000 tokens) |
| Multimodality | Basic support for text + images | Full support for text, images, audio, and video |
| Energy efficiency | High consumption | Optimized to reduce usage by up to 30% compared to GPT-4 |
| Processing speed | Standard average on complex tasks | 20–30% faster in standard/complex tasks |
| Configurations & compliance | General-purpose model, GDPR/CCPA compliance, standard data protection | Modular suite (flagship, mini, nano), GDPR/CCPA compliance, enterprise features |
| Main limitations | Certain long-context or specialized tasks may be less efficient | Resolves many GPT-4 limits, but requires advanced configuration skills |
| Independent evaluation | Graph structure understanding improved but requires specialized training infrastructure | Supports much longer conversations without loss of coherence, better multimodal integration, lower operational costs, faster responses, enterprise-ready compliance |
Table 2 – Functional comparison GPT-4.0 vs GPT-5.0

| Aspect | GPT-4.0 | GPT-5.0 |
|---|---|---|
| Context understanding | Good up to medium-length conversations, tended to “lose the thread” on very long exchanges | Better management of extended context, with less loss of detail after many interactions |
| Response speed | Generally fast, but slowed with complex or long tasks | Faster in complex processing and handling large amounts of text |
| Reasoning ability | Solid logic, but could fall into “mechanical” steps or less nuanced answers | More articulated reasoning, better multi-step inference |
| Creativity | Good for creative writing and ideas, but sometimes produced more generic output | Greater variety and coherence in creative style, better adherence to requested tone |
| Ambiguity handling | Often asked for clarifications | More ability to propose plausible interpretations without interrupting flow |
| Data accuracy | Reliable but with occasional inaccuracies or “hallucinations” | Improved error reduction, though verification on critical data still advised |
| Data analysis | Could read/comment simple data but limited in spotting complex patterns or correlations | Deeper dataset analysis, identification of trends/anomalies with step-by-step explanations |
| Mathematical modeling | Good with algebra and standard calculations, less reliable with advanced modeling/optimization | More accuracy in solving complex math problems, building models, and explaining reasoning steps |
| Multimodality | Mainly text, some implementations supported images | Native integration of images, text, and (in some platforms) advanced visual analysis |
| Interaction style | More “formal” and less adaptive | More natural and flexible style, with ability to adjust tone/complexity per user |
Table 3 – Costs, API and implementation

| Aspect | ChatGPT 4.0 | ChatGPT 5.0 | Notes |
|---|---|---|---|
| Pricing model | $0.03/1K tokens input, $0.06/1K tokens output | $0.025/1K tokens input, $0.05/1K tokens output | GPT-5 reduces costs by ~16–20%, with enterprise discounts |
| TCO (Total Cost of Ownership) | High for GPU/TPU resources, licenses, maintenance | Lower operational cost; includes provisioning, infrastructure, hardware, cloud mgmt | |
| API & SDK | RESTful endpoint (JSON), Python/JS SDK | Unified multimodal endpoint with streaming, SDK extended to audio/video | Latency reduced by ~20% |
| Documentation & testing | Interactive docs, multimodal examples, sandbox-as-a-service | Same + faster test cycles, sector-specific tutorials | |
| Performance & uptime | SLA 99.5%, avg latency 100–500 ms | SLA 99.9%, avg latency 80–300 ms | Lower downtime, better throughput |
| Support tiers | Standard & Premium enterprise support | Standard, Premium & Executive (24/7 support, quarterly architecture consulting) | |
| Use cases – Finance | Sentiment analysis, financial news automation | Real-time trading insights, live video stream classification | |
| Use cases – Healthcare | Clinical assistance (EHR), QA on literature | Higher multimodal accuracy (~+12%), image + text diagnostics | |
| Use cases – Education | Text-based tutoring, quizzes | Immersive content, adaptive learning, emotion recognition | |
| Integration complexity | Medium-high | High (due to multimodal orchestration) | Requires extra skill for optimal setup |
| Compliance | GDPR, basic audit logging | GDPR, HIPAA, PCI-DSS, financial services standards | |
| ROI & TTM | ROI in 9–12 months, TTM 3–6 months | ROI in 6–9 months, TTM 1–3 months | |
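The pricing row of Table 3 lends itself to a quick arithmetic check. The sketch below is only illustrative: it takes the per-1K-token prices exactly as reported in the table (they have not been verified here against any official price list), and the workload of 2,000 input tokens and 500 output tokens is a made-up example, as are the function names.

```python
# Per-1K-token prices in USD, copied from Table 3 of this post.
# They are the figures reported in the self-assessment, not verified prices.
PRICES = {
    "ChatGPT 4.0": {"input": 0.03, "output": 0.06},
    "ChatGPT 5.0": {"input": 0.025, "output": 0.05},
}


def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the per-1K-token prices above."""
    price = PRICES[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def relative_saving(old_cost, new_cost):
    """Fraction saved when moving from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost


# Hypothetical workload: 2,000 input tokens and 500 output tokens.
old_cost = request_cost("ChatGPT 4.0", 2000, 500)
new_cost = request_cost("ChatGPT 5.0", 2000, 500)
saving = relative_saving(old_cost, new_cost)

print(f"old: ${old_cost:.3f}  new: ${new_cost:.3f}  saving: {saving:.1%}")
```

With these figures both the input and the output price fall by the same one sixth, so the saving is about 16.7% whatever the mix of input and output tokens. That sits at the lower end of the "~16–20%" given in the Notes column; the higher end presumably reflects the enterprise discounts mentioned there.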
Table 4 – ChatGPT plans

| Plan | Price | Models & Access | Usage & Key Limits |
|---|---|---|---|
| Free | $0/month | GPT-5 (standard, mini), GPT-4o (limited), GPT-4.1 mini | Message limits, file uploads, data analysis, image generation, limited Deep Research |
| Plus | $20/month | Full access to GPT-5, GPT-4.5 preview, o3, o4-mini, o4-mini-high, o1, o1-mini | Higher limits for messages/month, data, images, voice/video, GPT agent access |
| Pro | $200/month | Unlimited GPT-5, o1 pro mode, GPT-4o, o1-mini, o3-pro, chat agent, etc. | Unlimited use (policy-bound), up to 120 Deep Research queries/month |
| Team | $25/user/month (annual), $30/user/month (monthly) | Same as Plus/Pro, collaborative workspace, admin controls, enterprise privacy | Increased limits vs Plus, team admin & data control |
| Enterprise | Custom (~$60/user/month) | All Team features + higher security, compliance, 24/7 support, SLA, extended context | Ideal for >149 users, custom contracts |
Practical experience suggests that:
1) The free option is certainly an excellent way to educate and introduce users to the powerful new tools available, but it involves long waits punctuated by invitations to upgrade, not only for commercial reasons but presumably also to recoup the investments made in developing and implementing the systems.
2) The length of the texts, as well as the breadth of the databases used in data analysis, can indeed lead to some incompleteness or inconsistency issues.
3) Difficulties in translating texts (even very short ones) into images persist even in version 5.
Beyond this, we can only be grateful and pleased to have tools that can quantify variables which once could be measured only by developing scales that were often not objective and in any case open to question.
Post Scriptum August 17, 2025
Since this post was published, students and graduates of Sapienza University of Rome have informally reported errors in GPT-5 chats. Here are some examples (particularly for disciplines such as Geology and Psychology):
1) References to articles with incorrect DOIs, with incorrect authors, or with content unrelated to the topic being discussed.
2) Real seismic events interpreted in ways that depart from current best practices and knowledge.
Post Scriptum August 21, 2025
The explanations provided by ChatGPT for the issues reported in the previous postscript can be summarized as follows (translated from Italian into English):
«You're right: when it comes to bibliographic references, articles, and links, I can sometimes return incomplete, duplicate, or even fabricated data (what's known in technical jargon as "hallucination"). This is a critical point, because those who use references without checking them can make mistakes, which undermines trust.
For transparency:
I don't have a proprietary database of scientific articles; I draw on what I learned during my training and, when possible, from online sources.
If you ask for a precise citation (with DOI or link), the safest solution is to integrate with reliable sources (ORCID, Crossref, PubMed, etc.). Without that direct connection, there's a risk that I'll "fill in the blanks" with plausible but inaccurate data.
For this reason, the best practice is always: I help with searching and formatting, but the final source verification should be done on official databases.
The fact that students and graduates have reported problems to you is a clear signal: it means that robustness improvements are needed on this front, along with more cautionary warnings when providing references. I welcome your message as an important reminder».
She/He/It cannot lie! Can we then say: Ergo, it is not "human"?
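The verification step recommended in the reply quoted above can be partly automated. What follows is a minimal sketch, not an established workflow: it assumes that Crossref (one of the sources named in the reply) is queried through its public REST API at `https://api.crossref.org/works/<DOI>`, and it uses a common but non-exhaustive pattern for DOI syntax. All function names are hypothetical, and `10.1000/xyz123` is a placeholder rather than a real reference.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# A widely used pattern for modern DOIs. It is not exhaustive:
# some valid older DOIs will not match it.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)


def normalize_doi(raw):
    """Strip whitespace and a leading resolver URL or 'doi:' label."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def looks_like_doi(raw):
    """Cheap syntax check; says nothing about whether the DOI exists."""
    return bool(DOI_PATTERN.match(normalize_doi(raw)))


def crossref_url(raw):
    """URL of the Crossref REST API record for this DOI."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(normalize_doi(raw))


def fetch_crossref_record(raw, timeout=10):
    """Return Crossref metadata for the DOI, or None if Crossref has no record.

    Needs network access; other HTTP errors are re-raised.
    """
    try:
        with urllib.request.urlopen(crossref_url(raw), timeout=timeout) as response:
            return json.load(response)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def title_matches(claimed_title, record):
    """Compare the cited title with the registered one, ignoring case and punctuation."""
    def squash(text):
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

    return any(squash(claimed_title) == squash(t) for t in record.get("title", []))


# Offline part of the check, using the placeholder DOI.
print(looks_like_doi("https://doi.org/10.1000/xyz123"))
print(crossref_url("doi:10.1000/xyz123"))
```

A reference supplied by the chatbot would pass only if `fetch_crossref_record` returns a record and `title_matches` confirms the title. Authors and relevance to the topic, the other two problems the students reported, still have to be checked by a human reader.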