Planet Python
Last update: August 29, 2025 01:43 AM UTC
August 29, 2025
Zero to Mastery
[August 2025] Python Monthly Newsletter
69th issue of Andrei Neagoie's must-read monthly Python Newsletter: Python Performance Myths, Do You Need Classes, Good System Design, and much more. Read the full newsletter to get up-to-date with everything you need to know from last month.
August 28, 2025
Peter Bengtsson
Faster way to sum an integer series in Python
You can sum the integer series 0 + 1 + … + (n-1), i.e. `sum(range(n))`, with the closed form `n(n-1)/2`
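As a sanity check, the closed form agrees with Python's built-in sum; a minimal sketch (sum_series is an illustrative name, not from the original post):

```python
def sum_series(n):
    # Closed form for 0 + 1 + ... + (n - 1), i.e. sum(range(n)).
    # Integer division is exact because n*(n-1) is always even.
    return n * (n - 1) // 2

# Agrees with the naive loop, but runs in constant time.
assert sum_series(10) == sum(range(10))
assert sum_series(1_000_000) == sum(range(1_000_000))
```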
Real Python
Quiz: Profiling Performance in Python
Ready to level up your Python code optimization skills? In this quiz, you’ll revisit key concepts about profiling, benchmarking, and diagnosing performance bottlenecks. You’ll practice with tools like cProfile and timeit, and see how deterministic and statistical profilers differ.
Brush up on the profiling workflow and see how to spot hotspots efficiently in your programs. Need a refresher? Check out the Profiling Performance in Python tutorial for detailed guidance.
[ Improve Your Python With Python Tricks: Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Reuven Lerner
You're probably using uv wrong
This is adapted from my “Better developers” newsletter: https://BetterDevelopersWeekly.com.
Like many others in the Python world, I've adopted "uv", the do-everything, lightning-fast package manager written in Rust.
uv does it all: For people who just want to download and install packages, it replaces pip. For people who want to keep multiple versions of Python on their computer, it replaces pyenv. For people who want to work on multiple projects at the same time using virtual environments, it handles that, too. And for people who want to develop and distribute Python software, it works for them, also.
Here’s the thing, though: If you’re using uv as a replacement for one of these tools or problems, then you’re probably using it wrong. Yes, uv is a superset of these tools. But the idea is to sweep many of these things under the rug, thanks to the idea of a uv “project.” In many ways, a project in uv allows us to ignore virtual environments, ignore Python versions, and even ignore pip.
I know this, because I’ve used uv the wrong way for quite a while. It was so much faster than pip that I started to say
uv pip install PACKAGE
instead of
pip install PACKAGE
But actually, that’s not quite true — I use virtual environments on software projects, but 99% of my work is teaching Python via one-off Jupyter notebooks, which means I don’t have to worry about package and version conflicts. This being the case, I would just install packages on my global Python installation:
uv pip install --system PACKAGE
Which works! However, this isn’t really the way that things are supposed to be done.
So, how are we supposed to do things?
uv assumes that everything you do will be in a “project.” Now, uv isn’t unique in this approach; PEP 518 (https://peps.python.org/pep-0518/) from way back in 2016 talked about projects, and specified a file called pyproject.toml that describes a minimal project. The file’s specifications have evolved over time, and the official specification is currently at https://packaging.python.org/en/latest/specifications/pyproject-toml/.
For many years, Python programs were just individual files. A bunch of files could be put together into a single folder and treated as a package. The term “project” was used informally at companies and working groups, or among people who wrote Python editors, such as PyCharm and VSCode.
Even without a formal definition, we all kind of know what a project is: a combination of Python and other files, all grouped together into one whole, to solve one set of problems.
A pyproject.toml file is meant to be the ultimate authority regarding a project. TOML format is similar to an INI configuration file, but with Python-like data structures such as strings and integers. It also supports version numbers and comparison operators, allowing us to indicate exact, approximate, “not less than” and “not more than” versions for dependencies.
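For illustration, a dependencies list using these operators might look like the following (the package pins here are invented examples):

```toml
[project]
dependencies = [
    "requests==2.32.4",   # exactly this version
    "urllib3>=2.0,<3",    # not less than 2.0, strictly below 3
    "rich~=13.7",         # approximately: any 13.7.x release
]
```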
The minimal, initial pyproject.toml file looks like this:
[project]
name = "myproj"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = []
As you can see, it only defines a “project” section, and then a number of name-value pairs. You can see that I called this project “myproj”, and that I created it using Python 3.13, hence the “requires-python” line. It doesn't do anything just yet, which is why it has an empty list of dependencies.
How and when do you create such a project? How does it intersect with Python’s virtual environments? How does it intersect with a Python version manager, such as pyenv, about which I’ve written previously?
Here’s the secret: To use uv correctly, you ignore pyenv. You ignore pip. You ignore venv. You just create and work with a Python project. If you do that from within uv, then you’ll basically be letting uv do all of the hard work for you, papering over all of the issues that Python packaging has accumulated over the years.
The thing is, uv does offer a variety of subcommands that let you work with virtual environments, Python versions, and package installation. So it’s easy to get lulled into using these parts of uv to replace one or more of them. But if you do that, you’re missing the point, and the overall design goals, of uv.
So, how should you be using it?
First: You create a project with “uv init”. It is possible to retroactively use uv on an existing directory of code, but let’s assume that you want to start a brand-new project. You say
uv init myproj
This creates a subdirectory, “myproj”, under the current directory. This directory contains:
- .git, the directory containing Git repo information. So yes, uv assumes that you’ll manage your project with Git, and already initializes a new repo.
- .gitignore, with reasonable defaults for anyone coding in Python. It’ll ignore __pycache__ directories, pyo and pyc compiled files, the build and dist subdirectories, and a variety of other file types that we don’t need to store in Git.
- .python-version, a file that tells uv (and pyenv, if you’re using it) what version of Python to use
- main.py, a skeleton file that you can modify (or rename) to use as the base for your application
- pyproject.toml, the configuration file I described earlier
- README.md, a skeleton README file for the project
Once the project is created, you can write code to your heart’s content, adding files and directories as you see fit. You can commit to the local Git repo, or you can add a remote repo and push to it.
So far, uv doesn’t seem to be doing much for us.
But let’s say that we want to modify main.py to download the latest edition of python.org and display the number of bytes contained on that page. We can say:
import requests

def main():
    print("Hello from myproj!")
    r = requests.get('https://python.org')
    print(f'Content at python.org contains {len(r.content)} bytes.')

if __name__ == "__main__":
    main()
If you run it with “python main.py”, you’ll find that it works just fine, printing a greeting and the number of bytes at python.org.
But you shouldn’t be doing that! Using “python main.py” means that you’re running whatever version of Python is in your PATH. That might well be different from what uv is using. And (as we’ll see in a bit) it likely has access to a different set of packages than uv’s installation might have.
Rather, you should be running the program with “uv run python main.py”. Running your program via “uv” means that it’ll take your pyproject.toml configuration file into account.
Why would we care? Because pyproject.toml is shared among all of the people working on your project. It ensures that they're in sync regarding not only the version of Python you're using, but also the libraries and tools you're using, too. (We'll get to packages in just a moment.) If I make sure to configure everything correctly in “pyproject.toml”, then everyone on my team will have an identical environment when they run my code. It also means that if we install our code on a production system, it'll use the correct versions there, too.
So, what happens when I run it?
⯠uv run python main.py
Using CPython 3.13.5 interpreter at: /Users/reuven/.pyenv/versions/3.13.5/bin/python3.13
Creating virtual environment at: .venv
Traceback (most recent call last):
File "/Users/reuven/Desktop/myproj/main.py", line 1, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
As we can see, “requests” is not installed. But wait: we just saw that it's installed on my system. Indeed, we got a response back from the program!
This is where anyone familiar with virtual environments will start to nod their head, saying, “Good! uv is ensuring that only packages installed in the virtual environment for this project will be available.”
And indeed, you can see that uv noticed a lack of a venv, and created one in a hidden subdirectory, “.venv”. So “uv run” doesn’t just run our program, it does so within the context of a virtual environment.
If you’re expecting us to start using “activate” and “pip install” within a venv, you’ll be sadly mistaken. That’s because uv wants to shield us from such things. Instead, we’ll add one or more files to pyproject.toml using “uv add”:
⯠uv add requests
Here’s what I get:
Resolved 6 packages in 42ms
Installed 5 packages in 13ms
+ certifi==2025.8.3
+ charset-normalizer==3.4.3
+ idna==3.10
+ requests==2.32.4
+ urllib3==2.5.0
These packages were installed in .venv/lib/python3.13/site-packages, which is what we would expect in a virtual environment. But you can mostly ignore the .venv directory. That’s because the most important file is pyproject.toml, which we see has been changed via “uv add”:
[project]
name = "myproj"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"requests>=2.32.4",
]
We now have one dependency, requests with a version of at least 2.32.4.
In a traditional venv-based project, we would activate the venv, pip install, run, and then deactivate the venv. In the uv world, we use uv to add to our pyproject.toml and to run our program with “uv run”. In both cases, the venv is used, but that usage is mostly hidden from view.
But wait a second: It’s nice that we indicated what version of requests we need. But what about the packages that requests requires? Moreover, what if our program also requires NumPy, which has components that are compiled from C? How can we be sure that everyone who downloads this project and uses “uv run” is going to use precisely the same versions of the same packages?
The answer is another configuration file, called “uv.lock”. This file is written and maintained by uv, and shouldn't ever be touched by us. It should, however, be committed to Git and distributed to everyone running the project. When you use “uv run”, uv checks “uv.lock” to ensure that all of the needed packages are installed, and that they are all compatible with one another. If needed, it'll also download and install any missing versions. And “uv.lock” includes the precise filenames that are needed for each package, for each supported architecture and version of Python: both for the packages that we explicitly list as dependencies, and for those on which the dependencies themselves rely.
If you adopt uv in the way it’s meant to be used, you thus end up with a workflow that’s less complex than what many Python developers have used before. When you need a package, you “uv add” it. When you want to run your program, you “uv run” it. And so long as you make sure to check “uv.lock” into Git, then anyone else downloading, installing, and running your program via “uv run” will be sure that all libraries are installed and compatible with one another.
The post You’re probably using uv wrong appeared first on Reuven Lerner.
August 27, 2025
Real Python
Python 3.14 Preview: Lazy Annotations
Recent Python releases have introduced several small improvements to the type hinting system, but Python 3.14 brings a single major change: lazy annotations. This change delays annotation evaluation until explicitly requested, improving performance and resolving issues with forward references. Library maintainers might need to adapt, but for regular Python users, this change promises a simpler and faster development experience.
By the end of this tutorial, you'll understand that:
- Although annotations are used primarily for type hinting in Python, they support both static type checking and runtime metadata processing.
- Lazy annotations in Python 3.14 defer evaluation until needed, enhancing performance and reducing startup time.
- Lazy annotations address issues with forward references, allowing types to be defined later.
- You can access annotations via the .__annotations__ attribute or use annotationlib.get_annotations() and typing.get_type_hints() for more robust introspection.
- typing.Annotated enables combining type hints with metadata, facilitating both static type checking and runtime processing.
Explore how lazy annotations in Python 3.14 streamline your development process, offering both performance benefits and enhanced code clarity. If you're just looking for a brief overview of the key changes in 3.14, then expand the collapsible section below:
Python 3.14 introduces lazy evaluation of annotations, solving long-standing pain points with type hints. Here's what you need to know:
- Annotations are no longer evaluated at definition time. Instead, their processing is deferred until you explicitly access them.
- Forward references work out of the box without needing string literals or from __future__ import annotations.
- Circular imports are no longer an issue for type hints because annotations don't trigger immediate name resolution.
- Startup performance improves, especially for modules with expensive annotation expressions.
- Standard tools, such as typing.get_type_hints() and inspect.get_annotations(), still work but now benefit from the new evaluation strategy.
- inspect.get_annotations() becomes deprecated in favor of the enhanced annotationlib.get_annotations().
- You can now request annotations at runtime in alternative formats, including strings, values, and proxy objects that safely handle forward references.
These changes make type hinting faster, safer, and easier to use, mostly without breaking backward compatibility.
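If you're on an older Python, the forward-reference benefit can be approximated today with explicit string annotations; this sketch (the feed/Dog names are invented for illustration) shows an annotation staying unevaluated until a tool resolves it, which is the effect 3.14 makes automatic:

```python
import typing

# "Dog" is not defined yet, so today we annotate with a string;
# under Python 3.14's lazy annotations the bare name Dog would
# also work, because annotations aren't evaluated at definition time.
def feed(d: "Dog") -> None:
    ...

class Dog:
    def __init__(self, name):
        self.name = name

# The stored annotation is still just the string...
assert feed.__annotations__["d"] == "Dog"

# ...until an introspection tool resolves it on demand.
hints = typing.get_type_hints(feed, localns={"Dog": Dog})
assert hints["d"] is Dog
```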
Get Your Code: Click here to download the free sample code that shows you how to use lazy annotations in Python 3.14.
Take the Quiz: Test your knowledge with our interactive "Python Annotations" quiz. You'll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Python Annotations: Test your knowledge of annotations and type hints, including how different Python versions evaluate them at runtime.
Python Annotations in a Nutshell
Before diving into what's changed in Python 3.14 regarding annotations, it's a good idea to review some of the terminology surrounding annotations. In the next sections, you'll learn the difference between annotations and type hints, and review some of their most common use cases. If you're already familiar with these concepts, then skip straight to lazy evaluation of annotations for details on how the new annotation processing works.
Annotations vs Type Hints
Arguably, type hints are the most common use case for annotations in Python today. However, annotations are a more general-purpose feature with broader applications. They're a form of syntactic metadata that you can optionally attach to your Python functions and variables.
Although annotations can convey arbitrary information, they must follow the language's syntax rules. In other words, you won't be able to define an annotation representing a piece of syntactically incorrect Python code.
To be even more precise, annotations must be valid Python expressions, such as string literals, arithmetic operations, or even function calls. On the other hand, annotations can't be simple or compound statements that aren't expressions, like assignments or conditionals, because those might have unintended side effects.
Note: For a deeper explanation of the difference between these two constructs, check out Expression vs Statement in Python: What's the Difference?
Python supports two flavors of annotations, as specified in PEP 3107 and PEP 526:
- Function annotations: Metadata attached to signatures of callable objects, including functions and methods, but not lambda functions, which don't support the annotation syntax.
- Variable annotations: Metadata attached to local, nonlocal, and global variables, as well as class and instance attributes.
The syntax for function and variable annotations looks almost identical, except that functions support additional notation for specifying their return value. Below is the official syntax for both types of annotations in Python. Note that <annotation> is a placeholder, and you don't need the angle brackets when replacing this placeholder with the actual annotation:
Python 3.6+
class Class:
    # These two could be either class or instance attributes:
    attribute1: <annotation>
    attribute2: <annotation> = value

    def method(
        self,
        parameter1,
        parameter2: <annotation>,
        parameter3: <annotation> = default_value,
        parameter4=default_value,
    ) -> <annotation>:
        self.instance_attribute1: <annotation>
        self.instance_attribute2: <annotation> = value
        ...

def function(
    parameter1,
    parameter2: <annotation>,
    parameter3: <annotation> = default_value,
    parameter4=default_value,
) -> <annotation>:
    ...

variable1: <annotation>
variable2: <annotation> = value
To annotate a variable, attribute, or function parameter, put a colon (:) just after its name, followed by the annotation itself. Conversely, to annotate a function's return value, place the right arrow (->) symbol after the closing parenthesis of the parameter list. The return annotation goes between that arrow and the colon denoting the start of the function's body.
Note: The right arrow symbol isn't unique to Python. A few other programming languages use it as well but for different purposes. For example, Java and CoffeeScript use it to define anonymous functions, similar to Python's lambdas. This symbol is sometimes referred to as the thin arrow (->) to distinguish it from the fat arrow (=>) found in JavaScript and Scala.
As shown, you can mix and match function and method parameters, including optional parameters, with or without annotations. You can also annotate a variable without assigning it a value, effectively making a declaration of an identifier that might be defined later.
Declaring a variable doesn't allocate memory for its storage or even register it in the current namespace. Still, it can be useful for communicating the expected type to other people reading your code or a static type checker. Another common use case is instructing the Python interpreter to generate boilerplate code on your behalf, such as when working with data classes. You'll explore these scenarios in the next section.
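To see that a bare annotation registers metadata without storing a value, here's a small sketch (the Sensor class is invented for illustration):

```python
class Sensor:
    temperature: float                  # declaration only: no value stored
    pressure: dict = {"unit": "kPa"}    # annotated and assigned

# Both annotations are recorded on the class...
assert set(Sensor.__annotations__) == {"temperature", "pressure"}

# ...but only the assigned attribute actually exists.
assert not hasattr(Sensor, "temperature")
assert Sensor.pressure["unit"] == "kPa"
```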
To give you a better idea of what Python annotations might look like in practice, below are concrete examples of syntactically correct variable annotations:
Python 3.6+
>>> temperature: float
>>> pressure: {"unit": "kPa", "min": 220, "max": 270}
You annotate the variable temperature with float to indicate its expected type. For the variable pressure, you use a Python dictionary to specify the air pressure unit along with its minimum and maximum values. This kind of metadata could be used to validate the actual value at runtime, generate documentation based on the source code, or even automatically build a command-line interface for a Python script.
Read the full article at https://realpython.com/python-annotations/ »
The Python Show
54 - Neural Networks and Data Visualization with Nicolas Rougier
In this episode, we have the honor of having Nicolas Rougier on our show. Nicolas is a researcher and team leader at the Institute of Neurodegenerative Diseases (Bordeaux, France).
We discuss how Nicolas utilizes computational models and neural networks in his research on the brain. We also talk about Nicolas’s history with Python, his work on Glumpy and VisPy, and much, much more!
Links
Scientific Visualization: Python & Matplotlib, an open access book on scientific visualization.
From Python to Numpy, an open access book on numerical computing
100 Numpy Exercises is a collection of 100 numpy exercises, from easy to hard.
Real Python
Quiz: Working With Python's .__dict__ Attribute
This quiz helps you sharpen your understanding of Python's .__dict__ attribute. You'll revisit how attributes are stored, who actually has a .__dict__, what mappingproxy means, and why __slots__ matters.
Get ready to test your knowledge after watching Working With Python's .__dict__ Attribute.
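As a quick refresher on those topics, a few assertions you can run (Point and SlottedPoint are throwaway examples):

```python
import types

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
# Instance attributes live in a plain per-instance dict.
assert p.__dict__ == {"x": 1, "y": 2}

# A class's __dict__ is a read-only mappingproxy, not a dict.
assert isinstance(Point.__dict__, types.MappingProxyType)

# __slots__ trades the per-instance dict for fixed storage.
class SlottedPoint:
    __slots__ = ("x", "y")

sp = SlottedPoint()
assert not hasattr(sp, "__dict__")
```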
Quiz: Python Annotations
In this quiz, you’ll test your understanding of lazy annotations introduced in Python 3.14.
By working through this quiz, you’ll revisit how they improve performance, address forward reference issues, and support both static type checking and runtime processing.
Seth Michael Larson
The vulnerability might be in the proof-of-concept
The Security Developer-in-Residence role at the Python Software Foundation is funded by Alpha-Omega. Thanks to Alpha-Omega for sponsoring security in the Python ecosystem.
I'm on the security team for multiple open source projects with ~medium levels of report volume. Over the years, you see patterns in how reporters try to have a report accepted as a vulnerability in the project.
One pattern that I see frequently is submitting proof-of-concept code that itself contains the vulnerability. However, the project code is also used, so the reporters try to convince you that the vulnerability is in the project code.
Here's a simplified version of reports that the Python Security Response Team sees fairly frequently:
user_controlled_value = "..."
# ...(some layers of indirection)
eval(user_controlled_value) # RCE!!!
This isn't a vulnerability in Python, clearly. Python is designed to execute code, so if you tell Python to execute code it will do so. But it can be less obvious when there's a more subtle vulnerability in the proof-of-concept. The below example filters user-controlled URLs and returns an HTTP response for acceptable URLs:
import urllib3
from urllib.parse import urlparse

def safe_url_opener(url):
    input_url = urlparse(url)
    input_scheme = input_url.scheme
    input_host = input_url.hostname

    block_schemes = ["file", "ftp"]
    block_hosts = ["evil.com"]
    if input_scheme in block_schemes:
        return None
    if input_host in block_hosts:
        return None

    return urllib3.request("GET", url)
The reporter claimed that there was a vulnerability in urlparse because the parser behaved differently than urllib3.request, and thus an attacker would be able to circumvent the block list with a URL crafted to exploit these differences ("SSRF").
Keep in mind that both urlparse and urllib3 implement RFC 3986, but due to backwards compatibility urllib3 allows "scheme-less" URLs in the form localhost:8080/ to be accepted and handled as http://localhost:8080/.
I didn't agree with this reporter's determination; instead, I asserted that the safe_url_opener() function contains the vulnerability. To prove this, I implemented a safe_url_opener() function that uses urlparse with urllib3 securely:
import urllib3
from urllib.parse import urlparse

def safe_url_opener(unsafe_url):
    safe_url = urlparse(unsafe_url)

    # Use an allow-list, not a block-list.
    allow_schemes = ["https"]
    allow_hosts = ["good.com"]
    if safe_url.scheme not in allow_schemes:
        return
    if safe_url.hostname not in allow_hosts:
        return

    # Check the URL doesn't have components we don't expect.
    # (urlparse() exposes credentials as .username and .password.)
    if (safe_url.username is not None
            or safe_url.password is not None
            or safe_url.port is not None):
        return

    # Use the safe parsed values, not the unsafe URL.
    pool = urllib3.HTTPSConnectionPool(
        host=safe_url.hostname,
        assert_hostname=safe_url.hostname,
    )
    target = safe_url.path or "/"
    if safe_url.query:
        target += f"?{safe_url.query}"
    return pool.request("GET", target)
The above program could be even more secure and use urllib3's urllib3.util.parse_url() function to completely remove SSRF potential.
This post is meant as a reminder to security teams and maintainers of open source projects that sometimes the vulnerability is in the proof-of-concept and not your own project's code.
Having a security policy (e.g. "urlparse strictly implements RFC 3986 regardless of other implementation behaviors") and threat model (e.g. "users must not combine with other URL parsers") documented for public APIs means security reports can be treated consistently while minimizing stress and reducing repeated research into historical decisions around API design.
Thanks for keeping RSS alive! ♥
Quansight Labs Blog
Expressions are coming to pandas!
`pd.col` will soon be a real thing!
August 26, 2025
PyCoder's Weekly
Issue #696: Namespaces, with, functools.Placeholder, and More (Aug. 26, 2025)
#696 - AUGUST 26, 2025
View in Browser »
Python Namespace Packages Are a Pain
Namespace packages are a way of splitting a Python package across multiple directories. Namespaces can be implicit or explicit and this can cause confusion. This article explains why and recommends what to do.
JOSH CANNON
Python's with Statement: Manage External Resources Safely
Understand Python's with statement and context managers to streamline the setup and teardown phases in resource management. Start writing safer code today!
REAL PYTHON
functools.Placeholder
Learn how to use functools.Placeholder, new in Python 3.14, with a real-life example.
RODRIGO GIRÃO SERRÃO
Articles & Tutorials
Agentic AI Programming With Python
Agentic AI programming is what happens when coding assistants stop acting like autocomplete and start collaborating on real work. This episode of Talk Python To Me interviews Matthew Makai and they cut through the hype and incentives to define "agentic," and get hands-on with how it can work for you.
KENNEDY & MAKAI podcast
pytest for Data Scientists
This guide shows how to use pytest to write lightweight yet powerful tests for functions, NumPy arrays, and pandas DataFrames. You'll also learn about parametrization, fixtures, and mocking to make your workflows more reliable and production-ready.
CODECUT.AI • Shared by Khuyen Tran
SciPy, NumPy, and Fostering Scientific Python
What went into developing the open-source Python tools data scientists use every day? This week on the show, we talk with Travis Oliphant about his work on SciPy, NumPy, Numba, and many other contributions to the Python scientific community.
REAL PYTHON podcast
The State of Python 2025
Explore the key trends and actionable ideas from the latest Python Developers Survey, which was conducted jointly by the Python Software Foundation and JetBrains PyCharm and includes insights from over 30,000 developers. Discover the key takeaways in this blog post.
JETBRAINS.COM • Shared by Evgeniia Verbina from JetBrains PyCharm
Preventing Domain Resurrection Attacks
“PyPI now checks for expired domains to prevent domain resurrection attacks, a type of supply-chain attack where someone buys an expired domain and uses it to take over PyPI accounts through password resets.”
MIKE FIEDLER
How to Use Redis With Python
“Redis is an open-source, in-memory data structure store that can be used as a database, cache, message broker, or queue” Learn how to use it with Python in this step-by-step tutorial.
APPSIGNAL.COM âą Shared by AppSignal
Custom Parametrization Scheme With pytest
Custom parametrisation schemes are not advertised a lot within the pytest community. Learn how they can improve readability and debugging of your tests.
CHRISTOS LIONTOS • Shared by Christos Liontos
Hypothesis Is Now Thread-Safe
Hypothesis is a property-based testing library for Python. In order to move towards compatibility with free-threading, the library is now thread-safe.
LIAM DEVOE
Single and Double Underscores in Python Names
Learn Python naming conventions with single and double underscores to design APIs, create safe classes, and prevent name clashes.
REAL PYTHON
Projects & Code
Events
Weekly Real Python Office Hours Q&A (Virtual)
August 27, 2025
REALPYTHON.COM
PyCon Poland 2025
August 28 to September 1, 2025
PYCON.ORG
PyCon Kenya 2025
August 28 to August 31, 2025
PYCON.KE
PyCon Greece 2025
August 29 to August 31, 2025
PYCON.GR
¡Cuarta Reunión De Pythonistas GDL!
August 30, 2025
PYTHONISTAS-GDL.ORG
PyData Berlin 2025
September 1 to September 4, 2025
PYDATA.ORG
Limbe
September 1 to September 2, 2025
NOKIDBEHIND.ORG
Django Summit DELSU
September 1 to September 6, 2025
HAMPLUSTECH.COM
PyCon Taiwan
September 6 to September 8, 2025
PYCON.ORG
Happy Pythoning!
This was PyCoder’s Weekly Issue #696.
View in Browser »
[ Subscribe to PyCoder's Weekly: Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
Mike Driscoll
Python Books and Courses - Back to School Sale
If you are heading back to school and need to learn Python, consider checking out my sale. You can get 25% off any of my eBooks or courses using the following coupon at checkout: FALL25
My books and course cover the following topics:
- Beginner Python (Python 101)
- Intermediate Python
- Creating GUIs with wxPython
- Working with Excel
- Image Processing
- Creating PDFs with Python
- Working with JupyterLab
- Creating TUIs with Python and Textual
- Python Logging
Start learning Python or widen your Python knowledge today!
The post Python Books and Courses – Back to School Sale appeared first on Mouse Vs Python.
Real Python
Profiling Performance in Python
Do you want to optimize the performance of your Python program to make it run faster or consume less memory? Before diving into any performance tuning, you should strongly consider using a technique called software profiling. It can help you decide whether optimizing the code is necessary and, if so, which parts of the code you should focus on.
Sometimes, the return on investment in performance optimizations just isn’t worth the effort. If you only run your code once or twice, or if it takes longer to improve the code than execute it, then what’s the point?
When it comes to improving the quality of your code, you’ll probably optimize for performance as a final step, if you do it at all. Often, your code will become speedier and more memory efficient thanks to other changes that you make. When in doubt, go through this short checklist to figure out whether to work on performance:
- Testing: Have you tested your code to prove that it works as expected and without errors?
- Refactoring: Does your code need some cleanup to become more maintainable and Pythonic?
- Profiling: Have you identified the most inefficient parts of your code?
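If you decide profiling is warranted, the standard library's timeit and cProfile cover the basics; a minimal sketch (slow_sum is a deliberately naive placeholder for real code):

```python
import cProfile
import io
import pstats
import timeit

def slow_sum(n):
    # Intentionally naive: sum(range(n)) or a closed form would be faster.
    total = 0
    for i in range(n):
        total += i
    return total

# Benchmark with timeit: how long does it take overall?
elapsed = timeit.timeit(lambda: slow_sum(10_000), number=100)
print(f"100 runs took {elapsed:.4f}s")

# Profile with cProfile: where is the time actually spent?
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
print(buffer.getvalue())
```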
Seth Michael Larson
SMS URLs
Did you know there is a URL scheme for sending an "SMS" or text message, similar to mailto:? SMS URLs are defined in RFC 5724 and are formatted like so:
sms:<recipient(s)>?body=<body>
Here's a bunch of test links with different scenarios you can try on your mobile phone:

sms:+15551230001,+15551230002,...?body=Hello%20world!
sms://open?addresses=+15551230001,+15551230002,...&body=Hello%20world!

Annoyingly, it appears that as of today Apple doesn't implement RFC 5724 correctly for multiple recipients. The first URL won't work on iPhones, but will work on Android. Only the second URL will work on iPhones (and there's not much public explanation as to why that might be).
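As a quick sketch of how one might assemble such a URL in Python (sms_url is a hypothetical helper, and this glosses over the finer percent-encoding rules of RFC 5724):

```python
from urllib.parse import quote

def sms_url(recipients, body):
    # Hypothetical helper: join recipients with commas and
    # percent-encode the body, per the sms:<recipients>?body=<body> shape.
    return "sms:" + ",".join(recipients) + "?body=" + quote(body)

print(sms_url(["+15551230001", "+15551230002"], "Hello world!"))
# sms:+15551230001,+15551230002?body=Hello%20world%21
```

Note that urllib.parse.quote also encodes the trailing "!", which a hand-written URL might leave bare.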
Thanks for keeping RSS alive! ♄
August 25, 2025
The Lunar Cowboy
Introducing unittest-fixtures
I would like to introduce unittest-fixtures. The unittest-fixtures package is a helper for the unittest.TestCase class that allows one to define fixtures as simple functions and declare them in your TestCase using decorators.

The unittest-fixtures package is available now from PyPI.

The following is extracted from the project's README:
Description
unittest-fixtures spun off from my Gentoo Build Publisher project. I use unittest, the test framework in the Python standard library, where it's customary to define a TestCase's fixtures in the .setUp() method. Having done it this way for years, it occurred to me one day that this goes against the open/closed principle (OCP). What if, instead of cracking open the .setUp() method to add a fixture to a TestCase, one could instead add a decorator? That's what unittest-fixtures allows one to do.
from unittest_fixtures import given

@given(dog)
class MyTest(TestCase):
    def test_method(self, fixtures):
        dog = fixtures.dog
In the above example, dog is a fixture function. Fixture functions are passed to the given decorator. When the test method is run, the fixtures are "instantiated" and attached to the fixtures keyword argument of the test method.
Fixture functions
Fixture functions are functions that one defines that return a "fixture". For example, the above dog fixture might look like this:
from unittest_fixtures import fixture

@fixture()
def dog(fixtures):
    return Dog(name="Fido")
Fixture functions are always passed a Fixtures argument, because fixtures can depend on other fixtures. For example:
@fixture(dog)
def person(fixtures):
    p = Person(name="Jane")
    p.pet = fixtures.dog
    return p
Fixture functions can have keyword parameters, but those parameters must have defaults.
@fixture
def dog(fixtures, name="Fido"):
    return Dog(name=name)
Then one's TestCase can use the where decorator to pass the parameter:
from unittest_fixtures import given, where

@given(dog)
@where(dog__name="Buddy")
class MyTest(TestCase):
    def test_method(self, fixtures):
        dog = fixtures.dog
        self.assertEqual("Buddy", dog.name)
Duplicating fixtures
The unittest-fixtures library allows one to use a fixture more than once. This can be done by passing the fixture as keyword arguments, giving different names to the same fixture. Different parameters can then be passed to each:
@given(fido=dog, buddy=dog)
@where(fido__name="Fido", buddy__name="Buddy")
class MyTest(TestCase):
    def test_method(self, fixtures):
        self.assertEqual("Buddy", fixtures.buddy.name)
        self.assertEqual("Fido", fixtures.fido.name)
Fixture-depending fixtures will all use the same fixture, but only if they have the same name. So in the above example, if we also gave the TestCase the person fixture, that person would have a different dog because it depends on a fixture called "dog". However, this will work:
@given(dog, person)
class MyTest(TestCase):
    def test_method(self, fixtures):
        dog = fixtures.dog
        person = fixtures.person
        self.assertIs(person.pet, dog)
@where (fixture parameters)
The where decorator can be used to pass parameters to a fixture function. Fixture functions are not required to take arguments. To pass a parameter to a fixture function, for example to pass name to the dog fixture, use the name of the function, followed by __, followed by the parameter name: dog__name. Fixture functions can also have a parameter that is the same name as the fixture itself. For example:
@given(settings)
@where(settings={"DEBUG": True, "SECRET": "sauce"})
class MyTest(TestCase):
    ...
There are times when one may desire to pass a fixture parameter that uses the value of another fixture; however, that value does not get calculated until each test is run. The Param type allows one to accomplish this:
from unittest_fixtures import Param, given, where

@given(person)
@where(person__name=Param(lambda fixtures: fixtures.name))
@given(name=random_choice)
@where(name__choices=["Liam", "Noah", "Jack", "Oliver"])
class MyTest(TestCase):
    ...
[!NOTE] In the above example, fixture ordering is important. Given that person implicitly depends on name, the name fixture needs to be set up first. We do this by declaring it before the person fixture, i.e. lower vertically in the list of decorators.
Fixtures as context managers
Sometimes a fixture will need a setup and teardown process. If unittest-fixtures is supposed to remove the need to open setUp(), then it must also remove the need to open tearDown(). It does this by allowing the fixture to be defined as a generator function. For example:
import tempfile

@fixture()
def tmpdir(fixtures):
    with tempfile.TemporaryDirectory() as tempdir:
        yield tempdir
Using the unittest.mock library is another good example of using context manager fixtures.
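For instance, a patching fixture can be written as a generator around mock.patch. The sketch below shows just the bare pattern, driving the generator by hand the way the library presumably does internally (fast_clock is a made-up name, and the @fixture() decorator is omitted so the snippet stands alone):

```python
from unittest import mock

def fast_clock(fixtures=None):
    # Setup: patch time.sleep for the duration of the test...
    with mock.patch("time.sleep") as sleep_mock:
        yield sleep_mock
    # ...teardown: the patch is removed once the generator finishes.

# Driving the generator by hand, as a fixture framework would:
gen = fast_clock()
sleep_mock = next(gen)   # setup runs; the patch is now active

import time
time.sleep(60)           # returns instantly because sleep is mocked
print(sleep_mock.call_count)  # 1

gen.close()              # teardown: time.sleep is restored
```

The with-block straddling the yield is what makes setup and teardown live in one function instead of a setUp()/tearDown() pair.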
Fixture-depending fixtures

As stated above, fixtures can depend on other fixtures. This is done by "declaring" the dependencies in the fixture decorator. Fixtures are then passed as an argument to the fixture function:
@fixture(settings, tmpdir)
def jenkins(fixtures, root=None):
    root = root or fixtures.tmpdir
    settings = replace(fixtures.settings, STORAGE_PATH=root)
    return Jenkins.from_settings(settings)
The above example shows that one can get pretty fancy... or creative with one's fixture definitions.
Fixtures can also have named dependencies. So in the above example, if one wanted a different tmpdir than the "global" one:
@fixture(settings, jenkins_root=tmpdir)
def jenkins(fixtures, root=None):
    root = root or fixtures.jenkins_root
    settings = replace(fixtures.settings, STORAGE_PATH=root)
    return Jenkins.from_settings(settings)
If a TestCase used both jenkins and tmpdir:
@given(tmpdir, jenkins)
class MyTest(TestCase):
    def test_something(self, fixtures):
        self.assertNotEqual(fixtures.jenkins.root, fixtures.tmpdir)
Again, if the two fixtures have different names, then they are two separate fixtures. In general, one should not use named fixtures unless one wants multiple fixtures of the same type.
@params (parametrized tests)
Not to be confused with @parametrized (below), which works similarly. The params decorator turns a TestCase's methods into parametrized tests; however, unlike @parametrized, the parameters are passed into the fixtures argument instead of as additional arguments to the test method. For example:
from unittest_fixtures import params

@params(number=[1, 2, 3], square=[1, 4, 9])
class MyTest(TestCase):
    def test(self, fixtures):
        self.assertEqual(fixtures.number**2, fixtures.square)
In the above example, the test method is called three times. With each iteration, the fixtures parameter has the values:

Fixtures(number=1, square=1)
Fixtures(number=2, square=4)
Fixtures(number=3, square=9)
@parametrized
The @parametrized decorator acts as a wrapper for unittest's subtests. Unlike @params above, this decorator is applied to TestCase methods rather than to the TestCase itself; in this case, extra parameters are passed to the test method. This can be used if you only want to parametrize a specific test method in a TestCase rather than all test methods. For example:
from unittest_fixtures import parametrized

class ParametrizeTests(TestCase):
    @parametrized([[1, 1], [2, 4], [3, 9]])
    def test(self, number, square):
        self.assertEqual(number**2, square)
The fixtures kwarg may be overridden
The fixtures keyword argument is automatically passed to TestCase methods when the test is run. The name of the keyword argument can be overridden as follows:
@given(dog)
class MyTest(TestCase):
    unittest_fixtures_kwarg = "fx"

    def test_method(self, fx):
        dog = fx.dog
Caktus Consulting Group
How to migrate from pip-tools to uv
At Caktus, many of our projects use pip-tools for dependency management. Following Tobias' post How to Migrate your Python & Django Projects to uv, we were looking to migrate other projects to uv, but the path seemed less clear with existing pip-tools setups. Our requirements are often spread across multiple files, like this:
Real Python
How to Write Docstrings in Python
Writing clear, consistent docstrings in Python helps others understand your code's purpose, parameters, and outputs. In this guide on how to write docstrings in Python, you'll learn about best practices, standard formats, and common pitfalls to avoid, ensuring your documentation is accessible to users and tools alike.

By the end of this tutorial, you'll understand that:
- Docstrings are strings used to document your Python code and can be accessed at runtime.
- Python comments and docstrings have important differences.
- One-line and multiline docstrings are classifications of docstrings.
- Common docstring formats include reStructuredText, Google-style, NumPy-style, and doctest-style.
- Antipatterns such as inconsistent formatting should be avoided when writing docstrings.
Explore the following sections to see concrete examples and detailed explanations for crafting effective docstrings in your Python projects.
Get Your Code: Click here to download the free sample code that shows you how to write docstrings in Python.
Take the Quiz: Test your knowledge with our interactive "How to Write Docstrings in Python" quiz. You'll receive a score upon completion to help you track your learning progress:
Interactive Quiz
How to Write Docstrings in Python: Test your knowledge of Python docstrings, including syntax, conventions, formats, and how to access and generate documentation.
Getting Started With Docstrings in Python
Python docstrings are string literals that show information regarding Python functions, classes, methods, and modules, allowing them to be properly documented. They are placed immediately after the definition line in triple double quotes (""").

Their use and convention are described in PEP 257, which is a Python Enhancement Proposal (PEP) that outlines conventions for writing docstrings. Docstrings don't follow a strict formal style. Here's an example:
docstring_format.py
def determine_magic_level(magic_number):
    """
    Multiply a wizard's favorite number by 3 to reveal their magic level.
    """
    return magic_number * 3
Docstrings are a built-in means of documentation. While this may remind you of comments in Python, docstrings serve a distinct purpose. If you're curious and would like to see a quick breakdown of the differences now, open the collapsible section below.

Python comments and docstrings seem a lot alike, but they're actually quite different in a number of ways because they serve different purposes:
| Comments | Docstrings |
|---|---|
| Begin with `#` | Are enclosed in triple quotes (`"""`) |
| Consist of notes and reminders written by developers for other developers | Provide documentation for users and tools |
| Are ignored by the Python interpreter | Are stored in `.__doc__` and accessible at runtime |
| Can be placed anywhere in code | Are placed at the start of modules, classes, and functions |
To summarize, comments explain parts of an implementation that may not be obvious or that record important notes for other developers. Docstrings describe modules, classes, and functions so users and tools can access that information at runtime.
So, while comments and docstrings may look similar at first glance, their purpose and behavior in Python are different. Next, youâll look at one-line and multiline docstrings.
One-Line vs Multiline Docstrings
Docstrings are generally classified as either one-line or multiline. As the names suggest, one-line docstrings take up only a single line, while multiline docstrings span more than one line. While this may appear to be a slight difference, how you use and format them in your code matters.
An important formatting rule from PEP 257 is that one-line docstrings should be concise, while multiline docstrings should have their closing quotes on a new line. You may resort to a one-line docstring for relatively straightforward programs like the one below:
one_line_docstring.py
import random
def picking_hat():
    """Return a random house name."""
    houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
    return random.choice(houses)
In this example, you see a program that returns a random house as depicted in the classic Harry Potter stories. This is a good example for the use of one-line docstrings.
You use multiline docstrings when you have to provide a more thorough explanation of your code, which is helpful for other developers. Generally, a docstring should contain parameters, return value details, and a summary of the code.
You're free to format docstrings as you like. That being said, you'll learn later that there are common docstring formats that you may follow. Here's an example of a multiline docstring:
multiline_docstring.py
def get_harry_potter_book(publication_year, title):
    """
    Retrieve a Harry Potter book by its publication year and name.

    Parameters:
        publication_year (int): The year the book was published.
        title (str): The title of the book.

    Returns:
        str: A sentence describing the book and its publication year.
    """
    return f"The book {title!r} was published in the year {publication_year}."
As you can see, the closing quotes for this multiline docstring appear on a separate line. Now that you understand the difference between one-line and multiline docstrings, youâll learn how to access docstrings in your code.
Ways to Access Docstrings in Python
Unlike code comments, docstrings aren't ignored by the interpreter. They become a part of the program and serve as associated documentation for anyone who wants to understand your program and what it does. That's why knowing how to access docstrings is so useful. Python provides two built-in ways to access docstrings: the .__doc__ attribute and the help() function.

The .__doc__ Attribute
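To illustrate both access methods with the picking_hat() function from earlier in this tutorial (redefined here so the snippet is self-contained):

```python
import random

def picking_hat():
    """Return a random house name."""
    houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
    return random.choice(houses)

# 1. The .__doc__ attribute holds the raw docstring:
print(picking_hat.__doc__)  # Return a random house name.

# 2. help() renders the docstring as formatted documentation:
help(picking_hat)
```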
Read the full article at https://realpython.com/how-to-write-docstrings-in-python/ »
Hugo van Kemenade
EuroPython 2025: A roundup of writeups
Some out-of-context quotes:
- "We can just bump the version and move on." – Dr. Brett Cannon
- "You just show up. That's it." – Rodrigo Girão Serrão
- "If it kwargs like a dorg, it's a dorg." – Sebastián Ramírez
- "Our job will be to put the human in." – Paul Everitt
20 July 2025
21 July 2025
23 July 2025
24 July 2025
11 August 2025
12 August 2025
21 August 2025
And a bunch of LinkedIn posts:
- Alicja Kocieniewska
- Diego Russo
- Ece Akdeniz
- Jodie Burchell
- Kseniia Usyk
- Lara Krämer
- Libor Vaněk
- Marco Richetta
- Olena Yefymenko
- Vassiliki Dalakiari
Finally, the official photos and videos should be up soon, and here are my photos.
Header photo: Savannah Bailey’s keynote (CC BY-NC-SA 2.0 Hugo van Kemenade).
Real Python
Quiz: How to Write Docstrings in Python
Want to get comfortable writing and using Python docstrings? This quiz helps you revisit best practices, standard conventions, and common tools.
You’ll review the basics of docstring syntax, how to read them at runtime, and different formatting styles. For more details, check out the tutorial How to Write Docstrings in Python.
PyCharm
Fine-Tuning and Deploying GPT Models Using Hugging Face Transformers
Hugging Face is currently a household name for machine learning researchers and enthusiasts. One of their biggest successes is Transformers, a model-definition framework for machine learning models in text, computer vision, audio, and video. Because of the vast repository of state-of-the-art machine learning models available on the Hugging Face Hub and the compatibility of Transformers with the majority of training frameworks, it is widely used for inference and model training.
Why do we want to fine-tune an AI model?
Fine-tuning AI models is crucial for tailoring their performance to specific tasks and datasets, enabling them to achieve higher accuracy and efficiency compared to using a general-purpose model. By adapting a pre-trained model, fine-tuning reduces the need for training from scratch, saving time and resources. It also allows for better handling of specific formats, nuances, and edge cases within a particular domain, leading to more reliable and tailored outputs.
In this blog post, we will fine-tune a GPT model with mathematical reasoning so it better handles math questions.
Using models from Hugging Face
When using PyCharm, we can easily browse and add any models from Hugging Face. In a new Python file, from the Code menu at the top, select Insert HF Model.

In the menu that opens, you can browse models by category or start typing in the search bar at the top. When you select a model, you can see its description on the right.

When you click Use Model, you will see a code snippet added to your file. And that's it: you're ready to start using your Hugging Face model.

GPT (Generative Pre-Trained Transformer) models
GPT models are very popular on the Hugging Face Hub, but what are they? GPTs are trained models that understand natural language and generate high-quality text. They are mainly used in tasks related to textual entailment, question answering, semantic similarity, and document classification. The most famous example is ChatGPT, created by OpenAI.
A lot of OpenAI GPT models are available on the Hugging Face Hub, and we will learn how to use these models with Transformers, fine-tune them with our own data, and deploy them in an application.
Benefits of using Transformers
Transformers, together with other tools provided by Hugging Face, provides high-level tools for fine-tuning any sophisticated deep learning model. Instead of requiring you to fully understand a given model's architecture and tokenization method, these tools help make models "plug and play" with any compatible training data, while also providing a large amount of customization in tokenization and training.
Transformers in action
To get a closer look at Transformers in action, let's see how we can use it to interact with a GPT model.
Inference using a pretrained model with a pipeline
After selecting and adding the OpenAI GPT-2 model to the code, this is what we've got:
from transformers import pipeline

pipe = pipeline("text-generation", model="openai-community/gpt2")
Before we can use it, we need to make a few preparations. First, we need to install a machine learning framework. In this example, we chose PyTorch. You can install it easily via the Python Packages window in PyCharm.

Then we need to install Transformers using the `torch` option. You can do that in the terminal: open it using the button on the left, or use the ⌄F12 (macOS) or Alt+F12 (Windows) hotkey.

In the terminal, since we are using uv, we use the following commands to add it as a dependency and install it:
uv add "transformers[torch]"
uv sync
If you are using pip:
pip install "transformers[torch]"
We will also install a couple more libraries that we will need later, including python-dotenv, datasets, notebook, and ipywidgets. You can use either of the methods above to install them.
After that, it may be best to add a GPU device to speed up the model. Depending on what you have on your machine, you can add it by setting the device parameter in pipeline. Since I am using a Mac M2 machine, I can set device="mps" like this:
pipe = pipeline("text-generation", model="openai-community/gpt2", device="mps")
If you have CUDA GPUs, you can also set device="cuda".
Now that we've set up our pipeline, let's try it out with a simple prompt:
from transformers import pipeline

pipe = pipeline("text-generation", model="openai-community/gpt2", device="mps")
print(pipe("A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?", max_new_tokens=200))
Run the script with the Run button at the top:

The result will look something like this:
[{'generated_text': 'A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?\n\nA rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width?\n\nA rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width?\n\nA rectangle has a perimeter of 20 cm. If the width is 6 cm, what is the width? A rectangle has a perimeter'}]
There isn't much reasoning in this at all, only a bunch of nonsense.
You may also see this warning:
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
This is the default setting. You can also manually add it as below so this warning disappears, but we don't have to worry about it too much at this stage.
print(pipe("A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?", max_new_tokens=200, pad_token_id=pipe.tokenizer.eos_token_id))
Now that we've seen how GPT-2 behaves out of the box, let's see if we can make it better at math reasoning with some fine-tuning.
Load and prepare a dataset from the Hugging Face Hub
Before we work on the GPT model, we first need training data. Let's see how to get a dataset from the Hugging Face Hub.
If you haven’t already, sign up for a Hugging Face account and create an access token. We only need a `read` token for now. Store your token in a `.env` file, like so:
HF_TOKEN=your-hugging-face-access-token
We will use this Math Reasoning Dataset, which has text describing some math reasoning. We will fine-tune our GPT model with this dataset so it can solve math problems more effectively.
Let's create a new Jupyter notebook, which we'll use for fine-tuning because it lets us run different code snippets one by one and monitor the progress.
In the first cell, we use this script to load the dataset from the Hugging Face Hub:
from datasets import load_dataset
from dotenv import load_dotenv
import os

load_dotenv()
dataset = load_dataset("Cheukting/math-meta-reasoning-cleaned", token=os.getenv("HF_TOKEN"))
dataset
Run this cell (it may take a while, depending on your internet speed), which will download the dataset. When it's done, we can have a look at the result:
DatasetDict({
    train: Dataset({
        features: ['id', 'text', 'token_count'],
        num_rows: 987485
    })
})
If you are curious and want to have a peek at the data, you can do so in PyCharm. Open the Jupyter Variables window using the button on the right:

Expand dataset and you will see the View as DataFrame option next to dataset['train']:

Click on it to take a look at the data in the Data View tool window:

Next, we will tokenize the text in the dataset:
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=512)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
Here we use the GPT-2 tokenizer and set the pad_token to be the eos_token, the token that marks the end of a sequence. After that, we tokenize the text with a function. It may take a while the first time you run it, but after that it will be cached and will be faster if you have to run the cell again.
The dataset has almost 1 million rows for training. If you have enough computing power to process all of them, you can use them all. However, in this demonstration we're training locally on a laptop, so I'd better only use a small portion!
tokenized_datasets_split = tokenized_datasets["train"].shard(num_shards=100, index=0).train_test_split(test_size=0.2, shuffle=True)
tokenized_datasets_split
Here I take only 1% of the data and then perform train_test_split to split the dataset into two:
DatasetDict({
    train: Dataset({
        features: ['id', 'text', 'token_count', 'input_ids', 'attention_mask'],
        num_rows: 7900
    })
    test: Dataset({
        features: ['id', 'text', 'token_count', 'input_ids', 'attention_mask'],
        num_rows: 1975
    })
})
Now we are ready to fine-tune the GPT-2 model.
Fine-tune a GPT model
In the next empty cell, we will set our training arguments:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=100,
    weight_decay=0.01,
    save_steps=500,
    logging_steps=100,
    dataloader_pin_memory=False
)
Most of them are pretty standard for fine-tuning a model. However, depending on your computer setup, you may want to tweak a few things:
- Batch size – Finding the optimal batch size is important, since the larger the batch size, the faster the training goes. However, there is a limit to how much memory is available for your CPU or GPU, so you may find there's an upper threshold.
- Epochs – Having more epochs causes the training to take longer. You can decide how many epochs you need.
- Save steps – Save steps determine how often a checkpoint will be saved to disk. If the training is slow and there is a chance that it will stop unexpectedly, you may want to save more often (set this value lower).
After we've configured our settings, we will put the trainer together in the next cell:
from transformers import Trainer, DataCollatorForLanguageModeling, GPT2LMHeadModel

# Load the base model to fine-tune (this step is not shown elsewhere in the post)
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets_split['train'],
    eval_dataset=tokenized_datasets_split['test'],
    data_collator=data_collator,
)
trainer.train(resume_from_checkpoint=False)
We set `resume_from_checkpoint=False`, but you can set it to `True` to continue from the last checkpoint if the training is interrupted.
After the training finishes, we will evaluate and save the model:
trainer.evaluate(tokenized_datasets_split['test'])
trainer.save_model("./trained_model")
We can now use the trained model in the pipeline. Let's switch back to `model.py`, where we have used a pipeline with a pretrained model:
from transformers import pipeline

pipe = pipeline("text-generation", model="openai-community/gpt2", device="mps")
print(pipe("A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?", max_new_tokens=200, pad_token_id=pipe.tokenizer.eos_token_id))
Now let's change `model="openai-community/gpt2"` to `model="./trained_model"` and see what we get:
[{'generated_text': "A rectangle has a perimeter of 20 cm. If the length is 6 cm, what is the width?\nAlright, let me try to solve this problem as a student, and I'll let my thinking naturally fall into the common pitfall as described.\n\n---\n\n**Step 1: Attempting the Problem (falling into the pitfall)**\n\nWe have a rectangle with perimeter 20 cm. The length is 6 cm. We want the width.\n\nFirst, I need to find the area under the rectangle.\n\nLetâs set \\( A = 20 - 12 \\), where \\( A \\) is the perimeter.\n\n**Area under a rectangle:** \n\\[\nA = (20-12)^2 + ((-12)^2)^2 = 20^2 + 12^2 = 24\n\\]\n\nSo, \\( 24 = (20-12)^2 = 27 \\).\n\nNow, Iâll just divide both sides by 6 to find the area under the rectangle.\n"}]
Unfortunately, it still does not solve the problem. However, it did come up with some mathematical formulas and reasoning that it didn't use before. If you want, you can try fine-tuning the model a bit more with the data we didn't use.
In the next section, we will see how we can deploy a fine-tuned model to API endpoints using both the tools provided by Hugging Face and FastAPI.
Deploying a fine-tuned model
The easiest way to deploy a model in a server backend is to use FastAPI. Previously, I wrote a blog post about deploying a machine learning model with FastAPI. While we won't go into the same level of detail here, we will go over how to deploy our fine-tuned model.
With the help of Junie, we've created some scripts which you can see here. These scripts let us deploy a server backend with FastAPI endpoints.
There are some new dependencies that we need to add:
uv add fastapi pydantic uvicorn
uv sync
Let's have a look at some interesting points in the scripts, in `main.py`:
# Initialize FastAPI app
app = FastAPI(
    title="Text Generation API",
    description="API for generating text using a fine-tuned model",
    version="1.0.0"
)

# Initialize the model pipeline
try:
    pipe = pipeline("text-generation", model="../trained_model", device="mps")
except Exception as e:
    # Fallback to CPU if MPS is not available
    try:
        pipe = pipeline("text-generation", model="../trained_model", device="cpu")
    except Exception as e:
        print(f"Error loading model: {e}")
        pipe = None
After initializing the app, the script will try to load the model into a pipeline. If a Metal GPU is not available, it will fall back to using the CPU. If you have a CUDA GPU instead of a Metal GPU, you can change `mps` to `cuda`.
# Request model
class TextGenerationRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 200

# Response model
class TextGenerationResponse(BaseModel):
    generated_text: str
Two new classes are created, inheriting from Pydantic's `BaseModel`.
We can also inspect our endpoints with the Endpoints tool window. Click on the globe next to `app = FastAPI` on line 11 and select Show All Endpoints.

We have three endpoints. Since the root endpoint is just a welcome message, we will look at the other two.
@app.post("/generate", response_model=TextGenerationResponse)
async def generate_text(request: TextGenerationRequest):
    """
    Generate text based on the provided prompt.

    Args:
        request: TextGenerationRequest containing the prompt and generation parameters

    Returns:
        TextGenerationResponse with the generated text
    """
    if pipe is None:
        raise HTTPException(status_code=500, detail="Model not loaded properly")

    try:
        result = pipe(
            request.prompt,
            max_new_tokens=request.max_new_tokens,
            pad_token_id=pipe.tokenizer.eos_token_id
        )
        # Extract the generated text from the result
        generated_text = result[0]['generated_text']
        return TextGenerationResponse(generated_text=generated_text)
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error generating text: {str(e)}")
The `/generate` endpoint collects the request prompt and generates the response text with the model.
@app.get("/health")
async def health_check():
    """Check if the API and model are working properly."""
    if pipe is None:
        raise HTTPException(status_code=500, detail="Model not loaded")
    return {"status": "healthy", "model_loaded": True}
The `/health` endpoint checks whether the model is loaded correctly. This can be useful if the client-side application needs to check before making the other endpoint available in its UI.
In `run.py`, we use uvicorn to run the server:
import uvicorn

if __name__ == "__main__":
    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)
When we run this script, the server will be started at http://0.0.0.0:8000/.
After we start running the server, we can go to http://0.0.0.0:8000/docs to test out the endpoints.
We can try this with the `/generate` endpoint:
{ "prompt": "5 people give each other a present. How many presents are given altogether?", "max_new_tokens": 300 }
This is the response we get:
{ "generated_text": "5 people give each other a present. How many presents are given altogether?\nAlright, let's try to solve the problem:\n\n**Problem** \n1. Each person gives each other a present. How many presents are given altogether?\n2. How many \"gift\" are given altogether?\n\n**Common pitfall** \nAssuming that each present is a \"gift\" without considering the implications of the original condition.\n\n---\n\n### Step 1: Attempting the problem (falling into the pitfall)\n\nOkay, so I have two people giving each other a present, and I want to know how many are present. I remember that there are three types of giftsâgifts, gins, and ginses.\n\nLet me try to count how many of these:\n\n- Gifts: Letâs say there are three people giving each other a present.\n- Gins: Letâs say there are three people giving each other a present.\n- Ginses: Letâs say there are three people giving each other a present.\n\nSo, total gins and ginses would be:\n\n- Gins: \\( 2 \\times 3 = 1 \\), \\( 2 \\times 1 = 2 \\), \\( 1 \\times 1 = 1 \\), \\( 1 \\times 2 = 2 \\), so \\( 2 \\times 3 = 4 \\).\n- Ginses: \\( 2 \\times 3 = 6 \\), \\(" }

Feel free to experiment with other requests.
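If you prefer to call the endpoint from code instead of the interactive docs page, here is a minimal client sketch using only the standard library. It assumes the uvicorn server from `run.py` is listening locally on port 8000; the URL and payload fields mirror the example above.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/generate"  # assumes the server from run.py is running

def build_request(prompt, max_new_tokens=300):
    """Build a POST request matching the /generate payload shown above."""
    payload = json.dumps(
        {"prompt": prompt, "max_new_tokens": max_new_tokens}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_request("5 people give each other a present. How many presents are given?")
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["generated_text"])
```

Keeping the payload construction separate from the network call makes it easy to inspect or test the request without a running server.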
Conclusion and next steps
Now that you have successfully fine-tuned an LLM like GPT-2 on a math reasoning dataset and deployed it with FastAPI, you can fine-tune many more of the open-source LLMs available on the Hugging Face Hub. You can experiment with fine-tuning other models, either with the open-source datasets there or with your own. If you want to (and the license of the original model allows), you can also upload your fine-tuned model to the Hugging Face Hub. Check out their documentation for how to do that.
One last remark regarding using or fine-tuning models with resources on the Hugging Face Hub: make sure to read the licenses of any model or dataset that you use to understand the conditions for working with those resources. Is it allowed to be used commercially? Do you need to credit the resources used?
In future blog posts, we will keep exploring more code examples involving Python, AI, machine learning, and data visualization.
In my opinion, PyCharm provides best-in-class Python support that ensures both speed and accuracy. Benefit from the smartest code completion, PEP 8 compliance checks, intelligent refactorings, and a variety of inspections to meet all your coding needs. As demonstrated in this blog post, PyCharm provides integration with the Hugging Face Hub, allowing you to browse and use models without leaving the IDE. This makes it suitable for a wide range of AI and LLM fine-tuning projects.
Python Bytes
#446 State of Python 2025
<strong>Topics covered in this episode:</strong><br> <ul> <li><em><a href="http://pypistats.org?featured_on=pythonbytes">pypistats.org</a> was down, is now back, and there's a CLI</em></li> <li><em><a href="https://blog.jetbrains.com/pycharm/2025/08/the-state-of-python-2025/?featured_on=pythonbytes">State of Python 2025</a></em></li> <li><em><a href="https://wrapt.readthedocs.io/en/develop/index.html?featured_on=pythonbytes">wrapt: A Python module for decorators, wrappers and monkey patching.</a></em></li> <li><strong><a href="https://pysentry.com?featured_on=pythonbytes">pysentry</a></strong></li> <li><strong>Extras</strong></li> <li><strong>Joke</strong></li> </ul><a href='https://www.youtube.com/watch?v=eLBwqF-zc3I' style='font-weight: bold;' data-umami-event="Livestream-Past" data-umami-event-episode="446">Watch on YouTube</a><br> <p><strong>About the show</strong></p> <p>Sponsored by us! Support our work through:</p> <ul> <li>Our <a href="https://training.talkpython.fm/?featured_on=pythonbytes"><strong>courses at Talk Python Training</strong></a></li> <li><a href="https://courses.pythontest.com/p/the-complete-pytest-course?featured_on=pythonbytes"><strong>The Complete pytest Course</strong></a></li> <li><a href="https://www.patreon.com/pythonbytes"><strong>Patreon Supporters</strong></a></li> </ul> <p><strong>Connect with the hosts</strong></p> <ul> <li>Michael: <a href="https://fosstodon.org/@mkennedy">@mkennedy@fosstodon.org</a> / <a href="https://bsky.app/profile/mkennedy.codes?featured_on=pythonbytes">@mkennedy.codes</a> (bsky)</li> <li>Brian: <a href="https://fosstodon.org/@brianokken">@brianokken@fosstodon.org</a> / <a href="https://bsky.app/profile/brianokken.bsky.social?featured_on=pythonbytes">@brianokken.bsky.social</a></li> <li>Show: <a href="https://fosstodon.org/@pythonbytes">@pythonbytes@fosstodon.org</a> / <a href="https://bsky.app/profile/pythonbytes.fm">@pythonbytes.fm</a> (bsky)</li> </ul> <p>Join us on YouTube at <a 
href="https://pythonbytes.fm/stream/live"><strong>pythonbytes.fm/live</strong></a> to be part of the audience. Usually <strong>Monday</strong> at 10am PT. Older video versions available there too.</p> <p>Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to <a href="https://pythonbytes.fm/friends-of-the-show">our friends of the show list</a>, we'll never share it.</p> <p><strong>Brian #1: <a href="http://pypistats.org?featured_on=pythonbytes">pypistats.org</a> was down, is now back, and there's a CLI</strong></p> <ul> <li><p><a href="https://pypistats.org?featured_on=pythonbytes">pypistats.org</a> is a cool site to check the download stats for Python packages.</p></li> <li><p>It was <a href="https://github.com/psf/pypistats.org/issues/82?featured_on=pythonbytes">down for a while</a>, like 3 weeks?</p></li> <li><p>A couple days ago, <a href="https://fosstodon.org/@hugovk@mastodon.social/115074427645537464">Hugo van Kemenade announced that it was back up</a>.</p></li> <li><p>With some changes in stewardship</p> <ul> <li><p>"<a href="http://pypistats.org/?featured_on=pythonbytes">pypistats.org</a> is back online! 
</p> <p>Thanks to @jezdez for suggesting the @ThePSF takes stewardship and connecting the right people, to @EWDurbin for migrating, and of course to Christopher Flynn for creating and running it for all these years!"</p></li> </ul></li> <li><p>Hugo has a CLI version, <a href="https://github.com/hugovk/pypistats?featured_on=pythonbytes">pypistats</a></p> <ul> <li>You can give it a command for what you want to search for <ul> <li>recent, overall, python_major, python_minor, system</li> </ul></li> <li>Then either a package name, a directory path, or if nothing, it will grab the current directory package via pyproject.toml or setup.cfg</li> <li>very cool</li> </ul></li> </ul> <p><strong>Michael #2: <a href="https://blog.jetbrains.com/pycharm/2025/08/the-state-of-python-2025/?featured_on=pythonbytes">State of Python 2025</a></strong></p> <ul> <li><strong>Michael's Themes</strong> <ul> <li>Python people use Python: 86% of respondents use Python as their main language</li> <li>We are mostly brand-new programmers: Exactly 50% of respondents have less than two years of professional coding experience</li> <li>Data science is now over half of all Python</li> <li>Most still use older Python versions despite benefits of newer releases: <a href="https://blog.jetbrains.com/pycharm/2025/08/the-state-of-python-2025/#most-still-use-older-python-versions-despite-benefits-of-newer-releases">Compelling math to make the change</a>.</li> <li>Python web devs resurgence</li> </ul></li> <li><strong>Forward-looking trends</strong> <ul> <li>Agentic AI will be wild</li> <li>Async, await, and threading are becoming core to Python</li> <li>Python GUIs and mobile are rising</li> </ul></li> <li><strong>Actionable ideas</strong> <ul> <li>Action 1: Learn uv</li> <li>Action 2: Use the latest Python</li> <li>Action 3: Learn agentic AI</li> <li>Action 4: Learn to read basic Rust</li> <li>Action 5: Invest in understanding threading</li> <li>Action 6: Remember the newbies</li> </ul></li> </ul> 
<p><strong>Brian #3: <a href="https://wrapt.readthedocs.io/en/develop/index.html?featured_on=pythonbytes">wrapt: A Python module for decorators, wrappers and monkey patching.</a></strong></p> <ul> <li><p>"The aim of the <strong>wrapt</strong> module is to provide a transparent object proxy for Python, which can be used as the basis for the construction of function wrappers and decorator functions.</p> <p>An easy to use decorator factory is provided to make it simple to create your own decorators that will behave correctly in any situation they may be used."</p></li> <li><p>Why not just use <code>functools.wraps()</code>?</p> <ul> <li>"The <strong>wrapt</strong> module focuses very much on correctness. It therefore goes way beyond existing mechanisms such as <code>functools.wraps()</code> to ensure that decorators preserve introspectability, signatures, type checking abilities etc. The decorators that can be constructed using this module will work in far more scenarios than typical decorators and provide more predictable and consistent behaviour."</li> </ul></li> <li><p>There's a <a href="https://github.com/GrahamDumpleton/wrapt/tree/master/blog?featured_on=pythonbytes">bunch of blog posts</a> from 2014 / 2015 (and kept updated) that talk about how wrapt solves many issues with traditional ways to decorate and patch things in Python, including <a href="https://github.com/GrahamDumpleton/wrapt/blob/master/blog/01-how-you-implemented-your-python-decorator-is-wrong.md?featured_on=pythonbytes">"How you implemented your Python decorator is wrong"</a>.</p></li> <li><p><a href="https://wrapt.readthedocs.io/en/latest?featured_on=pythonbytes">Docs</a> are pretty good, with everything from simple wrappers to an example of building a wrapper to handle <a href="https://wrapt.readthedocs.io/en/latest/examples.html#thread-synchronization">thread synchronization</a></p></li> </ul> <p><strong>Michael #4:</strong> <a 
href="https://pysentry.com?featured_on=pythonbytes">pysentry</a></p> <ul> <li><p>via <a href="https://bsky.app/profile/owen7ba.bsky.social/post/3lwojcl4ycs2a?featured_on=pythonbytes">Owen Lamont</a></p></li> <li><p>Install via <code>uv tool install pysentry-rs</code></p></li> <li><p>Scan your Python dependencies for known security vulnerabilities with a Rust-powered scanner.</p></li> <li><p>PySentry audits Python projects for known security vulnerabilities by analyzing dependency files (<code>uv.lock</code>, <code>poetry.lock</code>, <code>Pipfile.lock</code>, <code>pyproject.toml</code>, <code>Pipfile</code>, <code>requirements.txt</code>) and cross-referencing them against multiple vulnerability databases. It provides comprehensive reporting with support for various output formats and filtering options.</p></li> <li><p><strong>Key Features</strong>:</p> <ul> <li><p><strong>Multiple Project Formats</strong>: Supports <code>uv.lock</code>, <code>poetry.lock</code>, <code>Pipfile.lock</code>, <code>pyproject.toml</code>, <code>Pipfile</code>, and <code>requirements.txt</code> files</p></li> <li><p><strong>External Resolver Integration</strong>: Leverages <code>uv</code> and <code>pip-tools</code> for accurate requirements.txt constraint solving</p></li> <li><p><strong>Multiple Data Sources</strong>:</p> <ul> <li>PyPA Advisory Database (default)</li> <li>PyPI JSON API</li> <li>OSV.dev (Open Source Vulnerabilities)</li> </ul></li> <li><p><strong>Flexible Output for different workflows</strong>: Human-readable, JSON, SARIF, and Markdown formats</p></li> <li><p><strong>Performance Focused</strong>:</p> <ul> <li>Written in Rust for speed</li> <li>Async/concurrent processing</li> <li>Multi-tier intelligent caching (vulnerability data + resolved dependencies)</li> </ul></li> <li><p><strong>Comprehensive Filtering</strong>:</p> <ul> <li>Severity levels (low, medium, high, critical)</li> <li>Dependency scopes (main only vs all [optional, dev, prod, etc] dependencies)</li> 
<li>Direct vs. transitive dependencies</li> </ul></li> <li><p><strong>Enterprise Ready</strong>: SARIF output for IDE/CI integration</p></li> <li><p>I tried it on <a href="http://pythonbytes.fm">pythonbytes.fm</a> and found only one issue, sadly can't be fixed:</p> <pre><code>PYSENTRY SECURITY AUDIT ======================= SUMMARY: 89 packages scanned • 1 vulnerable • 1 vulnerabilities found SEVERITY: 1 LOW UNFIXABLE: 1 vulnerabilities cannot be fixed VULNERABILITIES --------------- 1. PYSEC-2022-43059 aiohttp v3.12.15 [LOW] [source: pypa-zip] AIOHTTP 3.8.1 can report a "ValueError: Invalid IPv6 URL" outcome, which can lead to a Denial of Service (DoS). NOTE:... Scan completed </code></pre></li> </ul></li> </ul> <p><strong>Extras</strong></p> <p>Michael:</p> <ul> <li>I've been rumbling with <a href="https://github.com/rvben/rumdl?featured_on=pythonbytes">rumdl</a>. <ul> <li>Ruben fixed one of my complaints about it with <a href="https://github.com/rvben/rumdl/issues/58?featured_on=pythonbytes">issue #58</a>.</li> <li>Config seems like it might be off. Here's mine <a href="https://gist.github.com/mikeckennedy/ec708e48b21d89c259eebf39e172b72c?featured_on=pythonbytes">.rumdl.toml</a>.</li> <li>I've been using it on the upcoming <a href="https://talkpython.fm/books/python-in-production?featured_on=pythonbytes">Talk Python in Production book</a> <ul> <li><a href="https://talkpython.fm/books/python-in-production?featured_on=pythonbytes">Read the first third online</a> and <a href="https://talkpython.fm/books/python-in-production/buy?featured_on=pythonbytes">get notified when it's out</a>.</li> <li>20 or so Markdown files</li> <li>45,000 words of content</li> </ul></li> </ul></li> <li>I asked if 3.13.6 would be the last 3.13 release? <strong>No</strong>. 
<ul> <li><a href="https://mastodon.social/@hugovk/115051786032886280?featured_on=pythonbytes">Thanks Hugo</a>.</li> <li><a href="https://discuss.python.org/t/python-3-14-0rc2-and-3-13-7-are-go/102403?featured_on=pythonbytes">Python 3.13.7 is now out</a>.</li> </ul></li> </ul> <p><strong>Joke:</strong> <a href="https://x.com/pr0grammerhum0r/status/1956023038278840407?s=12&featured_on=pythonbytes">Marked for destruction</a></p>
Mahmoud Hashemi
What I've been up to in 2025
Been quiet around here. Time to change that!
The short version up front: Since starting a family and leaving Stripe, I've pursued the dream that brought me to Silicon Valley. I've founded a startup.
After taking some parental leave, helping found a Python non-profit, and a nice long visit back home, I was raring for a challenge. So these days, outside of family, I'm all in on something new.
Why now?
I've wanted to start my own business since building Access apps in high school. But, the reality of leaving my family and moving to study in the USA, combined with the technical and creative fulfillment of the software industry, took me on a scenic route through enterprise software, free culture, and open-source.
That very same reality has since conspired to convince me to return to my original aspirations. I've lived through some exciting times in software, but nothing like now. What better time to be building and launching my most ambitious project ever?
Full details on that are coming soon¹. For now, here is a post about why.
Applications
To start my career, I worked on software infrastructure, security, observability, and developer productivity. But after eight years, around 2016, I started longing for something more human.
You can see this start to come out in The Packaging Gradient.
At a time when it seemed like everyone around me was talking about pip, pipenv, and PyPI, I couldn't help but remind people that the real end goal of software has always been the application (or even the appliance).
This impulse came to a head with APA.
Perhaps you, dear reader, have also been "lost in the sauce" of software: When you love computers and it dominates your thoughts, you might also spend most of your time thinking about the software that makes the software possible.
Don't get me wrong. Languages, libraries, compilers, devtools, we need every bit of help we can get. But I fell in love with software for its potential to effect change in the world writ large. I started eyeing product. The famous full stack.
That meant moving on from big tech, to a big startup, to a seed startup. One pandemic-fueled detour through a startup factory later, here we are. Finally, founding the startup. My own full stack.
Monetary misunderstandings
My 15+ year software engineering career can be summed up as:
- Building fintech software for pay
- Shipping open-source Python/wiki for free
Professionally enabling commerce while avoiding it in my personal time. I was young and conflicted. Truthfully, I still harbor some reservations, but I have to build what I know. I know about software and money.
"Money is the root of all evil."
If you look at the state of say, open banking in the USA, or web3isgoinggreat, or just read Money Stuff, you probably agree something's off. Money changes people. But so does the lack thereof.
I've watched more talented and deserving developers than myself suffer a variety of fates: hollowed out by monetary excess, blinded by greed, burned out by FOSS, literally working DoorDash to keep the lights on, dropping out of software completely. Shunning the world's favorite fungible has bad outcomes for individuals.
Bless my friends at Tidelift, OSTIF, and other orgs working to sustain the maintainers. Paying maintainers is a worthy battle. We just need to open more fronts to navigate what's in store.
Showing vs Telling
My favorite David Lynch (RIP) output isn't one of his films, it's this quote:
I think it perfectly captures the auteur mindset. Words are extraneous because the consummate creative expresses themselves better in their native medium.
Not that I mind words as a medium. After years of blogging and speaking, I've grown confident in my ability to tell.
But now it's time for the show.
¹ For friends who can't wait a couple weeks, shoot me an email for early access. ↩
August 24, 2025
Ned Batchelder
Finding unneeded pragmas
To answer a long-standing coverage.py feature request, I threw together an experiment: a tool to identify lines that have been excluded from coverage, but which were actually executed.
The program is a standalone file in the coverage.py repo. It is unsupported. I’d like people to try it to see what they think of the idea. Later we can decide what to do with it.
To try it: copy warn_executed.py from GitHub. Create a .toml file that looks something like this:
# Regexes that identify excluded lines:
warn-executed = [
    "pragma: no cover",
    "raise AssertionError",
    "pragma: cant happen",
    "pragma: never called",
]

# Regexes that identify partial branch lines:
warn-not-partial = [
    "pragma: no branch",
]
These are exclusion regexes that you’ve used in your coverage runs. The program will print out any line identified by a pattern and that ran during your tests. It might be that you don’t need to exclude the line, because it ran.
In this file, none of your coverage settings or the default regexes are assumed: you need to explicitly specify all the patterns you want flagged.
Run the program with Python 3.11 or higher, giving the name of the coverage data file and the name of your new TOML configuration file. It will print the lines that might not need excluding:
$ python3.12 warn_executed.py .coverage warn.toml
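To make the idea concrete, here is a hypothetical snippet (not from coverage.py itself) showing the kind of line the tool would flag: the except branch is excluded with a "pragma: no cover" comment, yet a test that passes a bad value does execute it, so warn_executed.py would report that the pragma may be unneeded.

```python
def safe_int(value, default=0):
    """Parse an integer, falling back to a default on bad input."""
    try:
        return int(value)
    except ValueError:  # pragma: no cover  <- excluded, but the test below runs it
        return default

# This test executes the "excluded" branch, which is exactly what
# warn_executed.py detects: the pragma is a candidate for removal.
assert safe_int("3") == 3
assert safe_int("oops") == 0
```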
The reason for a new list of patterns instead of just reading the existing coverage settings is that some exclusions are “don’t care” rather than “this will never happen.” For example, I exclude “def __repr__” because some __repr__’s are just to make my debugging easier. I don’t care if the test suite runs them or not. It might run them, so I don’t want it to be a warning that they actually ran.
This tool is not perfect. For example, I exclude “if TYPE_CHECKING:” because I want that entire clause excluded. But the if-line itself is actually run. If I include that pattern in the warn-executed list, it will flag all of those lines. Maybe I’m forgetting a way to do this: it would be good to have a way to exclude the body of the if clause while understanding that the if-line itself is executed.
Give warn_executed.py a try and comment on the issue about what you think of it.
Real Python
Quiz: Python Skill Test
How Strong Are Your Python Skills?
This quick quiz gives you a snapshot of where you stand, whether you’re just starting out with Python or have years of coding under your belt.
Test your Python skill by answering questions ranging from fundamentals to more advanced challenges. Each question is designed to test your understanding and maybe even teach you something new.
Tip: Read the Explanation for each answer and follow the included links to study up.
See where you currently place and get tips and resources to progress quickly:

Click below to start the quiz and find out!
[ Improve Your Python With Python Tricks: Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
August 22, 2025
Sebastian Pölsterl
scikit-survival 0.25.0 with improved documentation released
I am pleased to announce that scikit-survival 0.25.0 has been released.
This release adds support for scikit-learn 1.7, in addition to version 1.6. However, the most significant changes in this release affect the documentation. The API documentation has been completely overhauled to improve clarity and consistency. I hope this marks a significant improvement for users new to scikit-survival.
One of the biggest pain points for users seems to be understanding which metric can be used to evaluate the performance of a given estimator. The user guide now summarizes the different options.
Which Performance Metrics Exist?
The performance metrics for evaluating survival models can be broadly divided into three groups:
Concordance Index (C-index): Measures the rank correlation between predicted risk scores and observed event times. Two implementations are available in scikit-survival:
concordance_index_censored(): This implements Harrell’s estimator, which can be optimistic with high censoring.
concordance_index_ipcw(): An inverse probability of censoring weighted (IPCW) alternative that provides a less biased estimate, especially with high censoring. It is the preferred estimator of the C-Index.
Cumulative/Dynamic Area Under the ROC Curve (AUC): Extends the AUC to survival data, quantifying how well a model distinguishes subjects who experience an event by a given time from those who do not. It can handle time-dependent risk scores and is implemented in cumulative_dynamic_auc().
Brier Score: An extension of the mean squared error to right-censored data. The Brier score assesses both discrimination and calibration based on a model’s estimated survival functions. You can either compute the Brier score at specific time point(s) using brier_score() or compute an overall measure by integrating the Brier score over a range of time points via integrated_brier_score().
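To build intuition for what the C-index measures, here is a toy computation on fully observed (uncensored) data. This is an illustrative sketch only; it ignores censoring and tied risk scores, both of which concordance_index_censored() and concordance_index_ipcw() handle properly.

```python
import numpy as np

def toy_concordance(event_times, risk_scores):
    """Fraction of comparable pairs where the higher-risk sample fails earlier.

    A simplified, uncensored version of the C-index for illustration only.
    """
    n = len(event_times)
    concordant = comparable = 0
    for i in range(n):
        for j in range(i + 1, n):
            if event_times[i] == event_times[j]:
                continue  # skip tied event times in this toy version
            comparable += 1
            # identify which sample experienced the event first
            earlier, later = (i, j) if event_times[i] < event_times[j] else (j, i)
            if risk_scores[earlier] > risk_scores[later]:
                concordant += 1
    return concordant / comparable

times = np.array([2.0, 5.0, 9.0, 12.0])
risks = np.array([0.9, 0.6, 0.7, 0.1])
# 5 of the 6 comparable pairs are concordant (one pair is ranked wrongly)
print(toy_concordance(times, risks))
```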
What Do Survival Models Predict?
Survival models can predict several quantities, depending on the model being used.
First of all, every estimator has a predict() method, which either returns a unit-less risk score or the predicted time of an event.
If predictions are risk scores, higher values indicate an increased risk of experiencing an event. The scores have no unit and are only meaningful for ranking samples by their risk of experiencing an event. This is for example the case for CoxPHSurvivalAnalysis.
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_censored
from sksurv.preprocessing import OneHotEncoder
# Load data
X, y = load_veterans_lung_cancer()
Xt = OneHotEncoder().fit_transform(X)
# Fit model
estimator = CoxPHSurvivalAnalysis().fit(Xt, y)
# Predict risk score
predicted_risk = estimator.predict(Xt)
# Evaluate risk scores
cindex = concordance_index_censored(
    y["Status"], y["Survival_in_days"], predicted_risk
)
If predictions directly relate to the time point of an event, lower scores indicate shorter survival, while higher scores indicate longer survival. See for example IPCRidge.
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.linear_model import IPCRidge
from sksurv.metrics import concordance_index_censored
from sksurv.preprocessing import OneHotEncoder
# Load the data
X, y = load_veterans_lung_cancer()
Xt = OneHotEncoder().fit_transform(X)
# Fit the model
estimator = IPCRidge().fit(Xt, y)
# Predict time of an event
predicted_time = estimator.predict(Xt)
# Flip sign of predictions to obtain a risk score
cindex = concordance_index_censored(
    y["Status"], y["Survival_in_days"], -1 * predicted_time
)
Both types of predictions can also be evaluated with cumulative_dynamic_auc(), but not with the Brier score.
While the concordance index is easy to interpret, it is not a useful measure of performance if a specific time range is of primary interest (e.g. predicting death within 2 years). This is particularly relevant for survival models that can make time-dependent predictions.
For instance, RandomSurvivalForest can also predict survival functions (via predict_survival_function()) or cumulative hazard functions (via predict_cumulative_hazard_function()). These functions return lists of StepFunction instances. Each instance can be evaluated at a set of time points to obtain predicted survival probabilities (or cumulative hazards). The Brier score and cumulative_dynamic_auc() are capable of evaluating time-dependent predictions, but the C-index is not.
import numpy as np
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import integrated_brier_score
from sksurv.preprocessing import OneHotEncoder
# Load the data
X, y = load_veterans_lung_cancer()
Xt = OneHotEncoder().fit_transform(X)
# Fit the model
estimator = RandomSurvivalForest().fit(Xt, y)
# predict survival functions
surv_funcs = estimator.predict_survival_function(Xt)
# select time points to evaluate performance at
times = np.arange(7, 365)
# create predictions at selected time points
preds = np.asarray(
    [[sfn(t) for t in times] for sfn in surv_funcs]
)
# compute integral
score = integrated_brier_score(y, y, preds, times)
For more details on evaluating survival models, please have a look at the user guide and the API documentation.