Planet Python
Last update: March 25, 2025 04:42 PM UTC
March 25, 2025
Real Python
What Can You Do With Python?
You’ve finished a course or finally made it to the end of a book that teaches you the basics of programming with Python. You’ve learned about variables, lists, tuples, dictionaries, for and while loops, conditional statements, object-oriented concepts, and more. So, what’s next? What can you do with Python nowadays?
Python is a versatile programming language with many use cases in a variety of different fields. If you’ve grasped the basics of Python and are itching to build something with the language, then it’s time to figure out what your next step should be.
In this video course, you’ll see how you can use Python for:
- Doing general software development
- Diving into data science and math
- Speeding up and automating your workflow
- Building embedded systems and robots
Hugo van Kemenade
Free-threaded Python on GitHub Actions
GitHub Actions now supports experimental free-threaded CPython!
There are three ways to add it to your test matrix:
- actions/setup-python: t suffix
- actions/setup-uv: t suffix
- actions/setup-python: freethreaded variable
actions/setup-python: t suffix

Using actions/setup-python, you can add the t suffix for Python versions 3.13 and higher: 3.13t and 3.14t.

This is my preferred method: we can clearly see which versions are free-threaded, and it’s straightforward to test both regular and free-threaded builds.
on: [push, pull_request, workflow_dispatch]

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: [
          "3.13",
          "3.13t", # add this!
          "3.14",
          "3.14t", # add this!
        ]
        os: ["windows-latest", "macos-latest", "ubuntu-latest"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          allow-prereleases: true # needed for 3.14
      - run: |
          python --version --version
          python -c "import sys; print('sys._is_gil_enabled:', sys._is_gil_enabled())"
          python -c "import sysconfig; print('Py_GIL_DISABLED:', sysconfig.get_config_var('Py_GIL_DISABLED'))"
Regular builds will output something like:
Python 3.14.0a6 (main, Mar 17 2025, 02:44:29) [GCC 13.3.0]
sys._is_gil_enabled: True
Py_GIL_DISABLED: 0
And free-threaded builds will output something like:
Python 3.14.0a6 experimental free-threading build (main, Mar 17 2025, 02:44:30) [GCC 13.3.0]
sys._is_gil_enabled: False
Py_GIL_DISABLED: 1
For example: hugovk/test/actions/runs/14057185035
actions/setup-uv: t suffix

Similarly, you can install uv with astral-sh/setup-uv and use that to set up free-threaded Python using the t suffix.
on: [push, pull_request, workflow_dispatch]

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: [
          "3.13",
          "3.13t", # add this!
          "3.14",
          "3.14t", # add this!
        ]
        os: ["windows-latest", "macos-latest", "ubuntu-latest"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: astral-sh/setup-uv@v5 # change this!
        with:
          python-version: ${{ matrix.python-version }}
          enable-cache: false # only needed for this example with no dependencies
      - run: |
          python --version --version
          python -c "import sys; print('sys._is_gil_enabled:', sys._is_gil_enabled())"
          python -c "import sysconfig; print('Py_GIL_DISABLED:', sysconfig.get_config_var('Py_GIL_DISABLED'))"
For example: hugovk/test/actions/runs/13967959519
actions/setup-python: freethreaded variable

Back to actions/setup-python: you can also set the freethreaded variable for 3.13 and higher.
on: [push, pull_request, workflow_dispatch]

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.13", "3.14"]
        os: ["windows-latest", "macos-latest", "ubuntu-latest"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          allow-prereleases: true # needed for 3.14
          freethreaded: true # add this!
      - run: |
          python --version --version
          python -c "import sys; print('sys._is_gil_enabled:', sys._is_gil_enabled())"
          python -c "import sysconfig; print('Py_GIL_DISABLED:', sysconfig.get_config_var('Py_GIL_DISABLED'))"
For example: hugovk/test/actions/runs/39359291708
PYTHON_GIL=0

And you may want to set PYTHON_GIL=0 to force Python to keep the GIL disabled, even after importing a module that doesn’t support running without it. See Running Python with the GIL Disabled for more info.
With the t suffix:

- name: Set PYTHON_GIL
  if: endsWith(matrix.python-version, 't')
  run: |
    echo "PYTHON_GIL=0" >> "$GITHUB_ENV"
With the freethreaded variable:

- name: Set PYTHON_GIL
  if: "${{ matrix.freethreaded }}"
  run: |
    echo "PYTHON_GIL=0" >> "$GITHUB_ENV"
Please test!
For free-threaded Python to succeed and become the default, it’s essential that there is ecosystem and community support. Library maintainers: please test it, adapt your code where needed, and publish free-threaded wheels so others can test their code that depends on yours. Everyone else: please test your code too!
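If you publish wheels with cibuildwheel, recent releases can opt in to building free-threaded (cp313t/cp314t) wheels; as a sketch, and assuming a current cibuildwheel version (check its docs for the exact option name in your release), the pyproject.toml configuration looks something like:

[tool.cibuildwheel]
# Opt in to free-threaded CPython wheel builds.
enable = ["cpython-freethreading"]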
See also
- Help us test free-threaded Python without the GIL for other ways to test and how to check your build
- Python free-threading guide
- actions/setup-python#973
- actions/setup-python@v5.5.0
Header photo: “Spinning Room, Winding Bobbins with Woolen Yarn for Weaving, Philadelphia, PA” by Library Company of Philadelphia, with no known copyright restrictions.
Seth Michael Larson
Don't bring slop to a slop fight
Whenever I talk about generative AI slop being sent into every conceivable communication platform, I see a common suggestion on how to stop the slop from reaching human eyes:

"Just use AI to detect the AI"
We're already seeing companies offer this arrangement as a service. Just a few days ago Cloudflare announced they would use generative AI to create an infinite "labyrinth" for trapping AI crawlers in pages of content and links.
This suggestion is flawed because doing so props up the real problem: generative AI is heavily subsidized. In reality, generative AI is so expensive that we're talking about restarting nuclear and coal power plants and reopening copper mines, people. There is no universe in which these services should let users run queries without even a credit card on file.
Today this subsidization is mostly done by venture capital who want to see the technology integrated into as many verticals as possible. The same strategy was used for Uber and WeWork where venture capital allowed those companies to undercut competition to have wider adoption and put competitors out of business.
So using AI to detect and filter AI content just means that there'll be even more generative AI in use, not less. This isn't the signal we want to send to the venture capitalists who are deciding whether to offer these companies more investment money. We want that "monthly active user" (MAU) graph to be flattening or decreasing.
We got a sneak peek at the real price of generative AI from OpenAI where a future top-tier model (as of March 5th, 2025) is supposedly going to be $20,000 USD per month.
That sounds more like it. The sooner we get to unsubsidized generative AI pricing, the better off we'll all be, including the planet. So let's hold out for that future and think asymmetrically, not symmetrically, about methods to make generative AI slop not viable until we get there.
March 24, 2025
Real Python
Python Code Quality: Best Practices and Tools
Producing high-quality Python code involves using appropriate tools and consistently applying best practices. High-quality code is functional, readable, maintainable, efficient, and secure. It adheres to established standards and has excellent documentation.
You can achieve these qualities by following best practices such as descriptive naming, consistent coding style, modular design, and robust error handling. To help you with all this, you can use tools such as linters, formatters, and profilers.
By the end of this tutorial, you’ll understand that:
- Checking the quality of Python code involves using tools like linters and static type checkers to ensure adherence to coding standards and detect potential errors.
- Writing quality code in Python requires following best practices, such as clear naming conventions, modular design, and comprehensive testing.
- Good Python code is characterized by readability, maintainability, efficiency, and adherence to standards like PEP 8.
- Making Python code look good involves using formatters to ensure consistent styling and readability, aligning with established coding styles.
- Making Python code readable means using descriptive names for variables, functions, classes, modules, and packages.
Read on to learn more about the strategies, tools, and best practices that will help you write high-quality Python code.
Defining Code Quality
Of course you want quality code. Who wouldn’t? But what is code quality? It turns out that the term can mean different things to different people.
One way to approach code quality is to look at the two ends of the quality spectrum:
- Low-quality code: It has the minimal required characteristics to be functional.
- High-quality code: It has all the necessary characteristics that make it work reliably, efficiently, and effectively, while also being straightforward to maintain.
In the following sections, you’ll learn about these two quality classifications and their defining characteristics in more detail.
Low-Quality Code
Low-quality code typically has only the minimal required characteristics to be functional. It may not be elegant, efficient, or easy to maintain, but at the very least, it meets the following basic criteria:
- It does what it’s supposed to do. If the code doesn’t meet its requirements, then it isn’t quality code. You build software to perform a task. If it fails to do so, then it can’t be considered quality code.
- It doesn’t contain critical errors. If the code has issues and errors or causes you problems, then you probably wouldn’t call it quality code. If it’s too low-quality and becomes unusable, then it falls below even basic quality standards and you may stop using it altogether.
While simplistic, these two characteristics are generally accepted as the baseline of functional but low-quality code. Low-quality code may work, but it often lacks readability, maintainability, and efficiency, making it difficult to scale or improve.
High-Quality Code
Now, here’s an extended list of the key characteristics that define high-quality code:
- Functionality: Works as expected and fulfills its intended purpose.
- Readability: Is easy for humans to understand.
- Documentation: Clearly explains its purpose and usage.
- Standards Compliance: Adheres to conventions and guidelines, such as PEP 8.
- Reusability: Can be used in different contexts without modification.
- Maintainability: Allows for modifications and extensions without introducing bugs.
- Robustness: Handles errors and unexpected inputs effectively.
- Testability: Can be easily verified for correctness.
- Efficiency: Optimizes time and resource usage.
- Scalability: Handles increased data loads or complexity without degradation.
- Security: Protects against vulnerabilities and malicious inputs.
In short, high-quality code is functional, readable, maintainable, and robust. It follows best practices, including clear naming, consistent coding style, modular design, proper error handling, and adherence to coding standards. It’s also well-documented and easy to test and scale. Finally, high-quality code is efficient and secure, ensuring reliability and safe use.
All the characteristics above allow developers to understand, modify, and extend a Python codebase with minimal effort.
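To make the contrast concrete, here’s a small illustrative sketch (my example, not from the tutorial) of the same function written at both ends of the quality spectrum:

# Low quality: cryptic name, no documentation, crashes on empty input.
def f(x):
    return sum(x) / len(x)

# Higher quality: descriptive name, docstring, explicit error handling.
def calculate_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("values must be a non-empty sequence")
    return sum(values) / len(values)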
The Importance of Code Quality
To understand why code quality matters, you’ll revisit the characteristics of high-quality code from the previous section and examine their impact:
- Functional code: Ensures correct behavior and expected outcomes.
- Readable code: Makes understanding and maintaining code easier.
- Documented code: Clarifies the correct and recommended way for others to use it.
- Compliant code: Promotes consistency and allows collaboration.
- Reusable code: Saves time by allowing code reuse.
- Maintainable code: Supports updates, improvements, and extensions with ease.
- Robust code: Minimizes crashes and produces fewer edge-case issues.
- Testable code: Simplifies verification of correctness through code testing.
- Efficient code: Runs faster and conserves system resources.
- Scalable code: Supports growing projects and increasing data loads.
- Secure code: Provides safeguards against system loopholes and compromised inputs.
The quality of your code matters because it produces code that’s easier to understand, modify, and extend over time. It leads to faster debugging, smoother feature development, reduced costs, and better user satisfaction while ensuring security and scalability.
Read the full article at https://realpython.com/python-code-quality/ »
Quiz: Python Code Quality: Best Practices and Tools
In this quiz, you’ll test your understanding of Python Code Quality: Tools & Best Practices.
By working through this quiz, you’ll revisit the importance of producing high-quality Python code that’s functional, readable, maintainable, efficient, and secure. You’ll also review how to use tools such as linters, formatters, and profilers to help achieve these qualities.
Talk Python to Me
#498: Algorithms for high performance terminal apps
In this episode, we welcome back Will McGugan, the creator of the wildly popular Rich library and founder of Textualize. We'll dive into Will's latest article on "Algorithms for High Performance Terminal Apps" and explore how he's quietly revolutionizing what's possible in the terminal, from smooth animations and dynamic widgets to full-on TUI (or should we say GUI?) frameworks. Whether you're looking to supercharge your command-line tools or just curious how Python can push the limits of text-based UIs, you'll love hearing how Will's taking a modern, web-inspired approach to old-school terminals.

Episode sponsors

- Posit: https://talkpython.fm/ppm
- Python in Production: https://talkpython.fm/devopsbook
- Talk Python Courses: https://talkpython.fm/training

Links from the show

- Algorithms for high performance terminal apps post: https://textual.textualize.io/blog/2024/12/12/algorithms-for-high-performance-terminal-apps/
- Textual Demo: https://github.com/textualize/textual-demo
- Textual: https://www.textualize.io/
- Zero ver: https://0ver.org/
- memray: https://github.com/bloomberg/memray
- Posting app: https://posting.sh/
- Bulma CSS framework: https://bulma.io/
- JP Term: https://davidbrochart.github.io/jpterm/usage/CLI/
- Rich: https://github.com/Textualize/rich
- btop: https://github.com/aristocratos/btop
- starship: https://starship.rs/
- Watch this episode on YouTube: https://www.youtube.com/watch?v=S3oFhJKS264
- Episode transcripts: https://talkpython.fm/episodes/transcript/498/algorithms-for-high-performance-terminal-apps

Stay in touch with us

- Subscribe to Talk Python on YouTube: https://talkpython.fm/youtube
- Talk Python on Bluesky: https://bsky.app/profile/talkpython.fm
- Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
- Michael on Bluesky: https://bsky.app/profile/mkennedy.codes
- Michael on Mastodon: https://fosstodon.org/web/@mkennedy
Python Bytes
#425 If You Were a Klingon Programmer
Topics covered in this episode:

- Why aren't you using uv? (https://x.com/mitsuhiko/status/1899928805742899231)
- Python Developer Tooling Handbook (https://pydevtools.com/handbook/)
- Calling all doc writers: blacken-docs (https://github.com/adamchainz/blacken-docs)
- Reinventing notebooks as reusable Python programs (https://marimo.io/blog/python-not-json)
- Extras
- Joke

Watch on YouTube: https://www.youtube.com/watch?v=cps-wnsRte8

About the show

Brought to you by Posit Connect: pythonbytes.fm/connect.

Connect with the hosts

- Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list (pythonbytes.fm/friends-of-the-show); we'll never share it.

Michael #1: Why aren't you using uv?

- Fun conversation on X by Armin Ronacher.
- Interesting quotes from the thread:
  - "I get it replaces pip/pyenv, but should I also use it instead of the built in 'python -m venv .venv'?"
  - "But I need python installed to make python programs?"
  - "Because it places the venv in the project folder and I can't run executables from there due to corporate policy. Many such cases. No idea why astral doesn't address this with more urgency." (Sounds like a bad corporate policy :))
  - "i'm too lazy to switch from pyenv and pip"
  - "trust issues, what if they do a bait and switch …"
  - "Because everyone said that about poetry and I'm not sure I'm really ready to get hurt again."
  - "Masochism"
  - "Many times I tried a lot of similar tools and always come back to pip and pip-tools. Them are just work, why should I spend my time for something 'cool' that will bring more problems?"
  - "I tried this week but I was expecting a 'uv install requests' instead of 'uv add'. Switched back to pipenv."
  - "we partially use it. will transition when Dependabot support is available."
- I'll leave it with Jared Scheel: "Seeing a whole lotta Stockholm Syndrome in the replies to this question."

Brian #2: Python Developer Tooling Handbook

- By Tim Hopper
- "This is not a book about programming Python. Instead, the goal of this book is to help you understand the ecosystem of tools used to make Python development easier and more productive."
- Covers tools related to packaging, linting, formatting, and managing dependencies.

Michael #3: Calling all doc writers: blacken-docs

- Runs black on Python code blocks in documentation files.
- You can also install blacken-docs as a pre-commit (https://pre-commit.com) hook.
- It supports Markdown, reStructuredText, and LaTeX files.
- Additionally, you can run it on Python files to reformat Markdown and reStructuredText within docstrings.

Brian #4: Reinventing notebooks as reusable Python programs

- marimo allows you to store notebooks as plaintext Python files, with these properties:
  - Git-friendly: small code change => small diff
  - easy for both humans and computers to read
  - importable as a Python module, without executing notebook cells
  - executable as a Python script
  - editable with a text editor
- Also: testing with pytest.
- "Because marimo notebooks are just Python files, they are interoperable with other tools for Python, including pytest."
- "Testing cells. Any cell named as test_* is automatically discoverable and testable by pytest. The same goes for any cell that contains only test_ functions and Test classes."
- "Importantly, because cells are wrapped in functions, running pytest test_notebook.py doesn't execute the entire notebook, just its tests."

Extras

Brian:

- PyCon US announces a Refund Policy for International Attendees (https://pycon.blogspot.com/2025/03/refund-policy-for-international.html)
- New format now live for The Complete pytest Course Bundle (courses.pythontest.com) and its component courses:
  - Each course is now available separately:
    - pytest Primary Power: 13 lessons, 3.9 hours
    - Using pytest with Projects: 10 lessons, 3.4 hours
    - pytest Booster Rockets: 6 lessons, 1.3 hours of content
  - New format is easier to navigate.
  - Better for people who like different speeds. I'm usually a 1.25x-1.5x speed person.
  - Now also with Congratulations! lessons (with fireworks) and printable certificates.

Michael:

- PyCon Taiwan is currently calling for proposals (https://tw.pycon.org/2025/en-us/speaking/cfp)
- HN trends follow-up, via Shinjitsu:

I'm sure some other Hacker News reader has already given you the feedback, but in the unlikely case that they haven't: you read those headlines in this segment exactly wrong.

"Ask HN: Who is hiring?" is a monthly post that asks employers to post about jobs they have available.

"Ask HN: Who wants to be hired?" is a monthly topic where they ask people who are looking for jobs to post about themselves in the hope that their skillset is a good match (and not an LLM generated resume).

So unfortunately your rosy analysis might need a less rosy interpretation.

Joke: Top 12 things likely to be overheard if you had a Klingon Programmer (https://www.cs.cornell.edu/courses/cs100/1999su/handouts/klingons.htm), from Holgi on Mastodon.
PyBites
Case Study: Developing and Testing Python Packages with uv
Structuring Python projects properly, especially when developing packages, can often be confusing.
Many developers struggle with common questions:
- How exactly should project folders be organised?
- Should tests be inside or outside the package directory?
- Does the package itself belong at the root level or in a special src directory?
- And how do you properly import and test package functionality from scripts or external test files?
To help clarify these common challenges, I’ll show how I typically set up Python projects and organise package structures using the Python package and environment manager, uv.
The challenge
A typical and recurring problem in Python is how to import code that lives in a different place from where it is called. There are two natural ways to organise your code: modules and packages.
Things are fairly straightforward in the beginning, when you start to organise your code and put some functionality into different modules (aka Python files), but keep all the files in the same directory. This works because Python looks in several places when it resolves import statements, and one of those places is the current working directory, where all the modules are:
$ tree
.
├── main.py
└── utils.py

# main.py
from utils import helper

print(helper())

# utils.py
def helper():
    return "I am a helper function"

$ python main.py
I am a helper function
But things get a bit tricky once you have enough code and decide to organise it into folders. Let’s say you’ve moved your helper code into a src directory and you still want to import it into main.py, which is outside the src folder. Will this work? Well, that depends… on your Python version!
With Python 3.3 or higher, you will not see any error:

$ tree
.
├── main.py
└── src
    └── utils.py

# main.py
from src.utils import helper

print(helper())

# src/utils.py
def helper():
    return "I am a helper function"

$ python main.py
I am a helper function
But if we run the same example using a Python version prior to 3.3, we will encounter the infamous ModuleNotFoundError. This error only occurs prior to 3.3 because Python 3.3 introduced implicit namespace packages. As a result, Python can treat directories without an __init__.py as packages when used in import statements, under certain conditions. I won’t go into further detail here, but you can learn more about namespace packages in PEP 420.
Since namespace packages behave slightly differently from standard packages, we should explicitly include an __init__.py file.
However, does this solve all our problems? For this one case, maybe. But, once we move away from the simple assumption that all caller modules reside in the root directory, we encounter the next issue:
$ tree
.
├── scripts
│   └── main.py
└── src
    ├── __init__.py
    └── utils.py

# scripts/main.py
from src.utils import helper

print(helper())  # I am a helper function

# src/utils.py
def helper():
    return "I am a helper function"

$ uv run python scripts/main.py
Traceback (most recent call last):
  File "/Users/miay/uv_test/default/scripts/main.py", line 2, in <module>
    from src.utils import helper
ModuleNotFoundError: No module named 'src'
You can solve this problem with some path manipulation, but this is fragile and also frowned upon. I will spend the rest of this article giving you a recipe and some best practices for solving this problem with uv in such a way that you will hopefully never have to experience this problem again…
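For reference, the path manipulation in question usually looks something like this (a sketch of the fragile workaround, not a recommendation):

# scripts/main.py -- the discouraged sys.path hack
import sys
from pathlib import Path

# Make the project root importable before importing from src.
sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from src.utils import helper

print(helper())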
What is uv?

uv is a powerful and fast Python package and project manager designed as a successor to traditional tools like pip and virtualenv. I’ve found it particularly helpful in providing fast package resolution and seamless environment management, improving my overall development efficiency. It is the closest thing we have to a unified and community-accepted way of setting up and managing Python projects.

Astral is doing a great job of both respecting and serving the community, and I hope that we finally have a good chance of putting the old turf wars about the best Python tool behind us and concentrating on the things that really matter: writing good code!
Setting Up Your Python Project Using uv
Step 1: Installation
The entry point to uv is quite simple. First, you follow the installation instructions provided by the official documentation for your operating system. In my case, as I am on macOS and like to use Homebrew wherever possible, it comes down to a single command: brew install uv.

And the really great thing here about uv is: you don’t need to install Python first to install uv! You can install uv in an active Python environment using pip install uv, but you shouldn’t. The reason is simple: uv is written in Rust and is meant to be a self-contained tool. You do not want to be dependent on a specific Python version, and you want to use uv without the overhead or potential conflicts introduced by pip’s Python ecosystem.

It is much better to have uv available as a global command-line tool that lives outside your Python setup. And that is what you get by installing it via curl, wget, brew, or any other way except using pip.
Step 2: Creating a Project
Instead of creating folders manually, I use the uv init command to efficiently set up my project. This command already has a lot of helpful parameters, so let’s try them out to understand what the different options mean for the actual project structure.

The basic command

$ uv init

sets up your current folder as a Python project. What does this mean? Well, create an empty folder and try it out. After that you can always run the tree command (if that is available to you) to see a nice tree view of your project structure. In my case, I get the following output for an empty folder initialized with uv:

$ tree -a -L 1
.
├── .git
├── .gitignore
├── .python-version
├── README.md
├── hello.py
├── pyproject.toml
└── uv.lock
As you can see, uv init has already taken care of a number of things and provided us with a good starting point. Specifically, it did the following:

- Initialising a git repository,
- Recording the current Python version in a .python-version file, which will be used by other developers (or our future selves) to initialise the project with the same Python version as we have used,
- Creating a README.md file,
- Providing a starting point for development with the hello.py module,
- Providing a way to manage your project’s metadata with the pyproject.toml file, and finally,
- Resolving the exact versions of the dependencies in the uv.lock file (currently mostly empty, but it does contain some information about the required Python version).
The pyproject.toml file is special in a number of ways. One important thing to know is that uv recognises a uv-managed project by detecting and inspecting the pyproject.toml file. As always, check the documentation for more information.

Let’s understand the uv init command a little better. As I said, uv init initialises a Python project in the current working directory as the root folder for the project. You can specify a project folder with uv init <project-name>, which will give you a new folder in your current working directory named after your project and with the same contents as we discussed earlier.

However, as I want to develop a package, uv directly supports this use case with the --package option:

$ uv init --package my_package
Initialized project `my-package` at `/Users/miay/uv_test/my_package`
Looking again at the project structure (I will not include hidden files and folders from now on),

$ tree
.
├── README.md
├── pyproject.toml
└── src
    └── my_package
        └── __init__.py

there is an interesting change compared to the uv init command without the --package option: instead of a hello.py module in the project’s root directory, we have an src folder containing a Python package named after our project (because of the __init__.py file; I guess you remember that about packages, don’t you?).

uv does one more thing, so let’s have a look at the pyproject.toml file:
[project]
name = "my-package"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
authors = [
    { name = "Michael Aydinbas", email = "michael.aydinbas@gmail.com" }
]
requires-python = ">=3.13"
dependencies = []

[project.scripts]
my-package = "my_package:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
As well as the “usual” things you would expect to see here, there is also a [build-system] section, which is only present when using the --package option. It instructs uv to install the code under my_package into the virtual environment managed by uv. In other words, uv will install your package and all its dependencies into the virtual environment so that it can be used by all other scripts and modules, wherever they are.

Let’s summarise the different options; for our purposes, we will always use the --package option when working with packages.
Option    | Description
----------|------------
--app     | Create a project for an application. Seems to be the default if nothing else is mentioned.
--bare    | Only create a pyproject.toml and nothing else.
--lib     | Create a project for a library. A library is a project that is intended to be built and distributed as a Python package. Works similarly to --package but creates an additional py.typed marker file, used to support typing in a package (based on PEP 561).
--package | Set up the project to be built as a Python package. Defines a [build-system] for the project. This is the default behavior when using --lib or --build-backend. Includes a [project.scripts] entrypoint and uses a src/ project structure.
--script  | Add the dependency to the specified Python script, rather than to a project.
Step 3: Managing the Virtual Environment
This section is quite short.
You can create the virtual environment, with all defined dependencies and the source code developed under src, by running

$ uv venv
Using CPython 3.13.2 interpreter at: /opt/homebrew/Caskroom/miniforge/base/bin/python
Creating virtual environment at: .venv
Activate with: source .venv/bin/activate

You can also run uv sync, which will also create the virtual environment if there is none, and synchronise the environment with the latest dependencies from the pyproject.toml file on top.
Step 4: Testing Package Functionalities
Congratulations, you’ve basically reached the point where it doesn’t really matter from where you want to import the code you’re developing under my_package, because it’s installed in the virtual environment and thus known to all Python modules, no matter where they are in the file system. Your package is basically like any other third-party or system package.

To see this in action, let’s demonstrate the two main use cases: testing our package with pytest and importing our package into some scripts in the scripts folder. First, the project structure:

$ tree
.
├── scripts
│   └── main.py
├── src
│   └── my_package
│       ├── __init__.py
│       └── utils.py
└── tests
    └── test_utils.py

# scripts/main.py
from my_package.utils import helper

def main():
    print(helper())

if __name__ == "__main__":
    main()

# src/my_package/utils.py
def helper():
    return "helper"

# tests/test_utils.py
from my_package.utils import helper

def test_helper():
    assert helper() == "helper"
Note what happens when you create or update your venv:
$ uv sync
Resolved 1 package in 9ms
Built my-package @ file:///Users/miay/uv_test/my_package
Prepared 1 package in 514ms
Installed 1 package in 1ms
+ my-package==0.1.0 (from file:///Users/miay/uv_test/my_package)
See that last line? So uv installs the package my-package along with all other dependencies, and thus it can be used in any other Python script or module just like any other dependency or library.
As a first step, I am adding pytest as a dev dependency, meaning it will be added in a separate section of the pyproject.toml file, as it is meant for development and not needed to run the actual package. You will notice that uv add not only adds the dependency but also installs it at the same time. With pytest available, we can run the tests:
$ uv add --dev pytest
$ uv run pytest
============================= test session starts ==============================
platform darwin -- Python 3.13.2, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/miay/uv_test/my_package
configfile: pyproject.toml
collected 1 item
tests/test_utils.py . [100%]
============================== 1 passed in 0.01s ===============================
Running scripts/main.py:

$ uv run scripts/main.py
helper

Everything works just the same, effortlessly.
What happens if I update my package code? Do I have to run uv sync again? Actually, no. It turns out that uv sync installs the package in the same way as uv pip install -e . would.

Editable install means that changes to the source directory will immediately affect the installed package, without the need to reinstall. Just change your source code and try it out again:

# src/my_package/utils.py
def helper():
    return "helper function"
$ uv run scripts/main.py
helper function
$ uv run pytest
============================= test session starts ==============================
platform darwin -- Python 3.13.2, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/miay/uv_test/my_package
configfile: pyproject.toml
collected 1 item
tests/test_utils.py F [100%]
=================================== FAILURES ===================================
_________________________________ test_helper __________________________________
def test_helper():
> assert helper() == "helper"
E AssertionError: assert 'helper function' == 'helper'
E
E - helper
E + helper function
tests/test_utils.py:5: AssertionError
The main script successfully returns the new return value of the helper() function from the utils module in the my_package package, without the need to reinstall the package first. Likewise, the tests now fail.
As a final note: can you use uv with a monorepo setup, where uv manages several packages in the same repository? It seems possible, although I have not tried it out, and there are good resources on how to get started using the concept of workspaces.
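For the curious, a minimal root pyproject.toml for such a workspace might look like this (a sketch based on the uv workspace docs; the package name and layout are hypothetical):

[project]
name = "my-monorepo"
version = "0.1.0"
requires-python = ">=3.13"
dependencies = ["my-package"]

[tool.uv.workspace]
members = ["packages/*"]

[tool.uv.sources]
my-package = { workspace = true }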
I hope you have enjoyed following me on this little journey and, as always, I welcome your comments, discussions and questions.
Keep Calm and Code in Python!
Wingware
Wing Python IDE Version 10.0.9 - March 24, 2025
Wing 10.0.9 fixes usability issues with AI supported development, selects text and shows callouts when visiting Find Uses matches, adds User Interface > Fonts > Editor Line Spacing preference, avoids spurious syntax errors on type annotation comments, increases allowed length for Image ID in the Container dialog, and fixes other minor bugs and usability issues.
See the change log for details.
Download Wing 10 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products
What's New in Wing 10

AI Assisted Development
Wing Pro 10 takes advantage of recent advances in the capabilities of generative AI to provide powerful AI assisted development, including AI code suggestion, AI driven code refactoring, description-driven development, and AI chat. You can ask Wing to use AI to (1) implement missing code at the current input position, (2) refactor, enhance, or extend existing code by describing the changes that you want to make, (3) write new code from a description of its functionality and design, or (4) chat in order to work through understanding and making changes to code.
Examples of requests you can make include:
"Add a docstring to this method" "Create unit tests for class SearchEngine" "Add a phone number field to the Person class" "Clean up this code" "Convert this into a Python generator" "Create an RPC server that exposes all the public methods in class BuildingManager" "Change this method to wait asynchronously for data and return the result with a callback" "Rewrite this threaded code to instead run asynchronously"
Yes, really!
Your role changes to one of directing an intelligent assistant capable of completing a wide range of programming tasks in relatively short periods of time. Instead of typing out code by hand every step of the way, you are essentially directing someone else to work through the details of manageable steps in the software development process.
Support for Python 3.12, 3.13, and ARM64 Linux
Wing 10 adds support for Python 3.12 and 3.13, including (1) faster debugging with PEP 669 low impact monitoring API, (2) PEP 695 parameterized classes, functions and methods, (3) PEP 695 type statements, and (4) PEP 701 style f-strings.
Wing 10 also adds support for running Wing on ARM64 Linux systems.
Poetry Package Management
Wing Pro 10 adds support for Poetry package management in the New Project dialog and the Packages tool in the Tools menu. Poetry is an easy-to-use cross-platform dependency and package manager for Python, similar to pipenv.
Ruff Code Warnings & Reformatting
Wing Pro 10 adds support for Ruff as an external code checker in the Code Warnings tool, accessed from the Tools menu. Ruff can also be used as a code reformatter in the Source > Reformatting menu group. Ruff is an incredibly fast Python code checker that can replace or supplement flake8, pylint, pep8, and mypy.
Try Wing 10 Now!
Wing 10 is a ground-breaking new release in Wingware's Python IDE product line. Find out how Wing 10 can turbocharge your Python development by trying it today.
Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products
See Upgrading for details on upgrading from Wing 9 and earlier, and Migrating from Older Versions for a list of compatibility notes.
March 23, 2025
Go Deh
Incremental combinations without caching
Image: Irie server room
Someone had a problem where they received initial data d1 and worked on all r-combinations of the data initially received, but by the time they had finished that, they checked and found there was now extra data d2, and they needed, in total, to process the r-combinations of all of d1+d2.
They don't want to process combinations twice, and the solutions given seemed to generate and store the combinations of d1, then generate the combinations of d1+d2 but test and reject any combination that was found previously.
Seems like a straightforward answer that is easy to follow, but I thought: is there a way to create just the extra combinations, without storing all the combinations from before?
My methods
I'm using VS Code as my IDE. I'll be playing with a lot of code that will end up deleted, modified, and rerun. I could use a Jupyter notebook, but I can't publish those to my blog satisfactorily, so I'll develop a .py file with cells: a line comment of # %% visually splits the file into cells in the IDE, adding buttons and editor commands to execute cells and selected code, in any order, on a restartable kernel running in an interactive window that also runs IPython.
When doodling like this, I often create long lines, spaced to highlight comparisons between other lines. I refactor names to be concise at the time, as what you can take in at a glance helps find patterns. Because of that, this blog post is not written to be read on the small screens of phones.
So, it's combinations: binomials, nCr.
A start
- The first thing was to try and see patterns in the combinations of d1+d2 minus those of just d1.
- We're dealing with sets of values; sets are unordered, so I will at some point need a function to print them in order, to aid pattern finding.
- Combinations can be large, so use combinations of ints first, then later work with any type.
- Initial combinations of 0 to n-1 ints I later found more awkward to reason about, so I changed to work with combinations of 1 to n ints; extending to n+x adds the ints n+1 to n+x to combinations.
Diffs
In the following cell I create combinations for r = 3 and n in some range n_range in c, and print successive combinations and the differences between successive combinations.
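The cell's code and output aren't reproduced in this extract; a minimal sketch of what such a cell computes might look like this (my reconstruction; the names nCr, c and n_range are guesses from the prose):

from itertools import combinations

def nCr(n, r):
    "All r-combinations of the ints 1..n, as a set of tuples."
    return set(combinations(range(1, n + 1), r))

r, n_range = 3, range(3, 6)
c = {n: nCr(n, r) for n in n_range}
for n in n_range:
    print(f"{n}C{r}: {sorted(c[n])}")
    if n - 1 in c:
        print(f"  {n}C{r} - {n-1}C{r}: {sorted(c[n] - c[n - 1])}")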
Patterns
Looking at the diffs 4C3 - 3C3, each tuple looks like they took 3C2 = {(1,2), (1,3), (2,3)} and tagged the extra 4 onto every inner tuple.
Let's call this modification extending.
5C3 - 4C3 seems to follow the same pattern.
Function extend
Rename nCr to bino and check that extend works.
nCr was originally working with 0..n-1 and bino with 1..n; now they both do the latter.
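The cell itself isn't shown here; a sketch of extend and the check, building on the nCr sketch above (my reconstruction):

def extend(combs, item):
    "Tag item onto the end of every tuple in a set of combinations."
    return {c + (item,) for c in combs}

bino = nCr  # renamed; both now work on the ints 1..n

assert bino(4, 3) - bino(3, 3) == extend(bino(3, 2), 4)
assert bino(5, 3) - bino(4, 3) == extend(bino(4, 2), 5)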
Pascal
Show that comb(n+1, r) = comb(n, r) + (n+1)*comb(n, r-1), where + is set union and (n+1)* means extending each combination with the new item n+1.
Pascal's rule checker
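A checker for the set form of the rule might look like this (my sketch, using bino and extend from above):

def pascal_rule_holds(n, r):
    "comb(n+1, r) == comb(n, r) | extend(comb(n, r-1), n+1)"
    return bino(n + 1, r) == bino(n, r) | extend(bino(n, r - 1), n + 1)

assert all(pascal_rule_holds(n, r) for n in range(1, 8) for r in range(1, n + 1))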
Diff by 1 extra item "done"; attempting diff by 2.
Diff by 2 pattern finding
It was incremental: find a pattern in 3C?, subtract it from 5C3, find a pattern in 3C? that covers part of the remainder; repeat (where ? <= 3).
I also shortened the function name pf_set to pf so it would take less space when printing formatted expressions in f-strings.
Behold Diff by 2
Pascal's rules for increasing diffs
I followed the same method for diffs of three and ended up with these three functions:
Generalised Pascal's rule
What can I say; I looked for patterns in the Pascal's-rule functions for the discrete diffs, and looked deeper into identities between the extend functions.
I finally found the following function, which passed my tests (many not shown).
I don't like the if r-i > 0 else {()} bit, as it doesn't seem elegant. There is probably some identity to be found that would make it disappear but, you know.
Back to the original problem
If comb(d1, r) is processed and then we find an extra d2 items, then we want to process extra_comb(d1, r, d2), where extra_comb does not include or save the combinations of d1.
We just need to exclude the nCr term in the reduction of function pascal_rule_x.
Eventually I arrive at
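The function itself isn't reproduced in this extract. Reconstructing from the description above, including the if r-i > 0 else {()} detail and reusing nCr and combinations from the earlier sketches, something like this fits, though the author's actual code may differ:

def extra_comb(n, r, x):
    "r-combinations of 1..n+x that use at least one of the x new items."
    new_items = range(n + 1, n + x + 1)
    extra = set()
    for i in range(1, min(r, x) + 1):
        for new in combinations(new_items, i):
            old = nCr(n, r - i) if r - i > 0 else {()}
            extra |= {c + new for c in old}
    return extra

# Check against the direct (store-everything) definition:
assert extra_comb(3, 3, 2) == nCr(5, 3) - nCr(3, 3)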
Tests
END.
Armin Ronacher
Bridging the Efficiency Gap Between FromStr and String
Sometimes in Rust, you need to convert a string into a value of a specific type (for example, converting a string to an integer).
For this, the standard library provides the rather useful FromStr trait. In short, FromStr can convert from a &str into a value of any compatible type. If the conversion fails, an error value is returned. It's unfortunately not guaranteed that this value is an actual Error type, but overall, the trait is pretty useful.
It has, however, a drawback: it takes a &str and not a String, which makes it wasteful in situations where your input is a String. This means that you will end up with a useless clone if you do not actually need the conversion. Why would that matter? Well, consider this type of API:
let arg1: i64 = parser.next_value()?;
let arg2: String = parser.next_value()?;
In such cases, having a conversion that works directly with String values would be helpful. To solve this, we can introduce a new trait: FromString, which does the following:
- Converts from String to the target type.
- If converting from String to String, bypass the regular logic and make it a no-op.
- Implement this trait for all uses of FromStr that return an error that can be converted into Box<dyn Error> upon failure.
We start by defining a type alias for our error:
pub type Error = Box<dyn std::error::Error + Send + Sync + 'static>;
You can be more creative here if you want. The benefit of using this directly is that a lot of types can be converted into that error, even if they are not errors themselves. For instance, a FromStr implementation that returns a bare String as its error can leverage the standard library's blanket conversion implementation to Error.
Then we define the FromString trait:
pub trait FromString: Sized {
    fn from_string(s: String) -> Result<Self, Error>;
}
To implement it, we provide a blanket implementation for all types that implement FromStr, where the error can be converted into our boxed error. As mentioned before, this even works for FromStr implementations where Err = String. We also add a special case for when the input and output types are both String, using transmute_copy to avoid a clone:
use std::any::TypeId;
use std::mem::{ManuallyDrop, transmute_copy};
use std::str::FromStr;
impl<T> FromString for T
where
    T: FromStr<Err: Into<Error>> + 'static,
{
    fn from_string(s: String) -> Result<Self, Error> {
        if TypeId::of::<T>() == TypeId::of::<String>() {
            Ok(unsafe { transmute_copy(&ManuallyDrop::new(s)) })
        } else {
            T::from_str(&s).map_err(Into::into)
        }
    }
}
Why transmute_copy? We use it instead of the regular transmute because Rust requires both types to have a known size at compile time for transmute to work. Due to limitations, a generic T has an unknown size, which would cause a hypothetical transmute call to fail with a compile-time error. There is a nightly-only transmute_unchecked which does not have that issue, but sadly we cannot use it. Another, even nicer solution would be specialization, but sadly that is not stable either; it would avoid the use of unsafe, though.
We can also add a helper function to make calling this trait easier:
pub fn from_string<T, S>(s: S) -> Result<T, Error>
where
    T: FromString,
    S: Into<String>,
{
    FromString::from_string(s.into())
}
The Into might be a bit ridiculous here (isn't the whole point not to clone?), but it makes it easy to test this with static string literals.
Finally here is an example of how to use this:
let s: String = from_string("Hello World").unwrap();
let i: i64 = from_string("42").unwrap();
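And, for completeness, a failing conversion surfaces the boxed error (my addition, not from the original post):

// i64's parse error is converted into our boxed Error type.
let r: Result<i64, Error> = from_string("not a number");
assert!(r.is_err());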
Hopefully, this utility is useful in your own codebase when wanting to abstract over string conversions.
If you need it exactly as implemented, I also published it as a simple crate.
Postscriptum:
A big thank-you goes to David Tolnay and a few others who pointed out that this can be done with transmute_copy.
Another note: the TypeId::of call requires T to be 'static. This is okay for this use, but there are some hypothetical cases where this is not helpful. In that case there is the excellent typeid crate, which provides a ConstTypeId, which is like TypeId but constructible in const in stable Rust.
March 22, 2025
Eli Bendersky
Understanding Numpy's einsum
This is a brief explanation and a cookbook for using numpy.einsum, which lets us use Einstein notation to evaluate operations on multi-dimensional arrays. The focus here is mostly on einsum's explicit mode (with -> and output dimensions explicitly specified in the subscript string) and use cases common in ML papers, though I'll also briefly touch upon other patterns.
Basic use case - matrix multiplication
Let's start with a basic demonstration: matrix multiplication using einsum. Throughout this post, A and B will be these matrices:
>>> A = np.arange(6).reshape(2,3)
>>> A
array([[0, 1, 2],
[3, 4, 5]])
>>> B = np.arange(12).reshape(3,4)+1
>>> B
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
The shapes of A and B let us multiply A @ B to get a (2,4) matrix. This can also be done with einsum, as follows:
>>> np.einsum('ij,jk->ik', A, B)
array([[ 23, 26, 29, 32],
[ 68, 80, 92, 104]])
The first parameter to einsum is the subscript string, which describes the operation to perform on the following operands. Its format is a comma-separated list of inputs, followed by -> and an output. An arbitrary number of positional operands follows; they match the comma-separated inputs specified in the subscript. For each input, its shape is a sequence of dimension labels like i (any single letter).
In our example, ij refers to the matrix A - denoting its shape as (i,j), and jk refers to the matrix B - denoting its shape as (j,k). While in the subscript these dimension labels are symbolic, they become concrete when einsum is invoked with actual operands. This is because the shapes of the operands are known at that point.
The following is a simplified mental model of what einsum does (for a more complete description, read An instructional implementation of einsum):
- The output part of the subscript specifies the shape of the output array, expressed in terms of the input dimension labels.
- Whenever a dimension label is repeated in the input and absent in the output - it is contracted (summed). In our example, j is repeated (and doesn't appear in the output), so it's contracted: each output element [ik] is a dot product of the i'th row of the first input with the k'th column of the second input.
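A quick way to confirm this mental model, using the A and B defined above (my addition):

>>> np.array_equal(np.einsum('ij,jk->ik', A, B), A @ B)
True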
We can easily transpose the output, by flipping its shape:
>>> np.einsum('ij,jk->ki', A, B)
array([[ 23, 68],
[ 26, 80],
[ 29, 92],
[ 32, 104]])
This is equivalent to (A @ B).T.
When reading ML papers, I find that even for such simple cases as basic matrix multiplication, the einsum notation is often preferred to the plain @ (or its function form like np.dot and np.matmul). This is likely because the einsum approach is self-documenting, helping the writer reason through the dimensions more explicitly.
Batched matrix multiplication
Using einsum instead of @ for matmuls as a documentation prop starts making even more sense when the ndim [1] of the inputs grows. For example, we may want to perform matrix multiplication on a whole batch of inputs within a single operation. Suppose we have these arrays:
>>> Ab = np.arange(6*6).reshape(6,2,3)
>>> Bb = np.arange(6*12).reshape(6,3,4)
Here 6 is the batch dimension. We're multiplying a batch of six (2,3) matrices by a batch of six (3,4) matrices; each matrix in Ab is multiplied by a corresponding matrix in Bb. The result is shaped (6,2,4).
We can perform batched matmul by doing Ab @ Bb - in Numpy this just works: the contraction happens between the last dimension of the first array and the penultimate dimension of the second array. This is repeated for all the dimensions preceding the last two. The shape of the output is (6,2,4), as expected.
With the einsum notation, we can do the same, but in a way that's more self-documenting:
>>> np.einsum('bmd,bdn->bmn', Ab, Bb)
This is equivalent to Ab @ Bb, but the subscript string lets us name the dimensions with single letters and makes it easier to follow w.r.t. what's going on. For example, in this case b may stand for batch, m and n may stand for sequence lengths and d could be some sort of model dimension/depth.
Note: while b is repeated in the inputs of the subscript, it also appears in the output; therefore it's not contracted.
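Here too, the equivalence is easy to check (my addition):

>>> np.array_equal(np.einsum('bmd,bdn->bmn', Ab, Bb), Ab @ Bb)
True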
Ordering output dimensions
The order of output dimensions in the subscript of einsum allows us to do more than just matrix multiplications; we can also transpose arbitrary dimensions:
>>> Bb.shape
(6, 3, 4)
>>> np.einsum('ijk->kij', Bb).shape
(4, 6, 3)
This capability is commonly combined with matrix multiplication to specify exactly the order of dimensions in a multi-dimensional batched array multiplication. The following is an example taken directly from the Fast Transformer Decoding paper by Noam Shazeer.
In the section on batched multi-head attention, the paper defines the following arrays:
- M: a tensor with shape (b,m,d) (batch, sequence length, model depth)
- P_k: a tensor with shape (h,d,k) (number of heads, model depth, head size for keys)
Let's define some dimension size constants and random arrays:
>>> m = 4; d = 3; k = 6; h = 5; b = 10
>>> Pk = np.random.randn(h, d, k)
>>> M = np.random.randn(b, m, d)
The paper performs an einsum to calculate all the keys in one operation:
>>> np.einsum('bmd,hdk->bhmk', M, Pk).shape
(10, 5, 4, 6)
Note that this involves both contraction (of the d dimension) and ordering of the outputs so that batch comes before heads. Theoretically, we could reverse this order by doing:
>>> np.einsum('bmd,hdk->hbmk', M, Pk).shape
(5, 10, 4, 6)
And indeed, we could have the output in any order. Obviously, bhmk is the one that makes sense for the specific operation at hand. It's important to highlight the readability of the einsum approach as opposed to a simple M @ Pk, where the dimensions involved are much less clear [2].
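For the curious, the closest plain-@ equivalent needs hand-inserted broadcasting axes (a sketch, not from the paper): M[:, None] adds a size-1 heads axis and Pk[None] adds a size-1 batch axis, so matmul's broadcasting produces the same (b,h,m,k) result:
>>> np.allclose(np.einsum('bmd,hdk->bhmk', M, Pk), M[:, None] @ Pk[None])
True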
Contraction over multiple dimensions
More than one dimension can be contracted in a single einsum, as demonstrated by another example from the same paper:
>>> b = 10; n = 4; d = 3; v = 6; h = 5
>>> O = np.random.randn(b, h, n, v)
>>> Po = np.random.randn(h, d, v)
>>> np.einsum('bhnv,hdv->bnd', O, Po).shape
(10, 4, 3)
Both h and v appear in both inputs of the subscript but not in the output. Therefore, both these dimensions are contracted - each element of the output is a sum across both the h and v dimensions. This would be much more cumbersome to achieve without einsum!
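For comparison, the same double contraction can be spelled with np.tensordot by listing the contracted axis pairs explicitly (a sketch for illustration); note how much less readable the bare axis numbers are than the einsum labels:
>>> np.allclose(np.einsum('bhnv,hdv->bnd', O, Po), np.tensordot(O, Po, axes=([1, 3], [0, 2])))
True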
Transposing inputs
When specifying the inputs to einsum, we can transpose them by reordering the dimensions. Recall our matrix A with shape (2,3); we can't multiply A by itself - the shapes don't match, but we can multiply it by its own transpose as in A @ A.T. With einsum, we can do this as follows:
>>> np.einsum('ij,kj->ik', A, A)
array([[ 5, 14],
[14, 50]])
Note the order of dimensions in the second input of the subscript: kj instead of jk as before. Since j is still the label repeated in inputs but omitted in the output, it's the one being contracted.
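A quick check confirms the equivalence:
>>> np.allclose(np.einsum('ij,kj->ik', A, A), A @ A.T)
True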
More than two arguments
einsum supports an arbitrary number of inputs; suppose we want to chain-multiply our A and B with this array C:
>>> C = np.arange(20).reshape(4, 5)
We get:
>>> A @ B @ C
array([[ 900, 1010, 1120, 1230, 1340],
[2880, 3224, 3568, 3912, 4256]])
With einsum, we do it like this:
>>> np.einsum('ij,jk,kp->ip', A, B, C)
array([[ 900, 1010, 1120, 1230, 1340],
[2880, 3224, 3568, 3912, 4256]])
Here as well, I find the explicit dimension names a nice self-documentation feature.
An instructional implementation of einsum
The simplified mental model of how einsum works presented above is not entirely correct, though it's definitely sufficient to understand the most common use cases.
I read a lot of "how einsum works" documents online, and unfortunately they all suffer from similar issues; to put it generously, at the very least they're incomplete.
What I found is that implementing a basic version of einsum is easy; and that, moreover, this implementation serves as a much better explanation and mental model of how einsum works than other attempts [3]. So let's get to it.
We'll use the basic matrix multiplication as a guiding example: 'ij,jk->ik'.
This calculation has two inputs; so let's start by writing a function that takes two arguments [4]:
def calc(__a, __b):
The labels in the subscript specify the dimensions of these inputs, so let's define the dimension sizes explicitly (and also assert that sizes are compatible when a label is repeated in multiple inputs):
i_size = __a.shape[0]
j_size = __a.shape[1]
assert j_size == __b.shape[0]
k_size = __b.shape[1]
The output shape is (i,k), so we can create an empty output array:
out = np.zeros((i_size, k_size))
And generate a loop over its every element:
for i in range(i_size):
for k in range(k_size):
...
return out
Now, what goes into this loop? It's time to look at the inputs in the subscript. Since there's a contraction on the j label, this means summation over this dimension:
for i in range(i_size):
for k in range(k_size):
for j in range(j_size):
out[i, k] += __a[i, j] * __b[j, k]
return out
Note how we access out, __a and __b in the loop body; this is derived directly from the subscript 'ij,jk->ik'. In fact, this is how einsum emerged from Einstein notation - more on this later on.
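Putting these pieces together, here is a complete, runnable version for 'ij,jk->ik', checked against np.einsum (a sketch; A and B are re-created here with values that match the outputs shown earlier in this post):

import numpy as np

def calc(__a, __b):
    # Hand-written version of the code a generator might emit for 'ij,jk->ik'
    i_size = __a.shape[0]
    j_size = __a.shape[1]
    assert j_size == __b.shape[0]
    k_size = __b.shape[1]
    out = np.zeros((i_size, k_size))
    for i in range(i_size):
        for k in range(k_size):
            for j in range(j_size):  # contracted dimension: innermost sum
                out[i, k] += __a[i, j] * __b[j, k]
    return out

A = np.arange(6).reshape(2, 3)      # assumed to match the A used earlier
B = np.arange(1, 13).reshape(3, 4)  # assumed to match the B used earlier
assert np.allclose(calc(A, B), np.einsum('ij,jk->ik', A, B))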
As another example of how to reason about einsum using this approach, consider the subscript from Contraction over multiple dimensions:
'bhnv,hdv->bnd'
Straight away, we can write out the assignment to the output, following the subscript:
out[b, n, d] += __a[b, h, n, v] * __b[h, d, v]
All that's left is to figure out the loops. As discussed earlier, the outer loops are over the output dimensions, with two additional inner loops for the contracted dimensions in the input (v and h in this case). Therefore, the full implementation (omitting the assignments of *_size variables and dimension checks) is:
for b in range(b_size):
for n in range(n_size):
for d in range(d_size):
for v in range(v_size):
for h in range(h_size):
out[b, n, d] += __a[b, h, n, v] * __b[h, d, v]
What happens when the einsum subscript doesn't have any contracted dimension? In this case, there's no summation loop; the outer loops (assigning each element of the output array) are simply assigning the product of the appropriate input elements. Here's an example: 'i,j->ij'. As before, we start by setting up dimension sizes and the output array, and then a loop over each output element:
def calc(__a, __b):
i_size = __a.shape[0]
j_size = __b.shape[0]
out = np.zeros((i_size, j_size))
for i in range(i_size):
for j in range(j_size):
out[i, j] = __a[i] * __b[j]
return out
Since there's no dimension in the input that doesn't appear in the output, there's no summation. The result of this computation is the outer product between two 1D input arrays.
I placed a well-documented implementation of this translation on GitHub. The function translate_einsum takes an einsum subscript and emits the text for a Python function that implements it.
Einstein notation
This notation is named after Albert Einstein because he introduced it to physics in his seminal 1916 paper on general relativity. Einstein was dealing with cumbersome nested sums to express operations on tensors and used this notation for brevity.
In physics, tensors typically have both subscripts and superscripts (for covariant and contravariant components), and it's common to encounter systems of equations like this:
\[\begin{align*} B^1=a_{11}A^1+a_{12}A^2+a_{13}A^3=\sum_{j=1}^{3} a_{1j}A^j\\ B^2=a_{21}A^1+a_{22}A^2+a_{23}A^3=\sum_{j=1}^{3} a_{2j}A^j\\ B^3=a_{31}A^1+a_{32}A^2+a_{33}A^3=\sum_{j=1}^{3} a_{3j}A^j\\ \end{align*}\]
We can collapse this into a single sum, using a variable i:
\[B^{i}=\sum_{j=1}^{3} a_{ij}A^j\]
And observe that since j is duplicated inside the sum (once in a subscript and once in a superscript), we can write this as:
\[B^{i}=a_{ij}A^j\]
Where the sum is implied; this is the core of Einstein notation. An observant reader will notice that the original system of equations can easily be expressed as matrix-vector multiplication, but keep a couple of things in mind:
- Matrix notation only became popular in physics after Einstein's work on general relativity (it entered physics with Werner Heisenberg's matrix mechanics in 1925).
- Einstein notation extends to any number of dimensions. Matrix notation is useful for 2D, but much more difficult to visualize and work with in higher dimensions. In 2D, matrix notation is equivalent to Einstein's.
It should be easy to see the equivalence between this notation and the einsum subscripts discussed in this post. The implicit mode of einsum is even closer to Einstein notation conceptually.
Implicit mode einsum
In implicit mode einsum, the output specification (-> and the labels following it) doesn't exist. Instead, the output shape is inferred from the input labels. For example, here's 2D matrix multiplication:
>>> np.einsum('ij,jk', A, B)
array([[ 23, 26, 29, 32],
[ 68, 80, 92, 104]])
In implicit mode, the lexicographic order of labels in each input matters, as it determines the order of dimensions in the output. For example, if we want to compute (A @ B).T, we can do:
>>> np.einsum('ij,jh', A, B)
array([[ 23, 68],
[ 26, 80],
[ 29, 92],
[ 32, 104]])
Since h precedes i in lexicographic order, this is equivalent to the explicit subscript 'ij,jh->hi', whereas the original implicit matmul subscript is equivalent to 'ij,jk->ik'.
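A quick check of both claims:
>>> np.allclose(np.einsum('ij,jh', A, B), (A @ B).T)
True
>>> np.allclose(np.einsum('ij,jk', A, B), A @ B)
True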
Implicit mode isn't used much in ML code and papers, as far as I can tell. From my point of view, compared to explicit mode it loses a lot of readability and saves very little typing in return.
[1] | In the sense of numpy.ndim - the number of dimensions in the array. Alternatively this is sometimes called rank, but this is confusing because rank is already a name for something else in linear algebra. |
[2] | I personally believe that one of the biggest downsides of Numpy and all derived libraries (like JAX, PyTorch and TensorFlow) is that there's no way to annotate and check the shapes of operations. This makes some code much less readable than it could be. einsum mitigates this to some extent. |
[3] | First seen in this StackOverflow answer. |
[4] | The reason we use underscores here is to avoid collisions with potential dimension labels named a and b. Since we're doing code generation here, variable shadowing is a common issue; see hygienic macros for additional fun. |
Anarcat
Minor outage at Teksavvy business
This morning, internet was down at home. The last time I had such an issue was in February 2023, when my provider was Oricom. Now I'm with a business service at Teksavvy Internet (TSI), in which I pay $100 per month for a 250/50 Mbps business package, with a static IP address, on which I run, well, everything: email services, this website, etc.
Mitigation
The main problem when the service goes down like this for prolonged outages is email. Mail is pretty resilient to failures like this but after some delay (which varies according to the other end), mail starts to drop. I am actually not sure what the various settings are among different providers, but I would assume mail is typically kept for about 24h, so that's our mark.
Last time, I set up VMs at Linode and Digital Ocean to deal better with this. I have actually kept those VMs running as DNS servers until now, so that part is already done.
I had fantasized about Puppetizing the mail server configuration so that I could quickly spin up mail exchangers on those machines. But now I am realizing that my Puppet server is one of the services that's down, so this would not work, at least not unless the manifests can be applied without a Puppet server (say with puppet apply).
Thankfully, my colleague groente did amazing work to refactor our Postfix configuration in Puppet at Tor, and that gave me the motivation to reproduce the setup in the lab. So I have finally Puppetized part of my mail setup at home. That used to be hand-crafted experimental stuff documented in a couple of pages in this wiki, but is now being deployed by Puppet.
It's not complete yet: spam filtering (including DKIM checks and graylisting) is not implemented yet, but that's the next step, presumably to do during the next outage. The setup should be deployable with puppet apply, however, and I have refined that mechanism a little bit, with the run script.
Heck, it's not even deployed yet. But the hard part / grunt work is done.
Other
The outage was "short" enough (5 hours) that I didn't take time to deploy the other mitigations I had deployed in the previous incident.
But I'm starting to seriously consider deploying a web (and caching) reverse proxy so that I endure such problems more gracefully.
Side note on proper services
Typically, I tend to think of a properly functioning service as having four things:
- backups
- documentation
- monitoring
- automation
- high availability
Yes, I miscounted. This is why you have high availability.
Backups
Duh. If data is maliciously or accidentally destroyed, you need a copy somewhere. Preferably in a way that malicious joe can't get to.
This is harder than you think.
Documentation
I have an entire template for this. Essentially, it boils down to using https://diataxis.fr/ and this "audit" guide. For me, the most important parts are:
- disaster recovery (includes backups, probably)
- playbook
- install/upgrade procedures (see automation)
You probably know this is hard, and this is why you're not doing it. Do it anyways, you'll think it sucks, but you'll be really grateful for whatever scraps you wrote when you're in trouble.
Monitoring
If you don't have monitoring, you'll know it fails too late, and you won't know it recovers. Consider high availability, work hard to reduce noise, and don't have machines wake people up: that's literally torture and is against the Geneva convention.
Consider predictive algorithms to prevent failures, like "add storage within 2 weeks before this disk fills up".
This is harder than you think.
Automation
Make it easy to redeploy the service elsewhere.
Yes, I know you have backups. That is not enough: that typically restores data and while it can also include configuration, you're going to need to change things when you restore, which is what automation (or call it "configuration management" if you will) will do for you anyways.
This also means you can do unit tests on your configuration, otherwise you're building legacy.
This is probably as hard as you think.
High availability
Make it not fail when one part goes down.
Eliminate single points of failures.
This is easier than you think, except for storage and DNS (which, I guess, means it's harder than you think too).
Assessment
In the above 5 items, I check two:
- backups
- documentation
And barely: I'm not happy about the offsite backups, and my documentation is much better at work than at home (and even there, I have a 15-year backlog to catch up on).
I barely have monitoring: Prometheus is scraping parts of the infra, but I don't have any sort of alerting -- by which I don't mean "electrocute myself when something goes wrong", I mean "there's a set of thresholds and conditions that define an outage and I can look at it".
Automation is wildly incomplete. My home server is a random collection of old experiments and technologies, ranging from Apache with Perl and CGI scripts to Docker containers running Golang applications. Most of it is not Puppetized (but the ratio is growing). Puppet itself introduces a huge attack vector with kind of catastrophic lateral movement if the Puppet server gets compromised.
And, fundamentally, I am not sure I can provide high availability in the lab. I'm just this one guy running my home network, and I'm growing older. I'm thinking more about winding things down than building things now, and that's just really sad, because I feel we're losing (well that escalated quickly).
Resolution
In the end, I didn't need any mitigation and the problem fixed itself. I did do quite a bit of cleanup so that feels somewhat good, although I despaired quite a bit at the amount of technical debt I've accumulated in the lab.
Timeline
Times are in UTC-4.
- 6:52: IRC bouncer goes offline
- 9:20: called TSI support, waited on the line 15 minutes then was told I'd get a call back
- 9:54: outage apparently detected by TSI
- 11:00: no response, tried calling back support again
- 11:10: confirmed bonding router outage, no official ETA but "today", source of the 9:54 timestamp above
- 12:08: TPA monitoring notices service restored
- 12:34: call back from TSI; service restored, problem was with the "bonder" configuration on their end, which was "fighting between Montréal and Toronto"
Losing the war for the free internet
Warning: this is a long ramble I wrote after an outage of my home internet. You'll get your regular scheduled programming shortly.
I didn't realize this until relatively recently, but we're at war.
Fascists and capitalists are trying to take over the world, and it's bringing utter chaos.
We're more numerous than them, of course: this is only a handful of people screwing everyone else over, but they've accumulated so much wealth and media control that it's getting really, really hard to move around.
Everything is surveilled: people are carrying tracking and recording devices in their pockets at all times, or they drive around in surveillance machines. Payments are all turning digital. There's cameras everywhere, including in cars. Personal data leaks are so common people kind of assume their personal address, email address, and other personal information has already been leaked.
The internet itself is collapsing: most people are using the network only as a channel to reach a "small" set of "hyperscalers", mind-bogglingly large datacenters that don't really operate like the old internet. Once you reach the local endpoint, you're not on the internet anymore. Netflix, Google, Facebook (Instagram, Whatsapp, Messenger), Apple, Amazon, Microsoft (Outlook, Hotmail, etc), all those things are not really the internet anymore.
Those companies operate over the "internet" (as in the TCP/IP network), but they are not an "interconnected network" as much as their own, gigantic silos so much bigger than everything else that they essentially dictate how the network operates, regardless of standards. You access it over "the web" (as in "HTTP") but the fabric is not made of interconnected links that cross sites: all those sites are trying really hard to keep you captive on their platforms.
Besides, you think you're writing an email to the state department, for example, but you're really writing to Microsoft Outlook. That app your university or border agency tells you to install, the backend is not hosted by those institutions, it's on Amazon. Heck, even Netflix is on Amazon.
Meanwhile I've been operating my own mail server first under my bed (yes, really) and then in a cupboard or the basement for almost three decades now. And what for?
So I can tell people I can? Maybe!
I guess the reason I'm doing this is the same reason people are suddenly asking me about the (dead) mesh again. People are worried and scared that the world has been taken over, and they're right: we have gotten seriously screwed.
It's the same reason I keep doing radio, minimally know how to grow food, ride a bike, build a shed, paddle a canoe, archive and document things, talk with people, host an assembly. Because, when push comes to shove, there's no one else who's going to do it for you, at least not the way that benefits the people.
The Internet is one of humanity's greatest accomplishments. Obviously, oligarchs and fascists are trying to destroy it. I just didn't expect the tech bros to be flipping to that side so easily. I thought we were friends, but I guess we are, after all, enemies.
That said, that old internet is still around. It's getting harder to host your own stuff at home, but it's not impossible. Mail is tricky because of reputation, but it's also tricky in the cloud (don't get fooled!), so it's not that much easier (or cheaper) there.
So there's things you can do, if you're into tech.
Share your wifi with your neighbours.
Build a LAN. Throw a wire over to your neighbour too, it works better than wireless.
Use Tor. Run a relay, a snowflake, a webtunnel.
Host a web server. Build a site with a static site generator and throw it in the wind.
Download and share torrents, and why not a tracker.
Run an IRC server (or Matrix, if you want to federate and lose high availability).
At least use Signal, not Whatsapp or Messenger.
And yes, why not, run a mail server, join a mesh.
Don't write new software, there's plenty of that around already.
(Just kidding, you can write code, cypherpunk.)
You can do many of those things just by setting up a FreedomBox.
That is, after all, the internet: people doing their own thing for their own people.
Otherwise, it's just like sitting in front of the television and watching the ads. Opium of the people, like the good old time.
Let a billion droplets build the biggest multitude of clouds that will storm over this world and rip apart this fascist conspiracy.
Disobey. Revolt. Build.
We are more than them.
March 21, 2025
Daniel Roy Greenfeld
Using pyinstrument to profile FastHTML apps
FastHTML is built on Starlette, so we use Starlette's middleware tooling and then pass in the result. Just make sure you install pyinstrument.
WARNING: NOT FOR PRODUCTION ENVIRONMENTS. Including a profiler like this in a production environment is dangerous: because it exposes infrastructure details, it is highly risky to include anywhere end users can access it.
"""WARNING: NOT FOR PRODUCTION ENVIRONMENTS"""
from fasthtml.common import *
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.middleware import Middleware
try:
from pyinstrument import Profiler
except ImportError:
raise ImportError('Please install pyinstrument')
class ProfileMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Only profile when the request opts in with ?profile=1
        profiling = request.query_params.get("profile", False)
        if profiling:
            profiler = Profiler()
            profiler.start()
            # Run the request so the profiler captures it (the normal
            # response is discarded in favor of the profiling report)
            await call_next(request)
            profiler.stop()
            return HTMLResponse(profiler.output_html())
        return await call_next(request)

app, rt = fast_app(middleware=(Middleware(ProfileMiddleware),))  # note: must be a sequence
@rt("/")
def get():
return Titled("FastHTML", P("Hello, world!"))
serve()
To invoke, make any request to your application with the GET parameter profile=1 and it will return the HTML report from pyinstrument.
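For example, if the app is running locally on FastHTML's default port (5001 at the time of writing; adjust to match your setup):

curl 'http://localhost:5001/?profile=1'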
Real Python
Quiz: How to Strip Characters From a Python String
In this quiz, you’ll test your understanding of Python’s .strip() method. You’ll also revisit the related methods .lstrip() and .rstrip(), as well as .removeprefix() and .removesuffix(). These methods are useful for tasks like cleaning user input, standardizing filenames, and preparing data for storage.
The Real Python Podcast - Episode #244: A Decade of Automating the Boring Stuff With Python
What goes into updating one of the most popular books about working with Python? After a decade of changes in the Python landscape, what projects, libraries, and skills are relevant to an office worker? This week on the show, we speak with previous guest Al Sweigart about the third edition of "Automate the Boring Stuff With Python."
Talk Python to Me
#497: Outlier Detection with Python
Have you ever wondered why certain data points stand out so dramatically? They might hold the key to everything from fraud detection to groundbreaking discoveries. This week on Talk Python to Me, we dive into the world of outlier detection with Python with Brett Kennedy. You’ll learn how outliers can signal errors, highlight novel insights, or even reveal hidden patterns lurking in the data you thought you understood. We’ll explore fresh research developments, practical use cases, and how outlier detection compares to other core data science tasks like prediction and clustering. If you're ready to spot those game-changing anomalies in your own projects, stay tuned.
Links from the show:
- Data-morph: https://github.com/stefmolin/data-morph
- PyOD: https://github.com/yzhao062/pyod
- Prophet: https://github.com/paullo0106/prophet_anomaly_detection
- Episode transcripts: https://talkpython.fm/episodes/transcript/497/outlier-detection-with-python
Techiediaries - Django
10xDev Newsletter #1: Vibe Coding, Clone UIs with AI; Python for Mobile Dev; LynxJS - TikTok's New Framework; New Angular 19, React 19, Laravel 12 Features; AI Fakers in Recruitment; Local-First Apps…
Issue #1 of 10xDev Newsletter for modern dev technologies, productivity, and devneurship in the era of AI!
March 20, 2025
PyBites
Optimizing Python: Understanding Generator Mechanics, Expressions, and Efficiency
Python generators provide an elegant mechanism for handling iteration, particularly for large datasets where traditional approaches may be memory-intensive. Unlike standard functions that compute and return all values at once, generators produce values on demand through the yield statement, enabling efficient memory usage and creating new possibilities for data processing workflows.
Generator Function Mechanics
At their core, generator functions appear similar to regular functions but behave quite differently. The defining characteristic is the yield statement, which fundamentally alters the function’s execution model:
def simple_generator():
print("First yield")
yield 1
print("Second yield")
yield 2
print("Third yield")
yield 3
When you call this function, it doesn’t execute immediately. Instead, it returns a generator object:
gen = simple_generator()
print(gen)
# <generator object simple_generator at 0x000001715CA4B7C0>
This generator object controls the execution of the function, producing values one at a time when requested:
value = next(gen) # Prints "First yield" and returns 1
value = next(gen) # Prints "Second yield" and returns 2
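Calling next() a third time runs the final print and yield; after that, the generator body is finished, and any further call raises StopIteration:
value = next(gen) # Prints "Third yield" and returns 3
next(gen)         # Raises StopIteration - the generator is exhausted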
State Preservation and Execution Pausing
What makes generators special is their ability to pause execution and preserve state. When a generator reaches a yield statement:
- Execution pauses
- The yielded value is returned to the caller
- All local state (variables, execution position) is preserved
- When next() is called again, execution resumes from exactly where it left off
This mechanism creates an efficient way to work with sequences without keeping the entire sequence in memory at once.
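A small example (not from the snippets above) makes the preserved state visible: a running-total generator keeps its total variable alive between next() calls:

def running_total(values):
    total = 0  # local state, preserved across yields
    for v in values:
        total += v
        yield total

totals = running_total([1, 2, 3])
print(next(totals))  # 1
print(next(totals))  # 3 - 'total' kept its value between calls
print(next(totals))  # 6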
Execution Model and Stack Frame Suspension
Generators operate with independent stack frames, meaning their execution context remains intact between successive calls. Unlike standard functions, which discard their execution frames upon return, generators maintain their internal state until exhausted, allowing efficient handling of sequences without redundant recomputation.
When a normal function returns, its stack frame (containing local variables and execution context) is immediately destroyed. In contrast, a generator’s stack frame is suspended when it yields a value and resumed when next() is called again. This suspension and resumption is managed by the Python interpreter, maintaining the exact state of all variables and the instruction pointer.
This unique execution model is what enables generators to act as efficient iterators over sequences that would be impractical to compute all at once, such as infinite sequences or large data transformations.
Generator Control Flow and Multiple yield points
Generators can contain multiple yield statements and complex control flow:
def fibonacci_generator(limit):
a, b = 0, 1
while a < limit:
yield a
a, b = b, a + b
# Multiple yield points with conditional logic
def conditional_yield(data):
for item in data:
if item % 2 == 0:
yield f"Even: {item}"
else:
yield f"Odd: {item}"
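Consuming these with list() shows both control-flow patterns in action:

print(list(fibonacci_generator(30)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(list(conditional_yield([1, 2, 3])))
# ['Odd: 1', 'Even: 2', 'Odd: 3']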
This flexibility allows generators to implement sophisticated iteration patterns while maintaining their lazy evaluation benefits.
Memory Efficiency: The Key Advantage
The primary benefit of generators is their memory efficiency. Let’s compare standard functions and generators:
def get_all_numbers(n: int):
    """Normal function - allocates memory for the entire list at once"""
    result = []
    for i in range(n):
        result.append(i)
    return result

def yield_all_numbers(n: int):
    """Generator - produces one value at a time"""
    for i in range(n):
        yield i
To quantify the difference:
import sys
regular_list = get_all_numbers(1000000)
generator = yield_all_numbers(1000000)
print(f"List size: {sys.getsizeof(regular_list)} bytes")
print(f"Generator size: {sys.getsizeof(generator)} bytes")
# List size: 8448728 bytes
# Generator size: 208 bytes
This dramatic difference in memory usage makes generators invaluable when working with large datasets that would otherwise consume excessive memory.
Generator Expressions
Python offers a concise syntax for creating generators called generator expressions. These are similar to list comprehensions but use parentheses and produce values lazily:
# List comprehension - creates the entire list in memory
squares_list = [x * x for x in range(10)]
# Generator expression - creates values on demand
squares_gen = (x * x for x in range(10))
The performance difference becomes significant with large datasets:
import sys
import time
# Compare memory usage and creation time for large dataset
start = time.time()
list_comp = [x for x in range(100_000_000)]
list_time = time.time() - start
list_size = sys.getsizeof(list_comp)
start_gen = time.time()
gen_exp = (x for x in range(100_000_000))
gen_time = time.time() - start_gen
gen_size = sys.getsizeof(gen_exp)
print(f"List comprehension: {list_size:,} bytes, created in {list_time:.4f} seconds")
# List comprehension: 835,128,600 bytes, created in 4.9007 seconds
print(f"Generator expression: {gen_size:,} bytes, created in {gen_time:.4f} seconds")
# Generator expression: 200 bytes, created in 0.0000 seconds
Minimal Memory, Maximum Speed
The generator expression is so fast (effectively zero seconds) because the Python interpreter doesn’t actually compute or store any of those 100 million numbers yet. Instead, the generator expression simply creates an iterator object that remembers:
- How to produce the numbers: (x for x in range(100_000_000)).
- The current state (initially, the start point).
The size reported (200 bytes) is the memory footprint of the generator object itself, which includes a pointer to the generator’s code object and the internal state required to track iteration, but none of the actual values yet.
Chaining and Composing Generators
One of the elegant aspects of generators is how easily they can be composed. Python’s itertools module provides utilities that enhance this capability:
from itertools import chain, filterfalse
# Chain multiple generator expressions together
result = chain((x * x for x in range(10)), (y + 10 for y in range(5)))
# Filter values from a generator
odd_squares = filterfalse(lambda x: x % 2 == 0, (x * x for x in range(10)))
# Transform values from a generator
doubled_values = map(lambda x: x * 2, range(10))
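These stages compose naturally into lazy pipelines: each consumer pulls one value at a time from its producer, so no intermediate lists are materialized. A small sketch:

# Sum the even squares below 1000 without building any intermediate list
squares = (x * x for x in range(1_000))
even_squares = (s for s in squares if s % 2 == 0)
print(sum(even_squares))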
Final Thoughts: When to Use Generators
Python generators offer an elegant, memory-efficient approach to iteration. By yielding values one at a time as they’re needed, generators allow you to handle datasets that would otherwise overwhelm available memory. Their distinct execution model, combining state preservation with lazy evaluation, makes them exceptionally effective for various data processing scenarios.
Generators particularly shine in these use cases:
- Large Dataset Processing: Manage extensive datasets that would otherwise exceed memory constraints if loaded entirely.
- Streaming Data Handling: Effectively process data that continuously arrives in real-time.
- Composable Pipelines: Create data transformation pipelines that benefit from modular and readable design.
- Infinite Sequences: Generate sequences indefinitely, processing elements until a specific condition is met.
- File Processing: Handle files line-by-line without needing to load them fully into memory.
For smaller datasets (typically fewer than a few thousand items), the memory advantages of generators may not be significant, and standard lists could provide better readability and simplicity.
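As a concrete illustration of the file-processing case from the list above (a sketch; the filename app.log and the ERROR marker are made up for the example):

def read_lines(path):
    """Yield one line at a time without loading the whole file."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

# Lazily filter for error lines; only one line is in memory at a time
errors = (line for line in read_lines("app.log") if "ERROR" in line)
for line in errors:
    print(line)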
In an upcoming companion article, I’ll delve deeper into how these fundamental generator concepts support sophisticated techniques to tackle real-world challenges, such as managing continuous data streams.
Seth Michael Larson
I fear for the unauthenticated web
LLM and AI companies seem to all be in a race to breathe the last breath of air in every room they stumble into. This practice started with larger websites, ones that already had protection from malicious usage like denial-of-service and abuse in the form of services like Cloudflare or Fastly.
But the list of targets has been getting longer. At this point we're seeing LLM and AI scrapers targeting small project forges like the GNOME GitLab server.
How long until scrapers start hammering Mastodon servers? Individual websites? Are we going to have to require authentication or JavaScript challenges on every web page from here on out?
All this for what, shitty chat bots? What an awful thing that these companies are doing to the web.
I suggest everyone who uses cloud infrastructure for hosting set up a billing limit to avoid an unexpected bill in case they're caught in the cross-hairs of a negligent company. All the abusers anonymize their usage at this point, so good luck trying to get compensated for damages.
March 19, 2025
PyCon
Refund Policy for International Attendees
International travel to the United States has become more complex for many in our community. PyCon US welcomes all community members to Pittsburgh, and we are committed to running a safe and friendly event for everyone joining us for PyCon US.
Each nation has its own relationship with the United States, so please contact your country’s State Department, Travel Ministry, or Department of Foreign Affairs for travel information specific to traveling from your country to the US. Ultimately, each person must make their own decision based on their personal risk assessment and the travel conditions.
If it feels feasible and safe for you to attend PyCon US this year, then we’d love to see you! It is more important than ever to connect with our fellow community members. In light of current conditions, PyCon US would like to highlight the support we provide for international travelers.
Refund Policy Details
If your PyCon US trip is canceled because you could not obtain a visa or you were denied entry at the US border with a valid visa, or if you have COVID, influenza, measles, or another communicable disease, PyCon US will grant you a refund of your ticket and waive the cancellation fee.
Additionally, if you have a valid visa to travel to the United States and are denied entry upon arrival to the United States, please see the details below and note that you will need to provide documentation that you arrived in the United States and were denied entry.
- Airfare:
- Please request a refund of your airfare from your airline carrier or booking agent.
- PyCon US will reimburse the portion of your airfare that the airline will not refund.
- Hotel:
- If you used the PyCon US registration system to book a hotel room, the PyCon US registration team will personally work with the hotel to make sure you do not have to pay a cancellation fee.
Please note the above policy only applies to attendees not traveling to or from OFAC sanctioned countries.
In the event that you have a refund request or questions, please contact our registration team.
PyCon US hopes that the expanded refund and travel support policy offers attendees the ability to plan more confidently and will continue to make PyCon US 2025 an option for as many Pythonistas from around the world as possible.
Real Python
LangGraph: Build Stateful AI Agents in Python
LangGraph is a versatile Python library designed for stateful, cyclic, and multi-actor Large Language Model (LLM) applications. LangGraph builds upon its parent library, LangChain, and allows you to build sophisticated workflows that are capable of handling the complexities of real-world LLM applications.
By the end of this tutorial, you’ll understand that:
- You can use LangGraph to build LLM workflows by defining state graphs with nodes and edges.
- LangGraph expands LangChain’s capabilities by providing tools to build complex LLM workflows with state, conditional edges, and cycles.
- LLM agents in LangGraph autonomously process tasks using state graphs to make decisions and interact with tools or APIs.
- You can use LangGraph independently of LangChain, although they’re often used together to complement each other.
Explore the full tutorial to gain hands-on experience with LangGraph, including setting up workflows and building a LangGraph agent that can autonomously parse emails, send emails, and interact with API services.
While you’ll get a brief primer on LangChain in this tutorial, you’ll benefit from having prior knowledge of LangChain fundamentals. You’ll also want to ensure you have intermediate Python knowledge, specifically in object-oriented programming concepts like classes and methods.
Get Your Code: Click here to download the free sample code that you’ll use to build stateful AI agents with LangGraph in Python.
Take the Quiz: Test your knowledge with our interactive “LangGraph: Build Stateful AI Agents in Python” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
LangGraph: Build Stateful AI Agents in Python. Take this quiz to test your understanding of LangGraph, a Python library designed for stateful, cyclic, and multi-actor Large Language Model (LLM) applications. By working through this quiz, you'll revisit how to build LLM workflows and agents in LangGraph.
Install LangGraph
LangGraph is available on PyPI, and you can install it with pip. Open a terminal or command prompt, create a new virtual environment, and then run the following command:
(venv) $ python -m pip install langgraph
This command will install the latest version of LangGraph from PyPI onto your machine. To verify that the installation was successful, start a Python REPL and import LangGraph:
>>> import langgraph
If the import runs without error, then you’ve successfully installed LangGraph. You’ll also need a few more libraries for this tutorial:
(venv) $ python -m pip install langchain-openai "pydantic[email]"
You’ll use langchain-openai to interact with OpenAI LLMs, but keep in mind that you can use any LLM provider you like with LangGraph and LangChain. You’ll use pydantic to validate the information your agent parses from emails.
Before moving forward, if you choose to use OpenAI, make sure you’re signed up for an OpenAI account and that you have a valid API key. You’ll need to set the following environment variable before running any examples in this tutorial:
OPENAI_API_KEY=<YOUR-OPENAI-API-KEY>
Note that while LangGraph was made by the creators of LangChain, and the two libraries are highly compatible, it’s possible to use LangGraph without LangChain. However, it’s more common to use LangChain and LangGraph together, and you’ll see throughout this tutorial how they complement each other.
With that, you’ve installed all the dependencies you’ll need for this tutorial, and you’re ready to create your LangGraph email processor. Before diving in, you’ll take a brief detour to set up quick sanity tests for your app. Then, you’ll go through an overview of LangChain chains and explore LangGraph’s core concept: the state graph.
Create Test Cases
When developing AI applications, testing and performance tracking are crucial for understanding how your chain, graph, or agent performs in the real world. While performance tracking is out of scope for this tutorial, you’ll use several example emails to test your chains, graphs, and agent, and you’ll empirically inspect whether their outputs are correct.
To avoid redefining these examples each time, create the following Python file with example emails:
example_emails.py
EMAILS = [
# Email 0
"""
Date: October 15, 2024
From: Occupational Safety and Health Administration (OSHA)
To: Blue Ridge Construction, project 111232345 - Downtown Office
Complex Location: Dallas, TX
During a recent inspection of your construction site at 123 Main
Street,
the following safety violations were identified:
Lack of fall protection: Workers on scaffolding above 10 feet
were without required harnesses or other fall protection
equipment. Unsafe scaffolding setup: Several scaffolding
structures were noted as
lacking secure base plates and bracing, creating potential
collapse risks.
Inadequate personal protective equipment (PPE): Multiple
workers were
found without proper PPE, including hard hats and safety
glasses.
Required Corrective Actions:
Install guardrails and fall arrest systems on all scaffolding
over 10 feet. Conduct an inspection of all scaffolding
structures and reinforce unstable sections. Ensure all
workers on-site are provided
with necessary PPE and conduct safety training on proper
usage.
Deadline for Compliance: All violations must be rectified
by November 10, 2024. Failure to comply may result in fines
of up to
$25,000 per violation.
Contact: For questions or to confirm compliance, please reach
out to the
OSHA regional office at (555) 123-4567 or email
compliance.osha@osha.gov.
""",
# Email 1
"""
From: debby@stack.com
Hey Betsy,
Here's your invoice for $1000 for the cookies you ordered.
""",
# Email 2
"""
From: tdavid@companyxyz.com
Hi Paul,
We have an issue with the HVAC system your team installed in
apartment 1235. We'd like to request maintenance or a refund.
Thanks,
Terrance
""",
# Email 3
"""
Date: January 10, 2025
From: City of Los Angeles Building and Safety Department
To: West Coast Development, project 345678123 - Sunset Luxury
Condominiums
Location: Los Angeles, CA
Following an inspection of your site at 456 Sunset Boulevard, we have
identified the following building code violations:
Electrical Wiring: Exposed wiring was found in the underground parking
garage, posing a safety hazard. Fire Safety: Insufficient fire
extinguishers were available across multiple floors of the structure
under construction.
Structural Integrity: The temporary support beams in the eastern wing
do not meet the load-bearing standards specified in local building
codes.
Required Corrective Actions:
Replace or properly secure exposed wiring to meet electrical safety
standards. Install additional fire extinguishers in compliance with
fire code requirements. Reinforce or replace temporary support beams
to ensure structural stability. Deadline for Compliance: Violations
must be addressed no later than February 5,
2025. Failure to comply may result in
a stop-work order and additional fines.
Contact: For questions or to schedule a re-inspection, please contact
the Building and Safety Department at
(555) 456-7890 or email inspections@lacity.gov.
""",
]
You can read through these right now if you want, but you’ll get links back to these test emails throughout the tutorial.
Work With State Graphs
Read the full article at https://realpython.com/langgraph-python/ »
Quiz: Python's Bytearray
In this quiz, you’ll test your understanding of Python’s Bytearray: A Mutable Sequence of Bytes.
By working through this quiz, you’ll revisit the key concepts and uses of bytearray in Python.
Django Weblog
Django 5.2 release candidate 1 released
Django 5.2 release candidate 1 is the final opportunity for you to try out a composite of new features before Django 5.2 is released.
The release candidate stage marks the string freeze and the call for translators to submit translations. Provided no major bugs are discovered that can't be solved in the next two weeks, Django 5.2 will be released on or around April 2. Any delays will be communicated on the Django forum.
Please use this opportunity to help find and fix bugs (which should be reported to the issue tracker). You can grab a copy of the release candidate package from our downloads page or on PyPI.
The PGP key ID used for this release is Sarah Boyce: 3955B19851EA96EF