
Planet Python

Last update: June 21, 2025 04:42 AM UTC

June 20, 2025


The Python Coding Stack

I Want to Remove Duplicates from a Python List • How Do I Do It?

Another short article today to figure out ways to remove duplicate values from a list. The ideal solution depends on what you really need.

Let's explore.

Start With a List

Well, we need a list first–ideally, one with duplicate values. So, let's assume we have an online queue (line). But some people put their name in the queue more than once:

All code blocks are available in text format at the end of this article • #1 • The code images used in this article are created using Snappify. [Affiliate link]

Note how James and Kate were eager to ensure they were in the queue, so they put their name down twice.

Removing Duplicates: The Ugly Way

I was initially tempted not to include this section, but I changed my mind, as you can see. You can come up with several algorithms to perform this task "manually". It's only a few lines of code. Here's one option:

#2

You start with an empty list, queue_unique, ready to collect the unique names. Next, you iterate over the original list using enumerate() and append a name to queue_unique only if it doesn't appear in the rest of the original list. Note that the slice in the if statement, queue[index + 1:], covers the list from index + 1 to the end.

Let me show you another option. I'll discuss the outputs from these two manual versions later in this article:

#3

This time, you reverse the list so you can loop through the names in reverse order. And queue_unique doesn't start as an empty list this time but as a copy of the reversed list.

In the loop, you remove names from queue_unique if the name appears later in the reversed list. A reminder that the .remove() list method only removes the first occurrence of an item. It doesn't remove all of them.

Both algorithms remove duplicates. Great. But compare the output from the two versions. The difference between these output lists gives a clue to what's coming next.

But I won't dwell on these versions any longer.

PS: there are better manual algorithms for this task, but that's not the point of this first section, so let's move on!
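For completeness, here's one of those better manual versions: the idiomatic pattern uses a helper set to track names you've already seen, so it preserves the original order and avoids repeated slicing. (This is my sketch, not code from the article's appendix.)

```python
queue = ["James", "Kate", "Andy", "James", "Isabelle", "Kate"]

seen = set()  # names already added to queue_unique
queue_unique = []
for name in queue:
    if name not in seen:
        seen.add(name)
        queue_unique.append(name)

queue_unique
# ['James', 'Kate', 'Andy', 'Isabelle']
```

Each membership check against the set is fast, so this version makes a single pass through the list.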


Support The Python Coding Stack


Removing Duplicates: The Set Way

When you learn about data structures, you learn about the various characteristics they have. Then, you start comparing data structures based on these characteristics. For example, lists, dictionaries, tuples, and strings are all iterable. But lists and dictionaries are mutable, whereas tuples and strings are immutable. And lists, tuples, and strings are all sequences, but dictionaries are not–they're mappings. You can read more about some of these categories here: The Python Data Structure Categories Series

And some data structures enforce uniqueness while others don't. Lists, as you've seen above, can have several equal items–in the example above, you have several strings that are equal to each other.

However, sets are a Python data structure that can only have unique values:

#4

So, the easiest way to remove duplicates from a list is to cast it into a set:

#5

Or, if you prefer the output to still be a list, and perhaps you also want to overwrite the original variable name, then you can write the following:

#6

Now, that was easy! Much better than the several lines of code in the previous section.

However, there's an issue. If this is a queue of customers, then the order in which they joined the queue is somewhat important, I would say!

Note how the new queue list, the one without duplicates, no longer maintains the original order of the people within it. James was the first to join the queue, but Andy appears to have moved to the front when you removed duplicates.

Note that this also happened with the first of the manual algorithms in the previous section.

Sometimes, you don't care about the order of the elements in a list. If that's the case, you can cast the list into a set and then back into a list to remove duplicates.

But sometimes, the order matters. It certainly matters when dealing with a queue of customers. Let's look at another option.

Removing Duplicates: The Dictionary Way

A quick question before I carry on. Did you read the latest article I published just before this one? Here's the link: Are Python Dictionaries Ordered Data Structures?

If you haven't, now is a good time to read it. Like this one, it's a short article, so it won't take you too long.

You're back. Great.

So, you now know that since Python 3.7, there's a guarantee that the order of insertion of items in a dictionary is maintained. And dictionary keys must also be unique–you cannot have the same key appear twice in a dictionary.

Therefore, if you could create a dictionary from the elements in the list queue, you would remove duplicates but also maintain the order. And there's a dictionary class method for that:

#7

You create a dictionary from the list queue. The items in the list become keys, and each key has a default value of None. You can customise this default value, but you don't need to in this case, as you'll see in the next paragraph.

Great, you've removed the duplicates while maintaining the original order: the dictionary is built by iterating through the list, so its keys appear in the order of their first occurrence. But you don't want a dictionary, and you don't care about the values within it. So, you can cast this dictionary back into a list. You only keep the keys when you cast a dictionary into a list:

#8

You've now removed duplicates from the list and maintained the original order by converting the list into a dictionary and then back into a list.

Simple–once you know this idiom.
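As an aside, if you ever do need a value other than None, dict.fromkeys() accepts the default value as an optional second argument. A quick illustration (the value 0 is arbitrary):

```python
queue = ["James", "Kate", "Andy", "James", "Isabelle", "Kate"]

# Every key gets the same default value
dict.fromkeys(queue, 0)
# {'James': 0, 'Kate': 0, 'Andy': 0, 'Isabelle': 0}
```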


Do you want to join a forum to discuss Python further with other Pythonistas? Upgrade to a paid subscription here on The Python Coding Stack to get exclusive access to The Python Coding Place's members' forum. More Python. More discussions. More fun.

Subscribe now

And you'll also be supporting this publication. I put plenty of time and effort into crafting each article. Your support will help me keep this content coming regularly and, importantly, will help keep it free for everyone.


A Limitation

Both the set and dictionary routes have an important limitation. Items in a set must be hashable objects. And keys in a dictionary must also be hashable. Therefore, you can't use these techniques if you have a list that includes non-hashable objects, such as a list that contains other lists.
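If you do find yourself with non-hashable items, such as a list of lists, you can fall back on a manual loop. This sketch (mine, not from the article) preserves order and relies only on equality comparisons, at the cost of a slower membership check for each item:

```python
points = [[1, 2], [3, 4], [1, 2], [5, 6]]  # lists aren't hashable

unique_points = []
for point in points:
    if point not in unique_points:  # equality check, no hashing needed
        unique_points.append(point)

unique_points
# [[1, 2], [3, 4], [5, 6]]
```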

You can read more about hashability and hashable objects in this post: Where's William? How Quickly Can You Find Him? • What's a Python Hashable Object?

Final Words

You may need to remove duplicates from a list in Python.

Don't write your own algorithm. Life's too short for that.

If you don't care about the order of the items in the list, cast the list into a set and then back into a list: list(set(queue))

If you do care about the order, create a dictionary from the list using dict.fromkeys() and then cast it back into a list: list(dict.fromkeys(queue)).

And the set and dictionary routes to removing duplicates are also more efficient than the manual ones shown above. So, it’s a win-win.
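If you're curious, a quick and unscientific sanity check with timeit bears this out. The helper names below are mine, and the exact timings will vary from machine to machine:

```python
import timeit

queue = ["James", "Kate", "Andy", "James", "Isabelle", "Kate"] * 200

def manual(items):
    # O(n^2): slicing creates a new list on every iteration
    result = []
    for index, name in enumerate(items):
        if name not in items[index + 1:]:
            result.append(name)
    return result

def via_dict(items):
    # O(n): a single pass builds the dictionary
    return list(dict.fromkeys(items))

# Both remove the duplicates (the manual version keeps last
# occurrences, the dict version keeps first occurrences)
assert sorted(manual(queue)) == sorted(via_dict(queue))

print("manual:  ", timeit.timeit(lambda: manual(queue), number=5))
print("fromkeys:", timeit.timeit(lambda: via_dict(queue), number=5))
```

On longer lists, the gap only widens, since the manual version's slicing grows quadratically with the list length.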

That's it.

Photo by Lisett Kruusimäe: https://www.pexels.com/photo/flowers-in-line-on-white-background-9510861/

Code in this article uses Python 3.13


You can also support this publication by making a one-off contribution of any amount you wish.

Support The Python Coding Stack


For more Python resources, you can also visit Real Python—you may even stumble on one of my own articles or courses there!

Also, are you interested in technical writing? You’d like to make your own writing more narrative, more engaging, more memorable? Have a look at Breaking the Rules.

And you can find out more about me at stephengruppetta.com

Further reading related to this article’s topic:


Appendix: Code Blocks

Code Block #1
queue = ["James", "Kate", "Andy", "James", "Isabelle", "Kate"]
Code Block #2
queue_unique = []
for index, name in enumerate(queue):
    if name not in queue[index + 1:]:
        queue_unique.append(name)


queue_unique
# ['Andy', 'James', 'Isabelle', 'Kate']
Code Block #3
queue = ['James', 'Kate', 'Andy', 'James', 'Isabelle', 'Kate']
queue.reverse()
queue    
# ['Kate', 'Isabelle', 'James', 'Andy', 'Kate', 'James']

queue_unique = queue.copy()

for index, name in enumerate(queue):
    if name in queue[index + 1:]:
        queue_unique.remove(name)
       

queue_unique.reverse()
queue_unique
# ['James', 'Kate', 'Andy', 'Isabelle']
Code Block #4
set([1, 2, 3, 4, 3, 2, 1])
# {1, 2, 3, 4}
Code Block #5
queue = ["James", "Kate", "Andy", "James", "Isabelle", "Kate"]
set(queue)
# {'Andy', 'James', 'Kate', 'Isabelle'}
Code Block #6
queue = list(set(queue))
queue
# ['Andy', 'James', 'Kate', 'Isabelle']	
Code Block #7
queue = ["James", "Kate", "Andy", "James", "Isabelle", "Kate"]
dict.fromkeys(queue)
# {'James': None, 'Kate': None, 'Andy': None, 'Isabelle': None}
Code Block #8
queue = list(dict.fromkeys(queue))
queue
# ['James', 'Kate', 'Andy', 'Isabelle']


June 20, 2025 06:36 PM UTC


Ruslan Spivak

Book Notes: Full Frontal Calculus by Seth Braver — Chapter 1 Review

“Where there is life, there is change; where there is change, there is calculus.” — Seth Braver

I recently went back to studying math to rebuild my foundations for AI and machine learning. I didn’t expect to enjoy a calculus book this much. Shocking, I know. But that’s exactly what happened with Full Frontal Calculus.

Can calculus feel intuitive? Even fun? From the first few pages? For me, the answer was yes. (Okay, from page 8 to be exact.)


Why This Book Clicked for Me

As part of my self-study, I’m reviewing select chapters from the books I work through. This post covers Chapter 1 of Full Frontal Calculus by Seth Braver.

Before I stumbled on Full Frontal Calculus, I tried a few limit-based calculus books and textbooks, but none of them spoke to me. Luckily, there’s no shortage of calculus material these days, so it’s easy to shop around and try different sources.

Braver’s book grabbed me right away. The early focus on infinitesimals, the tight writing, and the emphasis on intuition won me over. I even caught myself smiling more than once. Rare for a math book.


Chapter 1 Highlights

Chapter 1 starts with infinitesimals: “an infinitely small number, smaller than any positive real number, yet greater than zero.” One early example shows how a circle, imagined as a polygon with infinitely many infinitesimal sides, leads to the familiar area formula πr². If your geometry or trig is rusty, don’t worry - it still makes sense. Braver then uses the same idea to show how curves appear straight on a small enough (infinitesimal) scale, which is the heart of differential calculus.

Things really clicked for me in the section titled A Gift From Leibniz: d-Notation. Braver’s explanation of dy/dx shows how it captures infinitesimal change in a way that just makes sense. It helped me understand why derivatives represent slopes and rates in a way I could explain to a 10-year-old. Working through the derivative of x² from first principles was also deeply satisfying.
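For a taste of that first-principles computation (my paraphrase of the standard infinitesimal argument, not a quote from the book): nudge x by an infinitesimal dx, expand, and discard the term that remains infinitesimal.

```latex
\begin{aligned}
dy &= (x + dx)^2 - x^2 = 2x\,dx + (dx)^2 \\
\frac{dy}{dx} &= 2x + dx \approx 2x
\end{aligned}
```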

Practically speaking, Chapter 1 covers:

The chapter ends with two powerful tools: the power rule and linearity properties. These let you compute derivatives of polynomials using just basic mental math.
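As a quick illustration (my example, not the book's), the power rule and linearity together dispatch a polynomial in one mental step:

```latex
\frac{d}{dx}\bigl(4x^3 - 7x + 2\bigr) = 4 \cdot 3x^2 - 7 \cdot 1 + 0 = 12x^2 - 7
```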

The writing is sharp and often funny, in a math kind of way. There’s even a cameo by the Sumerian beer goddess Ninkasi, who helps explain rate of change and derivatives using a vat of beer. It sounds quirky, but it works.

The book’s style, clarity, and focus on intuition made me want to keep going. That’s not something I’ve felt with many math books.


Final Thoughts and Tips

If you’re following along or just curious about studying calculus again, I recommend giving Chapter 1 a shot. It’s not always light reading, and the exercises are essential, but it might click for you like it did for me. Chapter 1 is available for free on the author’s site, so you can explore it before deciding whether to dive in.

If you do decide to dive into the book, here are a few tips to get the most out of it:

  1. If you’re rusty on pre-calculus (I was), make sure you’ve got slope, rate of change, the point-slope formula, and the slope-intercept form down cold before the Rates of Change section on page 10. For that, Seth Braver’s other book Precalculus Made Difficult has excellent material on those topics. You can probably get through it in a day.
  2. Read slowly, with a pen or pencil in hand. Write in the margins (get a paperback copy). It might feel painfully slow at times (pun intended), but it’s a recipe for deeper understanding.
  3. The book includes answers to many exercises and is great for self-study. But the solutions are compact, so I recommend using Grok or ChatGPT to expand on them and deepen your understanding.
  4. Once you’ve finished the chapter and exercises, check out the author’s YouTube videos that go along with the book. They’re criminally underrated and oddly hard to find. You might enjoy them as much as I do.
  5. For topics that are hard to retain, try spaced repetition with active recall. Anki works great for that, or use whatever tool you prefer.


Chapter 1 sealed the deal. This is the calculus book I’m sticking with. Looking forward to seeing how Braver develops the ideas from here.

More to come. Stay tuned.

Originally published in my newsletter Beyond Basics. If you’d like to get future posts like this by email, you can subscribe here.

P.S. I’m not affiliated with the author. I just really enjoy the book and wanted to share it.

June 20, 2025 02:07 PM UTC


Real Python

The Real Python Podcast – Episode #254: Scaling Python Web Applications With Kubernetes and Karpenter

What goes into scaling a web application today? What are resources for learning and practicing DevOps skills? This week on the show, Calvin Hendryx-Parker is back to discuss the tools and infrastructure for autoscaling web applications with Kubernetes and Karpenter.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

June 20, 2025 12:00 PM UTC

June 19, 2025


EuroPython

June Newsletter: Last Chance for Tickets!

Hello, Pythonistas! 🐍

We added a lot of new subscribers since the last newsletter – if this is your first newsletter – Welcome! 🎉

TL;DR:

⏰ Tickets

We’re excited to share that tutorial and combined tickets are now sold out.

Conference tickets are still available – but don’t wait too long. Late Bird pricing kicks in on June 27, and prices will go up! If you can’t attend in person, please check our Remote tickets – those are already available in the tickets store.

More details at https://europython.eu/tickets 

Platinum, Gold and Silver Sponsorship packages are now fully booked. If you’re interested in sponsoring, please contact us at sponsoring@europython.eu. We’d love to explore options with you! We’ve also added a new startup tier – contact us for more details 🙂

🎥 Documentary on the History of Python

The filmmakers from Cult Repo, formerly known as Honeypot, are working on a documentary about the history of Python and its vibrant community. It features over 20 core developers and takes us on a journey from the first days of Python to the latest developments. 

At EuroPython, we’re excited to share a special preview of the film, followed by a Q&A with Brett Cannon, Paul Everitt, and Armin Ronacher. 

🖤 Memorial session for Michael Foord

As part of EuroPython, we will be holding a memorial session to commemorate Michael Foord. 

Michael Foord (1974-2025) was a central figure in the Python community. He was an original thinker whose warmth, passion, and unfiltered humor touched the lives of many. A core Python developer and the creator of the influential unittest.mock module, he played a pivotal role in shaping testing practices and helped establish the Language Summit at PyCon. More than a brilliant engineer, Michael was a beloved mentor and friend, known for his elaborate vaping setup, colorful attire, and heartfelt conversations. His passing earlier this year left a profound void in the community, but his legacy lives on through his contributions to Python, his generous spirit, and the countless moments of camaraderie he inspired.

Friends of Michael are invited to attend this session and share their memories. We will provide more details about it closer to the event.

© All rights reserved by Nicholas Tollervey

© All rights reserved by Kushal Das

❣️Beginners’ Day

On Saturday 19th July, we’ll be hosting a Beginners’ Day to help introduce people to Python programming and its applications. Beginners’ Day will feature three tracks running in parallel: The Unconference, Django Girls, and Humble Data. The events are designed to welcome newcomers to the Python ecosystem, including a series of talks and panels by junior developers and two workshops designed to introduce complete beginners to web development and data science.

We are running the following three tracks:

Beginners’ Day is open to everyone, and you don’t need a EuroPython ticket to attend (although note that without one, some tracks cost €5 to attend). From students to those exploring a career change, we warmly invite anyone curious about starting their programming journey. Expect a friendly, fun, and supportive environment that will leave you feeling more confident and inspired to continue learning.

Please see this page for more details and to apply. Places are limited and will be given on a first come, first served basis.

👩‍💻 Contribute, collaborate, code: sprints weekend 

Join us for EuroPython’s traditional Sprint Weekend on Saturday and Sunday (19–20 July) following the main conference. The conference team provides space, lunch, and coffee—you bring the projects, energy, and ideas. Whether you’re a seasoned maintainer or trying your first contribution, sprints are informal hackathons to collaborate on open‑source, share knowledge, and solve problems together.

More info: europython.eu/sprints 


🏖️ Pack your instruments, sportswear, or board games

We’ll host a laid‑back social evening on Thursday, 17 July at 19:30 CEST on Střelecký Island—right in the heart of Prague. Expect riverside seating, live music and jam sessions (feel free to bring an instrument), plus board games and plenty of relaxation spots. There’s also a mix of outdoor sports (volleyball, croquet, pétanque) and light snacks and drinks for a summery, informal vibe.

A limited number of social-event tickets will be available separately—keep an eye out so you don’t miss out. 

More info: europython.eu/social-event 


👥 Community Organisers & PyLadies Events

The Python community is an essential part of the language, and for many people, it’s the reason they stick around and keep meetups, conferences, forums, and so much more running to help others.

We have several activities focused on communities across Europe and around the world, as well as initiatives centered around Python itself.

https://europython.eu/community-activities/

We’re excited to announce a range of events for underrepresented groups in computing this year! 🎉 Whether you’re new to PyLadies or a long-time supporter, we warmly welcome you to join us and be part of our supportive community.

These events are open only to those who have a conference ticket, giving our participants an opportunity to connect, share, and grow together.

https://europython.eu/pyladies/

🍬 Snacks exchange

Have you ever wondered what people snack on in Spain? Or wanted to try chocolates from Australia? Then participate in the EuroPython snack exchange! 

Simply bring snacks typical of your home country, country of residence, or just a country you think has really delicious food with you to EuroPython. At the conference you’ll be able to swap what you brought with other participants in the exchange. Don’t miss your chance to discover your new favourite snack, and share in the fun with our attendees from across Europe and the globe!

🎤 Speaker guidelines

We’ve uploaded a number of suggestions to help you to prepare your session. The guidelines include information about:

👉 Check them out here: https://ep2025.europython.eu/guidelines/

🎤 Speaker Mentorship Programme

First Time Speakers’ Workshop

We had such a fun, interactive session—thank you to everyone who showed up. A huge thank you to Cristián Maureira-Fredes from the Programme team for walking us through the details of giving a talk at EuroPython. We also loved hearing from Iryna Kondrashchenko, who shared how much last year’s Speaker Mentorship Programme helped her speaking journey.

A huge shoutout to our inspiring panel—Abigail Mesrenyame Dogbe, Laís Carvalho, and Rodrigo Girão Serrão. Thank you for sharing your personal experiences as speakers, answering the questions, and offering honest and encouraging advice.

🎥 Missed it? You can watch the recording here: https://youtu.be/a2ZajKY6bm0

❓What does Guido van Rossum like about EuroPython?

And what about speakers, core developers, and other community members? Find out by following us on YouTube and social media! We’re sharing short clips where community members talk about what they’re most excited for at the next EuroPython.


💰Sponsors Highlight

We would like to thank our sponsors for supporting the conference. Their generous contributions help us keep the event more accessible and ticket prices lower. Sponsors play a vital role in making this community gathering possible.

Special thanks go to our platinum sponsors:


💞 Upcoming Events in the Python Community

👋 See You All Next Month 

Enjoyed this update? Help us spread the word! Like, share, and subscribe — and don’t forget to tell your friends about us.

Someone shared this with you? Join the list at blog.europython.eu to get these directly every month.

Think others in your Python circle would be interested? Forward the email and share it with them. 🙂

Stay connected with us on social media:

June 19, 2025 08:55 PM UTC


PyCharm

Training Your ML Models With Cadence

In the rapidly evolving domains of machine learning (ML) and artificial intelligence (AI), the tools and technologies used by developers can significantly influence the speed, efficiency, and effectiveness of their projects. Recognizing this, we introduced Cadence in PyCharm 2025.1, a plugin that merges the ease of local development with advanced cloud computing capabilities.

Why Cadence?

Cadence makes it possible to run your code on powerful cloud hardware directly from PyCharm. This integration alleviates the typical complexities and extensive setup usually associated with cloud computing. Whether you’re a solo developer experimenting with new models or part of a larger team pushing the boundaries of ML applications, Cadence ensures that your transition to powerful cloud resources is seamless and straightforward.

Serverless computing on demand

Reduce overhead with Cadence’s serverless computing options, allowing you to access and manage GPUs with transparent and predictable per-second billing. This removes the need for significant upfront investments in hardware, making advanced computing power accessible at any scale.

Run your code as is

With Cadence, your existing PyCharm projects require no modifications to fit into the cloud environment. Upload and execute your code as usual; Cadence handles all of the adjustments on the back end, ensuring your cloud session feels like an extension of your local setup.

Tailored for PyCharm users

Debug and deploy using the PyCharm interface you’re familiar with. Set breakpoints, monitor outputs, and interact with your remote environment with no additional learning curve.

Data management simplified

Say goodbye to manual data transfers. Cadence automatically synchronizes your projects’ data to the cloud, allowing you to download the results of each experiment directly in the IDE.

Reliable experimentation

Review, refine, and rerun your past experiments. Cadence provides consistent replication of results, facilitating continuous improvements.

Optimized resource allocation

Choose from a wide array of cloud settings, including configurations like 8xA100 and 8xH100, to scale your resources according to project demands. Schedule as many tasks as you need simultaneously, and Cadence will automatically check for available hosts in different regions and zones.

Ready for teams

Adopting Cadence isn’t just about improving individual productivity; it’s about enhancing team dynamics and output. Share setup configurations, results, and insights effortlessly within your team. 

Getting started with Cadence

You can try Cadence for free with a USD 30 welcome credit by installing the plugin from JetBrains Marketplace or by enabling it directly in PyCharm via Settings | Plugins | Marketplace

To see how easy it is to start training your ML models in PyCharm, check out this tutorial video.

June 19, 2025 12:17 PM UTC

June 18, 2025


Talk Python Blog

New Theme Song: Served In A Flask

Those of you who were early listeners of Talk Python To Me might remember the amazing theme song we launched with: Developers, Developers, Developers by Smixx. Thanks to Smixx for letting us use his music for our intros.

Over the years, people have asked, “What happened to the rap song?” I took it down for a couple of reasons not worth digging into but have definitely missed the fun and irreverent intro to the show.

June 18, 2025 06:55 PM UTC


Real Python

Python Project: Build a Word Count Command-Line App

The word count command (wc) is a classic utility that you might use to determine the number of lines, words, and bytes in files or standard input. It’s a staple tool for anyone working with text files on Unix-like systems. But have you ever wondered how such a tool is designed and implemented?

In this practice exercise, you’ll dive into the inner workings of the Unix wc command by building its simplified version from scratch using Python. Not only will this coding challenge solidify your understanding of file handling and text processing, but it’ll also give you a taste of how to structure command-line utilities in Python.

By the end of this challenge, you’ll have a functional version of the wc command that can faithfully reproduce the outputs you’re accustomed to seeing in a Unix terminal. However, it won’t be an exact replica of the wc command, as you’ll omit or adapt some features for simplicity.

In this coding challenge, you’ll:

While working on this challenge, you’ll gain hands-on experience with several modules from Python’s standard library, such as pathlib for manipulating the file system and argparse for parsing command-line arguments. Familiarity with basic Python programming and file handling will be beneficial, but detailed instructions and helpful tips will guide you through each step of the process.
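To give a flavour of the structure involved, here's a hypothetical stripped-down sketch (not the challenge's official solution) combining argparse and pathlib:

```python
import argparse
from pathlib import Path

def count(data: bytes) -> tuple[int, int, int]:
    """Return (line count, word count, byte count) for raw file data."""
    text = data.decode("utf-8", errors="replace")
    return text.count("\n"), len(text.split()), len(data)

def main() -> None:
    parser = argparse.ArgumentParser(description="A simplified wc clone")
    # nargs="*" means running with no arguments is harmless; the real wc
    # would fall back to reading standard input in that case
    parser.add_argument("files", nargs="*", type=Path, help="files to count")
    args = parser.parse_args()
    for path in args.files:
        lines, words, nbytes = count(path.read_bytes())
        print(f"{lines:8}{words:8}{nbytes:8} {path}")

if __name__ == "__main__":
    main()
```

Run as `python wc.py somefile.txt`, it prints the three counts followed by the file name, in the spirit of the Unix original.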

The challenge is broken down into a number of tasks, each accompanied by clear instructions and illustrative examples. You’ll receive automated feedback on your solutions when you follow along using the accompanying materials. If you run into any issues or have questions, then don’t hesitate to ask for help in the comments section below the corresponding lesson.

Note: You can also reach out to the Real Python community on Slack or join live conversations during Office Hours, where you’ll have an opportunity to share your screen remotely.

Completing each task unlocks the next one. Once you’ve completed a task, you can compare your code with the sample solution provided in the following lesson. Remember that there’s often more than one way to solve a problem. If your solution differs slightly but meets the acceptance criteria and adheres to good programming practices, then that’s perfectly fine.

Good luck!



June 18, 2025 02:00 PM UTC


Talk Python to Me

#510: 10 Polars Tools and Techniques To Level Up Your Data Science

Are you using Polars for your data science work? Maybe you've been sticking with the tried-and-true Pandas? There are many benefits to Polars directly, of course. But you might not be aware of all the excellent tools and libraries that make Polars even better. Examples include Patito, which combines Pydantic and Polars for data validation, and polars_encryption, which adds AES encryption to selected columns. We have Christopher Trudeau back on Talk Python To Me to tell us about his list of excellent libraries to power up your Polars game, and we also talk a bit about his new Polars course.

Episode sponsors: Agntcy (talkpython.fm/agntcy), Sentry Error Monitoring, code TALKPYTHON (talkpython.fm/sentry), Talk Python Courses (talkpython.fm/training)

Links from the show:

  * New Theme Song (full-length download and backstory): talkpython.fm/flasksong
  * Polars for Power Users Course: training.talkpython.fm/courses/polars-for-power-users
  * Awesome Polars: github.com/ddotta/awesome-polars
  * Polars Visualization with Plotly: docs.pola.rs/user-guide/misc/visualization/#plotly
  * Dataframely: github.com/Quantco/dataframely
  * Patito: github.com/JakobGM/patito
  * polars_iptools: github.com/erichutchins/polars_iptools
  * polars-fuzzy-match: github.com/bnmoch3/polars-fuzzy-match
  * Nucleo Fuzzy Matcher: github.com/helix-editor/nucleo
  * polars-strsim: github.com/foxcroftjn/polars-strsim
  * polars_encryption: github.com/zlobendog/polars_encryption
  * polars-xdt: github.com/pola-rs/polars-xdt
  * polars_ols: github.com/azmyrajab/polars_ols
  * Least Mean Squares Filter in Signal Processing: www.geeksforgeeks.org/least-mean-squares-filter-in-signal-processing
  * polars-pairing: github.com/apcamargo/polars-pairing
  * Pairing Function: en.wikipedia.org/wiki/Pairing_function
  * polars_list_utils: github.com/dashdeckers/polars_list_utils
  * Harley Schema Helpers: tomburdge.github.io/harley/reference/harley/schema_helpers
  * Marimo Reactive Notebooks Episode: talkpython.fm/episodes/show/501
  * Marimo: marimo.io
  * Ahoy Narwhals Podcast Episode: talkpython.fm/episodes/show/480
  * Watch this episode on YouTube: youtube.com/watch?v=aIdvlJN1bNQ
  * Episode #510 deep-dive: talkpython.fm/510
  * Episode transcripts: talkpython.fm/episodes/transcript/510

Stay in touch: Talk Python on YouTube (talkpython.fm/youtube), Talk Python on Bluesky (@talkpython.fm), Talk Python on Mastodon (@talkpython), Michael on Bluesky (@mkennedy.codes), Michael on Mastodon (@mkennedy)

June 18, 2025 08:00 AM UTC

June 17, 2025


PyCoder’s Weekly

Issue #686: Free-Threaded Update, GPU Programming, GitHub Actions, and More (June 17, 2025)

#686 – JUNE 17, 2025
View in Browser »

The PyCoder’s Weekly Logo


State of Free-Threaded Python

This is a blog post from the Python Language Summit 2025 giving an update on the progress of free-threaded Python. You may also be interested in the complete list of Language Summit Blogs.
PYTHON SOFTWARE FOUNDATION

GPU Programming in Pure Python

Talk Python interviews Bryce Adelstein Lelbach and they talk about using Python to harness the insane power of modern GPUs for data science and ML.
KENNEDY & LELBACH podcast

Making Friends with Agents: A Mental Model for Agentic AI


Explore a mental model to befriend your AI agent. This blog walks through designing goal-driven, tool-savvy agents that think in loops, speak your language, and bounce back from failure through durable execution →
TEMPORAL sponsor

Continuous Integration and Deployment Using GitHub Actions

Agile methodologies rely on robust DevOps systems to manage and automate common tasks in a continually changing codebase. GitHub Actions can help.
REAL PYTHON course

NumPy v2.3.0 Released

GITHUB.COM/NUMPY

Call for Applicants for a Django Fellow

DJANGO SOFTWARE FOUNDATION

Django Bugfix Releases: 5.2.3, 5.1.11, and 4.2.23

DJANGO SOFTWARE FOUNDATION

Python 3.13.5 Released

PYTHON.ORG

scikit-learn 1.7 Released

SCIKIT-LEARN.ORG

Python Jobs

Sr. Software Developer (Python, Healthcare) (USA)

Prenosis

Senior Software Engineer – Quant Investment Platform (LA or Dallas) (Los Angeles, CA, USA)

Causeway Capital Management LLC

More Python Jobs >>>

Articles & Tutorials

A dict That Can Report Which Keys Weren’t Used

When testing, you may want to make sure that all parts of a dictionary are accessed so you get full coverage. This post shows a modified dict that tracks which keys were used.
PETER BENGTSSON
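The idea can be sketched in a few lines (an illustrative subclass, not Bengtsson's actual implementation):

```python
class TrackingDict(dict):
    """A dict that records which keys were actually read."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.accessed: set = set()

    def __getitem__(self, key):
        self.accessed.add(key)
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.accessed.add(key)
        return super().get(key, default)

    def unused_keys(self) -> set:
        # Keys present in the dict that were never looked up
        return set(self) - self.accessed


config = TrackingDict({"host": "localhost", "port": 8080, "debug": True})
config["host"]
config.get("port")
print(config.unused_keys())  # {'debug'}
```

A fuller version would also want to hook .keys(), .items(), and membership tests, but this captures the core trick.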

Better Django Management Commands

Writing Django management commands can involve a ton of boilerplate code. This article shows you how to use two libraries that could cut your management command code in half: django-click and django-typer.
REVSYS

Easy-to-Deploy, Enterprise-Ready GenAI

Check out the Intel GenAI code library for ready-to-deploy and easy-to-integrate solutions.
INTEL CORPORATION sponsor

How Can You Structure Your Python Script?

Structure your Python script like a pro. This guide shows you how to organize your code, manage dependencies with PEP 723, and handle command-line arguments.
REAL PYTHON
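The PEP 723 part is worth a peek even before reading the guide: a script can declare its own Python and dependency requirements in a comment block that tools such as uv can read. A minimal sketch (the metadata values here are illustrative):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""A self-contained script: inline metadata, a main function, an entry-point guard."""


def main() -> None:
    print("script ran")


if __name__ == "__main__":
    main()
```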

Quiz: How Can You Structure Your Python Script?

In this quiz, you’ll test your understanding of organizing and structuring Python scripts. You’ll revisit key concepts about best practices for writing clear, maintainable, and executable Python code.
REAL PYTHON

Wyvern’s Open Satellite Feed

Wyvern is a satellite startup that has recently launched an open data program. This article explores that data using Python libraries such as astropy, geocoder, rich, and more.
MARKSBLOGG.COM

Pointblank: Data Validation Made Beautiful

This post introduces pointblank, a library for data validation. It includes chainable execution and interactive reports to show what is working in your data pipeline.
POSIT-DEV.GITHUB.IO

5 Non-LLM Software Trends to Be Excited About

Tired of reading about AI and LLMs? This post talks about other tech that is rapidly changing in the software world, including local-first applications, web assembly, the improvement of cross-platform tools, and more.
LEONARDO CREED

Concurrency in async/await and Threading

Want to write faster Python code? Discover the difference between async/await and threading and how concurrency works in Python with real-world examples.
CHEUK TING HO
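The core difference in one sketch: with async/await, tasks that spend their time waiting can overlap on a single thread (illustrative numbers, not from the article):

```python
import asyncio
import time


async def fetch(n: int) -> int:
    await asyncio.sleep(0.1)  # simulated I/O; the event loop runs other tasks meanwhile
    return n * n


async def main() -> list[int]:
    start = time.perf_counter()
    # Five 0.1 s waits overlap, so the total is roughly 0.1 s, not 0.5 s
    results = await asyncio.gather(*(fetch(i) for i in range(5)))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
    return results


squares = asyncio.run(main())
print(squares)  # [0, 1, 4, 9, 16]
```

Threads achieve a similar overlap for blocking I/O, but with OS scheduling instead of cooperative awaits; that trade-off is what the article digs into.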

Defining Your Own Python Function

Learn how to define your own Python function, pass data into it, and return results to write clean, reusable code in your programs.
REAL PYTHON
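The basics in a nutshell, before the full tutorial (a hypothetical greet function):

```python
def greet(name: str, punctuation: str = "!") -> str:
    """Take data in through parameters and return a result to the caller."""
    return f"Hello, {name}{punctuation}"


print(greet("World"))                    # Hello, World!
print(greet("Ada", punctuation="?"))     # Hello, Ada?
```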

Quiz: Defining Your Own Python Function

REAL PYTHON

TIL: HTML 404 Errors for FastHTML

A quick “Things I’ve Learned” post showing how to write a custom HTTP 404 handler for FastHTML.
DANIEL ROY GREENFIELD

PyData Virginia 2025 Talks

A list of the recorded talks from PyData Virginia 2025.
YOUTUBE.COM video

Projects & Code

py-shiny: Shiny for Python Web Apps

GITHUB.COM/POSIT-DEV

quarto-cli: Scientific and Technical Publishing System

GITHUB.COM/QUARTO-DEV

paramiko: Native Python SSHv2 Library

GITHUB.COM/PARAMIKO

toolz: A Functional Standard Library for Python

GITHUB.COM/PYTOOLZ

ahocorasick_rs: Check for Multiple Patterns in a Single String

GITHUB.COM/G-RESEARCH

Events

Weekly Real Python Office Hours Q&A (Virtual)

June 18, 2025
REALPYTHON.COM

PyData Bristol Meetup

June 19, 2025
MEETUP.COM

PyLadies Dublin

June 19, 2025
PYLADIES.COM

Python Nordeste 2025

June 20 to June 23, 2025
PYTHONNORDESTE.ORG

Python Coding Club for Teens (PyTahoua)

June 20 to June 23, 2025
PYTHONNIGER.ORG


Happy Pythoning!
This was PyCoder’s Weekly Issue #686.
View in Browser »


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

June 17, 2025 07:30 PM UTC


Adrarsh Divakaran

Will AI Replace Junior Developers? I Asked Experts at Pycon US

June 17, 2025 05:42 PM UTC


Django Weblog

DSF member of the month - Elena Williams

For June 2025, we welcome Elena Williams as our DSF member of the month! ⭐

Elena in DjangoGirls Brisbane

Elena is a dedicated member of the Django community. She is part of the Code of Conduct Working Group and she is a Django Girls organizer in Australia. She has been a DSF member since July 2014.
You can learn more about Elena by visiting Elena's website and her GitHub Profile.

Let’s spend some time getting to know Elena better!

Can you tell us a little about yourself (hobbies, education, etc)

My background is that I was always interested in computers, though my parents were more creative types, my Dad was an Architect (of built structures). When I was a kid we had computers for CAD around the house before it was common. I was always into STEM subjects, but unfortunately in that era for girls to do engineering it was a bit too hostile for me, so I trained in finance instead and worked in that industry (finance and banking, MNE orgs) for nearly a decade. I kept coming back to coding and was always building computers, and was obsessed with the internet as a technology from pretty early on. Just after I discovered Django I did a Masters in Computing at ANU. To this day my main hobbies are programming/webdev (very much a person who codes for fun) and the open source community. My persistent other hobbies are hackspace activities, I like CNC and laser stuff, but will pick up any and all tools/mediums and give them a go, lately been spending time with blender and cabinetry. When I can, I like to get away to snowboard or kitesurf, and this wild Australian long distance endurance navigation sport called rogaining. Really at the moment I’m mostly focussed on being a parent (which is an awesome experience), my friends (mostly python related), my job and working on the community here in Australia. With my family we go camping/hiking more than most. I’ve also continued to be a sessional academic at ANU teaching software engineering for many years.

How did you start using Django?

Golly, I’ve been using Django forever. I’d started doing web stuff in the early ‘00s and worked in a range of languages and paradigms. I was working in a physics research institute at a high profile university in Australia doing web stuff and made friends with a bunch of the doctoral students. In around 2007, one of these students, and my good mate, Dave, randomly recommended this new framework Django and Python (and emacs also actually but that’s a different story). Basically I got into it immediately and never looked back and went on to build a career around Django (actually Dave later gave up physics and did the same thing too). I’ve been engaged with the Python and Django communities to varying degrees since about 2011 as well. To be honest when I discovered the language and the framework I really didn’t expect to still be passionate about them all these years later but I really am! Hopefully I can continue to be well into the future also.

What other framework do you know and if there is anything you would like to have in Django if you had magical powers?

Over the years (being a curious person) I’ve worked with many many web frameworks and technologies, the vast majority of the big ones. In recent years I’ve been spending time with FastAPI and SQLAlchemy as well as non-python technologies. Django is better though.

Not using Django as much at the moment makes me love it even more and realise how lucky we are with such a well designed and well supported framework. It’s not perfect but it’s outstanding.

Having said that: at a technical level I’d love to have “cheaper” ways (in every sense) to deploy. Even though deployment methods have changed beyond recognition several times over the years, I always thought this would get easier over time and am kind of surprised that it hasn’t.

Very specific to me is that I need Django to have stronger support for many database schemas in the same project, but honestly this is just a specific problem I have inherited in a project at the moment, but it’ll pass eventually.

What projects are you working on now?

Over the last few years I’ve helped organise a number of events, including PyConAU, though realised I’d been taking on too many projects and trying to pull back actually! Still: Internationally I’m on DSF CoC with a great team. Nationally this year I’ve been serving on the committee of our main Australian open source foundation body, Linux Australia, as well as working in a small team trying to bring together all of the Australian python user groups under a banner we hope to call Python Australia and I’ve had a keen interest in python user groups around the world. In my home town I’ve been organising our local user groups for some time with an awesome team, as well as our fantastic local PyLadies.

For work I’m flat-chat working in a senior role on a Platform team in a small data company that provides “critical digital infrastructure” for Australia. Though my most important project of all at the moment really is my family, and I do really prioritise my friends and being healthy nowadays. I’m an avid hackerspace person and do have a couple of purportedly active projects (I’m obsessed with maps among other things) but these are relatively neglected at the moment as I just don’t have the bandwidth.

Which Django libraries are your favorite (core or 3rd party)?

I just love the ORM. We’re so spoiled in the Django community we don’t realise how mature and feature-rich the ORM is. Maybe I’m biased because I’ve been using it for so long I just “think” in Django ORM and I’ve been working away from it lately. It’s such a (comparative) pleasure to use. You can nit-pick at it but compared to anything else it’s so beautifully thought through.

The admin was the Django “killer app” in 2008 and I’d argue still is in 2025. To be some dozens of seconds away from a custom CMS backend at any time is still magical. Pony magical. It’s still as impressive as ever to show off to people. Also in the same way that Guido says python makes a great calculator: Django makes a great quick tool for really fast data munging, can’t describe how liberating it feels using it for this purpose.

Writing tests in Django is under-rated too.

There are so many amazing 3rd party libraries, too many to mention. For shout-outs I don’t think I have any projects without Debug Toolbar. The 3rd party caching libraries Memcache and Redis are both great. I’m also usually happy when I turn on Celery, and excited to see DEP-0014 on its way. Danny and Audrey’s Django Cookiecutter project is a great reference even if you don’t take the whole enchilada.

What are the top three things in Django that you like?

I’ve been lucky to generally have had a pretty great time with Django. Generally I’ve used it for projects where it was a really good fit and so it wasn’t painful. As such I like weird little quirky things about Django. Haters-can-hate but I actually really like a bunch of this controversial stuff, for example I like settings.py as a pattern for projects that aren’t out of control; I enjoy using and customising the management commands framework; I think Meta class as an approach to that type of awkward problem is neat; I’ve generally had a pretty nice time with the template language; I dig into utils and reuse them probably more often than most; ORM and the Tests obviously (it’s trivial to plugin pytest of course). Everything is a trade-off in software engineering and while I’m very biased: I just like the trade-offs that Django has chosen, they’re some of the best-in-class.

The top 3 things though? This is tough. I just like it. To nail down actual answers though:

I know you started using Django with one of its first versions. What do you think of the evolution of the framework?

This is a great question! Thanks for being interested in this history, the Django history is a nice story of having good values and persisting and this actually being successful over the long run.

For me there’s all the “back in my day” stuff that’s not obvious now, like Python not being taken seriously as a “real” programming language, let alone javascript, but now those tides have very much turned, and web development is considered extremely respectable and high profile, which was unimaginable when I started. Django started in Web1.0 (whatever that meant), and actually grew substantially during Web2.0 and now even in the modern Web3 era is kind of establishing itself into being part of the backbone of the large parts of the internet that aren’t obvious. Thibaud has a list he maintains of websites that he believes use Django; it’s great if you haven’t seen it.

One of the most impressive parts of the evolution has been how decisions have been made and implemented. In normal “work” you just have to make things as fast as possible and endlessly add features consequences-be-damned. Open source gets to be fundamentally the opposite. Traditionally one of the defining characteristics of Open Source is that “time is no object”. That is good design and implementation can be allowed the time to breathe and be excessively thought through. There is no rush or deadline. While there’s always conflict and drama I think there has been less so in Django than in most other projects as design decisions have been painstakingly threshed out and perfected in mailing lists, tickets, DEPs and forums over the months and years it takes to make them. The people inside see the drama but we’re in the news almost never compared to most projects in the same space. The point is that hypothetically it’s possible to try to make the best possible design decisions. In practice most projects don’t do this, but I think Django has demonstrated exemplary maturity in trying to pursue this ideal, and is regularly recognised for it.

The original founding team deserve full credit for instilling this culture and each successive group of stewards deserve credit for preserving it.

There have been (and always will be) missteps. For example CBVs are such an obviously good idea on paper, but in practice people don’t think so. On the other hand Andrew Godwin’s implementation of migrations back in the day, that was completely re-writing South from scratch, was truly lovely, even though it was a battle to get to the point of having migrations at all. There’s the history around the db module, which pretty much everyone was too scared to touch after Malcolm died until there were some impressive breakthroughs in it during the “under the hood” sessions not long after DjangoGirls people started coming on board.

Django consciously has decided to be extremely considered in its adoption of change and this has been a great thing. Other frameworks have generally been more cavalier, while Django has been steady, careful and reliable. The other full-feature frameworks are kind of in decline, or have hurt themselves by too-much-change-too-fast, while Django has steadily slowly grown and is the trusty go-to tool for a certain kind of job.

Now moving forward I see focus on the very subtle things that make the framework nicer to use and understand, on just making the core capabilities better, more reliable and performant, and only very very carefully adding features.

In an age where so much quality degradation is occurring, it inspires hope that projects like Django can persist as beacons of high quality, held together by a small group and big community of thoughtful, caring individuals. Hopefully this is something we can continue for a long time into the future also!

You are part of the Code of Conduct working group. What is it like to work with the working group? Do you have space available for new members? What does it require, according to you?

Code of Conduct WGs are slightly niche and exposed to a certain kind of work and responsibility. Not to mention that respecting many sensitivities and viewpoints is necessary. It also means having the guts to tell people “that’s not how it’s done here” when it needs to be said. Personally it’s a kind of work I’ve grown to be passionate about. I truly believe having a great culture is at the core of community (and really anything good) and can be a complex balancing act of competing factors and emotions. It’s certainly not the kind of thing everyone is into, but if you are, the WG is looking for more diversity, if nothing else it’s tending slightly older at the moment.

Having said that: Within all of the open source communities from local to international levels there’s always space for people who are willing to turn up and help!

Join your local community! Find the parts of community that “speak” to you. Maybe it’s starting a meetup, helping your local conference, running a DjangoGirls. Maybe it’s something engineer-related like finally adding something to an open source library that you’re into, adding some beginner docs somewhere, or engaging with Djangonaut Space. Maybe it’s something online like helping out in forum.djangoproject.com, Reddit or Discord.

As organisers we have this cheat code for finding new people to invite to help more, it’s called “looking for chair-stackers”, that is people who engage to help in the little ways, such as helping stack chairs at the end of an event or generally pack down, wipe up, carry boxes or put things away. Or online: people who go out of their way to try to understand and chip in to manage extra rules, or answer the unanswered thing that’s been sitting there for a while. Or people who just ask “can I help out with that?” when the organisers seem tired or stressed out. Having people around who help in these ways has huge value and has been the beginning of many people being involved in communities and making life-long friends and connections.

Now more than ever though, it’s so important to connect to your community. We are stronger, better and healthier when we are connected to and relied on by other people and we have others we can share our experiences with.

Particularly us computer people tend not to be as good with connecting with other people, but everyone should find their way to get out and connect! It’s sometimes hard but it’s always better.

You have organized many DjangoGirls in Australia, how did you start? Do you have any advice for someone who would like to organize a DjangoGirls event?

In 2014 I was living in Perth, Australia, where Russell Keith-Magee is based and we had a budding Python/Django User Group. At one of the meetings news emerged about how Ola and Ola were running this thing called “DjangoGirls” at EuroPython in a few weeks. PyConAU was scheduled a couple of weeks after this. I was like, that’s a great idea, I can absolutely have a go at doing that and emailed them immediately asking if I could copy their materials and plan. We pulled it together with an amazing bunch of people and I think this was technically the 2nd DjangoGirls event ever. In the following years I’ve been involved in many more, including the first North American DjangoGirls. From our Perth series of events a successful organisation was spun off called SheCodes.

In the more-than-a-decade since then the world has changed so much! Particularly in the tech world. I would say specifically for DjangoGirls events, they are very region specific. My first advice for organising an event in your region is to see if there’s been one previously and reach out to the event organisers, or at least the nearest organisers – I think these days there are few places on earth that haven’t had a DjangoGirls event nearish-by. The resources on the website are actually great for getting going and the international DjangoGirls team are lovely, but also always looking for more help.

Where I live now, back in the capital, Canberra, we are very well supported for education services. We held a DjangoGirls event a couple of years ago, but for the attendees what emerged was that what we really wanted was just to connect with other technical women.

Now what has been very successful for us is an ongoing PyLadies/Women’s Software group who meet up regularly and talk about things that matter to our experience. We use the “lean-coffee” model and it’s been unexpectedly functional. This has been one of the best groups I’ve ever been in with a range of technical women regularly sharing our weird and statistically unusual experiences together, it feeds the soul, and is strongly recommended if you don’t participate in a group like this already.

Is there anything else you’d like to say?

A final shout out to the original leaders of the Django community, for me personally Russell, Jeff, Jacob, Andrew and Baptiste in particular, but everyone who has persisted over the years in just turning up over the long haul and keeping our part of the world as beautiful as can be. My friends Dave, Matt and Jonah. Thibaud is a great president right now. Rarely is there a dedicated Django person who is not absolutely delightful and I feel both proud and honoured to be part of this community. A big thank you to everyone (especially you Sarah! And all the Sarahs, Natalias, Lillys and Olas) who help to make Django what it is.


Thank you for doing the interview, Elena!

June 17, 2025 05:09 PM UTC


Python Insider

Python 3.14.0 beta 3 is here!

It’s 3.14 beta 3!

https://www.python.org/downloads/release/python-3140b3/

This is a beta preview of Python 3.14

Python 3.14 is still in development. This release, 3.14.0b3, is the third of four planned beta releases.

Beta release previews are intended to give the wider community the opportunity to test new features and bug fixes and to prepare their projects to support the new feature release.

We strongly encourage maintainers of third-party Python projects to test with 3.14 during the beta phase and report issues found to the Python bug tracker as soon as possible. While the release is planned to be feature-complete entering the beta phase, it is possible that features may be modified or, in rare cases, deleted up until the start of the release candidate phase (Tuesday 2025-07-22). Our goal is to have no ABI changes after beta 4 and as few code changes as possible after the first release candidate. To achieve that, it will be extremely important to get as much exposure for 3.14 as possible during the beta phase.

This includes creating pre-release wheels for 3.14, as it helps other projects to do their own testing. However, we recommend that your regular production releases wait until 3.14.0rc1, to avoid the risk of ABI breaks.

Please keep in mind that this is a preview release and its use is not recommended for production environments.

Major new features of the 3.14 series, compared to 3.13

Some of the major new features and changes in Python 3.14 are:

New features

Note that PEPs 734 and 779 are exceptionally new in beta 3!

(Hey, fellow core developer, if a feature you find important is missing from this list, let Hugo know.)

For more details on the changes to Python 3.14, see What’s new in Python 3.14. The next pre-release of Python 3.14 will be the final beta, 3.14.0b4, scheduled for 2025-07-08.

Build changes

Incompatible changes, removals and new deprecations

Python install manager

The installer we offer for Windows is being replaced by our new install manager, which can be installed from the Windows Store or our FTP page. See our documentation for more information. The JSON file available for download below contains the list of all the installable packages available as part of this release, including file URLs and hashes, but is not required to install the latest release. The traditional installer will remain available throughout the 3.14 and 3.15 releases.

More resources

And now for something completely different

If you’re heading out to sea, remember the Maritime Approximation:

π mph = e knots
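The approximation holds to better than half a percent, since one knot is 1852/1609.344 ≈ 1.1508 mph:

```python
import math

# Nautical mile in metres divided by statute mile in metres
KNOT_IN_MPH = 1852 / 1609.344

e_knots_in_mph = math.e * KNOT_IN_MPH
print(e_knots_in_mph)  # about 3.128, versus math.pi at about 3.142
print(abs(math.pi - e_knots_in_mph) / math.pi)  # relative error under 0.5%
```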

Enjoy the new release

Thanks to all of the many volunteers who help make Python Development and these releases possible! Please consider supporting our efforts by volunteering yourself or through organisation contributions to the Python Software Foundation.

Regards from sunny Helsinki with 19 hours of daylight,

Your release team,
Hugo van Kemenade
Ned Deily
Steve Dower
Łukasz Langa

June 17, 2025 02:43 PM UTC


Real Python

Exploring Python's list Data Type With Examples

The list class is a fundamental built-in data type in Python. It has an impressive and useful set of features, allowing you to efficiently organize and manipulate heterogeneous data. Knowing how to use lists is a must-have skill for you as a Python developer. Lists have many use cases, so you’ll frequently reach for them in real-world coding.

By working through this video course, you’ll dive deep into lists and get a solid understanding of their key features. This knowledge will allow you to write more effective code by taking advantage of lists.
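A taste of what that looks like in practice (a quick illustrative sketch, not from the course itself):

```python
# A list can mix types ("heterogeneous data") and be reorganized in place
row = ["widget", 3, 9.99, True]
row.append("in stock")  # grow the list at the end
row.remove(True)        # remove the first element equal to True
print(row)              # ['widget', 3, 9.99, 'in stock']
print(row[-1], len(row))
```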

In this video course, you’ll learn how to:


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

June 17, 2025 02:00 PM UTC


Mike Driscoll

Python 201 – All About the TypedDict

Python has supported the concept of type hinting for quite a while now. However, unlike other programming languages, Python does not enforce type hints. You must use an external tool, such as Mypy, for that.

In this tutorial, you will learn all about TypedDict, a special way of adding type hints to heterogeneous dictionaries. A heterogeneous dictionary is a dictionary whose values are not all the same type.

But before you learn how to use the TypedDict, you should review how to type hint a regular dictionary.

Type Hinting a Regular Dictionary

A regular Python dictionary is defined as follows:

my_dictionary = {"some_key": "some_value"}

You can use any hashable type for the key, such as a string or an integer. The value of a dictionary can be any type whatsoever.

When you want to type hint a dictionary, you would use the following: dict[key_type, value_type]

Now let’s apply that to the example above:

my_dictionary: dict[str, str] = {"some_key": "some_value"}

If you are using a version of Python before 3.9, you will need to do the following instead:

from typing import Dict

my_dictionary: Dict[str, str] = {"some_key": "some_value"}

Fortunately, modern Python no longer requires that extra import.

Now you’re ready to learn about how and why you might want to use the TypedDict.

Creating a TypedDict

The TypedDict was introduced to Python in 3.8. You can read the full details about it in PEP 589. You would use a TypedDict over a regular dictionary when you have a dictionary with values of different types.

Here’s an example:

my_dictionary = {"names": ["Mike", "Andrea", "John"],
                 "otype": "employee",
                 "code": 123456
                }

Type hinting this type of dictionary is more complex. You can do something like this, though:

my_dictionary: dict[str, list | str | int] = {"names": ["Mike", "Andrea", "John"], "otype": "employee", "code": 123456 }

Depending on how your type checker is configured, this might work. However, if you write code that modifies the list, your type checker may complain that a string doesn’t have an append method or vice versa.

To make the type checker happier, you should use a TypedDict.

Here’s how you would use one with this example:

from typing import TypedDict

class MultiTypeDict(TypedDict):
    names: list
    otype: str
    code: int

my_dictionary: MultiTypeDict = {"names": ["Mike", "Andrea", "John"], "otype": "employee", "code": 123456 }

Isn’t that great? There’s just one problem. What if your dictionary’s keys have spaces in them? You cannot create class attributes with spaces!

There’s a workaround for that. Check it out in the next section.

Creating a TypedDict with Keys that Have Spaces

For this example, you will create a new dictionary with four keys, three of which contain spaces.

To make a TypedDict for this type of dictionary, you need to call the TypedDict constructor instead of subclassing it:

from typing import TypedDict

Results = TypedDict("Results",{"Animal Habitats": list,
                               "Tested": bool,
                               "Animal Name": str,
                               "Animal Location": str})

actual_results: Results = {
    "Animal Habitats": ["Asia", "N. America"],
    "Tested": False,
    "Animal Name": "Tigris",
    "Animal Location": "North Bay",
}

When you call TypedDict, you pass in the typename (what you would have named the class) and the fields the dictionary should have. You’ll note that the fields are a dictionary. This is where you will put the keys that contain spaces and those without spaces.

Give it a try and you’ll find it works great!
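One caveat worth keeping in mind: whether subclassed or built with the constructor, a TypedDict adds no runtime enforcement. The value is an ordinary dict, and only a type checker such as Mypy flags violations. A quick check, reusing the Results definition from above:

```python
from typing import TypedDict

Results = TypedDict("Results", {"Animal Habitats": list,
                                "Tested": bool,
                                "Animal Name": str,
                                "Animal Location": str})

actual_results: Results = {
    "Animal Habitats": ["Asia", "N. America"],
    "Tested": False,
    "Animal Name": "Tigris",
    "Animal Location": "North Bay",
}

# At runtime this is just a dict; enforcement happens in the type checker
print(type(actual_results) is dict)  # True
print(sorted(actual_results))        # the keys are plain strings, spaces and all
```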

Wrapping Up

TypedDict is a handy tool for storing a complex dictionary. You will find that sometimes you even have these complex dictionaries inside of lists, tuples or even other dictionaries. Using the TypedDict can make type-hinting these data structures easier and prevent hard-to-detect defects from creeping in.

The post Python 201 – All About the TypedDict appeared first on Mouse Vs Python.

June 17, 2025 01:25 PM UTC


Armin Ronacher

We Can Just Measure Things

This week I spent time with friends letting agents go wild to see what we could build in 24 hours. I took some notes for myself to reflect on that experience. I won't bore you with another vibecoding post, but you can read Peter's post about how that went.

As fun as it was, it was also frustrating in other, entirely predictable ways. It became a meme how much I hated working with Xcode for this project. This got me thinking: this has been an entirely unacceptable experience for a long time, but with programming agents, the pain becomes measurable.

When I first dove into programming I found the idea of RTFM quite hilarious. “Why are you asking dumb questions, just read it up.” The unfortunate reality is that the manual often doesn't exist — or is wrong. In fact, we as engineers are quite willing to subject each other to completely inadequate tooling, bad or missing documentation, and ridiculous API footguns all the time. “User error” is what we used to call this; nowadays it's a “skill issue”. It puts the blame on the user and absolves the creator, at least momentarily. For APIs it can be random crashes if you use a function wrong; for programs it can be an impossible-to-navigate UI or a lack of error messages. There are many different ways in which we humans get stuck.

What agents change about this is that I can subject them to something I wouldn't really want to subject other developers to: measuring. I picked the language for my current project by running basic evals, and it worked well. I learned from that that there are objectively better and worse languages when it comes to my particular problem. The choice, however, is not just about how much the AI knows about the language from the corpus of examples during training. It's also tooling, the inherent capabilities of the language, ecosystem churn and other aspects.

Using agents to measure code quality is great because agents don't judge me, but they do judge the code they are writing. Not all agents will swear, but they will express frustration with libraries when loops don't go well, or give up. That opens up an opportunity to bring some measurement not to agent performance, but to the health of a project.

We should pay more attention to how healthy engineering teams are, and that starts with the code base. Using agents we can put some numbers to it in ways we cannot with humans (or only in a very slow and expensive way). We can figure out how successful agents are at using the things we are creating in rather objective ways, which is in many ways a proxy for how humans experience working with the code. Getting together with fresh souls to walk them through a tutorial or some tasks is laborious and expensive. Getting agents that have never seen a codebase to start using a library is repeatable, rather cheap, fast and, if set up the right way, very objective. It also takes the emotion out of it, and you can run the experiment multiple times.
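One way such a measurement could look, as a hypothetical sketch (the outcomes below are invented example data; in practice they would come from actually launching a fresh agent against the codebase repeatedly):

```python
# Hypothetical sketch: score repeated fresh-agent runs against a task.
# The outcome data here is made up for illustration, not from real runs.
trial_outcomes = [True, True, False, True, False, True, True, True]

def success_rate(outcomes):
    """Fraction of runs in which the agent completed the task."""
    return sum(outcomes) / len(outcomes)

print(f"{success_rate(trial_outcomes):.0%}")
```

Run the same task twenty times against two candidate libraries and the difference in success rate is exactly the kind of objective, repeatable number described above.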

Now obviously we can have debates over whether the type of code we would write with an agent is objectively beautiful, or whether the way agents execute tools creates the right type of tools. This is a debate worth having. Right at this very moment, though, what programming agents need to be successful is rather well aligned with what humans need.

So what works better than other things? For now these are basic indicators, for agents and humans alike:

When an agent struggles, so does a human. There is a lot of code and tooling out there which is objectively not good, but which for one reason or another became dominant. If you want to start paying attention to technology choices, or you want to start writing your own libraries, you can now use agents to evaluate the developer experience.

Because so can your users. I can confidently say it's not just me who dislikes Xcode; my agent also expresses frustration — measurably so.

June 17, 2025 12:00 AM UTC

June 16, 2025


Python Engineering at Microsoft

Python in Visual Studio Code – June 2025 Release

We’re excited to announce the June 2025 release of the Python, Pylance and Jupyter extensions for Visual Studio Code!

This release includes the following announcements:

If you’re interested, you can check the full list of improvements in our changelogs for the Python, Jupyter and Pylance extensions.

Python chat tools

The Python extension now includes the following chat tools: “Get information for a Python Environment”, “Get executable information for a Python Environment”, “Install Python Package” and “Configure Python Environment”. You can either directly reference them in your prompt by adding #getPythonEnvironmentInfo and #installPythonPackage, or agent mode will automatically call the tool as applicable based on your prompt. These tools seamlessly detect appropriate environment information, based on file or workspace context, and handle package installation with accurate environment resolution.

The “Configure Python Environment” tool ensures that the Python environment is set up correctly for the workspace. This includes creating a virtual environment if needed, and selecting it as the active Python environment for your workspace.

Tools that were previously introduced in the Python Environments extension (preview) have been migrated to the Python extension, thereby making these tools available to all users with the Python extension installed.

Language Server based terminal suggest in the Python REPL

Language server completions are now available in the terminal for interactive Python REPL sessions. This brings the same language completions you receive in the editor into the terminal, making terminal interactions more efficient.

To try it out, ensure the following settings are enabled:

Create Project from a template in the Python Environments extension

The Python Environments extension (preview) now supports project creation for Python packages and basic scripts, allowing you to bypass scaffolding and get coding more quickly. Use the Python Envs: Create Project from Template command in the Command Palette to select whether you want to create a package or a script and let the command handle the rest!

For package creation, you can expect to name the package, create a virtual environment, and receive a scaffolded project which includes a tests subfolder, pyproject.toml, dev-requirements.txt, and boilerplate __main__.py and __init__.py files.

For scripts, a new Python file with the name of your choice and boilerplate code will be created.

PyEnv and Poetry support in the Python Environments extension

We added support for pyenv for environment management, and poetry for both package and environment management, in the Python Environments extension. This ensures you can manage pyenv and poetry environments as you normally would in the UI contributed by the Python Environments extension. When pyenv or poetry is installed on your machine, it will appear as a supported environment manager in the Python panel accessed in the Activity Bar.

Screenshot showing various environment managers in the Python environments view.

Controlled rollout of the Python Environments extension

We’re starting to roll out the Python Environments extension as an optional dependency of the Python extension, beginning with a subset of pre-release users this month. This means you may now see the Python Environments extension automatically installed alongside the Python extension, similar to the Python Debugger and Pylance extensions. This controlled rollout allows us to gather early feedback and ensure reliability before general availability. The Python Environments extension includes all the core capabilities we’ve introduced so far, including: Quick Create for one-click environment setup, automatic terminal activation (via the "python-envs.terminal.autoActivationType" setting), and all supported UI for environment and package management.

You can install the preview version of the Python Environments extension from the Extension Marketplace if you would like to try it out. Please let us know if there are any issues or feature requests via our vscode-python-environments repo.

We would also like to extend special thanks to this month’s contributors:

Try out these new improvements by downloading the Python extension and the Jupyter extension from the Marketplace, or install them directly from the extensions view in Visual Studio Code (Ctrl + Shift + X or ⌘ + ⇧ + X). You can learn more about Python support in Visual Studio Code in the documentation. If you run into any problems or have suggestions, please file an issue on the Python VS Code GitHub page.

The post Python in Visual Studio Code – June 2025 Release appeared first on Microsoft for Python Developers Blog.

June 16, 2025 04:38 PM UTC


Real Python

Write Pythonic and Clean Code With namedtuple

Python’s namedtuple in the collections module allows you to create immutable sequences with named fields, providing a more readable and Pythonic way to handle tuples. You use namedtuple to access values with descriptive field names and dot notation, which improves code clarity and maintainability.

By the end of this tutorial, you’ll understand that:

  • Python’s namedtuple is a factory function that creates tuple subclasses with named fields.
  • The main difference between tuple and namedtuple is that namedtuple allows attribute access via named fields, enhancing readability.
  • The point of using namedtuple is to improve code clarity by allowing access to elements through descriptive names instead of integer indices.
  • Some alternatives to namedtuple include dictionaries, data classes, and typing.NamedTuple.
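Of those alternatives, typing.NamedTuple deserves a quick sketch, since it gives you the same tuple behavior with class syntax and type hints (the Point fields here are chosen to match the article's running example):

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

point = Point(2, 4)
print(point)                 # Point(x=2, y=4)
print(point.x + point.y)
```

Instances are still real tuples, so everything the article shows for collections.namedtuple, such as indexing and immutability, applies here too.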

Dive deeper into creating namedtuple classes, exploring their powerful features, and writing Python code that’s easier to read and maintain.

Get Your Code: Click here to download the free sample code that shows you how to use namedtuple to write Pythonic and clean code.

Take the Quiz: Test your knowledge with our interactive “Write Pythonic and Clean Code With namedtuple” quiz. You’ll receive a score upon completion to help you track your learning progress:


Interactive Quiz

Write Pythonic and Clean Code With namedtuple

In this quiz, you'll test your understanding of Python's namedtuple() factory function from the collections module.

Getting to Know namedtuple in Python

Python’s namedtuple() is a factory function that’s available in the collections module. It allows you to create a tuple subclass with named fields. These named fields let you access the values in a given named tuple using dot notation and field names—for example, my_tuple.field_name.

Python’s namedtuple was created to improve code readability by providing a way to access values using descriptive field names instead of integer indices, which often don’t provide any context on what the values are. This feature also makes the code cleaner and more maintainable.

In contrast, accessing values by index in a regular tuple can be frustrating, hard to read, and error-prone. This is especially true if the tuple has a lot of fields and is constructed far away from where you’re using it.

Note: In this tutorial, you’ll find different terms used to refer to Python’s namedtuple, its factory function, and its instances.

To avoid confusion, here’s a summary of how each term is used throughout the tutorial:

Term                              Meaning
namedtuple()                      The factory function
namedtuple, namedtuple class      The tuple subclass returned by namedtuple()
namedtuple instance, named tuple  An instance of a specific namedtuple class

You’ll find these terms used with their corresponding meaning throughout the tutorial.

Besides providing named fields, named tuples in Python offer the following features:

You can use namedtuple instances wherever you need a tuple-like object. They offer the added benefit of accessing values using field names and dot notation, which makes your code more readable and Pythonic.

With this brief introduction to namedtuple and its general features, you’re ready to explore how to create and use them effectively in your own code.

Creating Tuple-Like Classes With the namedtuple() Function

You use namedtuple() to create an immutable, tuple-like sequence with named fields. A popular example that you’ll often find in resources about namedtuple is defining a class to represent a mathematical point.

Depending on the problem, you’ll probably want to use an immutable data structure to represent your points. Here’s how you can create a two-dimensional point using a regular tuple:

>>> # Create a 2D point as a regular tuple
>>> point = (2, 4)
>>> point
(2, 4)

>>> # Access coordinate x
>>> point[0]
2
>>> # Access coordinate y
>>> point[1]
4

>>> # Try to update a coordinate value
>>> point[0] = 100
Traceback (most recent call last):
    ...
TypeError: 'tuple' object does not support item assignment

In this example, you create an immutable, two-dimensional point using a regular tuple. This code works. You have a point with two coordinates that you can access by index. The point is immutable, so you can’t modify the coordinates. However, do you think this code is readable? Can you tell upfront what the 0 and 1 indices mean?

To improve clarity, you can use a namedtuple like in the following code. Note that you need to import the function from the collections module first:

>>> from collections import namedtuple

>>> # Create a namedtuple type, Point
>>> Point = namedtuple("Point", "x y")

>>> point = Point(2, 4)
>>> point
Point(x=2, y=4)

>>> # Access the coordinates by field name
>>> point.x
2
>>> point.y
4

>>> # Access the coordinates by index
>>> point[0]
2
>>> point[1]
4

>>> point.x = 100
Traceback (most recent call last):
    ...
AttributeError: can't set attribute

>>> issubclass(Point, tuple)
True

Now you have a Point class with two appropriately named fields, .x and .y. Your point provides a descriptive string representation by default: Point(x=2, y=4).

You can access the coordinates with dot notation and the field names, which is convenient, readable, and explicit. You can also use indices to access each coordinate’s value if you prefer.

Note: As with regular tuples, named tuples are immutable. However, the values they store don’t necessarily have to be immutable.

It’s completely valid to create a tuple or a named tuple that holds mutable values:

>>> from collections import namedtuple

>>> Person = namedtuple("Person", "name children")
>>> john = Person("John Doe", ["Timmy", "Jimmy"])
>>> john
Person(name='John Doe', children=['Timmy', 'Jimmy'])
>>> id(john.children)
139695902374144

>>> john.children.append("Tina")
>>> john
Person(name='John Doe', children=['Timmy', 'Jimmy', 'Tina'])
>>> id(john.children)
139695902374144

>>> hash(john)
Traceback (most recent call last):
    ...
TypeError: unhashable type: 'list'

You can create named tuples that contain mutable objects. Then, you can modify the mutable objects in the underlying tuple. However, this doesn’t mean that you’re modifying the tuple itself. The tuple will continue being the same object.

Finally, tuples or named tuples with mutable values aren’t hashable, as you saw in the above example.

Read the full article at https://realpython.com/python-namedtuple/ »


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

June 16, 2025 02:00 PM UTC


Python Bytes

#436 Slow tests go last

<strong>Topics covered in this episode:</strong><br> <ul> <li><em>* Free-threaded Python no longer “experimental” as of Python 3.14</em>*</li> <li><strong><a href="https://github.com/livingbio/typed-ffmpeg?featured_on=pythonbytes">typed-ffmpeg</a></strong></li> <li><strong><a href="https://github.com/deepankarm/pyleak?featured_on=pythonbytes">pyleak</a></strong></li> <li><em>* <a href="https://timonweb.com/django/optimizing-test-execution-running-live_server-tests-last-with-pytest/?featured_on=pythonbytes">Optimizing Test Execution: Running live_server Tests Last with pytest</a></em>*</li> <li><strong>Extras</strong></li> <li><strong>Joke</strong></li> </ul><a href='https://www.youtube.com/watch?v=Mt7X3Q54lU4' style='font-weight: bold;'data-umami-event="Livestream-Past" data-umami-event-episode="436">Watch on YouTube</a><br> <p><strong>About the show</strong></p> <p>Sponsored by <strong>PropelAuth</strong>: <a href="https://pythonbytes.fm/propelauth66">pythonbytes.fm/propelauth66</a></p> <p><strong>Connect with the hosts</strong></p> <ul> <li>Michael: <a href="https://fosstodon.org/@mkennedy">@mkennedy@fosstodon.org</a> / <a href="https://bsky.app/profile/mkennedy.codes?featured_on=pythonbytes">@mkennedy.codes</a> (bsky)</li> <li>Brian: <a href="https://fosstodon.org/@brianokken">@brianokken@fosstodon.org</a> / <a href="https://bsky.app/profile/brianokken.bsky.social?featured_on=pythonbytes">@brianokken.bsky.social</a></li> <li>Show: <a href="https://fosstodon.org/@pythonbytes">@pythonbytes@fosstodon.org</a> / <a href="https://bsky.app/profile/pythonbytes.fm">@pythonbytes.fm</a> (bsky)</li> </ul> <p>Join us on YouTube at <a href="https://pythonbytes.fm/stream/live"><strong>pythonbytes.fm/live</strong></a> to be part of the audience. Usually <strong>Monday</strong> at 10am PT. Older video versions available there too.</p> <p>Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? 
Add your name and email to <a href="https://pythonbytes.fm/friends-of-the-show">our friends of the show list</a>, we'll never share it.</p> <p><strong>Brian #1: Free-threaded Python no longer “experimental” as of Python 3.14</strong></p> <ul> <li>“PEP 779 ("Criteria for supported status for free-threaded Python") has been accepted, which means free-threaded Python is now a supported build!” <a href="https://fosstodon.org/@hugovk@mastodon.social/114689715316210829">- Hugo van Kemenade</a></li> <li><a href="https://peps.python.org/pep-0779/?featured_on=pythonbytes">PEP 779 – Criteria for supported status for free-threaded Python</a></li> <li>As noted in the <a href="https://discuss.python.org/t/pep-779-criteria-for-supported-status-for-free-threaded-python/84319/123?featured_on=pythonbytes">discussion of PEP 779</a>, “The Steering Council (SC) approves PEP 779, with the effect of removing the “experimental” tag from the free-threaded build of Python 3.14.”</li> <li>We are in Phase II then.</li> <li>“We are confident that the project is on the right path, and we appreciate the continued dedication from everyone working to make free-threading ready for broader adoption across the Python community.”</li> <li>“Keep in mind that any decision to transition to Phase III, with free-threading as the default or sole build of Python is still undecided, and dependent on many factors both within CPython itself and the community. We leave that decision for the future.”</li> <li>How long will all this take? According to Thomas Wouters, <a href="https://social.coop/@Yhg1s/114692495471337607?featured_on=pythonbytes">a few years, at least</a>: “In other words: it'll be a few years at least. 
It can't happen before 3.16 (because we won't have Stable ABI support until 15) and may well take longer.”</li> </ul> <p><strong>Michael #2:</strong> <a href="https://github.com/livingbio/typed-ffmpeg?featured_on=pythonbytes">typed-ffmpeg</a></p> <ul> <li><p><strong>typed-ffmpeg</strong> offers a modern, Pythonic interface to FFmpeg, providing extensive support for complex filters with detailed typing and documentation.</p></li> <li><p>Inspired by ffmpeg-python, this package enhances functionality by addressing common limitations, such as lack of IDE integration and comprehensive typing, while also introducing new features like JSON serialization of filter graphs and automatic FFmpeg validation.</p></li> <li><p>Features :</p> <ul> <li><strong>Zero Dependencies:</strong> Built purely with the Python standard library, ensuring maximum compatibility and security.</li> <li><strong>User-Friendly:</strong> Simplifies the construction of filter graphs with an intuitive Pythonic interface.</li> <li><strong>Comprehensive FFmpeg Filter Support:</strong> Out-of-the-box support for most FFmpeg filters, with IDE auto-completion.</li> <li><strong>Integrated Documentation:</strong> In-line docstrings provide immediate reference for filter usage, reducing the need to consult external documentation.</li> <li><strong>Robust Typing:</strong> Offers static and dynamic type checking, enhancing code reliability and development experience.</li> <li><strong>Filter Graph Serialization:</strong> Enables saving and reloading of filter graphs in JSON format for ease of use and repeatability.</li> <li><strong>Graph Visualization:</strong> Leverages graphviz for visual representation, aiding in understanding and debugging.</li> <li><strong>Validation and Auto-correction:</strong> Assists in identifying and fixing errors within filter graphs.</li> <li><strong>Input and Output Options Support:</strong> Provide a more comprehensive interface for input and output options, including support for 
additional codecs and formats.</li> <li><strong>Partial Evaluation:</strong> Enhance the flexibility of filter graphs by enabling partial evaluation, allowing for modular construction and reuse.</li> <li><strong>Media File Analysis:</strong> Built-in support for analyzing media files using FFmpeg's ffprobe utility, providing detailed metadata extraction with both dictionary and dataclass interfaces.</li> </ul></li> </ul> <p><strong>Michael #3:</strong> <a href="https://github.com/deepankarm/pyleak?featured_on=pythonbytes">pyleak</a></p> <ul> <li>Detect leaked asyncio tasks, threads, and event loop blocking with stack trace in Python. Inspired by goleak.</li> <li>Use as context managers or function decorators</li> <li>When using no_task_leaks, you get detailed stack trace information showing exactly where leaked tasks are executing and where they were created.</li> <li>Even has great examples and a pytest plugin.</li> </ul> <p><strong>Brian #4: <a href="https://timonweb.com/django/optimizing-test-execution-running-live_server-tests-last-with-pytest/?featured_on=pythonbytes">Optimizing Test Execution: Running live_server Tests Last with pytest</a></strong></p> <ul> <li><p>Tim Kamanin</p></li> <li><p>“When working with <strong>Django</strong> applications, it's common to have a mix of fast unit tests and slower end-to-end (E2E) tests that use <strong>pytest</strong>'s <code>live_server</code> fixture and browser automation tools like <strong>Playwright</strong> or <strong>Selenium</strong>. 
”</p></li> <li><p>Tim is running E2E tests last for</p> <ul> <li>Faster feedback from quick tests</li> <li>To not tie up resources early in the test suite.</li> </ul></li> <li><p>He did this with</p> <ul> <li><p>custom “e2e” marker</p></li> <li><p>Implementing a </p> <pre><code>pytest_collection_modifyitems </code></pre> <p>hook function to look for tests using the </p> <pre><code>live_server </code></pre> <p>fixture, and for them</p> <ul> <li>automatically add the <code>e2e</code> marker to those tests</li> <li>move those tests to the end</li> </ul></li> </ul></li> <li><p>The reason for the marker is to be able to</p> <ul> <li>Just run e2e tests with <code>-m e2e</code></li> <li>Avoid running them sometimes with <code>-m "not e2e"</code></li> </ul></li> <li><p>Cool small writeup.</p> <ul> <li>The technique works for any system that has some tests that are slower or resource bound based on a particular fixture or set of fixtures.</li> </ul></li> </ul> <p><strong>Extras</strong></p> <p>Brian:</p> <ul> <li><a href="https://discuss.python.org/t/is-free-threading-our-only-option/91775?featured_on=pythonbytes">Is Free-Threading Our Only Option?</a> - Interesting discussion started by Eric Snow and recommended by John Hagen</li> <li><a href="https://hugovk.dev/blog/2025/free-threaded-python-on-github-actions/?featured_on=pythonbytes">Free-threaded Python on GitHub Actions</a> - How to add FT tests to your projects, by Hugo van Kemenade</li> </ul> <p>Michael:</p> <ul> <li>New course! 
<a href="https://training.talkpython.fm/courses/llm-building-blocks-for-python?featured_on=pythonbytes">LLM Building Blocks in Python</a></li> <li><a href="https://talkpython.fm/blog/posts/deep-dive-retrospective-at-talk-python/?featured_on=pythonbytes">Talk Python Deep Dives Complete: 600K Words of Talk Python Insights</a></li> <li>.folders on Linux <ul> <li>Write up on <a href="https://blobs.pythonbytes.fm/xdg-config-home-v2.html">XDG for Python devs</a>.</li> </ul></li> <li><a href="https://blobs.pythonbytes.fm/keep-pulling-back-in.jpg">They keep pulling me back</a> - <a href="https://help.openai.com/en/articles/9624314-model-release-notes?featured_on=pythonbytes">ChatGPT Pro with o3-pro</a></li> <li>Python Bytes is the <a href="https://goodpods.com/leaderboard/top-100-shows-by-category/news/tech-news?period=month#67232899">#1 Python news podcast and #17 of all tech news podcasts</a>.</li> <li><a href="https://pythoninsider.blogspot.com/2025/06/python-3134-31211-31113-31018-and-3923.html?featured_on=pythonbytes">Python 3.13.4, 3.12.11, 3.11.13, 3.10.18 and 3.9.23 are now available</a></li> <li><a href="https://pythoninsider.blogspot.com/2025/06/python-3135-is-now-available.html?featured_on=pythonbytes">Python 3.13.5 is now available!</a></li> </ul> <p><strong>Joke:</strong> <a href="https://x.com/PR0GRAMMERHUM0R/status/1930655881718382721?featured_on=pythonbytes">Naming is hard</a></p>

June 16, 2025 08:00 AM UTC


Ned Batchelder

Math factoid of the day: 63

63 is a centered octahedral number. That means if you build an approximation of an octahedron with cubes, one size of octahedron will have 63 cubes.

In the late 1700s, René Just Haüy developed a theory about how crystals formed: successive layers of fundamental primitives in orderly arrangements. One of those arrangements was stacking cubes together to make an octahedron.

Start with one cube:

Just one lonely cube

Add six more cubes around it, one on each face. Now we have seven:

Seven cubes as a crude octahedron

Add another layer, adding a cube to touch each visible cube, making 25:

25 cubes arranged like an octahedron five cubes wide

One more layer and we have a total of 63:

63 cubes arranged like an octahedron seven cubes wide

The remaining numbers in the sequence less than 10,000 are 129, 231, 377, 575, 833, 1159, 1561, 2047, 2625, 3303, 4089, 4991, 6017, 7175, 8473, 9919.

63 also shows up in the Delannoy numbers: the number of ways to traverse a grid from the lower left corner to upper right using only steps north, east, or northeast. Here are the 63 ways of moving on a 3×3 grid:

63 different ways to traverse a 3x3 grid

(Diagram from Wikipedia)

In fact, the number of cubes in a Haüy octahedron with N layers is the same as the number of Delannoy steps on a 3×N grid!

Since the two ideas are both geometric and fairly simple, I would love to find a geometric explanation for the correspondence. The octahedron is three-dimensional, and the Delannoy grids have that tantalizing 3 in them. It seems like there should be a way to convert Haüy coordinates to Delannoy coordinates to show how they relate. But I haven’t found one...
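The correspondence itself is at least easy to verify by brute force. A quick sketch, using the closed form (2n+1)(2n² + 2n + 3)/3 for the centered octahedral numbers and the standard Delannoy recurrence:

```python
from functools import lru_cache

def centered_octahedral(n):
    """Cubes in the (n+1)-layer Haüy octahedron: (2n+1)(2n^2+2n+3)/3."""
    return (2 * n + 1) * (2 * n * n + 2 * n + 3) // 3

@lru_cache(maxsize=None)
def delannoy(m, n):
    """Paths from (0,0) to (m,n) using only north, east, and northeast steps."""
    if m == 0 or n == 0:
        return 1
    return delannoy(m - 1, n) + delannoy(m, n - 1) + delannoy(m - 1, n - 1)

print([centered_octahedral(n) for n in range(6)])   # [1, 7, 25, 63, 129, 231]
print(all(centered_octahedral(n) == delannoy(3, n) for n in range(50)))  # True
```

The numbers match as far as you care to check, but of course a brute-force check is no substitute for the geometric explanation.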

•    •    •

Colophon: I made the octahedron diagrams by asking Claude to write a Python program to do it. It wasn’t a fast process because it took pushing and prodding to get the diagrams to come out the way I liked. But Claude was very competent, and I could think about the results rather than about projections or color spaces. I could dip into it for 10 minutes at a time over a number of days without having to somehow reconstruct a mental context.

This kind of casual hobby programming is perfect for AI assistance. I don’t need the code to be perfect or even good, I just want the diagrams to be nice. I don’t have the focus time to learn how to write the program, so I can leave it to an imperfect assistant.

June 16, 2025 04:00 AM UTC

June 15, 2025


Ed Crewe

Talk about Cloud Prices at PyConLT 2025


Introduction to Cloud Pricing

I am looking forward to speaking at PyConLT 2025
My talk is called Cutting the Price of Scraping Cloud Costs (video)

It's been a while (12 years!) since my last Python conference, EuroPython Florence 2012, where I spoke as a Django web developer, although I did give a Golang talk at KubeCon USA last year.

I work at EDB, the Postgres company, on our Postgres AI product, the cloud version of which runs across the main cloud providers: AWS, Azure and GCP.

The team I am in handles the identity management and billing components of the product. So whilst I am mainly a Golang micro-service developer, I have dipped my toe into Data Science, having rewritten our cloud prices ETL using Python & Airflow, which is the subject of my talk in Lithuania.

Cloud pricing can be surprisingly complex ... and the price lists are not small.

The full price lists for the 3 CSPs together contain almost 5 million prices, known as SKUs (Stock Keeping Unit prices):

csp x service x type x tier x region
3    x  200      x 50     x 3     x 50        = 4.5 million

csp = AWS, Azure and GCP

service = vms, k8s, network, load balancer, storage etc.

type = e.g. storage - general purpose E2, N1 ... accelerated A1, A2  multiplied by various property sizes

tier  = T-shirt size tiers of usage, ie more use = cheaper rate - small, medium, large

region = us-east-1, us-west-2, af-south-1, etc.

We need to gather all the latest service SKUs that our Postgres AI may use and total them up as a cost estimate for when customers select the various options for creating or adding to their installation, applying the additional pricing for our product and any private offer discounts as part of this process.

Therefore we needed to build a data pipeline to gather the SKUs and keep them current.

Previously we used a 3rd party kubecost-based provider's data; however, our usage was not sufficient to justify paying for this particular cloud service once its free usage expired.

Hence we needed to rewrite our cloud pricing data pipeline. This pipeline is in Apache Airflow but it could equally be in Dagster or any other data pipeline framework.

My talk deals with the wider points around cloud pricing, refactoring a data pipeline, and pipeline framework options. But here I want to provide more detail on the data pipeline's Python code, its use of embedded Postgres and Click, and the benefits for development and testing: some things I didn't have room for in the talk.


Outline of our use of Data Pipelines

Airflow, Dagster, etc. provide many tools for pipeline development, notably a local development mode for running up the pipeline framework locally and doing test runs. Even with some reloading on edit, it can still be a slow process to run up a pipeline and then execute the full set of steps, known as a directed acyclic graph (DAG).

One way to improve the developer experience is to encapsulate each DAG step's code as much as possible, removing the use of shared state where viable and allowing individual steps to be tested separately and rapidly with fixture data, with fast stand-up and tear-down of temporary embedded storage.

To avoid shared state persisting across the whole pipeline, we use extract transform load (ETL) within each step, rather than across the whole pipeline. This enables running and testing individual steps functionally, outside the pipeline.
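A minimal sketch of what ETL-per-step looks like (the function and row names here are illustrative assumptions, not the talk's actual code): the step takes its dependencies as arguments, so a test can pass fixture data and a throwaway store.

```python
# Hypothetical sketch: one self-contained ETL step with injected
# dependencies, runnable and testable outside the pipeline framework.
def normalise(entry):
    """Transform one raw SKU entry into a standard row."""
    return {"sku_id": entry["id"], "region": entry.get("region", "global")}

def price_step(store, fetch_skus):
    raw = fetch_skus()                    # extract
    rows = [normalise(e) for e in raw]    # transform
    store.extend(rows)                    # load (a list stands in for the DB)

# In a test, fetch_skus is just a lambda returning fixture data:
rows = []
price_step(rows, lambda: [{"id": "sku-1", "region": "us-east-1"}])
print(rows)
```

In production the same step would be handed the real scraper and a database connection, but nothing about the step's logic changes.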


The Scraper Class

We need a standard scraper class to fetch the cloud prices from each CSP so use an abstract base class.


from abc import ABC, abstractmethod

class BaseScraper(ABC):
    """Abstract base class for scrapers"""

    batch = 500
    conn = None
    unit_map = {"FAIL": ""}
    root_url = ""

    def map_units(self, entry, key):
        """Standardize naming of units between CSPs"""
        return self.unit_map.get(entry.get(key, "FAIL"), entry[key])

    @abstractmethod
    def scrape_sku(self):
        """Scrape prices from the CSP bulk JSON API - uses CSP-specific methods"""

    def bulk_insert_rows(self, rows):
        """Bulk insert batches of rows - note that Psycopg >= 3.1 uses pipeline mode"""
        query = """INSERT INTO api_price.infra_price VALUES
        (%(sku_id)s, %(cloud_provider)s, %(region)s, %(sku_name)s, %(end_usage_amount)s)"""
        with self.conn.cursor() as cur:
            cur.executemany(query, rows)


This has 3 common methods:

  1. map_units - maps units to common names across all CSPs
  2. scrape_sku - the top-level scrape method, with CSP-specific differences handled in sub-methods called from it
  3. bulk_insert_rows - the main concrete method used by all scrapers

To bulk insert 500 rows per query we use Psycopg 3 pipeline mode, so it can send batch after batch of updates without waiting for a response.

The database update against local embedded Postgres is faster than the time to scrape the remote web site SKUs.
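Sketched out, the batching side of this is just chunking the scraped rows into groups of `batch` before handing each group to `executemany` (the chunking helper below is illustrative; on Psycopg >= 3.1 `executemany` itself uses pipeline mode):

```python
from itertools import islice

def batched(rows, size=500):
    """Yield lists of at most `size` rows, ready for one executemany call."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

# Hypothetical use inside a scraper (assumes psycopg >= 3.1, where
# executemany batches over the wire without waiting for each response):
#
# with self.conn.cursor() as cur:
#     for chunk in batched(all_rows, size=self.batch):
#         cur.executemany(query, chunk)
```

Keeping the chunking as a pure helper means it can be unit tested without a database connection at all.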


The largest part of the Extract is done at this point. With the kubecost data dump we had to load all 5 million SKUs, then query out the 120 thousand for our product. Scraping the sources directly, we only need to ingest those 120k SKUs, which saves handling 97.6% of the data!


So the resulting speed is sufficient, although not as performant as pg_dump loading, which uses COPY.


Unfortunately Python Psycopg is significantly slower when using cursor.copy, which militated against loading zipped-up Postgres dumps from Python. Hence all the data artefact creation and loading simply uses the pg_dump utility wrapped as a Python shell command.

There is no need to use Python here when there is the tried and tested C-based pg_dump utility for it, which also ensures compatibility outside our pipeline: a later-version pg_dump can always handle earlier Postgres dumps.
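A hedged sketch of such a wrapper (the database name and exact flags here are illustrative, not our actual invocation):

```python
import subprocess

def pg_dump_cmd(port, out_file, dbname="prices"):
    """Build a pg_dump invocation; the "prices" db name is a placeholder."""
    return [
        "pg_dump",
        "--host", "localhost",
        "--port", str(port),
        "--format", "plain",   # plain SQL with COPY, no compression
        "--file", out_file,
        dbname,
    ]

def dump_to_sql(port, out_file):
    """Shell out to the tried-and-tested C utility rather than COPY in Python."""
    subprocess.run(pg_dump_cmd(port, out_file), check=True)
```

Building the command as a list keeps it testable without actually touching a database, and `--format plain` is what keeps the artefact grep-able later.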


We don't need to retain a long history of artefacts, since it is public data and never needs to be reverted.

This allows us a low retention level, cleaning out most of the old dumps when a new one is created, so any storage saving from compression is negligible.

Therefore we avoid pg_dump compression, since it can be significantly slower, especially if the data already contains compressed blobs. Plain SQL COPY also allows for data inspection if required - e.g. grepping for a SKU when debugging why a price may be missing.


Postgres Embedded wrapped with Go

Unlike MySQL, Postgres doesn't do in-memory databases. The equivalent for a temporary or test-run database lifetime is the embedded version of Postgres, run from an auto-created temp folder of files.
Python sadly doesn't have a maintained wrapper for embedded Postgres - the project https://github.com/Simulmedia/pyembedpg is abandoned 😢

Hence we use the most up-to-date wrapper, from Go, running the Go binary via a Python shell command.
It still lags behind by one Postgres version, so it's on Postgres 16 rather than the latest, 17.
But for the purposes of embedded use that is irrelevant.
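Sketched in Python, the shelling-out looks something like this. The `pgembed-go` binary name and its flags are placeholders for whatever the Go wrapper actually exposes:

```python
import secrets
import shutil
import subprocess
import tempfile

def embed_cmd(port, folder):
    # "pgembed-go" and its flags are assumptions standing in for the
    # real binary built around the Go embedded-postgres library
    return ["pgembed-go", "--port", str(port), "--data-dir", folder]

def setup_pg_embed():
    """Start a throwaway embedded Postgres for one step."""
    port = 5000 + secrets.randbelow(1000)         # random-ish test port
    folder = tempfile.mkdtemp(prefix="pgembed_")  # auto-created temp folder
    proc = subprocess.Popen(embed_cmd(port, folder))
    return folder, port, proc

def teardown_pg_embed(proc, folder):
    """Stop the server and discard its temporary files at step end."""
    proc.terminate()
    proc.wait(timeout=10)
    shutil.rmtree(folder, ignore_errors=True)
```

The setup/teardown pair is what gives each step its own short-lived database, torn down as soon as the step's SQL artefact has been dumped.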

By using a separate temporary Postgres per step, we can save a dumped SQL artefact at the end of each step and need no data dependency between steps, meaning individual step retry, in parallel, just works.
The performance of a localhost dump to a socket is also superior.
By processing everything in the same (if embedded) version of our final target database as the Cloud Price Go micro-service, we remove any SQL compatibility issues and ensure full PostgreSQL functionality is available.

The final data artefacts will be loaded to a Postgres cluster price-schema micro-service running on CloudNativePG.

Use a Click wrapper with Tests

The click package provides all the command-line functionality for our pipeline:

> pscraper -h

Usage: pscraper [OPTIONS] COMMAND [ARGS]...

   price-scraper: python web scraping of CSP prices for api-price

Options:
  -h, --help  Show this message and exit.

Commands:
  awsscrape    Scrape prices from AWS
  azurescrape  Scrape prices from Azure
  delold       Delete old blob storage files, default all over 12 weeks old are deleted
  gcpscrape    Scrape prices from GCP - set env GCP_BILLING_KEY
  pgdump       Dump postgres file and upload to cloud storage - set env STORAGE_KEY
               > pscraper pgdump --port 5377 --file price.sql
  pgembed      Run up local embeddedPG on a random port for tests
               > pscraper pgembed
  pgload       Load schema to local embedded postgres for testing
               > pscraper pgload --port 5377 --file price.sql


This caters for developing the step code entirely outside the pipeline, for development and debugging.
We can run pgembed to create a local db and pgload to add the price schema, then run individual scrapes from a pipenv pip install -e (editable) version of the price-scraper package.


For unit testing we can create a mock response object for the data scrapers that returns different fixture payloads based on the query, and monkeypatch it in. This allows us to test the whole scrape and data artefact creation ETL cycle as functional unit tests.

Any issues with source data changes can be replicated via a fixture for regression tests.

class MockResponse:
    """Fake to return fixture value of requests.get() for testing scrape parsing"""

    name = "Mock User"
    payload = {}
    content = ""
    status_code = 200
    url = "http://mock_url"

    def __init__(self, payload=None, url="http://mock_url"):
        self.url = url
        self.payload = payload or {}
        self.content = str(self.payload)

    def json(self):
        return self.payload


def mock_aws_get(url, **kwargs):
    """Return the fixture JSON that matches the URL used"""
    for key, fix in fixtures.items():
        if key in url:
            return MockResponse(payload=fix, url=url)
    return MockResponse()


class TestAWSScrape(TestCase):
    """Tests for the 'pscraper awsscrape' command"""

    @classmethod
    def setUpClass(cls):
        """Simple monkeypatch in mock handlers for all tests in the class"""
        psycopg.connect = MockConn
        requests.get = mock_aws_get
        # confirm that requests is patched hence returns short fixture of JSON from the AWS URLs
        result = requests.get("{}/AmazonS3/current/index.json".format(ROOT))
        assert len(result.json().keys()) > 5 and len(result.content) < 2000

A simple DAG with Soda Data validation

The click commands for each DAG are imported at the top, one for the scrape and one for embedded Postgres; the DAG just becomes a wrapper to run them, adding Soda data validation of the scraped data ...

def scrape_azure():
    """Scrape Azure via API public json web pages"""
    from price_scraper.commands import azurescrape, pgembed
    folder, port = setup_pg_db(PORT)
    error = azurescrape.run_azure_scrape(port, HOST)
    if not error:
        error = csp_dump(port, "azure")
    if error:
        pgembed.teardown_pg_embed(folder)
        notify_slack("azure", error)
        raise AirflowFailException(error)

    data_test = SodaScanOperator(
        dag=dag,
        task_id="data_test",
        data_sources=[
            {
                "data_source_name": "embedpg",
                "soda_config_path": "price-scraper/soda/configuration_azure.yml",
            }
        ],
        soda_cl_path="price-scraper/soda/price_azure_checks.yml",
    )
    data_test.execute(dict())
    pgembed.teardown_pg_embed(folder)


We set up a new embedded Postgres (it takes a few seconds) and then scrape directly into it.


We then use the SodaScanOperator to check the data we have scraped. If there is no error we dump to blob storage; otherwise we notify Slack with the error and raise it, ending the DAG.

Our Soda tests check that the number of prices, and the prices themselves, are in the ranges they should be for each service. We also check we have the number of tiered rates that we expect: over 10 starting usage rates and over 3000 specific tiered prices.

If the Soda tests pass, we dump to cloud storage and tear down the temporary Postgres. A final step aggregates each step's data together. We save the money and maintenance of running a persistent database cluster in the cloud for our pipeline.


June 15, 2025 06:00 PM UTC


PyPy

How fast can the RPython GC allocate?

While working on a paper about allocation profiling in VMProf I got curious about how quickly the RPython GC can allocate an object. I wrote a small RPython benchmark program to get an idea of the order of magnitude.

The basic idea is to just allocate an instance in a tight loop:

class A(object):
    pass

def run(loops):
    # preliminary idea, see below
    for i in range(loops):
        a = A()
        a.i = i

The RPython type inference will find out that instances of A have a single i field, which is an integer. In addition to that field, every RPython object needs one word of GC meta-information. Therefore one instance of A needs 16 bytes on a 64-bit architecture.

However, measuring like this is not good enough, because the RPython static optimizer would remove the allocation since the object isn't used. But we can confuse the escape analysis sufficiently by always keeping two instances alive at the same time:

class A(object):
    pass

def run(loops):
    a = prev = None
    for i in range(loops):
        prev = a
        a = A()
        a.i = i
    print(prev, a) # print the instances at the end

(I confirmed that the allocation isn't being removed by looking at the C code that the RPython compiler generates from this.)

This is doing a little bit more work than needed, because of the a.i = i instance attribute write. We can also (optionally) leave the field uninitialized.

def run(initialize_field, loops):
    t1 = time.time()
    if initialize_field:
        a = prev = None
        for i in range(loops):
            prev = a
            a = A()
            a.i = i
        print(prev, a) # make sure always two objects are alive
    else:
        a = prev = None
        for i in range(loops):
            prev = a
            a = A()
        print(prev, a)
    t2 = time.time()
    print(t2 - t1, 's')
    object_size_in_words = 2 # GC header, one integer field
    mem = loops * 8 * object_size_in_words / 1024.0 / 1024.0 / 1024.0
    print(mem, 'GB')
    print(mem / (t2 - t1), 'GB/s')

Then we need to add some RPython scaffolding:

def main(argv):
    loops = int(argv[1])
    with_init = bool(int(argv[2]))
    if with_init:
        print("with initialization")
    else:
        print("without initialization")
    run(with_init, loops)
    return 0

def target(*args):
    return main

To build a binary:

pypy rpython/bin/rpython targetallocatealot.py

Which will turn the RPython code into C code and use a C compiler to turn that into a binary, containing both our code above as well as the RPython garbage collector.

Then we can run it (all results again from my AMD Ryzen 7 PRO 7840U, running Ubuntu Linux 24.04.2):

$ ./targetallocatealot-c 1000000000 0
without initialization
<A object at 0x7c71ad84cf60> <A object at 0x7c71ad84cf70>
0.433825 s
14.901161 GB
34.348322 GB/s
$ ./targetallocatealot-c 1000000000 1
with initialization
<A object at 0x71b41c82cf60> <A object at 0x71b41c82cf70>
0.501856 s
14.901161 GB
29.692100 GB/s

Let's compare it with the Boehm GC:

$ pypy rpython/bin/rpython --gc=boehm --output=targetallocatealot-c-boehm targetallocatealot.py 
...
$ ./targetallocatealot-c-boehm 1000000000 0
without initialization
<A object at 0xffff8bd058a6e3af> <A object at 0xffff8bd058a6e3bf>
9.722585 s
14.901161 GB
1.532634 GB/s
$ ./targetallocatealot-c-boehm 1000000000 1
with initialization
<A object at 0xffff88e1132983af> <A object at 0xffff88e1132983bf>
9.684149 s
14.901161 GB
1.538717 GB/s

This is not a fair comparison, because the Boehm GC uses conservative stack scanning, therefore it cannot move objects, which requires much more complicated allocation.

Let's look at perf stats

We can use perf to get some statistics about the executions:

$ perf stat -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./targetallocatealot-c 10000000000 0
without initialization
<A object at 0x7aa260e35980> <A object at 0x7aa260e35990>
4.301442 s
149.011612 GB
34.642245 GB/s

 Performance counter stats for './targetallocatealot-c 10000000000 0':

     7,244,117,828      cache-references                                                      
        23,446,661      cache-misses                     #    0.32% of all cache refs         
    21,074,240,395      cycles                                                                
   110,116,790,943      instructions                     #    5.23  insn per cycle            
    20,024,347,488      branches                                                              
             1,287      faults                                                                
                24      migrations                                                            

       4.303071693 seconds time elapsed

       4.297557000 seconds user
       0.003998000 seconds sys

$ perf stat -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./targetallocatealot-c 10000000000 1
with initialization
<A object at 0x77ceb0235980> <A object at 0x77ceb0235990>
5.016772 s
149.011612 GB
29.702688 GB/s

 Performance counter stats for './targetallocatealot-c 10000000000 1':

     7,571,461,470      cache-references                                                      
       241,915,266      cache-misses                     #    3.20% of all cache refs         
    24,503,497,532      cycles                                                                
   130,126,387,460      instructions                     #    5.31  insn per cycle            
    20,026,280,693      branches                                                              
             1,285      faults                                                                
                21      migrations                                                            

       5.019444749 seconds time elapsed

       5.012924000 seconds user
       0.005999000 seconds sys

This is pretty cool, we can run this loop with >5 instructions per cycle. Every allocation takes 110116790943 / 10000000000 ≈ 11 instructions and 21074240395 / 10000000000 ≈ 2.1 cycles, including the loop around it.

How often does the GC run?

The RPython GC queries the L2 cache size to determine the size of the nursery. We can find out what it is by turning on PYPYLOG, selecting the proper logging categories, and printing to stdout via the :- suffix:

$ PYPYLOG=gc-set-nursery-size,gc-hardware:- ./targetallocatealot-c 1 1
[f3e6970465723] {gc-set-nursery-size
nursery size: 270336
[f3e69704758f3] gc-set-nursery-size}
[f3e697047b9a1] {gc-hardware
L2cache = 1048576
[f3e69705ced19] gc-hardware}
[f3e69705d11b5] {gc-hardware
memtotal = 32274210816.000000
[f3e69705f4948] gc-hardware}
[f3e6970615f78] {gc-set-nursery-size
nursery size: 4194304
[f3e697061ecc0] gc-set-nursery-size}
with initialization
NULL <A object at 0x7fa7b1434020>
0.000008 s
0.000000 GB
0.001894 GB/s

So the nursery is 4 MiB. This means that when we allocate 14.9 GiB the GC needs to perform 10000000000 * 16 / 4194304 ≈ 38146 minor collections. Let's confirm that:

$ PYPYLOG=gc-minor:out ./targetallocatealot-c 10000000000 1
with initialization
<A object at 0x7991e3835980> <A object at 0x7991e3835990>
5.315511 s
149.011612 GB
28.033356 GB/s
$ head out
[f3ee482f4cd97] {gc-minor
[f3ee482f53874] {gc-minor-walkroots
[f3ee482f54117] gc-minor-walkroots}
minor collect, total memory used: 0
number of pinned objects: 0
total size of surviving objects: 0
time taken: 0.000029
[f3ee482f67b7e] gc-minor}
[f3ee4838097c5] {gc-minor
[f3ee48380c945] {gc-minor-walkroots
$ grep "{gc-minor-walkroots" out | wc -l
38147

Each minor collection is very quick, because a minor collection is O(surviving objects), and in this program only one object survives each time (the other instance is in the process of being allocated). Also, the GC root shadow stack is only one entry, so walking that is super quick as well. The time the minor collections take is logged to the out file:

$ grep "time taken" out | tail
time taken: 0.000002
time taken: 0.000002
time taken: 0.000002
time taken: 0.000002
time taken: 0.000002
time taken: 0.000002
time taken: 0.000002
time taken: 0.000003
time taken: 0.000002
time taken: 0.000002
$ grep "time taken" out | grep -o "0.*" | numsum
0.0988160000000011

(This number is super approximate due to float formatting rounding.)

That means that 0.0988160000000011 / 5.315511 ≈ 2% of the time is spent in the GC.

What does the generated machine code look like?

The allocation fast path of the RPython GC is a simple bump pointer, in Python pseudo-code it would look roughly like this:

result = gc.nursery_free
# Move nursery_free pointer forward by totalsize
gc.nursery_free = result + totalsize
# Check if this allocation would exceed the nursery
if gc.nursery_free > gc.nursery_top:
    # If it does => collect the nursery and allocate afterwards
    result = collect_and_reserve(totalsize)
result.hdr = <GC flags and type id of A>

So we can disassemble the compiled binary targetallocatealot-c and try to find the equivalent logic in machine code. I'm super bad at reading machine code, but I tried to annotate what I think is the core loop (the version without initializing the i field) below:

    ...
    cb68:   mov    %rbx,%rdi 
    cb6b:   mov    %rdx,%rbx

    # initialize object header of object allocated in previous iteration
    cb6e:   movq   $0x4c8,(%rbx)

    # loop termination check
    cb75:   cmp    %rbp,%r12
    cb78:   je     ccb8

    # load nursery_free
    cb7e:   mov    0x33c13(%rip),%rdx

    # increment loop counter
    cb85:   add    $0x1,%rbp

    # add 16 (size of object) to nursery_free
    cb89:   lea    0x10(%rdx),%rax

    # compare nursery_top with new nursery_free
    cb8d:   cmp    %rax,0x33c24(%rip)

    # store new nursery_free
    cb94:   mov    %rax,0x33bfd(%rip)

    # if new nursery_free exceeds nursery_top, fall through to slow path, if not, start at top
    cb9b:   jae    cb68

    # slow path from here on:
    # save live object from last iteration to GC shadow stack
    cb9d:   mov    %rbx,-0x8(%rcx)
    cba1:   mov    %r13,%rdi
    cba4:   mov    $0x10,%esi
    # do minor collection
    cba9:   call   20800 <pypy_g_IncrementalMiniMarkGC_collect_and_reserve>
    ...

Running the benchmark as regular Python code

So far we ran this code as RPython, i.e. type inference is performed and the program is translated to a C binary. We can also run it on top of PyPy, as a regular Python3 program. However, an instance of a user-defined class in regular Python when run on PyPy is actually a much larger object, due to dynamic typing. It's at least 7 words, which is 56 bytes.

However, we can simply use int objects instead. Integers are allocated on the heap and consist of two words, one for the GC and one with the machine-word-sized integer value, if the integer fits into a signed 64-bit representation (otherwise a less compact different representation is used, which can represent arbitrarily large integers).

Therefore, we can simply use this kind of code:

import sys, time


def run(loops):
    t1 = time.time()
    a = prev = None
    for i in range(loops):
        prev = a
        a = i
    print(prev, a) # make sure always two objects are alive
    t2 = time.time()
    object_size_in_words = 2 # GC header, one integer field
    mem = loops * 8 * object_size_in_words / 1024.0 / 1024.0 / 1024.0
    print(mem, 'GB')
    print(mem / (t2 - t1), 'GB/s')

def main(argv):
    loops = int(argv[1])
    run(loops)
    return 0

if __name__ == '__main__':
    sys.exit(main(sys.argv))

In this case we can't really leave the value uninitialized though.

We can run this both with and without the JIT:

$ pypy3 allocatealot.py 1000000000
999999998 999999999
14.901161193847656 GB
17.857494904899553 GB/s
$ pypy3 --jit off allocatealot.py 1000000000
999999998 999999999
14.901161193847656 GB
0.8275382375297171 GB/s

This is obviously much less efficient than the C code; the PyPy JIT generates much less efficient machine code than GCC. Still, "only" twice as slow is kind of cool anyway.

(Running it with CPython doesn't really make sense for this measurement, since CPython ints are bigger – sys.getsizeof(5) reports 28 bytes.)

The machine code that the JIT generates

Unfortunately it's a bit of a journey to show the machine code that PyPy's JIT generates for this. First we need to run with all jit logging categories:

$ PYPYLOG=jit:out pypy3 allocatealot.py 1000000000

Then we can read the log file to find the trace IR for the loop under the logging category jit-log-opt:

+532: label(p0, p1, p6, p9, p11, i34, p13, p19, p21, p23, p25, p29, p31, i44, i35, descr=TargetToken(137358545605472))
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#24 FOR_ITER')

# are we at the end of the loop
+552: i45 = int_lt(i44, i35)
+555: guard_true(i45, descr=<Guard0x7ced4756a160>) [p0, p6, p9, p11, p13, p19, p21, p23, p25, p29, p31, p1, i44, i35, i34]
+561: i47 = int_add(i44, 1)
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#26 STORE_FAST')
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-10~#28 LOAD_FAST')
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-10~#30 STORE_FAST')
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-11~#32 LOAD_FAST')
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-11~#34 STORE_FAST')
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-11~#36 JUMP_ABSOLUTE')

# update iterator object
+565: setfield_gc(p25, i47, descr=<FieldS pypy.module.__builtin__.functional.W_IntRangeIterator.inst_current 8>)
+569: guard_not_invalidated(descr=<Guard0x7ced4756a1b0>) [p0, p6, p9, p11, p19, p21, p23, p25, p29, p31, p1, i44, i34]

# check for signals
+569: i49 = getfield_raw_i(137358624889824, descr=<FieldS pypysig_long_struct_inner.c_value 0>)
+582: i51 = int_lt(i49, 0)
+586: guard_false(i51, descr=<Guard0x7ced4754db78>) [p0, p6, p9, p11, p19, p21, p23, p25, p29, p31, p1, i44, i34]
debug_merge_point(0, 0, 'run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#24 FOR_ITER')

# allocate the integer (allocation sunk to the end of the trace)
+592: p52 = new_with_vtable(descr=<SizeDescr 16>)
+630: setfield_gc(p52, i34, descr=<FieldS pypy.objspace.std.intobject.W_IntObject.inst_intval 8 pure>)
+634: jump(p0, p1, p6, p9, p11, i44, p52, p19, p21, p23, p25, p29, p31, i47, i35, descr=TargetToken(137358545605472))

To find the machine code address of the trace, we need to search for this line:

Loop 1 (run;/home/cfbolz/projects/gitpypy/allocatealot.py:6-9~#24 FOR_ITER) \
    has address 0x7ced473ffa0b to 0x7ced473ffbb0 (bootstrap 0x7ced473ff980)

Then we can use a script in the PyPy repo to disassemble the generated machine code:

$ pypy rpython/jit/backend/tool/viewcode.py out

This will dump all the machine code to stdout, and open a pygame-based graphviz cfg. In there we can search for the address and see this:

Graphviz based visualization of the machine code the JIT generates

Here's an annotated version with what I think this code does:

# increment the profile counter
7ced473ffb40:   48 ff 04 25 20 9e 33    incq   0x38339e20
7ced473ffb47:   38 

# check whether the loop is done
7ced473ffb48:   4c 39 fe                cmp    %r15,%rsi
7ced473ffb4b:   0f 8d 76 01 00 00       jge    0x7ced473ffcc7

# increment iteration variable
7ced473ffb51:   4c 8d 66 01             lea    0x1(%rsi),%r12

# update iterator object
7ced473ffb55:   4d 89 61 08             mov    %r12,0x8(%r9)

# check for ctrl-c/thread switch
7ced473ffb59:   49 bb e0 1b 0b 4c ed    movabs $0x7ced4c0b1be0,%r11
7ced473ffb60:   7c 00 00 
7ced473ffb63:   49 8b 0b                mov    (%r11),%rcx
7ced473ffb66:   48 83 f9 00             cmp    $0x0,%rcx
7ced473ffb6a:   0f 8c 8f 01 00 00       jl     0x7ced473ffcff

# load nursery_free pointer
7ced473ffb70:   49 8b 8b d8 30 f6 fe    mov    -0x109cf28(%r11),%rcx

# add size (16)
7ced473ffb77:   48 8d 51 10             lea    0x10(%rcx),%rdx

# compare against nursery top
7ced473ffb7b:   49 3b 93 f8 30 f6 fe    cmp    -0x109cf08(%r11),%rdx

# jump to slow path if nursery is full
7ced473ffb82:   0f 87 41 00 00 00       ja     0x7ced473ffbc9

# store new value of nursery free
7ced473ffb88:   49 89 93 d8 30 f6 fe    mov    %rdx,-0x109cf28(%r11)

# initialize GC header
7ced473ffb8f:   48 c7 01 30 11 00 00    movq   $0x1130,(%rcx)

# initialize integer field
7ced473ffb96:   48 89 41 08             mov    %rax,0x8(%rcx)
7ced473ffb9a:   48 89 f0                mov    %rsi,%rax
7ced473ffb9d:   48 89 8d 60 01 00 00    mov    %rcx,0x160(%rbp)
7ced473ffba4:   4c 89 e6                mov    %r12,%rsi
7ced473ffba7:   e9 94 ff ff ff          jmp    0x7ced473ffb40
7ced473ffbac:   0f 1f 40 00             nopl   0x0(%rax)

Conclusion

The careful design of the RPython GC's allocation fast path gives pretty good allocation rates. This technique isn't really new, it's a pretty typical way to design a GC. Apart from that, my main conclusion would be that computers are fast or something? Indeed, when we ran the same code on my colleague's two-year-old AMD, we got quite a bit worse results, so a lot of the speed seems to be due to the hard work of CPU architects.

June 15, 2025 01:48 PM UTC

June 13, 2025


Real Python

The Real Python Podcast – Episode #253: Starting With Marimo Notebooks & Python App Config Management

Looking for a guide on getting started with Marimo notebooks? How do you build a reproducible notebook for sharing or create a dashboard with interactive UI elements? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

June 13, 2025 12:00 PM UTC


Daniel Roy Greenfeld

TIL: HTML 404 errors for FastHTML

from fastapi import FastAPI
from fastapi.responses import HTMLResponse


async def custom_404_exception_handler(request, exc):
    return HTMLResponse(
        f'<p>404 Not Found at "{request.url.path}"</p>', status_code=404
    )

# Add more HTTP exceptions as needed
HTTP_EXCEPTIONS = {404: custom_404_exception_handler}

app = FastAPI(exception_handlers=HTTP_EXCEPTIONS)


@app.get("/")
async def read_root():
    return {"Hello": "World"}

Try it out by running the app and going to a non-existent path, like /not-found. You should see a simple HTML page with a 404 message.

June 13, 2025 02:30 AM UTC

June 12, 2025


Peter Bengtsson

A Python dict that can report which keys you did not use

Demonstrates a very basic way, in Python, how to know which fields of a dict you never accessed.

June 12, 2025 08:25 PM UTC


Robin Wilson

More links – June 2025

I’ve got into a bit of a habit of writing occasional posts with links to interesting things I’ve found (probably because it’s a relatively easy blog post to write). This is another of those posts – this time, written in June 2025. So, let’s get on with some links:

June 12, 2025 12:10 PM UTC