Planet Python
Last update: November 17, 2023 04:43 PM UTC
November 17, 2023
Mike Driscoll
Episode 22 – Git and Django with Adam Johnson
You may know Adam from all his work around the Django web framework. If you head to the Python Package Index (PyPI), you will see that Adam has made or contributed to more than 80 projects!
Adam recently released a new book called Boost Your Git DX.
Listen in as we chat about:
- Book writing
- Django
- Python
- Git
- and much more!
Links
- Boost Your Django DX
- Boost Your Git DX
- Speed Up Your Django Tests
- Adam Johnson’s website
The post Episode 22 – Git and Django with Adam Johnson appeared first on Mouse Vs Python.
November 17, 2023 01:57 PM UTC
Real Python
The Real Python Podcast – Episode #181: Computational Thinking & Learning Python During an AI Revolution
Has the current growth of artificial intelligence (AI) systems made you wonder what the future holds for Python developers? What are the hidden benefits of learning to program in Python and practicing computational thinking? This week on the show, we speak with author Lawrence Gray about his upcoming book "Mastering Python: A Problem Solving Approach."
November 17, 2023 12:00 PM UTC
PyCharm
PyCharm 2023.3 EAP 7 Is Out!
You can download the build from our website, get it from the free Toolbox App, or update to it using snaps if you’re an Ubuntu user.
The seventh build of the Early Access Program for PyCharm 2023.3 brings improvements to:
- Django Structure view.
- Search Everywhere.
- Support for nested classes in HTML completion.
- Angular 17 support.
These are the most important updates for this build. For the full list of changes in this EAP build, read the release notes.
We’re dedicated to giving you the best possible experience, and your feedback is vital. If you find any bugs, please report them via our issue tracker. And if you have any questions or comments, feel free to share them in the comments below or get in touch with us on X (formerly Twitter).
November 17, 2023 09:42 AM UTC
Marcos Dione
is-dinant-dead-or-a-tip-for-writing-regular-expressions
NB: Another dictated and quickly revised post. Sorry for the mess.
Last night I was trying to develop a Prometheus exporter for Apache logs. There's only one already written, but it doesn't provide much information, and I just wanted to try it myself (yes, a little NIH).
So I decided to start with the usual thing, that is, parsing the log lines. What better tool for that than regular expressions? And since I needed to capture a lot of stuff and then be able to reference the captures, I thought: "Oh yeah, now I remember my project dinant. What happened with it?"
I opened the last version of the source file and found incomplete code that is not in good shape. So I said: "Look, it's too late; I'm not going to put this back in shape now, because even if I'm doing this for a hobby, eventually I will need it for work, so I will try to get something working fast, and then when I have the time I'll see if I can revive dinant." So the answer to the title question is "maybe".
One of the ideas of dinant was that you would build your regular expressions piece by piece. It provided blocks that you could easily combine, which made building the regular expression easy, but that doesn't mean you can't do the same without it. For instance, the first thing I have to parse is an IP address. What's an IP address? Four octets joined by three dots. So we just define a regular expression that matches an octet and then a regular expression that matches the whole IP. For the rest of the fields of the line, I kept using the same idea.
Another tip: for defining regular expressions I like to use r-strings (raw strings), so backslashes escape regular expression elements like . or * instead of string elements like \n or \t. And given that they are prefixed by r, to me an r-string is not only a raw string but also a regular expression string :)
Finally, building your regular expressions block by block and then combining them into a final regular expression should make them easier to test: you can write test code that tests each block individually, and then test bigger and bigger expressions, exactly like I did for dinant.
Here are the regexps, quite well tested:
import re
capture = lambda name, regexp: f"(?P<{name}>{regexp})"
octect = r'([0-9]|[1-9][0-9]|1[0-9]{1,2}|2[0-4][0-9]|25[0-5])'
assert re.fullmatch(octect, '0') is not None
assert re.fullmatch(octect, '9') is not None
assert re.fullmatch(octect, '10') is not None
assert re.fullmatch(octect, '99') is not None
assert re.fullmatch(octect, '100') is not None
assert re.fullmatch(octect, '255') is not None
assert re.fullmatch(octect, '-1') is None
assert re.fullmatch(octect, '256') is None
IPv4 = r'\.'.join([octect] * 4) # thanks to r'', the \ is a regexp escape symbol, not a string escape symbol
assert re.fullmatch(IPv4, '0.0.0.0') is not None
assert re.fullmatch(IPv4, '255.255.255.255') is not None
assert re.fullmatch(IPv4, '255.255.255') is None
assert re.fullmatch(IPv4, '255.255') is None
assert re.fullmatch(IPv4, '255') is None
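To show how the blocks can be combined further, here's a sketch that is not in the original post (the request_line pattern and the log line are made up for illustration):
client = capture('client', IPv4)
# start of an Apache access log line: client IP, two '-' fields, then the timestamp in brackets
request_line = client + r' - - \[' + capture('datetime', r'[^\]]+') + r'\]'
m = re.fullmatch(request_line, '127.0.0.1 - - [17/Nov/2023:08:33:00 +0000]')
assert m is not None
assert m.group('client') == '127.0.0.1'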
Meanwhile, after reading this, I decided to just use the grok exporter. More on that soon.
python dinant regexp prometheus apache
November 17, 2023 08:33 AM UTC
November 16, 2023
Test and Code
209: Testing argparse Applications
How do you test the argument parsing bit of an application that uses argparse?
This episode covers:
- Design for Test: Structuring your app or script so it's easier to test.
- pytest & capsys for testing stdout
- Adding debug and preview flags for debugging and testing
- And reverting to subprocess.run if you can't modify the code under test
Also, there's a full writeup and code samples available:
- Blog post: Testing argparse Applications
- Code Repo
The Complete pytest Course
- For the fastest way to learn pytest, go to courses.pythontest.com
- Whether you're new to testing or pytest, or just want to maximize your efficiency and effectiveness when testing.
November 16, 2023 08:07 PM UTC
Brian Okken
Testing argparse Applications
I was asked recently about how to test the argument parsing bit of an application that used argparse. argparse is a built-in Python module for parsing command line arguments for command line interfaces (CLIs). You know, like git clone <repo address>: git is the application, <repo address> is a command line argument, and clone is a sub-command. Well, that might be a bad example, as I'm not going to use subcommands in my example, but lots of this still applies even if you are using subcommands.
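For a flavor of what the post covers, here's a minimal sketch of the "design for test" idea combined with pytest's capsys fixture (the function names and the greeting are illustrative, not taken from the post):
import argparse

def get_args(argv=None):
    # accepting argv (instead of always reading sys.argv) makes parsing testable
    parser = argparse.ArgumentParser()
    parser.add_argument("name", help="name to greet")
    return parser.parse_args(argv)

def main(argv=None):
    args = get_args(argv)
    print(f"Hello, {args.name}")

def test_main(capsys):
    main(["Brian"])
    assert capsys.readouterr().out == "Hello, Brian\n"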
November 16, 2023 03:20 PM UTC
Talk Python to Me
#438: Celebrating JupyterLab 4 and Jupyter 7 Releases
Jupyter Notebooks and JupyterLab have to be one of the most important parts of Python when it comes to bringing new users into the Python ecosystem, and certainly for the day-to-day work of data scientists and general scientists who have made some of the biggest discoveries of recent times. That platform has recently gotten a major upgrade, with JupyterLab 4 released and Jupyter Notebook significantly reworked to be based on the changes from JupyterLab as well. We have an excellent panel of guests, Sylvain Corlay, Frederic Collonval, Jeremy Tuloup, and Afshin Darian, here to tell us what's new in these and other parts of the Jupyter ecosystem.
Links from the show:
- JupyterLab 4.0 is Here: https://blog.jupyter.org/jupyterlab-4-0-is-here-388d05e03442
- Announcing Jupyter Notebook 7: https://blog.jupyter.org/announcing-jupyter-notebook-7-8d6d66126dcf
- JupyterCon 2023 Videos: https://www.youtube.com/playlist?list=PL_1BH3ug7n1Ih_Yy2TmM7MZ2zogSLZvzE
- Jupyterlite: https://github.com/jupyterlite/jupyterlite
- Download JupyterLab Desktop: https://github.com/jupyterlab/jupyterlab-desktop/releases
- Mythical Man Month Book: https://en.wikipedia.org/wiki/The_Mythical_Man-Month
- Blender in Jupyter: https://twitter.com/kolibril13/status/1699790198505353259
- Watch this episode on YouTube: https://www.youtube.com/watch?v=OG41ji18kkU
- Episode transcripts: https://talkpython.fm/episodes/transcript/438/celebrating-jupyterlab-4-and-jupyter-7-releases
Stay in touch: subscribe on YouTube (https://talkpython.fm/youtube), follow Talk Python on Mastodon (https://fosstodon.org/web/@talkpython), follow Michael on Mastodon (https://fosstodon.org/web/@mkennedy)
Sponsors: Phylum (https://talkpython.fm/phylum-research), Python Tutor (https://talkpython.fm/python-tutor), Talk Python Training (https://talkpython.fm/training)
November 16, 2023 08:00 AM UTC
November 15, 2023
death and gravity
reader 3.10 released – storage internal API
Hi there!
I'm happy to announce version 3.10 of reader, a Python feed reader library.
What's new?
Here are the highlights since reader 3.9.
Storage internal API
The storage internal API is now documented!
This is important because it opens up reader to using other databases than SQLite.
The protocols are mostly stable, but some changes are still expected. The long-term goal is full stabilization, but at least one other implementation needs to exist before that, to work out any remaining kinks.
A SQLAlchemy backend would be especially useful, since it would provide access to a variety of database engines mostly out of the box. (Alas, I have neither the time nor a need for this at the moment. Interested in working on it? Let me know!)
Why not use SQLAlchemy from the start?
In the beginning:
- I wanted to keep things as simple as possible, so I stay motivated for the long term. I also wanted to follow a problem-solution approach, which cautions against solving problems you don't have. (Details on both here and here.)
- At the time, I was already a SQLite fan, and due to the single-user nature of reader, I was relatively confident concurrency wouldn't be an issue.
- I didn't know exactly where and how I would deploy the web app; sqlite3 being in the standard library made it very appealing.
Since then, I did come up with some of my own complexity – reader has a query builder and a migration system (albeit both of them tiny), and there were some concurrency issues. SQLAlchemy would have likely helped with the first two, but not with the last. Overall, I still think plain SQLite was the right choice at the time.
Deprecated sqlite3 datetime support
The default sqlite3 datetime adapters/converters were deprecated in Python 3.12. Since adapters/converters apply to all database connections, reader does not have the option of registering its own (as a library, it should not change global stuff), so datetime conversions now happen in the storage. As an upside, this provided an opportunity to change the storage to use timezone-aware datetimes.
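As a rough sketch of the general approach (not reader's actual code), doing the conversion at the storage boundary instead of registering global adapters might look like this:
from datetime import datetime, timezone

def datetime_to_db(dt: datetime) -> str:
    # store timezone-aware datetimes as UTC ISO 8601 strings
    return dt.astimezone(timezone.utc).isoformat()

def datetime_from_db(value: str) -> datetime:
    # round-trips to an aware datetime, since the stored string includes the offset
    return datetime.fromisoformat(value)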
Share experimental plugin
There's a new share web app plugin to add social sharing links to the entry page.
Ideally, this functionality should end up in a plugin that adds the links to Entry.links (to be exposed in #320), so all reader users can benefit from it.
Python versions
None this time, but Python 3.12 support is coming soon!
For more details, see the full changelog.
That's it for now.
Want to contribute? Check out the docs and the roadmap.
Learned something new today? Share this with others, it really helps!
What is reader?
reader takes care of the core functionality required by a feed reader, so you can focus on what makes yours different.
reader allows you to:
- retrieve, store, and manage Atom, RSS, and JSON feeds
- mark articles as read or important
- add arbitrary tags/metadata to feeds and articles
- filter feeds and articles
- full-text search articles
- get statistics on feed and user activity
- write plugins to extend its functionality
...all these with:
- a stable, clearly documented API
- excellent test coverage
- fully typed Python
To find out more, check out the GitHub repo and the docs, or give the tutorial a try.
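For a quick taste, here's a minimal usage sketch along the lines of the tutorial (the database path and feed URL are just examples):
from reader import make_reader

reader = make_reader("db.sqlite")
reader.add_feed("https://death.andgravity.com/_feed/index.xml")
reader.update_feeds()

for entry in reader.get_entries(read=False):
    print(entry.title)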
Why use a feed reader library?
Have you been unhappy with existing feed readers and wanted to make your own, but:
- never knew where to start?
- it seemed like too much work?
- you don't like writing backend code?
Are you already working with feedparser, but:
- want an easier way to store, filter, sort and search feeds and entries?
- want to get back type-annotated objects instead of dicts?
- want to restrict or deny file-system access?
- want to change the way feeds are retrieved by using Requests?
- want to also support JSON Feed?
- want to support custom information sources?
... while still supporting all the feed types feedparser does?
If you answered yes to any of the above, reader can help.
The reader philosophy
- reader is a library
- reader is for the long term
- reader is extensible
- reader is stable (within reason)
- reader is simple to use; API matters
- reader features work well together
- reader is tested
- reader is documented
- reader has minimal dependencies
Why make your own feed reader?
So you can:
- have full control over your data
- control what features it has or doesn't have
- decide how much you pay for it
- make sure it doesn't get closed while you're still using it
- really, it's easier than you think
Obviously, this may not be your cup of tea, but if it is, reader can help.
November 15, 2023 09:42 PM UTC
Stack Abuse
Guide to Heaps in Python
Introduction
Imagine a bustling airport with flights taking off and landing every minute. Just as air traffic controllers prioritize flights based on urgency, heaps help us manage and process data based on specific criteria, ensuring that the most "urgent" or "important" piece of data is always accessible at the top.
In this guide, we'll embark on a journey to understand heaps from the ground up. We'll start by demystifying what heaps are and their inherent properties. From there, we'll dive into Python's own implementation of heaps, the heapq module, and explore its rich set of functionalities. So, if you've ever wondered how to efficiently manage a dynamic set of data where the highest (or lowest) priority element is frequently needed, you're in for a treat.
What is a Heap?
The first thing you'd want to understand before diving into the usage of heaps is what is a heap. A heap stands out in the world of data structures as a tree-based powerhouse, particularly skilled at maintaining order and hierarchy. While it might resemble a binary tree to the untrained eye, the nuances in its structure and governing rules distinctly set it apart.
One of the defining characteristics of a heap is its nature as a complete binary tree. This means that every level of the tree, except perhaps the last, is entirely filled. Within this last level, nodes populate from left to right. Such a structure ensures that heaps can be efficiently represented and manipulated using arrays or lists, with each element's position in the array mirroring its placement in the tree.

The true essence of a heap, however, lies in its ordering. In a max heap, any given node's value surpasses or equals the values of its children, positioning the largest element right at the root. On the other hand, a min heap operates on the opposite principle: any node's value is either less than or equal to its children's values, ensuring the smallest element sits at the root.

Advice: You can visualize a heap as a pyramid of numbers. For a max heap, as you ascend from the base to the peak, the numbers increase, culminating in the maximum value at the pinnacle. In contrast, a min heap starts with the minimum value at its peak, with numbers escalating as you move downwards.
As we progress, we'll dive deeper into how these inherent properties of heaps enable efficient operations and how Python's heapq module seamlessly integrates heaps into our coding endeavors.
Characteristics and Properties of Heaps
Heaps, with their unique structure and ordering principles, bring forth a set of distinct characteristics and properties that make them invaluable in various computational scenarios.
First and foremost, heaps are inherently efficient. Their tree-based structure, specifically the complete binary tree format, ensures that operations like insertion and extraction of priority elements (maximum or minimum) can be performed in logarithmic time, typically O(log n). This efficiency is a boon for algorithms and applications that require frequent access to priority elements.
Another notable property of heaps is their memory efficiency. Since heaps can be represented using arrays or lists without the need for explicit pointers to child or parent nodes, they are space-saving. Each element's position in the array corresponds to its placement in the tree, allowing for predictable and straightforward traversal and manipulation.
The ordering property of heaps, whether as a max heap or a min heap, ensures that the root always holds the element of highest priority. This consistent ordering is what allows for quick access to the top-priority element without having to search through the entire structure.
Furthermore, heaps are versatile. While binary heaps (where each parent has at most two children) are the most common, heaps can be generalized to have more than two children, known as d-ary heaps. This flexibility allows for fine-tuning based on specific use cases and performance requirements.
Lastly, heaps are self-adjusting. Whenever elements are added or removed, the structure rearranges itself to maintain its properties. This dynamic balancing ensures that the heap remains optimized for its core operations at all times.
Advice: These properties make the heap data structure a good fit for an efficient sorting algorithm: heap sort. To learn more about heap sort in Python, read our "Heap Sort in Python" article.
As we delve deeper into Python's implementation and practical applications, the true potential of heaps will unfold before us.
Types of Heaps
Not all heaps are created equal. Depending on their ordering and structural properties, heaps can be categorized into different types, each with its own set of applications and advantages. The two main categories are max heap and min heap.
The most distinguishing feature of a max heap is that the value of any given node is greater than or equal to the values of its children. This ensures that the largest element in the heap always resides at the root. Such a structure is particularly useful when there's a need to frequently access the maximum element, as in certain priority queue implementations.
The counterpart to the max heap, a min heap ensures that the value of any given node is less than or equal to the values of its children. This positions the smallest element of the heap at the root. Min heaps are invaluable in scenarios where the least element is of prime importance, such as in algorithms that deal with real-time data processing.
Beyond these primary categories, heaps can also be distinguished based on their branching factor:
- d-ary heap: While binary heaps are the most common, with each parent having at most two children, the concept of heaps can be extended to nodes having more than two children. In a d-ary heap, each node has at most d children. This variation can be optimized for specific scenarios, like decreasing the height of the tree to speed up certain operations.
- Binomial heap: A binomial heap is a set of binomial trees that are defined recursively. Binomial heaps are used in priority queue implementations and offer efficient merge operations.
- Fibonacci heap: Named after the famous Fibonacci sequence, the Fibonacci heap offers better amortized running times for many operations compared to binary or binomial heaps. They're particularly useful in network optimization algorithms.
Python's Heap Implementation - The heapq Module
Python offers a built-in module for heap operations - the heapq module. This module provides a collection of heap-related functions that allow developers to transform lists into heaps and perform various heap operations without the need for a custom implementation. Let's dive into the nuances of this module and how it brings you the power of heaps.
The heapq module doesn't provide a distinct heap data type. Instead, it offers functions that work on regular Python lists, transforming and treating them as binary heaps.
This approach is both memory-efficient and integrates seamlessly with Python's existing data structures.
That means that heaps are represented as lists in heapq. The beauty of this representation is its simplicity: the zero-based list index system serves as an implicit binary tree. For any given element at position i, its:
- left child is at position 2*i + 1
- right child is at position 2*i + 2
- parent node is at position (i-1)//2

This implicit structure ensures that there's no need for a separate node-based binary tree representation, making operations straightforward and memory usage minimal.
Space Complexity: Heaps are typically implemented as binary trees but don't require storage of explicit pointers for child nodes. This makes them space-efficient with a space complexity of O(n) for storing n elements.
It's essential to note that the heapq module creates min heaps by default. This means that the smallest element is always at the root (or the first position in the list). If you need a max heap, you'd have to invert order by multiplying elements by -1 or use a custom comparison function.
Python's heapq module provides a suite of functions that allow developers to perform various heap operations on lists.
Note: To use the heapq module in your application, you'll need to import it first with a simple import heapq.
In the following sections, we'll dive deep into each of these fundamental operations, exploring their mechanics and use cases.
How to Transform a List into a Heap
The heapify() function is the starting point for many heap-related tasks. It takes an iterable (typically a list) and rearranges its elements in-place to satisfy the properties of a min heap:
import heapq
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
heapq.heapify(data)
print(data)
This will output a reordered list that represents a valid min heap:
[1, 1, 2, 3, 3, 9, 4, 6, 5, 5, 5]
Time Complexity: Converting an unordered list into a heap using the heapify() function is an O(n) operation. This might seem counterintuitive, as one might expect it to be O(n log n), but due to the tree structure's properties, it can be achieved in linear time.
How to Add an Element to the Heap
The heappush() function allows you to insert a new element into the heap while maintaining the heap's properties:
import heapq
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 3)
heapq.heappush(heap, 7)
print(heap)
Running the code will give you a list of elements maintaining the min heap property:
[3, 5, 7]
Time Complexity: The insertion operation in a heap, which involves placing a new element in the heap while maintaining the heap property, has a time complexity of O(log n). This is because, in the worst case, the element might have to travel from a leaf to the root.
How to Remove and Return the Smallest Element from the Heap
The heappop() function extracts and returns the smallest element from the heap (the root in a min heap). After removal, it ensures the list remains a valid heap:
import heapq
heap = [1, 3, 5, 7, 9]
print(heapq.heappop(heap))
print(heap)
Note: The heappop() is invaluable in algorithms that require processing elements in ascending order, like the Heap Sort algorithm, or when implementing priority queues where tasks are executed based on their urgency.
This will output the smallest element and the remaining list:
1
[3, 7, 5, 9]
Here, 1 is the smallest element from the heap, and the remaining list has maintained the heap property, even after we removed 1.
Time Complexity: Removing the root element (which is the smallest in a min heap or largest in a max heap) and reorganizing the heap also takes O(log n) time.
How to Push a New Item and Pop the Smallest Item
The heappushpop() function is a combined operation that pushes a new item onto the heap and then pops and returns the smallest item from the heap:
import heapq
heap = [3, 5, 7, 9]
print(heapq.heappushpop(heap, 4))
print(heap)
This will output 3, the smallest element, and print out the new heap list that now includes 4 while maintaining the heap property:
3
[4, 5, 7, 9]
Note: Using the heappushpop() function is more efficient than pushing the new element and popping the smallest one as two separate operations.
How to Replace the Smallest Item and Push a New Item
The heapreplace() function pops the smallest element and pushes a new element onto the heap, all in one efficient operation:
import heapq
heap = [1, 5, 7, 9]
print(heapq.heapreplace(heap, 4))
print(heap)
This prints 1, the smallest element, and the list now includes 4 and maintains the heap property:
1
[4, 5, 7, 9]
Note: heapreplace() is beneficial in streaming scenarios where you want to replace the current smallest element with a new value, such as in rolling window operations or real-time data processing tasks.
Finding Multiple Extremes in Python's Heap
The nlargest(n, iterable[, key]) and nsmallest(n, iterable[, key]) functions are designed to retrieve multiple largest or smallest elements from an iterable. They can be more efficient than sorting the entire iterable when you only need a few extreme values. For example, say you have the following list and you want to find the three smallest and three largest values in it:
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
Here, nlargest() and nsmallest() functions can come in handy:
import heapq
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(heapq.nlargest(3, data)) # Outputs [9, 6, 5]
print(heapq.nsmallest(3, data)) # Outputs [1, 1, 2]
This will give you two lists - one contains the three largest values and the other contains the three smallest values from the data list:
[9, 6, 5]
[1, 1, 2]
How to Build Your Custom Heap
While Python's heapq module provides a robust set of tools for working with heaps, there are scenarios where the default min heap behavior might not suffice. Whether you're looking to implement a max heap or need a heap that operates based on custom comparison functions, building a custom heap can be the answer. Let's explore how to tailor heaps to specific needs.
Implementing a Max Heap using heapq
By default, heapq creates min heaps. However, with a simple trick, you can use it to implement a max heap. The idea is to invert the order of elements by multiplying them by -1 before adding them to the heap:
import heapq

class MaxHeap:
    def __init__(self):
        self.heap = []

    def push(self, val):
        # negate on the way in, so the smallest stored value is the largest original
        heapq.heappush(self.heap, -val)

    def pop(self):
        return -heapq.heappop(self.heap)

    def peek(self):
        return -self.heap[0]
With this approach, the largest number becomes the smallest after negation, allowing the heapq functions to maintain a max heap structure under the hood.
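For example, a quick usage sketch:
h = MaxHeap()
for value in [5, 1, 9, 3]:
    h.push(value)

print(h.peek())  # 9
print(h.pop())   # 9
print(h.pop())   # 5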
Heaps with Custom Comparison Functions
Sometimes, you might need a heap that doesn't just compare based on the natural order of elements. For instance, if you're working with complex objects or have specific sorting criteria, a custom comparison function becomes essential.
To achieve this, you can wrap elements in a helper class that overrides the comparison operators:
import heapq

class CustomElement:
    def __init__(self, obj, comparator):
        self.obj = obj
        self.comparator = comparator

    def __lt__(self, other):
        # heapq only needs __lt__, so we delegate to the supplied comparator
        return self.comparator(self.obj, other.obj)

def custom_heappush(heap, obj, comparator=lambda x, y: x < y):
    heapq.heappush(heap, CustomElement(obj, comparator))

def custom_heappop(heap):
    return heapq.heappop(heap).obj
With this setup, you can define any custom comparator function and use it with the heap.
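For instance, here's a sketch of a heap that pops the longest string first, by treating longer strings as "smaller":
heap = []
for word in ["kiwi", "banana", "fig"]:
    custom_heappush(heap, word, comparator=lambda x, y: len(x) > len(y))

print(custom_heappop(heap))  # banana
print(custom_heappop(heap))  # kiwi
print(custom_heappop(heap))  # fig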
Conclusion
Heaps offer predictable performance for many operations, making them a reliable choice for priority-based tasks. However, it's essential to consider the specific requirements and characteristics of the application at hand. In some cases, tweaking the heap's implementation or even opting for alternative data structures might yield better real-world performance.
Heaps, as we've journeyed through, are more than just another data structure. They represent a confluence of efficiency, structure, and adaptability. From their foundational properties to their implementation in Python's heapq module, heaps offer a robust solution to a myriad of computational challenges, especially those centered around priority.
November 15, 2023 07:21 PM UTC
Real Python
Embeddings and Vector Databases With ChromaDB
The era of large language models (LLMs) is here, bringing with it rapidly evolving libraries like ChromaDB that help augment LLM applications. You’ve most likely heard of chatbots like OpenAI’s ChatGPT, and perhaps you’ve even experienced their remarkable ability to reason about natural language processing (NLP) problems.
Modern LLMs, while imperfect, can accurately solve a wide range of problems and provide correct answers to many questions. But, due to the limits of their training and the number of text tokens they can process, LLMs aren’t a silver bullet for all tasks.
You wouldn’t expect an LLM to provide relevant responses about topics that don’t appear in their training data. For example, if you asked ChatGPT to summarize information in confidential company documents, then you’d be out of luck. You could show some of these documents to ChatGPT, but there’s a limited number of documents that you can upload before you exceed ChatGPT’s maximum number of tokens. How would you select documents to show ChatGPT?
To address these shortcomings and scale your LLM applications, one great option is to use a vector database like ChromaDB. A vector database allows you to store encoded unstructured objects, like text, as lists of numbers that you can compare to one another. You can, for example, find a collection of documents relevant to a question that you want an LLM to answer.
In this tutorial, you’ll learn about:
- Representing unstructured objects with vectors
- Using word and text embeddings in Python
- Harnessing the power of vector databases
- Encoding and querying over documents with ChromaDB
- Providing context to LLMs like ChatGPT with ChromaDB
After reading, you’ll have the foundational knowledge to use ChromaDB in your NLP or LLM applications. Before reading, you should be comfortable with the basics of Python and high school math.
Get Your Code: Click here to download free sample code that shows you how to use ChromaDB to add context to an LLM.
Represent Data as Vectors
Before diving into embeddings and vector databases, you should understand what vectors are and what they represent. Feel free to skip ahead to the next section if you’re already comfortable with vector concepts. If you’re not or if you could use a refresher, then keep reading!
Vector Basics
You can describe vectors with variable levels of complexity, but one great starting place is to think of a vector as an array of numbers. For example, you could represent vectors using NumPy arrays as follows:
>>> import numpy as np
>>> vector1 = np.array([1, 0])
>>> vector2 = np.array([0, 1])
>>> vector1
array([1, 0])
>>> vector2
array([0, 1])
In this code block, you import numpy and create two arrays, vector1 and vector2, representing vectors. This is one of the most common and useful ways to work with vectors in Python, and NumPy offers a variety of functionality to manipulate vectors. There are also several other libraries that you can use to work with vector data, such as PyTorch, TensorFlow, JAX, and Polars. You’ll stick with NumPy for this overview.
You’ve created two NumPy arrays that represent vectors. Now what? It turns out you can do a lot of cool things with vectors, but before continuing on, you’ll need to understand some key definitions and properties:
- Dimension: The dimension of a vector is the number of elements that it contains. In the example above, vector1 and vector2 are both two-dimensional since they each have two elements. You can only visualize vectors with three dimensions or less, but generally, vectors can have any number of dimensions. In fact, as you'll see later, vectors that encode words and text tend to have hundreds or thousands of dimensions.
- Magnitude: The magnitude of a vector is a non-negative number that represents the vector's size or length. You can also refer to the magnitude of a vector as the norm, and you can denote it with ||v|| or |v|. There are many different definitions of magnitude or norm, but the most common is the Euclidean norm or 2-norm. You'll learn how to compute this later.
- Unit vector: A unit vector is a vector with a magnitude of one. In the example above, vector1 and vector2 are unit vectors.
- Direction: The direction of a vector specifies the line along which the vector points. You can represent direction using angles, unit vectors, or coordinates in different coordinate systems.
- Dot product (scalar product): The dot product of two vectors, u and v, is a number given by u ⋅ v = ||u|| ||v|| cos(θ), where θ is the angle between the two vectors. Another way to compute the dot product is to do an element-wise multiplication of u and v and sum the results. The dot product is one of the most important and widely used vector operations because it measures the similarity between two vectors. You'll see more of this later on.
- Orthogonal vectors: Vectors are orthogonal if their dot product is zero, meaning that they're at a 90 degree angle to each other. You can think of orthogonal vectors as being completely unrelated to each other.
- Dense vector: A vector is considered dense if most of its elements are non-zero. Later on, you'll see that words and text are most usefully represented with dense vectors because each dimension encodes meaningful information.
While there are many more definitions and properties to learn, these are the most important for this tutorial. To solidify these ideas with code, check out the following block. Note that for the rest of this tutorial, you'll use v1, v2, and v3 to name your vectors:
>>> import numpy as np
>>> v1 = np.array([1, 0])
>>> v2 = np.array([0, 1])
>>> v3 = np.array([np.sqrt(2), np.sqrt(2)])
>>> # Dimension
>>> v1.shape
(2,)
>>> # Magnitude
>>> np.sqrt(np.sum(v1**2))
1.0
>>> np.linalg.norm(v1)
1.0
>>> np.linalg.norm(v3)
2.0
>>> # Dot product
>>> np.sum(v1 * v2)
0
>>> v1 @ v3
1.4142135623730951
You first import numpy and create the arrays v1, v2, and v3. Calling v1.shape shows you the dimension of v1. You then see two different ways to compute the magnitude of a NumPy array. The first, np.sqrt(np.sum(v1**2)), uses the Euclidean norm that you learned about above. The second computation uses np.linalg.norm(), a NumPy function that computes the Euclidean norm of an array by default but can also compute other matrix and vector norms.
Lastly, you see two ways to calculate the dot product between two vectors. Using np.sum(v1 * v2) first computes the element-wise multiplication between v1 and v2 in a vectorized fashion, and you sum the results to produce a single number. A better way to compute the dot product is to use the at-operator (@), as you see with v1 @ v3. This is because @ can perform both vector and matrix multiplications, and the syntax is cleaner.
While all of these vector definitions and properties may seem straightforward to compute, you might still be wondering what they actually mean and why they’re important to understand. One way to better understand vectors is to visualize them in two dimensions. In this context, you can represent vectors as arrows, like in the following plot:
Representing vectors as arrows in two dimensions
The above plot shows the visual representation of the vectors v1, v2, and v3 that you worked with in the last example. The tail of each vector arrow always starts at the origin, and the tip is located at the coordinates specified by the vector. As an example, the tip of v1 lies at (1, 0), and the tip of v3 lies at roughly (1.414, 1.414). The length of each vector arrow corresponds to the magnitude that you calculated earlier.
From this visual, you can make the following key inferences:
- v1 and v2 are unit vectors because their magnitude, given by the arrow length, is one. v3 isn't a unit vector, and its magnitude is two, twice the size of v1 and v2.
- v1 and v2 are orthogonal because their tails meet at a 90 degree angle. You can see this visually and also verify it computationally by computing the dot product between v1 and v2. By the dot product definition, v1 ⋅ v2 = ||v1|| ||v2|| cos(θ), you can see that when θ = 90, cos(θ) = 0 and v1 ⋅ v2 = 0. Intuitively, you can think of v1 and v2 as being totally unrelated or having nothing to do with each other. This will become important later.
- v3 makes a 45 degree angle with both v1 and v2. This means that v3 has a non-zero dot product with v1 and v2, and that v3 is equally related to both. In general, the smaller the angle between two vectors, the more they point toward a common direction.
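To make "smaller angle means more closely related" concrete, here's a small helper, an illustrative addition rather than code from the excerpt above, that computes the cosine of the angle between two vectors:
import numpy as np

def cosine_similarity(u, v):
    # dot product normalized by magnitudes, i.e. cos(theta)
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

v1 = np.array([1, 0])
v3 = np.array([np.sqrt(2), np.sqrt(2)])
print(cosine_similarity(v1, v3))  # ~0.707, cos(45 degrees)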
Read the full article at https://realpython.com/chromadb-vector-database/ »
November 15, 2023 02:00 PM UTC
Mike Driscoll
Using CSS to Style a Python TUI with Textual
Textual is a Python framework for creating text-based user interfaces (TUIs). You can create graphical-style user interfaces in your terminal with Textual.
If you haven’t heard of Textual before, check out An Intro to Textual – Creating Text User Interfaces with Python
In this tutorial, you will learn how to create and style a form. The form won’t do anything, but this tutorial teaches how to add widgets, lay them out, and then give them some style.
Getting Started
If you don’t have Textual yet, you must install it. Textual is not built-in to Python, so you can use pip to get it on your machine.
Open up your terminal and run the following command to install Textual:
python -m pip install textual
Now you’re ready to rock!
Creating a Form in Textual
You are now ready to start coding with Textual. Open up your favorite Python editor and create a new file named form.py.
Then enter the following code:
# form.py

from textual.app import App, ComposeResult
from textual.containers import Center
from textual.screen import Screen
from textual.widgets import Button, Footer, Header, Input, Static


class Form(Static):
    def compose(self) -> ComposeResult:
        """
        Creates the main UI elements
        """
        yield Input(id="first_name", placeholder="First Name")
        yield Input(id="last_name", placeholder="Last Name")
        yield Input(id="address", placeholder="Address")
        yield Input(id="city", placeholder="City")
        yield Input(id="state", placeholder="State")
        yield Input(id="zip_code", placeholder="Zip Code")
        yield Input(id="email", placeholder="email")
        with Center():
            yield Button("Save", id="save_button")


class AddressBookApp(App):
    def compose(self) -> ComposeResult:
        """
        Lays out the main UI elements plus a header and footer
        """
        yield Header()
        yield Form()
        yield Footer()


if __name__ == "__main__":
    app = AddressBookApp()
    app.run()
Here, you import all the bits and bobs you’ll need to create your form. You can use the Static class to group together multiple widgets. Think of it as a container-widget.
You create the Form() class to contain most of your form’s widgets. You will compose a series of text input widgets where users can fill in their name and address information. There is also a reference to something called Center(), an actual container in Textual that helps you align widgets.
Next, in the AddressBookApp() class, you create a header, the form, and a footer. Now you are ready to run your code.
Open up your terminal again and use the following command:
python form.py
When you run your code, you will see something like the following:

The default colors work, but you may want to change them to give your application a different look.
You will learn how to do that by using CSS!
CSS Styling
Textual supports a limited subset of CSS that you can use to style your widgets. Create a new file and name it form.css.
Next, add the following code:
Input {
    background: white;
}

Button {
    background: blue;
}
The Input selector tells Textual to style all the widgets that are of the Input type. In this example, you are setting the background color to white.
The Button line item will set all the Button widget’s background color to blue. Of course, in this example, there is only one Button.
Now you need to update your code to tell Textual that you want to load a CSS file:
from textual.app import App, ComposeResult
from textual.containers import Center
from textual.screen import Screen
from textual.widgets import Button, Footer, Header, Input, Static


class Form(Static):
    def compose(self) -> ComposeResult:
        """
        Creates the main UI elements
        """
        yield Input(id="first_name", placeholder="First Name")
        yield Input(id="last_name", placeholder="Last Name")
        yield Input(id="address", placeholder="Address")
        yield Input(id="city", placeholder="City")
        yield Input(id="state", placeholder="State")
        yield Input(id="zip_code", placeholder="Zip Code")
        yield Input(id="email", placeholder="email")
        with Center():
            yield Button("Save", id="save_button")


class AddressBookApp(App):
    CSS_PATH = "form.css"

    def compose(self) -> ComposeResult:
        """
        Lays out the main UI elements plus a header and footer
        """
        yield Header()
        yield Form()
        yield Footer()


if __name__ == "__main__":
    app = AddressBookApp()
    app.run()
A one-line change is all you need, and that change is the first line in your AddressBookApp() class, where you set a CSS_PATH variable. You can supply a relative or an absolute path to your CSS file here.
If you want to modify the style of any of the widgets in your TUI, you only need to go into the CSS file.
Try re-running the application and you’ll see an immediate difference:

If you’d like to be more specific about which widgets you want to style, change your CSS to the following:
Input {
    background: white;
}

#first_name {
    background: yellow;
    color: red;
}

#address {
    background: green;
}

#save_button {
    background: blue;
}
Here, you leave the Input widgets the same but add some hash-prefixed id selectors to the CSS. These names must match the id you set for the individual widgets you want to style.
If you specify incorrect id names, those style blocks will be ignored. In this example, you explicitly modify the first_name and address Input widgets. You also call out the save_button Button. This doesn't really change the look of the button since you didn't change the color, but if you add a second Button, it won't get any special styling.
Here is what it looks like when you run it now:

You may not like these colors, so feel free to try out some of your own. That’s part of the fun of creating a TUI.
Wrapping Up
Now you know the basics of using CSS with your Textual applications. CSS is not my favorite way of applying styling, but it seems to work pretty well with Textual. The other nice thing about Textual is that there is a developer mode you can enable to edit the CSS and watch it change the user interface live.
Give Textual a try and see what you can make!
The post Using CSS to Style a Python TUI with Textual appeared first on Mouse Vs Python.
November 15, 2023 01:32 PM UTC
PyCharm
Unveiling Python 3.12: What’s New in the World of Python?
Python 3.12 made its debut on October 2, 2023, in keeping with the annual tradition of releasing new versions every October.
This latest iteration introduces a range of new features and enhancements that we will delve into in this blog post. For a comprehensive list of changes, you can refer to the official documentation.
F-Strings
F-strings, also known as formatted string literals, were introduced in Python 3.6, providing a straightforward and concise method for string formatting. They allow the inclusion of expressions within string literals, simplifying the creation of strings with variables, expressions, or function call results. F-strings are identified by the prefix f before the string, and expressions within curly braces {} are computed and substituted with their values.
Due to their readability and versatility, f-strings have become the preferred choice for string formatting in Python, facilitating the creation of neatly formatted and dynamic strings in your code.
Issues addressed in Python 3.12:
- Flexibility to use quotes
- Improved handling of backslashes
- Refined handling of comments
- Enhanced support for nested f-strings
The original post illustrates each of these, comparing quotes, backslashes, nesting, and comments in Python 3.11 versus Python 3.12, with side-by-side code samples; a sketch of the new 3.12 behavior follows.
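Here is a hedged sketch of the new Python 3.12 behavior (the songs list is purely illustrative); each of these constructs was a SyntaxError in 3.11:
songs = ["Take It Easy", "Hotel California"]

# Quote reuse: the expression can use the same quote type as the f-string
print(f"Songs: {", ".join(songs)}")

# Backslashes are now allowed inside the expression part
print(f"Songs, one per line:\n{"\n".join(songs)}")

# Nested f-strings can reuse quotes at any depth
print(f"{f"{3.12:.2f}"}")

# Comments are allowed in multiline f-string expressions
total = f"{
    len(songs)  # number of songs
}"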
Error Messages
Python 3.12 has made significant enhancements in error messages compared to previous versions. While prior updates improved error messages, with the introduction of a PEG parser in Python 3.9 and “did you mean” semantics in Python 3.10, this release introduces further improvements:
- stdlib added as a source of places for "did you mean"
- Class member "did you mean"
- Import from syntax error "did you mean"
- Import names "did you mean"
Another notable improvement is the increased intelligence of error messages when dealing with common developer mistakes. For example, the error message explicitly recommends the correct approach.
import a.y.z from b.y.z

Traceback (most recent call last):
  File "<stdin>", line 1
    import a.y.z from b.y.z
    ^^^^^^^^^^^^^^^^^^^^^^^
SyntaxError: Did you mean to use 'from ... import ...' instead?
Additionally, Python 3.12’s error messages are more astute in recognizing instances where you reference an object’s attribute but don’t include the self prefix.
If you use PyCharm, you probably won’t see much of a change, since the IDE handled such errors and provided a quick-fix suggestion even before running a script.
In the past, the check was limited to the built-ins, but it now includes support for the standard library.
Lastly, when you encounter an import error and receive an exception while trying to import something from a module, Python 3.12 automatically suggests potential corrections. These enhancements collectively contribute to a significantly improved coding experience in Python.
Improvements in Type Annotations
PEP 698 Override Decorator
In this PEP, the suggestion is to introduce an @override decorator to Python’s type system. This addition aims to empower type checkers to proactively identify and prevent a specific category of errors arising when a base class modifies methods inherited by its derived classes.
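A minimal sketch of how the decorator is meant to be used, via typing.override (added in Python 3.12); the class names here are illustrative:
from typing import override

class Base:
    def get_color(self) -> str:
        return "blue"

class GoodChild(Base):
    @override  # fine: Base defines get_color
    def get_color(self) -> str:
        return "yellow"

class BadChild(Base):
    @override  # a type checker flags this: Base has no get_colour
    def get_colour(self) -> str:
        return "red"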
PEP 695 Generic Types
Previously, we used to define generics using TypeVar syntax. TypeVar is a feature of the Python type hinting system that allows you to create a placeholder for a type that will be specified later when a function or class is used. It is primarily used to indicate that a particular type can be of any type, providing flexibility and generic type annotations in Python.
In Python 3.12, this has become much simpler: you can declare type parameters inline, right on the function or class that uses them, so a separate TypeVar is no longer necessary. The same goes for aliases: where we previously used TypeAlias from the typing module, you can now use the type keyword to define your own aliases, as shown in the sketch below.
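Here's a sketch contrasting the two styles (first, Stack, and Vector are illustrative names):
from typing import TypeAlias, TypeVar

# Before Python 3.12: explicit TypeVar and TypeAlias declarations
T = TypeVar("T")

def first_old(items: list[T]) -> T:
    return items[0]

VectorOld: TypeAlias = list[float]

# Python 3.12: inline type parameters and the `type` statement
def first[T](items: list[T]) -> T:
    return items[0]

class Stack[T]:
    def __init__(self) -> None:
        self.items: list[T] = []

    def push(self, item: T) -> None:
        self.items.append(item)

type Vector = list[float]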
PEP 709 Comprehension Inlining
In the past, dictionary, list, and set comprehensions were defined using a mechanism that involved creating functions. The contents of the comprehension were compiled into a separate code object; whenever the comprehension ran, a new temporary function object was created via MAKE_FUNCTION, executed (resulting in the establishment and subsequent removal of a new frame on the Python stack), and promptly discarded. This process incurred some overhead, because it required creating a function object and establishing a stack frame on every use.
In Python 3.12, the implementation has changed: dictionary, list, and set comprehensions no longer rely on hidden functions. Instead, all comprehensions are compiled directly within the context of the current function, which means there is no longer a separate stack frame associated with the comprehension.
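One way to see the change for yourself is to disassemble a comprehension on each interpreter; the exact bytecode varies by version, so treat this as a probe rather than a guaranteed output:
import dis

# On 3.11 the output includes MAKE_FUNCTION and a separate <listcomp> code
# object; on 3.12 the comprehension body is inlined into the enclosing code.
dis.dis(compile("[x * x for x in data]", "<demo>", "eval"))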
PEP 684 Per Interpreter GIL
If you’d like to learn more about the Global Interpreter Lock (GIL), watch this video where Guido discusses the Global Interpreter Lock and subinterpreters.
Python operates as an interpreted language, setting it apart from compiled languages that employ compilers to convert code into machine language. In contrast, Python reads and executes instructions directly within its interpreter. Performance enhancements in Python releases often translate to improvements in the CPython interpreter.
When you execute a Python program using CPython, it creates an interpreter instance. The initial instance is called the main interpreter and it is capable of generating subinterpreters. Most aspects of subinterpreters are distinct from one another, but not entirely. This subinterpreter concept isn’t new and has existed since Python 1.5, although it typically operates beneath the language’s surface.
Handling parallel execution can be tricky, especially when multiple processes attempt to modify a single value simultaneously, leading to consistency issues. Python employs the Global Interpreter Lock to mitigate such problems, but it’s been a source of frustration for developers seeking to write parallel code.
# A sketch of PEP 554's proposed API; the interpreters module is not yet
# publicly available in Python 3.12, so this is illustrative only.
interp = interpreters.create()
print('before')
interp.run('print("during")')
print('after')
Efforts are underway to minimize the GIL’s impact and potentially eliminate it.
PEP 684 and PEP 554 impact the structure of subinterpreters. PEP 684 relocates the GIL from the global level to a subinterpreter level, while PEP 554 is focused on enabling the fundamental capability of multiple interpreters, isolated from each other, in the same Python process.
It’s crucial to understand that these adjustments are largely behind the scenes, and Python users will not encounter them directly until Python 3.13 is released.
To learn more about PEP 684, visit https://peps.python.org/pep-0684/
PEP 669 Low Impact Monitoring
PyCharm has added initial support for debugging based on PEP 669, improving overall debugger performance and making functionality such as tracing of raised exceptions and dropping into the debugger on a failed test almost penalty-less compared with the old sys.settrace based approach.
Credits: mCoding
import sys

def my_trace_call(code, instruction_offset, call, arg0):
    print("Event: call")

def my_trace_line(code, line_number):
    print("Event: line")

def setup_monitoring():
    mo = sys.monitoring
    events = mo.events
    mo.use_tool_id(0, "my_debugger")
    mo.set_events(0, events.CALL | events.LINE)
    mo.register_callback(0, events.CALL, my_trace_call)
    mo.register_callback(0, events.LINE, my_trace_line)

def main():
    for x in range(5):
        print(x)

if __name__ == "__main__":
    setup_monitoring()
    main()
In the past, Python debuggers used sys.settrace, which offered essentially the same functionality but in a less efficient manner. The new sys.monitoring namespace introduces a streamlined API for event registration, and its implementation details enable it to leverage the ongoing efforts to specialize instructions at runtime.
To learn more about PEP 669, visit https://peps.python.org/pep-0669/
PEP 683 Immortal Objects
Meta, the company behind Instagram, utilizes Python (Django) for its front-end server. They implement a multi-process architecture with asyncio to handle parallelism. However, the high scale of operations and request volume can lead to memory inefficiency issues. To address this, they employ a pre-fork web server architecture to cache objects in shared memory, reducing private memory usage.
Upon closer examination, they found that the private memory of processes increased over time, while shared memory decreased. This issue was caused by Python objects, which although mostly immutable, still underwent modifications through reference counts and garbage collection (GC) operations, triggering a copy-on-write mechanism in server processes.
To resolve this problem, they introduced Immortal Objects (PEP-683), marking objects as truly immutable. This approach ensures that the reference count and GC header remain unchanged, reducing memory overhead.
To learn more about Immortal Objects, read the Meta Engineering Blog https://engineering.fb.com/2023/08/15/developer-tools/immortal-objects-for-python-instagram-meta/
Linux Perf Profiler
A profiler serves as a valuable instrument for observing and diagnosing the efficiency of your scripts and programs. Profiling your code allows you to obtain precise measurements, which can be utilized to refine your implementation.
Python has a history of supporting profiling through standard library tools such as timeit and cProfile, as well as third-party tools like memray from Bloomberg. Furthermore, there are other third-party alternatives that provide more functionality.
Linux perf is a profiling and performance analysis tool that is integrated into the Linux kernel. It provides a wide range of features and capabilities for monitoring and analyzing the performance of a Linux system. Linux perf is a powerful utility that allows you to collect and analyze data on various aspects of system behavior, such as CPU utilization, memory usage, hardware events, and more. Some of its key features include:
1. CPU Profiling: Linux perf can be used to profile CPU usage, helping you identify hotspots in your code and understand how CPU time is distributed among different processes and functions.
2. Hardware Events: It can collect data on hardware events like cache misses, branch mispredictions, and instruction counts, which is valuable for optimizing code and understanding the impact of hardware on performance.
3. System-wide Profiling: Linux perf can capture system-wide data, enabling you to analyze the performance of all running processes and system components simultaneously.
4. Kernel Profiling: You can use Linux perf to analyze the performance of the Linux kernel itself, helping you pinpoint kernel-level bottlenecks and issues.
5. Tracing: It supports dynamic tracing of kernel and user-space events, allowing you to trace the execution of specific programs or system calls.
6. Performance Counters: Linux perf can access the performance monitoring counters available in modern CPUs, providing detailed information about processor behavior.
Linux perf is a versatile tool that is commonly used by developers, system administrators, and performance analysts to optimize software and diagnose performance problems on Linux systems. It provides a wealth of information that can help improve the efficiency and performance of applications and the overall system.
This article, authored by Peter McConnell, explores the use of performance engineering with Python 3.12. It begins by introducing the Linux perf tool and the FlameGraph visualization tool. The goal is to reduce the runtime of a Python script from 36 seconds to 0.8 seconds, emphasizing the importance of Python 3.12’s performance profiling support.
The article explores the use of environment variables to enable perf support and repeats the profiling process with Python 3.12, generating an improved FlameGraph. The source code responsible for the performance issue is examined.
Summary
Python 3.12 comes with a bunch of welcome ergonomics improvements. Declaring generic classes, functions, and type aliases for type hinting is now as straightforward as in many statically typed languages, with first-class syntactic support provided by PEP 695. The already universally loved f-strings are even easier to use now that PEP 701 has lifted former grammar restrictions, such as the ban on reusing quotes and including escape sequences inside them. Low-overhead debugging features make using a debugger by default for all development tasks a no-brainer. Apart from that, there are new typing features, various performance improvements, and new standard library APIs.
Explore the capabilities of Python 3.12 with PyCharm 2023.3, now available in the Early Access Program (EAP). This version introduces a swifter debugging experience and enhanced code assistance tailored to Python 3.12’s new typing features. Unlock the potential of the new language features with the tool designed for it.
Learn more about Python 3.12 Support in PyCharm: https://blog.jetbrains.com/pycharm/2023/10/2023-3-eap-2/.
For a detailed exploration of additional features, please refer to the official documentation at https://docs.python.org/3/whatsnew/3.12.html.
November 15, 2023 11:15 AM UTC
Python Software Foundation
It's time for our annual year-end PSF fundraiser and membership drive 🎉
Support Python in 2023!
There are three ways to join in the drive this year:
- Save on PyCharm! JetBrains is once again supporting the PSF by providing a 30% discount on PyCharm and all proceeds will go to the PSF! You can take advantage of this discount by clicking the button on the page linked here, and the discount will be automatically applied when you check out. The promotion will only be available through November 27th, so go grab the deal today!
- Donate directly to the PSF! Every dollar makes a difference. (Does every dollar also make a puppy’s tail wag? We make no promises, but maybe you should try, just in case? 🐶)
- Become a member! Sign up as a Supporting member of the PSF. Be a part of the PSF, and help us sustain what we do with your annual support.
Or, heck, why not do all three? 🥳
Your Donations:
- Keep Python thriving
- Invest directly in CPython and PyPI progress
- Bring the global Python community together
- Make our community more diverse and robust every year
Let’s take a look back on 2023:
PyCon US - We held our 20th PyCon US, in Salt Lake City and online, which was an exhilarating success! For the online component, PyCon US OX, we added two moderated online hallway tracks (in Spanish and English) and saw a 33% increase in virtual engagement. It was great to see everyone again in 2023, and we’re grateful to all the speakers, volunteers, attendees, and sponsors who made it such a special event.
Security Developer in Residence - Seth Larson joined the PSF earlier this year as our first ever Security Developer-in-Residence. Seth is already well-known to the Python community – he was named a PSF Fellow in 2022 and has already written a lot about Python and security on his blog. This critical role would not be possible without funding from the OpenSSF Alpha-Omega Project.
PyPI Safety & Security Engineer - Mike Fiedler joined the PSF earlier this year as our first ever PyPI Safety & Security Engineer. Mike is already a dedicated member of the Python packaging community – he has been a Python user for some 15 years, maintains and contributes to open source, and became a PyPI Maintainer in 2022. You can see some of what he's achieved for PyPI already on the PyPI blog. This critical role would not be possible without funding from AWS.
Welcome, Marisa and Marie! - In 2023 we were able to add two new full time staff members to the PSF. Marisa Comacho joined as Community Events Manager and Marie Nordin joined as Community Communications Manager. We are excited to add two full time dedicated staff members to the PSF to support PyCon US, our communications, and the community as a whole.
CPython Developer in Residence - Our CPython Developer in Residence, Łukasz Langa, continued to provide trusted support and advancement of the Python language, including oversight for the releases of Python 3.8 and 3.9, adoption of Sigstore, and stewardship of PEP 703 (to name a few of many!). Łukasz also engaged with the community by orchestrating the Python Language Summit and participating in events such as PyCon US 2023, EuroPython, and PyCon Colombia. This critical role would not be possible without funding from Meta.
Authorized as CVE Numbering Authority (CNA) - Being authorized as a CNA is one milestone in the Python Software Foundation's strategy to improve the vulnerability response processes of critical projects in the Python ecosystem. The Python Software Foundation CNA scope covers Python and pip, two projects which are fundamental to the rest of Python ecosystem.
Five new Fiscal Sponsorees - Welcome to Bandit, BaPya, Twisted, PyOhio, and North Bay Python as new Fiscal Sponsorees of the PSF! The PSF provides 501(c)(3) tax-exempt status to fiscal sponsorees and provides back office support so they can focus on their missions.
Our Thanks:
Thank you for being a part of this drive and of the Python community! Keep an eye on this space and on our social media in the coming weeks for updates on the drive and the PSF 👀
Your support means the world to us. We’re incredibly grateful to be in community with you!
November 15, 2023 10:30 AM UTC
November 14, 2023
PyCharm
PyCharm 2023.3 EAP 6 Is Out!
You can download the build from our website, get it from the free Toolbox App, or update to it using snaps if you’re an Ubuntu user.
The sixth build of the Early Access Program for PyCharm 2023.3 brings improvements to:
- Support for Type Parameter Syntax (PEP 695).
- Django Structure view.
- Django Live Preview.
These are the most important updates for this build. For the full list of changes in this EAP build, read the release notes.
We’re dedicated to giving you the best possible experience, and your feedback is vital. If you find any bugs, please report them via our issue tracker. And if you have any questions or comments, feel free to share them in the comments below or get in touch with us on X (formerly Twitter).
November 14, 2023 07:57 PM UTC
PyCoder’s Weekly
Issue #603 (Nov. 14, 2023)
#603 – NOVEMBER 14, 2023
View in Browser »
SciPy Builds on Windows Are a Minor Miracle
Moving SciPy to Meson meant finding a different Fortran compiler on Windows, which was particularly tricky to pull off for conda-forge. This blog tells the story about how things looked pretty grim for the Python 3.12 release, and how things ended up working out just in the nick of time. Associated HN discussion.
ALEX OBERMEIER
An Unbiased Evaluation of Environment and Packaging Tools
This detailed article covers the wide world of packaging in Python, how the different tools overlap, and how each has its own area of specialization. A great deep dive on all the choices out there that can help you pick the right tool for your project.
ANNA-LENA POPKES
Automate LLM Backend Deployments Using Infrastructure as Code
New GitHub project to provision, update, and destroy the cloud infrastructure for an LLM backend using infrastructure as code (Python). Deployment options include deploying Hugging Face models to Docker (local), Runpod, and Azure →
PULUMI sponsor
Document Your Python Code and Projects With ChatGPT
Good documentation is a critical feature of any successful Python project. In practice, writing documentation is hard and can take a lot of time and effort. Nowadays, with tools like ChatGPT, you can quickly document your Python code and projects.
REAL PYTHON
Articles & Tutorials
Python Errors as Values
Error handling can be done in a variety of ways, and this article discusses why one organization decided to use returned error values instead of exceptions. Along the way, you’ll see comparisons between Python, Go, and Rust to better understand the different mechanisms.
AARON HARPER
Guide to Hash Tables in Python
Hash tables offer an efficient and flexible method of storing and retrieving data, making them indispensable for tasks involving large data sets or requiring rapid access to stored items. Python’s dict is a hash table; learn how it works and how it can help your code.
DIMITRIJE STAMENIC
Confusing git Terminology
Julia is working on a doc that explains git and in doing so polled some people about what git terminology they found confusing. This post covers the most common responses and attempts to clear up the confusion.
JULIA EVANS
Check if a Python String Contains a Substring
In this video course, you’ll learn the best way to check whether a Python string contains a substring. You’ll also learn about idiomatic ways to inspect the substring further, match substrings with conditions using regular expressions, and search for substrings in pandas.
REAL PYTHON course
Building a Python Compiler and Interpreter
This article starts the journey of building a compiler and interpreter for the Python programming language, in Python. You’ll learn all about tokenizing, parsing, compiling, and interpreting.
RODRIGO GIRÃO SERRÃO • Shared by Rodrigo Girão Serrão
TIL: Django Constraints
Constraints in Django allow you to further restrict how data is managed in the database. This quick post covers how to use the CheckConstraint and UniqueConstraint classes in Django.
SARAH ABDEREMANE
PEP 733: An Evaluation of Python’s Public C API
This is an informational PEP describing the shared public view of the C API in Python. It talks about why the C API exists, who the stakeholders are, and problems with the interface.
PYTHON.ORG
What Stage Startup Offers the Best Risk-Reward Tradeoff?
A deep dive on the success rate statistics of startups in the US with analysis on what joining at different stages means to a stock package payout.
BILLY GALLAGHER
Let’s Make a Silly JSON-like Parser
This article goes into deep detail on how you would construct a JSON parser in Python. If you’re new to parsing, this is a great place to start.
ARUN MANI J
Rust vs. Go, Java, and Python in AWS Lambda Functions
A performance comparison of JSON parsing in AWS Lambda functions using Rust, Go, Java, and Python.
CLIFF CROSLAND
Everything You Can Do With Python’s bisect
Learn how to optimize search and keep your data sorted in Python with the bisect module.
MARTIN HEINZ • Shared by Martin Heinz
Events
Weekly Real Python Office Hours Q&A (Virtual)
November 15, 2023
REALPYTHON.COM
PyData Bristol Meetup
November 16, 2023
MEETUP.COM
PyData Karlsruhe #8
November 16, 2023
MEETUP.COM
PyLadies Dublin
November 16, 2023
PYLADIES.COM
Hamburg Python Pizza
November 17 to November 18, 2023
PYTHON.PIZZA
PyCon ID 2023
November 18 to November 20, 2023
PYCON.ID
PyCon Chile 2023
November 24 to November 27, 2023
PYCON.CL
Happy Pythoning!
This was PyCoder’s Weekly Issue #603.
View in Browser »
[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
November 14, 2023 07:30 PM UTC
Real Python
Python Basics: Modules and Packages
As you gain experience writing code, you’ll eventually work on projects that are so large that keeping all the code in a single file becomes cumbersome.
Instead of writing a single file, you can put related code into separate files called modules. You can put individual modules together like building blocks to create a larger application.
In this video course, you’ll learn how to:
- Create your own modules
- Use modules in another file through the import statement
- Organize several modules into a package
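For example, here is a minimal sketch of the idea (the file and function names are made up for illustration):

# greetings.py -- a module is just a .py file containing related code
def hello(name):
    return f"Hello, {name}!"

# main.py -- another file can reuse the module through an import
import greetings

print(greetings.hello("world"))  # prints: Hello, world!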
This video course is part of the Python Basics series, which accompanies Python Basics: A Practical Introduction to Python 3. You can also check out the other Python Basics courses.
Note that you’ll be using IDLE to interact with Python throughout this course.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
November 14, 2023 02:00 PM UTC
Python Morsels
Solving programming exercises
How can you maximize the learning value from each coding challenge you solve?
Outline an approach and walk away 💭
Start by outlining your approach in a docstring or a comment. Be detailed, but use rough descriptions and pseudocode. You'll likely find yourself rereading the problem statement multiple times as you outline your approach.
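For instance, the outline for a hypothetical exercise might start out as nothing more than a docstring:

def group_anagrams(words):
    """Group words that are anagrams of each other.

    Rough plan (pseudocode, refine later):
    - normalize each word: lowercase it and sort its letters
    - use the normalized form as a dictionary key
    - append each original word to the list stored under its key
    - return the grouped lists
    """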
For a challenging problem where you're likely to get stuck, time-box your outlining time. For example, set a timer for 15 minutes and then start outlining. When the timer goes off, walk away.
Walking away will let your brain work on the problem in the background. This will decrease the stress of getting stuck on a problem and allow your brain to be more creative because you're now unencumbered by the need to solve the problem quickly.
Ideally, after outlining the problem you might take a shower, make yourself a meal, or go for a walk. If you can, try to perform an activity that doesn't require intense focus, so your brain can wander.
When you walk away from an exercise before it's complete, you're likely to keep pondering it. You might realize your approach has a flaw, or you might think of a completely different approach. The next time you sit down to solve your programming exercise, you'll likely find that you're a bit more eager to jump in than you would have been if you'd kept coding right after outlining.
Time-box yourself ⏲️
Ready to sit down and …
Read the full article: https://www.pythonmorsels.com/programming-exercise-tips/
November 14, 2023 02:00 PM UTC
scikit-learn
NVIDIA Is A New Sponsor Of The Scikit-Learn consortium at the Inria Foundation
Sponsored blog post
We are thrilled to announce that NVIDIA has joined the scikit-learn consortium as a corporate partner. As a leading provider of GPU-accelerated computing solutions, we at NVIDIA recognize the importance of machine learning and the role it plays in the growth of many industries and areas of science. Our partnership with the scikit-learn consortium demonstrates our commitment to supporting the development and advancement of open-source software in the machine learning community.
Scikit-learn is a popular open-source Python library for machine learning. One of the strengths of scikit-learn is its ease of use and well-defined API. This makes it a favorite tool among data scientists and machine learning practitioners. Thanks to its active community and continuous development, scikit-learn is constantly evolving and improving.
At NVIDIA, we believe that investing in open-source projects like scikit-learn is important. After all, it is a central component of the modern data stack in both science and industry. By financially supporting the scikit-learn consortium, we are contributing to the long-term sustainability of scikit-learn and helping to ensure that it remains an easy-to-use, reliable, and valuable tool for years to come. Furthermore, we hope to help advance the project’s development, improve its performance, and enhance its capabilities for machine learning on GPUs.
Our partnership with the scikit-learn consortium will also enable us to collaborate more closely with the scikit-learn community, and provide us with insights into how we can improve NVIDIA’s RAPIDS open-source libraries to better serve their needs. We are committed to working with the foundation to ensure that scikit-learn remains a powerful and easy to use machine learning library that meets the needs of data science practitioners in science and industry.
NVIDIA’s commitment to scikit-learn goes beyond financial support. We have hired Tim Head, an experienced open-source maintainer, to work full-time on the project. This is not Tim’s first open-source rodeo. He has previously contributed to several high-profile open-source projects, including Project Jupyter. His focus will be reviewing pull requests and coordinating the development of large features. Tim was recently elected as a core maintainer of scikit-learn. His expertise and experience will be invaluable in ensuring the continued growth and success of the project.
In summary, NVIDIA’s partnership with the scikit-learn consortium is an important step in our ongoing commitment to support the development and growth of open-source software in the machine learning community. We are excited to work with the foundation and the community of contributors to help advance the capabilities of scikit-learn and accelerate the development of machine learning applications.
AI helped write this blog post!
November 14, 2023 12:00 AM UTC
Seth Michael Larson
Querying every file in every release on the Python Package Index
Published 2023-11-14 by Seth Larson
This critical role would not be possible without funding from the OpenSSF Alpha-Omega Project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!
Last week I published a graphic showing the use of memory safe and unsafe systems programming languages in Python packages, which garnered some interest from the community about how I created it.
The graphic used file extension information, which isn't a perfect method for detecting other programming languages, but is likely good enough for spotting trends and identifying projects.
What is interesting about this graphic is that it needs access to the files within Python distributions, like wheels and source distributions on PyPI. This is something that's difficult to access without actually downloading the artifact. So how can I query this information for every package since 2005?
I used the same dataset previously to detect vulnerable WebP binaries bundled in Python packages. Let's explore how to use this dataset to answer other questions!
None of this article would be possible without the work of Tom Forbes to create and continually update this dataset. Thanks Tom for all your work and for helping me get started.
Why is this data useful?
I'm also doing work on a few different projects regarding Python packaging metadata, namely PEP 639 and "Tracking bundled projects in Python distributions". Having this dataset available gives me a bunch of contextual information for those projects as well as being able to track adoption of new packaging metadata.
There was also a bit of emphasis on memory-safe programming languages in the recent US Government RFI, and I was the author of the section regarding memory safety. I wanted to explore the Python package ecosystem's current usage of memory safe languages like Rust and Go compared to C, C++, and Fortran. From the above graphic it seems there's some interest in using memory-safe languages, which is nice to see.
The need to query this dataset for multiple projects meant it made sense to create a small utility that can be reused, including by others (yay open source!). I created a small Gist that includes this utility. It's not optimized (actually quite slow if you don't use threads when downloading files).
Downloading the file metadata dataset
⚠️ WARNING: PLEASE READ! ⚠️
A word of warning before we start blindly downloading all the things, these datasets are all very large, like 30+ GB just for the high-level metadata in Parquet files. Make sure you have enough storage space before copying and pasting any commands you see in this blog post. I don't want to hear that anyone's filled up their hard-drive without knowing. You have been warned! 🐉
With that out of the way, let's get started ourselves! The entire dataset is available under the pypi-data GitHub organization with varying levels of detail all the way from high-level metadata and filenames to actual file contents.
There are many datasets available on py-code.org/datasets. The ClickHouse dataset isn't completely up to date, but as a way to experiment it can be an easy place to play around and get started. We want the complete, up-to-date dataset though, so we need to download things locally. We want the "Metadata on every file uploaded to PyPI" dataset.
To download the dataset there's a series of curl commands:
$ curl -L --remote-name-all $(curl -L "https://github.com/pypi-data/data/raw/main/links/dataset.txt")
Two curls in one (at least there's no ... | sudo sh involved...). Let's examine the innermost curl first and use a local copy instead of fetching from the network:
$ curl -L "https://github.com/pypi-data/data/raw/main/links/dataset.txt" > dataset.txt
$ cat dataset.txt
https://github.com/pypi-data/data/releases/download/2023-11-12-03-06/index-0.parquet
https://github.com/pypi-data/data/releases/download/2023-11-12-03-06/index-1.parquet
https://github.com/pypi-data/data/releases/download/2023-11-12-03-06/index-10.parquet
https://github.com/pypi-data/data/releases/download/2023-11-12-03-06/index-11.parquet
https://github.com/pypi-data/data/releases/download/2023-11-12-03-06/index-12.parquet
https://github.com/pypi-data/data/releases/download/2023-11-12-03-06/index-13.parquet
https://github.com/pypi-data/data/releases/download/2023-11-12-03-06/index-14.parquet
...
It's a list of URLs that all look legit, let's download those (this will take some time):
$ curl -L --remote-name-all $(cat dataset.txt)
$ ls
index-0.parquet index-12.parquet index-1.parquet index-4.parquet index-7.parquet
index-10.parquet index-13.parquet index-2.parquet index-5.parquet index-8.parquet
index-11.parquet index-14.parquet index-3.parquet index-6.parquet index-9.parquet
Querying the dataset
In order to take full advantage of this dataset, we can query the top-level Parquet metadata and subsequently download the underlying individual files only when necessary. I've created a small helper, as I mentioned earlier (the pycodeorg module below), to assist with these examples.
The dataset uses Parquet as a data storage format, which is columnar and can be queried using DuckDB. This is the first project I've used DuckDB with, and from first impressions it seems like a lovely piece of software. Before we start creating our query, I like to see what the dataset fields and types are, so let's run a DESCRIBE:
DESCRIBE SELECT * FROM '*.parquet';
┌─────────────────┬─────────────┬─────────┐
│ column_name │ column_type │ null │
│ varchar │ varchar │ varchar │
├─────────────────┼─────────────┼─────────┤
│ project_name │ VARCHAR │ YES │
│ project_version │ VARCHAR │ YES │
│ project_release │ VARCHAR │ YES │
│ uploaded_on │ TIMESTAMP │ YES │
│ path │ VARCHAR │ YES │
│ archive_path │ VARCHAR │ YES │
│ size │ UBIGINT │ YES │
│ hash │ BLOB │ YES │
│ skip_reason │ VARCHAR │ YES │
│ lines │ UBIGINT │ YES │
│ repository │ UINTEGER │ YES │
├─────────────────┴─────────────┴─────────┤
│ 11 rows 6 columns │
└─────────────────────────────────────────┘
Now that we know the form of the dataset we can make our first query. Let's create a query for projects per file extension and split that by month. That query would look something like this:
SELECT
    -- We're bucketing our data by month and extension --
    datetrunc('month', uploaded_on) AS month,
    regexp_extract(path, '\.([a-z0-9]+)$', 1) AS ext,
    -- DuckDB has native list/object manipulation, pretty cool! --
    LIST(DISTINCT project_name) AS projects
FROM '*.parquet'
WHERE (
-- Our regex for matching files for languages we care about --
    regexp_matches(path, '\.(asm|c|cc|cpp|cxx|h|hpp|rs|[Ff][0-9]{0,2}(?:or)?|go)$')
-- Filter out test files and whole virtual environments --
-- embedded in Python distributions. --
AND NOT regexp_matches(path, '(^|/)test(|s|ing)')
AND NOT contains(path, '/site-packages/')
)
GROUP BY month, ext
ORDER BY month DESC;
With this query and some data massaging we can create this graphic and see how Rust is driving the majority of memory-safe programming language use in binary Python distributions:
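If you'd rather drive the query from Python than the DuckDB CLI, a sketch along these lines works (note this variant counts distinct projects instead of collecting them into lists, and assumes the Parquet files sit in the current directory):

import duckdb

con = duckdb.connect()
result = con.sql(r"""
    SELECT
        datetrunc('month', uploaded_on) AS month,
        regexp_extract(path, '\.([a-z0-9]+)$', 1) AS ext,
        COUNT(DISTINCT project_name) AS projects
    FROM '*.parquet'
    GROUP BY month, ext
    ORDER BY month DESC
""")
print(result.fetchall()[:10])  # first few (month, ext, projects) rows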
Accessing file data
Previously it was very difficult to learn about the adoption of new packaging metadata standards and fields due to the prohibitively large bandwidth, storage, and CPU cost that came with downloading an entire swath of PyPI and unpacking the contents only to examine a small METADATA or WHEEL file. However, with this dataset we can write a simple query and fetch only the files we need to answer the above questions:
SELECT repository, project_name, path
FROM '*.parquet'
WHERE (
-- We only want distributions uploaded in --
-- October 2023 for a recent snapshot. --
datetrunc('month', uploaded_on) = DATE '2023-10-01'
-- We want .dist-info/WHEEL files from wheels --
AND regexp_matches(path, '\.dist-info/WHEEL$')
-- And files shouldn't be skipped since we can't call --
-- `get_file()` on these, like if they're empty or binaries. --
-- Pretty unlikely! --
AND skip_reason == ''
);
Then substitute this query in for the QUERY variable below:
import re
import pycodeorg
QUERY = ...
# Find all 'WHEEL' metadata files in wheels:
for repo, project, path in pycodeorg.query(QUERY):
# Fetch the file data from the dataset
data = pycodeorg.get_data(repo, project, path)
# Then parse the 'Generator' field and aggregate
if match := re.search(rb"\nGenerator:\s*([\w]+)", data):
builder = match.group(1).decode()
...
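Continuing the snippet above, the elided aggregation step could be something like this sketch, tallying Generator values with a Counter:

from collections import Counter

builders = Counter()
for repo, project, path in pycodeorg.query(QUERY):
    data = pycodeorg.get_data(repo, project, path)
    if match := re.search(rb"\nGenerator:\s*([\w]+)", data):
        builders[match.group(1).decode()] += 1

# Most common wheel builders, e.g. bdist_wheel, poetry, hatch
print(builders.most_common(10))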
This query allows me to provide this data, which to my knowledge wasn't previously available, and from it we can answer questions like which wheel builder is most common (bdist_wheel by a wide margin, then poetry and hatch) and which packaging metadata fields are in use. I'm excited to see what other insights folks are able to gather from this dataset!
Other items
- The Request for Information (RFI) response for the Python Software Foundation has been submitted. We hope our submission will be available on regulations.gov soon, and we'll write a blog post on the PSF blog sharing the response once it's available.
- PyPI's security audit blog post and corresponding post by Trail of Bits have been published. I didn't work directly on this project but it's so exciting to see the results of this work be shared.
- Pushed the blog post announcing the "Becoming a CVE Numbering Authority as an Open Source project" into final draft, now working with OpenSSF marketing to schedule the post for the blog.
- Received a shoutout from Carol Willing during her keynote at PyCon Sweden. Thanks, Carol!
That's all for this week! 👋 If you're interested in more you can read last week's report.
Thanks for reading!
— Seth
This work is licensed under CC BY-SA 4.0
November 14, 2023 12:00 AM UTC
November 13, 2023
Real Python
JupyterLab for an Enhanced Notebook Experience
Maybe you’ve already worked with Jupyter Notebooks from Project Jupyter to create documents containing runnable code. You can achieve even more with JupyterLab, a tool kit that you can use to document and share your research, teaching, and learning activities. It’s useful in a wide range of disciplines, from data analysis and data visualization to scientific study.
JupyterLab enhances your notebooks by providing a browser-based interface that allows you to use multiple notebooks together effectively. In addition, it offers you a comprehensive Markdown editor, file manager, file viewer, and an infrastructure that enables you to run code from a wide range of files.
In this tutorial, you’ll learn how to:
- Share code between multiple Jupyter Notebooks
- Debug a Jupyter Notebook
- Create and manage Markdown files
- Run embedded code from a range of different files
- Manage and view different file types from a single interface
- Access your operating system from within JupyterLab
Jupyter is a portmanteau word blended from the three programming languages Julia, Python, and R. Although you’ll focus on Python in this tutorial, you can use Jupyter with the other languages as well. Plus, this free application works on macOS, Linux, and Windows environments.
JupyterLab takes Jupyter Notebook usage to a different level, so you’ll get the most out of this tutorial if you’re already familiar with Jupyter Notebook.
Free Bonus: Click here to download notebooks and files that you can play with in JupyterLab.
Installing and Starting JupyterLab
The cleanest way of installing JupyterLab on a computer is to use a virtual environment. This will ensure that your JupyterLab work doesn’t interfere with any other Python projects or environments that you may already have. For this tutorial, you’ll create a new virtual environment named jl_venv. Select your operating system to get JupyterLab up and running:
Of course, once you’ve finished this tutorial, you can delete tutorial_project and add in your own project-specific folders instead.
Note: If you wish, you could create a Samples subfolder within tutorial_project and save this tutorial’s downloadable files into it. These include completed versions of the notebooks that you’ll create later on, as well as some other files. This will also give you some files to play around with and will allow you to fully participate in the tutorial.
JupyterLab will start in your web browser, all ready for you to use. But before you dive in, you might want to know how to end your session:
- To shut JupyterLab down, make sure everything is saved, and then use File → Shut Down to close the application before closing your browser. This will close everything down cleanly. Closing the browser alone doesn't close the server, while crashing the server may cause data loss.
- To restart, open either PowerShell or your terminal, navigate to your jupyterlab_projects folder, then activate jl_venv. Finally, create or enter your specific project's folder, then start JupyterLab as before.
- To deactivate your virtual environment, use the deactivate command. Your command prompt will return to normal.
Once you’ve installed and started JupyterLab, its server will start, along with a web browser connection to it. It may take a moment, but soon you’ll be looking at its main interface:
Because this is your first time running JupyterLab, the front screen shown above contains only a single Launcher window. This is where you can access everything else that’s on offer.
Note: Before you start using JupyterLab, you may like to change its appearance to make it easier for you to use. There are several options available to you:
- You can hide or display various screen regions using View → Appearance. This is useful if you have a small monitor.
- You can change the overall theme of the interface by opening Settings → Theme. Themes may help you see more clearly.
- You can also increase and decrease various font sizes using the options under Settings and Settings → Theme. These may help clarify text.
You can even use your favorite coding font by accessing Settings → Settings Editor and then scrolling down the list of settings on the left until you reach Notebook. Once you’re there, fill out the font family and font size according to your preferences. Then close the Settings tab:
As you can see from the screenshot, you’ve updated the font within your notebooks. If you don’t like your adjustments, then click the big red Restore to Defaults button that appears at the top-right of the Settings screen, and no harm done.
In the upcoming sections, you’ll perform a range of tasks highlighting how JupyterLab’s tools enhance the capability of notebooks. You’ll also see some other interesting features as well.
Understanding JupyterLab Kernels
JupyterLab’s tools support you in your work. Although the tools are self-contained, by using some of them together, you get more out of them. This integration is probably JupyterLab’s most powerful feature.
A good starting point when learning JupyterLab is for you to know what its basic components are and how to make them work together. The diagram below shows an overview of these:
This diagram may look overwhelming at first because there are several parts. Don’t worry, you’ll soon see their relevance. The arrows show how various components interact. These interactions are one of the great benefits of JupyterLab. You’ll start with the central part of the application and the diagram: the kernel.
Read the full article at https://realpython.com/using-jupyterlab/ »
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
November 13, 2023 02:00 PM UTC
Mike Driscoll
PyDev of the Week: Phil Ewels
This week, we welcome Phil Ewels as our PyDev of the Week! Phil is the creator of the rich-click package. You can find Phil on Twitter, Mastodon, or BlueSky.
You can also catch up with Phil over on GitHub.
Let’s spend a few moments getting to know Phil better!

Can you tell us a little about yourself (hobbies, education, etc):
I’m a scientist turned software developer from the UK, now living in Sweden. My education and career path is a little winding: I studied Biochemistry at Bristol and went on to do a PhD in Molecular Biology at Cambridge, working with techniques to tease out the three-dimensional structure of DNA in order to figure out why some cancers happen more often than we expect. During school and university I did web development on the side for fun, and at some point the two interests fused and I got into the field of bioinformatics – computational analysis of biological data. I ended up at SciLifeLab in Sweden working at a national center doing DNA sequencing and analysis for academic researchers. I’ve been lucky enough to have the opportunity to be behind some popular open-source software for scientists over the years and recently joined Seqera to focus on building scientific OSS and the communities that drive them.
I have three young children so I don’t have much time for hobbies these days! But if I get a chance I love to get out on my bike or into the mountains, and I spent much of my earlier life rowing competitively.
Why did you start using Python?
I switched from Perl to Python when I moved to Sweden, as it was used by everyone else in the group. It immediately clicked with me and I never looked back. I’ve never had any formal training, but I had some great mentors in those early days – Per Kraulis, Mario Giovacchini and Guillermo Carrasco to name a few. I learnt quickly, mostly through their pull-request reviews!
What other programming languages do you know and which is your favorite?
My first languages were BASIC on my older brother’s BBC Micro and then Actionscript in Macromedia Flash! Then PHP, JavaScript, Perl and Python. More recently, I’ve done a lot of work with Nextflow which is a domain-specific language based on Groovy. If I start a new project I’ll always reach for Python, unless there’s a compelling reason to do otherwise.
What projects are you working on now?
As of this month I’m project manager for open-source software at Seqera, so my official remit covers Nextflow (data pipelines), MultiQC (data visualisation / reporting), Wave (on-the-fly containers) and Fusion (fast virtual filesystems). I’m also the co-founder of nf-core and am heavily involved in writing the developer tools for that community. Those are my day job projects, after that I have a long tail of pet projects lurking on GitHub that never quite get the attention that they deserve! The most recent ones are rich-click (see below) and rich-codex, for auto-generating terminal output screenshots from simple markdown commands.
Which Python libraries are your favorite (core or 3rd party)?
I have a soft spot for user-friendly interfaces and fell in love with Rich by Will McGugan / the Textualize team the moment I came across it. More recently they released Textual, which is pure magic and a delight to work with. Before Textual I used Questionary for interactive CLI prompts, which is fun. Oh, and I love all of Sebastián Ramírez’s libraries, but especially FastAPI. Those are beautiful in their simplicity and a great source of inspiration, as well as being super intuitive to work with. Also worth a mention are Pyodide and Ruff: neither is really a library, but both do really cool things with Python code, in different ways.
How did the rich-click package come about?
As mentioned above, I’m a bit of a fan-boy when it comes to Rich. I had used it to pretty good effect to make the CLI output of most of my tools colourful. As a result, the output from Click (used for the CLI interface) started to stand out as being a bit... bland. I’d been toying with the idea of building a plugin for a few months, then Will launched rich-cli complete with an attractive CLI. I took his code (with his permission, and help) and generalised it into a package that would do the same for any Python package using Click to generate a CLI. My priority was to make it as simple to use as possible: the basic usage is simply `import rich_click as click`, and the CLI output looks magically nicer. The package has since found its way into quite a few people’s projects and inspired native Typer functionality, which is cool. Most recently, Daniel Reeves offered to help with maintenance, which I hope gives the project a more stable base to live on (he also did a cool new logo!).
What is your elevator pitch for the MultiQC tool?
MultiQC is a data visualisation tool that understands the log outputs from hundreds of different scientific software packages. At the end of a typical genomics analysis run you might have outputs from 10s – 100s of different tools for 10s – 1000s of samples. MultiQC can scoop up the summary statistics from all of these and generate a human-readable HTML report summarising the key quality-control metrics, letting you spot any outlier samples. Whilst MultiQC’s origins lie in genomics, it doesn’t do anything specific to that field, and it’s built in a modular way that makes it easy to extend to any tool outputs. You can play around with it in your browser (via the recent Pyodide toy I’ve been playing with) here.
Is there anything else you’d like to say?
Just a big thanks to you and everyone in the Python community for being such a lovely bunch of people. Python folk are one of a small handful of enclaves that I still really enjoy interacting with on social media and I think much of that stems from our diverse backgrounds. Long may that continue! Oh, and if anyone fancies writing a rich-pre-commit library that I could use, then that’d be awesome!
Thanks for doing the interview, Phil!
The post PyDev of the Week: Phil Ewels appeared first on Mouse Vs Python.
November 13, 2023 01:30 PM UTC
TechBeamers Python
DataProvider in TestNG – All You Need to Know
This tutorial explains every little detail about TestNG data providers. For example: what is a dataprovider in TestNG, what is its purpose, and is the dataprovider the same in all versions of TestNG, or has it gained more features? Furthermore, we’ll cover the easiest as well as the best methods to use in Java. Also, we’ll [...]
The post DataProvider in TestNG – All You Need to Know appeared first on TechBeamers.
November 13, 2023 01:24 PM UTC
Django Weblog
2024 DSF Board Candidates
Thank you to the twelve individuals who have chosen to stand for election. This page contains their submitted candidate statements. Our deepest gratitude goes to our departing board member, Aaron Bassett, for your contributions and commitment to the Django community. Those eligible to vote in this election will receive information on how to vote shortly. Please check for an email with the subject line "2024 DSF Board Voting".
Chris Achinga Mombasa, Kenya
My software development career was highly influenced by developer communities. Participating in tech meet-ups and events, notably DjangoCon Africa, has not only expanded my technical skills but also shaped my approach to both personal and professional growth. This experience has motivated me to seek a position on the Django Software Foundation Board, especially after the talk from Anna Makarudze on Navigating the Open-Source World as a Minority, which highlighted the challenges of organising events that benefit African communities. As an advocate for African and minority communities within the tech ecosystem, I aspire to bring a unique and necessary perspective to the DSF Board. My commitment to volunteering and giving back to the community aligns perfectly with the ethos of the Django community. My experiences have taught me the value of dedicated community organizers who selflessly share resources and knowledge, fostering an environment where developers at all levels can thrive.
Joining the DSF Board would enable me to champion the interests of young and emerging developers globally, particularly from underrepresented regions. I aim to ensure that everyone, regardless of their background, has equitable access to the opportunities that Django, both as a community and a web development framework, can offer.
In my role with a Non-Governmental Organization aiding youth groups along the Kenyan Coast(Swahilipot Hub Foundation), I've garnered experience in community engagement and utilizing technology for social good. This experience has been instrumental in creating Django-based platforms that empower community self-management. My presence on the DSF Board would not only represent these communities but also allow me to serve as a mentor and technical advisor.
I am eager to contribute my insights and leadership to the DSF Board. With your support, I hope to make a meaningful impact, fostering an inclusive and dynamic environment where every developer can achieve their full potential.
David Vaz Porto, Portugal
A software developer for over 20 years, David fell in love with Django almost at the beginning of his journey, in 2007 with version 0.96. He loves Django and Python so much that he has been bringing developers to the community ever since, and ended up starting his consultancy firm around these technologies.
During DjangoCon Europe 2019 in Copenhagen he decided to take the next step in helping the community, proposing to organize DjangoCon Europe 2020 in Portugal. He got more than he bargained for, ending up co-organising the first virtual-only DjangoCon Europe, repeating in 2021, and finally a hybrid DjangoCon Europe in 2022. His effort, together with the team around him, was rewarded with success: the 2022 edition broke attendance records with 500+ attendees in person and 200+ online. To keep things going, he is also co-organising DjangoCon Europe 2024 in Vigo, Spain, hoping to bring the Spanish community closer.
David also contributes to the Portuguese Python community, having started the very first PyCon Portugal in 2022. His drive is to bring the Portuguese community forward, with a different city every year to increase the reach of the conference. The first edition was in Porto, leveraging DjangoCon Europe 2022; this year it was in Coimbra, with participants from over 25 countries, and the next edition is already in preparation.
David is enthusiastic, committed and pragmatic. Throughout his personal and professional journey, he has always had a positive impact in every process he puts his mind on, influencing, building and empowering the people around him. He hopes to put his experience to good use in Django Software Foundation.
Jacob Kaplan-Moss Oregon
I was one of the original maintainers of Django, and was the original founder and first President of the DSF. I re-joined the DSF board and have served for the last year. Outside of Django, I'm a security consultant at Latacora, and I previously ran engineering and security teams at 18F and Heroku.
When I ran for the board last year, I wrote:
> I'd be coming back to the DSF with a bunch of experience in executive leadership and more experience working with nonprofits. I think I can apply those skills, along with my general knowledge of the Django community, to push things forward. What that means, specifically, isn't entirely clear yet. I'd plan to spend the first months of my board term asking a bunch of questions and listening.
I did that asking-questions-and-listening, and what needs doing at the DSF became clear. I'd most succinctly articulate it as: "new blood".
The Django community is super-vibrant and new people are joining the community all the time, but it's very hard for people to "level up" and move to any sort of leadership position at the DSF or among the core team. We just don't have very many opportunities for people to have an impact, and we don't have good "onramps" to that work.
So, this term, I (with the rest of the board) started building some of these opportunities and onramps! The recently announced working group and membership changes are the start of this, and if re-elected I'd want to continue working in this direction. It's now easier for people to join the DSF, and easier for them to spin up working groups to do impactful work. But now we need to start defining these groups, funding them, and continuing this growth.
Jay Miller United States
The Django community often serves as a great example for many aspects of the broader Python community. Our community shines when many of us get involved. To make this happen, we need to encourage greater community involvement.
My goals for the next two years, if elected, are to increase the amount of information we share with the community while reducing the time it takes to disseminate that information to the community.
I intend to utilize the existing channels in the Django and the larger Python community. We will also establish new official communication channels for the foundation. These channels will be managed by a Communications Working Group.
The second effort is to extend our reach to a global and diverse audience. We understand that our impact can extend far beyond our current scope by expanding working groups. Therefore, I would work to create and support working groups that currently lack direct representation in the DSF. I would also advocate for decisions that directly impact these areas to be developed and executed by those individual groups with DSF support.
I hope that you will support me in this vision, which aims to increase the visibility and support of the DSF to the farthest reaches of the community.
Mahmoud Nassee Cairo/Egypt
I really like helping people and also helping this awesome community to grow. I don't have much to say 🙂.. But I really like volunteering work it helps me to make something that I could be proud of and also make some new friends!
Ngazetungue Muheue Namibia
I'm Ngazetungue Muheue, a dedicated software developer, community advocate, and a member of the Django Software Foundation (DSF). I'm also the founder of the Python and Django Community in Namibia. Despite facing unique challenges as a member of underprivileged communities and living with a disability, I've played a significant role in expanding Django by establishing and contributing to various Django and Python communities in Africa and Namibia.
Recognizing the importance of open-source communities and user-friendly technology, I've worked closely with students and underprivileged individuals to bridge the tech gap by involving them in Django user groups, teaching Django, and fostering their participation in the global tech community. As a visionary leader, I've cultivated a culture of collaboration, inclusivity, and continuous learning within the African tech ecosystem. My contributions include organizing the inaugural DjangoCon Africa in 2023 and actively participating in organizing and volunteering at DjangoCon Europe in 2023 and 2022, advancing the growth of the Django ecosystem. I've also spoken at various PyCon events worldwide, showcasing my commitment to fostering the global Django and Python community.
As a board member of the Django Software Foundation, my primary goal is to expand Django communities worldwide, connect underprivileged community members with the DSF, and enhance the inclusivity of the Django community. This involves translating Django documentation for non-English speakers, increasing project grants, integrating people with disabilities into the community, and creating internship opportunities for a more diverse and empowered Django community.
Joining the DSF board will enable me to inspire and support nations in engaging young and underprivileged individuals in tech-related activities while safeguarding the interests and mission of our community and the DSF. More links: https://twitter.com/muheuenga https://2023.djangocon.africa/team https://twitter.com/djangonamibia https://na.pycon.org/ https://pynam.org/django/
Paolo Melchiorre Pescara, Italy
Ciao, I'm Paolo and I live in Italy.
I've been a contributor to the Django project for years, and a member of the DSF. I attended my first DjangoCon Europe in 2017 and have since presented many Django talks at conferences around the world. I've participated as a coach in DjangoGirls workshops several times, and I organized one in my hometown. I've always been a Python developer, I helped the PyCon Italia organization for a few years and I recently founded the Python Pescara meetup.
As a member of the DSF board of directors, I would like to bring a different point of view to the foundation, as a southern European citizen, inhabitant of the Mediterranean area, non-native English speaker, and a small company employee.
Some initiatives I would like to carry forward are:
- organize active user sprints to focus on specific Django features
- continue the work of renovating the Django project website
- create synergies with the Python community and its web sub-communities
- simplify Django documentation and help its translations
- support creators of Django content (e.g. books, articles, podcasts, videos, ...)
Peter Baumgartner Colorado, USA
I'm a current DSF board member and acting Treasurer.
I've been a part of the Django community for over 15 years. I'm an open-source contributor, a regular speaker at DjangoCon US, and the co-author of High Performance Django. In 2007, I founded Lincoln Loop, a web agency that leverages Django extensively in its work. Lincoln Loop has financially sponsored the DSF and DjangoCon for many years, and I'm looking for other ways to give back to a community that has given us so much.
At Lincoln Loop, I have to wear many hats and deeply understand the financial ramifications of our decisions as a company. I believe the experience of running a business will be directly applicable to a position on the DSF board, and I look forward to applying that experience if elected.
Sarah Abderemane Paris, France
I'm an active DSF member and I've been contributing to this amazing community via multiple ways:
- Django contributor and Accessibility Team member
- Maintainer of djangoproject.com
- Organizer of Djangonaut Space
- Organizer of Django Paris Meetup
- Organizer of DjangoCon Europe 2023
I have seen many aspects of the community through all those experiences. As a relatively new member, I can bring a fresh perspective to the community and help foster a renewed sense of togetherness. I have a strong connection with Djangonaut Space mentoring program and the community. I'm well positioned to serve as an intermediary, facilitating communication regarding initiatives and ideas between the board and the community.
I would like to increase fundraising by improving communication and making improvements to make each sponsor special by highlighting sponsors not only on the website but also on social networks. Relying on my experiences with various Django projects, I will push forward ideas to further develop our community, specifically helping existing and new contributors.
With the community's support, I will set up a working group for mentorship and push accessibility in the framework. I am passionate about these topics as they show that Django is a framework for everyone by everyone.
I see myself as a representative of Django's diversity and would like to emphasize and expand the richness of it even more. Being part of the board would inspire people to get involved and be part of the community. They could add their stone to the building of this wonderful community.
Thibaud Colas Europe
To me, Django feels like it's in maintenance mode, a decade behind in areas like front-end development and serverless. To stay relevant compared to projects with tens of millions in venture capital, we need a more vibrant, more diverse community. We can build one together by making the right programs happen, like Djangonaut Space and Outreachy.
The DSF also needs to evolve with the times. In the age of ChatGPT, copyright and trademarks are very dated concerns. We need a foundation that can help its community navigate modern societal challenges: social equity issues affecting our users; accessibility issues plaguing the Django web; climate change and Django's carbon footprint.
I can help. Let's grow Django's contributors 10x, and have the Django universe lead by example in community-driven open source.
Tom Carrick Amsterdam, Netherlands
I've been using Django since 2008. A lot has changed since then, but one constant has been my wish to see Django continuously improve.
I'm active in the community in many ways. I've been a regular code contributor since 2016. I founded the accessibility team, and also started the official Discord server. So I've dedicated quite some time to Django already, but I have room for more, with even more impact.
I would like to help grow the next generation of Django contributors, from more diverse backgrounds. From running DjangoCon sprint tables over the years, and getting involved with Djangonaut Space, it's clear to me that the new contributor experience has substantial room for improvement.
I also want to expand Django's fundraising efforts. It's becoming difficult to add important new features. We need more funding to hire more Fellows, and expand their remit to work on bigger features.
The new working groups are a much needed initiative, and I'd love to help develop all these ideas to their fullest potential.
Velda Kiara Nairobi, Kenya
As a passionate software developer and technical writer deeply rooted in the open-source community, I am honored to be running for the DSF board. My experience in contributing to open-source projects, coupled with my leadership background in the Open Source Community Africa Nairobi, has ignited my desire to enhance the participation and contributions of communities from diverse backgrounds. My involvement in open-source initiatives has made me appreciate the power of collaboration and the impact of collective efforts. I have witnessed firsthand how open-source communities foster innovation and inclusivity, enabling individuals from all over the world to share their knowledge and expertise.
Driven by my belief in the impact of open source, I aspire to elevate the DSF board's decision-making process by incorporating the unique perspectives and insights of communities from diverse backgrounds. My experience working with developer communities has equipped me with the skills and empathy necessary to understand and address the specific needs of these underrepresented groups. As a leader, I prioritize decision-making that aligns with the needs and aspirations of the community. I believe in fostering an environment where everyone feels empowered to participate, contribute, and lead. My commitment to inclusivity extends beyond the color of one's skin; I envision a DSF community that embraces and celebrates the diversity of thought, experience, and background.
My passion for Django and my role as an advocate for the framework extend beyond personal preference. I recognize the immense value of Django to the developer community and am eager to contribute further through the DSF board. I believe that my involvement will allow me to add value to the Django community, supporting its growth and ensuring that it remains a thriving hub for developers worldwide. My journey in the open-source community began with a fascination for the framework. However, over time, I have come to realize that the true beauty of open-source lies in the community that surrounds it. I am committed to giving back to this community, not just as a developer or technical writer, but also as a leader and advocate for diversity and inclusion.
I humbly ask for your vote to join the DSF board and contribute my skills, experience, and passion to the continued growth and success of the Django community. Together, we can create a more inclusive and vibrant open-source ecosystem that empowers individuals from all backgrounds to innovate, collaborate, and make a lasting impact on the world.
November 13, 2023 12:32 PM UTC
Python GUIs
How to Restore the Window's Geometry in a PyQt App — Make Your Windows Remember Their Last Geometry
In GUI applications the window's position & size are known as the window geometry. Saving and restoring the geometry of a window between executions is a useful feature in many applications. With persistent geometry users can arrange applications on their desktop for an optimal workflow and have the applications return to those positions every time they are launched.
In this tutorial, we will explore how to save and restore the geometry and state of a PyQt window using the QSettings class. With this functionality, you will be able to give your applications a usability boost.
To follow along with this tutorial, you should have prior knowledge of creating GUI apps with Python and PyQt. Additionally, having a basic understanding of using the QSettings class to manage an application's settings will be beneficial.
Understanding a Window's Geometry
PyQt defines the geometry of a window using a few properties. These properties represent a window's position on the screen and size. Here's a summary of PyQt's geometry-related properties:
| Property | Description | Access Method |
|---|---|---|
| x | Holds the x coordinate of a widget relative to its parent. If the widget is a window, x includes any window frame and is relative to the desktop. This property defaults to 0. | x() |
| y | Holds the y coordinate of a widget relative to its parent. If the widget is a window, y includes any window frame and is relative to the desktop. This property defaults to 0. | y() |
| pos | Holds the position of the widget within its parent widget. If the widget is a window, the position is relative to the desktop and includes any frame. | pos() |
| geometry | Holds the widget's geometry relative to its parent and excludes the window frame. | geometry() |
| width | Holds the width of the widget, excluding any window frame. | width() |
| height | Holds the height of the widget, excluding any window frame. | height() |
| size | Holds the size of the widget, excluding any window frame. | size() |
In PyQt, the QWidget class provides the access methods in the table above. Note that when your widget is a window or form, the first three methods operate on the window and its frame, while the last four methods operate on the client area, which is the window's workspace without the external frame.
Additionally, the x and y coordinates are relative to the screen of your computer. The origin of coordinates is the upper left corner of the screen, at which point both x and y are 0.
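If you want to see the frame versus client-area distinction directly, QWidget also exposes frameGeometry(), which includes the window frame. Here's a minimal sketch, assuming a running QApplication; note that on some platforms the frame rectangle is only accurate once the window manager has actually placed the window:

from PyQt6.QtWidgets import QApplication, QWidget

app = QApplication([])
window = QWidget()
window.resize(400, 200)
window.show()

# geometry() covers the client area only; frameGeometry() adds the frame.
print(window.geometry())
print(window.frameGeometry())

app.exec()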
Let's create a small demo app to inspect all these properties in real time. To do this, go ahead and fire up your code editor or IDE and create a new Python file called geometry_properties.py. Then add the following code to the file and save it in your favorite working directory:
from PyQt6.QtWidgets import (
    QApplication,
    QLabel,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QWidget,
)

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Window's Geometry")
        self.resize(400, 200)
        self.central_widget = QWidget()
        self.global_layout = QVBoxLayout()
        self.geometry_properties = [
            "x",
            "y",
            "pos",
            "width",
            "height",
            "size",
            "geometry",
        ]
        # Create one label per geometry property.
        for prop in self.geometry_properties:
            self.__dict__[f"{prop}_label"] = QLabel(f"{prop}:")
            self.global_layout.addWidget(self.__dict__[f"{prop}_label"])
        button = QPushButton("Update Geometry Properties")
        button.clicked.connect(self.update_labels)
        self.global_layout.addWidget(button)
        self.central_widget.setLayout(self.global_layout)
        self.setCentralWidget(self.central_widget)

    def update_labels(self):
        # Query each property's access method and show the result.
        for prop in self.geometry_properties:
            self.__dict__[f"{prop}_label"].setText(
                f"{prop}: {getattr(self, prop)()}"
            )

if __name__ == "__main__":
    app = QApplication([])
    window = Window()
    window.show()
    app.exec()
Wow! There's a lot of code in this file. First, we import the required classes from PyQt6.QtWidgets. Then, we create our app's main window by inheriting from QMainWindow.
In the initializer method, we set the window's title and size using setWindowTitle() and resize(), respectively. Next, we define a central widget and a layout for our main window.
We also define a list of properties. We'll use that list to add some QLabel objects. Each label will show a geometry property and its current value. The Update Geometry Properties button allows us to update the values of the window's geometry properties.
Finally, we define the update_labels() method to update the values of all the geometry properties using their corresponding access methods. That's it! Go ahead and run the app. You'll get the following window on your screen:
A Window Showing Labels for Every Geometry Property
Looking good! Now go ahead and click the Update Geometry Properties button. You'll see how all the properties get updated. Your app's window will look something like this:
A Window Showing the Current Value of Every Geometry Property
As you can see, x and y are numeric values, while pos is a QPoint object with x and y as its coordinates. These properties define the position of this window on your computer screen.
The width and height properties are also numeric values, while the size property is a QSize object defined by the current width and height.
Finally, the geometry property is a QRect object. In this case, the rectangle comprises x, y, width, and height.
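These value types all live in PyQt6.QtCore, so you can experiment with them outside of any widget. A quick sketch:

from PyQt6.QtCore import QPoint, QRect, QSize

pos = QPoint(50, 50)
size = QSize(400, 200)
rect = QRect(pos, size)  # a QRect combines a position and a size

print(pos.x(), pos.y())             # 50 50
print(size.width(), size.height())  # 400 200
print(rect)                         # a QRect holding (50, 50, 400, 200)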
Great! With this first approach to how PyQt defines a window's geometry, we're ready to continue digging into this tutorial's main topic: restoring the geometry of a window in PyQt.
Keeping an App's Geometry Settings: The QSettings Class
Users of GUI apps will generally expect the apps to remember their settings across sessions. This information is often referred to as settings or preferences. In PyQt applications, you'll manage settings and preferences using the QSettings class. This class allows you to have persistent platform-independent settings in your GUI app.
A commonly expected feature is that the app remembers the geometry of its windows, particularly the main window.
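Before wiring QSettings into a window, it helps to see the basic round trip on its own. Here's a minimal sketch; the organization and application names mirror the ones we'll use below, and the greeting key is purely illustrative:

from PyQt6.QtCore import QSettings

settings = QSettings("PythonGUIs", "GeometryApp")
settings.setValue("greeting", "hello")  # persisted to platform-specific storage

# On later runs this returns "hello"; the default is used only if the key is unset.
print(settings.value("greeting", defaultValue="none"))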
In this section, you'll learn how to save and restore the window's geometry in a PyQt application. Let's start by creating a skeleton PyQt application to kick things off. Go ahead and create a new Python file called geometry.py. Once you have the file opened in your favorite code editor or IDE, then add the following code:
from PyQt6.QtWidgets import QApplication, QMainWindow

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Window's Geometry")
        self.move(50, 50)
        self.resize(400, 200)

if __name__ == "__main__":
    app = QApplication([])
    window = Window()
    window.show()
    app.exec()
This code creates a minimal PyQt app with an empty main window. The window will appear at 50 pixels from the upper left corner of your computer screen and have a size of 400 by 200 pixels.
We'll use the above code as a starting point to make the app remember and restore the main window's geometry across sessions.
First, we need to have a QSettings instance in our app. Therefore, you have to import QSettings from PyQt6.QtCore and instantiate it as in the code below:
from PyQt6.QtCore import QSettings
from PyQt6.QtWidgets import QApplication, QMainWindow

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Window's Geometry")
        self.move(50, 50)
        self.resize(400, 200)
        self.settings = QSettings("PythonGUIs", "GeometryApp")
When instantiating QSettings, we must provide the name of our company or organization and the name of our application. We use "PythonGUIs" as the organization and "GeometryApp" as the application name.
Now that we have a QSettings instance, we should implement two methods. The first method should allow you to save the app's settings and preferences. The second method should help you read and load the settings. In this tutorial, we'll call these methods write_settings() and read_settings(), respectively:
class Window(QMainWindow):
    # ...

    def write_settings(self):
        # Write settings here...
        pass

    def read_settings(self):
        # Read settings here...
        pass
Note that our methods don't do anything yet. You'll write them in a moment. For now, they're just placeholders.
The write_settings() method must be called when the user closes or terminates the application. This way, you guarantee that all the modified settings get saved for the next session. So, the appropriate place to call write_settings() is from the main window's close event handler.
Let's override the closeEvent() method as in the code below:
class Window(QMainWindow):
    # ...

    def closeEvent(self, event):
        self.write_settings()
        super().closeEvent(event)
        event.accept()
In this code, we override the closeEvent() handler method. The first line calls write_settings() to ensure that we save the current state of our app's settings. Then, we call the closeEvent() of our superclass QMainWindow to ensure the app's window closes correctly. Finally, we accept the current event to signal that it's been processed.
Now, where should we call read_settings() from? In this example, the best place for calling the read_settings() method is .__init__(). Go ahead and add the following line of code to the end of your __init__() method:
class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Window's Geometry")
        self.move(50, 50)
        self.resize(400, 200)
        self.settings = QSettings("PythonGUIs", "GeometryApp")
        self.read_settings()
By calling the read_settings() method from __init__(), we ensure that our app will read and load its settings every time the main window gets created and initialized.
Great! We're on the way to getting our application to remember and restore its window's geometry. First, you need to know that you have at least two ways to restore the geometry of a window in PyQt:
- Using the pos and size properties
- Using the geometry property
In both cases, you need to save the current value of the selected property and load the saved value when the application starts. To kick things off, let's start with the first approach.
Restoring the Window's Geometry With pos and size
In this section, we'll first write the required code to save the current value of pos and size by taking advantage of our QSettings object. The code snippet below shows the changes that you need to make on your write_settings() method to get this done:
class Window(QMainWindow):
    # ...

    def write_settings(self):
        self.settings.setValue("pos", self.pos())
        self.settings.setValue("size", self.size())
This code is straightforward. We call the setValue() method on our setting object to set the "pos" and "size" configuration parameters. Note that we get the current value of each property using the corresponding access method.
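If you're curious where these values actually end up, QSettings can report its backing store, which varies by platform: an .ini or .plist file, or a registry key on Windows. A small sketch:

from PyQt6.QtCore import QSettings

settings = QSettings("PythonGUIs", "GeometryApp")
# Platform-dependent location, e.g. ~/.config/PythonGUIs/GeometryApp.conf on Linux.
print(settings.fileName())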
With the write_settings() method updated, we're now ready to read and load the geometry properties from our app's settings. Go ahead and update the read_settings() method as in the code below:
class Window(QMainWindow):
    # ...

    def read_settings(self):
        # QPoint and QSize come from PyQt6.QtCore (see the full listing below).
        self.move(self.settings.value("pos", defaultValue=QPoint(50, 50)))
        self.resize(self.settings.value("size", defaultValue=QSize(400, 200)))
The first line inside read_settings() retrieves the value of the "pos" setting parameter. If there's no saved value for this parameter, then we use QPoint(50, 50) as the default value. Next, the move() method moves the app's window to the resulting position on your screen.
The second line in read_settings() does something similar to the first one. It retrieves the current value of the "size" parameter and resizes the window accordingly.
Great! It's time for a test! Go ahead and run your application. Then, move the app's window to another position on your screen and resize the window as desired. Finally, close the app's window to terminate the current session. When you run the app again, the window will appear in the same position. It will also have the same size.
If you have any issues completing and running the example app, then you can grab the entire code below:
from PyQt6.QtCore import QPoint, QSettings, QSize
from PyQt6.QtWidgets import QApplication, QMainWindow

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Window's Geometry")
        self.move(50, 50)
        self.resize(400, 200)
        self.settings = QSettings("PythonGUIs", "GeometryApp")
        self.read_settings()

    def write_settings(self):
        self.settings.setValue("pos", self.pos())
        self.settings.setValue("size", self.size())

    def read_settings(self):
        self.move(self.settings.value("pos", defaultValue=QPoint(50, 50)))
        self.resize(self.settings.value("size", defaultValue=QSize(400, 200)))

    def closeEvent(self, event):
        self.write_settings()
        super().closeEvent(event)
        event.accept()

if __name__ == "__main__":
    app = QApplication([])
    window = Window()
    window.show()
    app.exec()
Now you know how to restore the geometry of a window in a PyQt app using the pos and size properties. It's time to change gears and learn how to do this using the geometry property.
Restoring the Window's Geometry With geometry
We can also restore the geometry of a PyQt window using its geometry property and the restoreGeometry() method. To do that, we first need to save the current geometry using our QSettings object.
Go ahead and create a new Python file in your working directory. Once you have the file in place, add the following code to it:
from PyQt6.QtCore import QByteArray, QSettings
from PyQt6.QtWidgets import QApplication, QMainWindow

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Window's Geometry")
        self.move(50, 50)
        self.resize(400, 200)
        self.settings = QSettings("PythonGUIs", "GeometryApp")
        self.read_settings()

    def write_settings(self):
        self.settings.setValue("geometry", self.saveGeometry())

    def read_settings(self):
        self.restoreGeometry(self.settings.value("geometry", QByteArray()))

    def closeEvent(self, event):
        self.write_settings()
        super().closeEvent(event)
        event.accept()

if __name__ == "__main__":
    app = QApplication([])
    window = Window()
    window.show()
    app.exec()
There are only two changes in this code compared to the code from the previous section. We've modified the implementation of the write_settings() and read_settings() methods.
In write_settings(), we use setValue() to save the current geometry of our app's window. The saveGeometry() method gives us the current window's geometry as a serializable value. In read_settings(), we call the value() method to retrieve the saved geometry value. Then, we use restoreGeometry() to restore the geometry of our window.
Again, you can run the application consecutive times and change the position and size of its main window to ensure your code works correctly.
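One refinement worth knowing about: restoreGeometry() returns False when the stored byte array is empty or invalid, so you can fall back to an explicit default. A hedged sketch of read_settings() with that check (the fallback position and size here match this tutorial's earlier values):

def read_settings(self):
    geometry = self.settings.value("geometry", QByteArray())
    if not self.restoreGeometry(geometry):
        # Nothing saved yet (first run) or the data was invalid.
        self.move(50, 50)
        self.resize(400, 200)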
Restoring the Window's Geometry and State
If your app's window has toolbars and dock widgets, then you'll likely want to restore their state along with the window's geometry. To do that, you can use the restoreState() method. To illustrate this, let's reuse the code from the previous section.
Update the content of write_settings() and read_settings() as follows:
class Window(QMainWindow):
    # ...

    def write_settings(self):
        self.settings.setValue("geometry", self.saveGeometry())
        self.settings.setValue("windowState", self.saveState())

    def read_settings(self):
        self.restoreGeometry(self.settings.value("geometry", QByteArray()))
        self.restoreState(self.settings.value("windowState", QByteArray()))
In write_settings(), we add a new setting value called "windowState". To keep this setting, we use the saveState() method, which saves the current state of this window's toolbars and dock widgets. Meanwhile, in read_settings(), we restore the window's state by calling the value() method, as usual, to get the state value back from our QSettings object. Finally, we use restoreState() to restore the state of toolbars and dock widgets.
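Both saveState() and restoreState() also accept an optional integer version, which lets you invalidate stored layouts when your toolbar and dock arrangement changes between releases. A sketch; the version constant is an assumption of this example, not part of the tutorial's code:

LAYOUT_VERSION = 1  # bump this when toolbars/docks change between releases

def write_settings(self):
    self.settings.setValue("geometry", self.saveGeometry())
    self.settings.setValue("windowState", self.saveState(LAYOUT_VERSION))

def read_settings(self):
    self.restoreGeometry(self.settings.value("geometry", QByteArray()))
    self.restoreState(
        self.settings.value("windowState", QByteArray()), LAYOUT_VERSION
    )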
Now, to make sure that this new code works as expected, let's add a sample toolbar and a dock widget to our app's main window. Go ahead and update your imports and __init__() method as shown below, adding the two new methods right after __init__():
from PyQt6.QtCore import QByteArray, QSettings, Qt
from PyQt6.QtWidgets import QApplication, QDockWidget, QMainWindow

class Window(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle("Window's State")
        self.resize(400, 200)
        self.settings = QSettings("PythonGUIs", "GeometryApp")
        self.create_toolbar()
        self.create_dock()
        self.read_settings()

    def create_toolbar(self):
        toolbar = self.addToolBar("Toolbar")
        toolbar.addAction("One")
        toolbar.addAction("Two")
        toolbar.addAction("Three")

    def create_dock(self):
        dock = QDockWidget("Dock", self)
        dock.setAllowedAreas(
            Qt.DockWidgetArea.LeftDockWidgetArea
            | Qt.DockWidgetArea.RightDockWidgetArea
        )
        self.addDockWidget(Qt.DockWidgetArea.LeftDockWidgetArea, dock)

    # ...
In this new update, we first import the Qt namespace from PyQt6.QtCore and QDockWidget from PyQt6.QtWidgets. Then we call the two new methods from __init__() to create the toolbar and dock widget at initialization time.
In the create_toolbar() method, we create a sample toolbar with three sample buttons. This toolbar will show at the top of our app's window by default.
Next, we create a dock widget in create_dock(). This widget will occupy the rest of our window's working area.
That's it! You're now ready to give your app a try. You'll see a window like the following:
A Window Showing a Sample Toolbar and a Dock Widget
Play with the toolbar and the dock widget. Move them around. Then close the app's window and run the app again. Your toolbar and dock widget will show in the last position you left them.
Conclusion
Through this tutorial, you have learned how to restore the geometry and state of a window in PyQt applications using the QSettings class. By utilizing the pos, size, and geometry properties, along with the window's saved state, you can give your users the convenience of persistent position and size on your app's windows.
With this knowledge, you can enhance the usability of your PyQt applications, making your app more intuitive and user-friendly.
November 13, 2023 06:00 AM UTC
November 12, 2023
Ned Batchelder
Debug helpers in coverage.py
Debugging in the coverage.py code can be difficult, so I’ve written a lot of helper code to support debugging. I just added some more.
These days I’m working on adding support in coverage.py for sys.monitoring. This is a new feature in Python 3.12 that completely changes how Python reports information about program execution. It’s a big change to coverage.py and it’s a new feature in Python, so while working on it I’ve been confused a lot.
Some of the confusion has been about how sys.monitoring works, and some was eventually diagnosed as a genuine bug involving sys.monitoring. But all of it started as straight-up “WTF!?” confusion. My preferred debugging approach at times like this is to log a lot of detailed information and then pore over it.
For something like sys.monitoring where Python is calling my functions and passing code objects, it’s useful to see stack traces for each function call. And because I’m writing large log files it’s useful to be able to tailor the information to the exact details I need so I don’t go cross-eyed trying to find the clues I’m looking for.
I already had a function for producing compact log-friendly stack traces. For this work, I added more options to it. Now my short_stack function produces one line per frame, with options for which frames to include (it can omit the 20 or so frames of pytest before my own code is involved); whether to show the full file name, or an abbreviated one; and whether to include the id of the frame object:
_hookexec : 0x10f23c120 syspath:/pluggy/_manager.py:115
_multicall : 0x10f308bc0 syspath:/pluggy/_callers.py:77
pytest_pyfunc_call : 0x10f356340 syspath:/_pytest/python.py:194
test_thread_safe_save_data : 0x10e056480 syspath:/tests/test_concurrency.py:674
__enter__ : 0x10f1a7e20 syspath:/contextlib.py:137
collect : 0x10f1a7d70 cov:/control.py:669
start : 0x10f1a7690 cov:/control.py:648
start : 0x10f650300 cov:/collector.py:353
_start_tracer : 0x10f5c4e80 cov:/collector.py:296
__init__ : 0x10e391ee0 cov:/pep669_tracer.py:155
log : 0x10f587670 cov:/pep669_tracer.py:55
short_stack : 0x10f5c5180 cov:/pep669_tracer.py:93
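The real implementation, with all of its options, lives in coverage.py's debug.py; a rough standard-library sketch of the core one-line-per-frame idea (not the actual code) might look like this:

import inspect

def short_stack(frame_ids=False):
    """One line per frame, outermost caller first (a sketch)."""
    lines = []
    for fi in reversed(inspect.stack()[1:]):  # [1:] drops this frame itself
        id_part = f"0x{id(fi.frame):x} " if frame_ids else ""
        lines.append(f"{fi.function:>24} : {id_part}{fi.filename}:{fi.lineno}")
    return "\n".join(lines)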
Once I had these options implemented in a quick way and they proved useful, I moved the code into coverage.py’s debug.py file and added tests for the new behaviors. This took a while, but in the end I think it was worth it. I don’t need to use these tools often, but when I do, I’m deep in a bad problem and I want to have a well-sharpened tool at hand.
Writing debug code is like writing tests: it’s just there to support you in development, it’s not part of “the product.” But it’s really helpful. You should do it. It could be something as simple as a custom __repr__ method for your classes to show just the information you need.
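For example, a hypothetical class might trim its repr down to the fields you actually scan for in logs:

class Span:
    """A made-up example class; only the repr matters here."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __repr__(self):
        # Compact and log-friendly: just the two values that matter.
        return f"<Span {self.start}..{self.end}>"

print([Span(1, 4), Span(9, 12)])  # [<Span 1..4>, <Span 9..12>]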
It’s especially helpful when your code deals in specialized domains or abstractions. Your debugging code can speak the same language. Zellij was a small side project of mine to draw geometric art like this:

When the over-under weaving code wasn’t working right, I added some tooling to get debug output like this:

I don’t remember what the different colors, patterns, and symbols meant, but at the time it was very helpful for diagnosing what was wrong with the code.



Author:
NVIDIA
François Goupil