Planet Python
Last update: September 27, 2023 09:41 PM UTC
September 27, 2023
Zero to Mastery
Python Monthly Newsletter 💻🐍
46th issue of Andrei Neagoie's must-read monthly Python Newsletter: Python oddities, cProfile trick, and GIL removal. All this and more. Read the full newsletter to get up-to-date with everything you need to know from last month.
September 27, 2023 07:41 PM UTC
Real Python
Python 3.12 Preview: Static Typing Improvements
Python's support for static typing gradually improves with each new release of Python. The core features were in place in Python 3.5. Since then, there've been many tweaks and improvements to the type hinting system. This evolution continues in Python 3.12, which, in particular, simplifies the typing of generics.
In this tutorial, youâll:
- Use type variables in Python to annotate generic classes and functions
- Explore the new syntax for type hinting type variables
- Model inheritance with the new @override decorator
- Annotate **kwargs more precisely with typed dictionaries (see the sketch just below)
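As a quick taste of those last two items, here's a minimal sketch (my own illustration, not the tutorial's code) of the @override decorator from PEP 698 and of **kwargs typed with Unpack and a TypedDict from PEP 692, assuming a Python 3.12 type checker:

from typing import TypedDict, Unpack, override

class Options(TypedDict):
    retries: int
    timeout: float

def connect(**kwargs: Unpack[Options]) -> None:
    # A type checker now knows kwargs holds exactly these keys and types
    ...

class Base:
    def greet(self) -> str:
        return "hello"

class Child(Base):
    @override  # flagged by the type checker if Base.greet is removed or renamed
    def greet(self) -> str:
        return "hi"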
This won't be an introduction to using type hints in Python. If you want to review the background that you'll need for this tutorial, then have a look at Python Type Checking.
You'll find many other new features, improvements, and optimizations in Python 3.12. The most relevant ones include the following:
- Ever better error messages
- Support for the Linux perf profiler
- More powerful f-strings
- Support for subinterpreters
Go ahead and check out what's new in the changelog for more details on these and other features.
Free Bonus: Click here to download your sample code for a sneak peek at Python 3.12, coming in October 2023.
Recap Type Variable Syntax Before Python 3.12
Type variables have been a part of Python's static typing system since its introduction in Python 3.5. PEP 484 introduced type hints to the language, and type variables and generics play an important role in that document. In this section, you'll dive into how you've used type variables so far. This'll give you the necessary background to appreciate the new syntax that you'll learn about later.
A generic type is a type parametrized by another type. Typical examples include a list of integers and a tuple consisting of a float, a string, and another float. You use square brackets to parametrize generics in Python. You can write the two examples above as list[int] and tuple[float, str, float], respectively.
In addition to using built-in generic types, you can define your own generic classes. In the following example, you implement a generic queue based on deque in the standard library:
# generic_queue.py

from collections import deque
from typing import Generic, TypeVar

T = TypeVar("T")

class Queue(Generic[T]):
    def __init__(self) -> None:
        self.elements: deque[T] = deque()

    def push(self, element: T) -> None:
        self.elements.append(element)

    def pop(self) -> T:
        return self.elements.popleft()
This is a first-in, first-out (FIFO) queue. It represents the kind of lines that you'll find yourself in at stores, where the first person into the queue is also the first one to leave the queue. Before looking closer at the code, and in particular at the type hints, play a little with the class:
>>> from generic_queue import Queue
>>> queue = Queue[int]()
>>> queue.push(3)
>>> queue.push(12)
>>> queue.elements
deque([3, 12])
>>> queue.pop()
3
You can use .push() to add elements to the queue and .pop() to remove elements from the queue. Note that when you called the Queue() constructor, you included [int]. This isn't necessary, but it tells the type checker that you expect the queue to only contain integer elements.
Normally, using square brackets like you did in Queue[int]() isn't valid Python syntax. You can use square brackets with Queue, however, because you defined Queue as a generic class by inheriting from Generic. How does the rest of your class use this int parameter?
To answer that question, you need to look at T, which is a type variable. A type variable is a special variable that can stand in for any type. However, during type checking, the type of T will be fixed.
In your Queue[int] example, T will be int in all annotations on the class. You could also instantiate Queue[str], where T would represent str everywhere. This would then be a queue with string elements.
If you look back at the source code of Queue, then you'll see that .pop() returns an object of type T. In your special integer queue, the static type checker will make sure that .pop() returns an integer.
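For a preview of where the tutorial is headed, PEP 695 in Python 3.12 lets you declare the type parameter inline, without an explicit TypeVar or the Generic base class. A minimal sketch, assuming Python 3.12:

# Python 3.12+ (PEP 695): the type parameter T is declared in square brackets
from collections import deque

class Queue[T]:
    def __init__(self) -> None:
        self.elements: deque[T] = deque()

    def push(self, element: T) -> None:
        self.elements.append(element)

    def pop(self) -> T:
        return self.elements.popleft()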
Speaking of static type checkers, how do you actually check the types in your code? Type annotations are mostly ignored during runtime. Instead, you need to install a separate type checker and run it explicitly on your code.
Note: In this tutorial, you'll use Pyright as your type checker. You can install Pyright from PyPI using pip:
$ python -m pip install pyright
If you're using Visual Studio Code, then you can use Pyright inside the editor through the Pylance extension. You may need to activate it by setting the Python › Analysis: Type Checking Mode option in your settings.
If you install Pyright, then you can use it to type check your code:
$ pyright generic_queue.py
0 errors, 0 warnings, 0 informations
To see an example of the kinds of errors that Pyright can detect, add the following lines to your generic queue implementation:
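(The tutorial's exact lines are behind the link below. Purely as an illustration, a hypothetical mismatch like this is the kind of thing Pyright reports:)

queue = Queue[int]()
queue.push("three")  # Pyright error: str is not assignable to int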
Read the full article at https://realpython.com/python312-typing/ »
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
September 27, 2023 02:00 PM UTC
Python Software Foundation
Python Developers Survey Numbers for 2022!
We are excited to announce the results of the sixth official annual Python Developers Survey. This work is done each year as a collaborative effort between the Python Software Foundation and JetBrains. Late last year, more than 23,000 Python developers and enthusiasts from almost 200 countries/regions participated in the survey to reveal the current state of the language and the ecosystem around it. (Spoiler alert: Many people are using Python, and 51% are using it for both work AND personal projects.)
https://lp.jetbrains.com/python-developers-survey-2022/
We know the whole Python community finds this work useful. From Luciana Abud, product manager for Visual Studio Code: "Our teams at Microsoft are truly grateful to the Python Software Foundation and JetBrains for orchestrating the Python Developers Survey! The insights we gain allow us to take a data-driven approach to help with prioritizing feature development, addressing pain points, enhancing usability and anticipating future needs. This survey is invaluable in shaping our approach and continuously improving the Python development experience within the VS Code ecosystem!"
We'd love to hear how you use these numbers, so please share your thoughts on social media, mentioning @jetbrains and @ThePSF with the #pythondevsurvey hashtag. We are also open to any suggestions and feedback related to this survey which could help us run an even better one next time.
September 27, 2023 11:44 AM UTC
PyCharm
PyCharm 2023.3 Early Access Program Is Open!
UI/UX Enhancements, Support for PEP 647, and More
The Early Access Program for PyCharm 2023.3 kicks off today, offering you a sneak peek of the exciting new features and improvements we expect to include in the next major release.
If youâre not familiar with how the EAP works, please read this blog post for an introduction to the program and an explanation of why your participation is invaluable.
We invite you to join us over the next few weeks, take a closer look at the latest additions to PyCharm, and share your feedback on the new features.
You can download the build from our website, get it from the free Toolbox App, or update to it using snaps if youâre an Ubuntu user.
Read on to explore the new features and enhancements that you can test in this version.
User experience
Option to hide the main toolbar in the default viewing mode
In response to your feedback about the new UI, weâve implemented an option to hide the main toolbar when using the IDEâs default viewing mode, just like in the old UI. To declutter your workspace and remove the toolbar, select View | Appearance and uncheck the Toolbar option.
Default tool window layout option
With the release of PyCharm 2023.1, we introduced the ability to save multiple tool window layouts and switch between them, enhancing the customizability of your workspace. In the first PyCharm 2023.3 EAP build, weâre expanding this functionality by introducing the Default layout option, which provides a quick way to revert your workspaceâs appearance to its default state. This layout is not customizable and can be accessed through Window | Layouts.
New product icon for macOS
With the launch of the PyCharm 2023.3 EAP, we have redesigned the PyCharm icon for macOS to align it with the standard style guidelines of the operating system.
Django REST Framework
Support for viewsets
PyCharm 2023.3 will help you define endpoints when working with the Django REST Framework. The IDE will support code completion, navigation, and rename refactoring for the methods used in the viewsets.
Try this feature and share your feedback with us!
Editor
Support for type guards [PEP 647]
PyCharm 2023.3 will support PEP 647. PEP 647 introduced a way to treat custom functions as “type guards”, which, when used in a conditional statement, leads to the narrowing of their argument types. Think of the built-in functions isinstance and issubclass, which our type checker already recognizes. Now, a user-defined function returning typing.TypeGuard has the same effect on type inference in PyCharm.
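As an illustration of the pattern PyCharm now understands, here is a minimal user-defined type guard, adapted from the example in PEP 647 (not taken from PyCharm's documentation):

from typing import TypeGuard

def is_str_list(val: list[object]) -> TypeGuard[list[str]]:
    """Determine whether all objects in the list are strings."""
    return all(isinstance(x, str) for x in val)

def process(values: list[object]) -> None:
    if is_str_list(values):
        # Inside this branch, a type checker narrows values to list[str]
        print(", ".join(values))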
Move code elements in the Python files
In PyCharm 2023.3, you can move code elements left or right in the Python files with Option + Shift + Cmd + Left/Right on macOS (Alt + Shift + Ctrl + Left/Right on Windows/Linux).
Python Console
Option to switch between single and double quotes when copying string values from the Variable View
There is a new option to put double quotes instead of single quotes around a string value copied from the Variable View in the Python or Debug Console.
To switch between single and double quotes, go to the Other option (three vertical dots icon) in the main Debug menu bar, choose Debugger Settings | Variable Quoting Policy and pick your preferred option.
Navigate between the commands in the Python Console
In PyCharm 2023.3, you can navigate between multi-line commands in the Python Console using Cmd + Up / Cmd + Down shortcuts on macOS (Ctrl + Up / Ctrl + Down on Windows / Linux). When you move to the previously executed command, a caret is set to the end of the first line. When you get to the most recently executed multi-line command from your history, a caret is set to the end of the command.
Static code completion in the Python Console
In PyCharm 2023.2, we added an option to use static code completion in the Python Console. In PyCharm 2023.3, it will be enabled by default. If you would like to switch to runtime code completion, go to Settings | Build, Execution, Deployment | Console and choose the option in the Code completion drop-down menu.
Notable bug fix: execute code with root privileges via sudo
We fixed a regression that prevented users from executing code via an SSH connection in PyCharm with root privileges via sudo. [PY-52690]
These are the most important updates for this week. For the full list of changes in this EAP build, read the release notes.
We're dedicated to giving you the best possible experience, and your feedback is vital. If you find any bugs, please report them via our issue tracker. And if you have any questions or comments, feel free to share them in the comments below or get in touch with us on X (formerly Twitter).
September 27, 2023 09:54 AM UTC
Anarcat
How big is Debian?
Now this was quite a tease! For those who haven't seen it, I encourage you to check it out, it has a nice photo of a Debian t-shirt I did not know about, to quote the Fine Article:
Today, when going through a box of old T-shirts, I found the shirt I was looking for to bring to the occasion: [...]
For the benefit of people who read this using a non-image-displaying browser or RSS client, they are respectively:
10 years 100 countries 1000 maintainers 10000 packages
and
1 project 10 architectures 100 countries 1000 maintainers 10000 packages 100000 bugs fixed 1000000 installations 10000000 users 100000000 lines of code
20 years ago we celebrated eating grilled meat at J0rd1's house. This year, we had vegan tostadas in the menu. And maybe we are no longer that young, but we are still very proud and happy of our project!
Now… How would numbers line up today for Debian, 20 years later? Have we managed to get the "bugs fixed" line increase by a factor of 10? Quite probably, the lines of code we also have, and I can only guess the number of users and installations, which was already just a wild guess back then, might have multiplied by over 10, at least if we count indirect users and installs as well…
Now I don't know about you, but I really expected someone to come up with an answer to this, directly on Debian Planet! I have patiently waited for such an answer but enough is enough, I'm a Debian member, surely I can cull all of this together. So, lo and behold, here are the actual numbers from 2023!
- 1 project: unchanged, although we could count 129 derivatives in the current census
- ~10 architectures: number almost unchanged, but the actual architectures are of course different (woody released with i386, m68k, Alpha, SPARC, PowerPC, ARM, IA-64, hppa, mips, s390; while bookworm released with actually 9 supported architectures instead of 10: i386, amd64, aarch64, armel, armhf, mipsel, mips64el, ppc64el, s390x)
- ~100 countries: actually 63 now, but I suspect we were generously rounding up last time as well (extracted with ldapsearch -b ou=users,dc=debian,dc=org -D uid=anarcat,ou=users,dc=debian,dc=org -ZZ -vLxW '(c=*)' c | grep ^c: | sort | uniq -c | sort -n | wc -l on coccia)
- ~1000 maintainers: amazingly, almost unchanged (according to the last DPL vote, there were 831 DDs in 2003 and 996 in the last vote)
- 35000 packages: that number obviously increased quite a bit, but according to sources.debian.org, woody released with 5580 source packages and bookworm with 34782 source packages, and according to UDD, there are actually 200k+ binary packages (SELECT COUNT(DISTINCT package) FROM all_packages; => 211151)
- 1 000 000+ (OVER ONE MILLION!) bugs fixed! now that number grew by a whole order of magnitude, incredibly (934809 done, 16 fixed, 7595 forwarded, 82492 pending, 938 pending-fixed, according to UDD again, SELECT COUNT(id),status FROM all_bugs GROUP BY status;)
- ~1 000 000 installations (?): that one is hard to call. popcon has 225419 recorded installs, but it is likely an underestimate - hard to count
- how many users? even harder, we were claiming ten million users then, how many now? how can we even begin to tell, with Debian running on the space station?
- 1 000 000 000+ (OVER ONE BILLION!) lines of code: that, interestingly, has also grown by an order of magnitude, from 100M to 1B lines of code, again according to sources.debian.org: woody shipped with 143M lines of code and bookworm with 1.3 billion lines of code
So it doesn't line up as nicely, but it looks something like this:
1 project
10 architectures
30 years
100 countries (actually 63, but we'd like to have yours!)
1000 maintainers (yep, still there!)
35000 packages
211000 *binary* packages
1000000 bugs fixed
1000000000 lines of code
uncounted installations and users, we don't track you
So maybe the more accurate version, rounding to the nearest logarithm, would look something like:
1 project
10 architectures
100 countries (actually 63, but we'd like to have yours!)
1000 maintainers (yep, still there!)
100000 packages
1000000 bugs fixed
1000000000 lines of code
uncounted installations and users, we don't track you
I really like how the "packages" and "bugs fixed" lines still have an order of magnitude between them there, but the "bugs fixed" vs "lines of code" lines have an extra order of magnitude; that is, we have fixed ten times fewer bugs per line of code since we last did this count, 20 years ago.
Also, I am tempted to put 100 years in there, but that would be rounding up too much. Let's give it another 30 years first.
Hopefully, some real scientist is going to balk at this crude methodology and come up with some more interesting numbers for the next t-shirt. Otherwise I'm available for bar mitzvahs and children's parties.
September 27, 2023 02:23 AM UTC
September 26, 2023
PyCoderâs Weekly
Issue #596 (Sept. 26, 2023)
#596 â SEPTEMBER 26, 2023
View in Browser »
Design and Guidance: Object-Oriented Programming in Python
In this video course, you’ll learn about the SOLID principles, which are five well-established standards for improving your object-oriented design in Python. By applying these principles, you can create object-oriented code that is more maintainable, extensible, scalable, and testable.
REAL PYTHON course
Learning About Code Metrics in Python With Radon
Radon is a code metrics tool. This article introduces you to it and teaches you how you can improve your code based on its measurements.
MIKE DRISCOLL
You Write Great Python Code but How Do You Know it’s Secure Code
If you’re not a security expert, consider Semgrep. Trusted by Slack, GitLab, Snowflake, and thousands of engineers, it acts like a security spellchecker for your code. Simply point Semgrep to your code; it identifies vulnerabilities, checks code dependencies, and helps you ship secure code →
SEMGREP sponsor
Speeding Up Your Code When Multiple Cores Aren’t an Option
Parallelism isn't the only answer: often you can optimize low-level code to get significant performance improvements.
ITAMAR TURNER-TRAURING
Articles & Tutorials
How to Catch Multiple Exceptions in Python
In this how-to tutorial, you’ll learn different ways of catching multiple Python exceptions. You’ll review the standard way of using a tuple in the except clause, but also expand your knowledge by exploring some other techniques, such as suppressing exceptions and using exception groups.
REAL PYTHON
78% MNIST Accuracy Using GZIP in Under 10 Lines of Code
MNIST is a collection of hand-written digits that is commonly used to play with classification algorithms. It turns out that some compression mechanisms can double as classification tools. This article covers a bit of why with the added code-golf goal of a small amount of code.
JAKOBS.DEV
Donât Get Caught by IDOR Vulnerabilities
Are insecure direct object references (IDOR) threatening your applications? Learn about the types of IDOR vulnerabilities in your Python applications, how to recognize their patterns, and protect your system with Snyk →
SNYK.IO sponsor
Bypassing the GIL for Parallel Processing in Python
In this tutorial, you’ll take a deep dive into parallel processing in Python. You’ll learn about a few traditional and several novel ways of sidestepping the global interpreter lock (GIL) to achieve genuine shared-memory parallelism of your CPU-bound tasks.
REAL PYTHON
Creating a Great Python DevX
This article talks about the different tools you commonly come across as part of the Python development experience. It gives an overview of black, nox, ruff, Mypy, and more, covering why you should use them when you code your own projects.
SCOTT HOUSEMAN
Why Are There So Many Python Dataframes?
Ever wonder why there are so many libraries that have Dataframes in Python? This article talks about the different perspectives of the popular toolkits and why they are what they are.
MAHESH VASHISHTHA
The Protocol Class
typing.Protocol enables type checking in a Java-esque, interface-like mechanism. Using it, you can declare that a duck-typed class conforms to a specific protocol. Read on for details.
PEPIJN BAKKER
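A minimal illustration of the idea (my sketch, not code from the linked article):

from typing import Protocol

class Quacks(Protocol):
    def quack(self) -> str: ...

class Duck:  # no inheritance needed; having a matching quack() is enough
    def quack(self) -> str:
        return "quack"

def make_noise(animal: Quacks) -> str:
    return animal.quack()

make_noise(Duck())  # passes static type checking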
What Does if __name__ == "__main__" Mean in Python?
In this video course, you’ll learn all about Python’s name-main idiom. You’ll learn what it does in Python, how it works, when to use it, when to avoid it, and how to refer to it.
REAL PYTHON course
Why & How Python Uses Bloom Filters in String Processing
Dive into Python's clever use of Bloom filters in string APIs for speedier performance. Find out how CPython's unique implementation makes it more efficient.
ABHINAV UPADHYAY
Simulate the Monty Hall Problem in Python
Write a Python simulation to solve this classic probability puzzle that has stumped mathematicians and Nobel Prize winners!
DATASCHOOL.IO • Shared by Kevin Markham
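If you want to try it before reading, a bare-bones simulation might look like this (my sketch, not the article's code):

import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a goat door that isn't the player's pick
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

for switch in (False, True):
    wins = sum(play(switch) for _ in range(10_000))
    print(f"switch={switch}: win rate {wins / 10_000:.2%}")  # ~33% vs ~67%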
Death by a Thousand Microservices
The software industry is learning once again that complexity kills and trending back towards monoliths and larger services.
ANDREI TARANCHENKO
How to Test Jupyter Notebooks With Pytest and Nbmake
Tutorial on how to use the pytest plugin nbmake to automate end-to-end testing of notebooks.
SEMAPHORECI.COM • Shared by Larisa Ioana
Projects & Code
Clientele: Loveable Python API Clients From OpenAPI Schemas
GITHUB.COM/PHALT • Shared by Paul Hallett
reader: A Python Feed Reader Library
GITHUB.COM/LEMON24 • Shared by Adrian
Events
Weekly Real Python Office Hours Q&A (Virtual)
September 27, 2023
REALPYTHON.COM
SPb Python Drinkup
September 28, 2023
MEETUP.COM
PyCon India 2023
September 29 to October 3, 2023
PYCON.ORG
PythOnRio Meetup
September 30, 2023
PYTHON.ORG.BR
PyConZA 2023
October 5 to October 7, 2023
PYCON.ORG
PyCon ES Canarias 2023
October 6 to October 9, 2023
PYCON.ORG
Django Day Copenhagen 2023
October 6 to October 7, 2023
DJANGODAY.DK
DjangoCongress JP 2023
October 7 to October 8, 2023
DJANGOCONGRESS.JP
Happy Pythoning!
This was PyCoder’s Weekly Issue #596.
View in Browser »
[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
September 26, 2023 07:30 PM UTC
TechBeamers Python
Python String Splitting: split(), rsplit(), regex
String manipulation is a fundamental skill in Python, and understanding how to split strings is a crucial aspect of it. In this comprehensive guide, we’ll explore various methods and techniques for splitting strings, including the split() and rsplit() functions, regular expressions (regex), and advanced splitting techniques. By the end of this tutorial, you’ll have a [...]
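A quick taste of the three approaches covered (illustrative snippets, not the tutorial's exact code):

import re

text = "alpha,beta,gamma"
text.split(",")                   # ['alpha', 'beta', 'gamma']
text.rsplit(",", 1)               # ['alpha,beta', 'gamma'] (one split, from the right)
re.split(r"[,;]\s*", "a, b; c")   # ['a', 'b', 'c'] (regex: comma or semicolon)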
The post Python String Splitting: split(), rsplit(), regex appeared first on TechBeamers.
September 26, 2023 05:46 PM UTC
Real Python
Python Basics Exercises: Conditional Logic and Control Flow
In Python Basics: Conditional Logic and Control Flow, you learned that much of the Python code you’ll write is unconditional. That is, the code doesn’t make any choices. Every line of code is executed in the order that it’s written or in the order that functions are called, with possible repetitions inside loops.
In this course, you’ll revisit how to use conditional logic to write programs that perform different actions based on different conditions. Paired with functions and loops, conditional logic allows you to write complex programs that can handle many different situations.
In this video course, you’ll use the following:
- Boolean comparators: ==, !=, <, >, <=, >=
- Logical operators: and, or, not
- Conditional logic: if … elif … else
- Exception handling: try … except
- Loops: for, while
- Control flow statements: break, continue
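Putting a few of these together, a tiny illustrative snippet (mine, not from the course):

for number in range(5):
    try:
        value = 10 / (number - 2)
    except ZeroDivisionError:
        continue  # skip the value that would divide by zero
    if value >= 5:
        print(number, "large")
    elif value > 0:
        print(number, "small")
    else:
        print(number, "negative")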
Ultimately, you’ll bring all of your knowledge together to build a text-based role-playing game. Along the way, you’ll also get some insight into how to tackle coding challenges in general, which can be a great way to level up as a developer.
This video course is part of the Python Basics series, which accompanies Python Basics: A Practical Introduction to Python 3. You can also check out the other Python Basics courses.
Note that you’ll be using IDLE to interact with Python throughout this course.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
September 26, 2023 02:00 PM UTC
Python Bytes
#354 Python 3.12 is Coming!
Topics covered in this episode:

- logmerger
- The third and final Python 3.12 RC is out now
- The Python dictionary dispatch pattern
- Visualizing the CPython Release Process
- Extras
- Joke

Watch on YouTube

About the show

Sponsored by us! Support our work through:

- Our courses at Talk Python Training
- Python People Podcast
- Patreon Supporters

Connect with the hosts

- Michael: @mkennedy@fosstodon.org
- Brian: @brianokken@fosstodon.org
- Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Brian #1: logmerger

- Paul McGuire
- logmerger is a TUI for viewing a merged display of multiple log files, merged by timestamp.
- Built on textual
- Awesome flags:
  - --output - to send the merged logs to stdout
  - --start START and --end END to select the time window for merging logs
- Caveats:
  - new, no pip install yet, so clone the code or download it
  - perhaps I jumped the gun on covering this, but it's cool

Michael #2: The third and final Python 3.12 RC is out now

- Get your final bugs fixed before the full release
- Call to action: We strongly encourage maintainers of third-party Python projects to prepare their projects for 3.12 compatibility during this phase
- How to test.
- Discussion on the issue.
- Count down until October 2nd, 2023.

Brian #3: The Python dictionary dispatch pattern

- I kinda love (and hate) jump tables in C
- We don't talk about dictionary dispatch much in Python, so this is nice, if not dangerous.
- Short story: you can store lambdas or functions in dictionaries, then look them up and call them at the same time.
- Also, I gotta shout out to the first blogroll I've seen in a very long time.
  - Should we bring back blogrolls?

Michael #4: Visualizing the CPython Release Process

- by Seth Larson
- Here's the deal (you should see the image in the article 😉):
  1. Freeze the python/cpython release branch. This is done using GitHub Branch Protections.
  2. Update the Release Manager's fork of python/cpython.
  3. Run Python release tools (release-tool, blurb, sphinx, etc.).
  4. Push diffs and signed tag to the Release Manager's fork.
  5. The git tag is made available to experts for Windows and macOS binary installers.
  6. Source tarballs, Windows, and macOS binary installers are built and tested concurrently.
     - 6a: Release manager builds the tgz and tar.xz source files for the Python release. This includes building the updated documentation.
     - 6b: Windows expert starts the Azure Pipelines configured to build Python.
     - 6c: macOS expert builds the macOS installers.
  7. All artifacts (source and binary) are tested on their platforms.
  8. Release manager signs all artifacts using Sigstore and GPG.
  9. All artifacts are made available on python.org.
  10. After artifacts are published to python.org, the git commit and tag from the Release Manager's fork are pushed to the release branch.

Extras

Brian:

- The Complete pytest Course, part 2, Ch 7 Testing Strategy went up this weekend.
  - Only 9 more chapters to go
- "Test & Code" → "Python Test"
  - Full version: "The Python Test Podcast" → "Test & Code" → "Python Test"
  - Also: "Python (Bytes | People | Test)"

Michael:

- If you're at PyBay, come say "hi"
- EuroPython 2023 Videos up
- Django + HTMX has a few days of early-bird discount left

Joke: Are you sleeping?
September 26, 2023 08:00 AM UTC
Test and Code
207: Welcome to "Python Test", pytest course, pytest-repeat and pytest-flakefinder
- Podcast name: "Test & Code" -> "Python Test"
- Python Bytes Podcast
- Python People Podcast
- Python Test Podcast <- you are here
  - which is still, at least for now, at testandcode.com
- New course: "The Complete pytest Course"
- pytest-repeat, which I'm starting to contribute to
  - Give `--repeat-scope` a try. You can use it to change from repeating every test to repeating the session, module, or class.
- pytest-flakefinder, which is an alternative to pytest-repeat
- pytest-check is completely unrelated, but mentioned in the show
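For anyone wanting to try pytest-repeat, the flags look roughly like this (a sketch; check the plugin's docs for current options):

$ python -m pip install pytest-repeat
$ pytest --count=5 test_login.py            # repeat each test 5 times
$ pytest --count=5 --repeat-scope=session   # repeat the whole session instead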
September 26, 2023 01:37 AM UTC
Seth Michael Larson
Starting on Software Bill-of-Materials (SBOM) for CPython
Published 2023-09-26 by Seth Larson
This critical role would not be possible without funding from the OpenSSF Alpha-Omega Project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!
I've started dipping my toes into creating an authoritative SBOM for the CPython project; you can follow along in this GitHub repository if you are interested. This project is very early, and this will not be the final product or place where this information is published; it is only a place to experiment and get feedback on the approach and outputs before putting the final infrastructure in place.
I started with the most straightforward release artifact, the source tarball, and I am planning to tackle the binary installers later since they'll require more research into the release processes. There is a work-in-progress SBOM file for Python-3.11.5.tgz available in the sboms/ directory on the repository.
I've also included an SBOM for CPython 3.11.0, which can be used to see whether vulnerability scanning tools are capable of consuming the resulting SBOM and flagging subcomponents for vulnerabilities. I used Grype as an example for this, and indeed it was able to consume the SBOM and flag the known vulnerabilities:
$ grype sbom:sboms/Python-3.11.0.tgz.spdx.json
 ✔ Vulnerability DB [updated]
 ✔ Scanned for vulnerabilities [9 vulnerability matches]
 ├── by severity: 0 critical, 6 high, 3 medium, 0 low, 0 negligible
 └── by status: 0 fixed, 9 not-fixed, 0 ignored
NAME INSTALLED FIXED-IN TYPE VULNERABILITY SEVERITY
CPython 3.11.0 CVE-2023-41105 High
CPython 3.11.0 CVE-2023-36632 High
CPython 3.11.0 CVE-2023-24329 High
CPython 3.11.0 CVE-2022-45061 High
CPython 3.11.0 CVE-2023-40217 Medium
CPython 3.11.0 CVE-2023-27043 Medium
CPython 3.11.0 CVE-2007-4559 Medium
expat 2.4.7 CVE-2022-43680 High
expat 2.4.7 CVE-2022-40674 High
The tool was able to see not only vulnerabilities in CPython but also in the expat subcomponent. Without an SBOM, the expat subcomponent wouldn't be detected by current versions of Grype. Running Grype on the CPython 3.11.5 SBOM results in zero known vulnerabilities. 🥳
$ grype sbom:sboms/Python-3.11.5.tgz.spdx.json
 ✔ Vulnerability DB [no update available]
 ✔ Scanned for vulnerabilities [0 vulnerability matches]
 ├── by severity: 0 critical, 0 high, 0 medium, 0 low, 0 negligible
 └── by status: 0 fixed, 0 not-fixed, 0 ignored
No vulnerabilities found
Sigstore signatures for CPython
Now all CPython releases that have Sigstore verification materials have "bundles" (ie .sigstore files) instead of the "disjoint verification materials" (ie .crt and .sig files).
These new bundles have been back-filled from existing verification materials using the new VerificationMaterials.to_bundle() method in the Python Sigstore client. Thanks to Łukasz Langa for verifying the new bundles and publishing them to python.org.
Now that all releases have bundles available, I've also updated the Sigstore verification instructions on python.org to only reference bundles:
$ python -m sigstore verify identity \
--bundle Python-3.11.0.tgz.sigstore \
--cert-identity pablogsal@python.org \
--cert-oidc-issuer https://accounts.google.com \
Python-3.11.0.tgz
Having bundles means one less file to download to verify a signature and that verification doesn't need to query the transparency log, instead relying on the entry embedded within the bundle.
Truststore support coming for Conda!
Conda has merged the pull request to add Truststore support to Conda which is slated for v23.9.0. This required creating a top-level feedstock for Truststore.
pip has merged the pull request to bundle Truststore into pip, so it's no longer required to "bootstrap" Truststore in order to have support for using system certificates. This feature will be coming in pip v23.3.
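For context, Truststore's core API is a one-line injection that makes Python's ssl module use the operating system's trust store. A minimal sketch of what an application opts into (pip and conda wire this up for you):

import truststore
truststore.inject_into_ssl()  # ssl.SSLContext now validates against OS certificates

import urllib.request
urllib.request.urlopen("https://example.com")  # verified via the system trust store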
Python Security Response Team (PSRT) using GitHub Security Advisories
I spent some time developing a small GitHub App that would add the PSRT GitHub team to all newly created GitHub Security Advisories, and have something that works in theory.
Unfortunately, there's currently no way to get webhook events for the creation of draft GitHub Security Advisories, you can only get a webhook for when security reports are filed. This means that anyone with access to GitHub Security Advisories (ie organization or repository admins) wouldn't trigger the GitHub App action to add the PSRT team.
Security Developer-in-Residence 2023 Q3 update for PSF blog
Since I've just passed 3 months in this role (time sure does fly!), I am drafting a summarized update on my work in 2023 Q3 that will be published to the Python Software Foundation blog. Subscribe to the blog via RSS or other social media platforms to get notified.
That's all for this week! 👋 If you're interested in more, you can read last week's report.
Wow, you made it to the end!
If you're like me, you don't believe social media should be the way to get updates on the cool stuff your friends are up to. Instead, you should either follow my blog with the RSS reader of your choice or via my email newsletter for guaranteed article publication notifications.
If you really enjoyed a piece I would be grateful if you shared with a friend. If you have follow-up thoughts you can send them via email.
Thanks for reading!
â Seth
September 26, 2023 12:00 AM UTC
September 25, 2023
PyBites
Meet Will Raphaelson: From Script to Production Flow With Prefect & Marvin AI
This week Robin Beer – one of our coaches – interviews Will Raphaelson, Principal Product Manager at Prefect.
They talk about his use of Python, Prefect as a tool and its philosophy, open source + business and Marvin AI.
And of course they share cool wins and books they are reading.
All in all an insightful chat that hopefully will leave you inspired to go check out these cool new Python tools …
Listen here:
Or watch on YouTube:
Chapters:
00:00 Intro snippet
00:11 Intro music
00:31 Introduction guests + topics
01:32 Welcome Will, do you have a win of the week?
04:12 Go to meet ups
04:37 How do you leverage Python as a product manager?
07:12 Python as a quick prototyping language
08:14 What is Prefect and its philosophy?
10:56 Robin’s experience with Prefect
12:26 How has Prefect evolved over time?
15:54 Orchestrators and observability
18:02 A practical example of an orchestrated flow
21:21 How Prefect handles failures in a flow?
23:24 Open source and business, how to combine them?
27:45 Tips for starting an open source business?
31:05 Rationale vs emotion in making product decisions
34:12 Marvin AI and its relation with Prefect
38:01 Marvin AI is a nice way to start with Python
40:16 Recommended books
43:02 Wrap up
43:55 Outro music
Connect with Will on LinkedIn.
Prefect product links:
– Prefect
– Marvin AI
Mentioned article:
– 28 Dags Later by Stephen Bailey
Books mentioned:
– The Precipice
– Fundamentals of Data Engineering by Joe Reis
September 25, 2023 05:00 PM UTC
Real Python
Python 3.12 Preview: Subinterpreters
With the upcoming release of Python 3.12 this fall and Python 3.13 following a year later, you might have heard about how Python subinterpreters are changing. The upcoming changes will first give extension module developers better control of the GIL and parallelism, potentially speeding up your programs.
The following release may take this even further and allow you to use subinterpreters from Python directly, making it easier for you to incorporate them into your programs!
In this tutorial, you'll:
- Get a high-level view of what Python subinterpreters are
- Learn how changes to CPython's global state in Python 3.12 may change things for you
- Get a glimpse of what changes might be coming for subinterpreters in Python 3.13
To get the most out of this tutorial, you should be familiar with the basics of Python, as well as with the global interpreter lock (GIL) and concurrency. You'll encounter some C code, but only a little.
You'll find many other new features, improvements, and optimizations in Python 3.12. The most relevant ones include the following:
- Ever better error messages
- Support for the Linux perf profiler
- More powerful f-strings
- Improved static typing features
Go ahead and check out what's new in the changelog for more details on these and other features. It's definitely worth your time to explore what's coming!
Free Bonus: Click here to download your sample code for a sneak peek at Python 3.12, coming in October 2023.
What Are Python Subinterpreters?
Before you start thinking about subinterpreters, recall that an interpreter is a program that executes a script directly in a high-level language instead of translating it to machine code. In the case of most Python users, CPython is the interpreter you're running. A subinterpreter is a copy of the full CPython interpreter that can run independently from the main interpreter that started alongside your program.
Note: The terms interpreter and subinterpreter get mixed together fairly commonly. For the purposes of this tutorial, you can view the main interpreter as the one that runs when your program starts. All other interpreters that start after that point are considered subinterpreters. Other than a few minor details, subinterpreters are the same type of object as the main interpreter.
Most of the state of the subinterpreter is separate from the main interpreter. This includes elements like the global scope name table and the modules that get imported. However, this doesnât include some of the items that the operating system provides to the process, like file handles and memory.
This is different from threading or other forms of concurrency in that threads can share the same context and global state while allowing a separate flow of execution. For example, if you start a new thread, then it still has the same global scope name table.
A subinterpreter, however, can be described as a collection of cooperating threads that have some shared state. These threads will have the same set of imports, independent of other subinterpreters. Spawning new threads in a subinterpreter adds new threads to this collection, which wonât be visible from other interpreters.
Some of the upcoming changes that youâll see will also allow subinterpreters to improve parallelism in Python programs.
Subinterpreters have been a part of the Python language since version 1.5, but theyâve only been available as part of the C-API, not from Python. But there are large changes coming that will make them more useful and interesting for everyday Python users.
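You can peek at subinterpreters from Python today through CPython's private test module. Treat this as a sketch only: the module name and API are internal and unstable, and may change or disappear in future releases:

# _xxsubinterpreters is CPython-internal and not a supported public API
import _xxsubinterpreters as interpreters

interp = interpreters.create()
interpreters.run_string(interp, "print('hello from a subinterpreter')")
interpreters.destroy(interp)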
Whatâs Changing in Python 3.12 (PEP 684)?
Now that you know what a Python subinterpreter is, you'll take a look at what's changing in the upcoming releases of CPython.
Most of the subinterpreter changes are described in two proposals, PEP 684 and PEP 554. Only PEP 684 will make it into the 3.12 release. PEP 554 is scheduled for the 3.13 release but hasn't been officially approved yet.
Changes to the Global State and the GIL
The main focus of PEP 684 is refactoring the internals of the CPython source code so that each subinterpreter can have its own global interpreter lock (GIL). The GIL is a lock, or mutex, which allows only one thread to have control of the Python interpreter. Until this PEP, there was a single GIL for all subinterpreters, which meant that no matter how many subinterpreters you created, only one could run at a single time.
Moving the GIL so that each subinterpreter has a separate lock is a great idea. So, why hasn't it been done already? The issue is that the GIL prevents multiple threads from accessing some of the global state of CPython simultaneously, so it's protecting your program from bugs that race conditions could cause.
The core developers needed to move much of the previously global state into per-interpreter storage, meaning each interpreter has its own independent version:
Read the full article at https://realpython.com/python312-subinterpreters/ »
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
September 25, 2023 02:00 PM UTC
Mike Driscoll
PyDev of the Week: Claudia Ng
This week we welcome Claudia Ng as the PyDev of the Week! Claudia is an author / contributor at Real Python! If you’d like to see what else Claudia has been up to, you should check out her personal website.
Let’s spend a few moments getting to know Claudia better!
Can you tell us a little about yourself (hobbies, education, etc):
I’m a data scientist and I’ve spent the past five years working in fraud and credit risk in the fintech (financial technology) space. I have a Masters in Public Policy from Harvard University and a Bachelor’s in International Business (Finance) with a minor in Spanish from Northeastern University.
In 2018, I was working at a fintech called Tala, where I managed the new customer portfolio for their Mexico market. It was an incredible journey where we scaled the customer base by over 500x in only two years! Through this process, I saw the power of automating lending decisions enabled by machine learning. I was fascinated by how alternative data could be used to predict customers’ repayment behaviors and fraud risk, unlocking the ability to lend to individuals with no or little credit history.
I’m an impact-driven person and seeing the power of applied ML inspired me to set my mind on pivoting into data science by taking on ML-related projects at work, doing online courses and side projects, and eventually moving onto the data science team.
I love what I do and outside of work, my hobbies include all kinds of water sports, bouldering and sudoku.
Why did you start using Python?
I first started using Python in 2019. I was initially using R for analyses since I had learned to use it in grad school, but the Data Science team used Python, so I started learning and picking it up. I found it to be more robust and there are many good third-party packages to support my work. Python is definitely my preferred language now!
What other programming languages do you know and which is your favorite?
I use Python and SQL daily on the job. I am a huge language nerd and can speak 9 human languages if that counts.
What projects are you working on now?
I am working on my second tutorial for Real Python on type hints for multiple return types in Python. Stay tuned for more when it comes out!
Which Python libraries are your favorite (core or 3rd party)?
I’m a Data Scientist, so I love pretty graphs and visuals. It is a crucial element to being able to tell a good data story and help with better decision-making. I would say that my favorite Python library is plotly. It’s a library for making interactive plots, and I love how versatile it is.
How did you get started writing articles for Real Python?
When I pivoted from an analyst role into data science back in 2019, I started writing because I wanted to share my learnings and hopefully inspire others without a STEM degree to break into data science or engineering. I was writing blog posts on Medium for several publications, including Towards Data Science, Towards AI, and Analytics Vidhya, about different topics related to machine learning, feature engineering, and data visualizations.
In early 2023, I saw that Real Python was looking for technical writers and applied. I was a subscriber and learned so much about programming from Real Python’s tutorials and courses, it feels like a dream to be writing for this publication!
What excites you most in the data science world right now?
I am excited about the rise of autoML packages that can automate some of the more tedious parts of ML modeling, like data cleaning, model selection and hyperparameter optimization. This would cut down the time spent during the model development cycle, allowing data scientists to iterate faster.
Is there anything else you’d like to say?
If you would like to check out my work, please visit ds-claudia.com to see past blog posts. You can also subscribe for free to receive emails when I publish new blog posts – no spam I promise!
Thanks so much for doing the interview, Claudia!
The post PyDev of the Week: Claudia Ng appeared first on Mouse Vs Python.
September 25, 2023 12:34 PM UTC
Erik Marsja
Seaborn Confusion Matrix: How to Plot and Visualize in Python
The post Seaborn Confusion Matrix: How to Plot and Visualize in Python appeared first on Erik Marsja.
In this Python tutorial, we will learn how to plot a confusion matrix using Seaborn. Confusion matrices are a fundamental tool in data science and hearing science. They provide a clear and concise way to evaluate the performance of classification models. In this post, we will explore how to plot confusion matrices in Python.
In data science, confusion matrices are commonly used to assess the accuracy of machine learning models. They allow us to understand how well our model correctly classifies different categories. For example, a confusion matrix can help us determine how many emails were correctly classified as spam in a spam email classification model.
In hearing science, confusion matrices are used to evaluate the performance of hearing tests. These tests involve presenting different sounds to individuals and assessing their ability to identify them correctly. A confusion matrix can provide valuable insights into the accuracy of these tests and help researchers make improvements.
Understanding how to interpret and visualize confusion matrices is essential for anyone working with classification models or conducting hearing tests. In the following sections, we will dive deeper into plotting and interpreting confusion matrices using the Seaborn library in Python.
Using Seaborn, a powerful data visualization library in Python, we can create visually appealing and informative confusion matrices. We will learn how to prepare the data, create the matrix, and interpret the results. Whether you are a data scientist or a hearing researcher, this guide will equip you with the skills to analyze and visualize confusion matrices using Seaborn effectively. So, let us get started!
Table of Contents
- Outline
- Prerequisites
- Confusion Matrix
- Visualizing a Confusion Matrix
- How to Plot a Confusion Matrix in Python
- Synthetic Data
- Preparing Data
- Creating a Seaborn Confusion Matrix
- Interpreting the Confusion Matrix
- Modifying the Seaborn Confusion Matrix Plot
- Conclusion
- Additional Resources
- More Tutorials
Outline
The structure of the post is as follows. First, we will begin by discussing prerequisites to ensure you have the necessary knowledge and tools for understanding and working with confusion matrices.
Following that, we will delve into the concept of the confusion matrix, highlighting its significance in evaluating classification model performance. In the “Visualizing a Confusion Matrix” section, we will explore various methods for representing this critical analysis tool, shedding light on the visual aspects.
The heart of the post lies in “How to Plot a Confusion Matrix in Python,” where we will guide you through the process step by step. This is where we will focus on preparing the data for the analysis. Under “Creating a Seaborn Confusion Matrix,” we will outline four key steps, from importing the necessary libraries to plotting the matrix with Seaborn, ensuring a comprehensive understanding of the entire process.
Once the confusion matrix is generated, “Interpreting the Confusion Matrix” will guide you in extracting valuable insights, allowing you to make informed decisions based on model performance.
Before concluding the post, we also look at how to modify the confusion matrix we created using Seaborn. For instance, we explore techniques to enhance the visualization, such as adding percentages instead of raw values to the plot. This additional step provides a deeper understanding of model performance and helps you communicate results more effectively in data science applications.
Prerequisites
Before we explore how to create confusion matrices with Seaborn, there are essential prerequisites to consider. First, a foundational understanding of Python is required, including proficiency with its syntax and a grasp of basic programming concepts. If you are new to Python, familiarize yourself with its syntax and fundamental operations first.
Moreover, prior knowledge of classification modeling is, of course, needed. You need to know how to get the data needed to generate the confusion matrix.
You must install several Python packages to practice generating and visualizing confusion matrices. Ensure you have Pandas for data manipulation, Seaborn for data visualization, and scikit-learn for machine learning tools. You can install these packages using Python’s package manager, pip. Sometimes, it might be necessary to upgrade pip to the latest version. Installing packages is straightforward; for example, you can install Seaborn using the command pip install seaborn.
Confusion Matrix
A confusion matrix is a performance evaluation tool used in machine learning. It is a table that allows us to visualize the performance of a classification model by comparing the predicted and actual values of a dataset. The matrix is divided into four quadrants: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).
Understanding confusion matrices is crucial for evaluating model performance because they provide valuable insights into the accuracy and effectiveness of a classification model. By analyzing the values in each quadrant, we can determine how well the model performs in correctly identifying positive and negative instances.
The true positive (TP) quadrant represents the cases where the model correctly predicted the positive class. The true negative (TN) quadrant represents the cases where the model correctly predicted the negative class. The false positive (FP) quadrant represents the cases where the model incorrectly predicted the positive class. The false negative (FN) quadrant represents the cases where the model incorrectly predicted the negative class.
We can calculate performance metrics such as accuracy, precision, recall, and F1 score by analyzing these values. These metrics help us assess the model’s performance and make informed decisions about its effectiveness.
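As a quick illustration, here's how those metrics fall out of the four counts (the numbers are made up):

# Hypothetical counts read off a confusion matrix
tp, tn, fp, fn = 50, 35, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that were correct
precision = tp / (tp + fp)                  # share of predicted positives that were real
recall = tp / (tp + fn)                     # share of real positives that were found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)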
The following section will explore different methods to visualize confusion matrices and discuss the importance of choosing the right visualization technique.
Visualizing a Confusion Matrix
When it comes to visualizing a confusion matrix, several methods are available. Each technique offers its advantages and can provide valuable insights into the performance of a classification model.
One common approach is to use heatmaps, which use color gradients to represent the values in the matrix. Heatmaps allow us to quickly identify patterns and trends in the data, making it easier to interpret the model’s performance. Another method is to use bar charts, where the height of the bars represents the values in the matrix. Bar charts are useful for comparing the different categories and understanding the distribution of predictions.
However, Seaborn is one of Python’s most popular and powerful libraries for visualizing confusion matrices. Seaborn offers various functions and customization options, making creating visually appealing and informative plots easy. It provides a high-level interface to create heatmaps, bar charts, and other visualizations.
Choosing the right visualization technique is crucial because it can greatly impact the understanding and interpretation of the confusion matrix. The chosen visualization should convey the information and insights we want to communicate. Seaborn’s flexibility and versatility make it an excellent choice for plotting confusion matrices, allowing us to create clear and intuitive visualizations that enhance our understanding of the model’s performance.
In the next section, we will plot a confusion matrix using Seaborn in Python. We will explore the necessary steps and demonstrate how to create visually appealing and informative plots that help us analyze and interpret the performance of our classification model.
How to Plot a Confusion Matrix in Python
When it comes to plotting a confusion matrix in Python, there are several libraries available that offer this capability.
Steps to Plot a Confusion Matrix in Python:
Generating a confusion matrix in Python using any package typically involves the following steps:
- Import the Necessary Libraries: Begin by importing the relevant Python libraries, such as the package for generating confusion matrices and other dependencies.
- Prepare True and Predicted Labels: Collect the true labels (ground truth) and the predicted labels from your classification model or analysis.
- Compute the Confusion Matrix: Utilize the functions or methods the chosen package provides to compute the confusion matrix. This matrix will tabulate the counts of true positives, true negatives, false positives, and false negatives.
- Visualize or Analyze the Matrix: Optionally, you can visualize the confusion matrix using various visualization tools or analyze its values to assess the performance of your classification model.
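Condensing the four steps above into code, a minimal sketch with scikit-learn and Seaborn (the labels and variable names are placeholders):

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

y_true = ["spam", "ham", "spam", "ham", "spam"]  # placeholder ground-truth labels
y_pred = ["spam", "ham", "ham", "ham", "spam"]   # placeholder model predictions

cm = confusion_matrix(y_true, y_pred, labels=["spam", "ham"])
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=["spam", "ham"], yticklabels=["spam", "ham"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()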
Seaborn
This post will use Seaborn, one of the most popular and powerful Python libraries for this task. Seaborn provides a high-level interface to create visually appealing and informative plots, including confusion matrices. It offers various functions and customization options, making it easy to generate clear and intuitive visualizations.
One of the advantages of using Seaborn for plotting confusion matrices is its flexibility: it supports heatmaps, bar charts, and other visualizations, letting you choose the most suitable representation for your data. Another advantage is its versatility. It provides various customization options, such as color palettes and annotations, which let you enhance the visual appearance of your confusion matrix and highlight important information. With Seaborn, you can create visually appealing and informative plots that help you analyze and interpret the performance of your classification model. Its powerful capabilities and user-friendly interface make it an excellent choice for plotting confusion matrices in Python.
- How to Make a Violin plot in Python using Matplotlib and Seaborn
- Seaborn Line Plots: A Detailed Guide with Examples (Multiple Lines)
- How to Make a Scatter Plot in Python using Seaborn
The following sections will dive into the necessary steps to prepare your data for generating a confusion matrix using Seaborn. We will also explore data preprocessing techniques that may be required to ensure accurate and meaningful results. First, however, we will generate a synthetic dataset that can be used to practice generating confusion matrices and plotting them.
Synthetic Data
Here, we generate a synthetic dataset that can be used to practice plotting a confusion matrix with Seaborn:
import pandas as pd
import random

# Define the total number of test cases
num_cases = 100

# Create a list of hearing test results (categorical: Hearing Loss, No Hearing Loss)
hearing_results = ['Hearing Loss'] * 20 + ['No Hearing Loss'] * 70

# Introduce noise (e.g., due to external factors); these random draws
# bring the total up to num_cases observations
noisy_results = [random.choice(hearing_results)
                 for _ in range(num_cases - len(hearing_results))]

# Combine the results
results = hearing_results + noisy_results

# Create a dataframe
data = pd.DataFrame({'HearingTestResult': results})
In the code chunk above, we first imported the Pandas library, which is instrumental for data manipulation and analysis in Python. We also utilized the random module for generating random data.

To begin, we defined the variable num_cases to represent the total number of test cases, which in this context amounts to 100 observations. Next, we set the stage for simulating a hearing test dataset. We created hearing_results, a list containing the categories Hearing Loss and No Hearing Loss. This categorical variable represents the results of a hypothetical hearing test, where Hearing Loss indicates an impaired hearing condition and No Hearing Loss signifies normal hearing.

Incorporating an element of real-world variability, we introduced noisy_results. This step generates ten observations by random selection from the hearing_results list, mimicking external factors that may affect hearing test outcomes. The purpose is to simulate real-world variability and add diversity to the dataset.

Combining hearing_results and noisy_results, we created the results list, representing the complete dataset. Finally, we used Pandas to create a dataframe with a dictionary as input. We named it data, with a column labeled HearingTestResult, which encapsulates the simulated hearing test data.

Preparing Data
Ensuring data is adequately prepared before generating a confusion matrix using Seaborn involves several necessary steps. First, we may need to gather the data we want to evaluate using the confusion matrix. This data should consist of the true and predicted labels from your classification model. Ensure the labels are correctly assigned and aligned with the corresponding data points.
Next, we may need to preprocess the data. Data preprocessing techniques can improve the quality and reliability of your results. Commonly, we use techniques such as handling missing values, scaling or normalizing the data, and encoding categorical variables. We will not go through all these steps to create a Seaborn confusion matrix plot.
For example, we can remove the rows or columns with missing values or impute the missing values using techniques such as mean imputation or regression imputation. Scaling the data can be important to ensure all features are on a similar scale. This can prevent certain features from dominating the analysis and affecting the performance of the confusion matrix.
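For instance, here is a minimal Pandas sketch of these ideas, assuming a hypothetical DataFrame df with a hypothetical numeric column named score:

# Mean imputation for a hypothetical numeric column
df['score'] = df['score'].fillna(df['score'].mean())

# Or simply drop any rows that contain missing values
df = df.dropna()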
Encoding categorical variables is necessary if your data includes non-numeric variables. This process can involve converting categorical variables into numerical representations. We can also, as in the example below, recode the categorical variables to True and False. See How to use Pandas get_dummies to Create Dummy Variables in Python for more information about dummy coding.
By following these steps and applying appropriate data preprocessing techniques, we can ensure our data is ready for generating a confusion matrix using Seaborn. The following section provides step-by-step instructions on how to create a Seaborn confusion matrix, along with sample code and visuals to illustrate the process.
Creating a Seaborn Confusion Matrix
To generate a confusion matrix using Seaborn, follow these step-by-step instructions. First, import the necessary libraries, including Seaborn and Matplotlib. Next, prepare your data by ensuring you have the true and predicted labels from your classification model.
Step 1: Import the Libraries
Here, we import the libraries that we will use to use Seaborn to plot a Confusion Matrix.
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
Step 2: Prepare and Preprocess Data
The following step is to prepare and preprocess the data. Note that we do not have any missing values in the example data. However, we need to recode the categorical variable to True and False.
data['HearingTestResult'] = data['HearingTestResult'].replace({'Hearing Loss': True,
'No Hearing Loss': False})
In the Python code above, we transformed a categorical variable, HearingTestResult, into a binary format for further analysis. We used the Pandas replace method to map the categories to boolean values. Specifically, we mapped 'Hearing Loss' to True, indicating the presence of hearing loss, and 'No Hearing Loss' to False, indicating the absence of hearing loss.
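Note that Step 3 below also needs a PredictedResult column holding the model's predictions, which the synthetic dataset does not yet contain. One way to follow along is to simulate predictions; the sketch below is an assumption (any classifier's output could be used instead) and simply flips a small fraction of the true labels at random:

import random

# Hypothetical predictions: copy the true labels and flip roughly 10% of them
data['PredictedResult'] = [
    (not label) if random.random() < 0.1 else bool(label)
    for label in data['HearingTestResult']
]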
Step 3: Create a Confusion Matrix
Once the data is ready, we can create the confusion matrix using the confusion_matrix() function from the Scikit-learn library. This function takes the true and predicted labels as input and returns a matrix that represents the performance of our classification model.
conf_matrix = confusion_matrix(data['HearingTestResult'],
data['PredictedResult'])
In the code snippet above, we computed a confusion matrix using the confusion_matrix function from scikit-learn. We provided the true hearing test results from the dataset and the predicted results to evaluate the performance of a classification model.

Step 4: Plot the Confusion Matrix with Seaborn
To plot a confusion matrix with Seaborn, we can use the following code:
# Plot the confusion matrix using Seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False,
xticklabels=['Predicted Negative', 'Predicted Positive'],
yticklabels=['True Negative', 'True Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
In the code chunk above, we created a visual representation of the confusion matrix using the Seaborn library. We defined the plot's appearance to provide an insightful view of the model's performance. The sns.heatmap function generates a heatmap with annotations to depict the confusion matrix values. We specified formatting options (annot and fmt) to display the counts and chose the Blues color palette for visual clarity. Additionally, we customized the plot's labels with xticklabels and yticklabels, denoting the predicted and actual classes, respectively. The xlabel, ylabel, and title functions helped us label the plot appropriately. This visualization is a powerful tool for comprehending the model's classification accuracy, making it accessible and easy for data analysts and stakeholders to interpret. Here is the resulting plot:

Interpreting the Confusion Matrix
Once you have generated a Seaborn confusion matrix for your classification model, it is important to understand how to interpret the results presented in the matrix. The confusion matrix provides valuable information about your model’s performance and can help you evaluate its accuracy. The confusion matrix consists of four main components: true positives, false positives, true negatives, and false negatives. These components represent the different outcomes of your classification model.
True positives (TP) are the cases where the model correctly predicted the positive class. In other words, these are the instances where the model correctly identified the presence of a certain condition or event. False positives (FP) occur when the model incorrectly predicts the positive class. These are the instances where the model falsely identifies the presence of a certain condition or event.
True negatives (TN) represent the cases where the model correctly predicts the negative class. These are the instances where the model correctly identifies the absence of a certain condition or event. False negatives (FN) occur when the model incorrectly predicts the negative class. These are the instances where the model falsely identifies the absence of a certain condition or event.
By analyzing these components, you can gain insights into the performance of your classification model. For example, many false positives may indicate that your model incorrectly identifies certain conditions or events. On the other hand, many false negatives may suggest that your model fails to identify certain conditions or events.
Understanding the meaning of true positives, true negatives, false positives, and false negatives is crucial for evaluating the effectiveness of your classification model and making informed decisions based on its predictions. Before concluding the post, we will also examine how we can modify the Seaborn plot.
Modifying the Seaborn Confusion Matrix Plot
We can also plot the confusion matrix with percentages instead of raw values using Seaborn:
# Calculate percentages for each cell by dividing by the total count
percentage_matrix = conf_matrix / conf_matrix.sum()
# Plot the confusion matrix using Seaborn with percentages
plt.figure(figsize=(8, 6))
sns.heatmap(percentage_matrix, annot=True, fmt='.2%', cmap='Blues', cbar=False,
xticklabels=['Predicted Negative', 'Predicted Positive'],
yticklabels=['True Negative', 'True Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix (Percentages)')
plt.show()
In the code snippet above, we changed the code a bit. First, we calculated the percentages and stored them in the variable percentage_matrix by dividing the raw confusion matrix (conf_matrix) by the sum of all its elements.

After calculating the percentages, we modified the fmt parameter within the Seaborn heatmap function. Specifically, we set fmt to '.2%' to format the annotations as percentages, ensuring that the values displayed in the matrix represent proportions of the total observations in the dataset. This change enhances the interpretability of the confusion matrix by expressing classification performance relative to the dataset's scale. Here are some more tutorials about modifying Seaborn plots:
- How to Save a Seaborn Plot as a File (e.g., PNG, PDF, EPS, TIFF)
- How to Change the Size of Seaborn Plots
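Returning to the percentage plot, another common variant normalizes each row instead of the whole matrix, so every cell shows the proportion of actual negatives or positives that received a given prediction. Here is a sketch reusing conf_matrix and the same plotting setup from the earlier steps:

# Normalize each row so the percentages sum to 100% per actual class
row_percentages = conf_matrix / conf_matrix.sum(axis=1, keepdims=True)

plt.figure(figsize=(8, 6))
sns.heatmap(row_percentages, annot=True, fmt='.2%', cmap='Blues', cbar=False,
            xticklabels=['Predicted Negative', 'Predicted Positive'],
            yticklabels=['True Negative', 'True Positive'])
plt.title('Confusion Matrix (Row Percentages)')
plt.show()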
Conclusion
In conclusion, this tutorial has provided a comprehensive overview of how to plot and visualize a confusion matrix using Seaborn in Python. We have explored the concept of confusion matrices and their significance in various industries, such as speech recognition systems in hearing science and cognitive psychology experiments. By analyzing confusion matrices, we can gain valuable insights into the performance of systems and the accuracy of participants’ responses.
Understanding and visualizing a confusion matrix with Seaborn is valuable in data analysis projects. It allows us to assess a classification model's performance and identify areas for improvement. Visualizing the confusion matrix enables us to quickly interpret the results and, together with measures such as accuracy, precision, recall, and F1 score, make informed decisions.
We encourage readers to apply their knowledge of confusion matrices and Seaborn in their data analysis projects. By implementing these techniques, they can enhance their understanding of classification models and improve the accuracy of their predictions.
I hope this article has helped demystify confusion matrices and provide practical guidance on plotting and visualizing them using Seaborn. I invite readers to share this post on social media and engage in discussions about their progress and experiences with confusion matrices in their data analysis endeavors.
Additional Resources
In addition to the information provided in this data visualization tutorial, several other resources and tutorials can further enhance your understanding of plotting and visualizing confusion matrices using Seaborn in Python. These resources can provide additional insights, tips, and techniques to help you improve your data analysis projects.
Here are some recommended resources:
- Seaborn Documentation: The official documentation for Seaborn is a valuable resource for understanding the various functionalities and options available for creating visualizations, including confusion matrices. It provides detailed explanations, examples, and code snippets to help you get started.
- Stack Overflow: Stack Overflow is a popular online community where programmers and data analysts share their knowledge and expertise. You can find numerous questions and answers related to plotting and visualizing confusion matrices with Seaborn. This platform can be a great source of solutions to specific issues or challenges.
By exploring these additional resources, you can expand your knowledge and skills in plotting and visualizing confusion matrices using Seaborn. These materials will give you a deeper understanding of the subject and help you apply these techniques effectively in your data analysis projects.
More Tutorials
Here are some more Python tutorials on this blog that you may find helpful:
- Coefficient of Variation in Python with Pandas & NumPy
- Python Check if File is Empty: Data Integrity with OS Module
- Find the Highest Value in Dictionary in Python
- Pandas Count Occurrences in Column â i.e. Unique Values
The post Seaborn Confusion Matrix: How to Plot and Visualize in Python appeared first on Erik Marsja.
September 25, 2023 11:08 AM UTC
Kushal Das
Documentation of Puppet code using sphinx
Sphinx is the primary documentation tooling for most of my projects. I use it for the Linux command line book too. Last Friday, while in a chat with Leif about documenting all of our Puppet codebase, I thought of mixing these two.
Now, Puppet already has a tool to generate documentation from its code, called puppet strings. We can use that to generate markdown output and then use the same in Sphinx for the final HTML output.
I am using https://github.com/simp/pupmod-simp-simplib as the example Puppet code, as it comes with a good amount of reference documentation.
Install puppet strings and the dependencies
$ gem install yard puppet-strings
Then clone the Puppet codebase.
$ git clone https://github.com/simp/pupmod-simp-simplib
Finally generating the initial markdown output.
$ puppet strings generate --format markdown --out simplib.md
Files 161
Modules 3 (3 undocumented)
Classes 0 (0 undocumented)
Constants 0 (0 undocumented)
Attributes 0 (0 undocumented)
Methods 5 (0 undocumented)
Puppet Tasks 0 (0 undocumented)
Puppet Types 7 (0 undocumented)
Puppet Providers 8 (0 undocumented)
Puppet Plans 0 (0 undocumented)
Puppet Classes 2 (0 undocumented)
Puppet Data Type Aliases 73 (0 undocumented)
Puppet Defined Types 1 (0 undocumented)
Puppet Data Types 0 (0 undocumented)
Puppet Functions 68 (0 undocumented)
98.20% documented
sphinx setup
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install sphinx myst_parser
After that, create a standard Sphinx project or use your existing one, and update the conf.py with the following:
extensions = ["myst_parser"]
source_suffix = {
'.rst': 'restructuredtext',
'.txt': 'markdown',
'.md': 'markdown',
}
Then copy over the generated markdown from the previous step and use the sed command to update the title of the document to something better.
$ sed -i '1 s/^.*$/SIMPLIB Documentation/' simplib.md
Don't forget to add the simplib.md file to your index.rst, and then build the HTML documentation.
$ make html
We can still improve the markdown generated by the puppet strings command; I have to figure out simpler ways to do that part.
September 25, 2023 09:23 AM UTC
eGenix.com
Python Meeting Düsseldorf - 2023-09-27
The following announcement is for a regional user group meeting in Düsseldorf, Germany; it was originally published in German.
Announcement
The next Python Meeting Düsseldorf will take place on:
27.09.2023, 6:00 PM
Room 1, 2nd floor, Bürgerhaus Stadtteilzentrum Bilk
Düsseldorfer Arcaden, Bachstr. 145, 40217 Düsseldorf
Program
Talks registered so far:
- Moritz Damm:
Introduction to 'Kedro - A framework for production-ready data science' - Marc-André Lemburg:
Parsing structured content with Python 3.10's new match-case - Arkadius Schuchhardt:
Repository Pattern in Python: Why and how? - Jens Diemer:
CLI Tools
Additional talks can still be registered. If you are interested, please contact info@pyddf.de.
Start Time and Location
We will meet at 6:00 PM at the Bürgerhaus in the Düsseldorfer Arcaden.
The Bürgerhaus shares its entrance with the swimming pool and is located next to the entrance to the underground car park of the Düsseldorfer Arcaden.
A large "Schwimm' in Bilk" logo hangs above the entrance. Behind the door, turn directly left to the two elevators, then ride up to the 2nd floor. The entrance to Room 1 is directly on the left as you step out of the elevator.
>>> Entrance in Google Street View
Introduction
The Python Meeting Düsseldorf is a regular event in Düsseldorf aimed at Python enthusiasts from the region.
Our PyDDF YouTube channel offers a good overview of the talks; we publish videos of the talks there after the meetings. The meeting is organized by eGenix.com GmbH, Langenfeld, in cooperation with Clark Consulting & Research, Düsseldorf:
Format
The Python Meeting Düsseldorf uses a mix of (lightning) talks and open discussion.
Talks can be registered in advance or brought in spontaneously during the meeting. A projector with HDMI and Full HD resolution is available. To register a (lightning) talk, simply send an informal email to info@pyddf.de.
Costs
The Python Meeting Düsseldorf is organized by Python users for Python users.
Since the meeting room, projector, internet, and drinks incur costs, we ask participants for a contribution of EUR 10.00 incl. 19% VAT. Pupils and students pay EUR 5.00 incl. 19% VAT.
We kindly ask all participants to bring the amount in cash.
Registration
Since we can only accommodate 25 people in the rented room, we kindly ask that you register in advance.
Please register for the meeting via Meetup.
Further Information
You can find further information on the meeting's website:
https://pyddf.de/
Have fun!
Marc-Andre Lemburg, eGenix.com
September 25, 2023 09:00 AM UTC
September 22, 2023
Stack Abuse
How to Check for NaN Values in Python
Introduction
Today we're going to explore how to check for NaN (Not a Number) values in Python. NaN values can be quite a nuisance when processing data, and knowing how to identify them can save you from a lot of potential headaches down the road.
Why Checking for NaN Values is Important
NaN values can be a real pain, especially when you're dealing with numerical computations or data analysis. They can skew your results, cause errors, and generally make your life as a developer more difficult. For instance, if you're calculating the average of a list of numbers and a NaN value sneaks in, your result will also be NaN, regardless of the other numbers. It's almost as if it "poisons" the result - a single NaN can throw everything off.
Note: NaN stands for 'Not a Number'. It is a special floating-point value: it exists only as a float and cannot be converted to any type other than float.
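A quick sketch of that poisoning effect:

values = [1.0, 2.0, 3.0, float('nan')]

# A single NaN turns the whole average into NaN
print(sum(values) / len(values))  # nan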
NaN Values in Mathematical Operations
When performing mathematical operations, NaN values can cause lots of issues. They can lead to unexpected results or even errors. Python's math and numpy libraries typically propagate NaN values in mathematical operations, which can lead to entire computations being invalidated.
For example, in numpy, any arithmetic operation involving a NaN value will result in NaN:
import numpy as np
a = np.array([1, 2, np.nan])
print(a.sum())
Output:
nan
In such cases, you might want to consider using functions that can handle NaN values appropriately. Numpy provides nansum(), nanmean(), and others, which ignore NaN values:
print(np.nansum(a))
Output:
3.0
Pandas, on the other hand, generally excludes NaN values in its mathematical operations by default.
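For example, a pandas Series skips NaN values in aggregations unless told otherwise:

import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])
print(s.sum())               # 3.0 -- NaN is skipped by default
print(s.sum(skipna=False))   # nan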
How to Check for NaN Values in Python
There are many ways to check for NaN values in Python, and we'll cover some of the most common methods used in different libraries. Let's start with the built-in math library.
Using the math.isnan() Function
The math.isnan() function is an easy way to check if a value is NaN. This function returns True if the value is NaN and False otherwise. Here's a simple example:
import math
value = float('nan')
print(math.isnan(value)) # True
value = 5
print(math.isnan(value)) # False
As you can see, when we pass a NaN value to the math.isnan() function, it returns True. When we pass a non-NaN value, it returns False.
The benefit of using this particular function is that the math module is built into Python, so no third-party packages need to be installed.
Using the numpy.isnan() Function
If you're working with arrays or matrices, the numpy.isnan() function can be a nice tool as well. It operates element-wise on an array and returns a Boolean array of the same shape. Here's an example:
import numpy as np
array = np.array([1, np.nan, 3, np.nan])
print(np.isnan(array))
# Output: [False  True False  True]
In this example, we have an array with two NaN values. When we use numpy.isnan(), it returns a Boolean array where True corresponds to the positions of NaN values in the original array.
You'd want to use this method when you're already using NumPy in your code and need a function that works well with other NumPy structures, like np.array.
Using the pandas.isnull() Function
Pandas provides an easy-to-use function, isnull(), to check for NaN values in a DataFrame or Series. Let's take a look at an example:
import pandas as pd
# Create a DataFrame with NaN values
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]})
print(df.isnull())
The output will be a DataFrame that mirrors the original, but with True for NaN values and False for non-NaN values:
A B C
0 False False False
1 False True False
2 True True False
One thing you'll notice if you test this method out is that it also returns True for None values, hence the reference to null in the method name. It will return True for both NaN and None.
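Here's a small sketch of that behavior:

import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, None])

# Both the NaN and the None are reported as null
print(s.isnull().tolist())  # [False, True, True]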
Comparing the Different Methods
Each method we've discussed (math.isnan(), numpy.isnan(), and pandas.isnull()) has its own strengths and use-cases. The math.isnan() function is a straightforward way to check if a number is NaN, but it only works on individual numbers.
On the other hand, numpy.isnan() operates element-wise on arrays, making it a good choice for checking NaN values in numpy arrays.
Finally, pandas.isnull() is perfect for checking NaN values in pandas Series or DataFrame objects. It's worth mentioning that pandas.isnull() also considers None as NaN, which can be very useful when dealing with real-world data.
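One more dependency-free trick worth knowing: NaN is the only value that is not equal to itself, so a plain comparison can detect it without any imports. A small sketch:

def is_nan(value):
    # NaN is the only value for which value != value holds
    return value != value

print(is_nan(float('nan')))  # True
print(is_nan(5))             # False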
Conclusion
Checking for NaN values is an important step in data preprocessing. We've explored three methods (math.isnan(), numpy.isnan(), and pandas.isnull()), each with its own strengths, depending on the type of data you're working with.
We've also discussed the impact of NaN values on mathematical operations and how to handle them using numpy and pandas functions.
September 22, 2023 08:12 PM UTC
How to Position Legend Outside the Plot in Matplotlib
Introduction
In data visualization, we often create complex graphs that need legends so the reader can interpret the graph. But what if those legends get in the way of the actual data that they need to see? In this Byte, we'll see how you can move the legend so that it's outside of the plot in Matplotlib.
Legends in Matplotlib
In Matplotlib, legends provide a mapping of labels to the elements of the plot. These can be very important to help the reader understand the visualization they're looking at. Without the legend, you might not know which line represented which data! Here's a basic example of how legends work in Matplotlib:
import matplotlib.pyplot as plt
# Create a simple line plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], label='Sample Data')
# Add a legend
plt.legend()
# Display the plot
plt.show()
This will produce a plot with a legend located in the upper-left corner inside the plot. The legend contains the label 'Sample Data' that we specified in the plt.plot() function.
Why Position the Legend Outside the Plot?
While having the legend inside the plot is the default setting in Matplotlib, it's not always the best choice. Legends can obscure important details of the plot, especially when dealing with complex data visualizations. By positioning the legend outside the plot, we can be sure that all data points are clearly visible, making our plots easier to interpret.
How to Position the Legend Outside the Plot in Matplotlib
Positioning the legend outside the plot in Matplotlib is fairly easy to do. We simply need to use the bbox_to_anchor and loc parameters of the legend() function. Here's how to do it:
import matplotlib.pyplot as plt
# Create a simple line plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], label='Sample Data')
# Add a legend outside the plot
plt.legend(bbox_to_anchor=(1, 1.10), loc='upper right')
# Display the plot
plt.show()
In this example, bbox_to_anchor is a tuple specifying the coordinates of the legend's anchor point, and loc indicates the location of the anchor point with respect to the legend's bounding box. The coordinates are in axes fraction (i.e., from 0 to 1) relative to the size of the plot. So, (1, 1.10) positions the anchor point just outside the top right corner of the plot.
Positioning this legend is a bit more of an art than a science, so you may need to play around with the values a bit to see what works.
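For instance, here's a sketch that places the legend centered below the plot instead; the exact offset usually needs tweaking:

# Anchor the legend's upper center just below the axes
plt.legend(bbox_to_anchor=(0.5, -0.1), loc='upper center')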
Common Issues and Solutions
One common issue is the legend getting cut off when you save the figure using plt.savefig(). This happens because plt.savefig() doesn't automatically adjust the figure size to accommodate the legend. To fix this, you can use the bbox_inches parameter and set it to 'tight', like so:
plt.savefig('my_plot.png', bbox_inches='tight')
Another common issue is the legend overlapping with the plot when positioned outside. This can be fixed by adjusting the plot size or the legend size to ensure they fit together nicely. Again, this is something you'll likely have to test with many different values to find the right configuration and positioning.
Note: Adjusting the plot size can be done using plt.subplots_adjust(), while the legend size can be adjusted using legend.get_frame().
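As a rough sketch, shrinking the axes leaves room for a legend on the right:

# Reserve the rightmost 25% of the figure for the legend
plt.subplots_adjust(right=0.75)
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')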
Conclusion
And there you have it! In this Byte, we showed how you can position the legend outside the plot in Matplotlib and explained some common issues. We've also talked a bit about some use-cases where you'll need to position the legend outside the plot.
September 22, 2023 07:14 PM UTC
Importing Python Modules
Introduction
Python allows us to create just about anything, from simple scripts to complex machine learning models. But to work on any complex project, you'll likely need to use or create modules. These are the building blocks of complex projects. In this article, we'll explore Python modules, why we need them, and how we can import them in our Python files.
Understanding Python Modules
In Python, a module is a file containing Python definitions and statements. The file name is the module name with the suffix .py added. Imagine you're working on a Python project, and you've written a function to calculate the Fibonacci series. Now you need to use this function in multiple files. Instead of rewriting the function in each file, you can write it once in a Python file (module) and import it wherever needed.
Here's a simple example. Let's say we have a file math_operations.py with a function to add two numbers:
# math_operations.py
def add_numbers(num1, num2):
return num1 + num2
We can import this math_operations module in another Python file and use the add_numbers function:
# main.py
import math_operations
print(math_operations.add_numbers(5, 10)) # Output: 15
In the above example, we've imported the math_operations module using the import statement and used the add_numbers function defined in the module.
Note: Python looks for module files in the directories defined in sys.path. It includes the directory containing the input script (or the current directory), the PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH), and the installation-dependent default directory. You can check the sys.path using import sys; print(sys.path).
But why do we need to import Python files? Why can't we just write all our code in one file? Let's find out in the next section.
Why Import Python Files?
The concept of importing files in Python is comparable to using a library or a toolbox. Imagine you're working on a project and need a specific tool. Instead of creating that tool from scratch every time you need it, you would look in your toolbox for it, right? The same goes for programming in Python. If you need a specific function or class, instead of writing it from scratch, you can import it from a Python file that already contains it.
This not only keeps us from having to continuously rewrite code we've already written, but it also makes our code cleaner, more efficient, and easier to manage. It promotes a modular programming approach where the code is broken down into separate parts or modules, each performing a specific function. This modularity makes debugging and understanding the code much easier.
Here's a simple example of importing a Python standard library module:
import math
# Using the math library to calculate the square root
print(math.sqrt(16))
Output:
4.0
We import the math module and use its sqrt function to calculate the square root of 16.
Different Ways to Import Python Files
Python provides several ways to import modules, each with its own use cases. Let's look at the three most common methods.
Using 'import' Statement
The import statement is the simplest way to import a module. It simply imports the module, and you can use its functions or classes by referencing them with the module name.
import math
print(math.pi)
Output:
3.141592653589793
In this example, we import the math module and print the value of pi.
Using 'from...import' Statement
The from...import statement allows you to import specific functions, classes, or variables from a module. This way, you don't have to reference them with the module name every time you use them.
from math import pi
print(pi)
Output:
3.141592653589793
Here, we import only the pi variable from the math module and print it.
Using 'import...as' Statement
The import...as statement is used when you want to give a module a different name in your script. This is particularly useful when the module name is long and you want to use a shorter alias for convenience.
import math as m
print(m.pi)
Output:
3.141592653589793
Here, we import the math module as m and then use this alias to print the value of pi.
Importing Modules from a Package
Packages in Python are a way of organizing related modules into a directory hierarchy. Think of a package as a folder that contains multiple Python modules, along with a special __init__.py file that tells Python that the directory should be treated as a package.
But how do you import a module that's inside a package? Well, Python provides a straightforward way to do this.
Suppose you have a package named shapes, and inside this package you have two modules, circle.py and square.py. You can import the circle module like this:
from shapes import circle
Now, you can access all the functions and classes defined in the circle module. For instance, if the circle module has a function area(), you can use it as follows:
circle_area = circle.area(5)
print(circle_area)
This will print the area of a circle with a radius of 5.
Note: If you want to import a specific function or class from a module within a package, you can use the from...import statement, as we showed earlier.
But what if your package hierarchy is deeper? What if the circle module is inside a subpackage called two_d inside the shapes package? (Note that package names must be valid Python identifiers, so a name starting with a digit, such as 2d, wouldn't work with the dotted import syntax.) Python has got you covered. You can import the circle module like this:
from shapes.two_d import circle
Python's import system is quite flexible and powerful. It allows you to organize your code in a way that makes sense to you, while still providing easy access to your functions, classes, and modules.
Common Issues Importing Python Files
As you work with Python, you may come across several errors while importing modules. These errors could stem from a variety of issues, including incorrect file paths, syntax errors, or even circular imports. Let's see some of these common errors.
Fixing 'ModuleNotFoundError'
The ModuleNotFoundError is a subtype of ImportError. It's raised when you try to import a module that Python cannot find. It's one of the most common issues developers face while importing Python files.
import missing_module
This will raise a ModuleNotFoundError: No module named 'missing_module'.
There are several ways you can fix this error:
- Check the Module's Name: Ensure that the module's name is spelled correctly. Python is case-sensitive, which means module and Module are treated as two different modules.
- Install the Module: If the module is not a built-in module and you have not created it yourself, you may need to install it using pip. For example:

$ pip install missing_module

- Check Your File Paths: Python searches for modules in the directories defined in sys.path. If your module is not in one of these directories, Python won't be able to find it. You can add your module's directory to sys.path using the following code:

import sys
sys.path.insert(0, '/path/to/your/module')

- Use a Try/Except Block: If the module you're trying to import is not crucial to your program, you can use a try/except block to catch the ModuleNotFoundError and continue running your program. For example:

try:
    import missing_module
except ModuleNotFoundError:
    print("Module not found. Continuing without it.")
Avoiding Circular Imports
In Python, circular imports can be quite a headache. They occur when two or more modules depend on each other, either directly or indirectly. This leads to an infinite loop, causing the program to crash. So, how do we avoid this common pitfall?
The best way to avoid circular imports is by structuring your code in a way that eliminates the need for them. This could mean breaking up large modules into smaller, more manageable ones, or rethinking your design to remove unnecessary dependencies.
For instance, consider two modules, A and B. If A imports B and B imports A, a circular import occurs. Here's a simplified example:
# A.py
import B
def function_from_A():
print("This is a function in module A.")
B.function_from_B()
# B.py
import A
def function_from_B():
print("This is a function in module B.")
A.function_from_A()
Calling either function will result in a RecursionError. To avoid this, you could refactor your code so that each function is in its own module, and they import each other only when needed.
# A.py
def function_from_A():
print("This is a function in module A.")
# B.py
import A
def function_from_B():
print("This is a function in module B.")
A.function_from_A()
Note: It's important to remember that Python imports are case-sensitive. This means that import module and import Module would refer to two different modules and could potentially lead to a ModuleNotFoundError if not handled correctly.
Using __init__.py in Python Packages
In our journey through learning about Python imports, we've reached an interesting stop: the __init__.py file. This special file serves as an initializer for Python packages. But what does it do, exactly?
In the simplest terms, __init__.py allows Python to recognize a directory as a package so that it can be imported just like a module. Previously, an empty __init__.py file was enough to do this. However, from Python 3.3 onwards, thanks to the introduction of PEP 420, __init__.py is no longer strictly necessary for a directory to be considered a package. But it still holds relevance, and here's why.
Note: The __init__.py file is executed when the package is imported, and it can contain any Python code. This makes it a useful place for initialization logic for the package.
Consider a package named animals with two modules, mammals and birds. Here's how you can use __init__.py to import these modules.
# __init__.py file
from . import mammals
from . import birds
Now, when you import the animals package, mammals and birds are also imported.
# main.py
import animals
animals.mammals.list_all() # Use functions from the mammals module
animals.birds.list_all() # Use functions from the birds module
By using __init__.py, you've made the package's interface cleaner and simpler to use.
Organizing Imports: PEP8 Guidelines
When working with Python, or any programming language really, it's important to keep your code clean and readable. This not only makes your life easier, but also the lives of others who may need to read or maintain your code. One way to do this is by following the PEP8 guidelines for organizing imports.
According to PEP8, your imports should be grouped in the following order:
- Standard library imports
- Related third party imports
- Local application/library specific imports
Each group should be separated by a blank line. Here's an example:
# Standard library imports
import os
import sys
# Related third party imports
import requests
# Local application/library specific imports
from my_library import my_module
In addition, PEP8 also recommends that imports should be on separate lines, and that they should be ordered alphabetically within each group.
Note: While these guidelines are not mandatory, following them can greatly improve the readability of your code and make it more Pythonic.
To make your life even easier, many modern IDEs, like PyCharm, have built-in tools to automatically organize your imports according to PEP8.
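If you prefer the command line, the third-party isort tool can do the same reordering (this assumes you are willing to add a dependency, and my_script.py is a placeholder for your own file):

$ pip install isort
$ isort my_script.py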
With proper organization and understanding of Python imports, you can avoid common errors and improve the readability of your code. So, the next time you're writing a Python program, give these guidelines a try. You might be surprised at how much cleaner and more manageable your code becomes.
Conclusion
And there you have it! We've taken a deep dive into the world of Python imports, exploring why and how we import Python files, the different ways to do so, common errors and their fixes, and the role of __init__.py in Python packages. We've also touched on the importance of organizing imports according to PEP8 guidelines.
Remember, the way you handle imports can greatly impact the readability and maintainability of your code. So, understanding these concepts is not just a matter of knowing Python's syntaxâit's about writing better, more efficient code.
September 22, 2023 03:40 PM UTC
Fix the "AttributeError: module object has no attribute 'Serial'" Error in Python
Introduction
Even if you're a seasoned Python developer, you'll occasionally encounter errors that can be pretty confusing. One such error is the AttributeError: module object has no attribute 'Serial'. This Byte will help you understand and resolve this issue.
Understanding the AttributeError
The AttributeError in Python is raised when you try to access or call an attribute that a module, class, or instance does not have. Specifically, the error AttributeError: module object has no attribute 'Serial' suggests that you're trying to access the Serial attribute from a module that doesn't possess it.
For instance, if you have a module named serial and you're trying to use the Serial attribute from it, you might encounter this error. Here's an example:
import serial
ser = serial.Serial('/dev/ttyUSB0') # This line causes the error
In this case, the serial module you're importing doesn't have the Serial attribute, hence the AttributeError.
Note: It's important to understand that Python is case-sensitive. Serial and serial are not the same. If the attribute exists but you're using the wrong case, you'll also encounter an AttributeError.
Fixes for the Error
The good news is that this error is usually a pretty easy fix, even if it seems very confusing at first. Let's explore some of the solutions.
Install the Correct pyserial Module
One of the most common reasons for this error is the incorrect installation of the pyserial package. The Serial attribute is a part of pyserial, which is used for serial communication in Python.
You might have installed a package named serial instead of pyserial (this is more common than you think!). To fix this, you need to uninstall the incorrect package and install the correct one.
$ pip uninstall serial
$ pip install pyserial
After running these commands, your issue may be resolved. Note that even though the package is installed as pyserial, it is imported under the name serial; the Serial attribute is now available:
from serial import Serial
ser = Serial('/dev/ttyUSB0') # This line no longer causes an error
If this didn't fix the error, keep reading.
Rename your serial.py File
For how much Python I've written in my career, you'd think that I wouldn't make this simple mistake as much as I do...
Another possibility is that the Python interpreter gets confused when there's a file in your project directory with the same name as a module you're trying to import. This is another common source of the AttributeError.
Let's say, for instance, you have a file named serial.py in your project directory (or maybe your script itself is named serial.py). When you try to import serial, Python might import your serial.py file instead of the pyserial module, leading to the AttributeError.
The solution here is simple - rename your serial.py file to something else.
$ mv serial.py my_serial.py
Conclusion
In this Byte, we've explored two common causes of the AttributeError: module object has no attribute 'Serial' error in Python: installing the wrong package instead of pyserial, and having a file in your project directory that shares a name with a module you're trying to import. By installing the correct package or renaming conflicting files, you should be able to eliminate this error.
September 22, 2023 12:51 PM UTC
Real Python
The Real Python Podcast - Episode #173: Getting Involved in Open Source & Generating QR Codes With Python
Have you thought about contributing to an open-source Python project? What are possible entry points for intermediate developers? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
[ Improve Your Python With Python Tricks - Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
September 22, 2023 12:00 PM UTC
September 21, 2023
Wesley Chun
Managing Shared (formerly Team) Drives with Python and the Google Drive API
2023 UPDATE: We are working to put updated versions of all the code into GitHub... stay tuned. The link will be provided in all posts once the code sample(s) is(are) available.
NOTE 2: The code featured here is also available as a video + overview post as part of this series.
Introduction
Team Drives is a relatively new feature from the Google Drive team, created to solve some of the issues of a user-centric system in larger organizations. Team Drives are owned by an organization rather than a user, and with their use, the locations of files and folders won't be a mystery any more. While your users do have to be G Suite Business (or higher) customers to use Team Drives, the good news for developers is that you won't have to write new apps from scratch or learn a completely different API. Instead, Team Drives features are accessible through the same Google Drive API you've come to know so well with Python. In this post, we'll demonstrate a sample Python app that performs core features that all developers should be familiar with. By the time you've finished reading this post and the sample app, you should know how to:
- Create Team Drives
- Add members to Team Drives
- Create a folder in Team Drives
- Import/upload files to Team Drives folders
Using the Google Drive API
The demo script requires creating files and folders, so you do need full read-write access to Google Drive. The scope you need for that is:
'https://www.googleapis.com/auth/drive' - Full (read-write) access to Google Drive
In the full script at the bottom of this post, the authorized Drive API service endpoint is assigned to the DRIVE variable.
Create Team Drives
New Team Drives can be created with DRIVE.teamdrives().create(). Two things are required to create a Team Drive: 1) you should name your Team Drive, and 2) to make the create process idempotent, you need to create a unique request ID so that any number of identical calls will still only result in a single Team Drive being created. It's recommended that developers use a language-specific UUID library. For Python developers, that's the uuid module. From the API response, we return the new Team Drive's ID. Check it out:

def create_td(td_name):
    request_id = str(uuid.uuid4())
    body = {'name': td_name}
    return DRIVE.teamdrives().create(body=body,
        requestId=request_id, fields='id').execute().get('id')
Add members to Team Drives
To add members/users to Team Drives, you only need to create a new permission, which can be done with DRIVE.permissions().create(), similar to how you would share a file in regular Drive with another user. The pieces of information you need for this request are the ID of the Team Drive, the new member's email address, as well as the desired role... choose from: "organizer", "owner", "writer", "commenter", "reader". Here's the code:

def add_user(td_id, user, role='commenter'):
    body = {'type': 'user', 'role': role, 'emailAddress': user}
    return DRIVE.permissions().create(body=body, fileId=td_id,
        supportsTeamDrives=True, fields='id').execute().get('id')

Some additional notes on permissions: the user can only be bestowed permissions equal to or less than the person/admin running the script... IOW, they cannot grant someone else greater permission than what they have. Also, if a user has a certain role in a Team Drive, they can be granted greater access to individual elements in the Team Drive. Users who are not members of a Team Drive can still be granted access to Team Drive contents on a per-file basis.
Create a folder in Team Drives
Nothing to see here! Yep, creating a folder in Team Drives is identical to creating a folder in regular Drive, with DRIVE.files().create(). The only difference is that you pass in a Team Drive ID rather than a regular Drive folder ID. Of course, you also need a folder name too. Here's the code:

def create_td_folder(td_id, folder):
    body = {'name': folder, 'mimeType': FOLDER_MIME, 'parents': [td_id]}
    return DRIVE.files().create(body=body,
        supportsTeamDrives=True, fields='id').execute().get('id')
Import/upload files to Team Drives folders
Uploading files to a Team Drives folder is also identical to uploading to a normal Drive folder, and also done with DRIVE.files().create(). Importing is slightly different from uploading because you're uploading a file and converting it to a G Suite/Google Apps document format, i.e., uploading CSV as a Google Sheet, or plain text or Microsoft Word® file as Google Docs. In the sample app, we tackle the former:

def import_csv_to_td_folder(folder_id, fn, mimeType):
    body = {'name': fn, 'mimeType': mimeType, 'parents': [folder_id]}
    return DRIVE.files().create(body=body, media_body=fn+'.csv',
        supportsTeamDrives=True, fields='id').execute().get('id')

The secret to importing is the MIMEtype. That tells Drive whether you want conversion to a G Suite/Google Apps format (or not). The same is true for exporting. The import and export MIMEtypes supported by the Google Drive API can be found in my SO answer here.
Driver app
All these functions are great but kind-of useless without being called by a main application, so here we are:

FOLDER_MIME = 'application/vnd.google-apps.folder'
SOURCE_FILE = 'inventory' # on disk as 'inventory.csv'
SHEETS_MIME = 'application/vnd.google-apps.spreadsheet'

td_id = create_td('Corporate shared TD')
print('** Team Drive created')
perm_id = add_user(td_id, 'email@example.com')
print('** User added to Team Drive')
folder_id = create_td_folder(td_id, 'Manufacturing data')
print('** Folder created in Team Drive')
file_id = import_csv_to_td_folder(folder_id, SOURCE_FILE, SHEETS_MIME)
print('** CSV file imported as Google Sheets in Team Drives folder')

The first set of variables represent some MIMEtypes we need to use as well as the CSV file we're uploading to Drive and requesting it be converted to Google Sheets format. Below those definitions are calls to all four functions described above.
Conclusion
If you run the script, you should get output that looks something like this, with each print() representing each API call:

$ python3 td_demo.py
** Team Drive created
** User added to Team Drive
** Folder created in Team Drive
** CSV file imported as Google Sheets in Team Drives folder

When the script has completed, you should have a new Team Drives folder called "Corporate shared TD", and within, a folder named "Manufacturing data" which contains a Google Sheets file called "inventory".
Below is the entire script for your convenience, which runs on both Python 2 and Python 3 (unmodified!). By using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function
import uuid
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools
SCOPES = 'https://www.googleapis.com/auth/drive'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
creds = tools.run_flow(flow, store)
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
def create_td(td_name):
request_id = str(uuid.uuid4()) # random unique UUID string
body = {'name': td_name}
return DRIVE.teamdrives().create(body=body,
requestId=request_id, fields='id').execute().get('id')
def add_user(td_id, user, role='commenter'):
body = {'type': 'user', 'role': role, 'emailAddress': user}
return DRIVE.permissions().create(body=body, fileId=td_id,
supportsTeamDrives=True, fields='id').execute().get('id')
def create_td_folder(td_id, folder):
body = {'name': folder, 'mimeType': FOLDER_MIME, 'parents': [td_id]}
return DRIVE.files().create(body=body,
supportsTeamDrives=True, fields='id').execute().get('id')
def import_csv_to_td_folder(folder_id, fn, mimeType):
body = {'name': fn, 'mimeType': mimeType, 'parents': [folder_id]}
return DRIVE.files().create(body=body, media_body=fn+'.csv',
supportsTeamDrives=True, fields='id').execute().get('id')
FOLDER_MIME = 'application/vnd.google-apps.folder'
SOURCE_FILE = 'inventory' # on disk as 'inventory.csv'... CHANGE!
SHEETS_MIME = 'application/vnd.google-apps.spreadsheet'
td_id = create_td('Corporate shared TD')
print('** Team Drive created')
perm_id = add_user(td_id, 'email@example.com') # CHANGE!
print('** User added to Team Drive')
folder_id = create_td_folder(td_id, 'Manufacturing data')
print('** Folder created in Team Drive')
file_id = import_csv_to_td_folder(folder_id, SOURCE_FILE, SHEETS_MIME)
print('** CSV file imported as Google Sheets in Team Drives folder')
As with our other code samples, you can now customize it to learn more about the API, integrate it into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

Code challenge

Write a simple application that moves folders (and their files or folders) in regular Drive to Team Drives. Each folder you move should become a corresponding folder in Team Drives. Remember that files in Team Drives can only have one parent, and the same goes for folders.

September 21, 2023 10:31 PM UTC
Stack Abuse
How to Pass Multiple Arguments to the map() Function in Python
Introduction
The goal of Python, with its rich set of built-in functions, is to allow developers to accomplish complex tasks with relative ease. One such powerful, yet often overlooked, function is the map() function. The map() function will execute a given function over a set of items, but how do we pass additional arguments to the provided function?
In this Byte, we'll be exploring the map() function and how to effectively pass multiple arguments to it.
The map() Function in Python
The map() function in Python is a built-in function that applies a given function to every item of an iterable (like a list or tuple) and returns an iterator over the results.
def square(number):
return number ** 2
numbers = [1, 2, 3, 4, 5]
squared = map(square, numbers)
print(list(squared)) # Output: [1, 4, 9, 16, 25]
In this snippet, we've defined a function square() that takes a number and returns its square. We then use the map() function to apply square() to each item in the numbers list.
Why Pass Multiple Arguments to map()?
You might be wondering, "Why would I need to pass multiple arguments to map()?" Well, there are scenarios where you might have a function that takes more than one argument, and you want to apply this function to multiple sets of data simultaneously.
Not every function we provide to map() will take only one argument. What if, instead of a square function, we have the more generic math.pow function, where one of the arguments is the power to raise the item to? How do we handle a case like this?
Or maybe you have two lists of numbers, and you want to find the product of corresponding numbers from these lists. This is another case where passing multiple arguments to map() can be helpful.
How to Pass Multiple Arguments to map()
There are a few different types of cases in which you'd want to pass multiple arguments to map(), two of which we mentioned above. We'll walk through both of those cases here.
Multiple Iterables
Passing multiple arguments to the map() function is simple once you understand how to do it. You simply pass additional iterables after the function argument, and map() will take items from each iterable and pass them as separate arguments to the function.
Here's an example:
def multiply(x, y):
return x * y
numbers1 = [1, 2, 3, 4, 5]
numbers2 = [6, 7, 8, 9, 10]
result = map(multiply, numbers1, numbers2)
print(list(result)) # Output: [6, 14, 24, 36, 50]
Note: Make sure that the number of arguments the function takes matches the number of iterables passed to map()!
In the example above, we've defined a function multiply() that takes two arguments and returns their product. We then pass this function, along with two lists, to the map() function. The map() function applies multiply() to each pair of corresponding items from the two lists, and we convert the result to a new list.
Multiple Arguments, One Iterable
Continuing with our math.pow
example, let's see how we can still use map()
to run this function on all items of an array.
The first, and probably simplest, way is to not use `map()` at all, but to use something like a list comprehension instead.
import math
numbers = [1, 2, 3, 4, 5]
res = [math.pow(n, 3) for n in numbers]
print(res) # Output: [1.0, 8.0, 27.0, 64.0, 125.0]
This is essentially all `map()` really does, but it's not as compact and neat as using a convenient function like `map()`.
Now, let's see how we can actually use `map()` with a function that requires multiple arguments:
import math
import itertools

numbers = [1, 2, 3, 4, 5]

# Supply the constant exponent 3 alongside each number
res = map(math.pow, numbers, itertools.repeat(3, len(numbers)))
print(list(res))  # Output: [1.0, 8.0, 27.0, 64.0, 125.0]
This may seem a bit more complicated at first, but it's actually very simple. We use a helper function, `itertools.repeat`, to create an iterable the same length as `numbers` that contains only the value `3`.
So the output of `itertools.repeat(3, len(numbers))`, when converted to a list, is just `[3, 3, 3, 3, 3]`. This works because we're now passing two iterables of the same length to `map()`, which it happily accepts.
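As a side note, because `map()` stops at the shortest iterable, the explicit length is actually optional: an unbounded `itertools.repeat(3)` works just as well. And if you'd rather avoid `itertools` entirely, a lambda can bind the exponent directly. A minimal sketch of both alternatives:

import math
import itertools

numbers = [1, 2, 3, 4, 5]

# repeat(3) with no count yields 3 forever; map() stops when
# numbers runs out, so this never loops infinitely
res = map(math.pow, numbers, itertools.repeat(3))
print(list(res))  # Output: [1.0, 8.0, 27.0, 64.0, 125.0]

# The same result with a lambda that hardcodes the exponent
res = map(lambda n: math.pow(n, 3), numbers)
print(list(res))  # Output: [1.0, 8.0, 27.0, 64.0, 125.0]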
Conclusion
The `map()` function is particularly useful when working with multiple iterables, as it can apply a function to the elements of these iterables in pairs, triples, or more. In this Byte, we've covered how to pass multiple arguments to the `map()` function and how to work with multiple iterables.
September 21, 2023 08:22 PM UTC
William Minchin
minchin.jrnl v7 "Phoenix" released
Today, I do something that I should have done 5 years ago, and something that I've been putting off for the last 2 years1: I'm releasing a personal fork of jrnl2! I've given this release the codename Phoenix, after the mythical bird of rebirth that springs forth renewed from the ashes of the past.
You can install it today:
pip install minchin.jrnl
And then to run it from the command line:
minchin.jrnl
(or `jrnl`)
Features Today
Today, the codebase is that of jrnl v2.63 in a new namespace. In particular, that gives us a working YAML exporter; now you can build your Pelican sites again (or maybe only I was doing that...).
The version number (7) was picked to be larger than the current jrnl-org/jrnl release (currently at 4.0.1). (Plus I thought it would look cool!)
I've moved the configuration to a new location on disk4, so as not to stomp on your existing jrnl (i.e. jrnl-org/jrnl or "legacy") installation.
Limited documentation, to match the current codebase, has been uploaded to my personal site at minchin.ca/minchin.jrnl. (Although it remains incomplete and very much a work in progress.)
And finally, I extend an invitation to all those current or former users of jrnl to move here. I welcome your contributions and support. If there are features missing, please feel free to let me know.
Short Term Update Plans
I've pushed this release out with very few changes from the base codebase in an effort to get it live. But I have some updates that I'm planning to do very shortly. These updates will maintain the Phoenix codename, even if the major version number increments.
The biggest of these is to launch my much-anticipated plugin system. The code has already been written (for several years now5, actually); it just needs to be double-checked that it still works as expected.
The other immediate update is to make sure the code works with Python 3.11 (the current version of Python), which seems to already be the case.
Medium to Long Term Project Goals, or My Vision
These are features I'd like to add, although I realize this will take more than tonight. This section also lays out my vision for the project and some anti-features I want to avoid.
The Plugin System
I think the plugin system will be a huge step forward in making minchin.jrnl more useful. In particular, it allows minchin.jrnl to import from and export to new formats, including allowing you to write one-off export formats (which I intend to use personally right away!). Displaying the journal entries on the command line is also handled by exporters, so you'd be able to tweak that output as well. I also intend to extend the plugin system to the storage backends.
My hope is that this will futureproof minchin.jrnl, allowing new formats to be added quickly and easily, retiring deprecated formats to external plugins, and making it possible to quickly test and integrate new formats by seamlessly bringing external plugins "in-house".
In particular, I'm planning to have separate plugins for "true" YAML + Markdown exports and Pelican-formatted Markdown, to add an export format for Obsidian6-formatted Markdown, and to add backend format plugins to support jrnl v1 and whatever format they're dreaming up for jrnl v47.
In short, I hope plugins will allow you to make minchin.jrnl more useful, without me being the bottleneck.
Plain Text Forever
One of the things that drew me to the original jrnl implementation was that it was based on plain text, using plain text to store journal entries. Plain text has a number of advantages8, and near-universal backwards and forwards compatibility is high on that list. Yes, plain text has its limitations9, but I think the advantages far outweigh the disadvantages, particularly when it comes to a journal that you might hope will be readable years or generations from now. Also, plain text just makes it so much easier to develop minchin.jrnl.
The included, default option for minchin.jrnl will always be plain text.
If you're looking to upgrade your plain text, you might consider Markdown10 or ReStructured Text (ReST)11.
If you're looking for either a database backend or more multimedia functionality (or both), you're welcome to write something as a backend plugin for minchin.jrnl; that ability is a featured reason for providing the (to be added shortly!) plugin system in the first place!
MIT Licensed
The original jrnl was initially released under the MIT license, and that only changed with the v2.4 release to GPLv312. My hope and goal is to remove all GPL-licensed code and release future versions of minchin.jrnl under the MIT license23.
My opposition to the change13 was because I've come to feel that Open Source work is both given and received as a gift, and I feel the GPL license moves away from that ethos.
To this end, I'm requesting that any code contributions to the project be dual-licensed under both MIT and GPLv3.
I suspect the least fun part of this partial re-write will be getting the testing system up and running again, as the testing library jrnl v1 had been using has gone many years without updates.
Documentation in Sphinx
Documentation will eventually be moved over to Sphinx (from the current MkDocs), a process I've begun but not finished. Driving this is the expectation that I'll have more technical documentation (than is included currently) as I lay out how to work with the plugin API, and Sphinx makes it easy to keep code and documentation side by side in the same (code) file.
Furthermore, I want to document how to use minchin.jrnl as a Python API generally; this would allow you to interact with your journal from other Python programs.
Easy to Push Releases
Knowing my own schedule, I want to be able to sit down for an evening, make (several) small improvements, and then push out a new release before I go to bed. To that end, I want to streamline the process of pushing out new releases. Expect lots of small releases. :)
Drop poetry
poetry is a powerful Python project management tool, but one I've never really liked14. Particular issues include a lack of first-class Windows support15 and very conservative upper bounds for dependencies and supported Python versions. Plus, I have refined a system elsewhere, using pip-tools16 and setup.py, that manages these same issues and works very well for me.
This has been accomplished with the current release!
Windows Support
jrnl, to date, has always had decent Windows support. As I personally work on Windows, Windows will continue to have first-class support.
Where this may show is that tools beyond Python will need to be readily available on Windows before they're used33, and that the Windows terminal is fairly limited in what it can do, at least compared with some Linux terminals.
Replace the Current Code of Conduct
I don't much care for the current Code of Conduct17: it seems to be overly focused on the horrible things people might do to each other, and I'm not sure I want that to be the introduction people get to the project. I hope to find a Code of Conduct that focuses more on the positive things I hope people will do as they interact with each other and the project.
My replaced/current Code of Conduct is here (although this may be updated again in the future).
Open Stewardship Discussion
If the time comes when someone else is assuming stewardship of the project, I intend for those discussions to be held publicly18.
My History with the Project, and Why the Fork
This section is different: it is much less about the codebase and more focused on myself and my relationship to it. I warn you it is likely to be somewhat long and winding.
My Introduction to jrnl
Many years ago now, I was new to Python. At that time34, when I came across a problem that I thought programming might solve, I first went looking for a Python program that might solve it.
In looking for a way to manage the regular text notes I was taking at work, I found jrnl, which I eventually augmented with DayOne (Classic) for in-field note entry (on a work-supplied iPad) and Pelican for display.
Jrnl was more, though: it was the object of my first Pull Request35, my first contribution to Open Source. My meagre help was appreciated and welcomed warmly, and so I returned often. I found jrnl to be incredibly useful in learning about Python best practices and code organization; here was a program that was more than a toy but simple enough (or logically separated enough) that I could attempt to understand it, to grok it, as a whole. I contributed in many places, but particularly around Windows support, DayOne backend support, and exports to Markdown (to be fed to Pelican).
In short, jrnl became part of the code I used every day.
jrnl Goes Into Hibernation
I have heard it said that software rewrites are a great way to kill a project. The reasons for this are multiple, but in short: the rewrite (typically) saps the energy to update the legacy version even as bugs pile up, the new thing can't go into everyday use until it is feature-compatible with the legacy version, and the re-write always takes far more effort than initially estimated.
For reasons now forgotten36, a "v2" semi-rewrite was undertaken. And then it stalled. And then the project maintainer got a life19 and the re-write stalled even more.
The (Near) Endless Beta of v2, or the First Time I Should Have Forked
For me, initially, this wasn't a big deal: I was often running a development build locally as I tinkered away with jrnl, so I just kept doing that. Also, I had just started working on my plugin system (for exporters first, but expecting it could easily be extended to importers and backends).
As the months of inactivity on the part of the maintainer stretched on, and pull requests grew staler, at some point I should have forked the project and "officially" released my development version. But I never did, because it seemed like a scary new thing to do20.
Invitation to Become a Maintainer
And then21, one day out of the blue, I got an email from the maintainer asking if I wanted to be a co-maintainer for jrnl! I was delighted, and promptly said yes. I was given commit access to the repo on GitHub (but, as far as I knew, no ability to push releases to PyPI), and then...not much happened. I reached out to the maintainer to suggest some plans, as it still felt like "his" project, but I never heard much back. And I was too timid to move forward without at least something from him. And I was busy with the rest of life too. After a few months, I realized my first plan wasn't working and started thinking about how to try again to move the project forward, more on my own. In front of me was the push to v2.0, and a question of how to integrate my in-development plug-in system.
The Coup
And then one day, again out of the blue, I got an unexpected email saying that I no longer had commit access to the jrnl repo. I searched the project for updates, including the issue tracker, and came up with #591, where a transition to new maintainers was brought up; I don't know why I wasn't pinged on it. At the time22, I said I was happy to see new life in the project and to see it move forward. But it was unsettling that I'd missed the early discussions.
It also seemed odd to me that the two maintainers who stepped forward hadn't seemed to be involved with the project at all before that point.
For a while, things were ok: a "version 2" was released that was very close to the v2 branch I was using at home, bugs started getting fixed regularly, and new releases continued to come out. But my plugins never made it into a release.
Things Fall Apart (aka I Get Frustrated)
But things under new management didn't stay rosy.
One of the things they did was completely rewrite the Git history, and thus change all the commit hashes. This was a small but continuing annoyance (until I got a new computer), because every time I would go to push changes to GitHub, my git client would complain about the "new" (old) tags it was trying to push, because it couldn't find a commit with the matching hash.
But my two big annoyances were a re-write of the YAML exporter and the continual roadblocks to getting my plugin system merged in.
My plugin system has the longest history, having been started before the change in stewardship. Many times (after the change in stewardship), I created a pull request24, and the response would be to make various changes or to split it into smaller pieces; I would make the changes, and the cycle would continue. But there was never a plan presented that I felt I could successfully complete, nor was I ever told the plugin system was unaligned with the vision they had for the project. I lost considerable enthusiasm for trying to get the plugins merged after rewriting the tests for the third time (as the underlying testing framework was changed).
The YAML exporter changes are what ultimately left me feeling frozen out of the project. Without much fanfare, the YAML exporter was changed, because someone25 felt that the resulting output wasn't "true" or "pure" YAML. This is true, in a sense, because when I had originally written the exporter, it was designed to output files for Pelican, with an assumed Markdown body and YAML front matter for metadata. At the request of the (then) maintainer, I called it the "YAML exporter", partly because there was already a "Markdown exporter". I didn't realize it had been broken until I went to upgrade jrnl and render my Pelican site (which I use to search and read my notes), and it had just stopped working26. The change wasn't noted (in a meaningful way) in the release notes, and the version numbers didn't give an indication of when this change had happened30. I eventually figured out where the change had happened, explained the history of the exporter (that, again, I had written years earlier), and proposed three solutions, each with a corresponding Pull Request: 1) return the YAML exporter to its previous output27, 2) duplicate the old exporter under a new name28, or 3) merge my plugin system, which would allow me to write my own plugin and solve the problem myself. I was denied on all three, and told that I "didn't understand the purpose or function of the YAML exporter"31 (yes, of the plugin I'd written37). The best I got was that they would reconsider what rose to the level of a "breaking change" when dealing with versioning32.
I Walk Away
The combined experience left me feeling very frustrated: jrnl was broken (to me) and all my efforts to fix it were forcibly rebuffed.
When I tried to express my frustrations at my inability to get a "working" version of jrnl, I was encouraged to take a mental health break. While I appreciate the awareness of mental health, stepping away wouldn't be helpful in this particular case, because nothing would happen to fix my broken tooling (the cause of my frustrations). It seemed like the "right words"(tm) someone had picked up at a workshop, one that figured the "right words"(tm) would solve everything without requiring a deeper look or deeper changes.
So I took my leave. I've been running an outdated version (v2.6) ever since, and because of the strictness of the Poetry metadata29, I can't run it on anything newer than Python 3.9 (even as I've upgraded my "daily" Python to 3.11).
I Return (Sort Of); The Future and My Fears
So this marks my return. My "mental health break" is over. As I realize I can only change myself (and not others), I will do the work to fix the deeper issues (e.g. broken Pelican exports, lack of "modern" Python support) by managing my own fork. And so that is the work I'll do.
Looking forward, if I'm the only one who uses my fork, that would be somewhat disappointing, but also fine. After all, I write software, first and foremost, for my own use case and offer it to others as a gift. On the other hand, if a large part of the community moves here, I worry about being able to shepherd that community any better than the one I am leaving.
I worry too that, either because there was conflict at all, or because all of my writings are publicly displayed, others will think less of my work or of me because of the failings they see there. It is indeed very hard to get through a disagreement like this without failing to some degree.
But it seems better to act than to suffer in silence.
A Thank You
Thank you to all those who have worked to make jrnl as successful as it has been to date.
If you've gotten this far, thank you for reading it all. I hope you will join me, and I hope your experiences with minchin.jrnl are positive!
The header image was generated locally by Stable Diffusion XL.
- October 18, 2021 todo item: "fork jrnl" ↩
- main landing page at jrnl.sh, code at jrnl-org/jrnl on GitHub, and jrnl on PyPI ↩
- this varies by OS, so run `jrnl --list` to see where yours is stored. ↩
- I've started using Obsidian to take notes on my workstation and on my phone, and find it incredible. The backend format remains Markdown with basically YAML front matter, but the format is slightly different from Pelican's, and the exported file layout differs. ↩
- The initial draft of this post was written before the v4 release, when there was talk of changing how the journal files were kept. v4 has since been released, and I'm unclear if that change ever happened, or what "breaking change" occurred that bumped the version number from 3 to 4 generally. In any case, if they change their format, with the plugin system it becomes fairly trivial to add a format-specific importer. ↩
- also: tiny file size, easy to put under version control, no proprietary format or data lock-in, portability across computing platforms, and generally human readable ↩
- these include limitations on embedded text formatting; storing pictures, videos, or sound recordings; and a lack of standardized internal metadata ↩
- Markdown has several variants and many extensions. If you're starting out, I recommend looking at the CommonMark specification. Note however that Markdown was originally designed as a shortcut for creating HTML documents, and so has no built-in features for managing groups of Markdown documents. It is also deliberately limited in the formatting options available, while officially supporting raw HTML as a fallback for anything missing. ↩
- ReST is older than Markdown and has always had a full specification. It was originally designed for the Python language documentation, and so was designed from the beginning to deal with the interplay between several text documents. Sadly, it doesn't seem to have been adopted much outside of the Python ecosystem. ↩
- version 2.3.1 license (MIT); version 2.4 license (GPLv3), released April 18, 2020. ↩
- as I detailed at the time. But the issue (#918) announcing the change was merged within 20 minutes of being opened, so I'm not sure anything I could have said would have changed their minds. ↩
- this can and should be fleshed out into a full blog post. But another day. ↩
- and I work on Windows. And I work with Python because Python has had good Windows support. ↩
- jrnl-org/jrnl's Code of Conduct: the Contributor Code of Conduct. ↩
- I imagine in the issue tracker for the project. ↩
- I think he got a job with or founded a startup, and I suspect he probably moved continents. ↩
- In the intervening time, I ended up releasing personal forks of several Pelican plugins. The process is no longer new or scary, but it can still be a fair bit of work. And that experience has given me the confidence to go forward with this fork. ↩
- February 16, 2018 ↩
- July 5, 2019; my comment, at the time ↩
- my (pending) codename for these releases is Fleur-de-lis. The reference is to the lily, a flower that is a symbol of purity and rebirth. ↩
- see Pull Request #1216 and Discussion #1006 ↩
- in particular, Pelican could no longer find the metadata block and instead rendered the text of each entry as if it were a code block. ↩
- I'm sure I wrote the code to do this, but can't find the Pull Request at the moment. Maybe I figured the suggestion wouldn't go anywhere. ↩
- https://github.com/jrnl-org/jrnl/blob/v2.6/pyproject.toml#L25 ↩
- perhaps because I was looking for a breaking change rather than a bug fix. ↩
- this comment and this one, in particular. I can't find those exact quoted words, but that was the sentiment I was left with. ↩
- So no make. But Invoke, written in Python, works well for many of make's use cases. ↩
- and still today ↩
- Pull Request #110, dated November 27, 2013 ↩
- but likely recorded in the issue tracker ↩
- Pull Request #258, opened July 30, 2014. ↩