
Planet Python

Last update: September 27, 2023 09:41 PM UTC

September 27, 2023


Zero to Mastery

Python Monthly Newsletter 💻🐍

46th issue of Andrei Neagoie's must-read monthly Python Newsletter: Python oddities, cProfile trick, and GIL removal. All this and more. Read the full newsletter to get up-to-date with everything you need to know from last month.

September 27, 2023 07:41 PM UTC


Real Python

Python 3.12 Preview: Static Typing Improvements

Python’s support for static typing gradually improves with each new release of Python. The core features were in place in Python 3.5. Since then, there’ve been many tweaks and improvements to the type hinting system. This evolution continues in Python 3.12, which, in particular, simplifies the typing of generics.

In this tutorial, you’ll:

  • Use type variables in Python to annotate generic classes and functions
  • Explore the new syntax for type hinting type variables
  • Model inheritance with the new @override decorator
  • Annotate **kwargs more precisely with typed dictionaries

This won’t be an introduction to using type hints in Python. If you want to review the background that you’ll need for this tutorial, then have a look at Python Type Checking.

You’ll find many other new features, improvements, and optimizations in Python 3.12. Go ahead and check out what’s new in the changelog for more details on these and other features.


Recap Type Variable Syntax Before Python 3.12

Type variables have been a part of Python’s static typing system since its introduction in Python 3.5. PEP 484 introduced type hints to the language, and type variables and generics play an important role in that document. In this section, you’ll dive into how you’ve used type variables so far. This’ll give you the necessary background to appreciate the new syntax that you’ll learn about later.

A generic type is a type parametrized by another type. Typical examples include a list of integers and a tuple consisting of a float, a string, and another float. You use square brackets to parametrize generics in Python. You can write the two examples above as list[int] and tuple[float, str, float], respectively.

In addition to using built-in generic types, you can define your own generic classes. In the following example, you implement a generic queue based on deque in the standard library:

# generic_queue.py

from collections import deque
from typing import Generic, TypeVar

T = TypeVar("T")

class Queue(Generic[T]):
    def __init__(self) -> None:
        self.elements: deque[T] = deque()

    def push(self, element: T) -> None:
        self.elements.append(element)

    def pop(self) -> T:
        return self.elements.popleft()

This is a first-in, first-out (FIFO) queue. It represents the kind of lines that you’ll find yourself in at stores, where the first person into the queue is also the first one to leave the queue. Before looking closer at the code, and in particular at the type hints, play a little with the class:

>>> from generic_queue import Queue

>>> queue = Queue[int]()

>>> queue.push(3)
>>> queue.push(12)
>>> queue.elements
deque([3, 12])

>>> queue.pop()
3

You can use .push() to add elements to the queue and .pop() to remove elements from the queue. Note that when you called the Queue() constructor, you included [int]. This isn’t necessary, but it tells the type checker that you expect the queue to only contain integer elements.

Normally, subscripting a class with square brackets like you did in Queue[int]() would fail with a TypeError at runtime. You can use square brackets with Queue, however, because you defined Queue as a generic class by inheriting from Generic. How does the rest of your class use this int parameter?

To answer that question, you need to look at T, which is a type variable. A type variable is a special variable that can stand in for any type. However, during type checking, the type of T will be fixed.

In your Queue[int] example, T will be int in all annotations on the class. You could also instantiate Queue[str], where T would represent str everywhere. This would then be a queue with string elements.

If you look back at the source code of Queue, then you’ll see that .pop() returns an object of type T. In your special integer queue, the static type checker will make sure that .pop() returns an integer.

Speaking of static type checkers, how do you actually check the types in your code? Type annotations are mostly ignored during runtime. Instead, you need to install a separate type checker and run it explicitly on your code.

Note: In this tutorial, you’ll use Pyright as your type checker. You can install Pyright from PyPI using pip:

$ python -m pip install pyright

If you’re using Visual Studio Code, then you can use Pyright inside the editor through the Pylance extension. You may need to activate it by setting the Python â€ș Analysis: Type Checking Mode option in your settings.

If you install Pyright, then you can use it to type check your code:

$ pyright generic_queue.py
0 errors, 0 warnings, 0 informations

To see an example of the kinds of errors that Pyright can detect, add the following lines to your generic queue implementation:
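
The excerpt cuts off here, so the article's own lines aren't shown, but a hypothetical addition along these lines would trigger the kind of error Pyright reports (illustrative lines, not the article's):

# generic_queue.py (continued) -- hypothetical lines, not the article's own
string_queue = Queue[str]()
string_queue.push("hello")
string_queue.push(7)  # Pyright flags this: an int isn't assignable to "str"

Running pyright generic_queue.py again would then report an error on the last line.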

Read the full article at https://realpython.com/python312-typing/ »



September 27, 2023 02:00 PM UTC


Python Software Foundation

Python Developers Survey Numbers for 2022!

We are excited to announce the results of the sixth official annual Python Developers Survey. This work is done each year as a collaborative effort between the Python Software Foundation and JetBrains. Late last year, more than 23,000 Python developers and enthusiasts from almost 200 countries/regions participated in the survey to reveal the current state of the language and the ecosystem around it. (Spoiler alert: Many people are using Python, and 51% are using it for both work AND personal projects.)

https://lp.jetbrains.com/python-developers-survey-2022/

We know the whole Python community finds this work useful. From Luciana Abud, product manager for Visual Studio Code, “Our teams at Microsoft are truly grateful to the Python Software Foundation and JetBrains for orchestrating the Python Developers Survey! The insights we gain allows us to take a data-driven approach to help with prioritizing feature development, addressing pain points, enhancing usability and anticipating future needs. This survey is invaluable in shaping our approach and continuously improving the Python development experience within the VS Code ecosystem!” 

We’d love to hear how you use these numbers, so please share your thoughts on social media, mentioning @jetbrains and @ThePSF with the #pythondevsurvey hashtag. We are also open to any suggestions and feedback related to this survey which could help us run an even better one next time.

September 27, 2023 11:44 AM UTC


PyCharm

PyCharm 2023.3 Early Access Program Is Open!

UI/UX Enhancements, Support for PEP 647, and More

The Early Access Program for PyCharm 2023.3 kicks off today, offering you a sneak peek of the exciting new features and improvements we expect to include in the next major release.

If you’re not familiar with how the EAP works, please read this blog post for an introduction to the program and an explanation of why your participation is invaluable.

We invite you to join us over the next few weeks, take a closer look at the latest additions to PyCharm, and share your feedback on the new features.

You can download the build from our website, get it from the free Toolbox App, or update to it using snaps if you’re an Ubuntu user.

Download PyCharm 2023.3 EAP

Read on to explore the new features and enhancements that you can test in this version.

User experience

Option to hide the main toolbar in the default viewing mode

In response to your feedback about the new UI, we’ve implemented an option to hide the main toolbar when using the IDE’s default viewing mode, just like in the old UI. To declutter your workspace and remove the toolbar, select View | Appearance and uncheck the Toolbar option.

Option to hide main toolbar

Default tool window layout option 

With the release of PyCharm 2023.1, we introduced the ability to save multiple tool window layouts and switch between them, enhancing the customizability of your workspace. In the first PyCharm 2023.3 EAP build, we’re expanding this functionality by introducing the Default layout option, which provides a quick way to revert your workspace’s appearance to its default state. This layout is not customizable and can be accessed through Window | Layouts.

Default tool window layout

New product icon for macOS

With the launch of the PyCharm 2023.3 EAP, we have redesigned the PyCharm icon for macOS to align it with the standard style guidelines of the operating system.

Django REST Framework

Support for ViewSets

PyCharm 2023.3 will help you define endpoints when working with the Django REST Framework. The IDE will support code completion, navigation, and rename refactoring for the methods used in ViewSets.

Try this feature and share your feedback with us!

Editor

Support for type guards [PEP 647]

PyCharm 2023.3 will support PEP 647. PEP 647 introduced a way to treat custom functions as “type guards”, which, when used in a conditional statement, narrow the types of their arguments. Think of the built-in functions isinstance and issubclass, which our type checker already recognizes. Now, user-defined functions returning typing.TypeGuard have the same effect on type inference in PyCharm.
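
For illustration, a user-defined type guard in the sense of PEP 647 might look like this (a generic sketch, not code from the PyCharm release notes):

from typing import TypeGuard

def is_str_list(values: list[object]) -> TypeGuard[list[str]]:
    # Returning True tells the type checker to narrow `values` to list[str].
    return all(isinstance(v, str) for v in values)

def join_words(values: list[object]) -> str:
    if is_str_list(values):
        # Inside this branch, `values` is treated as list[str].
        return " ".join(values)
    return ""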

Move code elements in Python files

In PyCharm 2023.3, you can move code elements left or right in Python files with Option + Shift + Cmd + Left/Right on macOS (Alt + Shift + Ctrl + Left/Right on Windows/Linux).

Move code elements in Python files

Python Console

Option to switch between single and double quotes when copying string values from the Variable View

There is a new option to put double quotes instead of single quotes around a string value copied from the Variable View in the Python or Debug Console.

To switch between single and double quotes, go to the Other option (three vertical dots icon) in the main Debug menu bar, choose Debugger Settings | Variable Quoting Policy and pick your preferred option.

Variable Quoting Policy

Navigate between the commands in the Python Console

In PyCharm 2023.3, you can navigate between multi-line commands in the Python Console using Cmd + Up / Cmd + Down shortcuts on macOS (Ctrl + Up / Ctrl + Down on Windows / Linux). When you move to the previously executed command, a caret is set to the end of the first line. When you get to the most recently executed multi-line command from your history, a caret is set to the end of the command.

Navigate between commands in the Console

Static code completion in the Python Console

In PyCharm 2023.2, we added an option to use static code completion in the Python Console. In PyCharm 2023.3, it will be enabled by default. If you would like to switch to runtime code completion, go to Settings | Build, Execution, Deployment | Console and choose the option in the Code completion drop-down menu.

Notable bug fix: execute code with root privileges via sudo

We fixed a regression that prevented users from executing code via an SSH connection in PyCharm with root privileges via sudo. [PY-52690]

These are the most important updates for this week. For the full list of changes in this EAP build, read the release notes.

We’re dedicated to giving you the best possible experience, and your feedback is vital. If you find any bugs, please report them via our issue tracker. And if you have any questions or comments, feel free to share them in the comments below or get in touch with us on X (formerly Twitter).

September 27, 2023 09:54 AM UTC


Anarcat

How big is Debian?

Now this was quite a tease! For those who haven't seen it, I encourage you to check it out, it has a nice photo of a Debian t-shirt I did not know about, to quote the Fine Article:

Today, when going through a box of old T-shirts, I found the shirt I was looking for to bring to the occasion: [...]

For the benefit of people who read this using a non-image-displaying browser or RSS client, they are respectively:

   10 years
  100 countries
 1000 maintainers
10000 packages

and

        1 project
       10 architectures
      100 countries
     1000 maintainers
    10000 packages
   100000 bugs fixed
  1000000 installations
 10000000 users
100000000 lines of code

20 years ago we celebrated eating grilled meat at J0rd1’s house. This year, we had vegan tostadas in the menu. And maybe we are no longer that young, but we are still very proud and happy of our project!

Now

How would numbers line up today for Debian, 20 years later? Have we managed to get the “bugs fixed” line increase by a factor of 10? Quite probably, the lines of code we also have, and I can only guess the number of users and installations, which was already just a wild guess back then, might have multiplied by over 10, at least if we count indirect users and installs as well.


Now I don't know about you, but I really expected someone to come up with an answer to this, directly on Debian Planet! I have patiently waited for such an answer, but enough is enough: I'm a Debian member, surely I can cull all of this together. So, lo and behold, here are the actual numbers from 2023!

So it doesn't line up as nicely, but it looks something like this:

         1 project
        10 architectures
        30 years
       100 countries (actually 63, but we'd like to have yours!)
      1000 maintainers (yep, still there!)
     35000 packages
    211000 *binary* packages
   1000000 bugs fixed
1000000000 lines of code
 uncounted installations and users, we don't track you

So maybe the more accurate version, rounding to the nearest logarithm, would look something like:

         1 project
        10 architectures
       100 countries (actually 63, but we'd like to have yours!)
      1000 maintainers (yep, still there!)
    100000 packages
   1000000 bugs fixed
1000000000 lines of code
 uncounted installations and users, we don't track you

I really like how "packages" and "bugs fixed" still have an order of magnitude between them, but "bugs fixed" vs "lines of code" now have an extra order of magnitude: that is, we have fixed ten times fewer bugs per line of code since we last did this count, 20 years ago.

Also, I am tempted to put 100 years in there, but that would be rounding up too much. Let's give it another 30 years first.

Hopefully, some real scientist is going to balk at this crude methodology and come up with some more interesting numbers for the next t-shirt. Otherwise I'm available for bar mitzvahs and children's parties.

September 27, 2023 02:23 AM UTC

September 26, 2023


PyCoder’s Weekly

Issue #596 (Sept. 26, 2023)

#596 – SEPTEMBER 26, 2023



Design and Guidance: Object-Oriented Programming in Python

In this video course, you’ll learn about the SOLID principles, which are five well-established standards for improving your object-oriented design in Python. By applying these principles, you can create object-oriented code that is more maintainable, extensible, scalable, and testable.
REAL PYTHON course

Learning About Code Metrics in Python With Radon

Radon is a code metrics tool. This article introduces you to it and teaches you how you can improve your code based on its measurements.
MIKE DRISCOLL

You Write Great Python Code but How Do You Know it’s Secure Code


If you’re not a security expert, consider Semgrep. Trusted by Slack, Gitlab, Snowflake, and thousands of engineers, it acts like a security spellchecker for your code. Simply point Semgrep at your code: it identifies vulnerabilities and even checks code dependencies, helping you ship secure code →
SEMGREP sponsor

Speeding Up Your Code When Multiple Cores Aren’t an Option

Parallelism isn’t the only answer: often you can optimize low-level code to get significant performance improvements.
ITAMAR TURNER-TRAURING

Django 5.0 Alpha 1 Released

DJANGO SOFTWARE FOUNDATION

Python 3.12.0 Release Candidate 3 Available

CPYTHON DEV BLOG

Articles & Tutorials

How to Catch Multiple Exceptions in Python

In this how-to tutorial, you’ll learn different ways of catching multiple Python exceptions. You’ll review the standard way of using a tuple in the except clause, but also expand your knowledge by exploring some other techniques, such as suppressing exceptions and using exception groups.
REAL PYTHON

78% MNIST Accuracy Using GZIP in Under 10 Lines of Code

MNIST is a collection of hand-written digits that is commonly used to play with classification algorithms. It turns out that some compression mechanisms can double as classification tools. This article covers a bit of why with the added code-golf goal of a small amount of code.
JAKOBS.DEV

Don’t Get Caught by IDOR Vulnerabilities


Are insecure direct object references (IDOR) threatening your applications? Learn about the types of IDOR vulnerabilities in your Python applications, how to recognize their patterns, and how to protect your system with Snyk →
SNYK.IO sponsor

Bypassing the GIL for Parallel Processing in Python

In this tutorial, you’ll take a deep dive into parallel processing in Python. You’ll learn about a few traditional and several novel ways of sidestepping the global interpreter lock (GIL) to achieve genuine shared-memory parallelism of your CPU-bound tasks.
REAL PYTHON

Creating a Great Python DevX

This article talks about the different tools you commonly come across as part of the Python development experience. It gives an overview of black, nox, ruff, Mypy, and more, covering why you should use them when you code your own projects.
SCOTT HOUSEMAN

Why Are There So Many Python Dataframes?

Ever wonder why there are so many libraries that have Dataframes in Python? This article talks about the different perspectives of the popular toolkits and why they are what they are.
MAHESH VASHISHTHA

The Protocol Class

typing.Protocol enables type checking with a Java-esque, interface-like mechanism. Using it, you can declare that a duck-typed class conforms to a specific protocol. Read on for details.
PEPIJN BAKKER

What Does if __name__ == "__main__" Mean in Python?

In this video course, you’ll learn all about Python’s name-main idiom. You’ll learn what it does in Python, how it works, when to use it, when to avoid it, and how to refer to it.
REAL PYTHON course

Why & How Python Uses Bloom Filters in String Processing

Dive into Python’s clever use of Bloom filters in string APIs for speedier performance. Find out how CPython’s unique implementation makes it more efficient.
ABHINAV UPADHYAY

Simulate the Monty Hall Problem in Python

Write a Python simulation to solve this classic probability puzzle that has stumped mathematicians and Nobel Prize winners!
DATASCHOOL.IO ‱ Shared by Kevin Markham

Death by a Thousand Microservices

The software industry is learning once again that complexity kills and trending back towards monoliths and larger services.
ANDREI TARANCHENKO

How to Test Jupyter Notebooks With Pytest and Nbmake

Tutorial on how to use the pytest plugin nbmake to automate end-to-end testing of notebooks.
SEMAPHORECI.COM ‱ Shared by Larisa Ioana

Projects & Code

panther: Web Framework for Building Async APIs

GITHUB.COM/ALIRN76

Clientele: Loveable Python API Clients From OpenAPI Schemas

GITHUB.COM/PHALT ‱ Shared by Paul Hallett

mpire: Easy, but Faster Multiprocessing

GITHUB.COM/SYBRENJANSEN

leaky_ledger: A Fake Bank to Practice Finding Vulnerabilities

GITHUB.COM/ZCHTODD

reader: A Python Feed Reader Library

GITHUB.COM/LEMON24 ‱ Shared by Adrian

Events

Weekly Real Python Office Hours Q&A (Virtual)

September 27, 2023
REALPYTHON.COM

SPb Python Drinkup

September 28, 2023
MEETUP.COM

PyCon India 2023

September 29 to October 3, 2023
PYCON.ORG

PythOnRio Meetup

September 30, 2023
PYTHON.ORG.BR

PyConZA 2023

October 5 to October 7, 2023
PYCON.ORG

PyCon ES Canarias 2023

October 6 to October 9, 2023
PYCON.ORG

Django Day Copenhagen 2023

October 6 to October 7, 2023
DJANGODAY.DK

DjangoCongress JP 2023

October 7 to October 8, 2023
DJANGOCONGRESS.JP


Happy Pythoning!
This was PyCoder’s Weekly Issue #596.



September 26, 2023 07:30 PM UTC


TechBeamers Python

Python String Splitting: split(), rsplit(), regex

String manipulation is a fundamental skill in Python, and understanding how to split strings is a crucial aspect of it. In this comprehensive guide, we’ll explore various methods and techniques for splitting strings, including the split() and rsplit() functions, regular expressions (regex), and advanced splitting techniques. By the end of this tutorial, you’ll have a [...]

The post Python String Splitting: split(), rsplit(), regex appeared first on TechBeamers.

September 26, 2023 05:46 PM UTC


Real Python

Python Basics Exercises: Conditional Logic and Control Flow

In Python Basics: Conditional Logic and Control Flow, you learned that much of the Python code you’ll write is unconditional. That is, the code doesn’t make any choices. Every line of code is executed in the order that it’s written or in the order that functions are called, with possible repetitions inside loops.

In this course, you’ll revisit how to use conditional logic to write programs that perform different actions based on different conditions. Paired with functions and loops, conditional logic allows you to write complex programs that can handle many different situations.

Ultimately, you’ll bring all of your knowledge together to build a text-based role-playing game. Along the way, you’ll also get some insight into how to tackle coding challenges in general, which can be a great way to level up as a developer.

This video course is part of the Python Basics series, which accompanies Python Basics: A Practical Introduction to Python 3. You can also check out the other Python Basics courses.

Note that you’ll be using IDLE to interact with Python throughout this course.



September 26, 2023 02:00 PM UTC


Python Bytes

#354 Python 3.12 is Coming!

Topics covered in this episode:

  ‱ logmerger (https://github.com/ptmcg/logmerger)
  ‱ The third and final Python 3.12 RC is out now
  ‱ The Python dictionary dispatch pattern
  ‱ Visualizing the CPython Release Process
  ‱ Extras
  ‱ Joke

Watch on YouTube: https://www.youtube.com/watch?v=1zf29sQVGow

About the show

Sponsored by us! Support our work through:

  ‱ Our courses at Talk Python Training (https://training.talkpython.fm/)
  ‱ Python People Podcast (https://pythonpeople.fm)
  ‱ Patreon Supporters (https://www.patreon.com/pythonbytes)

Connect with the hosts:

  ‱ Michael: @mkennedy@fosstodon.org
  ‱ Brian: @brianokken@fosstodon.org
  ‱ Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Brian #1: logmerger (https://github.com/ptmcg/logmerger)

  ‱ Paul McGuire
  ‱ logmerger is a TUI for viewing a merged display of multiple log files, merged by timestamp.
  ‱ Built on textual.
  ‱ Awesome flags:
      ‱ --output - to send the merged logs to stdout
      ‱ --start START and --end END to select the time window for merging logs
  ‱ Caveats:
      ‱ New, no pip install yet, so clone the code or download it.
      ‱ Perhaps I jumped the gun on covering this, but it's cool.

Michael #2: The third and final Python 3.12 RC is out now (https://mastodon.social/@hugovk/111091573987175428)

  ‱ Get your final bugs fixed before the full release.
  ‱ Call to action: we strongly encourage maintainers of third-party Python projects to prepare their projects for 3.12 compatibility during this phase.
  ‱ How to test: https://dev.to/hugovk/help-test-python-312-beta-1508
  ‱ Discussion on the issue: https://discuss.python.org/t/python-3-12-0rc3-released-honestly-the-final-release-candidate-i-swear/34093
  ‱ Count down until October 2nd, 2023.

Brian #3: The Python dictionary dispatch pattern (https://jamesg.blog/2023/08/26/python-dictionary-dispatch/)

  ‱ I kinda love (and hate) jump tables in C.
  ‱ We don't talk about dictionary dispatch much in Python, so this is nice, if not dangerous.
  ‱ Short story: you can store lambdas or functions in dictionaries, then look them up and call them at the same time.
  ‱ Also, I gotta shout out the first blogroll I've seen in a very long time.
      ‱ Should we bring back blogrolls?

Michael #4: Visualizing the CPython Release Process (https://sethmlarson.dev/security-developer-in-residence-weekly-report-9)

  ‱ by Seth Larson
  ‱ Here's the deal (you should see the image in the article 😉):
      1. Freeze the python/cpython release branch. This is done using GitHub Branch Protections.
      2. Update the Release Manager's fork of python/cpython.
      3. Run Python release tools (release-tool, blurb, sphinx, etc.).
      4. Push diffs and signed tag to the Release Manager's fork.
      5. Git tag is made available to experts for Windows and macOS binary installers.
      6. Source tarballs, Windows, and macOS binary installers built and tested concurrently.
          ‱ 6a: Release Manager builds the tgz and tar.xz source files for the Python release. This includes building the updated documentation.
          ‱ 6b: Windows expert starts the Azure Pipelines configured to build Python.
          ‱ 6c: macOS expert builds the macOS installers.
      7. All artifacts (source and binary) are tested on their platforms.
      8. Release Manager signs all artifacts using Sigstore and GPG.
      9. All artifacts are made available on python.org.
      10. After artifacts are published to python.org, the git commit and tag from the Release Manager's fork are pushed to the release branch.

Extras

Brian:

  ‱ The Complete pytest Course (https://testandcode.teachable.com/p/the-complete-pytest-course), part 2, Ch 7 Testing Strategy went up this weekend.
      ‱ Only 9 more chapters to go.
  ‱ "Test & Code" → "Python Test" (https://testandcode.com/207)
      ‱ Full version: "The Python Test Podcast" → "Test & Code" → "Python Test"
      ‱ Also: "Python (Bytes | People | Test)"

Michael:

  ‱ If you're at PyBay (https://pybay.com), come say "hi".
  ‱ EuroPython 2023 videos are up (https://www.youtube.com/playlist?list=PL8uoeex94UhFcwvAfWHybD7SfNgIUBRo-).
  ‱ Django + HTMX (https://training.talkpython.fm/courses/htmx-django-modern-python-web-apps-hold-the-javascript) has a few days of early-bird discount left.

Joke: Are you sleeping? (https://www.reddit.com/r/programminghumor/comments/15tnhhs/so_true/)
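
As a quick illustration of the dictionary dispatch pattern from Brian's third item (a generic sketch, not code from the episode or the linked article):

# Map command names to functions, then look up and call in one expression.
def add(a, b):
    return a + b

def mul(a, b):
    return a * b

dispatch = {"add": add, "mul": mul}

print(dispatch["add"](2, 3))  # 5
print(dispatch["mul"](2, 3))  # 6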

September 26, 2023 08:00 AM UTC


Test and Code

207: Welcome to "Python Test", pytest course, pytest-repeat and pytest-flakefinder

  ‱ Podcast name: "Test & Code" → "Python Test"
  ‱ Python Bytes Podcast (https://pythonbytes.fm)
  ‱ Python People Podcast (https://pythonpeople.fm/)
  ‱ Python Test Podcast (https://testandcode.com) ← you are here
      ‱ which is still, at least for now, at testandcode.com
  ‱ New course: "The Complete pytest Course" (https://pythontest.com/courses)
  ‱ pytest-repeat (https://github.com/pytest-dev/pytest-repeat), which I'm starting to contribute to
      ‱ Give --repeat-scope a try. You can use it to change from repeating every test to repeating the session, module, or class.
  ‱ pytest-flakefinder (https://github.com/dropbox/pytest-flakefinder), which is an alternative to pytest-repeat
  ‱ pytest-check (https://github.com/okken/pytest-check) is completely unrelated, but mentioned in the show

September 26, 2023 01:37 AM UTC


Seth Michael Larson

Starting on Software Bill-of-Materials (SBOM) for CPython


Published 2023-09-26 by Seth Larson

This critical role would not be possible without funding from the OpenSSF Alpha-Omega Project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

I've started dipping my toes into creating an authoritative SBOM for the CPython project; you can follow along in this GitHub repository if you are interested. This project is very early, and this will not be the final product or place where this information is published; it is only a place to experiment and get feedback on the approach and outputs before putting the final infrastructure in place.

I started with the most straightforward release artifact, the source tarball, and I am planning to tackle the binary installers later since they'll require more research into the release processes. There is a work-in-progress SBOM file for Python-3.11.5.tgz available in the sboms/ directory on the repository.

I've also included an SBOM for CPython 3.11.0, which can be used to see whether vulnerability scanning tools are capable of consuming the resulting SBOM and flagging subcomponents for vulnerabilities. I used Grype as an example for this, and indeed it was able to consume the SBOM and flag the known vulnerabilities:

$ grype sbom:sboms/Python-3.11.0.tgz.spdx.json

 ✔ Vulnerability DB                [updated]  
 ✔ Scanned for vulnerabilities     [9 vulnerability matches]  
   ├── by severity: 0 critical, 6 high, 3 medium, 0 low, 0 negligible
   └── by status:   0 fixed, 9 not-fixed, 0 ignored

NAME     INSTALLED  FIXED-IN  TYPE  VULNERABILITY   SEVERITY 
CPython  3.11.0                     CVE-2023-41105  High      
CPython  3.11.0                     CVE-2023-36632  High      
CPython  3.11.0                     CVE-2023-24329  High      
CPython  3.11.0                     CVE-2022-45061  High      
CPython  3.11.0                     CVE-2023-40217  Medium    
CPython  3.11.0                     CVE-2023-27043  Medium    
CPython  3.11.0                     CVE-2007-4559   Medium    
expat    2.4.7                      CVE-2022-43680  High      
expat    2.4.7                      CVE-2022-40674  High

The tool was able to see vulnerabilities not only in CPython but also in the expat subcomponent. Without an SBOM, the expat subcomponent wouldn't be detected by current versions of Grype. Running Grype on the CPython 3.11.5 SBOM results in zero known vulnerabilities. 🄳

$ grype sbom:sboms/Python-3.11.5.tgz.spdx.json 

 ✔ Vulnerability DB                [no update available]  
 ✔ Scanned for vulnerabilities     [0 vulnerability matches]  
   ├── by severity: 0 critical, 0 high, 0 medium, 0 low, 0 negligible
   └── by status:   0 fixed, 0 not-fixed, 0 ignored 

No vulnerabilities found

Sigstore signatures for CPython

Now all CPython releases that have Sigstore verification materials have "bundles" (ie .sigstore files) instead of the "disjoint verification materials" (ie .crt and .sig files). These new bundles have been back-filled from existing verification materials using the new VerificationMaterials.to_bundle() method in the Python Sigstore client. Thanks to Ɓukasz Langa for verifying the new bundles and publishing them to python.org.

Now that all releases have bundles available, I've also updated the Sigstore verification instructions on python.org to only reference bundles:

$ python -m sigstore verify identity \
  --bundle Python-3.11.0.tgz.sigstore \
  --cert-identity pablogsal@python.org \
  --cert-oidc-issuer https://accounts.google.com \
  Python-3.11.0.tgz

Having bundles means one less file to download to verify a signature and that verification doesn't need to query the transparency log, instead relying on the entry embedded within the bundle.

Truststore support coming for Conda!

Conda has merged the pull request to add Truststore support, which is slated for v23.9.0. This required creating a top-level feedstock for Truststore.

pip has merged the pull request to bundle Truststore into pip, so it's no longer required to "bootstrap" Truststore in order to have support for using system certificates. This feature will be coming in pip v23.3.

Python Security Response Team (PSRT) using GitHub Security Advisories

I spent some time developing a small GitHub App that would add the PSRT GitHub team to all newly created GitHub Security Advisories, and I have something that works in theory.

Unfortunately, there's currently no way to get webhook events for the creation of draft GitHub Security Advisories; you can only get a webhook when security reports are filed. This means that anyone with access to GitHub Security Advisories (ie organization or repository admins) wouldn't trigger the GitHub App action to add the PSRT team.

Security Developer-in-Residence 2023 Q3 update for PSF blog

Since I've just passed 3 months in this role (time sure does fly!), I am drafting a summarized update of my work in 2023 Q3 that will be published to the Python Software Foundation blog. Subscribe to the blog via RSS or social media to get notified.

That's all for this week! 👋 If you're interested in more you can read last week's report.

Wow, you made it to the end!

If you're like me, you don't believe social media should be the way to get updates on the cool stuff your friends are up to. Instead, you should follow my blog, either with the RSS reader of your choice or via my email newsletter, for guaranteed article publication notifications.

If you really enjoyed a piece, I would be grateful if you shared it with a friend. If you have follow-up thoughts, you can send them via email.

Thanks for reading!
— Seth

September 26, 2023 12:00 AM UTC

September 25, 2023


PyBites

Meet Will Raphaelson: From Script to Production Flow With Prefect & Marvin AI

This week Robin Beer – one of our coaches – interviews Will Raphaelson, Principal Product Manager at Prefect. 😍

They talk about his use of Python, Prefect as a tool and its philosophy, open source + business, and Marvin AI. 🐍 đŸ’Ș

And of course they share cool wins and books they are reading. 💡

All in all, an insightful chat that hopefully will leave you inspired to go check out these cool new Python tools... 😎

Listen here:

Or watch on YouTube:

Chapters:
00:00 Intro snippet
00:11 Intro music
00:31 Introduction guests + topics
01:32 Welcome Will, do you have a win of the week?
04:12 Go to meet ups
04:37 How do you leverage Python as a product manager?
07:12 Python as a quick prototyping language
08:14 What is Prefect and its philosophy?
10:56 Robin’s experience with Prefect
12:26 How has Prefect evolved over time?
15:54 Orchestrators and observability
18:02 A practical example of an orchestrated flow
21:21 How Prefect handles failures in a flow?
23:24 Open source and business, how to combine them?
27:45 Tips for starting an open source business?
31:05 Rationale vs emotion in making product decisions
34:12 Marvin AI and its relation with Prefect
38:01 Marvin AI is a nice way to start with Python
40:16 Recommended books
43:02 Wrap up
43:55 Outro music

Connect with Will on LinkedIn.

Prefect product links:
– Prefect
– Marvin AI

Mentioned article:
– 28 Dags Later by Stephen Bailey

Books mentioned:
– The Precipice
– Fundamentals of Data Engineering by Joe Reis

September 25, 2023 05:00 PM UTC


Real Python

Python 3.12 Preview: Subinterpreters

With the upcoming release of Python 3.12 this fall and Python 3.13 following a year later, you might have heard about how Python subinterpreters are changing. The upcoming changes will first give extension module developers better control of the GIL and parallelism, potentially speeding up your programs.

The following release may take this even further and allow you to use subinterpreters from Python directly, making it easier for you to incorporate them into your programs!

In this tutorial, you’ll:

  • Get a high-level view of what Python subinterpreters are
  • Learn how changes to CPython’s global state in Python 3.12 may change things for you
  • Get a glimpse of what changes might be coming for subinterpreters in Python 3.13

To get the most out of this tutorial, you should be familiar with the basics of Python, as well as with the global interpreter lock (GIL) and concurrency. You’ll encounter some C code, but only a little.

You’ll find many other new features, improvements, and optimizations in Python 3.12. Go ahead and check out what’s new in the changelog for more details on these and other features. It’s definitely worth your time to explore what’s coming!


What Are Python Subinterpreters?

Before you start thinking about subinterpreters, recall that an interpreter is a program that executes a script directly in a high-level language instead of translating it to machine code. In the case of most Python users, CPython is the interpreter you’re running. A subinterpreter is a copy of the full CPython interpreter that can run independently from the main interpreter that started alongside your program.

Note: The terms interpreter and subinterpreter get mixed together fairly commonly. For the purposes of this tutorial, you can view the main interpreter as the one that runs when your program starts. All other interpreters that start after that point are considered subinterpreters. Other than a few minor details, subinterpreters are the same type of object as the main interpreter.

Most of the state of the subinterpreter is separate from the main interpreter. This includes elements like the global scope name table and the modules that get imported. However, this doesn’t include some of the items that the operating system provides to the process, like file handles and memory.

This is different from threading or other forms of concurrency in that threads can share the same context and global state while allowing a separate flow of execution. For example, if you start a new thread, then it still has the same global scope name table.
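
For example, this standard threading sketch (nothing subinterpreter-specific) shows several threads mutating one module-level name:

import threading

counter = 0  # module-level state shared by every thread in this interpreter

def bump() -> None:
    global counter
    counter += 1  # each thread sees and updates the same global name table

threads = [threading.Thread(target=bump) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 5: all threads shared the same `counter`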

A subinterpreter, however, can be described as a collection of cooperating threads that have some shared state. These threads will have the same set of imports, independent of other subinterpreters. Spawning new threads in a subinterpreter adds new threads to this collection, which won’t be visible from other interpreters.

Some of the upcoming changes that you’ll see will also allow subinterpreters to improve parallelism in Python programs.

Subinterpreters have been a part of the Python language since version 1.5, but they’ve only been available as part of the C-API, not from Python. But there are large changes coming that will make them more useful and interesting for everyday Python users.
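
If you want to experiment before a Python-level API lands, CPython ships a private, undocumented wrapper around the C-API. A minimal sketch (assuming CPython 3.12 and the private _xxsubinterpreters module, which may change or disappear without notice):

# WARNING: _xxsubinterpreters is a private CPython module, not a public API.
import _xxsubinterpreters as interpreters

# Create a fresh subinterpreter with its own imports and global scope.
interp_id = interpreters.create()

# Run source code inside it; it shares no globals with the caller.
interpreters.run_string(interp_id, "print('hello from a subinterpreter')")

# Clean up when finished.
interpreters.destroy(interp_id)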

What’s Changing in Python 3.12 (PEP 684)?

Now that you know what a Python subinterpreter is, you’ll take a look at what’s changing in the upcoming releases of CPython.

Most of the subinterpreter changes are described in two proposals, PEP 684 and PEP 554. Only PEP 684 will make it into the 3.12 release. PEP 554 is scheduled for the 3.13 release but hasn’t been officially approved yet.

Changes to the Global State and the GIL

The main focus of PEP 684 is refactoring the internals of the CPython source code so that each subinterpreter can have its own global interpreter lock (GIL). The GIL is a lock, or mutex, which allows only one thread to have control of the Python interpreter. Until this PEP, there was a single GIL for all subinterpreters, which meant that no matter how many subinterpreters you created, only one could run at a single time.

Moving the GIL so that each subinterpreter has a separate lock is a great idea. So, why hasn’t it been done already? The issue is that the GIL is preventing multiple threads from accessing some of the global state of CPython simultaneously, so it’s protecting your program from bugs that race conditions could cause.

The core developers needed to move much of the previously global state into per-interpreter storage, meaning each interpreter has its own independent version:

[Diagram: Global state surrounds two interpreters, each with its own per-interpreter state and set of threads.]

Read the full article at https://realpython.com/python312-subinterpreters/ »



September 25, 2023 02:00 PM UTC


Mike Driscoll

PyDev of the Week: Claudia Ng

This week we welcome Claudia Ng as the PyDev of the Week! Claudia is an author / contributor at Real Python! If you’d like to see what else Claudia has been up to, you should check out her personal website.

Let’s spend a few moments getting to know Claudia better!

Can you tell us a little about yourself (hobbies, education, etc):

I’m a data scientist, and I’ve spent the past five years working in fraud and credit risk in the fintech (financial technology) space. I have a Master’s in Public Policy from Harvard University and a Bachelor’s in International Business (Finance) with a minor in Spanish from Northeastern University.

In 2018, I was working at a fintech called Tala, where I managed the new customer portfolio for their Mexico market. It was an incredible journey where we scaled the customer base by over 500x in only two years! Through this process, I saw the power of automating lending decisions enabled by machine learning. I was fascinated by how alternative data could be used to predict customers’ repayment behaviors and fraud risk, unlocking the ability to lend to individuals with little or no credit history.

I’m an impact-driven person, and seeing the power of applied ML inspired me to set my mind on pivoting into data science by taking on ML-related projects at work, doing online courses and side projects, and eventually moving on to the data science team.

I love what I do and outside of work, my hobbies include all kinds of water sports, bouldering and sudoku.

Why did you start using Python?

I first started using Python in 2019. I was initially using R for analyses since I had learned to use it in grad school, but the Data Science team used Python, so I started learning and picking it up. I found it to be more robust and there are many good third-party packages to support my work. Python is definitely my preferred language now!

What other programming languages do you know and which is your favorite?

I use Python and SQL daily on the job. I am a huge language nerd and can speak 9 human languages if that counts.

What projects are you working on now?

I am working on my second tutorial for Real Python on type hints for multiple return types in Python. Stay tuned for more when it comes out!

Which Python libraries are your favorite (core or 3rd party)?

I’m a Data Scientist, so I love pretty graphs and visuals. They’re a crucial element of telling a good data story and helping with better decision-making. I would say that my favorite Python library is plotly. It’s a library for making interactive plots, and I love how versatile it is.

How did you get started writing articles for Real Python?

When I pivoted from an analyst role into data science back in 2019, I started writing because I wanted to share my learnings and hopefully inspire others without a STEM degree to break into data science or engineering. I was writing blog posts on Medium for several publications, including Towards Data Science, Towards AI, and Analytics Vidhya, about different topics related to machine learning, feature engineering, and data visualization.

In early 2023, I saw that Real Python was looking for technical writers and applied. I was a subscriber and learned so much about programming from Real Python’s tutorials and courses, it feels like a dream to be writing for this publication!

What excites you most in the data science world right now?

I am excited about the rise of autoML packages that can automate some of the more tedious parts of ML modeling, like data cleaning, model selection and hyperparameter optimization. This would cut down the time spent during the model development cycle, allowing data scientists to iterate faster.

Is there anything else you’d like to say?

If you would like to check out my work, please visit ds-claudia.com to see past blog posts. You can also subscribe for free to receive emails when I publish new blog posts – no spam I promise!

Thanks so much for doing the interview, Claudia!

The post PyDev of the Week: Claudia Ng appeared first on Mouse Vs Python.

September 25, 2023 12:34 PM UTC


Erik Marsja

Seaborn Confusion Matrix: How to Plot and Visualize in Python

The post Seaborn Confusion Matrix: How to Plot and Visualize in Python appeared first on Erik Marsja.

In this Python tutorial, we will learn how to plot a confusion matrix using Seaborn. Confusion matrices are a fundamental tool in data science and hearing science. They provide a clear and concise way to evaluate the performance of classification models. In this post, we will explore how to plot confusion matrices in Python.

In data science, confusion matrices are commonly used to assess the accuracy of machine learning models. They allow us to understand how well our model correctly classifies different categories. For example, a confusion matrix can help us determine how many emails were correctly classified as spam in a spam email classification model.

How to Plot a Confusion Matrix with Seaborn

In hearing science, confusion matrices are used to evaluate the performance of hearing tests. These tests involve presenting different sounds to individuals and assessing their ability to identify them correctly. A confusion matrix can provide valuable insights into the accuracy of these tests and help researchers make improvements.

[Image: Example plot of a confusion matrix created with Seaborn.]

Understanding how to interpret and visualize confusion matrices is essential for anyone working with classification models or conducting hearing tests. In the following sections, we will dive deeper into plotting and interpreting confusion matrices using the Seaborn library in Python.

Using Seaborn, a powerful data visualization library in Python, we can create visually appealing and informative confusion matrices. We will learn how to prepare the data, create the matrix, and interpret the results. Whether you are a data scientist or a hearing researcher, this guide will equip you with the skills to analyze and visualize confusion matrices using Seaborn effectively. So, let us get started!


Outline

The structure of the post is as follows. First, we will begin by discussing prerequisites to ensure you have the necessary knowledge and tools for understanding and working with confusion matrices.

Following that, we will delve into the concept of the confusion matrix, highlighting its significance in evaluating classification model performance. In the “Visualizing a Confusion Matrix” section, we will explore various methods for representing this critical analysis tool, shedding light on the visual aspects.

The heart of the post lies in “How to Plot a Confusion Matrix in Python,” where we will guide you through the process step by step. This is where we will focus on preparing the data for the analysis. Under “Creating a Seaborn Confusion Matrix,” we will outline four key steps, from importing the necessary libraries to plotting the matrix with Seaborn, ensuring a comprehensive understanding of the entire process.

Once the confusion matrix is generated, “Interpreting the Confusion Matrix” will guide you in extracting valuable insights, allowing you to make informed decisions based on model performance.

Before concluding the post, we also look at how to modify the confusion matrix we created using Seaborn. For instance, we explore techniques to enhance the visualization, such as adding percentages instead of raw values to the plot. This additional step provides a deeper understanding of model performance and helps you communicate results more effectively in data science applications.

Prerequisites

Before we explore how to create confusion matrices with Seaborn, there are a few prerequisites to consider. First, a foundational understanding of Python is required: proficiency with the language and a grasp of basic programming concepts. If you are new to Python, familiarize yourself with its syntax and fundamental operations.

Moreover, prior knowledge of classification modeling is, of course, assumed: you need to know how to get the data needed to generate the confusion matrix.

You must install several Python packages to practice generating and visualizing confusion matrices. Ensure you have Pandas for data manipulation, Seaborn for data visualization, and scikit-learn for machine learning tools. You can install these packages using Python’s package manager, pip. Sometimes, it might be necessary to upgrade pip to the latest version. Installing packages is straightforward; for example, you can install Seaborn using the command pip install seaborn.
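
For example, you can install all three packages in one go:

$ python -m pip install pandas seaborn scikit-learn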

Confusion Matrix

A confusion matrix is a performance evaluation tool used in machine learning. It is a table that allows us to visualize the performance of a classification model by comparing the predicted and actual values of a dataset. The matrix is divided into four quadrants: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).

Understanding confusion matrices is crucial for evaluating model performance because they provide valuable insights into the accuracy and effectiveness of a classification model. By analyzing the values in each quadrant, we can determine how well the model performs in correctly identifying positive and negative instances.

The true positive (TP) quadrant represents the cases where the model correctly predicted the positive class. The true negative (TN) quadrant represents the cases where the model correctly predicted the negative class. The false positive (FP) quadrant represents the cases where the model incorrectly predicted the positive class. The false negative (FN) quadrant represents the cases where the model incorrectly predicted the negative class.

We can calculate performance metrics such as accuracy, precision, recall, and F1 score by analyzing these values. These metrics help us assess the model’s performance and make informed decisions about its effectiveness.
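
As a quick reference, here is how those metrics derive from the four counts (a small sketch with made-up counts, not data from this post):

# Confusion-matrix counts (made-up values for illustration).
tp, tn, fp, fn = 40, 45, 5, 10

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.85
precision = tp / (tp + fp)                          # ~0.89
recall = tp / (tp + fn)                             # 0.80
f1 = 2 * precision * recall / (precision + recall)  # ~0.84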

The following section will explore different methods to visualize confusion matrices and discuss the importance of choosing the right visualization technique.

Visualizing a Confusion Matrix

When it comes to visualizing a confusion matrix, several methods are available. Each technique offers its advantages and can provide valuable insights into the performance of a classification model.

One common approach is to use heatmaps, which use color gradients to represent the values in the matrix. Heatmaps allow us to quickly identify patterns and trends in the data, making it easier to interpret the model’s performance. Another method is to use bar charts, where the height of the bars represents the values in the matrix. Bar charts are useful for comparing the different categories and understanding the distribution of predictions.

However, Seaborn is one of Python’s most popular and powerful libraries for visualizing confusion matrices. Seaborn offers various functions and customization options, making creating visually appealing and informative plots easy. It provides a high-level interface to create heatmaps, bar charts, and other visualizations.

Choosing the right visualization technique is crucial because it can greatly impact the understanding and interpretation of the confusion matrix. The chosen visualization should convey the information and insights we want to communicate. Seaborn’s flexibility and versatility make it an excellent choice for plotting confusion matrices, allowing us to create clear and intuitive visualizations that enhance our understanding of the model’s performance.

In the next section, we will plot a confusion matrix using Seaborn in Python. We will explore the necessary steps and demonstrate how to create visually appealing and informative plots that help us analyze and interpret the performance of our classification model.

How to Plot a Confusion Matrix in Python

When it comes to plotting a confusion matrix in Python, there are several libraries available that offer this capability.

Steps to Plot a Confusion Matrix in Python:

Generating a confusion matrix in Python using any package typically involves the following steps (a short sketch follows the list):

  1. Import the Necessary Libraries: Begin by importing the relevant Python libraries, such as the package for generating confusion matrices and other dependencies.
  2. Prepare True and Predicted Labels: Collect the true labels (ground truth) and the predicted labels from your classification model or analysis.
  3. Compute the Confusion Matrix: Utilize the functions or methods the chosen package provides to compute the confusion matrix. This matrix will tabulate the counts of true positives, true negatives, false positives, and false negatives.
  4. Visualize or Analyze the Matrix: Optionally, you can visualize the confusion matrix using various visualization tools or analyze its values to assess the performance of your classification model.
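
Putting the four steps together, a minimal sketch might look like this (using scikit-learn to compute the matrix and Seaborn's heatmap to draw it; the labels and values are illustrative):

import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Step 2: true labels and model predictions (illustrative values).
labels = ["Hearing Loss", "No Hearing Loss"]
y_true = ["Hearing Loss", "No Hearing Loss", "No Hearing Loss", "Hearing Loss"]
y_pred = ["Hearing Loss", "No Hearing Loss", "Hearing Loss", "Hearing Loss"]

# Step 3: compute the confusion matrix with a fixed label order.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Step 4: visualize it as an annotated heatmap.
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=labels, yticklabels=labels)
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()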

Seaborn

This post will use Seaborn, one of the most popular and powerful libraries for this task. Seaborn provides a high-level interface to create visually appealing and informative plots, including confusion matrices. It offers various functions and customization options, making it easy to generate clear and intuitive visualizations.

One advantage of using Seaborn for plotting confusion matrices is its flexibility: it lets you create heatmaps, bar charts, and other visualizations, so you can choose the most suitable representation for your data. Another advantage is its versatility: customization options such as color palettes and annotations let you enhance the visual appearance of your confusion matrix and highlight important information. Its powerful capabilities and user-friendly interface make it an excellent choice for plotting confusion matrices in Python.

The following sections will dive into the steps needed to prepare your data for generating a confusion matrix with Seaborn. We will also explore data preprocessing techniques that may be required to ensure accurate and meaningful results. First, however, we will generate a synthetic dataset that we can use to practice generating and plotting confusion matrices.

Synthetic Data

Here, we generate a synthetic dataset that can be used to practice plotting a confusion matrix with Seaborn:

import pandas as pd
import random

# Define the number of test cases
num_cases = 100

# Create a list of hearing test results (Categorical: Hearing Loss, No Hearing Loss)
hearing_results = ['Hearing Loss'] * 20 + ['No Hearing Loss'] * 70

# Introduce noise (e.g., due to external factors)
noisy_results = [random.choice(hearing_results) for _ in range(10)]

# Combine the results
results = hearing_results + noisy_results

# Create a dataframe:
data = pd.DataFrame({'HearingTestResult': results})

In the code chunk above, we first imported the Pandas library, which is instrumental for data manipulation and analysis in Python. We also utilized the ‘random’ module for generating random data.

To begin, we defined the variable num_cases to represent the total number of test cases, which in this context amounts to 100 observations. Next, we set the stage for simulating a hearing test dataset. We created hearing_results, a list containing the categories Hearing Loss and No Hearing Loss. This categorical variable represents the results of a hypothetical hearing test, where Hearing Loss indicates an impaired hearing condition and No Hearing Loss signifies normal hearing.

Incorporating an element of real-world variability, we introduced noisy_results. This step involves generating ten observations with random selections from the hearing_results list, mimicking external factors that may affect hearing test outcomes. The purpose is to simulate real-world variability and add diversity to the dataset.

Combining the hearing_results and noisy_results, we created the results list, representing the complete dataset. Finally, we used Pandas to create a dataframe with a dictionary as input. We named it data with a column labeled HearingTestResult, which encapsulates the simulated hearing test data.

Example data.

Preparing Data

Ensuring data is adequately prepared before generating a confusion matrix using Seaborn involves several necessary steps. First, we may need to gather the data we want to evaluate using the confusion matrix. This data should consist of the true and predicted labels from your classification model. Ensure the labels are correctly assigned and aligned with the corresponding data points.

Next, we may need to preprocess the data. Data preprocessing techniques can improve the quality and reliability of your results. Commonly, we use techniques such as handling missing values, scaling or normalizing the data, and encoding categorical variables. We will not go through all these steps to create a Seaborn confusion matrix plot.

For example, we can remove the rows or columns with missing values or impute the missing values using techniques such as mean imputation or regression imputation. Scaling the data can be important to ensure all features are on a similar scale. This can prevent certain features from dominating the analysis and affecting the performance of the confusion matrix.
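To make these options concrete, here is a minimal sketch of two common strategies in Pandas (assuming a DataFrame named data, as created above; our synthetic dataset has no missing values, so this is purely illustrative):

# Remove rows that contain missing values...
data_without_na = data.dropna()

# ...or impute missing values in numeric columns with the column mean
data_imputed = data.fillna(data.mean(numeric_only=True))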

Encoding categorical variables is necessary if your data includes non-numeric variables. This process can involve converting categorical variables into numerical representations. We can also, as in the example below, recode the categorical variables to True and False. See How to use Pandas get_dummies to Create Dummy Variables in Python for more information about dummy coding.

By following these steps and applying appropriate data preprocessing techniques, you can ensure your data is ready for generating a confusion matrix with Seaborn. The following section provides step-by-step instructions on how to create a Seaborn confusion matrix, along with sample code and visuals to illustrate the process.

Creating a Seaborn Confusion Matrix

To generate a confusion matrix using Seaborn, follow these step-by-step instructions. First, import the necessary libraries, including Seaborn and Matplotlib. Next, prepare your data by ensuring you have the true and predicted labels from your classification model.

Step 1: Import the Libraries

Here, we import the libraries that we will use to plot a confusion matrix with Seaborn.

import seaborn as sns 
import matplotlib.pyplot as plt 
from sklearn.metrics import confusion_matrix

Step 2: Prepare and Preprocess Data

The next step is to prepare and preprocess the data. Note that we do not have any missing values in the example data. However, we need to recode the categorical variable to True and False.

data['HearingTestResult'] = data['HearingTestResult'].replace({'Hearing Loss': True, 
                                                               'No Hearing Loss': False})

In the Python code above, we transformed a categorical variable, HearingTestResult, into a binary format for further analysis. We used the Pandas library’s replace method to map the categories to boolean values. Specifically, we mapped ‘Hearing Loss’ to True, indicating the presence of hearing loss, and ‘No Hearing Loss’ to False, indicating the absence of hearing loss.
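Note that our synthetic dataset only contains the true labels so far, while the confusion matrix in the next step also expects predicted labels. Here is a minimal sketch that simulates a hypothetical PredictedResult column (illustrative only; in a real project, these values would come from your classifier’s predictions):

import random

# Simulate predictions that agree with the true label 90% of the time
# (illustrative only; use your model's actual predictions in practice)
data['PredictedResult'] = [
    result if random.random() < 0.9 else not result
    for result in data['HearingTestResult']
]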

Step 3: Create a Confusion Matrix

Once the data is ready, we can create the confusion matrix using the confusion_matrix() function from the Scikit-learn library. This function takes the true and predicted labels as input and returns a matrix that represents the performance of our classification model.

conf_matrix = confusion_matrix(data['HearingTestResult'], 
                               data['PredictedResult'])

In the code snippet above, we computed a confusion matrix using the confusion_matrix function from scikit-learn. We provided the true hearing test results from the dataset and the predicted results to evaluate the performance of a classification model.

Confusion matrix created in Python.

Step 4: Plot the Confusion Matrix with Seaborn

To plot a confusion matrix with Seaborn, we can use the following code:

# Plot the confusion matrix using Seaborn
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=['Predicted Negative', 'Predicted Positive'],
            yticklabels=['Actual Negative', 'Actual Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

In the code chunk above, we created a visual representation of the confusion matrix using the Seaborn library. We defined the plot’s appearance to provide an insightful view of the model’s performance. The sns.heatmap function generates a heatmap with annotations depicting the confusion matrix values. We specified formatting options (annot and fmt) to display the counts, and we chose the Blues color palette for visual clarity. Additionally, we customized the tick labels with xticklabels and yticklabels, denoting the predicted and actual classes, respectively. The xlabel, ylabel, and title functions helped us label the plot appropriately. This visualization makes the model’s classification accuracy accessible and easy for data analysts and stakeholders to interpret. Here is the resulting plot:

Confusion matrix plot created with the Seaborn library.

Interpreting the Confusion Matrix

Once you have generated a Seaborn confusion matrix for your classification model, it is important to understand how to interpret the results presented in the matrix. The confusion matrix provides valuable information about your model’s performance and can help you evaluate its accuracy. The confusion matrix consists of four main components: true positives, false positives, true negatives, and false negatives. These components represent the different outcomes of your classification model.

True positives (TP) are the cases where the model correctly predicted the positive class. In other words, these are the instances where the model correctly identified the presence of a certain condition or event. False positives (FP) occur when the model incorrectly predicts the positive class. These are the instances where the model falsely identifies the presence of a certain condition or event.

True negatives (TN) represent the cases where the model correctly predicts the negative class. These are the instances where the model correctly identifies the absence of a certain condition or event. False negatives (FN) occur when the model incorrectly predicts the negative class. These are the instances where the model falsely identifies the absence of a certain condition or event.

By analyzing these components, you can gain insights into the performance of your classification model. For example, a large number of false positives may indicate that your model frequently flags conditions or events that are not actually present. On the other hand, a large number of false negatives may suggest that your model fails to identify conditions or events that are present.

Understanding the meaning of true positives, false positives, and false negatives is crucial for evaluating the effectiveness of your classification model and making informed decisions based on its predictions. Before concluding the post, we will also examine how we can modify the Seaborn plot.

Modifying the Seaborn Confusion Matrix Plot

We can also plot the confusion matrix with percentages instead of raw values using Seaborn:

# Calculate percentages for each cell in the confusion matrix
percentage_matrix = conf_matrix / conf_matrix.sum()

# Plot the confusion matrix using Seaborn with percentages
plt.figure(figsize=(8, 6))
sns.heatmap(percentage_matrix, annot=True, fmt='.2%', cmap='Blues', cbar=False,
            xticklabels=['Predicted Negative', 'Predicted Positive'],
            yticklabels=['Actual Negative', 'Actual Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix (Percentages)')
plt.show()

In the code snippet above, we changed the code a bit. First, we calculated the percentages and stored them in the variable percentage_matrix by dividing the raw confusion matrix (conf_matrix) by the sum of all its elements.

Confusion matrix plot with percentages.

After calculating the percentages, we modified the fmt parameter within the Seaborn heatmap function. Specifically, we set fmt to ‘.2%’ to format the annotations as percentages, ensuring that the values displayed in the matrix represent proportions of the total number of observations in the dataset. This change enhances the interpretability of the confusion matrix by expressing classification performance relative to the dataset’s size.
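If you would rather show percentages within each actual class instead (row-normalized percentages), a small change to the calculation is enough; the plotting code stays the same:

# Normalize each row so the percentages sum to 100% per actual class
row_percentages = conf_matrix / conf_matrix.sum(axis=1, keepdims=True)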

Conclusion

In conclusion, this tutorial has provided a comprehensive overview of how to plot and visualize a confusion matrix using Seaborn in Python. We have explored the concept of confusion matrices and their significance in various industries, such as speech recognition systems in hearing science and cognitive psychology experiments. By analyzing confusion matrices, we can gain valuable insights into the performance of systems and the accuracy of participants’ responses.

Understanding and visualizing a confusion matrix with Seaborn is crucial for data analysis projects. It allows us to assess a classification model’s performance and identify areas for improvement. Visualizing the confusion matrix enables us to quickly interpret the results and make informed decisions, alongside other measures such as accuracy, precision, recall, and F1 score.

We encourage readers to apply their knowledge of confusion matrices and Seaborn in their data analysis projects. By implementing these techniques, they can enhance their understanding of classification models and improve the accuracy of their predictions.

I hope this article has helped demystify confusion matrices and provide practical guidance on plotting and visualizing them using Seaborn. I invite readers to share this post on social media and engage in discussions about their progress and experiences with confusion matrices in their data analysis endeavors.

Additional Resources

In addition to the information provided in this data visualization tutorial, several other resources and tutorials can further enhance your understanding of plotting and visualizing confusion matrices using Seaborn in Python. These resources can provide additional insights, tips, and techniques to help you improve your data analysis projects.

Here are some recommended resources:

  1. Seaborn Documentation: The official documentation for Seaborn is a valuable resource for understanding the various functionalities and options available for creating visualizations, including confusion matrices. It provides detailed explanations, examples, and code snippets to help you get started.
  2. Stack Overflow: Stack Overflow is a popular online community where programmers and data analysts share their knowledge and expertise. You can find numerous questions and answers related to plotting and visualizing confusion matrices with Seaborn. This platform can be a great source of solutions to specific issues or challenges.

By exploring these additional resources, you can expand your knowledge and skills in plotting and visualizing confusion matrices using Seaborn. These materials will give you a deeper understanding of the subject and help you apply these techniques effectively in your data analysis projects.


The post Seaborn Confusion Matrix: How to Plot and Visualize in Python appeared first on Erik Marsja.

September 25, 2023 11:08 AM UTC


Kushal Das

Documentation of Puppet code using sphinx

Sphinx is the primary documentation tooling for most of my projects. I use it for the Linux command line book too. Last Friday, while in a chat with Leif about documenting all of our puppet codebase, I thought of mixing these two.

Now, puppet already has a tool to generate documentation from its code, called puppet strings. We can use that to generate markdown output and then use it in Sphinx for the final HTML output.

I am using https://github.com/simp/pupmod-simp-simplib as the example puppet code as it comes with a good amount of reference documentation.

Install puppet strings and the dependencies

$ gem install yard puppet-strings

Then clone the puppet codebase.

$ git clone https://github.com/simp/pupmod-simp-simplib

Finally, generate the initial markdown output.

$ puppet strings generate --format markdown --out simplib.md
Files                     161
Modules                   3 (3 undocumented)
Classes                   0 (0 undocumented)
Constants                 0 (0 undocumented)
Attributes                0 (0 undocumented)
Methods                   5 (0 undocumented)
Puppet Tasks              0 (0 undocumented)
Puppet Types              7 (0 undocumented)
Puppet Providers          8 (0 undocumented)
Puppet Plans              0 (0 undocumented)
Puppet Classes            2 (0 undocumented)
Puppet Data Type Aliases  73 (0 undocumented)
Puppet Defined Types      1 (0 undocumented)
Puppet Data Types         0 (0 undocumented)
Puppet Functions          68 (0 undocumented)
 98.20% documented

sphinx setup

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install sphinx myst_parser

After that create a standard sphinx project or use your existing one, and update the conf.py with the following.

extensions = ["myst_parser"]
source_suffix = {
    '.rst': 'restructuredtext',
    '.txt': 'markdown',
    '.md': 'markdown',
}

Then copy over the generated markdown from the previous step and use the sed command to update the title of the document to something better.

$ sed -i '1 s/^.*$/SIMPLIB Documentation/' simplib.md

Don't forget to add the simplib.md file to your index.rst and then build the HTML documentation.

$ make html
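For reference, the index.rst entry can be a minimal toctree like this (a sketch; adjust the depth and entries to your own project layout):

.. toctree::
   :maxdepth: 2

   simplib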

We can still improve the markdown generated by the puppet strings command; I still have to figure out a simpler way to do that part.

Example output

September 25, 2023 09:23 AM UTC


eGenix.com

Python Meeting DĂŒsseldorf - 2023-09-27

The following announces a regional user group meeting in DĂŒsseldorf, Germany (originally published in German).

Announcement

The next Python Meeting DĂŒsseldorf will take place at:

27.09.2023, 18:00 (6 pm)
Room 1, 2nd floor, BĂŒrgerhaus Stadtteilzentrum Bilk
DĂŒsseldorfer Arcaden, Bachstr. 145, 40217 DĂŒsseldorf


Program

Talks registered so far


Additional talks can still be registered. If you are interested, please contact info@pyddf.de.

Start Time and Location

We will meet at 18:00 (6 pm) at the BĂŒrgerhaus in the DĂŒsseldorfer Arcaden.

The BĂŒrgerhaus shares its entrance with the swimming pool and is located next to the underground parking garage entrance of the DĂŒsseldorfer Arcaden.

Above the entrance there is a large "Schwimm’ in Bilk" logo. Once through the door, turn directly left to the two elevators, then ride up to the 2nd floor. The entrance to Room 1 is directly on the left as you step out of the elevator.

>>> Entrance in Google Street View

⚠ Important: Please only register if you are absolutely sure that you will attend. Given the limited number of seats, we have no sympathy for short-notice cancellations or no-shows.

Introduction

The Python Meeting DĂŒsseldorf is a regular event in DĂŒsseldorf aimed at Python enthusiasts from the region.

Our PyDDF YouTube channel offers a good overview of the talks; we publish videos of the talks there after each meeting.

The meeting is organized by eGenix.com GmbH, Langenfeld, in cooperation with Clark Consulting & Research, DĂŒsseldorf:

Format

The Python Meeting DĂŒsseldorf uses a mix of (lightning) talks and open discussion.

Talks can be registered in advance or brought up spontaneously during the meeting. A projector with HDMI and Full HD resolution is available.

To register a (lightning) talk, simply send an informal email to info@pyddf.de

Cost Contribution

The Python Meeting DĂŒsseldorf is organized by Python users for Python users.

Since the meeting room, projector, internet, and drinks all incur costs, we ask participants for a contribution of EUR 10.00 incl. 19% VAT. Pupils and students pay EUR 5.00 incl. 19% VAT.

We kindly ask all participants to bring the amount in cash.

Registration

Since we can only accommodate 25 people in the rented room, we ask that you register in advance.

Please register for the meeting via Meetup

More Information

You can find more information on the meeting's website:

              https://pyddf.de/

Have fun!

Marc-Andre Lemburg, eGenix.com

September 25, 2023 09:00 AM UTC

September 22, 2023


Stack Abuse

How to Check for NaN Values in Python

Introduction

Today we're going to explore how to check for NaN (Not a Number) values in Python. NaN values can be quite a nuisance when processing data, and knowing how to identify them can save you from a lot of potential headaches down the road.

Why Checking for NaN Values is Important

NaN values can be a real pain, especially when you're dealing with numerical computations or data analysis. They can skew your results, cause errors, and generally make your life as a developer more difficult. For instance, if you're calculating the average of a list of numbers and a NaN value sneaks in, your result will also be NaN, regardless of the other numbers. It's almost as if it "poisons" the result - a single NaN can throw everything off.

Note: NaN stands for 'Not a Number'. It is a special floating-point value; it remains a float, and attempting to convert it to an integer raises a ValueError.

NaN Values in Mathematical Operations

When performing mathematical operations, NaN values can cause lots of issues. They can lead to unexpected results or even errors. Python's math and numpy libraries typically propagate NaN values in mathematical operations, which can lead to entire computations being invalidated.

For example, in numpy, any arithmetic operation involving a NaN value will result in NaN:

import numpy as np

a = np.array([1, 2, np.nan])
print(a.sum())

Output:

nan

In such cases, you might want to consider using functions that can handle NaN values appropriately. Numpy provides nansum(), nanmean(), and others, which ignore NaN values:

print(np.nansum(a))

Output:

3.0

Pandas, on the other hand, generally excludes NaN values in its mathematical operations by default.
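For example, summing a pandas Series that contains NaN simply skips the NaN, thanks to the default skipna=True:

import pandas as pd
import numpy as np

s = pd.Series([1, 2, np.nan])
print(s.sum())

Output:

3.0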

How to Check for NaN Values in Python

There are many ways to check for NaN values in Python, and we'll cover some of the most common methods used in different libraries. Let's start with the built-in math library.

Using the math.isnan() Function

The math.isnan() function is an easy way to check if a value is NaN. This function returns True if the value is NaN and False otherwise. Here's a simple example:

import math

value = float('nan')
print(math.isnan(value))  # True

value = 5
print(math.isnan(value))  # False

As you can see, when we pass a NaN value to the math.isnan() function, it returns True. When we pass a non-NaN value, it returns False.

The benefit of using this particular function is that the math module is built-in to Python, so no third party packages need to be installed.

Using the numpy.isnan() Function

If you're working with arrays or matrices, the numpy.isnan() function can be a nice tool as well. It operates element-wise on an array and returns a Boolean array of the same shape. Here's an example:

import numpy as np

array = np.array([1, np.nan, 3, np.nan])
print(np.isnan(array))
# array([False,  True, False,  True])

In this example, we have an array with two NaN values. When we use numpy.isnan(), it returns a Boolean array where True corresponds to the positions of NaN values in the original array.

You'd want to use this method when you're already using NumPy in your code and need a function that works well with other NumPy structures, like np.array.

Using the pandas.isnull() Function

Pandas provides an easy-to-use function, isnull(), to check for NaN values in the DataFrame or Series. Let's take a look at an example:

import pandas as pd
import numpy as np

# Create a DataFrame with NaN values
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [5, np.nan, np.nan], 'C': [1, 2, 3]})

print(df.isnull())

The output will be a DataFrame that mirrors the original, but with True for NaN values and False for non-NaN values:

       A      B      C
0  False  False  False
1  False   True  False
2   True   True  False

One thing you'll notice if you test this method out is that it also returns True for None values, hence the null in the method name. It returns True for both NaN and None.
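You can verify this behavior directly on scalar values:

import pandas as pd

print(pd.isnull(float('nan')))  # True
print(pd.isnull(None))          # True
print(pd.isnull(5))             # False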

Comparing the Different Methods

Each method we've discussed — math.isnan(), numpy.isnan(), and pandas.isnull() — has its own strengths and use-cases. The math.isnan() function is a straightforward way to check if a number is NaN, but it only works on individual numbers.

On the other hand, numpy.isnan() operates element-wise on arrays, making it a good choice for checking NaN values in numpy arrays.

Finally, pandas.isnull() is perfect for checking NaN values in pandas Series or DataFrame objects. It's worth mentioning that pandas.isnull() also considers None as NaN, which can be very useful when dealing with real-world data.

Conclusion

Checking for NaN values is an important step in data preprocessing. We've explored three methods — math.isnan(), numpy.isnan(), and pandas.isnull() — each with its own strengths, depending on the type of data you're working with.

We've also discussed the impact of NaN values on mathematical operations and how to handle them using numpy and pandas functions.

September 22, 2023 08:12 PM UTC

How to Position Legend Outside the Plot in Matplotlib

Introduction

In data visualization, we often create complex graphs that need legends so the reader can interpret them. But what if those legends get in the way of the actual data that they need to see? In this Byte, we'll see how you can move the legend so that it's outside of the plot in Matplotlib.

Legends in Matplotlib

In Matplotlib, legends provide a mapping of labels to the elements of the plot. These can be very important to help the reader understand the visualization they're looking at. Without the legend, you might not know which line represented which data! Here's a basic example of how legends work in Matplotlib:

import matplotlib.pyplot as plt

# Create a simple line plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], label='Sample Data')

# Add a legend
plt.legend()

# Display the plot
plt.show()

This will produce a plot with a legend located in the upper-left corner inside the plot. The legend contains the label 'Sample Data' that we specified in the plt.plot() function.

plot with legend

Why Position the Legend Outside the Plot?

While having the legend inside the plot is the default setting in Matplotlib, it's not always the best choice. Legends can obscure important details of the plot, especially when dealing with complex data visualizations. By positioning the legend outside the plot, we can be sure that all data points are clearly visible, making our plots easier to interpret.

How to Position the Legend Outside the Plot in Matplotlib

Positioning the legend outside the plot in Matplotlib is fairly easy to do. We simply need to use the bbox_to_anchor and loc parameters of the legend() function. Here's how to do it:

import matplotlib.pyplot as plt

# Create a simple line plot
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], label='Sample Data')

# Add a legend outside the plot
plt.legend(bbox_to_anchor=(1, 1.10), loc='upper right')

# Display the plot
plt.show()

In this example, bbox_to_anchor is a tuple specifying the coordinates of the legend's anchor point, and loc indicates the location of the anchor point with respect to the legend's bounding box. The coordinates are in axes fraction (i.e., from 0 to 1) relative to the size of the plot. So, (1, 1.10) positions the anchor point just outside the top right corner of the plot.

legend outside the plot

Positioning this legend is a bit more of an art than a science, so you may need to play around with the values a bit to see what works.

Common Issues and Solutions

One common issue is the legend getting cut off when you save the figure using plt.savefig(). This happens because plt.savefig() doesn't automatically adjust the figure size to accommodate the legend. To fix this, you can use the bbox_inches parameter and set it to 'tight' like so:

plt.savefig('my_plot.png', bbox_inches='tight')

Another common issue is the legend overlapping with the plot when positioned outside. This can be fixed by adjusting the plot size or the legend size to ensure they fit together nicely. Again, this is something you'll likely have to test with many different values to find the right configuration and positioning.

Note: Adjusting the plot size can be done using plt.subplots_adjust(), while the legend size can be adjusted using legend.get_frame().
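As a starting point, here's a small sketch combining these two adjustments (the exact values are something you'd tune for your own figure):

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16], label='Sample Data')

# Shrink the axes so there's room to the right for the legend
plt.subplots_adjust(right=0.75)

# Anchor the legend just outside the right edge of the axes
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left')

plt.show()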

Conclusion

And there you have it! In this Byte, we showed how you can position the legend outside the plot in Matplotlib and explained some common issues. We've also talked a bit about some use-cases where you'll need to position the legend outside the plot.

September 22, 2023 07:14 PM UTC

Importing Python Modules

Introduction

Python allows us to create just about anything, from simple scripts to complex machine learning models. But to work on any complex project, you'll likely need to use or create modules. These are the building blocks of complex projects. In this article, we'll explore Python modules, why we need them, and how we can import them in our Python files.

Understanding Python Modules

In Python, a module is a file containing Python definitions and statements. The file name is the module name with the suffix .py added. Imagine you're working on a Python project, and you've written a function to calculate the Fibonacci series. Now, you need to use this function in multiple files. Instead of rewriting the function in each file, you can write it once in a Python file (module) and import it wherever needed.

Here's a simple example. Let's say we have a file math_operations.py with a function to add two numbers:

# math_operations.py

def add_numbers(num1, num2):
    return num1 + num2

We can import this math_operations module in another Python file and use the add_numbers function:

# main.py

import math_operations

print(math_operations.add_numbers(5, 10))  # Output: 15

In the above example, we've imported the math_operations module using the import statement and used the add_numbers function defined in the module.

Note: Python looks for module files in the directories defined in sys.path. It includes the directory containing the input script (or the current directory), the PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH), and the installation-dependent default directory. You can check the sys.path using import sys; print(sys.path).

But why do we need to import Python files? Why can't we just write all our code in one file? Let's find out in the next section.

Why Import Python Files?

The concept of importing files in Python is comparable to using a library or a toolbox. Imagine you're working on a project and need a specific tool. Instead of creating that tool from scratch every time you need it, you would look in your toolbox for it, right? The same goes for programming in Python. If you need a specific function or class, instead of writing it from scratch, you can import it from a Python file that already contains it.

This not only saves us from having to continuously rewrite code we've already written, but it also makes our code cleaner, more efficient, and easier to manage. It promotes a modular programming approach, where the code is broken down into separate parts or modules, each performing a specific function. This modularity makes debugging and understanding the code much easier.

Here's a simple example of importing a Python standard library module:

import math

# Using the math library to calculate the square root
print(math.sqrt(16))

Output:

4.0

We import the math module and use its sqrt function to calculate the square root of 16.

Different Ways to Import Python Files

Python provides several ways to import modules, each with its own use cases. Let's look at the three most common methods.

Using 'import' Statement

The import statement is the simplest way to import a module. It simply imports the module, and you can use its functions or classes by referencing them with the module name.

import math

print(math.pi)

Output:

3.141592653589793

In this example, we import the math module and print the value of pi.

Using 'from...import' Statement

The from...import statement allows you to import specific functions, classes, or variables from a module. This way, you don't have to reference them with the module name every time you use them.

from math import pi

print(pi)

Output:

3.141592653589793

Here, we import only the pi variable from the math module and print it.

Using 'import...as' Statement

The import...as statement is used when you want to give a module a different name in your script. This is particularly useful when the module name is long and you want to use a shorter alias for convenience.

import math as m

print(m.pi)

Output:

3.141592653589793

Here, we import the math module as m and then use this alias to print the value of pi.

Importing Modules from a Package

Packages in Python are a way of organizing related modules into a directory hierarchy. Think of a package as a folder that contains multiple Python modules, along with a special __init__.py file that tells Python that the directory should be treated as a package.

But how do you import a module that's inside a package? Well, Python provides a straightforward way to do this.

Suppose you have a package named shapes and inside this package, you have two modules, circle.py and square.py. You can import the circle module like this:

from shapes import circle

Now, you can access all the functions and classes defined in the circle module. For instance, if the circle module has a function area(), you can use it as follows:

circle_area = circle.area(5)
print(circle_area)

This will print the area of a circle with a radius of 5.

Note: If you want to import a specific function or class from a module within a package, you can use the from...import statement, as we showed earlier.

But what if your package hierarchy is deeper? What if the circle module is inside a subpackage called two_d inside the shapes package (note that a package name must be a valid Python identifier, so it can't start with a digit like 2d)? Python has got you covered. You can import the circle module like this:

from shapes.two_d import circle

Python's import system is quite flexible and powerful. It allows you to organize your code in a way that makes sense to you, while still providing easy access to your functions, classes, and modules.

Common Issues Importing Python Files

As you work with Python, you may come across several errors while importing modules. These errors could stem from a variety of issues, including incorrect file paths, syntax errors, or even circular imports. Let's see some of these common errors.

Fixing 'ModuleNotFoundError'

The ModuleNotFoundError is a subtype of ImportError. It's raised when you try to import a module that Python cannot find. It's one of the most common issues developers face while importing Python files.

import missing_module

This will raise a ModuleNotFoundError: No module named 'missing_module'.

There are several ways you can fix this error:

  1. Check the Module's Name: Ensure that the module's name is spelled correctly. Python is case-sensitive, which means module and Module are treated as two different modules.

  2. Install the Module: If the module is not a built-in module and you have not created it yourself, you may need to install it using pip. For example:

$ pip install missing_module

  3. Check Your File Paths: Python searches for modules in the directories defined in sys.path. If your module is not in one of these directories, Python won't be able to find it. You can add your module's directory to sys.path using the following code:

import sys
sys.path.insert(0, '/path/to/your/module')

  4. Use a Try/Except Block: If the module you're trying to import is not crucial to your program, you can use a try/except block to catch the ModuleNotFoundError and continue running your program. For example:

try:
    import missing_module
except ModuleNotFoundError:
    print("Module not found. Continuing without it.")

Avoiding Circular Imports

In Python, circular imports can be quite a headache. They occur when two or more modules depend on each other, either directly or indirectly. This leads to an infinite loop, causing the program to crash. So, how do we avoid this common pitfall?

The best way to avoid circular imports is by structuring your code in a way that eliminates the need for them. This could mean breaking up large modules into smaller, more manageable ones, or rethinking your design to remove unnecessary dependencies.

For instance, consider two modules A and B. If A imports B and B imports A, a circular import occurs. Here's a simplified example:

# A.py
import B

def function_from_A():
    print("This is a function in module A.")
    B.function_from_B()

# B.py
import A

def function_from_B():
    print("This is a function in module B.")
    A.function_from_A()

Importing these modules will succeed, but calling either function will result in a RecursionError, since each function ends up calling the other indefinitely. To avoid this, you could refactor your code so that each function is in its own module, and they import each other only when needed.

# A.py
def function_from_A():
    print("This is a function in module A.")

# B.py
import A

def function_from_B():
    print("This is a function in module B.")
    A.function_from_A()

Note: It's important to remember that Python imports are case-sensitive. This means that import module and import Module would refer to two different modules and could potentially lead to a ModuleNotFoundError if not handled correctly.

Using __init__.py in Python Packages

In our journey through learning about Python imports, we've reached an interesting stop — the __init__.py file. This special file serves as an initializer for Python packages. But what does it do, exactly?

In the simplest terms, __init__.py allows Python to recognize a directory as a package so that it can be imported just like a module. Previously, an empty __init__.py file was enough to do this. However, from Python 3.3 onwards, thanks to the introduction of PEP 420, __init__.py is no longer strictly necessary for a directory to be considered a package. But it still holds relevance, and here's why.

Note: The __init__.py file is executed when the package is imported, and it can contain any Python code. This makes it a useful place for initialization logic for the package.

Consider a package named animals with two modules, mammals and birds. Here's how you can use __init__.py to import these modules.

# __init__.py file
from . import mammals
from . import birds

Now, when you import the animals package, mammals and birds are also imported.

# main.py
import animals

animals.mammals.list_all()  # Use functions from the mammals module
animals.birds.list_all()  # Use functions from the birds module

By using __init__.py, you've made the package's interface cleaner and simpler to use.

Organizing Imports: PEP8 Guidelines

When working with Python, or any programming language really, it's important to keep your code clean and readable. This not only makes your life easier, but also the lives of others who may need to read or maintain your code. One way to do this is by following the PEP8 guidelines for organizing imports.

According to PEP8, your imports should be grouped in the following order:

  1. Standard library imports
  2. Related third party imports
  3. Local application/library specific imports

Each group should be separated by a blank line. Here's an example:

# Standard library imports
import os
import sys

# Related third party imports
import requests

# Local application/library specific imports
from my_library import my_module

In addition, PEP8 also recommends that imports be placed on separate lines. Ordering them alphabetically within each group is a further, widely followed convention.
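For example:

# Yes: each import on its own line
import os
import sys

# No: multiple modules in one import statement
import os, sys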

Note: While these guidelines are not mandatory, following them can greatly improve the readability of your code and make it more Pythonic.

To make your life even easier, many modern IDEs, like PyCharm, have built-in tools to automatically organize your imports according to PEP8.

With proper organization and understanding of Python imports, you can avoid common errors and improve the readability of your code. So, the next time you're writing a Python program, give these guidelines a try. You might be surprised at how much cleaner and more manageable your code becomes.

Conclusion

And there you have it! We've taken a deep dive into the world of Python imports, exploring why and how we import Python files, the different ways to do so, common errors and their fixes, and the role of __init__.py in Python packages. We've also touched on the importance of organizing imports according to PEP8 guidelines.

Remember, the way you handle imports can greatly impact the readability and maintainability of your code. So, understanding these concepts is not just a matter of knowing Python's syntax—it's about writing better, more efficient code.

September 22, 2023 03:40 PM UTC

Fix the "AttributeError: module object has no attribute 'Serial'" Error in Python

Introduction

Even if you're a seasoned Python developer, you'll occasionally encounter errors that can be pretty confusing. One such error is the AttributeError: module object has no attribute 'Serial'. This Byte will help you understand and resolve this issue.

Understanding the AttributeError

The AttributeError in Python is raised when you try to access or call an attribute that a module, class, or instance does not have. Specifically, the error AttributeError: module object has no attribute 'Serial' suggests that you're trying to access the Serial attribute from a module that doesn't possess it.

For instance, if you have a module named serial and you're trying to use the Serial attribute from it, you might encounter this error. Here's an example:

import serial

ser = serial.Serial('/dev/ttyUSB0')  # This line causes the error

In this case, the serial module you're importing doesn't have the Serial attribute, hence the AttributeError.

Note: It's important to understand that Python is case-sensitive. Serial and serial are not the same. If the attribute exists but you're using the wrong case, you'll also encounter an AttributeError.

Fixes for the Error

The good news is that this error is usually a pretty easy fix, even if it seems very confusing at first. Let's explore some of the solutions.

Install the Correct pyserial Module

One of the most common reasons for this error is the incorrect installation of the pyserial module. The Serial attribute is a part of the pyserial module, which is used for serial communication in Python.

You might have installed a module named serial instead of pyserial (this is more common than you think!). To fix this, you need to uninstall the incorrect module and install the correct one.

$ pip uninstall serial
$ pip install pyserial

After running these commands, your issue may be resolved. Note that although the package is named pyserial, it is still imported under the name serial. Now you can import Serial and use it in your code:

from serial import Serial

ser = Serial('/dev/ttyUSB0')  # This line no longer causes an error

If this didn't fix the error, keep reading.

Rename your serial.py File

For how much Python I've written in my career, you'd think that I wouldn't make this simple mistake as much as I do...

Another possibility is that the Python interpreter gets confused when there's a file in your project directory with the same name as a module you're trying to import. This is another common source of the AttributeError error.

Let's say, for instance, you have a file named serial.py in your project directory (or maybe your script itself is named serial.py). When you try to import serial, Python might import your serial.py file instead of the pyserial module, leading to the AttributeError.

The solution here is simple - rename your serial.py file to something else.

$ mv serial.py my_serial.py

Conclusion

In this Byte, we've explored two common causes of the AttributeError: module object has no attribute 'Serial' error in Python: installing the wrong pyserial module, and having a file in your project directory that shares a name with a module you're trying to import. By installing the correct module or renaming conflicting files, you should be able to eliminate this error.

September 22, 2023 12:51 PM UTC


Real Python

The Real Python Podcast – Episode #173: Getting Involved in Open Source & Generating QR Codes With Python

Have you thought about contributing to an open-source Python project? What are possible entry points for intermediate developers? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.



September 22, 2023 12:00 PM UTC

September 21, 2023


Wesley Chun

Managing Shared (formerly Team) Drives with Python and the Google Drive API

2023 UPDATE: We are working to put updated versions of all the code into GitHub... stay tuned. The link will be provided in all posts once the code sample(s) is(are) available.


2019 UPDATE: "G Suite" is now called "Google Workspace", "Team Drives" is now known as "Shared Drives", and the corresponding supportsTeamDrives flag has been renamed to supportsAllDrives. Please take note of these changes regarding the post below.

NOTE 1: Team Drives is only available for G Suite Business Standard users or higher. If you're developing an application for Team Drives, you'll need similar access.
NOTE 2: The code featured here is also available as a video + overview post as part of this series.

Introduction

Team Drives is a relatively new feature from the Google Drive team, created to solve some of the issues of a user-centric system in larger organizations. Team Drives are owned by an organization rather than a user and with its use, locations of files and folders won't be a mystery any more. While your users do have to be a G Suite Business (or higher) customer to use Team Drives, the good news for developers is that you won't have to write new apps from scratch or learn a completely different API.

Instead, Team Drives features are accessible through the same Google Drive API you've come to know so well with Python. In this post, we'll demonstrate a sample Python app that performs core features that all developers should be familiar with. By the time you've finished reading this post and the sample app, you should know how to:
  • Create Team Drives
  • Add members to Team Drives
  • Create a folder in Team Drives
  • Import/upload files to Team Drives folders

Using the Google Drive API

The demo script requires creating files and folders, so you do need full read-write access to Google Drive. The scope you need for that is:
  • 'https://www.googleapis.com/auth/drive' — Full (read-write) access to Google Drive
If you're new to using Google APIs, we recommend reviewing earlier posts & videos covering setting up projects and the authorization boilerplate so that we can focus on the main app. Once we've authorized our app, assume you have a service endpoint to the API and have assigned it to the DRIVE variable.

Create Team Drives

New Team Drives can be created with DRIVE.teamdrives().create(). Two things are required to create a Team Drive: 1) a name for your Team Drive, and 2) a unique request ID, which makes the create process idempotent so that any number of identical calls will still only result in a single Team Drive being created. It's recommended that developers use a language-specific UUID library. For Python developers, that's the uuid module. From the API response, we return the new Team Drive's ID. Check it out:
def create_td(td_name):
    request_id = str(uuid.uuid4())
    body = {'name': td_name}
    return DRIVE.teamdrives().create(body=body,
            requestId=request_id, fields='id').execute().get('id')

Add members to Team Drives

To add members/users to Team Drives, you only need to create a new permission, which can be done with  DRIVE.permissions().create(), similar to how you would share a file in regular Drive with another user.  The pieces of information you need for this request are the ID of the Team Drive, the new member's email address as well as the desired role... choose from: "organizer", "owner", "writer", "commenter", "reader". Here's the code:
def add_user(td_id, user, role='commenter'):
    body = {'type': 'user', 'role': role, 'emailAddress': user}
    return DRIVE.permissions().create(body=body, fileId=td_id,
            supportsTeamDrives=True, fields='id').execute().get('id')
Some additional notes on permissions: the user can only be bestowed permissions equal to or less than the person/admin running the script... IOW, they cannot grant someone else greater permission than what they have. Also, if a user has a certain role in a Team Drive, they can be granted greater access to individual elements in the Team Drive. Users who are not members of a Team Drive can still be granted access to Team Drive contents on a per-file basis.

Create a folder in Team Drives

Nothing to see here! Yep, creating a folder in Team Drives is identical to creating a folder in regular Drive, with DRIVE.files().create(). The only difference is that you pass in a Team Drive ID rather than regular Drive folder ID. Of course, you also need a folder name too. Here's the code:
def create_td_folder(td_id, folder):
    body = {'name': folder, 'mimeType': FOLDER_MIME, 'parents': [td_id]}
    return DRIVE.files().create(body=body,
            supportsTeamDrives=True, fields='id').execute().get('id')

Import/upload files to Team Drives folders

Uploading files to a Team Drives folder is also identical to uploading to a normal Drive folder, and also done with DRIVE.files().create(). Importing is slightly different from uploading because you're uploading a file and converting it to a G Suite/Google Apps document format, i.e., uploading CSV as a Google Sheet, or plain text or a Microsoft WordÂź file as Google Docs. In the sample app, we tackle the former:
def import_csv_to_td_folder(folder_id, fn, mimeType):
    body = {'name': fn, 'mimeType': mimeType, 'parents': [folder_id]}
    return DRIVE.files().create(body=body, media_body=fn+'.csv',
            supportsTeamDrives=True, fields='id').execute().get('id')
The secret to importing is the MIMEtype. That tells Drive whether you want conversion to a G Suite/Google Apps format (or not). The same is true for exporting. The import and export MIMEtypes supported by the Google Drive API can be found in my SO answer here.
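To make the conversion idea concrete, here are a couple of commonly used import mappings (illustrative examples only; consult the official documentation for the full list):

# Source MIMEtype -> G Suite/Google Apps format it converts to
IMPORT_MIMETYPES = {
    'text/csv': 'application/vnd.google-apps.spreadsheet',  # CSV -> Google Sheets
    'text/plain': 'application/vnd.google-apps.document',   # text -> Google Docs
}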

Driver app

All these functions are great but kind-of useless without being called by a main application, so here we are:
FOLDER_MIME = 'application/vnd.google-apps.folder'
SOURCE_FILE = 'inventory' # on disk as 'inventory.csv'
SHEETS_MIME = 'application/vnd.google-apps.spreadsheet'

td_id = create_td('Corporate shared TD')
print('** Team Drive created')
perm_id = add_user(td_id, 'email@example.com')
print('** User added to Team Drive')
folder_id = create_td_folder(td_id, 'Manufacturing data')
print('** Folder created in Team Drive')
file_id = import_csv_to_td_folder(folder_id, SOURCE_FILE, SHEETS_MIME)
print('** CSV file imported as Google Sheets in Team Drives folder')
The first set of variables represents some MIMEtypes we need to use as well as the CSV file we're uploading to Drive and requesting it be converted to Google Sheets format. Below those definitions are calls to all four functions described above.

Conclusion

If you run the script, you should get output that looks something like this, with each print() representing each API call:
$ python3 td_demo.py
** Team Drive created
** User added to Team Drive
** Folder created in Team Drive
** CSV file imported as Google Sheets in Team Drives folder
When the script has completed, you should have a new Team Drives folder called "Corporate shared TD", and within, a folder named "Manufacturing data" which contains a Google Sheets file called "inventory".

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!)—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function
import uuid

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/drive'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

def create_td(td_name):
    request_id = str(uuid.uuid4()) # random unique UUID string
    body = {'name': td_name}
    return DRIVE.teamdrives().create(body=body,
            requestId=request_id, fields='id').execute().get('id')

def add_user(td_id, user, role='commenter'):
    body = {'type': 'user', 'role': role, 'emailAddress': user}
    return DRIVE.permissions().create(body=body, fileId=td_id,
            supportsTeamDrives=True, fields='id').execute().get('id')

def create_td_folder(td_id, folder):
    body = {'name': folder, 'mimeType': FOLDER_MIME, 'parents': [td_id]}
    return DRIVE.files().create(body=body,
            supportsTeamDrives=True, fields='id').execute().get('id')

def import_csv_to_td_folder(folder_id, fn, mimeType):
    body = {'name': fn, 'mimeType': mimeType, 'parents': [folder_id]}
    return DRIVE.files().create(body=body, media_body=fn+'.csv',
            supportsTeamDrives=True, fields='id').execute().get('id')

FOLDER_MIME = 'application/vnd.google-apps.folder'
SOURCE_FILE = 'inventory' # on disk as 'inventory.csv'... CHANGE!
SHEETS_MIME = 'application/vnd.google-apps.spreadsheet'

td_id = create_td('Corporate shared TD')
print('** Team Drive created')
perm_id = add_user(td_id, 'email@example.com') # CHANGE!
print('** User added to Team Drive')
folder_id = create_td_folder(td_id, 'Manufacturing data')
print('** Folder created in Team Drive')
file_id = import_csv_to_td_folder(folder_id, SOURCE_FILE, SHEETS_MIME)
print('** CSV file imported as Google Sheets in Team Drives folder')
As with our other code samples, you can now customize it to learn more about the API, integrate into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

Code challenge

Write a simple application that moves folders (and their files and subfolders) from regular Drive to Team Drives. Each folder you move should become a corresponding folder in Team Drives. Remember that files in Team Drives can only have one parent, and the same goes for folders.

September 21, 2023 10:31 PM UTC


Stack Abuse

How to Pass Multiple Arguments to the map() Function in Python

Introduction

The goal of Python, with its rich set of built-in functions, is to allow developers to accomplish complex tasks with relative ease. One such powerful, yet often overlooked, function is the map() function. The map() function will execute a given function over a set of items, but how do we pass additional arguments to the provided function?

In this Byte, we'll be exploring the map() function and how to effectively pass multiple arguments to it.

The map() Function in Python

The map() function in Python is a built-in function that applies a given function to every item of an iterable (like list, tuple etc.) and returns a list of the results.

def square(number):
    return number ** 2

numbers = [1, 2, 3, 4, 5]
squared = map(square, numbers)

print(list(squared))  # Output: [1, 4, 9, 16, 25]

In this snippet, we've defined a function square() that takes a number and returns its square. We then use the map() function to apply this square() function to each item in the numbers list.

Why Pass Multiple Arguments to map()?

You might be wondering, "Why would I need to pass multiple arguments to map()?" Well, there are scenarios where you might have a function that takes more than one argument, and you want to apply this function to multiple sets of data simultaneously.

Not every function we provide to map() will take only one argument. What if, instead of a dedicated square function, we have the more generic math.pow, where the second argument is the exponent to raise each item to? How do we handle a case like this?

Or maybe you have two lists of numbers, and you want to find the product of corresponding numbers from these lists. This is another case where passing multiple arguments to map() can be helpful.

How to Pass Multiple Arguments to map()

There are a few different types of cases in which you'd want to pass multiple arguments to map(), two of which we mentioned above. We'll walk through both of those cases here.

Multiple Iterables

Passing multiple arguments to the map() function is simple once you understand how to do it. You simply pass additional iterables after the function argument, and map() will take items from each iterable and pass them as separate arguments to the function.

Here's an example:

def multiply(x, y):
    return x * y

numbers1 = [1, 2, 3, 4, 5]
numbers2 = [6, 7, 8, 9, 10]
result = map(multiply, numbers1, numbers2)

print(list(result))  # Output: [6, 14, 24, 36, 50]

Note: Make sure that the number of arguments the function takes matches the number of iterables passed to map()!

In the example above, we've defined a function multiply() that takes two arguments and returns their product. We then pass this function, along with two lists, to the map() function. The map() function applies multiply() to each pair of corresponding items from the two lists, and returns a new list with the results.

Multiple Arguments, One Iterable

Continuing with our math.pow example, let's see how we can still use map() to run this function on all items of an array.

The first, and probably simplest, way is to not use map() at all, but to use a list comprehension instead.

import math

numbers = [1, 2, 3, 4, 5]
res = [math.pow(n, 3) for n in numbers]

print(res) # Output: [1.0, 8.0, 27.0, 64.0, 125.0]

This is essentially all map() really does, but it's not as compact and neat as using the convenient map() function.
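Incidentally, if what you like about map() is its lazy evaluation (it yields results one at a time rather than building the whole list up front), a generator expression gives you that same behavior with comprehension syntax:

import math

numbers = [1, 2, 3, 4, 5]
res = (math.pow(n, 3) for n in numbers)  # lazy, like map()

print(list(res))  # Output: [1.0, 8.0, 27.0, 64.0, 125.0]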

Now, let's see how we can actually use map() with a function that requires multiple arguments:

import math
import itertools

numbers = [1, 2, 3, 4, 5]
res = map(math.pow, numbers, itertools.repeat(3, len(numbers)))

print(list(res)) # Output: [1.0, 8.0, 27.0, 64.0, 125.0]

This may seem a bit more complicated at first, but it's actually very simple. We use a helper function, itertools.repeat, to create an iterable the same length as numbers that yields only the value 3.

So the output of itertools.repeat(3, len(numbers)), when converted to a list, is just [3, 3, 3, 3, 3]. This works because we're now passing two iterables of the same length to map(), which it happily accepts.
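In fact, you can drop the explicit count: itertools.repeat(3) with no second argument yields 3 forever, and because map() stops at the shortest iterable, the infinite repeat is perfectly safe here:

import math
import itertools

numbers = [1, 2, 3, 4, 5]

# repeat(3) is infinite, but map() stops when numbers is exhausted
res = map(math.pow, numbers, itertools.repeat(3))

print(list(res))  # Output: [1.0, 8.0, 27.0, 64.0, 125.0]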

Conclusion

The map() function is particularly useful when working with multiple iterables, as it can apply a function to the elements of these iterables in pairs, triples, or more. In this Byte, we've covered how to pass multiple arguments to the map() function and how to work with multiple iterables.

September 21, 2023 08:22 PM UTC


William Minchin

minchin.jrnl v7 “Phoenix” released

Today, I do something that I should have done 5 years ago, and something that I’ve been putting off for the last 2 years1: I’m releasing a personal fork of jrnl2! I’ve given this release the codename Phoenix, after the mythical bird of rebirth that springs forth renewed from the ashes of the past.

You can install it today:

pip install minchin.jrnl

And then to run it from the command line:

minchin.jrnl

(or jrnl)

Features Today

Today, the codebase is that of jrnl v2.63 in a new namespace. In particular, that gives us a working YAML exporter; now you can build your Pelican sites again (or maybe only I was doing that
).

The version number (7) was picked to be larger than the current jrnl-org/jrnl release (currently at 4.0.1). (Plus I thought it would look cool!)

I’ve moved the configuration to a new location on disk4, so as not to stomp on your existing jrnl (i.e. jrnl-org/jrnl or “legacy”) installation.

Limited documentation, to match the current codebase, has been uploaded to my personal site at minchin.ca/minchin.jrnl. (Although it remains incomplete and very much a work in progress.)

And finally, I extend an invitation to all those current or former users of jrnl to move here. I welcome your contributions and support. If there are features missing, please feel free to let me know.

Short Term Update Plans

I’ve pushed this release out with very few changes from the base codebase in an effort to get it live. But I have some updates that I’m planning to do very shortly. These updates will maintain the Phoenix codename, even if the major version number increments.

The biggest of these is to launch my much anticipated plugin system. The code has already been written (for several years now5, actually); it just needs to be double-checked that it still works as expected.

The other immediate update is to make sure the code works with Python 3.11 (the current version of Python), which seems to already be the case.

Medium to Long Term Project Goals, or My Vision

These are features I’d like to add, although I realize this will take more than tonight. This section also lays out my vision for the project and some anti-features I want to avoid.

The Plugin System

The plugin system, I think, will be a huge step forward in making minchin.jrnl more useful. In particular, it allows minchin.jrnl to import from and export to new formats, including allowing you to write one-off export formats (which I intend to use personally right away!). Displaying the journal entries on the command line is also handled by exporters, so you’d be able to tweak that output as well. I also intend to extend the plugin system to the storage backends.

My hope is that this will futureproof minchin.jrnl, allowing new formats to be added quickly and easily, retiring deprecated formats to external plugins, and making it possible to quickly test and integrate new formats by seamlessly bringing external plugins “in-house”.

In particular, I’m planning to have separate plugins for “true” YAML + Markdown exports and Pelican-formatted Markdown, to add an export format for Obsidian6-formatted Markdown, and to add backend format plugins to support jrnl v1 and whatever format they’re dreaming up for jrnl v47.

In short, I hope plugins will allow you to make minchin.jrnl more useful, without me being the bottleneck.

Plain Text Forever

One of the things that drew me to the original jrnl implementation was that it was based on plain text, and used plain text to store journal entries. Plain text has a number of advantages8, but the near-universal backwards and forwards compatibility is high on that list. Yes, plain text has its limitations9, but I think the advantages far outweigh the disadvantages, particularly when it comes to a journal that you might hope will be readable years or generations from now. Also, plain text just makes it so much easier to develop minchin.jrnl.

The included, default option for minchin.jrnl will always be plain text.

If you’re looking to upgrade your plain text, you might consider Markdown10 or reStructuredText (ReST)11.

If you’re looking for either a database backend or more multimedia functionality (or both), you’re welcome to write something as a backend plugin for minchin.jrnl; that ability is a key reason for providing the (to be added shortly!) plugin system in the first place!

MIT Licensed

The original jrnl was initially released under the MIT license; that only changed with the v2.4 release, which moved to GPLv312. My hope and goal is to remove all GPL-licensed code and release future versions of minchin.jrnl under the MIT license23.

I opposed the change13 because I’ve come to feel that Open Source work is both given and received as a gift, and I feel the GPL license moves away from that ethos.

I suspect the least fun part of this partial re-write will be getting the testing system up and running again, as the testing library jrnl v1 had been using has gone many years without updates.

To this end, I’m requesting that any code contributions to the project be dual-licensed under both MIT and GPLv3.

Documentation in Sphinx

Documentation will eventually be moved over to Sphinx (from the current MkDocs), a process I’ve begun but not finished. Driving this is the expectation that I’ll have more technical documentation (than is currently included) as I lay out how to work with the plugin API, and Sphinx makes it easy to keep code and documentation side by side in the same (code) file.

Furthermore, I want to document how to use minchin.jrnl as a Python API generally; this would allow you to interact with your journal from other Python programs.

Easy to Push Releases

Knowing my own schedule, I want to be able to sit down for an evening, make (several) small improvements, and then push out a new release before I go to bed. To that end, I want to streamline the process of pushing out new releases. Expect lots of small releases. :)

Drop poetry

poetry is a powerful Python project management tool, but one I’ve never really liked14. Particular issues include a lack of first-class Windows support15 and very conservative upper bounds for dependencies and supported Python versions. Plus, I have refined a system elsewhere, using pip-tools16 and setup.py, that manages these same issues and that I find works very well for me.

This has been accomplished with the current release!

Windows Support

jrnl, to date, has always had decent Windows support. As I personally work on Windows, Windows will continue to have first-class support.

Where this may show is that tools beyond Python will need to be readily available on Windows before they’re used33, and that the Windows Terminal is fairly limited in what it can do, at least compared with some Linux terminals.

Replace the Current Code of Conduct

I don’t much care for the current Code of Conduct17: it seems to be overly focused on the horrible things people might do to each other, and I’m not sure I want that to be the introduction people get to the project. I hope to find a Code of Conduct that focuses more on the positive things I hope people will do as they interact with each other and the project.

My replaced/current Code of Conduct is here (although this may be updated again in the future).

Open Stewardship Discussion

If the time comes when someone else is assuming stewardship of the project, I intend for those discussions to be held publicly18.

My History with the Project, and Why the Fork

This section is different: it is much less about the codebase and more focused on myself and my relationship to it. I warn you it is likely to be somewhat long and winding.

My Introduction to jrnl

Many years ago now, I was new to Python. At that time34, when I came across a problem that I thought programming might solve, I first went looking for a Python program that might solve it.

In looking for a way to manage the regular text notes I was taking at work, I found jrnl, which I eventually augmented with DayOne (Classic) for in-field note entry (on a work-supplied iPad) and Pelican for display.

jrnl was more than that, though: it was the object of my first Pull Request35, my first contribution to Open Source. My meagre help was appreciated and welcomed warmly, and so I returned often. I found jrnl to be incredibly useful in learning about Python best practices and code organization; here was a program that was more than a toy but simple enough (or logically separated enough) that I could attempt to understand it, to grok it, as a whole. I contributed in many places, but particularly around Windows support, DayOne backend support, and exports to Markdown (to be fed to Pelican).

In short, jrnl became part of the code I used every day.

jrnl Goes Into Hibernation

I have heard it said that software rewrites are a great way to kill a project. The reasons for this are multiple, but in short: the rewrite (typically) saps the energy to update the legacy version even as bugs pile up, the new thing can’t go into everyday use until it is feature-compatible with the legacy version, and the rewrite always takes way more effort than initially estimated.

For reasons now forgotten36, a “v2” semi-rewrite was undertaken. And then it stalled. And then the project maintainer got a life19 and the re-write stalled even more so.

The (Near) Endless Beta of v2, or the First Time I Should Have Forked

For me, initially, this wasn’t a big deal: I was often running a development build locally as I tinkered away with jrnl, so I just kept doing that. Also, I had just started working on my plugin system (for exporters first, but expecting it could easily be extended to importers and backends).

As the months of inactivity on the part of the maintainer stretched on, and pull requests grew stale, at some point I should have forked the project and “officially” released my development version. But I never did, because it seemed like a scary new thing to do20.

Invitation to Become a Maintainer

And then21 one day, out of the blue, I got an email from the maintainer asking if I wanted to be a co-maintainer for jrnl! I was delighted and promptly said yes. I was given commit access to the repo on GitHub (but, as far as I knew, no ability to push releases to PyPI), and then
 not much happened. I reached out to the maintainer to suggest some plans, as it still felt like “his” project, but I never heard much back. And I was too timid to move forward without at least something from him. And I was busy with the rest of life too. After a few months, I realized my first plan wasn’t working and started thinking about how to try again to move the project forward, more on my own. In front of me was the push to v2.0, and the question of how to integrate my in-development plugin system.

The Coup

And then one day, again out of the blue, I got an unexpected email telling me that I no longer had commit access to the jrnl repo. I searched the project for updates, including the issue tracker, and came up with #591, where a transition to new maintainers was brought up; I don’t know why I wasn’t pinged on it. At the time22, I said I was happy to see new life in the project and to see it move forward. But it was unsettling that I’d missed the early discussions.

It also seemed odd to me that the two maintainers who stepped forward hadn’t seemed to be involved with the project at all before that point.

For a while, things were ok: a “version 2” was released that was very close to the v2 branch I was using at home, bugs started getting fixed regularly, and new releases continued to come out. But my plugins never made it into a release.

Things Fall Apart (aka I Get Frustrated)

But things under new management didn’t stay rosy.

One of the things they did was completely rewrite the Git history, and thus change all the commit hashes. This was a small but continuing annoyance (until I got a new computer), because every time I went to push changes to GitHub, my Git client would complain about the “new” (old) tags it was trying to push, because it couldn’t find a commit with the matching hash.

But my two big annoyances were a re-write of the YAML exporter and the continual roadblocks to getting my plugin system merged in.

My plugin system has the longest history, having been started before the change in stewardship. Many times (after the change in stewardship), I created a pull request24, and the response would be to make various changes or to split it into smaller pieces; I would make the changes, and the cycle would continue. But there was never a plan presented that I felt I could successfully complete, nor was I ever told the plugin system was unaligned with the vision they had for the project. I lost considerable enthusiasm for trying to get the plugins merged after rewriting the tests for the third time (as the underlying testing framework was changed).

The YAML exporter changes are what ultimately left me feeling frozen out of the project. Without much fanfare, the YAML exporter was changed, because someone25 felt that the resulting output wasn’t “true” or “pure” YAML. This is true, in a sense, because when I had originally written the exporter, it was designed to output files for Pelican with an assumed Markdown body and YAML front matter for metadata. At the request of the (then) maintainer, I called it the “YAML exporter”, partly because there was already a “Markdown exporter”. I didn’t realize it had been broken until I went to upgrade jrnl and render my Pelican site (which I use to search and read my notes) and it had just stopped working26. The change wasn’t noted (in a meaningful way) in the release notes, and the version numbers didn’t give an indication of when this change had happened30. I eventually figured out where the change had happened, explained the history of the exporter (which, again, I had written years earlier), and proposed three solutions, each with a corresponding Pull Request: 1) return the YAML exporter to its previous output27, 2) duplicate the old exporter under a new name28, or 3) merge my plugin system, which would allow me to write my own plugin and solve the problem myself. I was denied on all three, and told that I ‘didn’t understand the purpose or function of the YAML exporter’31 (yes, of the plugin I’d written37). The best I got was that they would reconsider what rose to the level of a “breaking change” when dealing with versioning32.

I Walk Away

The combined experience left me feeling very frustrated: jrnl was broken (to me), and all my efforts to fix it were forcibly rebuffed.

When I tried to express my frustration at my inability to get a “working” version of jrnl, I was encouraged to take a mental health break. While I appreciate the awareness of mental health, stepping away wouldn’t be helpful in this particular case, because nothing would happen to fix my broken tooling (the cause of my frustration). It seemed like the “right words”(tm) someone had picked up at a workshop, but that same workshop seemed to figure that the “right words”(tm) would solve everything, without requiring a deeper look or deeper changes.

So I took my leave. I’ve been running an outdated version (v2.6) ever since, and because of the strictness of the Poetry metadata29, I can’t run it on anything newer than Python 3.9 (even as I’ve upgraded my “daily” Python to 3.11).

I Return (Sort Of); The Future and My Fears

So this marks my return. My “mental health break” is over. As I realize I can only change myself (and not others), I will do the work to fix the deeper issues (e.g. broken Pelican exports, lack of “modern” Python support) by managing my own fork. And so that is the work I’ll do.

Looking forward, if I’m the only one who uses my fork, that would be somewhat disappointing, but also fine. After all, I write software, first and foremost, for my own use case, and offer it to others as a gift. On the other hand, if a large part of the community moves here, I worry about being able to shepherd that community any better than the one I am leaving.

I worry too that, either because there was conflict at all, or because all of my writings are publicly displayed, others will think less of my work or of me because of the failings they see there. It is indeed very hard to get through a disagreement like this without failing to some degree.

But it seems better to act, than to suffer in silence.

A Thank You

Thank you to all those who have worked to make jrnl as successful as it has been to date.

If you’ve gotten this far, thank you for reading this all. I hope you will join me, and I hope your experiences with minchin.jrnl are positive!

The header image was generated locally by Stable Diffusion XL.


  1. October 18, 2021 todo item: “fork jrnl” ↩

  2. main landing page at jrnl.sh, code at jrnl-org/jrnl on GitHub, and jrnl on PyPI ↩

  3. https://github.com/jrnl-org/jrnl/tree/v2.6 ↩

  4. this varies by OS, so run jrnl --list to see where yours is stored. ↩

  5. Pull Request #1115 ↩

  6. I’ve started using Obsidian to take notes on my workstation and on my phone, and find it incredible. The backend format remains Markdown with basically YAML front matter, but the format is slightly different from Pelican’s, and the exported file layout differs. ↩

  7. The initial draft of this post was written before the v4 release, when there was talk of changing how the journal files were kept. v4 has since been released, and I’m unclear if that change ever happened, or what “breaking change” occurred that bumped the version number from 3 to 4 generally. In any case, if they change their format, with the plugin system it becomes fairly trivial to add a format-specific importer. ↩

  8. also: tiny file size, easy to put under version control, no proprietary format or data lock-in, portability across computing platforms, and generally are human readable ↩

  9. includes limitations on embedded text formatting; on storing pictures, videos, or sound recordings; and the lack of standardized internal metadata ↩

  10. Markdown has several variants and many extensions. If you’re starting out, I recommend looking at the CommonMark specification. Note, however, that Markdown was originally designed as a shortcut for creating HTML documents, and so has no built-in features for managing groups of Markdown documents. It is also deliberately limited in the formatting options available, while officially supporting raw HTML as a fallback for anything missing. ↩

  11. ReST is older than Markdown and has always had a full specification. It was originally designed for the Python language documentation, and so was designed from the beginning to deal with the interplay between several text documents. Sadly, it doesn’t seem to have been adopted much outside of the Python ecosystem. ↩

  12. version 2.3.1 license (MIT); version 2.4 license (GPLv3), released April 18, 2020. ↩

  13. as I detailed at the time. But the issue (#918) announcing the change was merged within 20 minutes of being opened, so I’m not sure anything I could have said would have changed their minds. ↩

  14. this can and should be fleshed out into a full blog post. But another day. ↩

  15. and I work on Windows. And I work with Python because Python has had good Windows support. ↩

  16. https://pip-tools.readthedocs.io/en/latest/ ↩

  17. jrnl-org/jrnl’s Code of Conduct: the Contributor Code of Conduct. ↩

  18. I imagine in the issue tracker for the project. ↩

  19. I think he got a job with or founded a startup, and I suspect he probably moved continents. ↩

  20. In the intervening time, I ended up releasing personal forks of several Pelican plugins. The process is no longer new or scary, but still can be a fair bit of work. And that experience has given me the confidence to go forward with this fork. ↩

  21. February 16, 2018 ↩

  22. July 5, 2019; my comment, at the time ↩

  23. my (pending) codename for these releases is ⚜ Fleur-de-lis. The reference is to the Lily, a flower that is a symbol of purity and rebirth. ↩

  24. see Pull Request #1216 and Discussion #1006 ↩

  25. Issue #1065 ↩

  26. in particular, Pelican could no longer find the metadata block and instead rendered the text of each entry as if it was a code block. ↩

  27. I’m sure I wrote the code to do this, but can’t find the Pull Request at the moment. Maybe I figured the suggestion wouldn’t go anywhere. ↩

  28. Pull Request #1337 ↩

  29. https://github.com/jrnl-org/jrnl/blob/v2.6/pyproject.toml#L25 ↩

  30. perhaps because I was looking for a breaking change rather than a bug fix. ↩

  31. this comment and this one, in particular. I can’t find those exact quoted words, but that was the sentiment I was left with. ↩

  32. this comment ↩

  33. So no make. But Invoke, written in Python, works well for many of make’s use cases. ↩

  34. and still today ↩

  35. Pull Request #110, dated November 27, 2013 ↩

  36. but likely recorded in the issue tracker ↩

  37. Pull Request #258, opened July 30, 2014. ↩

September 21, 2023 07:22 PM UTC