
Planet Python

Last update: January 26, 2025 07:43 PM UTC

January 26, 2025


Real Python

Python Folium: Create Web Maps From Your Data

Folium is a Python library that lets you create interactive maps using the Leaflet JavaScript library. With Folium, you can visualize geospatial data on a map that you can share as a website. This tutorial guides you through creating an interactive choropleth map with Folium, showcasing how to bind data to GeoJSON layers and style it for intuitive viewing.

By the end of this tutorial, you’ll understand that:

  • Folium allows you to create interactive maps and save them as HTML files.
  • You can choose from different web map tiles to customize the map’s appearance.
  • Anchoring your map to a specific geolocation focuses the view on that area.
  • You can bind data to a GeoJSON layer, which enables you, for example, to create a choropleth map.
  • You can customize colors and style elements of a map to enhance data visualization.

This tutorial helps you leverage Folium to visualize data with a geographical component, enhancing insights and creating shareable reports.

You’ll build the web map shown below, which displays the ecological footprint per capita of many countries and is based on a similar map on Wikipedia. Along the way, you’ll learn the basics of using Folium for data visualization.

If you work through the tutorial, then your interactive map will look like this in the end:

Countries by raw ecological footprint per capita

In this tutorial, you’ll create HTML files that you can serve online at a static web hosting service.

An alternative workflow is to use Folium inside a Jupyter notebook. In that case, the Folium library will render your maps directly in the notebook, giving you a good opportunity to visually explore a geographical dataset or include a map in your data science report.

If you click below to download the materials associated with this tutorial, then you’ll also get a Jupyter notebook set up with the code of this tutorial. Run the notebook to see how well Folium and Jupyter can play together:

Free Source Code: Click here to download the free source code for building web maps in Python with Folium.

Take the Quiz: Test your knowledge with our interactive “Python Folium: Create Web Maps From Your Data” quiz. You’ll receive a score upon completion to help you track your learning progress.

Install Folium

To get started, create and activate a virtual environment and install folium and pandas. You can use the platform switcher below to see the relevant commands for your operating system:

Windows PowerShell
PS> python -m venv venv
PS> venv\Scripts\activate
(venv) PS> python -m pip install folium pandas
Shell
$ python -m venv venv
$ source venv/bin/activate
(venv) $ python -m pip install folium pandas

You can use many features of Folium without pandas. However, in this tutorial you’ll eventually create a choropleth map using folium.Choropleth, which takes a pandas DataFrame or Series object as one of its inputs.

Create and Style a Map

A useful and beginner-friendly feature of Folium is that you can create a map with only three lines of code. The map looks good by default because the underlying Leaflet JavaScript library works well with a number of different tile providers, which provide high-quality map backgrounds for your web maps.

Note: A tile for a web map is an image or vector data file that represents a specific geographical area. Tiled web maps seamlessly join multiple tiles to present a geographical area that’s larger than a single tile.

Additionally, the library boasts attractive default styles for map features and gives you many options to customize the map to fit your needs exactly.

Display Your Web Map Tiles—in Style!

You want to show data on a world map, so you don’t even need to worry about providing any specific geolocation yet. Open up a new Python file in your favorite text editor and create a tiled web map with three lines of code:

Python
import folium

m = folium.Map()
m.save("footprint.html")

When you run your script, Python creates a new HTML file in your working directory that displays an empty world map with the default settings provided by Folium. Open the file in your browser by double-clicking on it and take a look:
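
Folium’s Map() constructor also accepts optional arguments that change the initial view. As a minimal sketch, using illustrative coordinates that aren’t part of this tutorial, you can anchor the map to a geolocation so that the view focuses on that area:

Python
import folium

# Center the initial view on an illustrative latitude/longitude pair
# and zoom in; folium.Map() without arguments shows the whole world.
m = folium.Map(location=(50.0, 10.0), zoom_start=4)
m.save("footprint.html")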

Read the full article at https://realpython.com/python-folium-web-maps-from-data/ »



January 26, 2025 02:00 PM UTC

Lists vs Tuples in Python

Python lists and tuples are sequence data types that store ordered collections of items. While lists are mutable and ideal for dynamic, homogeneous data, tuples are immutable, making them suitable for fixed, heterogeneous data. Read on to compare tuples vs. lists.

By the end of this tutorial, you’ll understand that:

  • Lists are mutable, allowing you to modify their content, while tuples are immutable, meaning you can’t change them after creation.
  • You should prefer tuples when you need an immutable sequence, such as function return values or constant data.
  • You can create a list from a tuple using the list() constructor, which converts the tuple into a mutable list.
  • Tuples are immutable, and this characteristic supports their use in scenarios where data should remain unchanged.

In this tutorial, you’ll learn to define, manipulate, and choose between these two data structures. To get the most out of this tutorial, you should know the basics of Python programming, including how to define variables.

Get Your Code: Click here to download the free sample code that shows you how to work with lists and tuples in Python.

Take the Quiz: Test your knowledge with our interactive “Lists vs Tuples in Python” quiz. You’ll receive a score upon completion to help you track your learning progress.

Getting Started With Python Lists and Tuples

In Python, a list is a collection of arbitrary objects, somewhat akin to an array in many other programming languages but more flexible. To define a list, you typically enclose a comma-separated sequence of objects in square brackets ([]), as shown below:

Python
>>> colors = ["red", "green", "blue", "yellow"]

>>> colors
['red', 'green', 'blue', 'yellow']

In this code snippet, you define a list of colors using string objects separated by commas and enclose them in square brackets.

Similarly, tuples are also collections of arbitrary objects. To define a tuple, you’ll enclose a comma-separated sequence of objects in parentheses (()), as shown below:

Python
>>> person = ("Jane Doe", 25, "Python Developer", "Canada")

>>> person
('Jane Doe', 25, 'Python Developer', 'Canada')

In this example, you define a tuple with data for a given person, including their name, age, job, and base country.

Up to this point, it may seem that lists and tuples are mostly the same. However, there’s an important difference:

Feature                        List   Tuple
Is an ordered sequence          ✓      ✓
Can contain arbitrary objects   ✓      ✓
Can be indexed and sliced       ✓      ✓
Can be nested                   ✓      ✓
Is mutable                      ✓      ✗

Both lists and tuples are sequence data types, which means they can contain objects arranged in order. You can access those objects using an integer index that represents their position in the sequence.

Even though both data types can contain arbitrary and heterogeneous objects, you’ll commonly use lists to store homogeneous objects and tuples to store heterogeneous objects.

Note: In this tutorial, you’ll see the terms homogeneous and heterogeneous used to express the following ideas:

  • Homogeneous: Objects of the same data type or the same semantic meaning, like a series of animals, fruits, colors, and so on.
  • Heterogeneous: Objects of different data types or different semantic meanings, like the attributes of a car: model, color, make, year, fuel type, and so on.

You can perform indexing and slicing operations on both lists and tuples. You can also have nested lists and nested tuples or a combination of them, like a list of tuples.

The most notable difference between lists and tuples is that lists are mutable, while tuples are immutable. This feature distinguishes them and drives their specific use cases.

Essentially, a list doesn’t have a fixed length since it’s mutable. Therefore, it’s natural to use homogeneous elements to have some structure in the list. A tuple, on the other hand, has a fixed length so the position of elements can have meaning, supporting heterogeneous data.
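
A quick REPL sketch with the colors and person objects from above makes that difference concrete:

Python
>>> colors = ["red", "green", "blue", "yellow"]
>>> colors[0] = "crimson"  # Lists support in-place changes
>>> colors
['crimson', 'green', 'blue', 'yellow']

>>> person = ("Jane Doe", 25, "Python Developer", "Canada")
>>> person[1] = 26  # Tuples don't, so this raises an exception
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment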

Creating Lists in Python

In many situations, you’ll define a list object using a literal. A list literal is a comma-separated sequence of objects enclosed in square brackets:

Python
>>> countries = ["United States", "Canada", "Poland", "Germany", "Austria"]

>>> countries
['United States', 'Canada', 'Poland', 'Germany', 'Austria']

In this example, you create a list of countries represented by string objects. Because lists are ordered sequences, the values retain the insertion order.

Read the full article at https://realpython.com/python-lists-tuples/ »



January 26, 2025 02:00 PM UTC

Python's "in" and "not in" Operators: Check for Membership

Python’s in and not in operators allow you to quickly check if a given value is or isn’t part of a collection of values. This type of check is generally known as a membership test in Python. Therefore, these operators are known as membership operators.

By the end of this tutorial, you’ll understand that:

  • The in operator in Python is a membership operator used to check if a value is part of a collection.
  • You can write not in in Python to check if a value is absent from a collection.
  • Python’s membership operators work with several data types like lists, tuples, ranges, and dictionaries.
  • You can use operator.contains() as a function equivalent to the in operator for membership testing.
  • You can support in and not in in custom classes by implementing methods like .__contains__(), .__iter__(), or .__getitem__().

To get the most out of this tutorial, you’ll need basic knowledge of Python, including built-in data types such as lists, tuples, ranges, strings, sets, and dictionaries. You’ll also need to know about Python generators, comprehensions, and classes.

Source Code: Click here to download the free source code that you’ll use to perform membership tests in Python with in and not in.

Getting Started With Membership Tests in Python

Sometimes you need to find out whether a value is present in a collection of values or not. In other words, you need to check if a given value is or is not a member of a collection of values. This kind of check is commonly known as a membership test.

Arguably, the natural way to perform this kind of check is to iterate over the values and compare them with the target value. You can do this with the help of a for loop and a conditional statement.

Consider the following is_member() function:

Python
>>> def is_member(value, iterable):
...     for item in iterable:
...         if value is item or value == item:
...             return True
...     return False
...

This function takes two arguments, the target value and a collection of values, which is generically called iterable. The loop iterates over iterable while the conditional statement checks if the target value is equal to the current value. Note that the condition checks for object identity with is or for value equality with the equality operator (==). These are slightly different but complementary tests.

If the condition is true, then the function returns True, breaking out of the loop. This early return short-circuits the loop operation. If the loop finishes without any match, then the function returns False:

Python
>>> is_member(5, [2, 3, 5, 9, 7])
True

>>> is_member(8, [2, 3, 5, 9, 7])
False

The first call to is_member() returns True because the target value, 5, is a member of the list at hand, [2, 3, 5, 9, 7]. The second call to the function returns False because 8 isn’t present in the input list of values.

Membership tests like the ones above are so common and useful in programming that Python has dedicated operators to perform these types of checks. You can get to know the membership operators in the following table:

Operator | Description                                                                                                   | Syntax
in       | Returns True if the target value is present in a collection of values. Otherwise, it returns False.          | value in collection
not in   | Returns True if the target value is not present in a given collection of values. Otherwise, it returns False. | value not in collection

As with Boolean operators, Python favors readability by using common English words instead of potentially confusing symbols as operators.

Note: Don’t confuse the in keyword when it works as the membership operator with the in keyword in the for loop syntax. They have entirely different meanings. The in operator checks if a value is in a collection of values, while the in keyword in a for loop indicates the iterable that you want to draw from.

Like many other operators, in and not in are binary operators. That means you can create expressions by connecting two operands. In this case, those are:

  1. Left operand: The target value that you want to look for in a collection of values
  2. Right operand: The collection of values where the target value may be found

The syntax of a membership test looks something like this:

Python Syntax
value in collection

value not in collection

In these expressions, value can be any Python object. Meanwhile, collection can be any data type that can hold collections of values, including lists, tuples, strings, sets, and dictionaries. It can also be a class that implements the .__contains__() method or a user-defined class that explicitly supports membership tests or iteration.

If you use the in and not in operators correctly, then the expressions that you build with them will always evaluate to a Boolean value. In other words, those expressions will always return either True or False. On the other hand, if you try to find a value in something that doesn’t support membership tests, then you’ll get a TypeError. Later, you’ll learn more about the Python data types that support membership tests.
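
Here’s a brief sketch of both outcomes, a Boolean result and the TypeError you get from an object that doesn’t support membership tests:

Python
>>> 5 in [2, 3, 5, 9, 7]
True
>>> 8 not in [2, 3, 5, 9, 7]
True

>>> 5 in 42  # Integers don't support membership tests
Traceback (most recent call last):
  ...
TypeError: argument of type 'int' is not iterable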

Now that you know what membership operators are, it’s time to learn the basics of how they work.

Read the full article at https://realpython.com/python-in-operator/ »



January 26, 2025 02:00 PM UTC

Python's Mutable vs Immutable Types: What's the Difference?

Python’s mutable objects, such as lists and dictionaries, allow you to change their value or data directly without affecting their identity. In contrast, immutable objects, like tuples and strings, don’t allow in-place modifications. Instead, you’ll need to create new objects of the same type with different values.

By the end of this tutorial, you’ll understand that:

  • The difference between mutable and immutable objects is that mutable objects can be modified, while immutable objects can’t be altered once created.
  • Python lists are mutable, allowing you to change, add, or remove elements.
  • Strings in Python are immutable, meaning you can’t change their content after creation.
  • To check if an object is mutable, you can try altering its contents. If you succeed without an error, it’s mutable.

In Python, mutability is a characteristic that may profoundly influence your decision when choosing which data type to use for solving a programming problem. Therefore, you need to know how mutable and immutable objects work in Python.

To dive smoothly into this fundamental Python topic, you should be familiar with how variables work in Python. You should also know the basics of Python’s built-in data types, such as numbers, strings, tuples, lists, dictionaries, sets, and others. Finally, knowing how object-oriented programming works in Python is also a good starting point.

Free Sample Code: Click here to download the free sample code that you’ll use to explore mutable vs immutable data types in Python.

Mutability vs Immutability

In programming, you have an immutable object if you can’t change the object’s state after you’ve created it. In contrast, a mutable object allows you to modify its internal state after creation. In short, whether you’re able to change an object’s state or contained data is what defines if that object is mutable or immutable.

Immutable objects are common in functional programming, while mutable objects are widely used in object-oriented programming. Because Python is a multiparadigm programming language, it provides mutable and immutable objects for you to choose from when solving a problem.

To understand how mutable and immutable objects work in Python, you first need to understand a few related concepts. To kick things off, you’ll take a look at variables and objects.

Variables and Objects

In Python, variables don’t have an associated type or size, as they’re labels attached to objects in memory. They point to the memory position where concrete objects live. In other words, a Python variable is a name that refers to or holds a reference to a concrete object. In contrast, Python objects are concrete pieces of information that live in specific memory positions on your computer.

The main takeaway here is that variables and objects are two different animals in Python:

  • Variables hold references to objects.
  • Objects live in concrete memory positions.

Both concepts are independent of each other. However, they’re closely related. Once you’ve created a variable with an assignment statement, then you can access the referenced object throughout your code by using the variable name. If the referenced object is mutable, then you can also perform mutations on it through the variable. Mutability or immutability is intrinsic to objects rather than to variables.

However, if the referenced object is immutable, then you won’t be able to change its internal state or contained data. You’ll just be able to make your variable reference a different object that, in Python, may or may not be of the same type as your original object.
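
A minimal REPL sketch illustrates the distinction. Mutating a list preserves the object’s identity, while concatenating to a tuple rebinds the variable to a brand-new object:

Python
>>> numbers = [1, 2, 3]
>>> before = id(numbers)
>>> numbers.append(4)  # In-place mutation: same object
>>> id(numbers) == before
True

>>> point = (1, 2)
>>> before = id(point)
>>> point += (3,)  # Rebinds the name to a new tuple object
>>> id(point) == before
False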

If you don’t have a reference (variable) to an object, then you can’t access that object in your code. If you lose or remove all the references to a given object, then Python will garbage-collect that object, freeing the memory for later use.

Now that you know that there are differences between variables and objects, you need to learn that all Python objects have three core properties: identity, type, and value.

Objects, Value, Identity, and Type

In Python, everything is an object. For example, numbers, strings, functions, classes, and modules are all objects. Every Python object has three core characteristics that define it at a foundational level. These characteristics are:

  1. Value
  2. Identity
  3. Type

Arguably, the value is the most familiar object characteristic that you’ve dealt with. An object’s value consists of the concrete piece or pieces of data contained in the object itself. A classic example is a numeric value like an integer or floating-point number:

Python
>>> 42
42
>>> isinstance(42, int)
True

>>> 3.14
3.14
>>> isinstance(3.14, float)
True

These numeric values, 42 and 3.14, are both objects. The first number is an instance of the built-in int class, while the second is an instance of float. In both examples, you confirm the object’s type using the built-in isinstance() function.

Note: In this tutorial, you’ll use the term value to refer to an object’s data. Your objects will have custom and meaningful values only if you add those values yourself. Sometimes, you’ll create or find objects that don’t have meaningful values. However, all Python objects contain built-in attributes that also hold data. So, in a way, each Python object has an implicit value.

For example, all Python objects will have special methods and attributes that the language adds automatically under the hood:

Python
>>> obj = object()
>>> dir(obj)
[
    '__class__',
    '__delattr__',
    ...
    '__str__',
    '__subclasshook__'
]

In this example, you created a generic object using the built-in object class. You haven’t added any attributes to the object, so from your programming perspective, it doesn’t have a meaningful value. You may think it only has identity and type, which you’ll explore in just a moment.

However, because all objects have built-in attributes, you can mutate them by adding new attributes and data dynamically, as you’ll learn in the Mutability in Custom Classes section of this tutorial.
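
Instances of object itself don’t accept new attributes, but instances of a bare custom class do. Here’s a minimal sketch of that kind of dynamic mutation, using a hypothetical Record class:

Python
>>> class Record:
...     pass
...
>>> entry = Record()
>>> entry.name = "Jane Doe"  # Add a new attribute dynamically
>>> entry.name
'Jane Doe'
>>> vars(entry)  # The new data lives in the instance's .__dict__
{'name': 'Jane Doe'}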

Read the full article at https://realpython.com/python-mutable-vs-immutable-types/ »



January 26, 2025 02:00 PM UTC

Python's zipfile: Manipulate Your ZIP Files Efficiently

Python’s zipfile module allows you to efficiently manipulate ZIP files, a standard format for compressing and archiving data. With this module, you can create, read, write, extract, and list files within ZIP archives.

By the end of this tutorial, you’ll understand that:

  • Python’s zipfile module lets you create, read, and modify ZIP files.
  • You can extract specific files or all contents from a ZIP archive using zipfile.
  • Reading metadata about ZIP contents is straightforward with zipfile.
  • You can compress files in ZIP archives using different algorithms.
  • PyZipFile allows you to bundle Python modules and packages into ZIP files for distribution.

This tutorial teaches you how to handle ZIP files using Python’s zipfile, making file management and data exchange over networks more efficient. Understanding these concepts will enhance your ability to work with compressed files in Python, optimizing both data storage and transfer.

To get the most out of this tutorial, you should know the basics of working with files, using the with statement, handling file system paths with pathlib, and working with classes and object-oriented programming.

To get the files and archives that you’ll use to code the examples in this tutorial, click the link below:

Get Materials: Click here to get a copy of the files and archives that you’ll use to run the examples in this zipfile tutorial.

Getting Started With ZIP Files

ZIP files are a well-known and popular tool in today’s digital world. They’re widely used for cross-platform data exchange over computer networks, notably the Internet.

You can use ZIP files for bundling regular files together into a single archive, compressing your data to save some disk space, distributing your digital products, and more. In this tutorial, you’ll learn how to manipulate ZIP files using Python’s zipfile module.

Because the terminology around ZIP files can be confusing at times, this tutorial will stick to the following conventions regarding terminology:

Term                              | Meaning
ZIP file, ZIP archive, or archive | A physical file that uses the ZIP file format
File                              | A regular computer file
Member file                       | A file that is part of an existing ZIP file

Having these terms clear in your mind will help you avoid confusion while you read through the upcoming sections. Now you’re ready to continue learning how to manipulate ZIP files efficiently in your Python code!

What Is a ZIP File?

You’ve probably already encountered and worked with ZIP files. Yes, those with the .zip file extension are everywhere! ZIP files, also known as ZIP archives, are files that use the ZIP file format.

PKWARE is the company that created and first implemented this file format. The company put together and maintains the current format specification, which is publicly available and allows the creation of products, programs, and processes that read and write files using the ZIP file format.

The ZIP file format is a cross-platform, interoperable file storage and transfer format. It combines lossless data compression, file management, and data encryption.

Data compression isn’t a requirement for an archive to be considered a ZIP file. So you can have compressed or uncompressed member files in your ZIP archives. The ZIP file format supports several compression algorithms, though Deflate is the most common. The format also supports information integrity checks with CRC32.

Even though there are other similar archiving formats, such as RAR and TAR files, the ZIP file format has quickly become a common standard for efficient data storage and for data exchange over computer networks.

ZIP files are everywhere. For example, office suites such as Microsoft Office and Libre Office rely on the ZIP file format as their document container file. This means that .docx, .xlsx, .pptx, .odt, .ods, and .odp files are actually ZIP archives containing several files and folders that make up each document. Other common files that use the ZIP format include .jar, .war, and .epub files.

You may be familiar with GitHub, which provides web hosting for software development and version control using Git. GitHub uses ZIP files to package software projects when you download them to your local computer. For example, you can download the exercise solutions for the Python Basics: A Practical Introduction to Python 3 book in a ZIP file, or you can download any other project of your choice.

ZIP files allow you to aggregate, compress, and encrypt files into a single interoperable and portable container. You can stream ZIP files, split them into segments, make them self-extracting, and more.

Why Use ZIP Files?

Knowing how to create, read, write, and extract ZIP files can be a useful skill for developers and professionals who work with computers and digital information. Among other benefits, ZIP files allow you to:

  • Reduce the size of files and their storage requirements without losing information
  • Improve transfer speed over the network due to reduced size and single-file transfer
  • Pack several related files together into a single archive for efficient management
  • Bundle your code into a single archive for distribution purposes
  • Secure your data by using encryption, which is a common requirement nowadays
  • Guarantee the integrity of your information to avoid accidental and malicious changes to your data
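
As a minimal sketch of what’s ahead, assuming hypothetical archive and member names, here’s how creating and inspecting an archive looks with zipfile:

Python
import zipfile

# Create an archive with two compressed member files.
with zipfile.ZipFile("bundle.zip", mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("hello.txt", "Hello, World!")
    archive.writestr("data/numbers.csv", "1,2,3\n")

# Reopen the archive and list its member files.
with zipfile.ZipFile("bundle.zip") as archive:
    print(archive.namelist())  # ['hello.txt', 'data/numbers.csv']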

Read the full article at https://realpython.com/python-zipfile/ »



January 26, 2025 02:00 PM UTC

January 25, 2025


Real Python

Python's raise: Effectively Raising Exceptions in Your Code

When you use the raise statement in Python to raise (or throw) an exception, you signal an error or an unusual condition in your program. With raise, you can trigger both built-in and custom exceptions. You can also include custom messages for more clarity or even re-raise exceptions to add context or handle further processing.

By the end of this tutorial, you’ll understand that:

  • Raising an exception in Python signals an error condition, halting normal program flow.
  • You use raise to initiate exceptions for error handling or to propagate existing exceptions.
  • You can raise custom exceptions by defining new exception classes derived from Exception.
  • The difference between raise and assert lies in their use. You use assert for debugging, while raise is used to signal runtime errors.
  • You can re-raise an exception by using a bare raise within an except block to preserve the original traceback.

Learning about the raise statement will allow you to handle errors and exceptional situations effectively in your code. By knowing how to use it, you’ll be able to develop more robust programs and improve the quality of your code.

To get the most out of this tutorial, you should understand the fundamentals of Python, including variables, data types, conditional statements, exception handling, and classes.

Free Bonus: Click here to download the sample code that you’ll use to gracefully raise and handle exceptions in your Python code.

Take the Quiz: Test your knowledge with our interactive “Python's raise: Effectively Raising Exceptions in Your Code” quiz. You’ll receive a score upon completion to help you track your learning progress.

Handling Exceptional Situations in Python

Exceptions play a fundamental role in Python. They allow you to handle errors and exceptional situations in your code. But what is an exception? An exception represents an error or indicates that something is going wrong. Some programming languages, such as C and Go, encourage you to return error codes, which you check. In contrast, Python encourages you to raise exceptions, which you handle.

Note: In Python, not all exceptions are errors. The built-in StopIteration exception is an excellent example of this. Python internally uses this exception to terminate the iteration over iterators. Python exceptions that represent errors have the Error suffix attached to their names.

Python also has a specific category of exceptions that represent warnings to the programmer. Warnings come in handy when you need to alert the user of some condition in a program. However, that condition may not warrant raising an exception and terminating the program. A common example of a warning is DeprecationWarning, which appears when you use deprecated features.

When a problem occurs in a program, Python automatically raises an exception. For example, watch what happens if you try to access a nonexistent index in a list object:

Python
>>> colors = [
...     "red",
...     "orange",
...     "yellow",
...     "green",
...     "blue",
...     "indigo",
...     "violet"
... ]

>>> colors[10]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

In this example, your colors list doesn’t have a 10 index. Its indices go from 0 to 6, covering your seven colors. So, if you try to get index number 10, then you get an IndexError exception telling you that your target index is out of range.

Note: In the example above, Python raised the exception on its own, which it’ll only do with built-in exceptions. You, as a programmer, have the option to raise built-in or custom exceptions, as you’ll learn in the section Choosing the Exception to Raise: Built-in vs Custom.

Every raised exception has a traceback, also known as a stack trace, stack traceback, or backtrace, among other names. A traceback is a report containing the sequence of calls and operations that traces down to the current exception.

In Python, the traceback header is Traceback (most recent call last) in most situations. Then you’ll have the actual call stack and the exception name followed by its error message.

Note: Since the introduction of the new PEG parser in Python 3.9, there’s been an ongoing effort to make the error messages in tracebacks more helpful and specific. This effort continues to produce new results, and Python 3.12 and Python 3.13 have incorporated even better error messages.

Exceptions will cause your program to terminate unless you handle them using a try … except block:

Python
>>> try:
...     colors[10]
... except IndexError:
...     print("Your list doesn't have that index :-(")
...
Your list doesn't have that index :-(

The first step in handling an exception is to predict which exceptions can happen. If you don’t do that, then you can’t handle the exceptions, and your program will crash. In that situation, Python will print the exception traceback so that you can figure out how to fix the problem. Sometimes, you must let the program fail in order to discover the exceptions that it raises.

In the above example, you know beforehand that indexing a list with an index beyond its range will raise an IndexError exception. So, you’re ready to catch and handle that specific exception. The try block takes care of catching exceptions. The except clause specifies the exception that you’re predicting, and the except code block allows you to take action accordingly. The whole process is known as exception handling.

If your code raises an exception in a function but doesn’t handle it there, then the exception propagates to where you called the function. If your code doesn’t handle it there either, then it continues propagating until it reaches the main program. If there’s no exception handler there, then the program halts with an exception traceback.

Exceptions are everywhere in Python. Virtually every module in the standard library uses them. Python will raise exceptions in many different circumstances. The Python documentation states that:

Exceptions are a means of breaking out of the normal flow of control of a code block in order to handle errors or other exceptional conditions. An exception is raised at the point where the error is detected; it may be handled by the surrounding code block or by any code block that directly or indirectly invoked the code block where the error occurred. (Source)

In short, Python automatically raises exceptions when an error occurs during a program’s execution. Python also allows you to raise exceptions on demand using the raise keyword. This keyword lets you handle your program’s errors in a more controlled manner.
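
As a small preview, with a hypothetical validate_age() function that isn’t part of this tutorial, raising an exception on demand looks like this:

Python
>>> def validate_age(age):
...     if age < 0:
...         raise ValueError(f"age must be non-negative, got {age}")
...     return age
...
>>> validate_age(25)
25
>>> validate_age(-1)
Traceback (most recent call last):
  ...
ValueError: age must be non-negative, got -1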

Read the full article at https://realpython.com/python-raise-exception/ »



January 25, 2025 02:00 PM UTC

Executing Python Scripts With a Shebang

In shell scripts, the shebang line (#!) specifies the path to the interpreter that should execute the file. You can place it at the top of your Python file to tell the shell how to run your script, allowing you to execute the script directly without typing python before the script name. The shebang is essential for Unix-like systems but ignored on Windows unless using specific compatibility layers.

By the end of this tutorial, you’ll understand that:

  • A shebang specifies the path to the Python interpreter in scripts, allowing direct execution.
  • You should include a shebang when a script needs direct execution, but not in import-only modules.
  • Best practices for shebangs include using /usr/bin/env for portability and ensuring the script is executable.
  • Shebangs have limitations, such as being ignored on Windows without compatibility layers like WSL.

To proceed, you should have basic familiarity with the command line and know how to run Python scripts from it. You can also download the supporting materials for this tutorial to follow along with the code examples:

Free Sample Code: Click here to download the free sample code that you’ll use to execute Python scripts with a shebang.

What’s a Shebang, and When Should You Use It?

In short, a shebang is a special kind of comment that you may include in your source code to tell the operating system’s shell where to find the interpreter for the rest of the file:

Python
#!/usr/bin/python3

print("Hello, World!")

If you’re using a shebang, it must appear on the first line in your script, and it has to start with a hash sign (#) followed by an exclamation mark (!), colloquially known as the bang, hence the name shebang. The choice of the hash sign to begin this special sequence of characters wasn’t accidental, as many scripting languages use it for inline comments.

Make sure you don’t put any other comments before the shebang line, or else it won’t be recognized. After the exclamation mark, specify an absolute path to the relevant code interpreter, such as Python. Providing a relative path will have no effect, unfortunately.

Note: The shebang is only recognized by shells, such as Z shell or Bash, running on Unix-like operating systems, including macOS and Linux distributions. It bears no particular meaning in the Windows terminal, which treats the shebang as an ordinary comment by ignoring it.

You can get the shebang to work on Windows by installing the Windows Subsystem for Linux (WSL) that comes with a Unix shell. Alternatively, Windows lets you make a global file association between a file extension like .py and a program, such as the Python interpreter, to achieve a similar effect.

It’s not uncommon to combine a shebang with the name-main idiom, which prevents the main block of code from running when someone imports the file from another module:

Python
#!/usr/bin/python3

if __name__ == "__main__":
    print("Hello, World!")

With this conditional statement, Python will call the print() function only when you run this module directly as a script—for example, by providing its path to the Python interpreter:

Shell
$ python3 /path/to/your/script.py
Hello, World!

As long as the script’s content starts with a correctly defined shebang line and your system user has permission to execute the corresponding file, you can omit the python3 command to run that script:

Shell
$ /path/to/your/script.py
Hello, World!

A shebang is only relevant to runnable scripts that you wish to execute without explicitly specifying the program to run them through. You wouldn’t typically put a shebang in a Python module that only contains function and class definitions meant for importing from other modules. Therefore, use a shebang when you want to run your Python script without prefixing it with python or python3.
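
As a sketch of the more portable variant, the /usr/bin/env form looks up python3 on your PATH instead of hard-coding the interpreter’s location:

Python
#!/usr/bin/env python3

# env searches the PATH for python3, so this script keeps working on
# systems where the interpreter lives somewhere other than /usr/bin.
if __name__ == "__main__":
    print("Hello, World!")

After marking the file as executable, you can run it directly:

Shell
$ chmod +x /path/to/your/script.py
$ /path/to/your/script.py
Hello, World!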

Note: In the old days of Python, the shebang line would sometimes appear alongside another specially formatted comment described in PEP 263:

Python
#!/usr/bin/python3
# -*- coding: utf-8 -*-

if __name__ == "__main__":
    print("Grüß Gott")

The highlighted line used to be necessary to tell the interpreter which character encoding it should use to read your source code correctly, as Python defaulted to ASCII. However, this was only important when you directly embedded non-Latin characters, such as ü or ß, in your code.

This special comment is irrelevant today because modern Python versions use the universal UTF-8 encoding, which can handle such characters with ease. Nevertheless, it’s always preferable to replace tricky characters with their encoded representations using Unicode literals:

Python
>>> "Grüß Gott".encode("unicode_escape")
b'Gr\\xfc\\xdf Gott'

Your foreign colleagues who have different keyboard layouts will thank you for that!

Now that you have a high-level understanding of what a shebang is and when to use it, you’re ready to explore it in more detail. In the next section, you’ll take a closer look at how it works.

How Does a Shebang Work?

Normally, to run a program in the terminal, you must provide the full path to a particular binary executable or the name of a command present in one of the directories listed on the PATH environment variable. One or more command-line arguments may follow this path or command:

Shell
$ /usr/bin/python3 -c 'print("Hello, World!")'
Hello, World!

$ python3 -c 'print("Hello, World!")'
Hello, World!

Here, you run the Python interpreter in a non-interactive mode against a one-liner program passed through the -c option. In the first case, you provide an absolute path to python3, while in the second case, you rely on the fact that the parent folder, /usr/bin/, is included on the search path by default. Your shell can find the Python executable, even if you don’t provide the full path, by looking through the directories on the PATH variable.

Note: If multiple commands with the same name exist in more than one directory listed on the PATH variable, then your shell will execute the first one it can find. As a result, the outcome of running a command without explicitly specifying the corresponding path may sometimes be surprising. It’ll depend on the order of directories in your PATH variable. However, this can be useful, as you’ll find out later.

Read the full article at https://realpython.com/python-shebang/ »



January 25, 2025 02:00 PM UTC

How to Flush the Output of the Python Print Function

Python’s flush parameter in the print() function allows you to control when to empty the output data buffer, ensuring your output appears immediately. This is useful when building visual progress indicators or when piping logs to another application in real-time.

By default, print() output is line-buffered in interactive environments and block-buffered otherwise. You can override this behavior by using the flush=True parameter to force the buffer to clear immediately.

By the end of this tutorial, you’ll understand that:

  • Flush in coding refers to emptying the data buffer to ensure immediate output.
  • flush=True in print() forces the buffer to clear immediately.
  • Clearing output involves managing buffer behavior, typically with newlines.
  • Turning off print buffering can be achieved using the -u option or PYTHONUNBUFFERED.

By repeatedly running a short code snippet that you change only slightly, you’ll see that if you run print() with its default arguments, then its execution is line-buffered in interactive mode, and block-buffered otherwise.

You’ll get a feel for what all of that means by exploring the code practically. But before you dive into changing output stream buffering in Python, it’s helpful to revisit how it happens by default, and understand why you might want to change it.

Free Sample Code: Click here to download the free sample code that you’ll use to dive deep into flushing the output of the Python print function.

Understand How Python Buffers Output

When you make a write call to a file-like object, Python buffers the call by default—and that’s a good idea! Disk write and read operations are slow in comparison to random-access memory (RAM) access. When your script makes fewer system calls for write operations by batching characters in a RAM data buffer and writing them all at once to disk with a single system call, then you can save a lot of time.

To put the use case for buffering into a real-world context, think of traffic lights as buffers for car traffic. If every car crossed an intersection immediately upon arrival, it would end in gridlock. That’s why the traffic lights buffer traffic from one direction while the other direction flushes.

Note: Data buffers are generally size-based, not time-based, which is where the traffic analogy breaks down. In the context of a data buffer, the traffic lights would switch if a certain number of cars were queued up and waiting.

However, there are situations when you don’t want to wait for a data buffer to fill up before it flushes. Imagine that there’s an ambulance that needs to get past the crossroads as quickly as possible. You don’t want it to wait at the traffic lights until there’s a certain number of cars queued up.

In your program, you usually want to flush the data buffer right away when you need real-time feedback on code that has executed. Here are a couple of use cases for immediate flushing:

  • Instant feedback: In an interactive environment, such as a Python REPL or a situation where your Python script writes to a terminal

  • File monitoring: In a situation where you’re writing to a file-like object, and the output of the write operation gets read by another program while your script is still executing—for example, when you’re monitoring a log file

In both cases, you need to read the generated output as soon as it’s generated, and not only after enough output has accumulated to flush the data buffer.

There are many situations where buffering is helpful, and there are some situations where too much buffering can be a disadvantage. Therefore, there are different types of data buffering that you can implement where they fit best:

  • Unbuffered means that there’s no data buffer. Every byte creates a new system call and gets written independently.

  • Line-buffered means that there’s a data buffer that collects information in memory, and once it encounters a newline character (\n), the data buffer flushes and writes the whole line in one system call.

  • Fully-buffered (block-buffered) means that there’s a data buffer of a specific size, which collects all the information that you want to write. Once it’s full, it flushes and sends all its contents onward in a single system call.

Python uses block buffering as a default when writing to file-like objects. However, it executes line-buffered if you’re writing to an interactive environment.

To better understand what that means, write a Python script that simulates a countdown:

Python countdown.py
from time import sleep

for second in range(3, 0, -1):
    print(second)
    sleep(1)
print("Go!")

By default, each number shows up right when print() is called in the script. But as you develop and tweak your countdown timer, you might run into a situation where all your output gets buffered. Buffering the whole countdown and printing it all at once when the script finishes would lead to a lot of confusion for the athletes waiting at the start line!
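
Anticipating the flush parameter that this tutorial covers, one small tweak to the countdown script forces each number out immediately, regardless of how the output stream is buffered:

Python countdown.py
from time import sleep

for second in range(3, 0, -1):
    print(second, flush=True)  # Empty the data buffer after each write
    sleep(1)
print("Go!", flush=True)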

So how can you make sure that you won’t run into data buffering issues as you develop your Python script?

Add a Newline for Python to Flush Print Output

If you’re running a code snippet in a Python REPL or executing it as a script directly with your Python interpreter, then you won’t run into any issues with the script shown above.

In an interactive environment, the standard output stream is line-buffered. This is the output stream that print() writes to by default. You’re working with an interactive environment any time that your output will display in a terminal. In this case, the data buffer flushes automatically when it encounters a newline character ("\n"):

Read the full article at https://realpython.com/python-flush-print-output/ »



January 25, 2025 02:00 PM UTC

How to Download Files From URLs With Python

Python makes it straightforward to download files from a URL with its robust set of libraries. For quick tasks, you can use the built-in urllib module or the requests library to fetch and save files. When working with large files, streaming data in chunks can help save memory and improve performance.

You can also perform parallel file downloads using ThreadPoolExecutor for multithreading or the aiohttp library for asynchronous tasks. These approaches allow you to handle multiple downloads concurrently, significantly reducing the total download time if you’re handling many files.

By the end of this tutorial, you’ll understand that:

  • You can use Python to download files with libraries like urllib and requests.
  • To download a file using a URL in Python, you can use urlretrieve() or requests.get().
  • To extract data from a URL in Python, you use the response object from requests.
  • To download a CSV file from a URL in Python, you may need to specify the format in the URL or query parameters.

In this tutorial, you’ll be downloading a range of economic data from the World Bank Open Data platform. To get started on this example project, go ahead and grab the sample code below:

Free Bonus: Click here to download your sample code for downloading files from the Web with Python.

Facilitating File Downloads With Python

While it’s possible to download files from URLs using traditional command-line tools, Python provides several libraries that facilitate file retrieval. Using Python to download files offers several advantages.

One advantage is flexibility, as Python has a rich ecosystem of libraries, including ones that offer efficient ways to handle different file formats, protocols, and authentication methods. You can choose the most suitable Python tools to accomplish the task at hand and fulfill your specific requirements, whether you’re downloading from a plain-text CSV file or a complex binary file.

Another reason is portability. You may encounter situations where you’re working on cross-platform applications. In such cases, using Python is a good choice because it’s a cross-platform programming language. This means that Python code can run consistently across different operating systems, such as Windows, Linux, and macOS.

Using Python also offers the possibility of automating your processes, saving you time and effort. Some examples include automating retries if a download fails, retrieving and saving multiple files from URLs, and processing and storing your data in designated locations.

These are just a few reasons why downloading files using Python is better than using traditional command-line tools. Depending on your project requirements, you can choose the approach and library that best suits your needs. In this tutorial, you’ll learn approaches to some common scenarios requiring file retrievals.

Downloading a File From a URL in Python

In this section, you’ll learn the basics of downloading a ZIP file containing gross domestic product (GDP) data from the World Bank Open Data platform. You’ll use two common tools in Python, urllib and requests, to download GDP by country.

While the urllib package comes with Python in its standard library, it has some limitations. So, you’ll also learn to use a popular third-party library, requests, that offers more features for making HTTP requests. Later in the tutorial, you’ll see additional functionalities and use cases.
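
As a hedged preview of the requests approach, with a placeholder URL rather than the World Bank endpoint used later, a basic download boils down to a GET request and a binary write:

Python
import requests

response = requests.get("https://example.com/data.zip", timeout=10)
response.raise_for_status()  # Raise an exception for HTTP error codes

with open("data.zip", mode="wb") as file:
    file.write(response.content)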

Using urllib From the Standard Library

Python ships with a package called urllib, which provides a convenient way to interact with web resources. It has a straightforward and user-friendly interface, making it suitable for quick prototyping and smaller projects. With urllib, you can perform different tasks dealing with network communication, such as parsing URLs, sending HTTP requests, downloading files, and handling errors related to network operations.

As a standard library package, urllib has no external dependencies and doesn’t require installing additional packages, making it a convenient choice. For the same reason, it’s readily accessible for development and deployment. It’s also cross-platform compatible, meaning you can write and run code seamlessly using the urllib package across different operating systems without additional dependencies or configuration.

The urllib package is also very versatile. It integrates well with other modules in the Python standard library, such as re for building and manipulating regular expressions, as well as json for working with JSON data. The latter is particularly handy when you need to consume JSON APIs.

In addition, you can extend the urllib package and use it with other third-party libraries, like requests, BeautifulSoup, and Scrapy. This offers the possibility for more advanced operations in web scraping and interacting with web APIs.

To download a file from a URL using the urllib package, you can call urlretrieve() from the urllib.request module. This function fetches a web resource from the specified URL and then saves the response to a local file. To start, import urlretrieve() from urllib.request:

Python
>>> from urllib.request import urlretrieve

Next, define the URL that you want to retrieve data from. If you don’t specify a path to a local file where you want to save the data, then the function will create a temporary file for you. Since you know that you’ll be downloading a ZIP file from that URL, go ahead and provide an optional path to the target file:

Python
>>> url = (
...     "https://api.worldbank.org/v2/en/indicator/"
...     "NY.GDP.MKTP.CD?downloadformat=csv"
... )
>>> filename = "gdp_by_country.zip"

Because your URL is quite long, you rely on Python’s implicit concatenation by splitting the string literal over multiple lines inside parentheses. The Python interpreter will automatically join the separate strings on different lines into a single string. You also define the location where you wish to save the file. When you only provide a filename without a path, Python will save the resulting file in your current working directory.
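
With the URL and target filename in place, the remaining step is to call urlretrieve(), which returns the path to the saved file along with the response headers. A minimal sketch of that call:

Python
>>> urlretrieve(url, filename)
('gdp_by_country.zip', <http.client.HTTPMessage object at 0x...>)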

Read the full article at https://realpython.com/python-download-file-from-url/ »



January 25, 2025 02:00 PM UTC

Python and TOML: New Best Friends

TOML stands for Tom’s Obvious Minimal Language. Its human-readable syntax makes TOML convenient to parse into data structures across various programming languages. In Python, you can use the built-in tomllib module to work with TOML files. TOML plays an essential role in the Python ecosystem. Many of your favorite tools rely on TOML for configuration, and you’ll use pyproject.toml when you build and distribute your own packages.

By the end of this tutorial, you’ll understand that:

  • TOML in Python refers to a minimal configuration file format that’s convenient to read and parse.
  • TOML files are primarily used for configuration, separating code from settings for flexibility.
  • pyproject.toml is crucial for package configuration and specifies the build system and dependencies.
  • Loading a TOML file in Python involves using tomli or tomllib to parse it into a dictionary.
  • tomli and tomllib differ mainly in origin, with tomllib being a standard library module in modern Python.

If you want to know more about why tomllib was added to Python, then have a look at the companion tutorial, Python 3.11 Preview: TOML and tomllib.

Free Download: Get a sample chapter from Python Tricks: The Book that shows you Python’s best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.

Use TOML as a Configuration Format

TOML is short for Tom’s Obvious Minimal Language and is humbly named after its creator, Tom Preston-Werner. It was designed expressly to be a configuration file format that should be “easy to parse into data structures in a wide variety of languages” (Source).

In this section, you’ll start thinking about configuration files and look at what TOML brings to the table.

Configurations and Configuration Files

A configuration is an important part of almost any application or system. It’ll allow you to change settings or behavior without changing the source code. Sometimes you’ll use a configuration to specify information needed to connect to another service like a database or cloud storage. Other times you’ll use configuration settings to allow your users to customize their experience with your project.

Using a configuration file for your project is a good way to separate your code from its settings. It also encourages you to be conscious about which parts of your system are genuinely configurable, giving you a tool to name magic values in your source code. For now, consider this configuration file for a hypothetical tic-tac-toe game:

Config File
player_x_color = blue
player_o_color = green
board_size     = 3
server_url     = https://tictactoe.example.com/

You could potentially code this directly in your source code. However, by moving the settings into a separate file, you achieve a few things:

  • You give explicit names to values.
  • You make these values more visible.
  • You make it simpler to change the values.

Look more closely at your hypothetical configuration file. Those values are conceptually different. The colors are values that your framework probably supports changing. In other words, if you replaced blue with red, that would be honored without any special handling in your code. You could even consider whether it's worth exposing this configuration to your end users through your front end.

However, the board size may or may not be configurable. A tic-tac-toe game is played on a three-by-three grid. It’s not certain that your logic would still work for other board sizes. It may still make sense to keep the value in your configuration file, both to give a name to the value and to make it visible.

Finally, the project URL is usually essential when deploying your application. It’s not something that a typical user will change, but a power user may want to redeploy your game to a different server.

To be more explicit about these different use cases, you may want to add some organization to your configuration. One popular option is to separate your configuration into additional files, each dealing with a different concern. Another option is to group your configuration values somehow. For example, you can organize your hypothetical configuration file as follows:

Config File
[user]
player_x_color = blue
player_o_color = green

[constant]
board_size = 3

[server]
url = https://tictactoe.example.com

The organization of the file makes the role of each configuration item clearer. You can also add comments to the configuration file with instructions to anyone thinking about making changes to it.

Note: The actual format of your configuration file isn’t important for this discussion. The above principles hold independently of how you specify your configuration values. As it happens, the examples that you’ve seen so far can be parsed by Python’s ConfigParser class.
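
For instance, here's a minimal sketch of reading the sectioned file above with ConfigParser, assuming it's saved under the hypothetical name tic_tac_toe.ini:

from configparser import ConfigParser

parser = ConfigParser()
parser.read("tic_tac_toe.ini")

print(parser["user"]["player_x_color"])  # blue
print(parser["server"]["url"])           # https://tictactoe.example.com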

There are many ways for you to specify a configuration. Windows has traditionally used INI files, which resemble your configuration file from above. Unix systems have also relied on plain-text, human-readable configuration files, although the actual format varies between different services.

Over time, more and more applications have come to use well-defined formats like XML, JSON, or YAML for their configuration needs. These formats were designed as data interchange or serialization formats, usually meant for computer communication.

On the other hand, configuration files are often written or edited by humans. Many developers have gotten frustrated with JSON’s strict comma rules when updating their Visual Studio Code settings or with YAML’s nested indentations when setting up a cloud service. Despite their ubiquity, these file formats aren’t the easiest to write by hand.

TOML: Tom’s Obvious Minimal Language

Read the full article at https://realpython.com/python-toml/ »



January 25, 2025 02:00 PM UTC

January 24, 2025


Real Python

The Real Python Podcast – Episode #236: Simon Willison: Using LLMs for Python Development

What are the current large language model (LLM) tools you can use to develop Python? What prompting techniques and strategies produce better results? This week on the show, we speak with Simon Willison about his LLM research and his exploration of writing Python code with these rapidly evolving tools.



January 24, 2025 12:00 PM UTC


Armin Ronacher

Build It Yourself

Another day, another rant about dependencies from me. This time I will ask that we start and support a vibe shift when it comes to dependencies.

You're probably familiar with the concept of “dependency churn.” It's that never-ending treadmill of updates, patches, audits, and transitive dependencies that we as developers love to casually install in the name of productivity. Who doesn't enjoy waiting for yet another cargo upgrade just so you can get that fix for a bug you don't even have?

It's a plague in most ecosystems with good packaging solutions. JavaScript and Rust are particularly badly affected by that. A brand new Tokio project drags in 28 crates, a new Rocket project balloons that to 172, and a little template engine like MiniJinja can exist with just a single dependency — while its CLI variant slurps up 142.

If that doesn't sound like a big deal, let's consider terminal_size. It is a crate that does exactly what its name suggests: it figures out your terminal dimensions. The underlying APIs it uses have effectively been stable since the earliest days of computing terminals—what, 50 years or so? And yet, for one function, terminal-size manages to introduce three or four additional crates, depending on your operating system. That triggers a whole chain reaction, so you end up compiling thousands of other functions just to figure out if your terminal is 80x25 or 120x40. That crate had 26 releases. My own version of it, which I have stuck away in a project from 10 years ago, still works without a single update. Because shocker: nothing about figuring out terminal sizes has changed. [1]
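
For a sense of how small the problem is, here's a rough, dependency-free sketch of the classic UNIX approach in Python, where the standard library already ships this as os.get_terminal_size():

import fcntl
import struct
import termios

def terminal_size(fd=1):
    # TIOCGWINSZ fills a winsize struct (rows, cols, xpixels, ypixels);
    # this ioctl has been stable on UNIX systems for decades.
    buf = fcntl.ioctl(fd, termios.TIOCGWINSZ, b"\x00" * 8)
    rows, cols, _, _ = struct.unpack("HHHH", buf)
    return cols, rows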

So why does terminal-size have so many updates if it's so stable? Because it's built on top of platform abstraction libraries that constantly churn, so it needs to update to avoid code duplication and blowing up compile times even more.

But “big supply chain” will tell you that you must do it this way. Don't you dare copy-paste that function into your library. And don't you dare use “unsafe” yourself. You're not qualified enough to write unsafe code; let the platform abstraction architects do that. Otherwise someone will slap you. There are entire companies making a living supplying you with the tools needed to deal with your dependency mess. In the name of security, we're pushed to having dependencies and keeping them up to date, despite most of those dependencies being the primary source of security problems.

In many ways, the goal should be to write code that does not need updates. It should eventually achieve some level of stability. In the Rust ecosystem, stable code is punished. If you have a perfectly working dependency but a somewhat inactive bug tracker, RUSTSEC will come by and give you a chunk rating.

But there is a simpler path. You write code yourself. Sure, it's more work up front, but once it's written, it's done. No new crates, no waiting for upstream authors to fix that edge case. If it's broken for you, you fix it yourself. Code that works doesn't necessarily need the maintenance treadmill. Your code has a corner case? Who cares. This is the vibe shift we need in the Rust world: celebrating fewer dependencies rather than more.

We're at a point in most ecosystems where pulling in libraries is not just the default action, it's seen positively: “Look how modular and composable my code is!” Actually, it might just be a symptom of never wanting to type out more than a few lines.

Now one will make the argument that it takes so much time to write all of this. It's 2025, and it's faster for me to have ChatGPT or Cursor whip up a dependency-free implementation of these common functions than it is for me to start figuring out a dependency. And it makes sense: for many such small functions, the maintenance overhead is tiny and much lower than actually dealing with constant upgrading of dependencies. The code is just a few lines, and you also get the benefit of no longer needing to compile thousands of lines of other people's code for a single function.

But let's face it: corporate code review culture has also infected Open Source software. Companies are more likely to reward engineers than scold them for pulling in that new “shiny library” that solves a problem they never actually had. That creates problems, so dependabot and friends were born. Today I just dread getting dependabot pull requests on my projects, but I have to accept it. I'm part of an ecosystem with my stuff, and that ecosystem is all about churn, churn, churn. In companies you can also keep entire internal engineering teams busy with vendoring dependencies, internal audits, and upgrading things throughout the company.

Fighting this fight is incredibly hard! Every new hire has been trained on the idea that dependencies are great, that code reuse is great. That having old code sitting around is a sign of bad engineering culture.

It's also hard to fight this in Open Source. Years ago I wrote sha1-smol which originally was just called sha1. It became the standard crate to calculate SHA1 hashes. Eventually I was pressured to donate that package name to rust-crypto and to depend on the rest of the crypto ecosystem as it was so established. If you want to use the new sha1 crate, you get to enjoy 10 dependencies. But there was just no way around it, because that name in the registry is precious and people also wanted to have trait compatibility. It feels tiring to be the only person in a conversation pushing to keep the churn down and dependencies low.

It's time to have a new perspective: we should give kudos to engineers who write a small function themselves instead of hooking in a transitive web of crates. We should be suspicious of big crate graphs. We should celebrate minimal dependencies: the humble function that just quietly does the job, the code that doesn't need to be touched for years because it was done right once.

And sure, it's not black and white. There are the important libraries that solve hard problems. Graphics libraries that abstract over complex drivers, implementations of protocols like HTTP and QUIC. I won't be able to get rid of tokio and I have no desire to. But when you end up using one function yet compile hundreds, some alarm bell should go off.

We need that vibe shift. To celebrate building it yourself when it's appropriate to do so. To give credit to library authors who build low to no-dependency Open Source libraries.

For instance minijinja celebrates it in the readme:

$ cargo tree
minimal v0.1.0 (examples/minimal)
└── minijinja v2.6.0 (minijinja)
    └── serde v1.0.144

And it has a PR to eventually get rid of the last dependency. And sometime this year I will make it my goal to go ahead proudly and trim down all that fat in my projects.

[1]Disclaimer: you will need one dependency for UNIX: libc. That's because Rust does not expose the platform's libc constants to you, and they are not standardized. That however is such a common and lightweight dependency that you won't be able to avoid it anyways.

January 24, 2025 12:00 AM UTC


Quansight Labs Blog

libsf_error_state: SciPy's first shared library

The story of the first shared library to make it into the world of low level code that lies beneath SciPy's surface.

January 24, 2025 12:00 AM UTC

January 23, 2025


Django Weblog

Djangonaut Space - New session 2025

We are thrilled to announce that Djangonaut Space, a mentorship program, is open for applicants for our next cohort! 🚀

Djangonaut Space is holding a fourth session! This session will start on February 17th, 2025. We are currently accepting applications until January 29th, 2025, Anywhere on Earth. More details can be found on the website.

Djangonaut Space is a free, 8-week group mentoring program where individuals will work self-paced in a semi-structured learning environment. It seeks to help members of the community who wish to level up their current Django code contributions and potentially take on leadership roles in Django in the future.

“I'm so grateful to have been a part of the Djangonaut Space program. It's a wonderfully warm, diverse, and welcoming space, and the perfect place to get started with Django contributions. The community is full of bright, talented individuals who are making time to help and guide others, which is truly a joy to experience. Before Djangonaut Space, I felt as though I wasn't the kind of person who could become a Django contributor; now I feel like I found a place where I belong.” - Eliana, Djangonaut Session 1

Enthusiastic about contributing to Django but wondering what we have in store for you? No worries, we have got you covered! 🤝

✏️ Mission Briefing

📷 AMA Recap

January 23, 2025 10:20 PM UTC


Test and Code

pytest-cov : The pytest plugin for measuring coverage

pytest-cov is a pytest plugin that helps produce coverage reports using Coverage.py.

In this episode, we'll discuss:

  • what Coverage.py is
  • why you should measure code coverage on both your source and test code
  • what pytest-cov is
  • extra features pytest-cov gives you over and above coverage.py
  • and generally why using both is awesome

Links:

  • coverage.py: https://coverage.readthedocs.io
  • pytest-cov: https://pytest-cov.readthedocs.io
  • how to set up context reports: https://pytest-cov.readthedocs.io/en/latest/contexts.html
  • Top pytest Plugins: https://pythontest.com/top-pytest-plugins/

Errata:

  • I mentioned that Coverage has had the ability to show context (which line is covered by which test) for the past year or so. However, that feature was released in Oct 2018, in the coverage 5.0 alpha (https://coverage.readthedocs.io/en/7.6.10/changes.html#version-5-0a3-2018-10-06). That's over 6 years. Oops. Sorry Ned.

Learn pytest

  • pytest is the number one test framework for Python.
  • Learn the basics super fast with Hello, pytest! (https://courses.pythontest.com/hello-pytest)
  • Then later you can become a pytest expert with The Complete pytest Course (https://courses.pythontest.com/the-complete-pytest-course)
  • Both courses are at courses.pythontest.com

January 23, 2025 08:06 PM UTC


Brett Cannon

My impressions of Gleam


When I was about to go on paternity leave, the Gleam programming language reached 1.0. It's such a small language that I was able to learn it over the span of two days. I tried to use it to convert a GitHub Action from JavaScript to Gleam, but I ran into issues due to Gleam wanting to be the top of the language stack instead of the bottom. As such I ended up learning and using ReScript. But I still liked Gleam and wanted to try writing something in it, so over the winter holidays I did another project with it from scratch.

Why Gleam?

First and foremost, their statement about community on their homepage spoke to me:

As a community, we want to be friendly too. People from around the world, of all backgrounds, genders, and experience levels are welcome and respected equally. See our community code of conduct for more.

Black lives matter. Trans rights are human rights. No nazi bullsh*t.

Secondly, the language is very small and tightly designed, which I always appreciate (Python's "it fits your brain" slogan has always been one of my favourite tag lines for the language).

Third, it's a typed, functional, immutable language that is impure. I find that a nice balance of practicality while trying to write code that is as reliable as possible by knowing that if you get past the compiler you're probably doing pretty well (which is good for projects you are not going to work on often but do have the time to put in the extra effort upfront to deal with typing and such).

Fourth, it compiles to either Erlang or JavaScript. Both have their (unique) uses which I appreciate (and in my case the latter is important).

Fifth, it has Lustre. While I liked Elm and loved TEA (The Elm Architecture), I did find Elm's lack of FFI restrictive. Lustre with Gleam fixes those issues.

And finally, my friend Dusty is a fan.

My learning project

I decided I wanted to create a website to help someone choose a coding font. When I was looking for one a while back I created screenshots of code samples which were anonymous so that I could choose one without undue influence (I ended up with MonoLisa). I figured it would be a fun project to create a site that did what I wish I had when choosing a font: a tournament bracket for fonts where you entered example text and then have fonts battle it out until you had a winner. This seemed like a great fit for Lustre and Gleam since it would be all client-side and have some interaction.

😅 It turns out CodingFont came out shortly before I started my project, unbeknownst to me. They take the same approach of a tournament bracket, but in a much prettier site with the bonus of being something I don't have to maintain. As such I won't be launching a site for my project, but the code is available in case you want to run your own tournament with your own choice of fonts.

The good

Overall, the language was a pleasure to work with. While the functional typing occasionally felt tedious, I knew there was benefit to it if I wanted things to work in the long term with as little worry as possible that I had a bug in my code. The language was nice and small, and so I didn't have trouble keeping it in my head while I coded (most of my documentation reading was for the standard library). And it was powerful enough with Lustre for me to need fewer than 200 lines of Gleam to make it all work (plus fewer than 90 lines of static HTML and CSS).

The bad

I'm a Python fan, and so all the curly braces weren't my favourite thing. I know they're there for familiarity reasons, and it won't cause me to not use the language in the future, but I wouldn't have minded less syntax to denote structure.

The other thing is having to specify a type's name twice for the name to be usable as both the type and the constructor for a single record.

pub type Thingy {
    Thingy(...)
}

Once again, it's very minor, but it's something that I had to learn, and typing the name twice always felt unnecessary and like a typo waiting to happen for the compiler to catch. Having some shorthand like pub record Thingy(...) to represent the same thing would be nice.

The dream

I would love to have a WebAssembly/WASI and Python back-end for Gleam to go along with the Erlang and JavaScript ones. I have notes on writing a Python back-end and Dusty did a prototype. Unfortunately I don't think the Gleam compiler – which is written in Rust – is explicitly designed for adding more back-ends, so I'm not sure if any of this will ever come to pass.

Conclusion

I'm happy with Gleam! I'm interested in trying it with Erlang and the BEAM somehow, although my next project for that realm is with Elixir because Phoenix LiveView is a perfect fit for that project (I suspect there's something in Gleam to compete with Phoenix LiveView, but I do want to learn Elixir). But I definitely don't regret learning Gleam and I am still motivated enough to be working my way through Exercism's Gleam track.

January 23, 2025 04:43 AM UTC


meejah.ca

Sending a File in 2025

Making a file appear on ONE other computer

January 23, 2025 12:00 AM UTC

January 22, 2025


PyCharm

Anomaly Detection in Time Series


How do you identify unusual patterns in data that might reveal critical issues or hidden opportunities? Anomaly detection helps identify data that deviates significantly from the norm. Time series data, which consists of data collected over time, often includes trends and seasonal patterns. Anomalies in time series data occur when these patterns are disrupted, making anomaly detection a valuable tool in industries like sales, finance, manufacturing, and healthcare.

As time series data has unique characteristics like seasonality and trends, specialized methods are required to detect anomalies effectively. In this blog post, we’ll explore some popular methods for anomaly detection in time series, including STL decomposition and LSTM prediction, with detailed code examples to help you get started.

Time series anomaly detection in businesses

Time series data is essential to many businesses and services. Many businesses record data over time with timestamps, allowing changes to be analyzed and data to be compared over time. Time series are useful when comparing a certain quantity over a certain period, as, for example, in a year-over-year comparison where the data exhibits characteristics of seasonality.

Sales monitoring

One of the most common examples of time series data with seasonalities is sales data. As a lot of sales are affected by annual holidays and the time of the year, it is hard to draw conclusions about sales data without considering the seasonalities. Because of that, a common method for analyzing and finding anomalies in sales data is STL decomposition, which we will cover in detail later in this blog post.

Finance

Financial data, such as transactions and stock prices, are typical examples of time series data. In the finance industry, analyzing and detecting anomalies in this data is a common practice. For example, time series prediction models can be used in automatic trading. We’ll use a time series prediction to identify anomalies in stock data later in this blog post.

Manufacturing

Another use case of time series anomaly detection is monitoring defects in production lines. Machines are often monitored, making time series data available. Being able to notify management of potential failures is essential, and anomaly detection plays a key role.

Medicine and healthcare

In medicine and healthcare, human vitals are monitored and anomalies can be detected. This is important in medical research, but it's critical in diagnostics. If a patient at a hospital has anomalies in their vitals and is not treated immediately, the results can be fatal.

Why is it important to use special methods for time series anomaly detection?

Time series data is special in the sense that it sometimes cannot be treated like other types of data. For example, when we apply a train-test split to time series data, the sequentially related nature of the data means we cannot shuffle it. This is also true when applying time series data to a deep learning model. A recurrent neural network (RNN) is commonly used to take the sequential relationship into account, and training data is input as time windows, which preserve the sequence of events within.
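
As a minimal sketch (with illustrative variable names), a chronological split looks like this:

# Split a time series chronologically; never shuffle before splitting
split = int(len(timeseries) * 0.8)
train, test = timeseries[:split], timeseries[split:]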

Time series data is also special because it often has seasonality and trends that we cannot ignore. This seasonality can manifest in a 24-hour cycle, a 7-day cycle, or a 12-month cycle, just to name a few common possibilities. Anomalies can only be determined after the seasonality and trends have been considered, as you will see in our example below.

Methods used for anomaly detection in time series

Because time series data is special, there are specific methods for detecting anomalies in it. Depending on the type of data, some of the methods and algorithms we mentioned in the previous blog post about anomaly detection can be used on time series data. However, with those methods, the anomaly detection may not be as robust as with methods designed specifically for time series data. In some cases, a combination of detection methods can be used to reconfirm the detection result and avoid false positives or negatives.

STL decomposition

One of the most popular ways to use time series data that has seasonality is STL decomposition – seasonal trend decomposition using LOESS (locally estimated scatterplot smoothing). In this method, a time series is decomposed using an estimate of seasonality (with the period provided or determined using an algorithm), a trend (estimated), and the residual (the noise in the data). A Python library that provides STL decomposition tools is the statsmodels library.

STL decomposition

An anomaly is detected when the residual is beyond a certain threshold. 

Using STL decomposition on beehive data

In an earlier blog post, we explored anomaly detection in beehives using the OneClassSVM and IsolationForest methods. 

In this tutorial, we’ll analyze beehive data as a time series using the STL class provided by the statsmodels library. To get started, set up your environment using this file: requirements.txt

1. Install the library

Since we have so far only been using models provided by Scikit-learn, we will need to install statsmodels from PyPI. This is easy to do in PyCharm.

Go to the Python Package window (choose the icon at the bottom of the left-hand side of the IDE) and type in statsmodels in the search box.

Statsmodels in PyCharm

You can see all of the information about the package on the right-hand side. To install it, simply click Install package.

2. Create a Jupyter notebook

To investigate the dataset further, let’s create a Jupyter notebook to take advantage of the tools that PyCharm’s Jupyter notebook environment provides.

Create a Jupyter notebook in PyCharm

We will import pandas and load the .csv file.

import pandas as pd

df = pd.read_csv('../data/Hive17.csv', sep=";")
df = df.dropna()
df
Import pandas in PyCharm

3. Inspect the data as graphs

Now, we can inspect the data as graphs. Here, we would like to see the temperature of hive 17 over time. Click on Chart view in the dataframe inspector and then choose T17 as the y-axis in the series settings.

Inspect the data as graphs in PyCharm

When expressed as a time series, the temperature has a lot of ups and downs. This indicates periodic behavior, likely due to the day-night cycle, so it is safe to assume there is a 24-hour period for the temperature. 

Next, there is a trend of temperature dropping over time. If you inspect the DateTime column, you can see that the dates range from August to November. Since the Kaggle page of the dataset indicates that the data was collected in Turkey, the transition from summer to fall explains our observation that the temperature is dropping over time.

4. Time series decomposition

To understand the time series and detect anomalies, we will perform STL decomposition, importing the STL class from statsmodels and fitting it with our temperature data.

from statsmodels.tsa.seasonal import STL

stl = STL(df["T17"], period=24, robust=True) 
result = stl.fit()

We will have to provide a period for the decomposition to work. As we mentioned before, it is safe to assume a 24-hour cycle.

According to the documentation, STL decomposes a time series into three components: trend, seasonal, and residual. To get a clearer look at the decomposed result, we can use the built-in plot method:

result.plot()
Time series decomposition

You can see the Trend and Season plots seem to align with our assumptions above. However, we are interested in the residual plot at the bottom, which is the original series without the trend and seasonal changes. Any extremely high or low value in the residual indicates an anomaly.

5. Anomaly threshold

Next, we would like to determine what values of the residual we’ll consider abnormal. To do that, we can look at the residual’s histogram.

result.resid.plot.hist()
Anomaly threshold in PyCharm

This can be considered a normal distribution around 0, with a long tail above 5 and below -5, so we’ll set the threshold to 5.

To show the anomalies on the original time series, we can color all of them red in the graph like this:

import matplotlib.pyplot as plt

threshold = 5
anomalies_filter = result.resid.abs() > threshold
anomalies = df["T17"][anomalies_filter]

plt.figure(figsize=(14, 8))
plt.scatter(x=anomalies.index, y=anomalies, color="red", label="anomalies")
plt.plot(df.index, df['T17'], color='blue')
plt.title('Temperatures in Hive 17')
plt.xlabel('Hours')
plt.ylabel('Temperature')
plt.legend()
plt.show()
Anomalies on the original time series in PyCharm

Without STL decomposition, it is very hard to identify these anomalies in a time series consisting of periods and trends.

LSTM prediction

Another way to detect anomalies in time series data is to do a time series prediction on the series using deep learning methods to estimate the outcome of data points. If an estimate is very different from the actual data point, then it could be a sign of anomalous data.

One of the most popular deep learning algorithms for predicting sequential data is the long short-term memory (LSTM) model, which is a type of recurrent neural network (RNN). The LSTM model has input, forget, and output gates, which are matrices of numbers. These gates ensure that important information is passed on to the next iteration of the data.

LSTM memory cell

Since time series data is sequential, meaning the order of the data points matters and should not be shuffled, the LSTM model is an effective deep learning model for predicting the outcome at a certain time. This prediction can be compared to the actual data, and a threshold can be set to determine whether the actual data is an anomaly.

Using LSTM prediction on stock prices

Now let’s start a new Jupyter project to detect any anomalies in Apple’s stock price over the past 5 years. The stock price dataset shows the most up-to-date data. If you want to follow along with the blog post, you can download the dataset we are using.

1. Start a Jupyter project

When starting a new project, you can choose to create a Jupyter one, which is optimized for data science. In the New Project window, you can create a Git repository and determine which conda installation to use for managing your environment.

Start a Jupyter project in PyCharm

After starting the project, you will see an example notebook. Go ahead and start a new Jupyter notebook for this exercise.

An example notebook in PyCharm

After that, let’s set up requirements.txt. We will need pandas, matplotlib, and PyTorch, which is named torch on PyPI. Since PyTorch is not included in the conda environment, PyCharm will tell us that we are missing the package. To install the package, click on the lightbulb and select Install all missing packages.

Install all missing packages in PyCharm

2. Loading and inspecting the data

Next, let’s put our dataset apple_stock_5y.csv in the data folder and load it as a pandas DataFrame to inspect it.

import pandas as pd
 
df = pd.read_csv('data/apple_stock_5y.csv')
df

With the interactive table, we can easily see if any data is missing.

There is no missing data, but we have one issue – we would like to use the Close/Last price but it is not a numeric data type. Let’s do a conversion and inspect our data again:

df["Close/Last"] = df["Close/Last"].apply(lambda x: float(x[1:]))
df

Now, we can inspect the price with the interactive table. Click on the plot icon on the left and a plot will be created. By default, it uses Date as the x-axis and Volume as the y-axis. Since we would like to inspect the Close/Last price, go to the settings by clicking the gear icon on the right and choose Close/Last as the y-axis.

3. Preparing the training data for LSTM

Next, we have to prepare the training data to be used in the LSTM model. We need to prepare a sequence of vectors (feature X), each representing a time window, to predict the next price. The next price will form another sequence (target y). Here we can choose how big this time window is with the lookback variable. The following code creates sequences X and y which will then be converted to PyTorch tensors:

import torch

lookback = 5
timeseries = df[["Close/Last"]].values.astype('float32')

# Build sliding windows: each feature holds lookback consecutive prices,
# and each target is the same window shifted one step ahead
X, y = [], []
for i in range(len(timeseries)-lookback):
    feature = timeseries[i:i+lookback]
    target = timeseries[i+1:i+lookback+1]
    X.append(feature)
    y.append(target)
    
X = torch.tensor(X)
y = torch.tensor(y)

print(X.shape, y.shape)

Generally speaking, the bigger the window, the bigger our model will be, since the input vector is bigger. However, with a bigger window, the sequence of inputs will be shorter, so choosing this lookback window is a balancing act. With N prices and a lookback of 5, X has shape (N - 5, 5, 1), that is, one five-step window per prediction. We will start with 5, but feel free to try different values to see the differences.

4. Build and train the model

We can build the model by creating a class using the nn module in PyTorch before we train it. The nn module provides building blocks, such as different neural network layers. In this exercise, we will build a simple LSTM layer followed by a linear layer:

import torch.nn as nn

class StockModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=50, num_layers=1, batch_first=True)
        self.linear = nn.Linear(50, 1)
    def forward(self, x):
        x, _ = self.lstm(x)
        x = self.linear(x)
        return x

Next, we will train our model. Before training it, we will need to create an optimizer, a loss function used to calculate the loss between the predicted and actual y values, and a data loader to feed in our training data:

import numpy as np
import torch.optim as optim
import torch.utils.data as data

model = StockModel()
optimizer = optim.Adam(model.parameters())
loss_fn = nn.MSELoss()
loader = data.DataLoader(data.TensorDataset(X, y), shuffle=True, batch_size=8)

The data loader can shuffle the input, as we have already created the time windows. This preserves the sequential relationship in each window.

Training is done using a for loop that iterates over each epoch. Every 100 epochs, we will print out the loss and observe how the model converges:

n_epochs = 1000
for epoch in range(n_epochs):
    model.train()
    for X_batch, y_batch in loader:
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if epoch % 100 != 0:
        continue
    model.eval()
    with torch.no_grad():
        y_pred = model(X)
        rmse = np.sqrt(loss_fn(y_pred, y))
    print(f"Epoch {epoch}: RMSE {rmse:.4f}")

We start at 1000 epochs, but the model converges quite quickly. Feel free to try other numbers of epochs for training to achieve the best result.

Epochs for training

In PyCharm, a cell that requires some time to execute will provide a notification about how much time remains and a shortcut to the cell. This is very handy when training machine learning models, especially deep learning models, in Jupyter notebooks.

5. Plot the prediction and find the errors

Next, we will create the prediction and plot it together with the actual time series. Note that we will have to create a 2D NumPy array to match the shape of the actual time series. The actual time series will be in blue, while the predicted time series will be in red.

import matplotlib.pyplot as plt

with torch.no_grad():
    pred_series = np.ones_like(timeseries) * np.nan
    pred_series[lookback:] = model(X)[:, -1, :]

plt.plot(timeseries, c='b')
plt.plot(pred_series, c='r')
plt.show()
Plot the prediction and find the errors

If you observe carefully, you will see that the prediction and the actual values do not align perfectly. However, most of the predictions do a good job.

To inspect the errors closely, we can create an error series and use the interactive table to observe them. We are using the absolute error this time.

error = abs(timeseries-pred_series)
error

Use the settings to create a histogram with the value of the absolute error as the x-axis and the count of the value as the y-axis.
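
If you prefer to create the same histogram in code rather than through the table settings, a minimal sketch (the bin count is arbitrary):

pd.Series(error.flatten()).plot.hist(bins=50)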

6. Decide on the anomaly threshold and visualize

Most of the points will have an absolute error of less than 6, so we can set that as the anomaly threshold. Similar to what we did for the beehive anomalies, we can plot the anomalous data points in the graph.

threshold = 6
error_series = pd.Series(error.flatten())
price_series = pd.Series(timeseries.flatten())

anomalies_filter = error_series > threshold
anomalies = price_series[anomalies_filter]

plt.figure(figsize=(14, 8))
plt.scatter(x=anomalies.index, y=anomalies, color="red", label="anomalies")
plt.plot(df.index, timeseries, color='blue')
plt.title('Closing price')
plt.xlabel('Days')
plt.ylabel('Price')
plt.legend()
plt.show()
Plot the anomalous data points in the graph

Summary

Time series data is a common form of data used in many applications including business and scientific research. Due to the sequential nature of time series data, special methods and algorithms are used to help determine anomalies in it. In this blog post, we demonstrated how to identify anomalies using STL decomposition to eliminate seasonalities and trends. We have also demonstrated how to use deep learning and the LSTM model to compare the predicted estimate and the actual data in order to determine anomalies.

Detect anomalies using PyCharm

With the Jupyter project in PyCharm Professional, you can easily organize an anomaly detection project with many data files and notebooks. Graph output can be generated to inspect anomalies, and plots are very accessible in PyCharm. Other features, such as auto-complete suggestions, make navigating all the Scikit-learn models and Matplotlib plot settings a blast.

Power up your data science projects by using PyCharm, and check out the data science features offered to streamline your data science workflow.

January 22, 2025 12:14 PM UTC


Python⇒Speed

Faster pip installs: caching, bytecode compilation, and uv

Installing your Python application’s dependencies can be surprisingly slow. Whether you’re running tests in CI, building a Docker image, or installing an application, downloading and installing dependencies can take a while.

So how do you speed up installation with pip?

In this article I’ll cover:

Read more...

January 22, 2025 12:00 AM UTC


Michael Foord

Advanced Python Course

Over the last year I’ve updated my Advanced Python course to be based on a series of modules that can more easily be adapted to the specific needs of any team or group of delegates. There’s a lot more advanced material and the exercises have also been updated. A good engineer’s Python in three days.

Typically taught as a three-day, hands-on course that will take you deeper into the Python programming language and ecosystem. This course will take delegates from beginner/intermediate level in Python to advanced Python expertise. The course provides a solid overview of the Python language, including some low-level details essential to working confidently and fluidly with Python. The focus is on practical programming, and the skills learned here can be applied in any field where Python is used.

This course is taught by Michael Foord. Michael has been teaching Python for over a decade and has over twenty years industry experience as an application developer. Michael is a Python core developer and the creator of unittest.mock in the Python standard library, and is the author of “The Absolute Minimum Every Python Web Application Developer Must Know About Security”.

In this course delegates will learn a great deal of Python, from essential foundations like how assignment works to taking advantage of multicore systems with multiprocessing. Also included is networking, from API clients to understanding sockets and how servers work (the request-response cycle of the REST API model), language features like closures, generators, context managers, and the whole Python object model, along with testing with pytest.

For smart programmers this course provides a solid foundation for working with Python along with many advanced language features and concepts and powerful libraries for tackling many common programming scenarios. As well as learning and discussion every section is backed by lab exercises.

Full list of modules available:

An additional optional module “Data Science Overview” is available on request.

For several of the modules there are previews of the material in a series of “Python Knowledge Share Videos”:

The materials/slides for these sessions can be found here:

January 22, 2025 12:00 AM UTC

January 21, 2025


Brian Okken

Updates to the Top pytest Plugins - now 200

I’m working on some updates to the Top pytest plugins list.

For January's numbers, I've used a tweak on the process. Starting with the process documented by Hugo for the Top PyPI Packages, I'm pulling the entire data set through BigQuery so that I can grab a nice round number of the top 200 pytest plugins.

I am still working my way through the list to come up with an exclusion list, but having control of the data set allows me to play with the process a bit.

January 21, 2025 11:00 PM UTC


PyCoder’s Weekly

Issue #665: Dict Comprehensions, Data Visualization, Memory Leaks, and More (Jan. 21, 2025)

#665 – JANUARY 21, 2025
View in Browser »



Building Dictionary Comprehensions in Python

In this video course, you’ll learn how to write dictionary comprehensions in Python. You’ll also explore the most common use cases for dictionary comprehensions and learn about some bad practices that you should avoid when using them in your code.
REAL PYTHON course

PyViz: Python Tools for Data Visualization

This site contains an overview of all the different visualization libraries in the Python ecosystem. If you’re trying to pick a tool, this is a great place to better understand the pros and cons of each.
PYVIZ.ORG

Building AI Agents With Your Enterprise Data: A Developer’s Guide


Building AI agents over enterprise data is hard—legacy systems, scattered data, complex queries. MindsDB, a leading open-source enterprise AI platform, makes it much easier: turn your data into AI-driven ‘skills’ and build smarter, more customized agents →
MINDSDB sponsor

Catching Memory Leaks With Your Test Suite

“If you have a good test suite, you may be able to use pytest fixtures to identify memory and other resource leaks.”
ITAMAR TURNER-TRAURING

Djangonaut Space Session 4 Applications Open

DJANGONAUT.SPACE

Python 3.14.0 Alpha 4 Released

PYTHON.ORG

Django 5.2 Alpha 1 Released

DJANGO SOFTWARE FOUNDATION

Django Security Releases Issued: 5.1.5, 5.0.11, and 4.2.18

DJANGO SOFTWARE FOUNDATION

EuroPython Prague 2025 Call for Proposals

EUROPYTHON.EU

Python Jobs

Backend Software Engineer (Anywhere)

Brilliant.org

More Python Jobs >>>

Articles & Tutorials

10 Ways to Work With Large Files in Python

“Handling large text files in Python can feel overwhelming. When files grow into gigabytes, attempting to load them into memory all at once can crash your program.” This article covers different ways of dealing with this challenge.
ALEKSEI ALEINIKOV

Investigating the Popularity of Python Build Backends

This post analyzes the popularity of build backends used in pyproject.toml files over time. Over half of the projects analyzed use setuptools, with Poetry coming in second around 30%.
BASTIAN VENTHUR

Testing Your Python Package Releases

Upon creating the release for the Django Debug Toolbar, Tim discovered a problem when uploading to PyPI. This article talks about how he could have caught it earlier with better tests.
TIM SCHILLING

How I Automated Data Cleaning in Python

Learn how to automate data cleaning in Python using reusable functions and pipelines. Improve efficiency by handling missing values, duplicates, and data types.
SATYAM SAHU

The “Active Enum” Pattern

This article talks about the challenges associated with enum declarations being separate from their associated business logic, and proposes some alternatives.
GLYPH LEFKOWITZ

A Guide to JAX for PyTorch Developers

“PyTorch users can learn about JAX in this tutorial that connects JAX concepts to the PyTorch building blocks that they’re already familiar with.”
ANFAL SIDDIQUI

Use a Python venv in a Container Like Docker

Bite code talks about why you should use a virtual environment inside of Docker containers, even though that is containment within a container.
BITE CODE!

Creating a Fitness Tracker App With Python Reflex

A detailed guide to building a simple Fitness Tracker app in Python using Reflex that lets you log the number of workouts completed per week.
BOB BELDERBOS • Shared by Bob Belderbos

Database Indexing in Django

This article explores the basics of database indexing, its advantages and disadvantages, and how to apply it in a Django application.
OLUWOLE MAJIYAGBE • Shared by Michael Herman

Python’s range() Function

The range() function can be used for counting upward, counting downward, or performing an operation a number of times.
TREY HUNNER

Quickly Visualizing an SBOM Document

Quick instructions on how to chart a Software Bill-Of-Materials (SBOM) document.
SETH LARSON

Projects & Code

Zasper: A Supercharged IDE for Data Science

ZASPER.IO • Shared by Poruri Sai Rahul

pyper: Concurrent Python Made Simple

GITHUB.COM/PYPER-DEV

freezegun: Mocks the datetime Module for Testing

GITHUB.COM/SPULEC

guppy3: Guppy/Heapy Ported to Python3

GITHUB.COM/ZHUYIFEI1999

VocabCLI: Word Insights TUI

GITHUB.COM/HIGHNESSATHARVA • Shared by Atharva Shah

Events

Weekly Real Python Office Hours Q&A (Virtual)

January 22, 2025
REALPYTHON.COM

PyCon+Web 2025

January 24 to January 26, 2025
PYCONWEB.COM

PyDelhi User Group Meetup

January 25, 2025
MEETUP.COM

PythOnRio Meetup

January 25, 2025
PYTHON.ORG.BR

Python Leiden User Group

January 27, 2025
MEETUP.COM

Python Sheffield

January 28, 2025
GOOGLE.COM


Happy Pythoning!
This was PyCoder’s Weekly Issue #665.
View in Browser »



January 21, 2025 07:30 PM UTC


"Mathspp Pydon'ts"

Decorators | Pydon't 🐍

This article teaches the decorator pattern in Python, why it exists, how to use it, and when to use it to write efficient and idiomatic Python code.

Decorators

The decorator pattern is a functional pattern that Python developers leverage to write more modular and composable functions. In this Pydon't, you will learn exactly why the decorator pattern matters, how to use it, and when to use it. You will also learn how to implement your custom decorators and more advanced use cases of decorators.

In this Pydon't, you will:

A function that did too much

The decorator pattern lets you complement a function with behaviour that is useful but orthogonal to the original objective of the function. This pattern is relevant because you do not want to overcrowd your functions, and at the same time it allows you to define this useful behaviour in a way that is reusable by other functions.

As an example, consider how you might have implemented the mathematical operation factorial before it was introduced in the module math:

# In modern Python: from math import factorial
def factorial(n):
    r = 1
    while n > 1:
        r *= n
        n -= 1
    return r

If you are calling this function a lot with a few large integers as arguments, you may want to cache the results you compute. For this effect, you may want to use a dictionary that maps inputs to outputs:

_factorial_cache = {}

def factorial(n):
    if n not in _factorial_cache:
        _n = n
        r = 1
        while _n > 1:
            r *= _n
            _n -= 1
        _factorial_cache[n] = r

    return _factorial_cache[n]

This solution is far from ideal, since it introduces a function cache that is only loosely coupled to the function it's relevant for, while also adding code to the function that is not really relevant to its original objective.

Instead of baking caching into the function, which is a poor one-time solution for something I might want to do with several functions, I can write a higher-order function that adds caching to any function I want. Let me walk you through this transformation.
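
To preview where this is going, here's one minimal shape such a higher-order function could take; the standard library's functools.lru_cache is a production-ready equivalent:

def cached(function):
    cache = {}

    def wrapper(n):
        if n not in cache:
            cache[n] = function(n)
        return cache[n]

    return wrapper

factorial = cached(factorial)  # the @cached decorator syntax is shorthand for this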

Factoring out the orthogonal behaviour

Instead of...

January 21, 2025 05:11 PM UTC


Real Python

Exploring Python's tuple Data Type With Examples

In Python, a tuple is a built-in data type that allows you to create immutable sequences of values. The values or items in a tuple can be of any type. This makes tuples pretty useful in those situations where you need to store heterogeneous data, like that in a database record, for example.

Through this tutorial, you’ll dive deep into Python tuples and get a solid understanding of their key features and use cases. This knowledge will allow you to write more efficient and reliable code by taking advantage of tuples.

In this video course, you’ll learn how to:



January 21, 2025 02:00 PM UTC


Malthe Borch

Keeping tabs on migration processes

To manage migration processes such as adopting a new data integration tool or rehosting applications at scale from one infrastructure service to another, organizations often turn away from established work planning and management tools such as Azure DevOps Boards.

Instead, project managers often develop their own systems based on the trusted spreadsheet, typically Microsoft Excel. While these spreadsheets may be shared with the development team, this approach significantly reduces transparency and results in information redundancy. In addition, the benefits of using an existing tool are lost, perhaps most importantly reporting. When all you have is a hammer, and all that.

Alternatively, existing work items such as user stories and tasks are used to model the migration process. This is an impedance mismatch: "Migrate application X" makes for a terrible user story, tasks must be cloned for each story, the setup is time-consuming and error-prone, and the user interface becomes cluttered due to repetition. When all you have is a hammer, and all that.

A better approach is to introduce a new business process, tailored to the task at hand.

Any half-decent work management system will support this sort of tailoring. For example, with Azure DevOps, we can add a custom work item type to represent this business process, and with Jira you can create a new workflow.

An example from Azure DevOps, where I have defined a number of new states for the "in progress" category to model a rehosting migration process:

In Azure DevOps, the states and transition rules (for example to restrict transition to a specific state) can be adapted during the project's lifetime, if new requirements emerge. You can add a new portfolio backlog to manage the custom work item type on a separate kanban board.

Using this approach, you reap the full benefits of using a dedicated work management tool and can let the system take care of reporting, rather than having to dedicate a significant portion of time to what's essentially a menial task.

January 21, 2025 01:40 PM UTC