
Planet Python

Last update: August 19, 2019 01:48 PM UTC

August 19, 2019


Stack Abuse

Debugging Python Applications with the PDB Module

Introduction

In this tutorial, we are going to learn how to use Python's PDB module for debugging Python applications. Debugging refers to the process of finding and removing errors from a software application. PDB stands for "Python Debugger", and it is a built-in interactive source code debugger with a wide range of features, such as pausing a program, viewing variable values at specific points, changing those values, and more.

In this article, we will be covering the most commonly used functionalities of the PDB module.

Background

Debugging is one of the most disliked activities in software development, and at the same time, it is one of the most important tasks in the software development life cycle. At some stage, every programmer has to debug their code, unless they are developing a very basic software application.

There are many different ways to debug a software application. A very common approach is to place "print" statements at different points in your code to see what is happening during execution. However, this approach has many problems, such as the extra code needed to print the variables' values, which later has to be removed again. While this might work for a small program, tracking these code changes in a large application with many lines of code, spread over different files, can become a huge problem. The debugger solves that problem for us. It helps us find the sources of errors in an application using external commands, so no changes to the code are required.

Note: As mentioned above, PDB is a built-in Python module, so there is no need to install it from an external source.
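By the way, besides launching the debugger from the command line (as we will do below), you can also drop into PDB from inside your code. The snippet below is a minimal sketch of that approach; pdb.set_trace() works on any modern Python version, and on Python 3.7+ you can call the breakpoint() built-in instead.

import pdb

def buggy_function(x):
    # Execution pauses here and a (Pdb) prompt opens in the terminal
    pdb.set_trace()  # on Python 3.7+ you can simply call breakpoint()
    return x * 2

buggy_function(21)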

Key Commands

To understand the main commands or tools that we have at our disposal in PDB, let's consider a basic Python program, and then try to debug it using PDB commands. This way, we will see with an example what exactly each command does.

# Filename: calc.py

operators = ['+', '-', '*', '/']
numbers = [10, 20]

def calculator():
    print("Operators available: ")
    for op in operators:
        print(op)

    print("Numbers to be used: ")
    for num in numbers:
        print(num)

def main():
    calculator()

main()

Here is the output of the script above:

Operators available:
+
-
*
/
Numbers to be used:
10
20

I have not added any comments in the code above, as it is beginner friendly and involves no complex concepts or syntax at all. It's not important to try and understand the "task" that this code achieves, as its purpose was to include certain things so that all of PDB's commands could be tested on it. Alright then, let's start!

Using PDB requires use of the Command Line Interface (CLI), so you have to run your application from the terminal or the command prompt.

Run the command below in your CLI:

$ python -m pdb calc.py

In the command above, my file's name is "calc.py", so you'll need to insert your own file name here.

Note: The -m is a flag, and it notifies the Python executable that a module needs to be imported; this flag is followed by the name of the module, which in our case is pdb.

The output of the command looks like this:

> /Users/junaid/Desktop/calc.py(3)<module>()
-> operators = [ '+', '-', '*', '/' ]
(Pdb)

The output will always have the same structure. It will start with the directory path to our source code file. Then, in brackets, it will indicate the line number from that file that PDB is currently pointing at, which in our case is "(3)". The next line, starting with the "->" symbol, indicates the line currently being pointed to.

In order to close the PDB prompt, simply enter quit or exit in the PDB prompt.

A few other things to note: if your program accepts parameters as inputs, you can pass them through the command line as well. For instance, had our program above required three inputs from the user, then this is what our command would have looked like:

$ python -m pdb calc.py var1 var2 var3

Moving on, if you had earlier closed the PDB prompt through the quit or exit command, then rerun the code file through PDB. After that, run the following command in the PDB prompt:

(Pdb) list

The output looks like this:

  1     # Filename: calc.py
  2
  3  -> operators = ['+', '-', '*', '/']
  4     numbers = [10, 20]
  5
  6     def calculator():
  7         print("Operators available: ")
  8         for op in operators:
  9             print(op)
 10
 11         print("Numbers to be used: ")
(Pdb)

This will show the first 11 lines of your program to you, with the "->" pointing towards the current line being executed by the debugger. Next, try this command in the PDB prompt:

(Pdb) list 4,6

This command should display the selected lines only, which in this case are lines 4 to 6. Here is the output:

  4     numbers = [10, 20]
  5
  6     def calculator():
(Pdb)

Debugging with Break Points

The next important thing that we will learn about is the breakpoint. Breakpoints are usually used for larger programs, but to understand them better we will see how they function on our basic example. Break points are specific locations that we declare in our code. Our code runs up to that location and then pauses. These points are automatically assigned numbers by PDB.

We have the following different options to create break points:

  1. By line number
  2. By function declaration
  3. By a condition

To declare a break point by line number, run the following command in the PDB prompt:

(Pdb) break calc.py:8

This command inserts a breakpoint at the 8th line of code, which will pause the program once it hits that point. The output from this command is shown as:

Breakpoint 1 at /Users/junaid/Desktop/calc.py:8
(Pdb)

To declare break points on a function, run the following command in the PDB prompt:

(Pdb) break calc.calculator

In order to insert a breakpoint in this way, you must declare it using the file name and then the function name. This outputs the following:

Breakpoint 2 at /Users/junaid/Desktop/calc.py:6

As you can see, this break point has been automatically assigned number 2, and the line number (6) at which the function is declared is also shown.

Break points can also be declared by a condition. In that case the program will run until the condition becomes true, and will pause at that point. Run the following command in the PDB prompt:

(Pdb) break calc.py:8, op == "*"

This will track the value of the op variable throughout execution and only break when its value is "*" at line 8.

To see all the break points that we have declared in the form of a list, run the following command in the PDB prompt:

(Pdb) break

The output looks like this:

Num Type         Disp Enb   Where
1   breakpoint   keep yes   at /Users/junaid/Desktop/calc.py:8
2   breakpoint   keep yes   at /Users/junaid/Desktop/calc.py:6
    breakpoint already hit 1 time
3   breakpoint   keep yes   at /Users/junaid/Desktop/calc.py:8
    stop only if op == "*"
(Pdb)

Lastly, let's see how we can disable, enable, and clear a specific break point at any instance. Run the following command in the PDB prompt:

(Pdb) disable 2

This will disable breakpoint 2, but will not remove it from our debugger instance.

In the output you will see the number of the disabled break point.

Disabled breakpoint 2 at /Users/junaid/Desktop/calc.py:6
(Pdb)

Let's print out the list of all break points again to see the "Enb" value for breakpoint 2:

(Pdb) break

Output:

Num Type         Disp Enb   Where
1   breakpoint   keep yes   at /Users/junaid/Desktop/calc.py:8
2   breakpoint   keep no    at /Users/junaid/Desktop/calc.py:6
    breakpoint already hit 1 time
3   breakpoint   keep yes   at /Users/junaid/Desktop/calc.py:8
    stop only if op == "*"
(Pdb)

Notice that the "Enb" column for breakpoint 2 now shows "no".

To re-enable break point 2, run the following command:

(Pdb) enable 2

And again, here is the output:

Enabled breakpoint 2 at /Users/junaid/Desktop/calc.py:6

Now, if you print the list of all break points again, the "Enb" column's value for breakpoint 2 should show a "yes" again.

Let's now clear breakpoint 1, which will remove it altogether.

(Pdb) clear 1

The output is as follows:

Deleted breakpoint 1 at /Users/junaid/Desktop/calc.py:8
(Pdb)

If we re-print the list of breakpoints, it should now only show two breakpoint rows. Let's see the "break" command's output:

Num Type         Disp Enb   Where
2   breakpoint   keep yes   at /Users/junaid/Desktop/calc.py:6
    breakpoint already hit 1 time
3   breakpoint   keep yes   at /Users/junaid/Desktop/calc.py:8
    stop only if op == "*"

Exactly what we expected.

Before we move ahead from this section, I want to show you all what is displayed when we actually run the code until the specified breakpoint. To do that, let's clear all the previous breakpoints and declare another breakpoint through the PDB prompt:

1. Clear all breakpoints

(Pdb) clear

After that, type "y" and hit "Enter". You should see an output like this appear:

Deleted breakpoint 2 at /Users/junaid/Desktop/calc.py:6
Deleted breakpoint 3 at /Users/junaid/Desktop/calc.py:8

2. Declare a new breakpoint

What we wish to achieve is that the code should run up until the point where the value of the num variable is greater than 10. So basically, the program should pause before the number "20" gets printed.

(Pdb) break calc.py:13, num > 10

3. Run the code until this breakpoint

To run the code, use the "continue" command, which will execute the code until it hits a breakpoint or finishes:

(Pdb) continue

You should see the following output:

Operators available:
+
-
*
/
Numbers to be used:
10
> /Users/junaid/Desktop/calc.py(13)calculator()
-> print(num)

This is exactly what we expected: the program runs until that point and then pauses. Now it's up to us whether we wish to change anything, inspect variables, or run the script until completion. To instruct it to run to completion, run the "continue" command again. The output should be the following:

20
The program finished and will be restarted
> /Users/junaid/Desktop/calc.py(3)<module>()
-> operators = [ '+', '-', '*', '/' ]

In the above output, it can be seen that the program continues from exactly where it left off, runs the remaining part, and then restarts to allow us to debug it further if we wish. Let's move to the next section now.

Important Note: Before moving forward, clear all the breakpoints by running the "clear" command, followed by typing in "y" in the PDB prompt.

Next and Step Commands

Last, but not least, let's study the next and step commands; these will be very frequently used when you start debugging your applications, so let's learn what they do and how they can be used.

The step and next commands are used to walk through our code line by line; there's very little difference between the two. While iterating, if the step command encounters a function call, it will move to the first line of that function's definition and show us exactly what is happening inside the function; whereas, if the next command encounters a function call, it will run all lines of that function in a single go and pause at the next line.

Confused? Let's see that in an example.

Re-run the program through the PDB prompt using the following command:

$ python -m pdb calc.py

Now type in step in the PDB prompt, and keep doing that until the program reaches the end. I'm going to show a section of the whole input and output sequence below, which is sufficient to explain the point. The full sequence is quite long, and would only confuse you more, so it will be omitted.

> /Users/junaid/Desktop/calc.py(3)<module>()
-> operators = [ '+', '-', '*', '/' ]
(Pdb) step
> /Users/junaid/Desktop/calc.py(4)<module>()
-> numbers = [ 10, 20 ]
.
.
.
.
> /Users/junaid/Desktop/calc.py(7)calculator()
-> print("Operators available: ")
(Pdb) step
Operators available:
> /Users/junaid/Desktop/calc.py(8)calculator()
-> for op in operators:
(Pdb) step
> /Users/junaid/Desktop/calc.py(9)calculator()
-> print(op)
(Pdb) step
+
> /Users/junaid/Desktop/calc.py(8)calculator()
-> for op in operators:
(Pdb) step
> /Users/junaid/Desktop/calc.py(9)calculator()
-> print(op)

.
.
.
.

Now, re-run the whole program, but this time, use the "next" command instead of "step". I have shown the input and output trace for that as well.

> /Users/junaid/Desktop/calc.py(3)<module>()
-> operators = ['+', '-', '*', '/']
(Pdb) next
> /Users/junaid/Desktop/calc.py(4)<module>()
-> numbers = [10, 20]
(Pdb) next
> /Users/junaid/Desktop/calc.py(6)<module>()
-> def calculator():
(Pdb) next
> /Users/junaid/Desktop/calc.py(15)<module>()
-> def main():
(Pdb) next
> /Users/junaid/Desktop/calc.py(18)<module>()
-> main()
(Pdb) next
Operators available:
+
-
*
/
Numbers to be used:
10
20
--Return--

Alright, now that we have the output trace for both of these commands, let's see how they differ. With the step command, you can see that when the calculator function is called, the debugger moves inside that function and iterates through it in "steps", showing us exactly what is happening at each step.

However, in the output trace for the next command, when the main function is called, the debugger doesn't show us what happens inside that function (i.e. the subsequent call to the calculator function); instead, it runs it all in a single go and directly prints out the end result.

These commands are useful when you are iterating through a program and want to step through certain functions but not others; you can use each command for its respective purpose.

Conclusion

In this tutorial, we learned about a sophisticated technique for debugging Python applications using a built-in module named PDB. We dove into the different troubleshooting commands that PDB provides us with, including the next and step commands, breakpoints, etc. We also applied them to a basic program to see them in action.

August 19, 2019 12:41 PM UTC


Codementor

How to Find and Hire a Python/Django Development Company

Learn how to find and hire a Python development company.

August 19, 2019 10:52 AM UTC

Top 7 Compelling Reasons to Hire Ukrainian Developers

Find out why it’s worth hiring Ukrainian developers.

August 19, 2019 10:51 AM UTC


PSF GSoC students blogs

Weekly blog #6 (week 12): 12/08 to 18/08

Hello there! We are at the end of GSoC! Week 12 was the last week where we’d do any major coding. We still have the next week where we do a submission (and a final check-in?), so let me keep it short and tell you what I worked on this week and what I got stuck on.

 

This week I focused on my two open PRs - the text PR and the PyTorch PR.

 

For the text PR, I did a couple of things:

 

One interesting issue that arose as a result of the above change (defaulting to “higher level” layers) was that my integration tests broke and the tutorial explanations didn’t look as good. With the guidance of my mentor I resolved the failing tests by explicitly picking a layer that works. For the tutorials I made a comment that using earlier layers gave better results. Interesting issues with text!

 

Next for the PyTorch PR I did the following:

 

I got stuck on the text part of PyTorch because my test model was quite large (large word embeddings). It looks like I will have to train my own embedding layer with a small vocabulary.

 

What’s next

 

That’s all the technical details! I think we have one more blog next week, so I can talk about GSoC in broader terms then?

 

Thanks for reading once again!

Tomas Baltrunas

August 19, 2019 10:34 AM UTC

Twelfth week of GSoC: Getting ready for the final week of GSoC

My GSoC is soon coming to an end so I took some time to write down what still needs to be done:

Making a release of MNE-BIDS

In the past months, there were substantial additions, fixes, and cosmetic changes made to the codebase and documentation of MNE-BIDS. The last release has happened in April (about 4 months ago) and we were quite happy to observe some issues and pull requests raised and submitted by new users. With the next release we can provide some new functionality for this growing user base.

Handling coordinates for EEG and iEEG in MNE-BIDS

In MNE-BIDS, the part of the code that handles the writing of sensor positions in 3D space (=coordinates) is so far restricted to MEG data. Extending this functionality to EEG and iEEG data has been on the to do list for a long time now. Fortunately, I have been learning a bit more about this topic during my GSoC, and Mainak has provided some starting points in an unrelated PR that I can use to finish this issue. (After the release of MNE-BIDS though, to avoid cramming in too much last-minute content before the release)

Writing a data fetcher for OpenNeuro to be used in MNE-Python

While working with BIDS and M/EEG data, the need for good testing data has come up time and time again. For the mne-study-template we solved this issue with a combination of DataLad and OpenNeuro. Meanwhile, MNE-BIDS has its own dataset.py module ... however, we all feel like this module is duplicating the datasets module of MNE-Python and not advancing MNE-BIDS. Rather, it is confusing the purpose of MNE-BIDS.

As a solution, we want to write a generalized data fetching function for MNE-Python that works with OpenNeuro ... without adding the DataLad (and hence Git-Annex) dependency. Once this fetching function is implemented, we can import it in MNE-BIDS and finally deprecate MNE-BIDS' dataset.py module.

Make a PR in MNE-Python that will support making Epochs for duplicate events (will fix ds001971 PR)

In MNE-Python, making data epochs is not possible if two events share the same time. This became apparent with the dataset ds001971 that we wanted to add to the mne-study-template pipeline: https://github.com/mne-tools/mne-study-template/pull/41. There was a suggestion on how to solve this issue by merging the event codes that occur at the same time, as sketched below. Once this fix is implemented in MNE-Python, we can use it to finish the PR in the mne-study-template.
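As a rough illustration of that suggestion (this is not the actual MNE-Python fix), the sketch below merges rows of an MNE-style events array (columns: sample, previous value, event code) that share the same sample into a single row with a new, combined event code; the new_id value is an arbitrary placeholder:

import numpy as np

def merge_simultaneous_events(events, new_id=999):
    merged = []
    for sample in np.unique(events[:, 0]):
        rows = events[events[:, 0] == sample]
        if len(rows) == 1:
            merged.append(rows[0])
        else:
            # several events share this sample: keep one row with a combined code
            merged.append([sample, rows[0, 1], new_id])
    return np.array(merged, dtype=int)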

Salvage / close the PR on more "read_raw_bids" additions

Earlier in this GSoC, I made a PR intended to improve the reading functionality of MNE-BIDS (https://github.com/mne-tools/mne-bids/pull/244). However, the PR was controversially discussed, because it was not leveraging BIDS and instead relying on introducing a dictionary as a container for keyword arguments.

After lots of discussion, we agreed to solve the situation in a different way (by leveraging BIDS) and Mainak made some initial commits into that direction. However in the further progress, the PR was dropped because other issues had higher priority.

Before finishing my GSoC, I want to salvage what's possible from this PR and then close it ... and improve the original issue report so that the next attempt at this PR can rely on a more detailed objective.

August 19, 2019 10:03 AM UTC


Erik Marsja

The Easiest Data Cleaning Method using Python & Pandas


In this post we are going to learn how to simplify our data preprocessing work using the Python package Pyjanitor. More specifically, we are going to learn how to add a column to a Pandas dataframe, remove missing values, remove an empty column, and clean up column names.

That is, we are going to learn how to clean Pandas dataframes using Pyjanitor. For all of the Python data manipulation examples, we are also going to see how to carry them out using only Pandas functionality.

What is Pyjanitor?

What is Pyjanitor? Before we continue learning how to use Pandas and Pyjanitor to clean our datasets, let's learn a bit about this package. The Python package Pyjanitor extends Pandas with a verb-based API that provides convenient data cleaning routines. It started out as a port of the R package janitor, and it is also inspired by the ease-of-use and expressiveness of the R package dplyr. Note that there are several different ways to work with these methods, and this post will not cover all of them (see the documentation).

How to install Pyjanitor

There are two easy methods to install Pyjanitor:

1. Installing Pyjanitor using Pip

pip install pyjanitor

2. Installing Pyjanitor using Conda:

conda install -c conda-forge pyjanitor

Now that we know what Pyjanitor is and how to install the package, we can continue this Python data cleaning tutorial by learning how to remove missing values with Pandas. Note that this tutorial will walk through each step using both Pandas and Pyjanitor. In the end, we will have a complete data cleaning example using only Pyjanitor, and a link to a Jupyter Notebook with all the code.

Fake Data

In the first Python data manipulation examples, we are going to work with a fake dataset. More specifically, we are going to create a dataframe, with an empty column, and missing values. In this part of the post we are, further, going to use the Python packages SciPy, and NumPy. That is, these packages also need to be installed.

In this example we are going to create three columns: Subject, RT (response time), and Deg. To create the response time column, we will use SciPy's norm to create data that is normally distributed.

import numpy as np
import pandas as pd
from scipy.stats import norm
from random import shuffle

import janitor

subject = ['n0' + str(i) for i in range(1, 201)]

Python Normal Distribution using Scipy

In the next code chunk we create a variable, for response time, using a normal distribution.

a = 457
rt = norm.rvs(a, size=200)

Shuffling the List and Adding Missing Values

Furthermore, we are adding some missing values and shuffling the list of normally distributed data:

# Shuffle the response times
shuffle(rt)
rt[4], rt[9], rt[100] = np.nan, np.nan, np.nan

Dataframe from Dictionary

Finally, we are creating a dictionary of our two variables and use the dictionary to create a Pandas dataframe.

data = {
    'Subject': subject,
    'RT': rt,
}

df = pd.DataFrame(data)

df.head()
Dataframe created from dictionary

Data Cleaning in Python with Pandas and Pyjanitor

How to Add a Column to Pandas Dataframe

Now that we have created our dataframe from a dictionary we are ready to add a column to it. In the examples, below, we are going to use Pandas and Pyjanitors method.

1. Append a Column to Pandas Dataframe

It’s quite easy to add a column to a dataframe using Pandas. In the example below we will append an empty column to the Pandas dataframe:

df['NewColumnName'] = np.nan
df.head()
Column added to dataframe

2. Adding a Column to Pandas Dataframe using Pyjanitor

Now, we are going to use the method add_column to append a column to the dataframe. Adding an empty column is not as easy as using the method above. However, as you will see towards the end of this post, we can use all of the methods when creating our dataframe:

newcolvals = [np.nan]*len(df['Subject'])
df = df.add_column('NewColumnName2', newcolvals)
df.head()
Append column to Pandas dataframe

How to Remove Missing Values in Pandas Dataframe

It is quite common for a dataset to be far from complete. This may be due to errors in the measurement instruments, or people forgetting or refusing to answer certain questions, amongst many other things. Whatever the reason behind the missing information, these entries are called missing values. In Pandas, missing values are coded by the symbol NaN, much like NA in the R statistical environment. Pandas has the function isna() to help us identify missing values in our dataset, and if we want to drop missing values, Pandas has the function dropna().
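As a quick illustration before we drop anything, isna() returns a boolean mask, so combining it with sum() counts the missing values per column of the dataframe we created earlier:

df.isna().sum()  # e.g. RT should show 3 missing values, Subject 0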

1. Dropping Missing Values using the Pandas dropna Method

In the code example below we are dropping all rows with missing values. Note, if we want to modify the dataframe itself we should add the inplace parameter and set it to True.

df.dropna(subset=['RT']).head()

2. Dropping Missing Values from Pandas Dataframe using Pyjanitor

The method for dropping missing values from a Pandas dataframe using Pyjanitor is the same as the one above. That is, we are going to use the dropna method. Here as well, we use the parameter subset to select which column(s) to consider when removing missing data from the dataframe:

df.dropna(subset=['RT'])

How to Remove an Empty Column from Pandas Dataframe

In the next Pandas data manipulation example, we are going to remove the empty column from the dataframe. First, we are going to use Pandas to remove the empty column and, then, we are going to use Pyjanitor. Remember, towards the end of the post we will have a complete example in which we carry out all data cleaning while actually creating the Pandas Dataframe.

1. Removing an Empty Column from Pandas Dataframe

When we want to remove an empty column (e.g., one with only missing values) we use the Pandas method dropna again. However, this time we use the axis parameter and set it to 1 (for columns). Furthermore, we also have to use the parameter how and set it to 'all'. If we don't, it will remove any column that contains missing values.

df.dropna(axis=1, how='all').head()
Removed empty columns

2. Deleting an Empty Column from Pandas Dataframe using Pyjanitor

It’s a bit easier to remove an empty column using Pyjanitor:

df.remove_empty()

How to Rename Columns in Pandas Dataframe

Now that we know how to remove missing values, add a column to a Pandas dataframe, and how to remove a column, we are going to continue this data cleaning tutorial learning how to rename columns.

For instance, in the post where we learned how to load data from a JSON file to a Pandas dataframe, we renamed columns to make it easier to work with the dataframe later. In the example below, we will read a JSON file, and rename columns using both the Pandas dataframe method rename and Pyjanitor.

import requests
from pandas.io.json import json_normalize

url = "https://datahub.io/core/s-and-p-500-companies-financials/r/constituents-financials.json"
resp = requests.get(url=url)

df = json_normalize(resp.json())
df.iloc[:,0:6].head()


1. Renaming Columns in Pandas Dataframe

As can be seen in the column names above, there are some whitespaces and special characters that we want to remove. In the first renaming columns example we are going to use the Pandas rename method together with a regular expression to rename the columns (i.e., we are going to replace whitespaces and "/" with underscores).

import re

df.rename(columns=lambda x: re.sub(r'(\s|/)', '_', x),
          inplace=True)
df.keys()

2. How to Rename Columns using Pyjanitor and clean_names

The task of renaming a column (or many columns) is way easier using Pyjanitor. Once we have imported the package, we can just use the clean_names method and it will give us the same result as using the Pandas rename method. Using clean_names, we also get all letters in the column names converted to lowercase:

df = df.clean_names().head()
df.keys()

How to Clean Data when Loading the Data from Disk

The cool thing about using Pyjanitor to clean our data is that we can use all of the above methods while loading our data. For instance, in the final data cleaning example we are going to add a column to the dataframe, remove empty columns, drop missing data, and clean the column names, all in one chain of method calls. This is what makes working with Pyjanitor so convenient.

data_id = [1]*200
df = (
    pd.read_csv('./SimData/DF_NA_Janitor.csv',
                index_col=0)
    .add_column('data_id', data_id)
    .remove_empty()
    .dropna()
    .clean_names()
)

df.head()

Aggregating Data using Pyjanitor

In the last example we are going to use the Pandas methods agg, groupby, and reset_index together with the Pyjanitor method collapse_levels to calculate the mean and standard deviation for each sector:

df.groupby('sector').agg(['mean',
                          'std']).collapse_levels().reset_index()


Conclusion:

In this post we have learned how to use a number of data cleaning methods. Specifically, we have learned how to append a column to a Pandas dataframe, remove empty columns, handle missing values, and rename columns (i.e., get better column names). There are, of course, many more data cleaning methods available, both in Pandas and in Pyjanitor.

In conclusion, the methods added by the Python package are similar to those of the R packages janitor and dplyr. These methods will make our lives easier when preprocessing our data.

What is your favorite data cleaning method and/or Package? It can be either using R, Python, or any other programming language. Leave a comment below!


The post The Easiest Data Cleaning Method using Python & Pandas appeared first on Erik Marsja.

August 19, 2019 08:27 AM UTC


Reuven Lerner

Weekly Python Exercise A3 (beginner objects) is open


If you’ve been programming in Python for any length of time, then you’ve undoubtedly heard that “everything is an object.”

But what does that mean? And who cares?  And what effect does that have on you as a developer — or on Python, as a language?

Indeed, how can (and should) you take advantage of Python’s object-oriented facilities to make your code more readable, maintainable, standard, and (dare I say it) Pythonic?

If you’re relatively new to Python, and have been struggling with some of these same questions, or if you’re just wondering about the differences between instances, classes, methods, and attributes, then I have good news for you: The upcoming cohort of Weekly Python Exercise is all about object-oriented programming.

In this 15-week course, you’ll learn in the best way I know, by solving problems and discussing them with others. As you work through the exercises, you’ll get a better understanding of:

Weekly Python Exercise, of course, is a family of 15-week classes designed to help improve your Python fluency.  Each course works the same:

WPE A-level courses are for beginners, while B-level courses are for more advanced Python developers. But you can take any or all of them, in any order — and there’s no overlap between the exercises in these classes and any of the previous books/courses I’ve given.

This new cohort (A3) will be starting on Tuesday, September 17th.  To join, you must sign up before September 10th.  But if you sign up by September 3rd, you’ll get the early-bird discount, bringing the price down to $80 — more than $20 off the full price.

I won’t be offering these exercises for at least one more year. So if you want to sharpen your OO skills before the autumn of 2020, then you should act now.

As always, you can get an even better price if you're a student, pensioner/retiree/senior citizen, or living permanently outside of the world's 30 richest countries. Just reply to this e-mail, and I'll send you the appropriate coupon code.

And if several people (at least five) from your company want to join together?  Let me know, and I’ll give you an additional discount, too.

There's lots more to say about Weekly Python Exercise, now in its third year of helping Python developers from around the world to write better code — doing more in less time, and getting better jobs than before. You can read more, and try out some sample exercises, at https://WeeklyPythonExercise.com/ .

But if you’ve always wanted to improve your fluency with Python objects, then you can just sign up at https://WeeklyPythonExercise.com/ .

Don’t wait, though! The early-bird discount ends on September 3rd.

The post Weekly Python Exercise A3 (beginner objects) is open appeared first on Reuven Lerner.

August 19, 2019 08:10 AM UTC


Mike Driscoll

PyDev of the Week: Paul Ganssle

This week we welcome Paul Ganssle (@pganssle) as our PyDev of the Week. Paul is the maintainer of the dateutil package and also a maintainer of the setuptools project. You can catch up with Paul on his website or check out some of his talks. Let’s take a few moments to get to know Paul better!

Can you tell us a little about yourself (hobbies, education, etc):

One thing that sometimes surprises people is that I started out my career as a chemist. I have a bachelor’s degree in Chemistry from the University of Massachusetts, Amherst and a Ph.D in Physical Chemistry from the University of California, Berkeley. After that I worked for two years building NMR (nuclear magnetic resonance) devices for use in oil wells. In 2015 I was looking for a career with a bit more flexibility in terms of location and I made the switch to software development; one thing that is nice about the software industry is that tech companies are not afraid to hire people with non-traditional backgrounds if they know how to code.

Paul Ganssle

I have the typical assortment of “hacker” and “autodidact” hobbies – learning languages, picking locks, electronics projects, etc. One of my favorite projects (which has unfortunately fallen a bit by the wayside) is my HapticapMag, a haptic compass that I built into a hat. I had it up and working for 2 or 3 weeks, but some parts broke and I never got around to fixing it. My tentative plan is to start up some new electronics projects in 4-5 years, when my son is old enough to be interested in that sort of thing.

Why did you start using Python?

I have two origin stories for this, actually. The more boring one is that around 2008 a friend of mine told me about this cool and increasingly popular programming language called Python that I should definitely learn, and I sort of picked it up and started using it for little system automation tasks.

What really got me into Python, though, was when I illustrated some point I was making in a forum post using a graph that I had made in Matlab and someone complained about the terrible aliasing in the plot and suggested I use matplotlib instead. I tried it out and the plots were so much better that I was instantly hooked. After that, I moved everything I could over from Matlab to Python and never looked back.

What other programming languages do you know and which is your favorite?

It’s hard to say when you “know” a programming language, but the programming languages I’m most confident with are C++, C and Rust (and probably some others like Matlab that I haven’t used in years but once knew pretty well). I can write enough Javascript to get by, but to say I know it would be kind of like saying I speak Spanish because I can order a beer and ask where the bathroom is.

At the moment, I’m very excited about Rust, which is a memory-safe systems programming language targeting the use cases where C and C++ currently predominate. One of the very nice things about Rust is that there is a very enthusiastic community out there and it already has a flourishing ecosystem of third party packages, which I think is one reason there’s a lot of excitement about Rust in the Python community.

What projects are you working on now?

I do a lot of maintenance tasks that might not be considered “working on a project” for packages I maintain or help maintain like dateutil, setuptools and CPython (and I try to monitor the PyO3 issue tracker, though I have no official standing in that project); things like reviewing PRs, commenting on issues and participating in discussions.

In terms of features and other improvements, I’ve been trying to prepare a proof of concept for PEP 517-compatible editable installations and I’ve started working a bit on adding time zones to the standard library. I also have a few smaller projects in various states of completion that I occasionally work on, like my library variants, which I’ve given a few talks about, or my only half-complete library pyminimp3 – Python bindings around the C minimp3 library for processing MP3 files.

Which Python libraries are your favorite (core or 3rd party)?

I am a big fan of hypothesis, the property-based testing framework. With hypothesis, you write assertions about a property that your code has (e.g. “this parse operation is the inverse of this format operation”), given a domain of inputs (integers, datetimes, etc), and hypothesis randomly generates example inputs for you. I was originally very uneasy about the fact that the tests you write with it are non-deterministic, but I got over that very quickly; in reality, if there’s a bug in your code, it’s very rare for hypothesis to miss it on the first try, particularly if you are running tests for multiple platforms and multiple interpreters on each PR. The bigger problem I’ve had introducing hypothesis into a code base is that it finds a bunch of obscure edge cases that you haven’t handled, so you need to fix a bunch of bugs just to be able to start using it!
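To give a feel for what such a property-based test looks like (this example is mine, not Paul's; the datetime format string and the strategy bounds are arbitrary choices), a round-trip test with hypothesis might be written like this and run with pytest:

from datetime import datetime

from hypothesis import given, strategies as st

FMT = "%Y-%m-%d %H:%M:%S"

@given(st.datetimes(min_value=datetime(1000, 1, 1)))
def test_parse_inverts_format(dt):
    # drop microseconds, since the format string above does not carry them
    dt = dt.replace(microsecond=0)
    assert datetime.strptime(dt.strftime(FMT), FMT) == dt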

How did you become a core developer of Python?

I have been peripherally involved in Python development since I was asked to comment on PEP 495 (adding the fold attribute) as maintainer of dateutil. In late 2017 I started contributing bug fixes and features more actively, and monitoring the issue tracker for datetime-related issues. One of the biggest bottlenecks in the CPython development process is high-quality reviews, which is something I’m used to doing from years of maintaining open source packages, so I stepped in and started reviewing PRs as well.

In terms of the actual process of becoming a core dev, I took an increasingly common path for newer core devs, which is that Victor Stinner noticed my contributions and asked me if I was interested in eventually becoming a core dev; after I said yes, he and Pablo Galindo Salgado agreed to mentor me through the process of becoming a bug triager and eventually core developer.

Which modules do you work on and why?

My main expertise is with the datetime and related modules (e.g. time); this is partially a result of my randomly stumbling into maintaining dateutil back in early 2015 and partially down to the fact that most other people really would prefer not to work on datetimes. I've also been involved in packaging as well (as a maintainer of setuptools), but these days distutils doesn't see much improvement because it is almost always preferable to make the improvements in setuptools instead.

Do you have any advice for others who would like to contribute to Python core?

If you have the opportunity to attend a sprint (as part of a conference or otherwise), that is usually the best way to make a first contribution to any open source project. If not, I think the best advice I can give is the same I would give for contributing to any open source project:

  1. Make a small PR with tightly scoped changes: these are much easier to review and merge.
  2. Minimize or eliminate changes to the public API: every change to the public API is something that the maintainers of that module will have to support indefinitely. Behind-the-scenes changes are a lot easier to accept because they’re a lot easier to undo.
  3. Add tests! Good tests are often the hardest part of a PR to write, so it’s very unlikely that your contribution will be accepted without them. They also make a contribution much easier to review, because they demonstrate exactly what your code is supposed to do and they enforce the behavior!

If you’re already a skilled reviewer, I also recommend looking around in the issue tracker for unreviewed PRs and giving your comments. This will really help the project and is a sure fire way to build a reputation as a solid contributor.

Is there anything else you’d like to say?

Thank you for asking me to do this interview, and thanks to all the readers who’ve indulged my verbosity by reading all the way to the end.

Thanks for doing the interview, Paul!

The post PyDev of the Week: Paul Ganssle appeared first on The Mouse Vs. The Python.

August 19, 2019 05:05 AM UTC


PSF GSoC students blogs

Week 10: Cartopy's EPSG

What did you do this week?

This week I started adding support to plot projections from an EPSG code natively using Cartopy's own feature `epsg([epsg code])` from its CRS class. Apart from that, I also successfully fixed the feature to disable or enable gridlines for the map, which had broken while migrating the UI side code to Cartopy.
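For context, a minimal sketch of plotting with an EPSG-code-based projection in plain Cartopy looks something like the following (EPSG 27700, British National Grid, is just an example code; depending on the Cartopy version, ccrs.epsg() may require the optional pyepsg package and a network connection):

import cartopy.crs as ccrs
import matplotlib.pyplot as plt

proj = ccrs.epsg(27700)        # build the projection from the EPSG code
ax = plt.axes(projection=proj)
ax.coastlines()
ax.gridlines()                 # the gridlines feature mentioned above
plt.show()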

What is up next? 

This EPSG support still needs a lot more refinement, and the way the UI handles it still needs a little improvement, so I will work on that in the coming week just before the final evaluation week.

Did you face any blockers?

Yes, I did face a couple of blockers, especially with enabling/disabling gridlines, but after looking up the documentation it got solved in not much time.

August 19, 2019 03:15 AM UTC

August 18, 2019


William Minchin

Image Process Plugin 1.2.0 for Pelican Released

Image Process is a plugin for Pelican, a static site generator written in Python.

Image Process lets you automate the processing of images based on their class attribute. Use this plugin to minimize the overall page weight and to save yourself a trip to Gimp or Photoshop each time you include an image in your post.

Image Process is used by this blog’s theme to resize the source images so they are the correct size for thumbnails on the main index page and the larger size they are displayed at on top of the articles.

This Release

Version 1.2.0 of the plugin has been released and posted to PyPI.

The biggest change this version brings is support for Pelican version 4. Thanks to Nick Perkins for reporting the issue, and to Therry van Neerven for providing a Pull Request I could crib a solution from.

I’ve also made some improvements in the test suite. It still fails on Windows due to issues with filepath separators, but most tests now pass on Travis. The remaining failing test appears to be due to some changes in exactly how Pillow (the image processing library used here) transforms the images.

Upgrading

To upgrade simply use pip:

pip install minchin.pelican.plugins.image_process --upgrade

August 18, 2019 11:13 PM UTC


Python Sweetness

Mitogen v0.2.8 released

Mitogen for Ansible v0.2.8 has been released. This version (finally) supports Ansible 2.8, comes with a supercharged replacement fetch module, and includes roughly 85% of what is needed to implement fully asynchronous connect.

As usual a huge slew of fixes are included. This is a bumper release, running to over 20k lines of diff. Get it while it's hot, and as always, bug reports are welcome!

August 18, 2019 08:45 PM UTC


PSF GSoC students blogs

Blog post: Week 12

 

Hi everyone!

End of summer is close, and we are doing some interesting stuff here :)

This week we mainly focused on running some integrity tests on our new atomic files, and finally running some TARDIS simulations with them! My objective was to ensure we're getting identical atomic files with my new module.

Next week I'll run more simulations and polish the documentation. Also, I started to write my final evaluation for GSoC'19.

Last months have been really fun and exciting!

August 18, 2019 04:37 PM UTC

Coding Period: Week 12

Hello everyone! This is one of the last blogposts. This week I worked on the last features I needed to add in my GSoC period i.e. hg update --abort

What did I do this week?
As stated in the previous week I worked on hg transplant --abort. As suggested by @Pulkit it was modified to --stop flag and finally got it merged [1]. I also resolved most of the hg continue series of patches.
Later this week I was finally able to deduce a logic for
hg update --abort and sent a patch for that [2]. It is still under review though.

What is coming up next?

As almost all patches are dealt with I will modify the patch for hg update --abort as suggested by the community and get that merged.
I will also work on the continue patches for evolve at the beginning of next week and get them merged too. This issue has been one of the most requested features and I am really happy that I could work on it and get it resolved. Also, this week I will be working on my GSoC '19 blog, which is supposed to be attached as prescribed by the final product submission guidelines.

Did you get stuck anywhere?
I got stuck on the logic of hg update --abort for quite a while, but a deeper understanding of mergestate and its associated functions helped me deduce a logic that was not functional initially; later this week I was able to make it work. I am still adding more extensive test cases to make the feature solid. It is one of the most interesting features I have worked on this summer, as it involved developing a deeper understanding of the merge and update workflow.

August 18, 2019 02:09 PM UTC

Coding period: week #12

This is going to be a blog post about fixing an urgent bug and splitting an old RFC patch of mine into stack to be landed. Also, finishing up something I was working for a long time.

 

What did I do this week?

My mentor introduced me to an issue[1] which is an urgent bug to get fixed. While pushing bookmarks, bookmarks pointing to secret changesets are also pushed, even though the changesets themselves are not pushed. So, while pulling the bookmarks, the changesets they point to will be unknown. We wanted to abort on pushing bookmarks which point to secret changesets. I sent two patches: one[2] which demonstrates the issue with tests, and the other[3] which fixes the issue. I also sent a patch[4], which got merged, to abort on using both `--interactive` and `--keep` together with `unshelve`. I worked on two patches[5][6] to enhance the config registrar and its usage. Then, there was an old RFC patch[7] of mine which introduces the first prototype of storing/restoring the mergestate with `shelve`. I had to split that into a stack[8][9][10][11] to change its priority from RFC to patches that can be landed.

 

What is coming up next?

My mentor has requested some follow-ups to some of the merged patches. I shall be working on them. Also, I will be addressing the reviews on the WIP patches that are active now and cleaning up the work. Then, I'll work on documentation of the work I did and its importance.

 

Did you get stuck anywhere?

Yes. While adding an abort on trying to exchange bookmarks which point to secret changesets, I was initially unable to solve a test case. My mentor asked me to send a patch to Bitbucket and I did that. He gave me reviews there, I fixed it, and then I sent the patch to the core.

August 18, 2019 01:22 PM UTC

August 17, 2019


PSF GSoC students blogs

6th Last or Inception of new road Blog Post

My best three months are now coming to an end, but I am very happy with myself because I completed my project ahead of time. All of my project goals are complete, i.e. the MVP, tests, and refactoring of the code. Just one PR, regarding reconnecting, is on the verge of merging, which was an extended goal for the GSoC period. I don't have the words to describe these past three months; I learned a lot of things like WebSockets, Gatsby, and Jest, and spent the entire time learning these technologies and implementing them in my GatsbyJS preview project: WebSockets for handling the websocket events fired from the backend of Plone CMS; the Gatsby API, docs, and gatsby-source-filesystem to learn how to implement the functionality for updating content in real time; and Jest for writing tests for the implemented features.

A big thank you to my mentor datakurre, who helped me in completing the project. He was active throughout, resolved my doubts about the project, and helped me out whenever I got stuck at a certain point. Towards the end of this program I also learned something that is not available to most developers: refactoring a codebase. I flattened the codebase by splitting large functions into smaller ones. Our project had a utils.js file which contained all the methods and functions for the entire project. I refactored this utils file into small, separate function files, and now our codebase looks neat and clean with enhanced code readability. Splitting into different files also helps us in refactoring the codebase from JS to TypeScript, which is our future plan. During the implementation I faced an interesting problem: how to group related functions into a single file and separate those that are not related. I worked out a proper hierarchical order and resolved this issue.

Thank you to the Python Software Foundation for giving me such a wonderful opportunity to work on a real-world project that is used by people all around the world. I learned some cool concepts and gained valuable experience :)

August 17, 2019 09:43 AM UTC


TechBeamers Python

Python Filter()

Python's filter() function applies another function to a given iterable (list/string/dictionary, etc.) to test which of its items to keep and which to discard. In simple words, it filters out the ones that don't pass the test and returns the rest as a filter object. The filter object is of an iterable type. It retains those elements for which the function returns True. We can also convert it to a list or tuple or other types using their factory functions. In this tutorial, you'll learn how to use the filter() function with different types of sequences. Also, you can refer to the examples
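For example, keeping only the even numbers from a list and converting the resulting filter object to a list looks like this:

nums = [1, 2, 3, 4, 5, 6]
evens = filter(lambda n: n % 2 == 0, nums)
print(list(evens))  # [2, 4, 6]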

The post Python Filter() appeared first on Learn Programming and Software Testing.

August 17, 2019 06:54 AM UTC


Codementor

Creating a Docker Swarm Stack with Terraform (Terrascript Python), Persistent Volumes and Dynamic HAProxy.

This article demonstrates how to create a Docker Swarm cluster with volumes, firewall, DNS, and load balancing using Terraform wrapped by a Python script.

August 17, 2019 03:20 AM UTC

August 16, 2019


Brett Cannon

How do you verify that PyPI can be trusted?

A co-worker of mine attended a technical talk about how Go's module mirror works and he asked me whether there was something there that Python should do.

Now Go's packaging story is rather different from Python's since in Go you specify the location of a module by the URL you fetch it from, e.g. github.com/you/hello specifies the hello module as found at https://github.com/you/hello. This means Go's module ecosystem is distributed, which leads to interesting problems of caching so code doesn't disappear off the internet (e.g. a left-pad incident), and needing to verify that a module's provider isn't suddenly changing the code they provide with something malicious.

But since the Python community has PyPI our problems are slightly different in that we just have to worry about a single point of failure (which has its own downsides). Now obviously you can run your own mirror of PyPI (and plenty of companies do), but for the general community no one wants to bother to set something like that up and try to keep it maintained (do you really need your own mirror to download some dependencies for the script you just wrote to help clean up your photos from your latest trip?). But we should still care about whether PyPI has been compromised such that packages hosted there have been tampered with somehow between when the project owner uploaded their release's files and when you download them.

Verifying PyPI is okay

So the first thing we can do is see if we can tell whether PyPI has been compromised somehow. This takes on two different levels of complexity. One is checking whether anything nefarious has occurred post-release. The fancier step is to provide a way for project owners to tell other folks what they are giving PyPI, so those folks can act as auditors.

Post-release trust

In a post-release scenario you're trusting that PyPI received a release from a project owner successfully and safely. What you're worrying about here is that at some later point PyPI gets compromised and someone e.g. swapped out the files in requests so that someone could steal some Bitcoin. So what are some options here?

Trust PyPI

The simplest one is don't worry about it. 😁 PyPI is run by some very smart, dedicated folks and so if you feel comfortable trusting them to not mess up then you can simply not stress about compromises.

Trust PyPI up to when you froze your dependencies

Now perhaps you do generally trust the PyPI administrators and don't think anything has happened yet, but you wouldn't mind a fairly cheap way, available today, to make sure nothing fishy happens in the future. In that case you can record the hashes of your locked dependencies. (If you're an app developer you are locking your dependencies, right?)

Basically what you do is you have whatever tool you're using to lock your dependencies – e.g. pip-tools, pipenv, poetry – record the hash of the files you depend on upon locking. That way in the future you can check for yourself that the files you downloaded from PyPI match bit-for-bit what you previously downloaded and used. Now this doesn't guarantee that what you initially downloaded when you froze your dependencies didn't contain compromised code, but at least you know going forward nothing questionable has occurred.
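As a sketch of what that looks like in practice (the project name is just an example and the digests are placeholders, not real values), a locked requirements file generated with hashes might contain entries like the following:

# requirements.txt, generated by a locking tool with hashes recorded
requests==2.22.0 \
    --hash=sha256:<digest of the wheel recorded at lock time> \
    --hash=sha256:<digest of the sdist recorded at lock time>

pip then verifies every downloaded file against these values and refuses to install anything that doesn't match:

$ pip install --require-hashes -r requirements.txt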

Trust PyPI or an independent 3rd-party since they started running

Now we're into the "someone would have to do work to make this happen" realm; everything up until now you can do today, but this idea requires money (although PyPI still requires money to simply function as well, so please have your company donate if you use PyPI at work).

What one could do is run a 3rd-party service that records all the hashes of files that end up on PyPI. That way, if one wanted to see if the hash from PyPI hasn't changed since the 3rd-party service started running then one could simply ask the 3rd-party service for the hash for whatever file they want from PyPI, ask PyPI what they think the hash should be, and then check if the hashes match. If they do match then you should be able to trust the hashes, but if they differ then either PyPI or the 3rd-party service is compromised.

Now this is predicated on the idea that the 3rd-party service is truly 3rd-party. If any staff is shared between the 3rd-party service and PyPI then that's a potential point of compromise. This is also assuming that PyPI has not already been compromised. But at least in this scenario the point in time where your trust in PyPI starts from when the 3rd-party service began running and not when you locked your dependencies.

You can also extend this out to multiple 3rd-parties recording file hashes so that you can compare hashes against multiple sources. This not only makes it harder by forcing someone to compromise multiple services in order to cover up a file change, but if someone is compromised you could choose to use quorum to decide who's right and who's wrong.

Auditing what everyone claims

This entire blog post started because of a Twitter thread about how to be able to validate what PyPI claims. At some point I joked that I was shocked no one had mentioned the blockchain yet. And that's when I was informed that Certificate Transparency logs are basically what we would want and they use something called Merkle hash trees that started with P2P networks and have been used in blockchains.

I'm not going to go into all the details as how Certificate Transparency works, but basically they use an append-only log that can be cryptographically verified as having not been manipulated (and you could totally treat recording hashes of files on PyPI as an append-only log).

There are two very nice properties of these hash trees. One is it is very cheap to verify when an update has been made that all the previous entries in the log have not changed. Basically what you need is some key values from the previous version of the hash tree so that when you add new values to the tree and re-balance it's easy to verify the old stuff is still the same. This is great to help monitor for manipulation of previous data while also making it easy to add to the log.

The second property is that checking an entry hasn't been tampered with can be done without having the entire tree available. Basically you only need all the nodes along the path from a leaf node to the root, plus the immediate siblings of those nodes. This means that even if your hash tree has a massive number of leaf nodes, it doesn't take much to audit that a single leaf node has not changed.
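As a toy sketch of that audit (this is not Certificate Transparency's actual algorithm, which among other things hashes leaves and interior nodes with distinct prefixes), verifying a single leaf only requires the sibling hashes along its path to the published root:

import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_audit_path(leaf_data, path, root_hash):
    """path is a list of (sibling_hash, sibling_is_on_left) pairs, bottom to top."""
    node = sha256(leaf_data)
    for sibling, sibling_is_on_left in path:
        node = sha256(sibling + node) if sibling_is_on_left else sha256(node + sibling)
    return node == root_hash

# tiny demo with a two-leaf tree of file-hash records
a = b"pkg-1.0.tar.gz sha256=..."
b = b"pkg-1.0-py3-none-any.whl sha256=..."
root = sha256(sha256(a) + sha256(b))
assert verify_audit_path(a, [(sha256(b), False)], root)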

So all of this leads to a nice system to help keep PyPI honest if you can assume the initial hashes are reliable.

Release-in-progress trust

So all of the above scenarios assume PyPI was secure at the time of initially receiving a file but then potentially was compromised later. But how could we check that PyPI isn't already compromised?

One idea I had was that twine could upload a release's hashes to some trusted 3rd parties as well as to PyPI. The 3rd parties could then either directly compare the hashes PyPI claims to have against what they were given independently, or use their data to create that release's entry in the append-only hash tree log and see whether the final hash matches what PyPI claims. And if a 3rd party wasn't given some hashes by the project owner then it could simply fill in with what PyPI has. But the key point is that by having the project owner directly share hashes with 3rd parties that are monitoring PyPI, we would have a way to detect when PyPI isn't providing files as the project owner intended.
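
Nothing like this exists in twine today, but as a purely hypothetical sketch, a release tool could hash the files in dist/ and send those hashes to a set of made-up monitor endpoints alongside the normal upload to PyPI:

import hashlib
import json
import pathlib
from urllib.request import Request, urlopen

MONITORS = ["https://notary-a.example.com/submit",   # hypothetical 3rd-party services
            "https://notary-b.example.com/submit"]

def release_hashes(dist_dir="dist"):
    """Hash the artifacts that are about to be uploaded."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
            for p in pathlib.Path(dist_dir).iterdir() if p.is_file()}

def notify_monitors(project, version, hashes):
    payload = json.dumps({"project": project, "version": version,
                          "files": hashes}).encode()
    for url in MONITORS:
        req = Request(url, data=payload,
                      headers={"Content-Type": "application/json"})
        urlopen(req)  # error handling and retries omitted for brevity

notify_monitors("mypackage", "1.0.0", release_hashes())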

Making PyPI harder to hack

Now obviously it would be best if PyPI was as hard to compromise as possible as well as detecting compromises on its own. There are actually two PEPs on the topic: PEP 458 and PEP 480. I'm not going to go into details since that's why we have PEPs, but people have thought through how to make PyPI hard to compromise as well as how to detect it.

But knowing that a design is available, you may be wondering why it hasn't been implemented.

What can you do to help?

There is a major reason why the ideas above have not been implemented: money. People using Python for personal projects typically don't worry about this sort of thing because it just isn't a big concern for them, so no one is champing at the bit to implement any of this for fun in their spare time. But for any business relying on packages coming from PyPI it should be a concern, since the business relies on the integrity of PyPI and the Python ecosystem. So if you work for a company that uses packages from PyPI, please consider having the company donate to the packaging WG (you can also find the link by going to PyPI and clicking the "Donate" button). Previous donations funded the current back end and look of PyPI as well as the recent work to add two-factor authentication and API tokens, so the WG already knows how to turn donations into results. If anything I talked about here sounds worth doing, then please consider donating to help make it happen.

August 16, 2019 11:26 PM UTC


PSF GSoC students blogs

GSoC Weekly Checkin

Hello everyone!

What did I do this week?

After the front end was connected, we noticed some bugs with the API. It wasn't working because some packages were not being installed on Heroku, so we added an Apt file for that. Once that was done, I added extended icons version support to the Icons Picker so that there are lots of icons to choose from.

What is coming up next week?

Next week I have to implement some features like auto-download after a set time interval, a loading animation, etc.

Did I get stuck anywhere?

The only problem was figuring out why the API wasn't working right, which took us quite some time.

Till next time,
Cheers!

August 16, 2019 08:20 PM UTC

GSoC week #9

Hello everyone,

In week 9, my front-end changes to connect EOS-icons with the backend API were merged, and they are live at https://eos-icons.eosdesignsystem.com/extended/icons-picker.html

But after this, another GSoC student merged his code and some issues appeared: part of the front end seems to be broken due to inherited styling. The good thing is that all the features are working and the UI looks pretty smooth too.

August 16, 2019 08:16 PM UTC


Vinta Software

PyBay 2019: Talking about Python in SF

We are back in San Francisco! Our team will be joining PyBay's conference, one of the biggest Python events in the Bay Area. This year we'll be giving the talk: Building effective Django queries with expressions. PyBay has been a fantastic place to meet new people, connect with new ideas, and engage with this thriving community. Here is the sl

August 16, 2019 07:47 PM UTC


Quansight Labs Blog

Spyder 4.0 beta4: Kite integration is here

Kite is sponsoring the work discussed in this blog post, and in addition supports Spyder 4.0 development through a Quansight Labs Community Work Order.

As part of our next release, we are proud to announce an additional completion client for Spyder: Kite. Kite is a novel completion client that uses machine learning techniques to find and predict the best autocompletion for a given piece of text. Additionally, it collects improved documentation for compiled packages, e.g., Matplotlib, NumPy and SciPy, that cannot easily be obtained using traditional code analysis packages such as Jedi.



August 16, 2019 07:19 PM UTC


Stack Abuse

Basics of Memory Management in Python

Introduction

Memory management is the process of efficiently allocating, de-allocating, and coordinating memory so that all the different processes run smoothly and can optimally access different system resources. Memory management also involves cleaning memory of objects that are no longer being accessed.

In Python, the memory manager is responsible for these kinds of tasks by periodically running to clean up, allocate, and manage the memory. Unlike C, Java, and other programming languages, Python manages objects by using reference counting. This means that the memory manager keeps track of the number of references to each object in the program. When an object's reference count drops to zero, which means the object is no longer being used, the garbage collector (part of the memory manager) automatically frees the memory from that particular object.

The user need not worry about memory management, as the process of allocating and de-allocating memory is fully automatic. The reclaimed memory can then be used by other objects.

Python Garbage Collection

As explained earlier, Python deletes objects that are no longer referenced in the program to free up memory space. This process, in which Python frees blocks of memory that are no longer used, is called garbage collection. The Python garbage collector (GC) runs during program execution and frees an object's memory when its reference count drops to zero. An object's reference count increases when it is assigned a new name or is placed in a container, like a tuple or dictionary. Similarly, the reference count decreases when a reference to an object is reassigned, when a reference goes out of scope, or when an object is deleted.
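
You can watch the reference count change with sys.getrefcount() (the call itself temporarily adds one reference, and the exact numbers may differ slightly between Python versions):

import sys

x = ["a", "b"]               # one reference: the name x
print(sys.getrefcount(x))    # 2 -- getrefcount's argument adds a temporary reference

container = {"key": x}       # placing x in a container adds a reference
print(sys.getrefcount(x))    # 3

y = x                        # binding another name adds a reference
print(sys.getrefcount(x))    # 4

del container, y             # dropping references decreases the count again
print(sys.getrefcount(x))    # 2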

Python's memory is organized as a heap that contains the objects and other data structures used in the program. The allocation and de-allocation of this heap space is controlled by the Python memory manager through API functions.

Python Objects in Memory

In Python, every value is an object; a variable is simply a name that refers to one. Objects can either be simple (numbers, strings, etc.) or containers (dictionaries, lists, or user-defined classes). Furthermore, Python is a dynamically typed language, which means that we do not need to declare variables or their types before using them in a program.

For example:

>>> x = 5
>>> print(x)
5
>>> del x
>>> print(x)
Traceback (most recent call last):
  File "<mem_manage>", line 1, in <module>
    print(x)
NameError: name 'x' is not defined

In the first two lines of the example above, the object x exists and can be printed. When we delete x and try to use it again, we get an error stating that the name x is not defined.

You can see that garbage collection in Python is fully automatic and the programmer does not need to worry about it, unlike in languages like C.

Modifying the Garbage Collector

The Python garbage collector has three generations into which objects are classified. A new object, at the start of its life cycle, belongs to the first generation. If an object survives a garbage collection, it is moved up to the next generation. Each of the 3 generations has a threshold: when the number of allocations minus the number of de-allocations exceeds that threshold, garbage collection runs for that generation.

Earlier generations are also garbage collected more often than the higher generations. This is because newer objects are more likely to be discarded than old objects.

The gc module includes functions to change the threshold value, trigger a garbage collection process manually, disable the garbage collection process, etc. We can check the threshold values of different generations of the garbage collector using the get_threshold() method:

import gc
print(gc.get_threshold())

Sample Output:

(700, 10, 10)

As you see, here we have a threshold of 700 for the first generation, and 10 for each of the other two generations.
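
You can also check how close each generation currently is to its threshold with gc.get_count(); the numbers shown below are just illustrative:

import gc

print(gc.get_threshold())   # (700, 10, 10)
print(gc.get_count())       # e.g. (354, 3, 1) -- objects tracked since the last collection

gc.collect()                # run a full collection manually
print(gc.get_count())       # counts drop back down, e.g. (5, 0, 0)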

We can alter the threshold value for triggering the garbage collection process using the set_threshold() method of the gc module:

gc.set_threshold(900, 15, 15)

In the above example, we have increased the threshold value for all 3 generations. Increasing the threshold values decreases how often the garbage collector runs. As a developer you normally don't need to think much about Python's garbage collection, but tuning it like this can be useful when optimizing the Python runtime for your target system. One of the key benefits is that Python's garbage collection mechanism handles a lot of low-level details for the developer automatically.

Why Perform Manual Garbage Collection?

We know that the Python interpreter keeps track of references to objects used in a program. In earlier versions of Python (up to version 1.6), the interpreter used only this reference counting mechanism to manage memory. When the reference count drops to zero, the interpreter frees the memory automatically. This classical reference counting mechanism is very effective, except that it fails when the program has reference cycles. A reference cycle happens when one or more objects reference each other, so their reference counts never reach zero.

Let's consider an example.

>>> def create_cycle():
...     list = [8, 9, 10]
...     list.append(list)
...     return list
... 
>>> create_cycle()
[8, 9, 10, [...]]

The above code creates a reference cycle, where the object list refers to itself. Hence, the memory for the list object will not be freed by reference counting when the function returns. The reference cycle problem can't be solved by reference counting alone; however, it can be dealt with by changing the behavior of the garbage collector in your Python application.

To do so, we can use the gc.collect() function of the gc module.

import gc
n = gc.collect()
print("Number of unreachable objects collected by GC:", n)

gc.collect() returns the number of unreachable objects it has found and collected.

There are two ways to perform manual garbage collection: time-based or event-based garbage collection.

Time-based garbage collection is pretty simple: the gc.collect() function is called after a fixed time interval.

Event-based garbage collection calls the gc.collect() function after an event occurs (e.g. when the application exits or when it remains idle for a specific period of time).
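
Here is a minimal sketch of both approaches using only the standard library; the 60-second interval is arbitrary:

import atexit
import gc
import threading

# Time-based: run gc.collect() every `interval` seconds on a background timer.
def collect_periodically(interval=60.0):
    gc.collect()
    timer = threading.Timer(interval, collect_periodically, args=(interval,))
    timer.daemon = True
    timer.start()

# Event-based: run one final collection when the application exits.
atexit.register(gc.collect)

collect_periodically()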

Let's see how manual garbage collection works by creating a few reference cycles.

import sys, gc

def create_cycle():
    list = [8, 9, 10]
    list.append(list)

def main():
    print("Creating garbage...")
    for i in range(8):
        create_cycle()

    print("Collecting...")
    n = gc.collect()
    print("Number of unreachable objects collected by GC:", n)
    print("Uncollectable garbage:", gc.garbage)

if __name__ == "__main__":
    main()
    sys.exit()

The output is as below:

Creating garbage...
Collecting...
Number of unreachable objects collected by GC: 8
Uncollectable garbage: []

The script above creates a list object that is referred to by a variable, creatively named list. Because the list appends itself, its last element refers back to the list object, so the reference count never drops to zero even after the variable is deleted or goes out of scope. Reference counting alone therefore cannot reclaim it, but Python's garbage collector mechanism periodically checks for, and collects, such circular references automatically.

In the above code, because the reference count stays at least 1 and can never reach 0 on its own, we forcefully garbage collected the objects by calling gc.collect(). However, remember not to force garbage collection too frequently: even though memory gets freed, the GC takes time to evaluate each object's eligibility for collection, which costs processor time and resources. Also, remember to manually manage the garbage collector only after your app has started up completely.
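
One common pattern that follows this advice, shown here only as a hedged sketch, is to disable the automatic collector during a heavy startup phase and re-enable it once the application is up; bootstrap_application() is just a hypothetical placeholder for your own startup code:

import gc

gc.disable()               # skip automatic cyclic collection while starting up
bootstrap_application()    # hypothetical startup work that allocates many objects
gc.enable()                # hand control back to the automatic collector
gc.collect()               # clean up whatever garbage the startup phase produced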

Conclusion

In this article, we discussed how memory management in Python is handled automatically using reference counting and garbage collection strategies. Without garbage collection, implementing a robust memory management mechanism in Python would be impossible. Programmers also need not worry about deleting allocated memory, as that is taken care of by the Python memory manager. This leads to fewer memory leaks and better performance.

August 16, 2019 12:57 PM UTC


PyCharm

PyCharm 2019.2.1 RC

PyCharm 2019.2.1 release candidate is available now!

Getting the New Version

Download the RC from Confluence.

The release candidate (RC) is not an early access program (EAP) build, and does not bundle an EAP license. If you get the PyCharm Professional Edition RC, you will either need a currently active PyCharm subscription, or you will receive a 30-day free trial.

August 16, 2019 12:48 PM UTC


Test and Code

83: PyBites Code Challenges behind the scenes - Bob Belderbos

Bob Belderbos and Julian Sequeira started PyBites a few years ago.
They started doing code challenges along with people around the world and writing about it.

Then came the codechalleng.es platform, where you can do code challenges in the browser and have your answer checked by pytest tests. But how does it all work?

Bob joins me today to go behind the scenes and share the tech stack running the PyBites Code Challenges platform.

We talk about the technology, the testing, and how it went from a cool idea to a working platform.

Special Guest: Bob Belderbos.

Sponsored By:

PyCharm Professional: PyCharm is designed by programmers, for programmers, to provide all the tools you need for productive Python development. (https://testandcode.com/pycharm)

Support Test & Code - Python Testing & Development (https://www.patreon.com/testpodcast)

Links:

PyBites: https://pybit.es/
PyBites Code Challenges coding platform: https://codechalleng.es/
Learning Paths: https://codechalleng.es/bites/paths
Julian's article on whiteboard interviews: https://pybit.es/whiteboard-interviews.html
Selenium running on CodeChalleng.es: https://www.youtube.com/watch?v=Jpwn2yOppPo

August 16, 2019 07:00 AM UTC