
Planet Python

Last update: October 31, 2020 04:47 AM UTC

October 30, 2020


NumFOCUS

Public Apology to Jeremy Howard

We, the NumFOCUS Code of Conduct Enforcement Committee, issue a public apology to Jeremy Howard for our handling of the JupyterCon 2020 reports. We should have done better. We thank you for sharing your experience and we will use it to improve our policies going forward. We acknowledge that it was an extremely stressful experience, […]

The post Public Apology to Jeremy Howard appeared first on NumFOCUS.

October 30, 2020 06:51 PM UTC


PythonClub - A Brazilian collaborative blog about Python

Backing up the database in Django

Introduction

At some point during your development with Django, you may need to back up and restore your application's database. With that in mind, I decided to put together a short, basic tutorial on how to perform this operation.

In this tutorial, we'll use django-dbbackup, a package developed specifically for this purpose.

Setting up our environment

First, starting from the beginning, let's create a folder for our project and isolate our development environment inside it using a virtualenv:

mkdir projeto_db && cd projeto_db #create our project folder

virtualenv -p python3.8 env && source env/bin/activate #create and activate our virtualenv

After that, with our environment active, let's run the following:

pip install -U pip #upgrade the installed version of pip

Installing the dependencies

Now let's install Django and the package we'll use to make our backups.

pip install Django==3.1.2 #install Django

pip install django-dbbackup #install django-dbbackup

Creating and configuring the project

With our dependencies installed, let's create our project and configure the package in Django's settings.

django-admin startproject django_db . #inside our projeto_db folder, create a Django project named django_db

Once the project is created, let's create and populate our database.

python manage.py migrate #this syncs the database state with the current set of models and migrations

With the database created, let's create a superuser so we can access our project's admin panel.

python manage.py createsuperuser

Perfect. We now have everything we need to run our project. To run it, just do:

python manage.py runserver

Your project should look something like this:

Configuring django-dbbackup

Inside your project, open the settings.py file, as shown below:

django_db/
├── settings.py

Inside this file, first add django-dbbackup to the project's apps:

INSTALLED_APPS = (
    ...
    'dbbackup',  # adding django-dbbackup
)

Once it's added to the apps, let's tell Django which storage backend to use for the backups and then the folder where the backup file will be saved. This can be done at the end of the settings.py file:

DBBACKUP_STORAGE = 'django.core.files.storage.FileSystemStorage' #where to store it
DBBACKUP_STORAGE_OPTIONS = {'location': 'backups/'} #which folder to save to

Notice that we told Django to save the backup in the backups folder, but that folder doesn't exist in our project yet. So we need to create it [outside the project's package folder]:

mkdir backups

Creating and restoring our backup

Everything's ready. Now let's create our first backup:

python manage.py dbbackup

Once it's executed, a file will be created -- in our example, this file will have a .dump extension -- and saved in the backups folder. This file contains a complete backup of our database.

To restore our database, let's suppose we migrated our system from an old server to a new one and, for some reason, our database was corrupted and rendered unusable. In other words, we have a system/project with no database -- so delete or move your .sqlite3 database for this example to be useful -- but we do have the backups. With that, let's restore the database:

python manage.py dbrestore

And there it is: our database is restored. One nice thing about django-dbbackup, among other features, is that it generates backups stamped with a specific date and time, making it easier to recover the most recent data.

That's it for today, folks. See you next time. ;)

October 30, 2020 01:40 PM UTC


Real Python

The Real Python Podcast – Episode #33: Going Beyond the Basic Stuff With Python and Al Sweigart

You probably have heard of the bestselling Python book, "Automate the Boring Stuff with Python." What are the next steps after starting to dabble in the Python basics? Maybe you've completed some tutorials, created a few scripts, and automated repetitive tasks in your life. This week on the show, we have author Al Sweigart to talk about his new book, "Beyond the Basic Stuff with Python: Best Practices for Writing Clean Code."


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

October 30, 2020 12:00 PM UTC


Reuven Lerner

Join the data revolution with my “Intro to SQL” course!

Have you heard? Data is “the new oil” — meaning, data is the most valuable and important thing in the modern world. Which means that if you can store, retrieve, and organize your data, then you (and your company) are positioned for greater success.

This usually means working with a database — and frequently, a relational database, with which you communicate using a language called SQL.

In other words: SQL is the key to the modern data revolution. But too often, people are put off from learning SQL. It seems weird, even when compared with a programming language.

Well, I have good news: If you want to join the data revolution and work with databases, I’m offering a new course. On November 15th, I’ll be teaching a live, 4-hour online course, “Intro to SQL.” I’ll teach you the basics of what you need to work with a database.

The course includes:

I’ve been using databases since 1995, and have been teaching SQL for more than 20 years. This course is based on that corporate training, and is meant to get you jump started into the world of data and relational databases. We’ll be using PostgreSQL, a powerful open-source database I’ve been using for more than two decades.

Questions? Learn more at https://store.lerner.co.il/intro-to-sql (where there’s an extensive FAQ). Or contact me on Twitter (@reuvenmlerner) or via e-mail (reuven@lerner.co.il). I’ll answer as soon as I can.

I hope to see you there!

The post Join the data revolution with my “Intro to SQL” course! appeared first on Reuven Lerner.

October 30, 2020 02:15 AM UTC

October 29, 2020


Python Morsels

Data structures contain pointers

Watch First:

Transcript

Data structures in Python don't actually contain objects. They contain references to objects (a.k.a. "pointers").

Referencing the same object in multiple places

Let's take a list of three zeroes:

>>> row = [0, 0, 0]

If we make a new list like this:

>>> matrix = [row, row, row]
>>> matrix
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

We'll end up with a list of lists of zeros. We now have three lists, and each of them has three zeros inside it.

If we change one of the values in this list of lists to 1:

>>> matrix[1][1] = 1

What do you think will happen? What do you expect will change?

We're asking to change the middle item in the middle list.

So, matrix[1] is referencing index one inside the matrix, which is the second list (the middle one). Index one inside of matrix[1] (i.e. matrix[1][1]) is the second element in that list, so we should be changing the middle zero in the middle list here.

That's not quite what happens:

>>> matrix
[[0, 1, 0], [0, 1, 0], [0, 1, 0]]

Instead we changed the middle number in every list!

This happened because our matrix list doesn't actually contain three lists; it contains three references to the same list:

>>> matrix[0] is matrix[1]
True

We talked about the fact that all variables in Python are actually pointers. Variables point to objects; they don't contain objects: they aren't buckets containing objects.

So unlike many other programming languages, Python's variables are not buckets containing objects. Likewise, Python's data structures are also not buckets containing objects. Python's data structures contain pointers to objects; they don't contain the objects themselves.

If we look at the row list, we'll see that it's changed too:

>>> row
[0, 1, 0]

We stored three pointers to the same list. When we "changed" one of these lists, we mutated that list (one of our two types of change in Python). And that seems to change any variable that references that list.

So matrix[0], matrix[1], and row are all exactly the same object. We can verify this using id:

>>> id(row)
1972632707784
>>> id(matrix[0])
1972632707784
>>> id(matrix[1])
1972632707784
>>> id(matrix[2])
1972632707784

Avoiding referencing the same object

If we wanted to avoid this issue, we could manually make a list of three lists:

>>> matrix = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> matrix
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

This is not going to suffer from the same problem, because these are three independent lists.

>>> matrix[1][1] = 1
>>> matrix
[[0, 0, 0], [0, 1, 0], [0, 0, 0]]

They're different lists stored in different parts of memory:

>>> matrix[0] is matrix[1]
False
>>> matrix[0] is matrix[2]
False
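
Another way to build independent rows, without typing each one out by hand, is a list comprehension, which creates a fresh inner list on every iteration:

>>> matrix = [[0, 0, 0] for _ in range(3)]
>>> matrix[1][1] = 1
>>> matrix
[[0, 0, 0], [0, 1, 0], [0, 0, 0]]

Note that [[0, 0, 0]] * 3 would reintroduce the problem: the repetition copies the pointer, not the list.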

An ouroboros: A list that contains itself

So data structures contain pointers, not objects.

This is the ultimate demonstration of this fact:

>>> x = []
>>> x.append(x)

At this point the first element (and only element) of this list is the list itself:

>>> x[0] is x
True

And the first element of that list is also the list itself:

>>> x[0][0] is x
True

We can index this list of lists as far down as we want because we've made an infinitely recursive data structure:

>>> x[0][0][0] is x
True
>>> x[0][0][0][0][0] is x
True

Python represents this list at the Python prompt by putting three dots inside those square brackets (it's smart enough not to show an infinite number of square brackets):

>>> x
[[...]]

We didn't stick a bucket inside itself here: we didn't stick a list inside of the same list. Instead we stuck a pointer to a list inside of itself.

Lists are allowed to store pointers to anything, even themselves.

Summary

The takeaway here is that just as variables in Python are pointers, data structures in Python contain pointers. You can't "contain" an object inside another object in Python; you can only point to, or reference, an object. Lists, tuples, dictionaries, and all other data structures contain pointers.

October 29, 2020 03:00 PM UTC


Stack Abuse

How to Sort a Dictionary by Value in Python

Introduction

A dictionary in Python is a collection of items that stores data as key-value pairs. In Python 3.7 and later versions, dictionaries are sorted by the order of item insertion. In earlier versions, they were unordered.

Let's have a look at how we can sort a dictionary on the basis of the values it contains.

Sort Dictionary Using a for Loop

We can sort a dictionary with the help of a for loop. First, we use the sorted() function to order the values of the dictionary. We then loop through the sorted values, finding the keys for each value. We add these key-value pairs in the sorted order into a new dictionary.

Note: Sorting does not allow you to re-order the dictionary in-place. We are writing the ordered pairs in a completely new, empty dictionary.

dict1 = {1: 1, 2: 9, 3: 4}
sorted_values = sorted(dict1.values()) # Sort the values
sorted_dict = {}

for i in sorted_values:
    for k in dict1.keys():
        if dict1[k] == i:
            sorted_dict[k] = dict1[k]
            break

print(sorted_dict)

If you run this with the Python interpreter you would see:

{1: 1, 3: 4, 2: 9}

Now that we've seen how to sort with loops, let's look at a more popular alternative that uses the sorted() function.

Sort Dictionary Using the sorted() Function

We previously used the sorted() function to sort the values of a dictionary. When sorting a dictionary, we can pass one more argument to the sorted() function like this: sorted(dict1, key=dict1.get).

Here, key is a function that's called on each element before the values are compared for sorting. The get() method on dictionary objects returns the value for a given key.

The sorted(dict1, key=dict1.get) expression will return the list of keys whose values are sorted in order. From there, we can create a new, sorted dictionary:

dict1 = {1: 1, 2: 9, 3: 4}
sorted_dict = {}
sorted_keys = sorted(dict1, key=dict1.get)  # [1, 3, 2]

for w in sorted_keys:
    sorted_dict[w] = dict1[w]

print(sorted_dict) # {1: 1, 3: 4, 2: 9}

Using the sorted() function has reduced the amount of code we had to write when using for loops. However, we can further combine the sorted() function with the itemgetter() function for a more succinct solution to sorting dictionaries by values.

Sort Dictionary Using the operator Module and itemgetter()

The operator module includes the itemgetter() function. This function returns a callable object that returns an item from an object.

For example, let's use itemgetter() to create a callable object that returns the value stored under the key 2 of any dictionary:

import operator

dict1 = {1: 1, 2: 9}
get_item_with_key_2 = operator.itemgetter(2)

print(get_item_with_key_2(dict1))  # 9

Every dictionary has access to the items() method. This method returns the key-value pairs of a dictionary as a list of tuples. We can sort the list of tuples by using the itemgetter() function to pull the second element of each tuple, i.e. the value stored under each key in the dictionary.

Once it's sorted, we can create a dictionary based on those values:

import operator

dict1 = {1: 1, 2: 9, 3: 4}
sorted_tuples = sorted(dict1.items(), key=operator.itemgetter(1))
print(sorted_tuples)  # [(1, 1), (3, 4), (2, 9)]
sorted_dict = {k: v for k, v in sorted_tuples}

print(sorted_dict) # {1: 1, 3: 4, 2: 9}

With much less effort, we have a dictionary sorted by values!

As the key argument accepts any function, we can use lambda functions to return dictionary values so they can be sorted. Let's see how.

Sort Dictionary Using a Lambda Function

Lambda functions are anonymous, or nameless, functions in Python. We can use lambda functions to get the value of a dictionary item without having to import the operator module for itemgetter(). If you'd like to learn more about lambdas, you can read about them in our guide to Lambda Functions in Python.

Let's sort a dictionary by values using a lambda function in the key argument of sorted():

dict1 = {1: 1, 2: 9, 3: 4}
sorted_tuples = sorted(dict1.items(), key=lambda item: item[1])
print(sorted_tuples)  # [(1, 1), (3, 4), (2, 9)]
sorted_dict = {k: v for k, v in sorted_tuples}

print(sorted_dict)  # {1: 1, 3: 4, 2: 9}
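
The key argument also combines with sorted()'s reverse parameter if you need descending order, as a small addition to the example above:

dict1 = {1: 1, 2: 9, 3: 4}
sorted_desc = {k: v for k, v in sorted(dict1.items(), key=lambda item: item[1], reverse=True)}
print(sorted_desc)  # {2: 9, 3: 4, 1: 1}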

Note that the methods we've discussed so far only work with Python 3.7 and later. Let's see what we can do for earlier versions of Python.

Returning a New Dictionary with Sorted Values

After sorting a dictionary by values, to keep a sorted dictionary in Python versions before 3.7, you have to use the OrderedDict - available in the collections module. These objects are dictionaries that keep the order of insertion.

Here's an example of sorting and using OrderedDict:

import operator
from collections import OrderedDict

dict1 = {1: 1, 2: 9, 3: 4}
sorted_tuples = sorted(dict1.items(), key=operator.itemgetter(1))
print(sorted_tuples)  # [(1, 1), (3, 4), (2, 9)]

sorted_dict = OrderedDict()
for k, v in sorted_tuples:
    sorted_dict[k] = v

print(sorted_dict)  # {1: 1, 3: 4, 2: 9}

Conclusion

This tutorial showed how a dictionary can be sorted based on its values. We first sorted a dictionary using two for loops. We then improved our sort by using the sorted() function. We've also seen that the itemgetter() function from the operator module can make our solution more succinct.

Lastly, we adapted our solution to work on Python versions lower than 3.7.

Variations of the sorted() function are the most popular and reliable way to sort a dictionary by values.

October 29, 2020 12:30 PM UTC


Matt Layman

Sending Invites - Building SaaS #77

In this episode, I worked on the form that will send invites to users for the new social network app that I’m building. We built the view, the form, and the tests, and wired a button to the new view. The first thing we did was talk through the new changes since the last stream. After discussing the progress, I took some time to cover the expected budget to get the application to an MVP.

October 29, 2020 12:00 AM UTC

October 28, 2020


PyCharm

PyCharm 2020.3 EAP #3

The third build of PyCharm 2020.3 is now available in the Early Access Program with features and fixes for a smoother, more productive experience.

We invite you to join our EAP to try out the latest features we have coming up, test that they work properly in your environments, and help us make a better PyCharm for everyone!

DOWNLOAD PYCHARM 2020.3 EAP

Highlights

Interpreter settings

Now it is easier to create an environment for your project and set up all the dependencies at once.
When you clone a project from the repo, PyCharm checks if there is a requirements.txt, setup.py, environment.yml, or pipfile inside it. If there is, the IDE suggests per-project environment creation based on the detected files.

If you skip the environment creation at this step, autoconfiguration will still be available in the editor itself.

Inverting an “if” statement

Now you can easily invert “if” statements and switch them back in PyCharm. Kudos to Vasya Aksyonov, who contributed this feature to our open-source PyCharm Community Edition.

Go to the context menu for “if”, choose Show Context Actions, and then select “Invert ‘if’ condition”. The condition of the “if” statement will be inverted and the branches will switch places, preserving the initial semantics of the code.

When an “if” statement has no “else”, inverting it adds an “else” clause to the statement and leaves a “pass” in the inverted “if” branch.

This feature works for all “if” statements without “elif” branches. The action also understands control flow, and can handle things like early return, producing sensible code.

Learn more.

VCS

We’ve added a Git tab to the Search Everywhere dialog. In it you can find commit hashes and messages, tags, and branches.

Web development

Create a React component from its usage

As you might know, PyCharm constantly checks that referenced variables and fields are valid. When they aren’t, in many cases it can suggest creating the relevant code construct for you. Now it can do this for React components, too. Place the caret at an unresolved component, press Alt+Enter, and then select the corresponding inspection. And you’re done!

Plugins enabled per project

We have taken plugin customization one step further. In Settings | Preferences / Plugins, the drop-down list next to the plugin name has been replaced with a new gear icon that has all the activation options. You can enable the plugin just for the current project or for all of them by selecting Enable for Current Project or Enable for All Projects.

Reader Mode

To make reading comments easier, we’ve implemented Reader Mode for read-only files and files from External Libraries. We’ve added a nicer display for font ligatures, code vision hints with the number of usages, and more. To configure the new mode, go to Preferences | Settings / Editor / Reader Mode.

Other updates

Notable fixes

The problem where the prompt was copied together with the code when copying multiline commands is now fixed.

Ready to join the EAP?

Some ground rules

How to download

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP. If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP and stay up to date. You can find the installation instructions on our website.

This is all for today! For the full list of features and fixes present in this build, see our release notes. We also encourage you to stay tuned for more improvements, so come and share your feedback in the comments below, on Twitter, or via our issue tracker.

The PyCharm team

October 28, 2020 08:46 PM UTC


Stack Abuse

Change Tick Frequency in Matplotlib

Introduction

Matplotlib is one of the most widely used data visualization libraries in Python. Much of Matplotlib's popularity comes from its customization options - you can tweak just about any element from its hierarchy of objects.

In this tutorial, we'll take a look at how to change the tick frequency in Matplotlib. We'll do this on the figure-level as well as the axis-level.

How to Change Tick Frequency in Matplotlib?

Let's start off with a simple plot. We'll plot two lines, with random values:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))  # plt.subplots() returns a (Figure, Axes) pair

x = np.random.randint(low=0, high=50, size=100)
y = np.random.randint(low=0, high=50, size=100)

plt.plot(x, color='blue')
plt.plot(y, color='black')

plt.show()

x and y range from 0 to 50, and the length of these arrays is 100. This means we'll have 100 data points for each of them. Then, we just plot this data onto the Axes object and show it via the PyPlot instance plt:

plot random line plot in matplotlib

Now, the frequency of the ticks on the X-axis is 20. They're automatically set to a frequency that seems fitting for the dataset we provide.

Sometimes, we'd like to change this. Maybe we want to reduce or increase the frequency. What if we wanted to have a tick on every 5 steps, not 20?

The same goes for the Y-axis. What if the distinction on this axis is even more crucial, and we'd want to have a tick on every step?

Setting Figure-Level Tick Frequency in Matplotlib

Let's change the figure-level tick frequency. This means that if we have multiple Axes, the ticks on all of these will be uniform and will have the same frequency:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))  # plt.subplots() returns a (Figure, Axes) pair

x = np.random.randint(low=0, high=50, size=100)
y = np.random.randint(low=0, high=50, size=100)

plt.plot(x, color='blue')
plt.plot(y, color='black')

plt.xticks(np.arange(0, len(x)+1, 5))
plt.yticks(np.arange(0, max(y), 2))

plt.show()

You can use the xticks() and yticks() functions and pass in an array. On the X-axis, this array starts at 0 and ends at the length of the x array. On the Y-axis, it starts at 0 and ends at the maximum value of y. You can hard-code the variables in as well.

The final argument is the step. This is where we define how large each step should be. We'll have a tick at every 5 steps on the X-axis and a tick on every 2 steps on the Y-axis:

change figure-level tick frequency matplotlib

Setting Axis-Level Tick Frequency in Matplotlib

If you have multiple plots going on, you might want to change the tick frequency on the axis-level. For example, you might want rare ticks on one graph and frequent ticks on the other.

You can use the set_xticks() and set_yticks() functions on the returned Axes instance when adding subplots to a Figure. Let's create a Figure with two axes and change the tick frequency on them separately:

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(12, 6))

ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

x = np.random.randint(low=0, high=50, size=100)
y = np.random.randint(low=0, high=50, size=100)
z = np.random.randint(low=0, high=50, size=100)

ax.plot(x, color='blue')
ax.plot(y, color='black')
ax2.plot(y, color='black')
ax2.plot(z, color='green')

ax.set_xticks(np.arange(0, len(x)+1, 5))
ax.set_yticks(np.arange(0, max(y), 2))
ax2.set_xticks(np.arange(0, len(x)+1, 25))
ax2.set_yticks(np.arange(0, max(y), 25))

plt.show()

Now, this results in:

change axis-level tick frequency in matplotlib
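
If you'd rather state the step size once and let Matplotlib compute the tick positions, the ticker module offers locators for this. A brief supplementary sketch using MultipleLocator:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(np.random.randint(low=0, high=50, size=100), color='blue')

# Place a major tick every 5 units on the X-axis and every 2 on the Y-axis
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
ax.yaxis.set_major_locator(ticker.MultipleLocator(2))

plt.show()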

Conclusion

In this tutorial, we've gone over several ways to change the tick frequency in Matplotlib both on the figure-level as well as the axis-level.

If you're interested in Data Visualization and don't know where to start, make sure to check out our book on Data Visualization in Python.

Data Visualization in Python, a book for beginner to intermediate Python developers, will guide you through simple data manipulation with Pandas, cover core plotting libraries like Matplotlib and Seaborn, and show you how to take advantage of declarative and experimental libraries like Altair.

Data Visualization in Python

Understand your data better with visualizations! With over 275 pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

October 28, 2020 07:55 PM UTC


Python Engineering at Microsoft

Python in Visual Studio Code – October 2020 Release

We are pleased to announce that the October 2020 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code. If you already have the Python extension installed, you can also get the latest update by restarting Visual Studio Code. You can learn more about Python support in Visual Studio Code in the documentation.

This was a short release where we addressed 14 issues, and it includes debugpy 1.0!

If you’re interested, you can check the full list of improvements in our changelog.

Debugpy 1.0

We’re excited to announce that we’re releasing the 1.0 version of our debugger, debugpy, that was first announced in March this year.

Debugpy offers a great number of features that can help you understand bugs, errors and unexpected behaviors in your code. You can find an extensive list on our documentation, but check below for some of our favorite ones!

Debugging Web Apps

Debugpy supports live reload of web applications, such as Django and Flask apps, when debugging. This means that when you make edits to your application, you don’t need to restart the debugger to get them applied: the web server is automatically reloaded in the same debugging session once the changes are saved.  

To try it out, open a web application and add a debug configuration (by clicking on Run > Add Configuration…, or by opening the Run view and clicking on create launch.json file).  Then select the framework used in your web application – in this example, we selected Flask. 

Now you hit F5 to start debugging, and then just watch the application reload once you make a change and save it!

Live reload of Flask application when debugging

You can also debug Django and Flask HTML templates. Just set up breakpoints to the relevant lines in the HTML files and watch the magic happen:

Execution stopping on breakpoint in a template file

Debugging local processes

With debugpy and the Python extension, you can get a list of processes running locally and easily select one to attach debugpy to. Or, if you know the process ID, you can also add it directly to the “Attach using Process Id” configuration in the launch.json file:

Adding configuration for the debugger to attach to a local process

Attaching the debugger to a process running locally

Debugging remotely

Remote Development Extensions

You can use debugpy to debug your applications inside remote environments like Docker containers or remote machines (or even in WSL!) through the Remote Development extension. It allows VS Code to work seamlessly by running a light-weight server in the remote environment, while providing the same development experience as you get when developing locally:

Running the debugger inside a docker container

This way, you can use the same configurations for debugpy as you would locally – but it will actually be installed and executed in the remote scope. No more messing around with your local environment!

You can learn more about the VS Code Remote Development extensions on the documentation.

Remote attach

You can also configure the debugger to attach to a debugpy server running on a remote machine. All you need to provide is the host name and the port number the debugpy server is listening to in the remote environment:

Configuration for attaching the debugger to a remote machine

You can learn more about remote debugging in the documentation.
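
On the remote side, the debugpy server can also be started from within the application itself. A minimal sketch, assuming debugpy is installed in the remote environment and port 5678 is reachable:

import debugpy

# Listen on all interfaces so the IDE can attach from another machine
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for the debugger to attach...")
debugpy.wait_for_client()  # blocks until VS Code attaches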

Other changes and enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. Some notable changes include:

We’re constantly A/B testing new features. If you see something different that was not announced by the team, you may be part of the experiment! To see if you are part of an experiment, you can check the first lines in the Python extension output channel. If you wish to opt-out of A/B testing, you can open the user settings.json file (View > Command Palette… and run Preferences: Open Settings (JSON)) and set the “python.experiments.enabled” setting to false.

Be sure to download the Python extension for Visual Studio Code now to try out the above improvements. If you run into any problems or have suggestions, please file an issue on the Python VS Code GitHub page.

The post Python in Visual Studio Code – October 2020 Release appeared first on Python.

October 28, 2020 07:03 PM UTC


Python Software Foundation

Key generation and signing ceremony for PyPI

On Friday October 30th at 11:15 AM EDT the Python Software Foundation will be live streaming a remote key generation and signing ceremony to bootstrap The Update Framework for The Python Package Index. You can click here to see what time this is in your local timezone.

This ceremony is one of the first practical steps in deploying The Update Framework to PyPI per PEP 458.

The Python Software Foundation Director of Infrastructure, Ernest W. Durbin III, and Trail of Bits Senior Security Engineer, William Woodruff, will be executing the runbook developed at https://github.com/psf/psf-tuf-runbook.

For transparency purposes a live stream will be hosted from the Python Software Foundation's YouTube channel. Please subscribe to the channel to be notified when the stream is live if you'd like to follow along.

Additionally the recording will be archived on the Python Software Foundation's YouTube channel.


This work is being funded by Facebook Research. It was originally announced in late 2018, and a portion of it commenced in 2019 while awaiting PEP 458's acceptance. With PEP 458 in place, we announced that work would commence in March.

We appreciate the patience and contributions of the community, Facebook Research, and Trail of Bits in seeing through the implementation of PEP 458.

Additionally, volunteers from the Secure Systems Lab at NYU, Datadog, and VMware have helped to develop the implementation for PyPI and have begun work on client implementations to verify the results in pip.

October 28, 2020 04:38 PM UTC


Real Python

Get Started With Django Part 3: Django View Authorization

In part 1 of this series, you learned the fundamentals of Django models and views. In part 2, you learned about user management. In this tutorial, you’ll see how to combine these concepts to do Django view authorization and restrict what users can see and do in your views based on their roles.

Allowing users to log in to your website solves two problems: authentication and authorization. Authentication is the act of verifying a user’s identity, confirming they are who they say they are. Authorization is deciding whether a user is allowed to perform an action. The two concepts go hand in hand: if a page on your website is restricted to logged-in users, then users have to authenticate before they can be authorized to view the page.

Django provides tools for both authentication and authorization. Django view authorization is typically done with decorators. This tutorial will show you how to use these view decorators to enforce authorized viewing of pages in your Django site.
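
As a quick taste of what that looks like, a view can be restricted to authenticated users with a single decorator. Here's a minimal sketch (the view itself is illustrative, not part of the tutorial project):

from django.contrib.auth.decorators import login_required
from django.http import HttpResponse

@login_required
def members_only(request):
    # Anonymous visitors are redirected to the login page instead
    return HttpResponse("Hello, logged-in user!")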

By the end of this tutorial you’ll know how to:

  • Use HttpRequest and HttpRequest.user objects
  • Authenticate and authorize users
  • Differentiate between regular, staff, and admin users
  • Secure a view with the @login_required decorator
  • Restrict a view to different roles with the @user_passes_test decorator
  • Use the Django messages framework to notify your users

Getting Started#

To better understand authorization, you’ll need a project to experiment with. The code in this tutorial is very similar to that shown in part 1 and part 2. You can follow along by downloading the sample code from the link below:

Get the Source Code: Click here to get the source code you’ll use to learn about Django view authorization in this tutorial.

All the demonstration code was tested with Python 3.8 and Django 3.0.7. It should work with other versions, but there may be subtle differences.

Creating a Project#

First, you’ll need to create a new Django project. Since Django isn’t part of the standard library, it’s considered best practice to use a virtual environment. Once you have the virtual environment, you’ll need to take the following steps:

  1. Install Django.
  2. Create a new project.
  3. Create an app inside the project.
  4. Add a templates directory to the project.
  5. Create a site superuser.

To accomplish all that, use the following commands:

$ python -m pip install django==3.0.7
$ django-admin startproject Blog
$ cd Blog
$ python manage.py startapp core
$ mkdir templates
$ python manage.py migrate
$ python manage.py createsuperuser
Username: superuser
Email address: superuser@example.com
Password:
Password (again):

You now have a Blog project, but you still need to tell Django about the app you created and the new directory you added for templates. You can do this by modifying the Blog/settings.py file, first by changing INSTALLED_APPS:

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "core",
]

The highlighted line indicates the addition of the core app to the list of installed apps. Once you’ve added the app, you need to modify the TEMPLATES declaration:

TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [os.path.join(BASE_DIR, "templates")],
        "APP_DIRS": True,
        "OPTIONS": {
            "context_processors": [
                "django.template.context_processors.debug",
                "django.template.context_processors.request",
                "django.contrib.auth.context_processors.auth",
                "django.contrib.messages.context_processors.messages",
            ],
        },
    },
]

The highlighted line indicates the change you need to make. It modifies the DIRS list to include your templates folder. This tells Django where to look for your templates.

Note: Django 3.1 has moved from using the os library to pathlib and no longer imports os by default. If you’re using Django 3.1, then you need to either add import os above the TEMPLATES declaration or convert the "DIRS" entry to use pathlib instead.

The sample site you’ll be working with is a basic blogging application. The core app needs a models.py file to contain the models that store the blog content in the database. Edit core/models.py and add the following:

from django.db import models

class Blog(models.Model):
    title = models.CharField(max_length=50)
    content = models.TextField()

Now for some web pages. Create two views, one for listing all the blogs and one for viewing a blog. The code for your views goes in core/views.py:

from django.http import HttpResponse
from django.shortcuts import render, get_object_or_404
from core.models import Blog

def listing(request):
    data = {
        "blogs": Blog.objects.all(),
    }

    return render(request, "listing.html", data)

def view_blog(request, blog_id):
    blog = get_object_or_404(Blog, id=blog_id)
    data = {
        "blog": blog,
    }

    return render(request, "view_blog.html", data)

The listing() view does a query looking for all the Blog objects and passes that to the render() shortcut function. render() takes the request object that provides context to the view, the name of a template to render (listing.html), and the data object containing the query set of Blog objects.

Read the full article at https://realpython.com/django-view-authorization/ »


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

October 28, 2020 02:00 PM UTC


Will Kahn-Greene

Everett v1.0.3 released!

What is it?

Everett is a configuration library for Python apps.

Goals of Everett:

  1. flexible configuration from multiple configured environments

  2. easy testing with configuration

  3. easy documentation of configuration for users

From that, Everett has the following features:

  • is composeable and flexible

  • makes it easier to provide helpful error messages for users trying to configure your software

  • supports auto-documentation of configuration with a Sphinx autocomponent directive

  • has an API for testing configuration variations in your tests

  • can pull configuration from a variety of specified sources (environment, INI files, YAML files, dict, write-your-own)

  • supports parsing values (bool, int, lists of things, classes, write-your-own)

  • supports key namespaces

  • supports component architectures

  • works with whatever you're writing--command line tools, web sites, system daemons, etc

v1.0.3 released!

This is a minor maintenance update that fixes a couple of minor bugs, addresses a Sphinx deprecation issue, drops support for Python 3.4 and 3.5, and adds support for Python 3.8 and 3.9 (largely adding those environments to the test suite).

Why you should take a look at Everett

At Mozilla, I'm using Everett for a variety of projects: Mozilla symbols server, Mozilla crash ingestion pipeline, and some other tooling. We use it in a bunch of other places at Mozilla, too.

Everett makes it easy to:

  1. deal with different configurations between local development and server environments

  2. test different configuration values

  3. document configuration options

First-class docs. First-class configuration error help. First-class testing. This is why I created Everett.

If this sounds useful to you, take it for a spin. It's a drop-in replacement for python-decouple and os.environ.get('CONFIGVAR', 'default_value') style of configuration so it's easy to test out.
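
For a quick taste, here is a minimal sketch of the kind of code this replaces, based on Everett's documented basic_config() helper (the variable names are illustrative):

from everett.manager import ConfigManager

# Pull configuration from the process environment
config = ConfigManager.basic_config()

debug = config("debug", default="false", parser=bool)
host = config("host", default="localhost")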

Enjoy!

Where to go for more

For more specifics on this release, see here: https://everett.readthedocs.io/en/latest/history.html#october-28th-2020

Documentation and quickstart here: https://everett.readthedocs.io/

Source code and issue tracker here: https://github.com/willkg/everett

October 28, 2020 01:00 PM UTC


Peter Bengtsson

Generating random avatar images in Django/Python

tl;dr; <img src="/avatar.random.png" alt="Random avataaar"> generates this image:

Random avataaar
(try reloading to get a random new one. funny aren't they?)

When you use Gravatar you can convert people's email addresses to their mugshot.
It works like this:

<img src="https://www.gravatar.com/avatar/$(md5(user.email))">

But most people don't have their mugshot on Gravatar.com unfortunately. But you still want to display an avatar that is distinct per user. Your best option is to generate one and just use the user's name or email as a seed (so it's always random but always deterministic for the same user). And you can also supply a fallback image to Gravatar that they use if the email doesn't match any email they have. That's where this blog post comes in.
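
For reference, the md5(user.email) piece above is typically computed over the trimmed, lower-cased address. A small sketch of that in Python:

import hashlib

def gravatar_url(email):
    # Gravatar expects an MD5 hex digest of the normalized email address
    email_hash = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return f"https://www.gravatar.com/avatar/{email_hash}"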

I needed that so I shopped around and found avataaars generator which is available as a React component. But I need it to be server-side and in Python. And thankfully there's a great port called: py-avataaars.

It depends on CairoSVG to convert an SVG to a PNG but it's easy to install. Anyway, here's my hack to generate random "avataaars" from Django:

import io
import random

import py_avataaars
from django import http
from django.utils.cache import add_never_cache_headers, patch_cache_control


def avatar_image(request, seed=None):
    if not seed:
        seed = request.GET.get("seed") or "random"

    if seed != "random":
        random.seed(seed)

    bytes = io.BytesIO()

    def r(enum_):
        return random.choice(list(enum_))

    avatar = py_avataaars.PyAvataaar(
        style=py_avataaars.AvatarStyle.CIRCLE,
        # style=py_avataaars.AvatarStyle.TRANSPARENT,
        skin_color=r(py_avataaars.SkinColor),
        hair_color=r(py_avataaars.HairColor),
        facial_hair_type=r(py_avataaars.FacialHairType),
        facial_hair_color=r(py_avataaars.FacialHairColor),
        top_type=r(py_avataaars.TopType),
        hat_color=r(py_avataaars.ClotheColor),
        mouth_type=r(py_avataaars.MouthType),
        eye_type=r(py_avataaars.EyesType),
        eyebrow_type=r(py_avataaars.EyebrowType),
        nose_type=r(py_avataaars.NoseType),
        accessories_type=r(py_avataaars.AccessoriesType),
        clothe_type=r(py_avataaars.ClotheType),
        clothe_color=r(py_avataaars.ClotheColor),
        clothe_graphic_type=r(py_avataaars.ClotheGraphicType),
    )
    avatar.render_png_file(bytes)

    response = http.HttpResponse(bytes.getvalue())
    response["content-type"] = "image/png"
    if seed == "random":
        add_never_cache_headers(response)
    else:
        patch_cache_control(response, max_age=60, public=True)

    return response

It's not perfect but it works. The URL to this endpoint is /avatar.<seed>.png and if you make the seed parameter random the response is always different.
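
For completeness, wiring that endpoint up could look something like this in urls.py (a hypothetical sketch; the route and names are illustrative):

from django.urls import path

from . import views

urlpatterns = [
    # The dots in the route are matched literally, so /avatar.random.png works
    path("avatar.<str:seed>.png", views.avatar_image, name="avatar_image"),
]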

To make the image not random, you replace the <seed> with any string. For example (use your imagination):

{% for comment in comments %}
  <img src="/avatar.{{ comment.user.id }}.png" alt="{{ comment.user.name }}">
  <blockquote>{{ comment.text }}</blockquote>
  <i>{{ comment.date }}</i>
{% endfor %}

I've put together this test page if you want to see more funny avatar combinations instead of doing work :)

October 28, 2020 12:10 PM UTC


Codementor

Dissecting a Web stack

A layer-by-layer review of the components of a web stack and the reasons behind them

October 28, 2020 11:20 AM UTC

Introducing AutoScraper: A Smart, Fast and Lightweight Web Scraper For Python

Scraping the web just got a lot more automated

October 28, 2020 11:04 AM UTC


Stefan Scherfke

Raise … from … in Python

When you recently upgraded to pylint 2.6.0, you may have stumbled across a new warning:

src/mylib/core.py:74:20: W0707:
  Consider explicitly re-raising using the
  'from' keyword (raise-missing-from)

The reason for this message is an exception that you raised from within an except block like this:

>>> class MyLibError(Exception):
...     """Base class for all errors raised by mylib"""
...
>>> def do_stuff(text):
...     try:
...         int(text)
...     except ValueError as e:
...         raise MyLibError(e)

When you run do_stuff(), you’ll get the following traceback:

>>> do_stuff("onoes")
Traceback (most recent call last):
  File "<stdin>", line 3, in do_stuff
ValueError: invalid literal for int() with base 10: 'onoes'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in do_stuff
__main__.MyLibError: invalid literal for int() with base 10: 'onoes'

The important line here is:

During handling of the above exception, another exception occurred

This means that while you were handling the ValueError, another (unexpected) exception occurred: a MyLibError.

But this is not what we wanted to do – we wanted to replace the ValueError with a MyLibError, so that our users only have to handle a single exception type!

Enter raise … from …

To express I want to modify and forward an existing exception, you can use the raise NewException from cause syntax:

>>> def do_stuff(text):
...     try:
...         int(text)
...     except ValueError as e:
...         raise MyLibError(e) from e

When we run this now, we’ll get a different traceback printed:

>>> do_stuff("onoes")
Traceback (most recent call last):
  File "<stdin>", line 3, in do_stuff
ValueError: invalid literal for int() with base 10: 'onoes'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in do_stuff
__main__.MyLibError: invalid literal for int() with base 10: 'onoes'

Your users will now receive a MyLibError with the attached information that the cause of this error was a ValueError somewhere in your code.

When the underlying cause is not important

If your users shouldn’t care about the underlying cause, because the new exception contains all the relevant information (i.e., that the provided input cannot be parsed), you may also omit the cause:

>>> def do_stuff(text):
...     try:
...         int(text)
...     except ValueError as e:
...         raise MyLibError(e) from None

When you run this now, you’ll get a nice and clean traceback:

>>> do_stuff("onoes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in do_stuff
__main__.MyLibError: invalid literal for int() with base 10: 'onoes'

Summary

When you raise an exception from within an except block in Python, you have three options:

1. raise NewException(e): the original exception becomes implicit context, printed as “During handling of the above exception, another exception occurred”.
2. raise NewException(e) from e: the original exception is marked as the explicit cause, printed as “The above exception was the direct cause of the following exception”.
3. raise NewException(e) from None: the original exception is suppressed and only the new, clean traceback is printed.

October 28, 2020 10:23 AM UTC


Reuven Lerner

Now playing on YouTube: Answers to your Python questions

Over the last year, I’ve gotten increasingly active on my YouTube channel, https://YouTube.com/reuvenlerner. Each week, I upload 1-2 new videos, typically answering questions that I’ve gotten in my corporate training classes or from people online — via e-mail, or on Twitter (@reuvenmlerner).

So if you’re looking to learn about Jupyter shortcuts, or inner classes in Python, or the differences between “modules” and “packages,” then head on over to https://YouTube.com/reuvenlerner, and subscribe! And if there are Python topics you would like me to address, don’t hesitate to contact me. You might see your question answered in a video!

The post Now playing on YouTube: Answers to your Python questions appeared first on Reuven Lerner.

October 28, 2020 07:28 AM UTC

October 27, 2020


Exxact Corp

PyTorch 1.7.0 Now Available

PyTorch 1.7.0

PyTorch is a widely used, open source deep learning platform used for easily writing neural network layers in Python enabling a seamless workflow from research to production. Based on Torch, PyTorch has become a powerful machine learning framework favored by esteemed researchers around the world.

The newest stable release of PyTorch, version 1.7.0, has a number of new highlights including  CUDA 11, New APIs for FFTs, Windows support for Distributed training and more.

PyTorch 1.7.0 Release Notes



Highlights

The PyTorch 1.7 release includes a number of new APIs including support for NumPy-Compatible FFT operations, profiling tools and major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. In addition, several features moved to stable including custom C++ Classes, the memory profiler, the creation of custom tensor-like objects, user async functions in RPC and a number of other features in torch.distributed such as Per-RPC timeout, DDP dynamic bucketing and RRef helper.

A few of the highlights include:

To reiterate, starting with PyTorch 1.6, features are now classified as stable, beta and prototype. You can see the detailed announcement here. Note that the prototype features listed in this blog are available as part of this release.

Front End APIs

[Beta] NumPy Compatible torch.fft module

FFT-related functionality is commonly used in a variety of scientific fields like signal processing. While PyTorch has historically supported a few FFT-related functions, the 1.7 release adds a new torch.fft module that implements FFT-related functions with the same API as NumPy.

This new module must be imported to be used in the 1.7 release, since its name conflicts with the historic (and now deprecated) torch.fft function.

Example usage:

>>> import torch.fft
>>> t = torch.arange(4)
>>> t
tensor([0, 1, 2, 3])

>>> torch.fft.fft(t)
tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j])

>>> t = torch.tensor([0.+1.j, 2.+3.j, 4.+5.j, 6.+7.j])
>>> torch.fft.fft(t)
tensor([12.+16.j, -8.+0.j, -4.-4.j,  0.-8.j])

[Beta] C++ Support for Transformer NN Modules

Since PyTorch 1.5, we’ve continued to maintain parity between the python and C++ frontend APIs. This update allows developers to use the nn.transformer module abstraction from the C++ Frontend. Moreover, developers no longer need to save a module from python/JIT and load it into C++, as it can now be used in C++ directly.

[Beta] torch.set_deterministic

Reproducibility (bit-for-bit determinism) may help identify errors when debugging or testing a program. To facilitate reproducibility, PyTorch 1.7 adds the torch.set_deterministic(bool) function that can direct PyTorch operators to select deterministic algorithms when available, and to throw a runtime error if an operation may result in nondeterministic behavior. By default, the flag this function controls is false and there is no change in behavior, meaning PyTorch may implement its operations nondeterministically by default.

More precisely, when this flag is true:

Note that this is necessary, but not sufficient, for determinism within a single run of a PyTorch program. Other sources of randomness like random number generators, unknown operations, or asynchronous or distributed computation may still cause nondeterministic behavior.

See the documentation for torch.set_deterministic(bool) for the list of affected operations.
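
A minimal sketch of the opt-in, assuming PyTorch 1.7:

import torch

# Request deterministic algorithms; an operation with no deterministic
# implementation will raise a RuntimeError instead of running silently.
torch.set_deterministic(True)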

Performance & Profiling

[Beta] Stack traces added to profiler

Users can now see not only operator name/inputs in the profiler output table but also where the operator is in the code. The workflow requires very little change to take advantage of this capability. The user uses the autograd profiler as before but with optional new parameters: with_stack and group_by_stack_n. Caution: regular profiling runs should not use this feature as it adds significant overhead.
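
A brief sketch of the new parameters in use, assuming PyTorch 1.7:

import torch
from torch.autograd import profiler

x = torch.randn(128, 128)
with profiler.profile(with_stack=True) as prof:
    y = x @ x

# Group recorded events by the top 5 frames of their stack traces
print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cpu_time_total"))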

Distributed Training & RPC

[Stable] TorchElastic now bundled into PyTorch docker image

Torchelastic offers a strict superset of the current torch.distributed.launch CLI with added features for fault-tolerance and elasticity. If the user is not interested in fault-tolerance, they can get exact functionality/behavior parity by setting max_restarts=0, with the added convenience of auto-assigned RANK and MASTER_ADDR|PORT (versus manually specified in torch.distributed.launch).

By bundling torchelastic in the same docker image as PyTorch, users can start experimenting with torchelastic right-away without having to separately install torchelastic. In addition to convenience, this work is a nice-to-have when adding support for elastic parameters in the existing Kubeflow’s distributed PyTorch operators.

[Beta] Support for uneven dataset inputs in DDP

PyTorch 1.7 introduces a new context manager to be used in conjunction with models trained using torch.nn.parallel.DistributedDataParallel to enable training with uneven dataset sizes across different processes. This feature enables greater flexibility when using DDP and prevents the user from having to manually ensure dataset sizes are the same across different processes. With this context manager, DDP will handle uneven dataset sizes automatically, which can prevent errors or hangs at the end of training.
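
A minimal sketch of the context manager in use, assuming it is the join() method on the DDP-wrapped model as documented for 1.7 (model and loader are illustrative):

# `model` is wrapped in torch.nn.parallel.DistributedDataParallel and
# `loader` may yield a different number of batches on each rank.
with model.join():
    for batch in loader:
        loss = model(batch).sum()
        loss.backward()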

[Beta] NCCL Reliability – Async Error/Timeout Handling

In the past, NCCL training runs would hang indefinitely due to stuck collectives, leading to a very unpleasant experience for users. This feature will abort stuck collectives and throw an exception/crash the process if a potential hang is detected. When used with something like torchelastic (which can recover the training process from the last checkpoint), users can have much greater reliability for distributed training. This feature is completely opt-in and sits behind an environment variable that needs to be explicitly set in order to enable this functionality (otherwise users will see the same behavior as before).

[Beta] TorchScript remote and rpc_sync

torch.distributed.rpc.rpc_async has been available in TorchScript in prior releases. For PyTorch 1.7, this functionality is extended to the remaining two core RPC APIs, torch.distributed.rpc.rpc_sync and torch.distributed.rpc.remote. This completes the major RPC APIs targeted for TorchScript support. It allows users to use the existing python RPC APIs within TorchScript (in a script function or script method, which releases the python Global Interpreter Lock) and could possibly improve application performance in multithreaded environments.

[Beta] Distributed optimizer with TorchScript support

PyTorch provides a broad set of optimizers for training algorithms, and these have been used repeatedly as part of the python API. However, users often want to use multithreaded training instead of multiprocess training, as it provides better resource utilization and efficiency in the context of large scale distributed training (e.g. Distributed Model Parallel) or any RPC-based training application. Users couldn’t do this with the distributed optimizer before because we need to get rid of the python Global Interpreter Lock (GIL) limitation to achieve this.

In PyTorch 1.7, we are enabling TorchScript support in the distributed optimizer to remove the GIL and make it possible to run the optimizer in multithreaded applications. The new distributed optimizer has the exact same interface as before, but it automatically converts the optimizers within each worker into TorchScript to make each one GIL-free. This is done by leveraging a functional optimizer concept and allowing the distributed optimizer to convert the computational portion of the optimizer into TorchScript. This will help use cases like distributed model parallel training and improve performance using multithreading.

Currently, the only optimizer that supports automatic conversion with TorchScript is Adagrad and all other optimizers will still work as before without TorchScript support. We are working on expanding the coverage to all PyTorch optimizers and expect more to come in future releases. The usage to enable TorchScript support is automatic and exactly the same with existing python APIs, here is an example of how to use this:

import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc
from torch import optim
from torch.distributed.optim import DistributedOptimizer

with dist_autograd.context() as context_id:
    # Forward pass.
    rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3))
    rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    loss = rref1.to_here() + rref2.to_here()

    # Backward pass.
    dist_autograd.backward(context_id, [loss.sum()])

    # Optimizer, pass in optim.Adagrad, DistributedOptimizer will
    # automatically convert/compile it to TorchScript (GIL-free)
    dist_optim = DistributedOptimizer(
        optim.Adagrad,
        [rref1, rref2],
        lr=0.05,
    )
    dist_optim.step(context_id)

[Beta] Enhancements to RPC-based Profiling

Support for using the PyTorch profiler in conjunction with the RPC framework was first introduced in PyTorch 1.6. In PyTorch 1.7, the following enhancements have been made:

Users are now able to use familiar profiling tools such as with torch.autograd.profiler.profile() and with torch.autograd.profiler.record_function, and this works transparently with the RPC framework with full feature support; it profiles asynchronous functions and TorchScript functions.

[Prototype] Windows support for Distributed Training

PyTorch 1.7 brings prototype support for DistributedDataParallel and collective communications on the Windows platform. In this release, the support only covers Gloo-based ProcessGroup and FileStore.
To use this feature across multiple machines, please provide a file from a shared file system in init_process_group.

# initialize the process group
dist.init_process_group(
    "gloo",
    # multi-machine example:
    # shared files need six "/"
    # init_method = "file://////{machine}/{share_folder}/file"
    # a local file needs three "/"
    init_method="file:///{your local file path}",
    rank=rank,
    world_size=world_size
)

model = DistributedDataParallel(local_model, device_ids=[rank])

Mobile

PyTorch Mobile supports both iOS and Android, with binary packages available in CocoaPods and JCenter respectively. You can learn more about PyTorch Mobile here.

[Beta] PyTorch Mobile Caching allocator for performance improvements

On some mobile platforms, such as Pixel, we observed that memory is returned to the system more aggressively. This results in frequent page faults, as PyTorch, being a functional framework, does not maintain state for the operators: for most ops, outputs are allocated dynamically on each execution. To ameliorate the resulting performance penalties, PyTorch 1.7 provides a simple caching allocator for CPU. The allocator caches allocations by tensor size and is currently available only via the PyTorch C++ API. The caching allocator itself is owned by the client, and thus its lifetime is maintained by client code. Such a client-owned caching allocator can then be used with a scoped guard, c10::WithCPUCachingAllocatorGuard, to enable the use of cached allocations within that scope.

Example usage:

#include <c10/mobile/CPUCachingAllocator.h>
.....
// Owned by client code. Can be a member of some client class so as to tie
// the lifetime of the caching allocator to that of the class.
c10::CPUCachingAllocator caching_allocator;
.....
{
  c10::optional<c10::WithCPUCachingAllocatorGuard> caching_allocator_guard;
  if (FLAGS_use_caching_allocator) {
    caching_allocator_guard.emplace(&caching_allocator);
  }
  ....
  model.forward(..);
}
.....

NOTE: The caching allocator is only available in mobile builds; using it outside of mobile builds won't be effective.

Backwards Incompatible changes

Python API

torch.conj now returns the input as-is for real Tensors (#43270)

Previously, torch.conj and Tensor.conj made a clone for Tensors of real dtype. They now return the Tensor as-is to improve performance.
You can recover the original behavior by adding a .clone() for real Tensors.
Note that this behavior differs from NumPy, for which np.conj returns a new ndarray and ndarray.conj returns the ndarray as-is.

1.6.0:

>>> t.is_complex()
False
>>> t.conj() is t
False

1.7.0:

>>> t.is_complex()
False
>>> t.conj() is t
True
>>> t.conj().clone() is t
False

torch.tensor, torch.as_tensor, and torch.sparse_coo_tensor now use the input Tensor’s device when it is not specified (#41984)

This changes the device on which the Tensor is created, so users may start seeing device mismatch errors.
For sparse Tensors, it also means that both of the provided Tensors must be on the same device if the device is not specified.
You can recover the original behavior by passing the device argument.

1.6.0:

>>> t.device
device(type='cuda:0')
>>> # tensor constructor
>>> torch.tensor(t, dtype=torch.float32).device
device(type='cpu')
>>> # sparse constructor
>>> torch.sparse_coo_tensor(
...         torch.tensor(([0], [2]), device="cpu"),
...         torch.tensor(([1.],), device="cuda"),
...         size=(3, 3, 1)).device
device(type='cuda', index=0)

1.7.0:

>>> t.device
device(type='cuda:0')
>>> # tensor constructor
>>> torch.tensor(t, dtype=torch.float32).device
device(type='cuda:0')
>>> # Specify the device to get the same behavior as 1.6
>>> torch.tensor(t, dtype=torch.float32, device='cpu').device
device(type='cpu')
>>> # sparse constructor
>>> torch.sparse_coo_tensor(
...         torch.tensor(([0], [2]), device="cpu"),
...         torch.tensor(([1.],), device="cuda"),
...         size=(3, 3, 1)).device
RuntimeError: backend of indices (CPU) must match backend of values (CUDA)
>>> # Specify the device to get the same behavior as 1.6
>>> torch.sparse_coo_tensor(
...         torch.tensor(([0], [2]), device="cpu"),
...         torch.tensor(([1.],), device="cuda"),
...         size=(3, 3, 1),
...         device="cuda:0").device
device(type='cuda', index=0)

Improve torch.norm handling of keepdim=True (#41956)

Before this change, when calling torch.norm with keepdim=True and p='fro' or p=number, leaving all other optional arguments at their default values, the keepdim argument was ignored. It is now properly respected.
Also, any time torch.norm was called with p='nuc' and keepdim=True, the result had one fewer dimension than the input, and the dimensions could be out of order depending on which dimensions were being reduced. It now properly keeps all the dimensions.
You can recover the original behavior by setting keepdim=False.
NOTE: this function is now deprecated (see below); we recommend you use torch.linalg.norm, which follows NumPy's conventions.

1.6.0:

>>> t.size()
torch.Size([4, 4])
>>> t.norm(p='fro', keepdim=True).size()
torch.Size([])
>>> t.norm(p=3, keepdim=True).size()
torch.Size([])
>>> t.norm(p='nuc', keepdim=True).size()
torch.Size([1])

1.7.0:

>>> t.size()
torch.Size([4, 4])
>>> t.norm(p='fro', keepdim=True).size()
torch.Size([1, 1])
>>> t.norm(p=3, keepdim=True).size()
torch.Size([1, 1])
>>> t.norm(p='nuc', keepdim=True).size()
torch.Size([1, 1])

torch.split and torch.chunk: Fix view tracking for the autograd (#41567)

The autograd system is able to correctly handle modifications through views of Tensors by explicitly tracking known view operations. In prior releases, torch.split and torch.chunk were not marked as known view operations, which could lead to silently wrong gradients.

Note that since v1.5, inplace modification of views created by functions that return multiple views has been deprecated. Such cases are not properly handled by the autograd engine and can lead to internal errors or wrong gradients. So, as a side effect of this view fix, inplace modifications of the outputs of torch.split and torch.chunk will now raise a warning where they previously computed wrong gradients silently.
If you see such a warning, you should replace the inplace operation with an out-of-place one.
You can recover the original behavior by using the new torch.unsafe_split and torch.unsafe_chunk, as in the sketch below. Note that these functions exist only to ease the transition and will be removed in a future version.
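
Here is a minimal migration sketch (the tensor and values are arbitrary):

import torch

a = torch.rand(4, requires_grad=True).clone()  # a non-leaf tensor we can modify
left, right = torch.split(a, 2)

# Deprecated: an inplace op on a view returned by split, e.g. left.add_(1),
# now raises a warning and can produce wrong gradients.

# Preferred: use an out-of-place operation instead.
left = left + 1

# Transitional escape hatch (will be removed in a future version):
left_u, right_u = torch.unsafe_split(a, 2)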

torch.{argmin,argmax} now always return the first min/max index (#42004)

torch.argmin (torch.argmax) now always returns the index of the first minimum (maximum) element. This choice is consistent with NumPy. Previously, if there were multiple minima (maxima), the index returned could be that of any of them.
You cannot recover the original behavior, as it was platform dependent and not guaranteed. If your code relied on a specific index on your specific platform, update it to work with the first index; the new code will then work on all platforms.
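
For example:

>>> t = torch.tensor([1, 0, 2, 0])
>>> t.argmin()  # minima at indices 1 and 3; the first is now guaranteed
tensor(1)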

torch.{min,max,median}: Update backward formula when doing full reduction (dim argument not provided) (#43519)

When no dimension is specified, a full reduction is performed, and the gradient now flows back evenly towards all the inputs that realized the output value. The old behavior was to propagate the gradient only to one such input, selected arbitrarily.
This should improve the stability of training by gradient descent.
To recover the previous behavior, perform the reduction with the dim= argument. This ensures that the gradient only flows back to the input whose index was returned.

1.6.0:

>>> a
tensor([3, 2, 3])
>>> a.max().backward()
>>> a.grad
tensor([0, 0, 1])

1.7.0:

>>> a
tensor([3, 2, 3])
>>> a.max().backward()
>>> a.grad
tensor([0.5, 0, 0.5])
>>> a.max(dim=0).max(dim=0).max(dim=0).backward()
>>> a.grad
tensor([0, 0, 1])

nn.BCELoss size mismatch warning is now an error (#41426)

This is the end of the deprecation cycle for this op, ensuring it does not have broadcasting semantics different from NumPy's, which are used everywhere else in PyTorch's codebase.
You need to make sure all inputs are the same size to avoid the error.

1.6.0:

>>> bceloss = nn.BCELoss()
>>> a = torch.rand(25)
>>> b = torch.rand(25, 1)
>>> bceloss(a, b)
UserWarning: Using a target size (torch.Size([25, 1]))
that is different to the input size (torch.Size([25]))
is deprecated. Please ensure they have the same size.
tensor(1.0604)

1.7.0:

>>> bceloss = nn.BCELoss()
>>> a = torch.rand(25)
>>> b = torch.rand(25, 1)
>>> bceloss(a, b)
ValueError: Using a target size (torch.Size([25, 1]))
that is different to the input size (torch.Size([25]))
is deprecated. Please ensure they have the same size.
>>> b = b.reshape(25)
>>> bceloss(a, b)
tensor(1.0604)

Custom autograd.Function stops materializing None output Tensors (#41490)

To improve performance, a custom autograd.Function no longer creates a Tensor full of zeros when an input is differentiable but the user's backward function returns None for it. This means that the final result of .backward() or autograd.grad() may now be None where it used to be a Tensor full of zeros.
You can recover the previous behavior by having your custom autograd.Function materialize the zero Tensor with torch.zeros_like(input) in place of the None output for the backward method.

import torch

# Custom Function that returns None for the gradient
class GetTwos(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        return inp.clone().fill_(2)

    @staticmethod
    def backward(ctx, grad_out):
        # To recover the 1.6 behavior, replace the line below with
        # `return torch.zeros_like(grad_out)`
        return None

a = torch.rand(10, requires_grad=True)
b = GetTwos.apply(a)
b.sum().backward()

print(a.grad)
# In PyTorch 1.6 this will print
# tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
# In PyTorch 1.7 this will print
# None

Fix inplace detection for non-differentiable outputs (#41269)

We fixed a bug in the inplace detection code that prevented the detection of some inplace operations on outputs that are not differentiable (like integer-type Tensors).
As a result, code that used to run fine may now throw the error "a Tensor that was needed for backward was modified in an inplace operation".
Such failures are legitimate, and the user code must be fixed to compute proper gradients. In general, this involves cloning the Tensor before modifying it inplace, to make sure the backward pass can happen safely.

import torch

a = torch.rand(10, requires_grad=True)
with torch.no_grad():
    a[2] = 10

b, ind = a.max(dim=0)
# ind is 2 here

with torch.no_grad():
    t = torch.rand(10)
    t[4] = 10
    res = torch.max(t, dim=0, out=(torch.Tensor(), ind))
    # ind becomes 4 here

# This backward runs in 1.6 but will fail in 1.7
b.sum().backward()
print(a.grad)
# tensor([0., 0., 0., 0., 1., 0., 0., 0., 0., 0.])
# The value is wrongly at index 4 while it should be at index 2

# The issue is avoided by not modifying ind inplace, replacing the line
# above with:
# res = torch.max(t, dim=0, out=(torch.Tensor(), ind.clone()))

Add __torch_function__ for methods (#37091)

Functions, slicing and Tensor methods will now properly preserve the subclass type when possible.

>>> class SubTensor(torch.Tensor):
...     pass
>>> type(torch.add(SubTensor([0]), SubTensor([1]))).__name__
'SubTensor'
>>> type(torch.add(SubTensor([0]), torch.Tensor([1]))).__name__
'SubTensor'

The old behavior of "any operation on your subclass produces a torch.Tensor instead of the subclass" can be recovered by doing:

import torch
from torch._C import _disabled_torch_function_impl

class SubTensor(torch.Tensor):
    __torch_function__ = _disabled_torch_function_impl

tensor.__iter__: Use torch.unbind instead of a for loop (#40884)

This improves performance significantly, but it changes the behavior of in-place operations on the values returned by the iterator. This change only takes effect if either the input Tensor or any argument of the in-place operation is a Tensor that requires gradients, and the operation will fail with "Output X of UnbindBackward is a view and is being modified inplace".
You can recover the previous behavior by manually slicing the Tensor, [t[i] for i in range(t.size(0))], as shown in the example below.

1.6.0:

>>> x = torch.randn(5, 10, requires_grad=True)
>>> for i, v in enumerate(x):
...     v.fill_(i)

1.7.0:

>>> x = torch.randn(5, 10, requires_grad=True)
>>> for i, v in enumerate([x[j] for j in range(x.size(0))]):
...     v.fill_(i)

Most functions that take zero, one, or two Tensor arguments, as well as indexing ops, have been updated to check for memory overlap in the Tensors they work on (#43418, #43419, #43420, #43421, #43423, #43422)

This fixes silent correctness errors: something that used to be silently incorrect now errors out. Code that raises this error must be updated to avoid the offending op, which was returning wrong results, as shown in the example below:

>>> x = torch.randn(1, 3)
>>> # Create a tensor that has internal memory overlap
>>> y = x.expand(2, 3)

# In 1.6, this would not error out, but in 1.7, it errors out
>>> torch.nn.functional.elu(y, inplace=True)
RuntimeError: unsupported operation: more than one element of the written-to
tensor refers to a single memory location. Please clone() the tensor before
performing the operation.

# Here is the fix in 1.7
>>> torch.nn.functional.elu(y, inplace=False)

C++ API: any external users of TensorIterator now always get the memory overlap check. The previous behavior can be recovered by calling set_check_mem_overlap(false) when creating the iterator.

TorchScript

TorchScript now correctly supports various exception types and custom exception messages (#41907)
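
As a rough illustration (a minimal sketch; checked_sqrt is a hypothetical function), a scripted function can now raise a specific exception type with a custom message:

import torch

@torch.jit.script
def checked_sqrt(x: torch.Tensor) -> torch.Tensor:
    if bool((x < 0).any()):
        # The exception type and message are preserved instead of being
        # collapsed into a generic error.
        raise ValueError("checked_sqrt expects non-negative inputs")
    return x.sqrt()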

TorchScript now supports properties of TorchScript classes and ScriptModules (#42389, #42390)
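
Here is a minimal sketch of a property on a ScriptModule (the Scaler module is hypothetical):

import torch

class Scaler(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.factor = 2.0

    @property
    def doubled(self) -> float:
        # Property getters can now be compiled by TorchScript
        return self.factor * 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.doubled

scripted = torch.jit.script(Scaler())
print(scripted(torch.ones(2)))  # tensor([4., 4.])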

Quantization

The convolution parameters now support versioning.

Some undocumented functions that were mistakenly made public have been removed

TorchScript Compiler Update

In 1.7, we are enabling a Profiling Executor and a new Tensor-Expressions-based (TE) fuser. All compilations now go through one profiling run (the count is an adjustable setting) and one optimization run. In the profiling run, complete tensor shapes are recorded and used by the new fuser. In the optimization run, the focus is on finding element-wise operations over CUDA tensors (in torch.jit.ScriptModules) and fusing them into a single CUDA kernel.

The TE fuser is expected to deliver performance similar to the old fuser used in 1.6; however, it unlocks more opportunities for performance improvements in future releases. In rare cases, performance of some models may degrade 5-10%. If you experience any regressions, please report them on GitHub so we can address them as soon as possible! For 1.7, we are providing an option to revert to the old fuser by calling torch._C._jit_set_profiling_executor(False) in Python or torch::jit::getExecutorMode() = false; in C++. For more information, please see the "Graph Executor" section in our documentation.

Deprecations

Python API

torch.norm and torch.functional.norm are deprecated in favor of torch.linalg.norm (#44321)

The new torch.linalg.norm has the same behavior as numpy.linalg.norm.
Both deprecated functions had odd behaviors for matrix and vector norms. You should refer to the docs to find the exact behavior they had and how to replicate it with the new API.
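
A minimal migration sketch, assuming a 2-D input:

import torch

t = torch.arange(9, dtype=torch.float32).reshape(3, 3)

old = torch.norm(t, p='fro')           # deprecated spelling
new = torch.linalg.norm(t, ord='fro')  # NumPy-compatible replacement

print(torch.allclose(old, new))  # True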

Deprecate fft functions in torch. namespace in favor of torch.fft. namespace (#44876)

Please use torch.fft.foo as a drop-in replacement for torch.foo for the following functions: fft, ifft, rfft, and irfft.
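
A minimal migration sketch; note that in 1.7 the new namespace must be imported explicitly, since torch.fft previously named a function:

import torch
import torch.fft  # required in 1.7 to access the new namespace

t = torch.randn(8, dtype=torch.complex64)

spectrum = torch.fft.fft(t)        # replaces the deprecated torch.fft(...)
signal = torch.fft.ifft(spectrum)  # replaces the deprecated torch.ifft(...)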

Warns when some out= functions need to resize an output that is not 0-sized (#42079)

This behavior is dangerous and leads to an API that is hard to use. It is being deprecated so that the API can be fixed in future versions.
You should resize the output beforehand to avoid any issues in the future:

a = torch.rand(5)
b = torch.rand(25)

# This is deprecated
torch.add(a, a, out=b)

# This has the same behavior but will work in future versions
torch.add(a, a, out=b.resize_(0))

torch.optim: Warn for duplicate params in param group (#41597)

Providing the same Parameter multiple times in a single param group is most likely a user error and is being deprecated.
Please open an issue if you have a valid use case that requires this feature.

torch.linspace and torch.logspace: Not giving the steps argument is deprecated (#43860)

The default steps value that has been used historically in PyTorch is not consistent with other libraries, so it is being removed to avoid confusion.
For both functions, passing the steps=100 keyword argument recovers the original behavior.

1.6.0:

>>> torch.linspace(0, 10).size()
torch.Size([100])

1.7.0:

>>> torch.linspace(0, 10).size()
UserWarning: Not providing a value for linspace's
steps is deprecated and will throw a runtime error
in a future release.
torch.Size([100])
>>> torch.linspace(0, 10, steps=100).size()
torch.Size([100])

Improvements

ONNX

In PyTorch 1.7, we have continued to add and improve PyTorch operator export to ONNX. We have enabled the export of 10 new operators and further enhanced and optimized the export of 10+ torch operators. We have also focused on improving the export of TorchScript modules, in particular laying some of the groundwork required for better support in the near future. We have also created an API (torch.onnx.utils._find_missing_ops_onnx_export) as a diagnostic tool (preview only) to get a list of operators in a model that are not supported or implemented by the ONNX exporter. Support for exporting torch.quantization.FakeQuantize has also been added to help enable some QAT workflows.



This release has 2 assets:

Visit the release page to download them.


Interested in learning more about best practices and how to avoid common pitfalls when implementing deep learning?
Download our eBook: Getting Started in Deep Learning


The post PyTorch 1.7.0 Now Available appeared first on Exxact.

October 27, 2020 09:30 PM UTC


PyCoder’s Weekly

Issue #444 (Oct. 27, 2020)

#444 – OCTOBER 27, 2020
View in Browser »



Python Modulo in Practice: How to Use the % Operator

In this tutorial, you’ll learn about the Python modulo operator (%). You’ll look at the mathematical concepts behind the modulo operation and how the modulo operator is used with Python’s numeric types. You’ll also see ways to use the modulo operator in your own code.
REAL PYTHON

Implementation Plan for Speeding Up CPython

Core contributor Mark Shannon has a plan to increase CPython's performance fivefold, in four stages of updates.
MARK SHANNON

Pinpoint Hard-To-Reproduce Problems in Your Production Code Without Affecting Your App Performance.


Datadog’s Continuous Profiler is an always-on, production code profiler that enables you to analyze code-level performance across your entire environment, with minimal overhead. Profiles reveal which functions (or lines of code) consume the most resources, such as CPU and memory. Try it yourself →
DATADOG sponsor

On Code Isolation in Python

How can you hide, or isolate, Python code in an application from potential bad actors? Learn several methods for doing so and why you should never run third-party code in the same Python interpreter as your applications.
ARTEM GOLUBIN

The Real Python Podcast – Episode #32: Our New “Python Basics” Book & Filling the Gaps in Your Learning Path

Do you have gaps in your Python learning path? If you’re like me, you may have followed a completely random route to learn Python. This week on the show, David Amos is here to talk about the release of the Real Python book, “Python Basics: A Practical Introduction to Python 3”. The book is designed not only to get beginners up to speed but also to help fill in the gaps many intermediate learners may still have.
REAL PYTHON

Python Software Foundation Fellow Members for Q3 2020

Congratulations to all the new fellows!
PYTHON SOFTWARE FOUNDATION

Discussions

Speeding Up CPython

Mark Shannon has a plan to speed up CPython roughly 5 times over four stages. The community opines.
PYTHON-DEV

The youtube-dl GitHub Repo Has Received a DMCA Takedown Request From the RIAA

REDDIT

Python Jobs

Senior Full Stack Developer (Chicago, IL, USA)

Panopta

Senior Software Engineer (Remote)

Silicon Therapeutics

Senior Research Programmer (Remote)

Silicon Therapeutics

Business Analyst/Data Analyst (South San Francisco, CA, USA)

Projas Technologies, LLC

More Python Jobs >>>

Articles & Tutorials

How to Shoot Yourself in the Foot With Python, Part 1

If you’re new to Python, you might find yourself confused by some of the situations described in this article. Learn about five mistakes you could make, why they happen, and how to fix them.
MIGUEL BRITO

Using the Facade Pattern to Wrap Third-Party Integrations

“There are many ways we can create systems with layered architecture; one of the more popular techniques is to leverage Structural Design Patterns to create explicit relationships between classes. This post explores how the Facade Pattern can be used to wrap third-party integrations to improve software design.”
ALY SIVJI • Shared by Aly Sivji

Python Developers Are in Demand on Vettery


Get discovered by top companies using Vettery to actively grow their tech teams with Python developers (like you). Here’s how it works: create a profile, name your salary, and connect with hiring managers at startups to Fortune 500 companies. Sign up today - it’s completely free for job-seekers →
VETTERY sponsor

Creating a Binary Search in Python

Binary search is a classic algorithm in computer science. In this step-by-step course, you’ll learn how to implement this algorithm in Python. You’ll learn how to leverage existing libraries as well as craft your own binary search Python implementation.
REAL PYTHON

Solving the Sequence Alignment Problem in Python

Sequence alignment is a method of pairing elements of two sequences under some constraints. It can be used to analyze sequences of biological data, such as nucleic acid sequences. Learn how to solve the sequence alignment problem in Python using a brute-force method and a more efficient method that uses dynamic programming.
JOHN LEKBERG

Getting Started with OpenTelemetry and Distributed Tracing in Python

Learn why distributed tracing is the foundation for observability, and how to instrument your Python applications with OpenTelemetry in under 10 minutes.
LIGHTSTEP sponsor

Level Up Your Skills With the Real Python Slack Community

In this guide, you’ll learn how to get the most out of your Real Python membership using the community Slack. You’ll learn some lesser-known features of Slack and see how to communicate your technical problems more effectively.
REAL PYTHON

Higher Kinded Types in Python

Higher kinded types (HKT) are a notion in type theory that can be really helpful in functional programming and typing tensors and matrices. HKTs aren’t supported yet in Python, but you can emulate them.
NIKITA SOBOLEV

Automating Photoshop

Learn how to automate Photoshop using Python and the Photoshop COM programming interface.
DAVID VAILLANCOURT

Projects & Code

lambda-networks: A New Approach to Image Recognition That Reaches SOTA With Less Compute

GITHUB.COM/LUCIDRAINS

you-get: Dumb Downloader That Scrapes the Web

GITHUB.COM/SOIMORT

nbQA: Run Any Standard Python Code Quality Tool on a Jupyter Notebook

GITHUB.COM/NBQA-DEV

Sorting-Algorithms-Visualizer: See How Sorting Algorithm Works With Pygame

GITHUB.COM/LUCASPILLA

Faster-Cpython: How to Make CPython Faster

GITHUB.COM/MARKSHANNON

Super-mario-bros-PPO-pytorch: Proximal Policy Optimization (PPO) Algorithm for Super Mario Bros

GITHUB.COM/UVIPEN

Events

SciPy Japan 2020

October 30 to November 3, 2020
SCIPY.ORG

Python Brasil 2020

November 2 to November 9, 2020
PYTHONBRASIL.ORG.BR


Happy Pythoning!
This was PyCoder’s Weekly Issue #444.
View in Browser »


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

October 27, 2020 07:30 PM UTC


Python Morsels

Equality vs Identity


Transcript

You're probably already familiar with equality: that's the == operator. Identity uses the is operator.

Equality

Let's say we have two variables, x and y pointing to two lists:

>>> x = [1, 2, 3]
>>> y = [1, 2, 3]

If we say x == y we're asking about equality:

>>> x == y
True

Equality is about whether these two objects represent the same value, whether they're essentially the same thing.

The == operator delegates to one of these objects and asks it, "do you represent the same value as the other object?" That's up to the actual objects to answer.
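
For example, a class can answer that question for itself by implementing __eq__ (this Point class is just an illustration):

>>> class Point:
...     def __init__(self, x, y):
...         self.x, self.y = x, y
...     def __eq__(self, other):
...         return (self.x, self.y) == (other.x, other.y)
...
>>> Point(1, 2) == Point(1, 2)
True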

Identity

The is operator asks about identity.

>>> x is y
False

Unlike ==, the is operator doesn't even look at the objects that x and y point to. Instead, is asks if the variables x and y are pointing to the same object.

The is operator answers the question: do two object references actually point to the same object? The expression x is y checks the memory locations that x and y point to (via their id) and checks whether those locations are the same.

>>> id(x)
139957046343296
>>> id(y)
139957046343488

If x and y have the same id in memory, that means we're referencing the same object in two places: x and y are actually referring to the same exact object.

In Python, assignment points a variable to an object.

If we assign x to y:

>>> x = y

We're now pointing the variable x to the same object that y is currently pointing to.

>>> id(x)
139957046343488
>>> id(y)
139957046343488

If we call the append method on the list that x points to:

>>> x.append(4)

We've mutated that list object, but we've also mutated the object that y points to because both x and y point to exactly the same object.

>>> x
[1, 2, 3, 4]
>>> y
[1, 2, 3, 4]

These two objects are equal:

>>> x == y
True

But they're also identical (which means they point to the same object):

>>> x is y
True

Inequality and Non-identity

Just as we have equality (==) and inequality (!=):

>>> x == y
True
>>> x != y
False

we also have identity (is) and unidentity (is not)... or is it inidentity? How about non-identity.

>>> x is y
True
>>> x is not y
False

The is not operator is one of the few operators in Python that actually has a space inside it.

When is identity used?

You don't see identity used very often. This is the most important takeaway about identity and equality: you'll use == all the time, but you'll almost never use is. When comparing two objects, you'll almost always want to check for equality instead of identity.

The place you'll most commonly see is used is with None:

>>> x is None
False
>>> x is not None
True

There's only one None value in memory in Python. We're asking the question "is x pointing to the one and only None value".

You'll only see is used with special values where there's only one of them in memory. I call these sentinel values. Sentinel objects are considered completely unique: there's only one of them floating around in memory.

Sentinel values are pretty much the one place you'll see identity used and None is by far the most common sentinel value in Python.
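
For example, you can make a sentinel of your own with object() (the MISSING name here is just an illustration):

>>> MISSING = object()
>>> def greet(name=MISSING):
...     if name is MISSING:  # identity check, just like with None
...         return "Hello, whoever you are!"
...     return f"Hello, {name}!"
...
>>> greet()
'Hello, whoever you are!'
>>> greet("Trey")
'Hello, Trey!'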

PEP 8, the Python style guide, says you should use identity to compare with None. So you should never write x == None; instead, type x is None. That's the convention we use with None in Python.

Summary

When you want to ask the question "does one object represent the same data as another object", you pretty much always want to use equality (with the == or != operators).

You'll almost never need to ask the question "is one pointer referencing literally the same object as another pointer". If you do want to ask that question though, you'll check identity (with the is and is not operators).

The one time that you really should rely on identity is when comparing to None (checking x is None).

October 27, 2020 03:00 PM UTC


Real Python

Creating a Binary Search in Python

Binary search is a classic algorithm in computer science. It often comes up in programming contests and technical interviews. Implementing binary search turns out to be a challenging task, even when you understand the concept. Unless you’re curious or have a specific assignment, you should always leverage existing libraries to do a binary search in Python or any other language.
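
For instance, the standard library's bisect module already provides the core of binary search. Here's a minimal sketch (the contains helper is just an illustration):

import bisect

names = ["ada", "bob", "carol", "dan"]  # binary search requires sorted input

def contains(haystack, needle):
    # bisect_left returns the insertion point; the needle is present only
    # if the element already at that position equals it.
    index = bisect.bisect_left(haystack, needle)
    return index < len(haystack) and haystack[index] == needle

print(contains(names, "carol"))  # True
print(contains(names, "eve"))    # False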

In this course, you’ll learn how to:

This course assumes you’re a student or an intermediate programmer with an interest in algorithms and data structures. At the very least, you should be familiar with Python’s built-in data types, such as lists and tuples. In addition, some familiarity with recursion, classes, data classes, and lambdas will help you better understand the concepts you’ll see in this course.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

October 27, 2020 02:00 PM UTC


Stack Abuse

How to Set Axis Range (xlim, ylim) in Matplotlib

Introduction

Matplotlib is one of the most widely used data visualization libraries in Python. Much of Matplotlib's popularity comes from its customization options - you can tweak just about any element from its hierarchy of objects.

In this tutorial, we'll take a look at how to set the axis range (xlim, ylim) in Matplotlib, to truncate or expand the view to specific limits.

Creating a Plot

Let's first create a simple plot:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(0, 10, 0.1)
y = np.sin(x)
z = np.cos(x)

ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')

plt.show()

Here, we've plotted a sine and a cosine function over inputs from 0 to 10 with a step of 0.1, which gives 100 points. Since we plot only the y-values, the X-axis shows the element indices, running from 0 to 100. Running this code yields:

matplotlib plot subplots

Now, we can tweak the range of this axis, which currently goes from 0 to 100.

Setting Axis Range in Matplotlib

Now, if we'd like to truncate that view into a smaller or even a larger one, we can tweak the X and Y limits. These can be accessed either through the PyPlot instance or the Axes instance.

How to Set X-Limit (xlim) in Matplotlib

Let's first set the X-limit, using both the PyPlot and Axes instances. Both of these methods accept a tuple - the left and right limits. So, for example, if we wanted to truncate the view to only show the data in the range of 25-50 on the X-axis, we'd use xlim([25, 50]):

fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(0, 10, 0.1)
y = np.sin(x)
z = np.cos(x)

ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')

plt.xlim([25, 50])

This limits the view on the X-axis to the data between 25 and 50 and results in:

how to set x limit axis range in matplotlib

This same effect can be achieved by setting these via the ax object. This way, if we have multiple Axes, we can set the limit for them separately:

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(12, 6))

x = np.arange(0, 10, 0.1)
y = np.sin(x)
z = np.cos(x)

ax = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

ax.set_title('Full view')
ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')

ax2.set_title('Truncated view')
ax2.plot(y, color='blue', label='Sine wave')
ax2.plot(z, color='black', label='Cosine wave')

ax2.set_xlim([25, 50])

plt.show()

how to set x limit axis range for subplots in matplotlib

How to Set Y-Limit (ylim) in Matplotlib

Now, let's set the Y-limit. This can be achieved with the same two approaches:

ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')

plt.ylim([-1, 0])

Or:

ax.plot(y, color='blue', label='Sine wave')
ax.plot(z, color='black', label='Cosine wave')

ax.set_ylim([-1, 0])

Both of which result in:

how to set y limit axis range in matplotlib

Conclusion

In this tutorial, we've gone over how to set the axis range (i.e. the X and Y limits) using Matplotlib in Python.

If you're interested in Data Visualization and don't know where to start, make sure to check out our book on Data Visualization in Python.

Data Visualization in Python, a book for beginner to intermediate Python developers, will guide you through simple data manipulation with Pandas, cover core plotting libraries like Matplotlib and Seaborn, and show you how to take advantage of declarative and experimental libraries like Altair.

Data Visualization in Python

Understand your data better with visualizations! With over 275 pages, you'll learn the ins and outs of visualizing data in Python with popular libraries like Matplotlib, Seaborn, Bokeh, and more.

October 27, 2020 01:15 PM UTC


Artem Golubin

On code isolation in Python

I started learning Python in 2009, and I had a pretty challenging task and somewhat unusual use of Python. I was working on a desktop application that used PyQT for GUI and Python as the main language.

To hide the code, I embedded a Python interpreter into a standalone Windows executable. There are a lot of solutions for doing so (e.g. pyinstaller, py2exe), and they all work similarly. They compile your Python scripts to bytecode files and bundle them with an interpreter into an executable. Compiling scripts down to bytecode makes it harder for people with bad intentions to get the source code and crack or hack your software. The bytecode has to be extracted from the executable and decompiled. It can also produce obfuscated code that is much harder to understand.

[....]

October 27, 2020 12:54 PM UTC


Python Software Foundation

Python Software Foundation Fellow Members for Q3 2020

It's that time of year! Let us welcome the new PSF Fellows for Q3! The following people continue to do amazing things for the Python community:

Débora Azevedo

Twitter, Website

Ines Montani

Twitter, GitHub, Website

John Roa

Karolina Ladino

Website, Twitter

Katia Lira

Twitter

Mariatta Wijaya

Twitter, GitHub Sponsor, GitHub, LinkedIn

Melissa Weber Mendonça

GitHub, Twitter

Ng Swee Meng

LinkedIn, GitHub, Twitter, Instagram

Nilo Ney Coutinho Menezes

GitHub, Blog, Twitter, Website

Park Hyun-woo

GitHub, Twitter

Ram Rachum

GitHub, Blog

Sebastian Vetter

LinkedIn, Website

Thank you for your continued contributions. We have added you to our Fellow roster online.

The above members help support the Python ecosystem by contributing to CPython, contributing to the PyLadies community, maintaining Python libraries, creating educational material, translating courses, organizing Python events and conferences, starting Python communities in local regions, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.

Let's continue to recognize Pythonistas all over the world for their impact on our community. The criteria for Fellow members is available online: https://www.python.org/psf/fellows/. If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. We are accepting nominations for quarter 4 through November 20, 2020.

Work Group Needs Members

The Fellow Work Group is looking for more members from all around the world! If you are a PSF Fellow and would like to help review nominations, please email us at psf-fellow at python.org. More information is available at: https://www.python.org/psf/fellows/.

October 27, 2020 10:47 AM UTC