
Planet Python

Last update: January 19, 2019 10:48 AM UTC

January 19, 2019


codingdirectional

Tidy up the user interface of the video editing application

Hello and welcome back; it has been a day since the last post, and today we will continue working on our video editing project. Now that I have included the final feature for this project, I can concentrate on the user interface part. My ideology when it comes to programming is always to focus on the main objective first and work on the small details later: once we have destroyed the main battleship, it will be easy to take on those small battleships that have lost their main supply line.

In this article, we will create the user interface below, which consists of a button to select the video file, a checkbox to remove the audio, and another checkbox for adding new audio.

The new user interface

Below is the entire program.

from tkinter import *
from tkinter import filedialog
import os
import subprocess
import tkinter.ttk as tk

win = Tk() # Create instance
win.title("NeWw Vid") # Add a title
win.resizable(0, 0) # Disable resizing the GUI
win.configure(background='white') # change background color

mainframe = Frame(win) # create a frame
mainframe.pack()

buttonFrame = Frame(win) # create a button frame
buttonFrame.pack(side = BOTTOM, fill=X)

#  Create a label
#aLabel = Label(win, text="Select video size and video", anchor="center", padx=13, pady=10, relief=RAISED)
#aLabel.grid(column=0, row=0, sticky=W+E)
#aLabel.configure(foreground="black")
#aLabel.configure(background="white")
#aLabel.configure(wraplength=110)

# Create a combo box
vid_size = StringVar() # create a string variable
preferSize = tk.Combobox(mainframe, textvariable=vid_size) 
preferSize['values'] = (1920, 1280, 854, 640) # video width in pixels
preferSize.current(0) # select item one
preferSize.pack(side = LEFT, expand = TRUE) # pack the combo box, like the other widgets in this frame (mixing grid and pack in the same frame would raise an error)

removeAudioVal = IntVar()
removeAudio = tk.Checkbutton(mainframe, text="Remove Audio", variable=removeAudioVal)
removeAudio.pack(side = LEFT, padx=3)

newAudio = IntVar()
aNewAudio = tk.Checkbutton(mainframe, text="New Audio", variable=newAudio)
aNewAudio.pack(side = LEFT, padx=2)

# Open a video file
def openVideo():

        audiofilename = '' # initialize this here so the check below works even when the New Audio box is unchecked

        fullfilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Video file", "*.mp4; *.avi ")]) # select a video file from the hard drive

        if(newAudio.get() == 1):
                audiofilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Audio file", "*.wav; *.ogg ")]) # select a new audio file from the hard drive
                
        if(fullfilename != ''): 

                scale_vid = preferSize.get() # retrieve value from the combo box
                new_size = str(scale_vid)
                dir_path = os.path.dirname(os.path.realpath(fullfilename))
                os.chdir(dir_path)
                f = new_size  + '.mp4' # the new output file name/format
                f2 = f + '.mp4' # intermediate file used by the audio steps below

                noAudio = removeAudioVal.get() # get the checkbox state for audio 

                #subprocess.call(['ffmpeg', '-stream_loop', '2', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize and loop the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', '-r', '24', f]) # resize and speed up the video with ffmpeg
               
                #subprocess.call(['ffmpeg', '-i', f, '-ss', '00:02:30', '-y', f2]) # create animated gif starting from 2 minutes and 30 seconds to the end
                subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize the video with ffmpeg

                if(noAudio == 1):
                        subprocess.call(['ffmpeg', '-i', f, '-c', 'copy', '-y', '-an', f2]) # remove audio from the original video
                
                if(audiofilename != '' and noAudio == 1):
                        subprocess.call(['ffmpeg', '-i', f2, '-i', audiofilename, '-shortest', '-c:v', 'copy', '-c:a', 'aac', '-b:a', '256k', '-y', f]) # add the new audio to the muted video, trimming either the audio or the video depending on which one is longer

                #subprocess.call(['ffmpeg', '-i', f, '-vf', 'eq=contrast=1.3:brightness=-0.03:saturation=0.01', '-y', f2]) # adjust the saturation contrast and brightness of video
                #subprocess.call(['ffmpeg', '-i', f, '-y', f2]) # converting the video with ffmpeg

                
action_vid = tk.Button(buttonFrame, text="Open Video", command=openVideo)
action_vid.pack(fill=X)

win.mainloop()

Not bad for now; we will continue to modify the user interface in the next chapter. Below is the new video which this program has created.

http://islandstropicalman.tumblr.com/post/182129073622/music-with-a-lot-of-ants

January 19, 2019 06:03 AM UTC

January 18, 2019


Stack Abuse

Lambda Functions in Python

What are Lambda Functions?

In Python, we use the lambda keyword to declare an anonymous function, which is why we refer to them as "lambda functions". An anonymous function is a function declared without a name. Although they look different syntactically, lambda functions behave in the same way as regular functions that are declared using the def keyword. The key characteristics of a Python lambda function are that it can take any number of arguments but may contain only a single expression.

In this article, we will discuss Python's lambda functions in detail, as well as show examples of how to use them.

Creating a Lambda Function

We use the following syntax to declare a lambda function:

lambda argument(s): expression  

As stated above, we can have any number of arguments but only a single expression. A lambda cannot contain any statements, and it returns a function object that we can assign to any variable.

For example:

remainder = lambda num: num % 2

print(remainder(5))  

Output

1  

In this code, lambda num: num % 2 is the lambda function. num is the argument, while num % 2 is the expression that is evaluated, and the result of that expression is returned. The expression computes the remainder of dividing the input parameter by 2. Passing 5 as the parameter, we divide it by 2 and get a remainder of 1.

You should notice that the lambda function in the above script has not been assigned any name. It simply returns a function object which is assigned to the identifier remainder. However, despite being anonymous, it was possible for us to call it in the same way that we call a normal function. The statement:

lambda num: num % 2  

Is similar to the following:

def remainder(num):  
    return num % 2

Here is another example of a lambda function:

product = lambda x, y : x * y

print(product(2, 3))  

Output

6  

The lambda function defined above returns the product of the values of the two arguments.

Why Use Lambda Functions?

Lambda functions are used when you need a function for a short period of time. They are commonly used when you want to pass a function as an argument to a higher-order function, that is, a function that takes other functions as its arguments.

The use of an anonymous function inside another function is explained in the following example:

def testfunc(num):  
    return lambda x : x * num

In the above example, we have a function that takes one argument, and the argument is to be multiplied with a number that is unknown. Let us demonstrate how to use the above function:

def testfunc(num):  
    return lambda x : x * num

result1 = testfunc(10)

print(result1(9))  

Output

90  

In the above script, we use a lambda function to multiply the number we pass by 10. The same function can be used to multiply the number by 1000:

def testfunc(num):  
  return lambda x : x * num

result2 = testfunc(1000)

print(result2(9))  

Output

9000  

It is possible for us to use the testfunc() function to define the above two lambda functions within a single program:

def testfunc(num):  
    return lambda x : x * num

result1 = testfunc(10)  
result2 = testfunc(1000)

print(result1(9))  
print(result2(9))  

Output

90  
9000  

Lambda functions can be used together with Python's built-in functions like map(), filter(), etc.
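
As a quick illustration of this idea, here is a brief sketch of our own showing a lambda used as the key argument to the built-in sorted() function:

people = [("Alice", 32), ("Bob", 25), ("Carol", 29)]

# Sort the list of (name, age) tuples by the age field
sorted_people = sorted(people, key=lambda person: person[1])

print(sorted_people)  

Output

[('Bob', 25), ('Carol', 29), ('Alice', 32)]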

In the following section, we will be discussing how to use lambda functions with various Python built-in functions.

The filter() Function

Python's filter() function takes a lambda function together with a list as the arguments. It has the following syntax:

filter(object, iterable)  

The object here should be a lambda function which returns a boolean value, and it will be called for every item in the iterable to do the evaluation. The result is either True or False for each item. Note that the function can only take one iterable as the input.

A lambda function, along with the list to be evaluated, is passed to the filter() function. In Python 3, filter() returns an iterator over the elements for which the lambda function returns True, so we wrap the call in list() to get a list back. Consider the example given below:

numbers_list = [2, 6, 8, 10, 11, 4, 12, 7, 13, 17, 0, 3, 21]

filtered_list = list(filter(lambda num: (num > 7), numbers_list))

print(filtered_list)  

Output

[8, 10, 11, 12, 13, 17, 21]

In the above example, we have created a list named numbers_list with a list of integers. We have created a lambda function to check for the integers that are greater than 7. This lambda function has been passed to the filter() function as the argument and the results from this filtering have been saved into a new list named filtered_list.

The map() Function

The map() function is another built-in function that takes a function object and an iterable. The syntax of the map() function is as follows:

map(object, iterable_1, iterable_2, ...)  

The iterable passed to the map() function can be a dictionary, a list, etc. The map() function applies the lambda function to every item in the input iterable and, in Python 3, returns an iterator over the results, which we convert to a list below. Consider the following example:

numbers_list = [2, 6, 8, 10, 11, 4, 12, 7, 13, 17, 0, 3, 21]

mapped_list = list(map(lambda num: num % 2, numbers_list))

print(mapped_list)  

Output

[0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]

In the script above, we have a list numbers_list, which consists of random numbers. We then call the map() function and pass it a lambda function as the argument. The lambda function calculates the remainder after dividing each number by 2. The result of the mapping is stored in a list named mapped_list. Finally, we print out the contents of the list.

Conclusion

In Python, a lambda function is a single-line function declared with no name, which can have any number of arguments but only one expression. Such a function is capable of behaving similarly to a regular function declared using the def keyword. Oftentimes a lambda function is passed as an argument to another function.

In this article we explained the syntax, use-cases, and examples of commonly used lambda functions.

January 18, 2019 02:10 PM UTC


Python Bytes

#113 Python Lands on the Windows 10 App Store

January 18, 2019 08:00 AM UTC


Vasudev Ram

Announcing PIaaS - Python Interviewing as a Service



Hello, readers,

Announcing Python Interviewing as a Service:

I'm now officially offering PIaaS - Python Interviewing as a Service. I have done some of this earlier, informally, for clients. Recently a couple of companies asked me for help on this again, so I am now adding it to my list of offered services, the others being consulting (software design and development, code review, technology evaluation and recommendation) and software training.

I can help your organization interview and hire Python developer candidates, offloading (some of) that work from your core technical and HR / recruitment staff.

I can also interview on related areas like SQL and RDBMS, and Unix and Linux commands and shell scripting.

I have long-term experience in all the above areas.

To hire me for PIaaS or to learn more about it, contact me via the Gmail address on my site's contact page.

- Vasudev Ram

My Codementor profile: Vasudev Ram on Codementor


January 18, 2019 02:46 AM UTC


Reuven Lerner

Beyond the “hello, world” of Python’s “print” function

One of the first things that anyone learns in Python is (of course) how to print the string, “Hello, world.”  As you would expect, the code is straightforward and simple:

print('Hello, world')

And indeed, Python’s “print” function is so easy and straightforward to use that we barely give it any thought.  We assume that people know how to use it — and for the most part, for most of the things they want to do, that’s true.

But lurking beneath the surface of the “print” function is a lot of functionality, as well as some history (and even a bit of pain).  Understanding how to use “print” can cut down on the code you write, and generally make it easier for you to work with.

The basics

The basics are simple: “print” is a function, which means that if you want to invoke it, you need to use parentheses:

>>> print('hello')

hello

You can pass any type of data to “print”. Strings are most common, but you can also pass ints, floats, lists, tuples, dicts, sets, or any other object. For example:

>>> print(5)

5

or

>>> print([10, 20, 30])

[10, 20, 30]

And of course, it doesn’t matter whether the thing you’re trying to print is passed as a literal object, or referenced by a variable:

>>> d = {'a':1, 'b':2, 'c':3}
>>> print(d)

{'a': 1, 'b': 2, 'c': 3}

You can also put an expression inside of the parentheses; the value of the expression will be passed to “print”:

>>> print(3+5)
8

>>> print([10, 20] + [30, 40])
[10, 20, 30, 40]

Every object in Python knows how to display itself as a string, which means that you can pass it directly to “print”. There isn’t any need to turn things into strings before handing them to “print”:

print(str([10, 20, 30]))    # unnecessary use of "str"

[10, 20, 30]

After “print” displays its output, it adds a newline.  For example:

>>> print('abc')
abc
>>> print('def')
def
>>> print('ghi')
ghi

You can pass as many arguments as you want to “print”, separated by commas. Each will be printed, in order, with a space between them:

>>> print('abcd', 'efgh', [10, 20, 30], 99, 'ijkl')

abcd efgh [10, 20, 30] 99 ijkl

We’ll see, below, how we can change these two default behaviors.

Inputs and outputs

If “print” is a function, then it must have a return value. Let’s take a look:

>>> x = print('abcd')
abcd
>>> type(x)
<class 'NoneType'>

In other words: “print” returns None, no matter what you print. After all, you’re not printing in order to get a return value, but rather for the side effect.

What about arguments to “print”?  Well, we’ve already seen that we can pass any number of arguments, each of which will be printed.  But there are some optional parameters that we can pass, as well.

The two most relevant ones allow us to customize the behavior we saw before, changing what string appears between printed items and what is placed at the end of the output.

The “sep” parameter, for example, defaults to ‘ ‘ (a space character), and is placed between printed items.  We can set this to any string, including a multi-character string:

>>> print('a', 'b', 'c', sep='*')
a*b*c

>>> print('abc', 'def', 'ghi', sep='***')
abc***def***ghi

>>> print([10, 20, 30], [40, 50, 60], [70, 80, 90], sep='***')
[10, 20, 30]***[40, 50, 60]***[70, 80, 90]

Notice that “sep” is placed between the arguments to “print”, not between the elements of each argument.  Thus in this third example, the ‘***’ goes between the lists, rather than between the integer elements of the lists.

If you want the arguments to be printed alongside one another, you can set “sep” to be an empty string:

>>> print('abc', 'def', 'ghi', sep='')
abcdefghi

Similarly, the “end” parameter defaults to ‘\n’ (newline), but can contain any string. It determines what’s printed after “print” is done.

For example, if you want to have some extra lines after you print something, just change “end” so that it has a few newlines:

>>> def foo():
        print('abc', end='\n\n\n')
        print('def', end='\n\n\n')
>>> foo()
abc


def


If, by contrast, you don’t want “print” to add a newline at the end of what you print, you can set “end” to be an empty string:

>>> def foo():
        print('abc', end='')
        print('def', end='')

>>> foo()
abcdef>>>

Notice how in the Python interactive shell, using the empty string to print something means that the next ‘>>>’ prompt comes after what you printed.  After all, you didn’t ask for there to be a newline after what you wrote, and Python complied with your request.

Of course, you can pass values for “end” that don’t involve newlines at all. For example, let’s say that you want to output multiple fields to the screen, with each field printed in a separate line:

>>> def foo():
        print('abc', end=':')
        print('def', end=':')
        print('ghi')

>>> foo()
abc:def:ghi

Printing to files

By default, “print” sends its data to standard output, known in Python as “sys.stdout”.  While the “sys” module is automatically loaded along with Python, its name isn’t available unless you explicitly “import sys”.

The “print” function lets you specify, with the “file” parameter, another file-like object (i.e., one that adheres to the appropriate protocol) to which you want to write. The object must be writable, but other than that, you can use any object.

For example:

>>> f = open('myfile.txt', 'w')

>>> print('hello')
hello
>>> print('hello???', file=f)
>>> print('hello again')
hello again
>>> f.close()

>>> print(open('myfile.txt').read())
hello???

In this case, the output was written to a file.  But we could also have written to a StringIO object, for example, which acts like a file but isn’t one.
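
Here is a brief sketch of that idea (an illustration of our own), using the standard library's io.StringIO as the file-like target:

>>> import io
>>> buffer = io.StringIO()
>>> print('hello buffer', file=buffer)
>>> buffer.getvalue()
'hello buffer\n'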

Note that if I hadn’t closed “f” in the above example, the output wouldn’t have arrived in the file. That’s because Python buffers all output by default; whenever you write to a file, the data is only actually written when the buffer fills up (and is flushed), when you invoke the “flush” method explicitly, or when you close the file, and thus flush implicitly. Using the “with” construct with a file object closes it, and thus flushes the buffers as well.
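
For example, a minimal sketch of the “with” approach (our own illustration):

with open('myfile.txt', 'w') as f:
    print('hello with', file=f)
# At this point the file has been closed, and its buffer flushed to disk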

There is another way to flush the output buffer, however: We can pass a True value to the “flush” parameter in “print”.  In such a case, the output is immediately flushed to disk, and thus written.  This might sound great, but remember that the point of buffering is to lessen the load on the disk and on the computer’s I/O system. So flush when you need, but don’t do it all of the time — unless you’re paid by the hour, and it’s in your interest to have things work more slowly.

Here’s an example of printing with and without flush:

>>> f = open('myfile.txt', 'w')
>>> print('abc', file=f)
>>> print('def', file=f)
>>> print(open('myfile.txt').read())  # no flush, and thus empty file

>>> print('ghi', file=f, flush=True)  
>>> print(open('myfile.txt').read())  # all data has been flushed to disk
abc
def
ghi

You might have noticed a small inconsistency here: “print” writes to files, by default “sys.stdout”. And if we don’t flush or close the file, the output is buffered.  So, why don’t we have to flush (or close, not that this is a good idea) when we print to the screen?

The answer is that “sys.stdout” is treated specially by Python. As the Python docs say, it is “line buffered,” meaning that every time we send a newline character (‘\n’), the output is flushed.  So long as you are printing things to “sys.stdout” that end with a newline — and why wouldn’t you be doing that? — you won’t notice the buffering.
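
One place where this matters is printing progress on a single line. Since no newline is sent, the dots in this small sketch of ours might not appear until the loop ends unless we flush explicitly:

import time

for i in range(5):
    print('.', end='', flush=True)  # flush so each dot appears immediately
    time.sleep(1)
print()  # finish the line with a newline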

Remember Python 2?

As I write this, in January 2019, there are fewer than 12 months remaining before Python 2 is no longer supported or maintained. This doesn’t change the fact that many of my clients are still using Python 2 (because rewriting their large code base isn’t worthwhile or feasible).  If you’re still using Python 2, you should really be trying to move to Python 3.

And indeed, one of the first things that strikes people moving from Python 2 to 3 is the difference in “print”.

First and foremost, “print” in Python 2 is a statement, not an expression. This means that the parentheses in 2 are optional, while they’re mandatory in 3 — one of the first things that people learn when they move from 2 to 3.

This also means that “print” in Python 2 cannot be passed to other functions. In Python 3, you can.

Python 2’s “print” statement didn’t have the parameters (or defaults) that we have at our disposal.  You wanted to print to a file other than “sys.stdout”?  Assign your file object to “sys.stdout” before calling “print”, or just write to the file with its “write” method.  You wanted “print” not to descend a line after printing?  Put a comma at the end of the line.  (Yes, really; this is ugly, but it works.)

What if you’re working in Python 2, and want to get a taste of Python 3’s print function?  You can add this line to your code:

from __future__ import print_function

Once you have done so, Python 3’s “print” function will be in place.

Now I know that Python 3 is no longer in the future; indeed, you could say that Python 2 is in the past. But for many people who want to transition, or learn how to do it, this is a good method. But watch out: If you have calls to “print” without parentheses, or are using commas to avoid descending a line, then you’ll need to do more than just this import.  You will need to go through your code, and make sure that it works in this way. So while that might seem like a wise way to go, it’s only the first step of a much larger transition from 2 to 3 that you’ll need to make.

Enjoyed this article?  Join more than 11,000 other developers who receive my free, weekly “Better developers” newsletter. Every Monday, you’ll get an article like this one about software development and Python:


The post Beyond the “hello, world” of Python’s “print” function appeared first on Lerner Consulting Blog.

January 18, 2019 12:03 AM UTC

January 17, 2019


PyCharm

Webinar Recording: “Live Development of a PyCharm Plugin” with Joachim Ansorg

This week we hosted a webinar with Joachim Ansorg presenting Live Development of a PyCharm Plugin. The webinar recording is now available, as well as a repo with the contents he showed in the webinar.


This webinar covered a tremendous amount of ground in just over an hour.

The chosen topic was interesting in its own right: a plugin to let you suppress flake8 warnings using the standard # noqa comment syntax.

Thanks go to Joachim for spending his time with us and showing how even a small IDE plugin can easily justify the investment by saving developer time. He writes IntelliJ plugins professionally, and based on the feedback from companies we’ve referred to him, he does so quite well.

-PyCharm Team-
The Drive to Develop

January 17, 2019 06:11 PM UTC


Julien Danjou

Serious Python released!


Today I'm glad to announce that my new book, Serious Python, has been released.

However, you wonder… what is Serious Python?

Well, Serious Python is the new name of The Hacker's Guide to Python — the first book I published. Serious Python is the 4th update of that book — but with a brand new name and a new publisher!


For more than a year, I've been working with the publisher No Starch Press to enhance this book and bring it to the next level! I'm very proud of what we achieved, and working with a whole team on this book has been a fantastic experience.

The content has been updated to be ready for 2019: pytest is now a de-facto standard for testing, so I had to write about it. On the other hand, Python 2 support was less of a focus, and I removed many mentions of Python 2 altogether. Some chapters have been reorganized or regrouped, and others have been enhanced with new content!

The good news: you can get this new edition of the book with a 15% discount for the next 24 hours using the coupon code SERIOUSPYTHONLAUNCH on the book page.

The book is also released as part of the No Starch collection, and they are in charge of distributing the paperback copy of the book. If you want a version of the book that you can touch and hold in your hands, look for it in the No Starch shop, on Amazon, or in your favorite book shop!

No Starch Press edition of the Serious Python cover

January 17, 2019 04:56 PM UTC


Stack Abuse

Getting Started with MySQL and Python

Introduction

For any fully functional, deployable application, the persistence of data is indispensable. A trivial way of storing data would be to write it to a file on the hard disk, but one would prefer writing application-specific data to a database for obvious reasons. Python provides language support for writing data to a wide range of databases.

Python DB API

At the heart of Python's support for database programming is the Python DB API (PEP 249), which does not depend on any specific database engine. Depending on the database we use at the persistence layer, an appropriate implementation of the Python DB API should be imported and used in our program. In this tutorial, we will be demonstrating how to use Python to connect to a MySQL database and perform transactions with it. For this, we will be using the MySQLdb Python package.

Before we proceed with connecting to the database using Python, we need to install the MySQL connector for Python. This can be done in two ways. The first option is to install the latest connector with pip:

$ pip install mysql-connector-python

Alternatively, if there is a specific MySQL version installed on the local machine, then you may need a matching MySQL connector version so that no compatibility issues arise, which we can get using the following command:

$ pip install mysql-connector-python==<insert_version_number_here>

Finally, we need to install MySQL client module that will enable us to connect to MySQL databases from our Python application, which acts as the client:

$ pip install mysqlclient

Connecting to the Database

Once the connector is installed, the import MySQLdb statement should not throw any error when the Python file is executed.

Prerequisites

Note: It is assumed that readers have a basic understanding of databases in general and of the MySQL database in particular, along with knowledge of Structured Query Language (SQL). However, the basic process of creating a database and a user is shown below; run the following statements in the MySQL shell:

CREATE DATABASE pythondb;  
USE pythondb;  
CREATE USER 'pythonuser'@'localhost' IDENTIFIED BY 'pythonpwd123';  
GRANT ALL PRIVILEGES ON pythondb.* TO 'pythonuser'@'localhost';  
FLUSH PRIVILEGES;  

Checking your Connection to pythondb

Here is a simple script that can be used to programmatically test connection to the newly created database:

#!/usr/bin/python

import MySQLdb

dbconnect = MySQLdb.connect("localhost", "pythonuser", "pythonpwd123", "pythondb")

cursor = dbconnect.cursor()  
cursor.execute("SELECT VERSION()")

data = cursor.fetchone()  
if data:  
  print('Version retrieved: ', data)
else:  
  print('Version not retrieved.')

dbconnect.close()  

Output

Version retrieved: 5.7.19  

The version number shown above is just a dummy number. It should match the installed MySQL server's version.

Let us take a closer look at the sample program above to learn how it works. First off, import MySQLdb is used to import the required Python module.

The MySQLdb.connect() method takes the hostname, username, password, and database schema name, and creates a database connection. On successfully connecting with the database, it will return a connection object (which is referred to as dbconnect here).

Using the connection object, we can execute queries, commit transactions and rollback transactions before closing the connection.

Once we get the connection object, we need to get a MySQLCursor object in order to execute queries using execute method. The result set of the transaction can be retrieved using the fetchall, fetchone, or fetchmany methods, which will be discussed later in this tutorial.

There are three important methods related to database transactions apart from the execute method. We will learn briefly about these methods now.

The dbconnect.commit() method informs the database that the changes executed before calling it should be finalized; once a transaction has been committed, there is no way to roll back to the previous state.

Sometimes, if a transaction failure occurs, we will need to return the database to its state before the failure so that data is not lost or corrupted. In such a case, we will need to roll back the database to the previous state using dbconnect.rollback().

Finally, the dbconnect.close() method is used to close the connection to the database. To perform further transactions, we need to create a new connection.

Create a New Table

Once the connection with pythondb is established successfully, we are ready to go to the next step. Let us create a new table in it:

import MySQLdb

dbconnect = MySQLdb.connect("localhost","pythonuser","pythonpwd123","pythondb" )

cursor = dbconnect.cursor()  
cursor.execute("DROP TABLE IF EXISTS MOVIE")

query = "CREATE TABLE MOVIE(  \  
          id int(11) NOT NULL,\
          name varchar(20),\
          year int(11),\
          director varchar(20),\
          genre varchar(20),\
          PRIMARY KEY (id))"

cursor.execute(query)

dbconnect.close()  

After executing the above script, you should be able to see a new table movie created for the schema pythondb. This can be viewed using MySQL WorkBench.

Performing CRUD Operations

Now we'll perform some insert, read, modify, and delete operations in the newly created database table via the Python script.

Creating a New Record

The following script demonstrates how to insert a new record into MySQL database using a Python script:

#!/usr/bin/python

import MySQLdb

dbconnect = MySQLdb.connect("localhost", "pythonuser", "pythonpwd123", "pythondb")

cursor = dbconnect.cursor()

query = 'insert into movie(id, name, year, director, genre)  \  
       values (1, "Bruce Almighty", 2003, "Tom Shaydac", "Comedy")'
try:  
   cursor.execute(query)
   dbconnect.commit()
except:  
   dbconnect.rollback()
finally:  
   dbconnect.close()

Reading Rows from a Table

Once a new row is inserted in the database, you can fetch the data in three ways using the cursor object: fetchone() retrieves the next single row of the result set, fetchmany(size) retrieves up to size rows, and fetchall() retrieves the entire result set.

For simplicity, we will use the "select all" SQL query and use a for loop over the result set of the fetchall method to print individual records.

#!/usr/bin/python

import MySQLdb

dbconnect = MySQLdb.connect("localhost", "pythonuser", "pythonpwd123", "pythondb")

cursor = dbconnect.cursor()

query = "SELECT * FROM movie"  
try:  
   cursor.execute(query)
   resultList = cursor.fetchall()
   for row in resultList:
      print ("Movie ID =", row[0])
      print ("Name =", row[1])
      print ("Year =", row[2])
      print ("Director = ", row[3])
      print ('Genre = ', row[4])
except:  
   print ("Encountered error while retrieving data from database")
finally:  
   dbconnect.close()

Output:

Movie ID = 1  
Name = Bruce Almighty  
Year = 2003  
Director = Tom Shaydac  
Genre = Comedy  
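
The other two fetch methods work as their names suggest; here is a quick sketch of our own, against the same movie table, showing fetchone and fetchmany:

import MySQLdb

dbconnect = MySQLdb.connect("localhost", "pythonuser", "pythonpwd123", "pythondb")

cursor = dbconnect.cursor()
cursor.execute("SELECT * FROM movie")

first_row = cursor.fetchone()   # the next single row as a tuple, or None when the result set is exhausted
next_rows = cursor.fetchmany(2) # a list of up to 2 further rows

print(first_row)
print(next_rows)

dbconnect.close()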

Updating a Row

Let us now update the Genre of "Bruce Almighty" from Comedy to Satire:

import MySQLdb

dbconnect = MySQLdb.connect("localhost", "pythonuser", "pythonpwd123", "pythondb")

# The cursor object obtained below allows SQL queries to be executed in the database session.
cursor = dbconnect.cursor()

updatequery = "update movie set genre = 'Satire' where id = 1"

cursor.execute(updatequery)

dbconnect.commit()

print(cursor.rowcount, "record(s) affected")  

Output:

1 record(s) affected  

Deleting a Record

Here is a Python script that demonstrates how to delete a database row:

import MySQLdb

dbconnect = MySQLdb.connect("localhost", "pythonuser", "pythonpwd123", "pythondb")

# The cursor object obtained below allows SQL queries to be executed in the database session.
cursor = dbconnect.cursor()

updatequery = "DELETE FROM movie WHERE id = 1"

cursor.execute(updatequery)

dbconnect.commit()

print(cursor.rowcount, "record(s) deleted")  

After executing the above script, you should be able to see the following output if everything goes well.

Output

1 record(s) deleted  

Conclusion

In this article, we learned how to use the Python DB API to connect to a database. Specifically, we saw how a connection can be established to a MySQL database using the MySQLdb implementation of Python DB API. We also learned how to perform transactions with the database.

January 17, 2019 02:24 PM UTC


Django Weblog

Django 2.2 alpha 1 released

Django 2.2 alpha 1 is now available. It represents the first stage in the 2.2 release cycle and is an opportunity for you to try out the changes coming in Django 2.2.

Django 2.2 has a salmagundi of new features which you can read about in the in-development 2.2 release notes.

This alpha milestone marks the feature freeze. The current release schedule calls for a beta release in about a month and a release candidate about a month from then. We'll only be able to keep this schedule if we get early and often testing from the community. Updates on the release schedule are available on the django-developers mailing list.

As with all alpha and beta packages, this is not for production use. But if you'd like to take some of the new features for a spin, or to help find and fix bugs (which should be reported to the issue tracker), you can grab a copy of the alpha package from our downloads page or on PyPI.

The PGP key ID used for this release is Carlton Gibson: E17DF5C82B4F9D00.

January 17, 2019 02:13 PM UTC


gamingdirectional

Create Panda 3D Game Project

Hello, do you still remember that I mentioned before that I would start another game project alongside the new pygame project? Well, I have not decided yet which game framework I should use to build the python game. Yesterday I came across Panda 3D, which is a very attractive game framework that we can use to create the python game. After playing with it for a while I just...

Source

January 17, 2019 12:53 PM UTC


Codementor

Introduction to Python

Introduction to python: A complete tutorial series for Beginner to Advanced.

January 17, 2019 10:53 AM UTC


codingdirectional

Convert video from one format to another with python

Hello, and welcome back. In this article, we will add the last feature to our video editing application before we tidy up the entire application’s user interface and then upload this free application to the online software store for others to enjoy. As usual, we will use the FFmpeg tool together with a python program to convert a video from the mp4 format to the webm format. The program will first resize the video to the size which the user has selected, then convert the video to the new format, and finally save the new video to a new file. Below is the entire program.

from tkinter import *
from tkinter import filedialog
import os
import subprocess
import tkinter.ttk as tk

win = Tk() # Create instance
win.title("Manipulate Video") # Add a title
win.resizable(0, 0) # Disable resizing the GUI
win.configure(background='white') # change background color

#  Create a label
aLabel = Label(win, text="Select video size and video", anchor="center", padx=13, pady=10, relief=RAISED)
aLabel.grid(column=0, row=0, sticky=W+E)
aLabel.configure(foreground="black")
aLabel.configure(background="white")
aLabel.configure(wraplength=110)

# Create a combo box
vid_size = StringVar() # create a string variable
preferSize = tk.Combobox(win, textvariable=vid_size) 
preferSize['values'] = (1920, 1280, 854, 640) # video width in pixels
preferSize.grid(column=0, row=1) # the position of combo box
preferSize.current(0) # select item one 

# Open a video file
def openVideo():
        
        fullfilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Video file", "*.mp4; *.avi ")]) # select a video file from the hard drive
        #audiofilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Audio file", "*.mp4; *.ogg ")]) # select a new audio file from the hard drive
        if(fullfilename != ''): #and audiofilename != ''):
                scale_vid = preferSize.get() # retrieve value from the combo box
                new_size = str(scale_vid)
                dir_path = os.path.dirname(os.path.realpath(fullfilename))
                os.chdir(dir_path)
                f = new_size  + '.mp4' # the new output file name/format
                f2 = f + '.webm' # webm video
                #subprocess.call(['ffmpeg', '-stream_loop', '2', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize and loop the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', '-r', '24', f]) # resize and speed up the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', f, '-ss', '00:02:30', '-y', f2]) # create animated gif starting from 2 minutes and 30 seconds to the end
                subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', f, '-c', 'copy', '-y', '-an', f2]) # remove audio from the original video
                #subprocess.call(['ffmpeg', '-i', f2, '-i', audiofilename, '-shortest', '-c:v', 'copy', '-c:a', 'aac', '-b:a', '256k', '-y', f]) # add audio to the original video, trim either the audio or video depends on which one is longer
                #subprocess.call(['ffmpeg', '-i', f, '-vf', 'eq=contrast=1.3:brightness=-0.03:saturation=0.01', '-y', f2]) # adjust the saturation contrast and brightness of video
                subprocess.call(['ffmpeg', '-i', f, '-y', f2]) # converting the video with ffmpeg

                
action_vid = Button(win, text="Open Video", command=openVideo, padx=2)
action_vid.grid(column=0, row=2, sticky=E+W)
action_vid.configure(background='black')
action_vid.configure(foreground='white')

win.mainloop()

Finally, the last piece of code has been completed. We will continue to modify all of these features, as well as tidy up the user interface of this video editing application, starting from the next chapter onward, so stay tuned.

I always want to write more articles per day, but due to lack of stamina I always end up with only one article per day. I am working on the energy and stamina part now, and hopefully I will be able to write a few more articles per day. Writing articles needs concentration, thus stamina training is a must if we need to write more. Tomorrow I am off to take care of the offline business, see you again the day after tomorrow.

January 17, 2019 07:38 AM UTC

January 16, 2019


NumFOCUS

Now Hiring: Events and Digital Marketing Coordinator

The post Now Hiring: Events and Digital Marketing Coordinator appeared first on NumFOCUS.

January 16, 2019 09:52 PM UTC


Codementor

Analyzing Robinhood trade history

A Python script to get a look at your trading history from trading options and individual equities on Robinhood: calculate profit/loss, sum dividend payouts and generate buy-and-hold comparison.

January 16, 2019 07:45 PM UTC


Real Python

Async IO in Python: A Complete Walkthrough

Async IO is a concurrent programming design that has received dedicated support in Python, evolving rapidly from Python 3.4 through 3.7, and probably beyond.

You may be thinking with dread, “Concurrency, parallelism, threading, multiprocessing. That’s a lot to grasp already. Where does async IO fit in?”

This tutorial is built to help you answer that question, giving you a firmer grasp of Python’s approach to async IO.

Here’s what you’ll cover: the 10,000-foot view of async IO and where it fits among concurrency models, Python’s async/await syntax and the asyncio package, and async IO design patterns such as chaining coroutines and using queues.

Coroutines (specialized generator functions) are the heart of async IO in Python, and we’ll dive into them later on.

Note: In this article, I use the term async IO to denote the language-agnostic design of asynchronous IO, while asyncio refers to the Python package.

Before you get started, you’ll need to make sure you’re set up to use asyncio and other libraries found in this tutorial.


Setting Up Your Environment

You’ll need Python 3.7 or above to follow this article in its entirety, as well as the aiohttp and aiofiles packages:

$ python3.7 -m venv ./py37async
$ source ./py37async/bin/activate  # Windows: .\py37async\Scripts\activate.bat
$ pip install --upgrade pip aiohttp aiofiles  # Optional: aiodns

For help with installing Python 3.7 and setting up a virtual environment, check out Python 3 Installation & Setup Guide or Virtual Environments Primer.

With that, let’s jump in.

The 10,000-Foot View of Async IO

Async IO is a bit lesser known than its tried-and-true cousins, multiprocessing and threading. This section will give you a fuller picture of what async IO is and how it fits into its surrounding landscape.

Where Does Async IO Fit In?

Concurrency and parallelism are expansive subjects that are not easy to wade into. While this article focuses on async IO and its implementation in Python, it’s worth taking a minute to compare async IO to its counterparts in order to have context about how async IO fits into the larger, sometimes dizzying puzzle.

Parallelism consists of performing multiple operations at the same time. Multiprocessing is a means to effect parallelism, and it entails spreading tasks over a computer’s central processing units (CPUs, or cores). Multiprocessing is well-suited for CPU-bound tasks: tightly bound for loops and mathematical computations usually fall into this category.
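
As a rough sketch of that idea (our own illustration, not one of this tutorial's main examples), a CPU-bound computation can be spread across cores with multiprocessing.Pool:

from multiprocessing import Pool

def cpu_bound(n):
    # A tight, CPU-bound loop with no waiting on input/output
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool() as pool:
        # Each call runs in a separate process, on a separate core if available
        print(pool.map(cpu_bound, [2_000_000] * 4))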

Concurrency is a slightly broader term than parallelism. It suggests that multiple tasks have the ability to run in an overlapping manner. (There’s a saying that concurrency does not imply parallelism.)

Threading is a concurrent execution model whereby multiple threads take turns executing tasks. One process can contain multiple threads. Python has a complicated relationship with threading thanks to its GIL, but that’s beyond the scope of this article.

What’s important to know about threading is that it’s better for IO-bound tasks. While a CPU-bound task is characterized by the computer’s cores continually working hard from start to finish, an IO-bound job is dominated by a lot of waiting on input/output to complete.

To recap the above, concurrency encompasses both multiprocessing (ideal for CPU-bound tasks) and threading (suited for IO-bound tasks). Multiprocessing is a form of parallelism, with parallelism being a specific type (subset) of concurrency. The Python standard library has offered longstanding support for both of these through its multiprocessing, threading, and concurrent.futures packages.

Now it’s time to bring a new member to the mix. Over the last few years, a separate design has been more comprehensively built into CPython: asynchronous IO, enabled through the standard library’s asyncio package and the new async and await language keywords. To be clear, async IO is not a newly invented concept, and it has existed or is being built into other languages and runtime environments, such as Go, C#, or Scala.

The asyncio package is billed by the Python documentation as a library to write concurrent code. However, async IO is not threading, nor is it multiprocessing. It is not built on top of either of these.

In fact, async IO is a single-threaded, single-process design: it uses cooperative multitasking, a term that you’ll flesh out by the end of this tutorial. It has been said in other words that async IO gives a feeling of concurrency despite using a single thread in a single process. Coroutines (a central feature of async IO) can be scheduled concurrently, but they are not inherently concurrent.

To reiterate, async IO is a style of concurrent programming, but it is not parallelism. It’s more closely aligned with threading than with multiprocessing but is very much distinct from both of these and is a standalone member in concurrency’s bag of tricks.

That leaves one more term. What does it mean for something to be asynchronous? This isn’t a rigorous definition, but for our purposes here, I can think of two properties:

  • Asynchronous routines are able to “pause” while waiting on their ultimate result, and let other routines run in the meantime.
  • Asynchronous code, through the mechanism above, facilitates concurrent execution; to put it differently, it gives the look and feel of concurrency.

Here’s a diagram to put it all together. The white terms represent concepts, and the green terms represent ways in which they are implemented or effected:

Concurrency versus parallelism

I’ll stop there on the comparisons between concurrent programming models. This tutorial is focused on the subcomponent that is async IO, how to use it, and the APIs that have sprung up around it. For a thorough exploration of threading versus multiprocessing versus async IO, pause here and check out Jim Anderson’s overview of concurrency in Python. Jim is way funnier than me and has sat in more meetings than me, to boot.

Async IO Explained

Async IO may at first seem counterintuitive and paradoxical. How does something that facilitates concurrent code use a single thread and a single CPU core? I’ve never been very good at conjuring up examples, so I’d like to paraphrase one from Miguel Grinberg’s 2017 PyCon talk, which explains everything quite beautifully:

Chess master Judit Polgár hosts a chess exhibition in which she plays multiple amateur players. She has two ways of conducting the exhibition: synchronously and asynchronously.

Assumptions:

  • 24 opponents
  • Judit makes each chess move in 5 seconds
  • Opponents each take 55 seconds to make a move
  • Games average 30 pair-moves (60 moves total)

Synchronous version: Judit plays one game at a time, never two at the same time, until the game is complete. Each game takes (55 + 5) * 30 == 1800 seconds, or 30 minutes. The entire exhibition takes 24 * 30 == 720 minutes, or 12 hours.

Asynchronous version: Judit moves from table to table, making one move at each table. She leaves the table and lets the opponent make their next move during the wait time. One move on all 24 games takes Judit 24 * 5 == 120 seconds, or 2 minutes. The entire exhibition is now cut down to 120 * 30 == 3600 seconds, or just 1 hour. (Source)

There is only one Judit Polgár, who has only two hands and makes only one move at a time by herself. But playing asynchronously cuts the exhibition time down from 12 hours to one. So, cooperative multitasking is a fancy way of saying that a program’s event loop (more on that later) communicates with multiple tasks to let each take turns running at the optimal time.

Async IO takes long waiting periods in which functions would otherwise be blocking and allows other functions to run during that downtime. (A function that blocks effectively forbids others from running from the time that it starts until the time that it returns.)

Async IO Is Not Easy

I’ve heard it said, “Use async IO when you can; use threading when you must.” The truth is that building durable multithreaded code can be hard and error-prone. Async IO avoids some of the potential speedbumps that you might otherwise encounter with a threaded design.

But that’s not to say that async IO in Python is easy. Be warned: when you venture a bit below the surface level, async programming can be difficult too! Python’s async model is built around concepts such as callbacks, events, transports, protocols, and futures—just the terminology can be intimidating. The fact that its API has been changing continually makes it no easier.

Luckily, asyncio has matured to a point where most of its features are no longer provisional, while its documentation has received a huge overhaul and some quality resources on the subject are starting to emerge as well.

The asyncio Package and async/await

Now that you have some background on async IO as a design, let’s explore Python’s implementation. Python’s asyncio package (introduced in Python 3.4) and its two keywords, async and await, serve different purposes but come together to help you declare, build, execute, and manage asynchronous code.

The async/await Syntax and Native Coroutines

A Word of Caution: Be careful what you read out there on the Internet. Python’s async IO API has evolved rapidly from Python 3.4 to Python 3.7. Some old patterns are no longer used, and some things that were at first disallowed are now allowed through new introductions. For all I know, this tutorial will join the club of the outdated soon too.

At the heart of async IO are coroutines. A coroutine is a specialized version of a Python generator function. Let’s start with a baseline definition and then build off of it as you progress here: a coroutine is a function that can suspend its execution before reaching return, and it can indirectly pass control to another coroutine for some time.

Later, you’ll dive a lot deeper into how exactly the traditional generator is repurposed into a coroutine. For now, the easiest way to pick up how coroutines work is to start making some.

Let’s take the immersive approach and write some async IO code. This short program is the Hello World of async IO but goes a long way towards illustrating its core functionality:

#!/usr/bin/env python3
# countasync.py

import asyncio

async def count():
    print("One")
    await asyncio.sleep(1)
    print("Two")

async def main():
    await asyncio.gather(count(), count(), count())

if __name__ == "__main__":
    import time
    s = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")

When you execute this file, take note of what looks different than if you were to define the functions with just def and time.sleep():

$ python3 countasync.py
One
One
One
Two
Two
Two
countasync.py executed in 1.01 seconds.

The order of this output is the heart of async IO. Talking to each of the calls to count() is a single event loop, or coordinator. When each task reaches await asyncio.sleep(1), the function yells up to the event loop and gives control back to it, saying, “I’m going to be sleeping for 1 second. Go ahead and let something else meaningful be done in the meantime.”

Contrast this to the synchronous version:

#!/usr/bin/env python3
# countsync.py

import time

def count():
    print("One")
    time.sleep(1)
    print("Two")

def main():
    for _ in range(3):
        count()

if __name__ == "__main__":
    s = time.perf_counter()
    main()
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")

When executed, there is a slight but critical change in order and execution time:

$ python3 countsync.py
One
Two
One
Two
One
Two
countsync.py executed in 3.01 seconds.

While using time.sleep() and asyncio.sleep() may seem banal, they are used as stand-ins for any time-intensive processes that involve wait time. (The most mundane thing you can wait on is a sleep() call that does basically nothing.) That is, time.sleep() can represent any time-consuming blocking function call, while asyncio.sleep() is used to stand in for a non-blocking call (but one that also takes some time to complete).

As you’ll see in the next section, the benefit of awaiting something, including asyncio.sleep(), is that the surrounding function can temporarily cede control to another function that’s more readily able to do something immediately. In contrast, time.sleep() or any other blocking call is incompatible with asynchronous Python code, because it will stop everything in its tracks for the duration of the sleep time.

The Rules of Async IO

At this point, a more formal definition of async, await, and the coroutine functions that they create is in order. This section is a little dense, but getting a hold of async/await is instrumental, so come back to this if you need to:

  • The syntax async def introduces either a native coroutine or an asynchronous generator.
  • The keyword await passes function control back to the event loop, suspending the execution of the surrounding coroutine until the thing it is waiting on is ready.

In code, that second bullet point looks roughly like this:

async def g():
    # Pause here and come back to g() when f() is ready
    r = await f()
    return r

There’s also a strict set of rules around when and how you can and cannot use async/await, and these can be handy whether you are still picking up the syntax or already have exposure to it: a function introduced with async def may contain await, return, or yield, but all of these are optional; using yield inside an async def block creates an asynchronous generator; and both yield from inside an async def function and await outside of one raise a SyntaxError.

Here are some terse examples meant to summarize the above few rules:

async def f(x):
    y = await z(x)  # OK - `await` and `return` allowed in coroutines
    return y

async def g(x):
    yield x  # OK - this is an async generator

async def m(x):
    yield from gen(x)  # No - SyntaxError

def m(x):
    y = await z(x)  # Still no - SyntaxError (no `async def` here)
    return y

Finally, when you use await f(), it’s required that f() be an object that is awaitable. Well, that’s not very helpful, is it? For now, just know that an awaitable object is either (1) another coroutine or (2) an object defining an .__await__() dunder method that returns an iterator. If you’re writing a program, for the large majority of purposes, you should only need to worry about case #1.

That brings us to one more technical distinction that you may see pop up: an older way of marking a function as a coroutine is to decorate a normal def function with @asyncio.coroutine. The result is a generator-based coroutine. This construction has been outdated since the async/await syntax was put in place in Python 3.5.

These two coroutines are essentially equivalent (both are awaitable), but the first is generator-based, while the second is a native coroutine:

import asyncio

@asyncio.coroutine
def py34_coro():
    """Generator-based coroutine, older syntax"""
    yield from stuff()

async def py35_coro():
    """Native coroutine, modern syntax"""
    await stuff()

If you’re writing any code yourself, prefer native coroutines for the sake of being explicit rather than implicit. Generator-based coroutines will be removed in Python 3.10.

Towards the latter half of this tutorial, we’ll touch on generator-based coroutines for explanation’s sake only. The reason that async/await were introduced is to make coroutines a standalone feature of Python that can be easily differentiated from a normal generator function, thus reducing ambiguity.

Don’t get bogged down in generator-based coroutines, which have been deliberately outdated by async/await. They have their own small set of rules (for instance, await cannot be used in a generator-based coroutine) that are largely irrelevant if you stick to the async/await syntax.

Without further ado, let’s take on a few more involved examples.

Here’s one example of how async IO cuts down on wait time: given a coroutine makerandom() that keeps producing random integers in the range [0, 10] until one of them exceeds a threshold, you want to let multiple calls of this coroutine run concurrently rather than wait for each other to complete in succession. You can largely follow the patterns from the two scripts above, with slight changes:

#!/usr/bin/env python3
# rand.py

import asyncio
import random

# ANSI colors
c = (
    "\033[0m",   # End of color
    "\033[36m",  # Cyan
    "\033[91m",  # Red
    "\033[35m",  # Magenta
)

async def randint(a: int, b: int) -> int:
    return random.randint(a, b)

async def makerandom(idx: int, threshold: int = 6) -> int:
    print(c[idx + 1] + f"Initiated makerandom({idx}).")
    i = await randint(0, 10)
    while i <= threshold:
        print(c[idx + 1] + f"makerandom({idx}) == {i} too low; retrying.")
        await asyncio.sleep(idx + 1)
        i = await randint(0, 10)
    print(c[idx + 1] + f"---> Finished: makerandom({idx}) == {i}" + c[0])
    return i

async def main():
    res = await asyncio.gather(*(makerandom(i, 10 - i - 1) for i in range(3)))
    return res

if __name__ == "__main__":
    random.seed(444)
    r1, r2, r3 = asyncio.run(main())
    print()
    print(f"r1: {r1}, r2: {r2}, r3: {r3}")

The colorized output says a lot more than I can and gives you a sense for how this script is carried out:

rand.py program execution

This program uses one main coroutine, makerandom(), and runs it concurrently across 3 different inputs. Most programs will contain small, modular coroutines and one wrapper function that serves to chain each of the smaller coroutines together. main() is then used to gather tasks (futures) by mapping the central coroutine across some iterable or pool.

In this miniature example, the pool is range(3). In a fuller example presented later, it is a set of URLs that need to be requested, parsed, and processed concurrently, and main() encapsulates that entire routine for each URL.

While “making random integers” (which is CPU-bound more than anything) is maybe not the greatest choice as a candidate for asyncio, it’s the presence of asyncio.sleep() in the example that is designed to mimic an IO-bound process where there is uncertain wait time involved. For example, the asyncio.sleep() call might represent sending and receiving not-so-random integers between two clients in a message application.

Async IO Design Patterns

Async IO comes with its own set of possible script designs, which you’ll get introduced to in this section.

Chaining Coroutines

A key feature of coroutines is that they can be chained together. (Remember, a coroutine object is awaitable, so another coroutine can await it.) This allows you to break programs into smaller, manageable, recyclable coroutines:

#!/usr/bin/env python3
# chained.py

import asyncio
import random
import time

async def randint(a: int, b: int) -> int:
    return random.randint(a, b)

async def part1(n: int) -> str:
    i = await randint(0, 10)
    print(f"part1({n}) sleeping for {i} seconds.")
    await asyncio.sleep(i)
    result = f"result{n}-1"
    print(f"Returning part1({n}) == {result}.")
    return result

async def part2(n: int, arg: str) -> str:
    i = await randint(0, 10)
    print(f"part2{n, arg} sleeping for {i} seconds.")
    await asyncio.sleep(i)
    result = f"result{n}-2 derived from {arg}"
    print(f"Returning part2{n, arg} == {result}.")
    return result

async def chain(n: int) -> None:
    start = time.perf_counter()
    p1 = await part1(n)
    p2 = await part2(n, p1)
    end = time.perf_counter() - start
    print(f"-->Chained result{n} => {p2} (took {end:0.2f} seconds).")

async def main(*args):
    await asyncio.gather(*(chain(n) for n in args))

if __name__ == "__main__":
    import sys
    random.seed(444)
    args = [1, 2, 3] if len(sys.argv) == 1 else map(int, sys.argv[1:])
    start = time.perf_counter()
    asyncio.run(main(*args))
    end = time.perf_counter() - start
    print(f"Program finished in {end:0.2f} seconds.")

Pay careful attention to the output, where part1() sleeps for a variable amount of time, and part2() begins working with the results as they become available:

$ python3 chained.py 9 6 3
part1(9) sleeping for 4 seconds.
part1(6) sleeping for 4 seconds.
part1(3) sleeping for 0 seconds.
Returning part1(3) == result3-1.
part2(3, 'result3-1') sleeping for 4 seconds.
Returning part1(9) == result9-1.
part2(9, 'result9-1') sleeping for 7 seconds.
Returning part1(6) == result6-1.
part2(6, 'result6-1') sleeping for 4 seconds.
Returning part2(3, 'result3-1') == result3-2 derived from result3-1.
-->Chained result3 => result3-2 derived from result3-1 (took 4.00 seconds).
Returning part2(6, 'result6-1') == result6-2 derived from result6-1.
-->Chained result6 => result6-2 derived from result6-1 (took 8.01 seconds).
Returning part2(9, 'result9-1') == result9-2 derived from result9-1.
-->Chained result9 => result9-2 derived from result9-1 (took 11.01 seconds).
Program finished in 11.01 seconds.

In this setup, the runtime of main() will be equal to the maximum runtime of the tasks that it gathers together and schedules.

Using a Queue

The asyncio package provides queue classes that are designed to be similar to classes of the queue module. In our examples so far, we haven’t really had a need for a queue structure. In chained.py, each task (future) is composed of a set of coroutines that explicitly await each other and pass through a single input per chain.

There is an alternative structure that can also work with async IO: a number of producers, which are not associated with each other, add items to a queue. Each producer may add multiple items to the queue at staggered, random, unannounced times. A group of consumers pull items from the queue as they show up, greedily and without waiting for any other signal.

In this design, there is no chaining of any individual consumer to a producer. The consumers don’t know the number of producers, or even the cumulative number of items that will be added to the queue, in advance.

It takes an individual producer or consumer a variable amount of time to put and extract items from the queue, respectively. The queue serves as a conduit between the producers and consumers, so they never need to talk to each other directly.

Note: While queues are often used in threaded programs because of the thread-safety of queue.Queue(), you shouldn’t need to concern yourself with thread safety when it comes to async IO. (The exception is when you’re combining the two, but that isn’t done in this tutorial.)

One use-case for queues (as is the case here) is for the queue to act as a transmitter for producers and consumers that aren’t otherwise directly chained or associated with each other.

The synchronous version of this program would look pretty dismal: a group of blocking producers serially add items to the queue, one producer at a time. Only after all producers are done can the queue be processed, by one consumer at a time processing item-by-item. There is a ton of latency in this design. Items may sit idly in the queue rather than be picked up and processed immediately.
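
To make the contrast concrete, here is a hedged sketch of that blocking design, using the standard queue module and made-up item counts and sleep times:

import queue
import time

def produce_all(q: queue.Queue, nprod: int) -> None:
    # Each producer runs to completion before the next one even starts
    for name in range(nprod):
        for _ in range(3):
            time.sleep(1)  # Blocking "work"
            q.put(f"item-from-producer-{name}")

def consume_all(q: queue.Queue) -> None:
    # Consumption can't begin until every producer has finished
    while not q.empty():
        item = q.get()
        time.sleep(1)  # Blocking "processing"
        print(f"Processed {item}.")

q = queue.Queue()
produce_all(q, nprod=2)
consume_all(q)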

An asynchronous version, asyncq.py, is below. The challenging part of this workflow is that there needs to be a signal to the consumers that production is done. Otherwise, await q.get() will hang indefinitely, because the queue will have been fully processed, but consumers won’t have any idea that production is complete.

(Big thanks to a StackOverflow user for helping to straighten out main(): the key is to await q.join(), which blocks until all items in the queue have been received and processed, and then to cancel the consumer tasks, which would otherwise hang up and wait endlessly for additional queue items to appear.)

Here is the full script:

#!/usr/bin/env python3
# asyncq.py

import asyncio
import itertools as it
import os
import random
import time

async def makeitem(size: int = 5) -> str:
    return os.urandom(size).hex()

async def randint(a: int, b: int) -> int:
    return random.randint(a, b)

async def randsleep(a: int = 1, b: int = 5, caller=None) -> None:
    i = await randint(a, b)
    if caller:
        print(f"{caller} sleeping for {i} seconds.")
    await asyncio.sleep(i)

async def produce(name: int, q: asyncio.Queue) -> None:
    n = await randint(1, 5)
    for _ in it.repeat(None, n):  # Synchronous loop for each single producer
        await randsleep(caller=f"Producer {name}")
        i = await makeitem()
        t = time.perf_counter()
        await q.put((i, t))
        print(f"Producer {name} added <{i}> to queue.")

async def consume(name: int, q: asyncio.Queue) -> None:
    while True:
        await randsleep(caller=f"Consumer {name}")
        i, t = await q.get()
        now = time.perf_counter()
        print(f"Consumer {name} got element <{i}>"
              f" in {now-t:0.5f} seconds.")
        q.task_done()

async def main(nprod: int, ncon: int):
    q = asyncio.Queue()
    producers = [asyncio.create_task(produce(n, q)) for n in range(nprod)]
    consumers = [asyncio.create_task(consume(n, q)) for n in range(ncon)]
    await asyncio.gather(*producers)
    await q.join()  # Implicitly awaits consumers, too
    for c in consumers:
        c.cancel()

if __name__ == "__main__":
    import argparse
    random.seed(444)
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--nprod", type=int, default=5)
    parser.add_argument("-c", "--ncon", type=int, default=10)
    ns = parser.parse_args()
    start = time.perf_counter()
    asyncio.run(main(**ns.__dict__))
    elapsed = time.perf_counter() - start
    print(f"Program completed in {elapsed:0.5f} seconds.")

The first few coroutines are helper functions: makeitem() returns a random hex string, randint() returns a random integer, and randsleep() sleeps for a random number of seconds. A producer puts anywhere from 1 to 5 items into the queue. Each item is a tuple of (i, t) where i is a random string and t is the time at which the producer attempts to put the tuple into the queue.

When a consumer pulls an item out, it simply calculates the elapsed time that the item sat in the queue using the timestamp that the item was put in with.

Keep in mind that asyncio.sleep() is used to mimic some other, more complex coroutine that would eat up time and block all other execution if it were a regular blocking function.

Here is a test run with two producers and five consumers:

$ python3 asyncq.py -p 2 -c 5
Producer 0 sleeping for 3 seconds.
Producer 1 sleeping for 3 seconds.
Consumer 0 sleeping for 4 seconds.
Consumer 1 sleeping for 3 seconds.
Consumer 2 sleeping for 3 seconds.
Consumer 3 sleeping for 5 seconds.
Consumer 4 sleeping for 4 seconds.
Producer 0 added <377b1e8f82> to queue.
Producer 0 sleeping for 5 seconds.
Producer 1 added <413b8802f8> to queue.
Consumer 1 got element <377b1e8f82> in 0.00013 seconds.
Consumer 1 sleeping for 3 seconds.
Consumer 2 got element <413b8802f8> in 0.00009 seconds.
Consumer 2 sleeping for 4 seconds.
Producer 0 added <06c055b3ab> to queue.
Producer 0 sleeping for 1 seconds.
Consumer 0 got element <06c055b3ab> in 0.00021 seconds.
Consumer 0 sleeping for 4 seconds.
Producer 0 added <17a8613276> to queue.
Consumer 4 got element <17a8613276> in 0.00022 seconds.
Consumer 4 sleeping for 5 seconds.
Program completed in 9.00954 seconds.

In this case, the items process in fractions of a second. A delay can be due to two reasons:

  1. Standard, largely unavoidable overhead

  2. Situations where all consumers are sleeping when an item appears in the queue

With regards to the second reason, luckily, it is perfectly normal to scale to hundreds or thousands of consumers. You should have no problem with python3 asyncq.py -p 5 -c 100. The point here is that, theoretically, you could have different users on different systems controlling the management of producers and consumers, with the queue serving as the central throughput.

So far, you’ve been thrown right into the fire and seen three related examples of asyncio calling coroutines defined with async and await. If you’re not completely following or just want to get deeper into the mechanics of how modern coroutines came to be in Python, you’ll start from square one with the next section.

Async IO’s Roots in Generators

Earlier, you saw an example of the old-style generator-based coroutines, which have been outdated by more explicit native coroutines. The example is worth re-showing with a small tweak:

import asyncio

@asyncio.coroutine
def py34_coro():
    """Generator-based coroutine"""
    # No need to build these yourself, but be aware of what they are
    s = yield from stuff()
    return s

async def py35_coro():
    """Native coroutine, modern syntax"""
    s = await stuff()
    return s

async def stuff():
    return 0x10, 0x20, 0x30

As an experiment, what happens if you call py34_coro() or py35_coro() on its own, without await, or without any calls to asyncio.run() or other asyncio “porcelain” functions? Calling a coroutine in isolation returns a coroutine object:

>>>
>>> py35_coro()
<coroutine object py35_coro at 0x10126dcc8>

This isn’t very interesting on its surface. The result of calling a coroutine on its own is an awaitable coroutine object.

Time for a quiz: what other feature of Python looks like this? (What feature of Python doesn’t actually “do much” when it’s called on its own?)

Hopefully you’re thinking of generators as an answer to this question, because coroutines are enhanced generators under the hood. The behavior is similar in this regard:

>>>
>>> def gen():
...     yield 0x10, 0x20, 0x30
...
>>> g = gen()
>>> g  # Nothing much happens - need to iterate with `.__next__()`
<generator object gen at 0x1012705e8>
>>> next(g)
(16, 32, 48)

Generator functions are, as it so happens, the foundation of async IO (regardless of whether you declare coroutines with async def rather than the older @asyncio.coroutine wrapper). Technically, await is more closely analogous to yield from than it is to yield. (But remember that yield from x() is just syntactic sugar to replace for i in x(): yield i.)

One critical feature of generators as it pertains to async IO is that they can effectively be stopped and restarted at will. For example, you can break out of iterating over a generator object and then resume iteration on the remaining values later. When a generator function reaches yield, it yields that value, but then it sits idle until it is told to yield its subsequent value.

This can be fleshed out through an example:

>>>
>>> from itertools import cycle
>>> def endless():
...     """Yields 9, 8, 7, 6, 9, 8, 7, 6, ... forever"""
...     yield from cycle((9, 8, 7, 6))

>>> e = endless()
>>> total = 0
>>> for i in e:
...     if total < 100:
...         print(i, end=" ")
...         total += i
...     else:
...         print()
...         # Pause execution. We can resume later.
...         break
9 8 7 6 9 8 7 6 9 8 7 6 9 8

>>> # Resume
>>> next(e), next(e), next(e)
(6, 9, 8)

The await keyword behaves similarly, marking a break point at which the coroutine suspends itself and lets other coroutines work. “Suspended,” in this case, means a coroutine that has temporarily ceded control but not totally exited or finished. Keep in mind that yield, and by extension yield from and await, mark a break point in a generator’s execution.

This is the fundamental difference between functions and generators. A function is all-or-nothing. Once it starts, it won’t stop until it hits a return, then pushes that value to the caller (the function that calls it). A generator, on the other hand, pauses each time it hits a yield and goes no further. Not only can it push this value to the calling stack, but it can keep a hold of its local variables when you resume it by calling next() on it.

There’s a second and lesser-known feature of generators that also matters. You can send a value into a generator as well through its .send() method. This allows generators (and coroutines) to call (await) each other without blocking. I won’t get any further into the nuts and bolts of this feature, because it matters mainly for the implementation of coroutines behind the scenes, and you shouldn’t ever really need to use it directly yourself.
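
Still, here is a bare-bones look at the mechanics of .send(), just so the idea isn’t totally abstract:

>>>
>>> def echo():
...     while True:
...         received = yield  # Pause until a value is sent in
...         print(f"Got: {received}")
...
>>> e = echo()
>>> next(e)  # Prime the generator by advancing to the first yield
>>> e.send("hello")
Got: hello
>>> e.send(42)
Got: 42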

If you’re interested in exploring more, you can start at PEP 342, where coroutines were formally introduced. Brett Cannon’s How the Heck Does Async-Await Work in Python is also a good read, as is the PYMOTW writeup on asyncio. Lastly, there’s David Beazley’s Curious Course on Coroutines and Concurrency, which dives deep into the mechanism by which coroutines run.

Let’s try to condense all of the above articles into a few sentences: there is a particularly unconventional mechanism by which these coroutines actually get run. Their result is an attribute of the exception object that gets thrown when their .send() method is called. There’s some more wonky detail to all of this, but it probably won’t help you use this part of the language in practice, so let’s move on for now.

To tie things together, here are some key points on the topic of coroutines as generators:

- Coroutines are repurposed generators that take advantage of the peculiarities of generator methods.
- Old generator-based coroutines use yield from to wait on a coroutine result. Modern syntax in native coroutines simply replaces yield from with await. It often helps to think of await as analogous to yield from.
- The use of await is a signal that marks a break point. It lets a coroutine temporarily suspend execution and permits the program to come back to it later.

Other Features: async for and Async Generators + Comprehensions

Along with plain async/await, Python also enables async for to iterate over an asynchronous iterator. The purpose of an asynchronous iterator is to be able to call asynchronous code at each stage of iteration.

A natural extension of this concept is an asynchronous generator. Recall that you can use await, return, or yield in a native coroutine. Using yield within a coroutine became possible in Python 3.6 (via PEP 525), which introduced asynchronous generators with the purpose of allowing await and yield to be used in the same coroutine function body:

>>>
>>> async def mygen(u: int = 10):
...     """Yield powers of 2."""
...     i = 0
...     while i < u:
...         yield 2 ** i
...         i += 1
...         await asyncio.sleep(0.1)

Last but not least, Python enables asynchronous comprehension with async for. Like its synchronous cousin, this is largely syntactic sugar:

>>>
>>> async def main():
...     # This does *not* introduce concurrent execution
...     # It is meant to show syntax only
...     g = [i async for i in mygen()]
...     f = [j async for j in mygen() if not (j // 3 % 5)]
...     return g, f
...
>>> g, f = asyncio.run(main())
>>> g
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
>>> f
[1, 2, 16, 32, 256, 512]

This is a crucial distinction: neither asynchronous generators nor comprehensions make the iteration concurrent. All that they do is provide the look-and-feel of their synchronous counterparts, but with the ability for the loop in question to give up control to the event loop for some other coroutine to run.

In other words, asynchronous iterators and asynchronous generators are not designed to concurrently map some function over a sequence or iterator. They’re merely designed to let the enclosing coroutine allow other tasks to take their turn. The async for and async with statements are only needed to the extent that using plain for or with would “break” the nature of await in the coroutine. This distinction between asynchronicity and concurrency is a key one to grasp.
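
To make the protocol concrete, here is a minimal sketch of a hand-rolled asynchronous iterator. In practice, an async generator like mygen() above usually covers this case with less code:

import asyncio

class Ticker:
    """Asynchronously yield 0, 1, ..., n - 1, ceding control between steps."""

    def __init__(self, n: int) -> None:
        self.i = 0
        self.n = n

    def __aiter__(self):
        return self

    async def __anext__(self) -> int:
        if self.i >= self.n:
            raise StopAsyncIteration
        await asyncio.sleep(0.1)  # Let other coroutines run in the meantime
        self.i += 1
        return self.i - 1

async def main():
    async for tick in Ticker(3):
        print(tick)

asyncio.run(main())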

The Event Loop and asyncio.run()

You can think of an event loop as something like a while True loop that monitors coroutines, takes feedback on what’s idle, and looks around for things that can be executed in the meantime. It is able to wake up an idle coroutine when whatever that coroutine is waiting on becomes available.

Thus far, the entire management of the event loop has been implicitly handled by one function call:

asyncio.run(main())  # Python 3.7+

asyncio.run(), introduced in Python 3.7, is responsible for getting the event loop, running tasks until they are marked as complete, and then closing the event loop.

There’s a more long-winded way of managing the asyncio event loop, with get_event_loop(). The typical pattern looks like this:

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

You’ll probably see asyncio.get_event_loop() floating around in older examples, but unless you have a specific need to fine-tune control over the event loop management, asyncio.run() should be sufficient for most programs.

If you do need to interact with the event loop within a Python program, loop is a good-old-fashioned Python object that supports introspection with loop.is_running() and loop.is_closed(). You can manipulate it if you need to get more fine-tuned control, such as in scheduling a callback by passing the loop as an argument.
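
As a small hedged sketch of that kind of interaction:

import asyncio

async def main():
    loop = asyncio.get_running_loop()  # Python 3.7+
    print("Running:", loop.is_running())  # True while inside the loop
    loop.call_soon(print, "Callback executed")  # Schedule a plain callback
    await asyncio.sleep(0)  # Yield control so the callback can run

asyncio.run(main())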

What is more crucial is understanding a bit beneath the surface about the mechanics of the event loop. Here are a few points worth stressing about the event loop.

#1: Coroutines don’t do much on their own until they are tied to the event loop.

You saw this point before in the explanation on generators, but it’s worth restating. If you have a main coroutine that awaits others, simply calling it in isolation has little effect:

>>>
>>> import asyncio

>>> async def main():
...     print("Hello ...")
...     await asyncio.sleep(1)
...     print("World!")

>>> routine = main()
>>> routine
<coroutine object main at 0x1027a6150>

Remember to use asyncio.run() to actually force execution by scheduling the main() coroutine (future object) for execution on the event loop:

>>>
>>> asyncio.run(routine)
Hello ...
World!

(Other coroutines can be executed with await. It is typical to wrap just main() in asyncio.run(), and chained coroutines with await will be called from there.)

#2: By default, an async IO event loop runs in a single thread and on a single CPU core. Usually, running one single-threaded event loop in one CPU core is more than sufficient. It is also possible to run event loops across multiple cores. Check out this talk by John Reese for more, and be warned that your laptop may spontaneously combust.

#3: Event loops are pluggable. That is, you could, if you really wanted, write your own event loop implementation and have it run tasks just the same. This is wonderfully demonstrated in the uvloop package, which is an implementation of the event loop in Cython.

That is what is meant by the term “pluggable event loop”: you can use any working implementation of an event loop, unrelated to the structure of the coroutines themselves. The asyncio package itself ships with two different event loop implementations, with the default being based on the selectors module. (The second implementation is built for Windows only.)
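
Swapping in uvloop, for example, is typically just a matter of setting the event loop policy before anything runs (assuming you’ve installed it with pip install uvloop):

import asyncio
import uvloop

# Tell asyncio to build its event loops with uvloop's implementation
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

# From here on, asyncio.run() and friends use uvloop under the hood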

A Full Program: Asynchronous Requests

You’ve made it this far, and now it’s time for the fun and painless part. In this section, you’ll build a web-scraping URL collector, areq.py, using aiohttp, a blazingly fast async HTTP client/server framework. (We just need the client part.) Such a tool could be used to map connections between a cluster of sites, with the links forming a directed graph.

Note: You may be wondering why Python’s requests package isn’t compatible with async IO. requests is built on top of urllib3, which in turn uses Python’s http and socket modules.

By default, socket operations are blocking. This means that Python won’t like await requests.get(url) because .get() is not awaitable. In contrast, almost everything in aiohttp is an awaitable coroutine, such as session.request() and response.text(). It’s a great package otherwise, but you’re doing yourself a disservice by using requests in asynchronous code.

The high-level program structure will look like this:

  1. Read a sequence of URLs from a local file, urls.txt.

  2. Send GET requests for the URLs and decode the resulting content. If this fails, stop there for a URL.

  3. Search for the URLs within href tags in the HTML of the responses.

  4. Write the results to foundurls.txt.

  5. Do all of the above as asynchronously and concurrently as possible. (Use aiohttp for the requests, and aiofiles for the file-appends. These are two primary examples of IO that are well-suited for the async IO model.)

Here are the contents of urls.txt. It’s not huge, and contains mostly highly trafficked sites:

$ cat urls.txt
https://regex101.com/
https://docs.python.org/3/this-url-will-404.html
https://www.nytimes.com/guides/
https://www.mediamatters.org/
https://1.1.1.1/
https://www.politico.com/tipsheets/morning-money
https://www.bloomberg.com/markets/economics
https://www.ietf.org/rfc/rfc2616.txt

The second URL in the list should return a 404 response, which you’ll need to handle gracefully. If you’re running an expanded version of this program, you’ll probably need to deal with much hairier problems than this, such as server disconnections and endless redirects.

The requests themselves should be made using a single session, to take advantage of the session’s internal connection pool being reused.

Let’s take a look at the full program. We’ll walk through things step-by-step after:

#!/usr/bin/env python3
# areq.py

"""Asynchronously get links embedded in multiple pages' HMTL."""

import asyncio
import logging
import re
import sys
from typing import IO
import urllib.error
import urllib.parse

import aiofiles
import aiohttp
from aiohttp import ClientSession

logging.basicConfig(
    format="%(asctime)s %(levelname)s:%(name)s: %(message)s",
    level=logging.DEBUG,
    datefmt="%H:%M:%S",
    stream=sys.stderr,
)
logger = logging.getLogger("areq")
logging.getLogger("chardet.charsetprober").disabled = True

HREF_RE = re.compile(r'href="(.*?)"')

async def fetch_html(url: str, session: ClientSession, **kwargs) -> str:
    """GET request wrapper to fetch page HTML.

    kwargs are passed to `session.request()`.
    """

    resp = await session.request(method="GET", url=url, **kwargs)
    resp.raise_for_status()
    logger.info("Got response [%s] for URL: %s", resp.status, url)
    html = await resp.text()
    return html

async def parse(url: str, session: ClientSession, **kwargs) -> set:
    """Find HREFs in the HTML of `url`."""
    found = set()
    try:
        html = await fetch_html(url=url, session=session, **kwargs)
    except (
        aiohttp.ClientError,
        aiohttp.http_exceptions.HttpProcessingError,
    ) as e:
        logger.error(
            "aiohttp exception for %s [%s]: %s",
            url,
            getattr(e, "status", None),
            getattr(e, "message", None),
        )
        return found
    except Exception as e:
        logger.exception(
            "Non-aiohttp exception occured:  %s", getattr(e, "__dict__", {})
        )
        return found
    else:
        for link in HREF_RE.findall(html):
            try:
                abslink = urllib.parse.urljoin(url, link)
            except (urllib.error.URLError, ValueError):
                logger.exception("Error parsing URL: %s", link)
                pass
            else:
                found.add(abslink)
        logger.info("Found %d links for %s", len(found), url)
        return found

async def write_one(file: IO, url: str, **kwargs) -> None:
    """Write the found HREFs from `url` to `file`."""
    res = await parse(url=url, **kwargs)
    if not res:
        return None
    async with aiofiles.open(file, "a") as f:
        for p in res:
            await f.write(f"{url}\t{p}\n")
        logger.info("Wrote results for source URL: %s", url)

async def bulk_crawl_and_write(file: IO, urls: set, **kwargs) -> None:
    """Crawl & write concurrently to `file` for multiple `urls`."""
    async with ClientSession() as session:
        tasks = []
        for url in urls:
            tasks.append(
                write_one(file=file, url=url, session=session, **kwargs)
            )
        await asyncio.gather(*tasks)

if __name__ == "__main__":
    import pathlib
    import sys

    assert sys.version_info >= (3, 7), "Script requires Python 3.7+."
    here = pathlib.Path(__file__).parent

    with open(here.joinpath("urls.txt")) as infile:
        urls = set(map(str.strip, infile))

    outpath = here.joinpath("foundurls.txt")
    with open(outpath, "w") as outfile:
        outfile.write("source_url\tparsed_url\n")

    asyncio.run(bulk_crawl_and_write(file=outpath, urls=urls))

This script is longer than our initial toy programs, so let’s break it down.

The constant HREF_RE is a regular expression to extract what we’re ultimately searching for, href tags within HTML:

>>>
>>> HREF_RE.search('Go to <a href="https://realpython.com/">Real Python</a>')
<re.Match object; span=(15, 45), match='href="https://realpython.com/"'>

The coroutine fetch_html() is a wrapper around a GET request to make the request and decode the resulting page HTML. It makes the request, awaits the response, and raises right away in the case of a non-200 status:

resp = await session.request(method="GET", url=url, **kwargs)
resp.raise_for_status()

If the status is okay, fetch_html() returns the page HTML (a str). Notably, there is no exception handling done in this function. The logic is to propagate that exception to the caller and let it be handled there:

html = await resp.text()

We await session.request() and resp.text() because they’re awaitable coroutines. The request/response cycle would otherwise be the long-tailed, time-hogging portion of the application, but with async IO, fetch_html() lets the event loop work on other readily available jobs such as parsing and writing URLs that have already been fetched.

Next in the chain of coroutines comes parse(), which waits on fetch_html() for a given URL, and then extracts all of the href tags from that page’s HTML, making sure that each is valid and formatting it as an absolute path.

Admittedly, the second portion of parse() is blocking, but it consists of a quick regex match and ensuring that the links discovered are made into absolute paths.

In this specific case, this synchronous code should be quick and inconspicuous. But just remember that any line within a given coroutine will block other coroutines unless that line uses yield, await, or return. If the parsing was a more intensive process, you might want to consider running this portion in its own process with loop.run_in_executor().
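
If the parsing ever did become the bottleneck, a sketch of that idea might look like the following, with a module-level HREF_RE as in areq.py and a hypothetical blocking parse_links() helper (in real code you’d likely create the pool once rather than per call):

import asyncio
import re
from concurrent.futures import ProcessPoolExecutor

HREF_RE = re.compile(r'href="(.*?)"')

def parse_links(html: str) -> set:
    # Hypothetical CPU-heavy, fully synchronous parsing step
    return set(HREF_RE.findall(html))

async def parse_offloaded(html: str) -> set:
    loop = asyncio.get_running_loop()
    # Hand the blocking work to another process so that the
    # event loop stays free to service other coroutines
    with ProcessPoolExecutor() as pool:
        return await loop.run_in_executor(pool, parse_links, html)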

Next, the coroutine write_one() takes a file object and a single URL, and waits on parse() to return a set of the parsed URLs, writing each to the file asynchronously along with its source URL through use of aiofiles, a package for async file IO.

Lastly, bulk_crawl_and_write() serves as the main entry point into the script’s chain of coroutines. It uses a single session, and a task is created for each URL that is ultimately read from urls.txt.

A few additional points deserve mention here; if you’d like to explore them, the companion files for this tutorial up at GitHub have comments and docstrings attached as well.

Here’s the execution in all of its glory, as areq.py gets, parses, and saves results for 8 URLs in under a second:

$ python3 areq.py
21:33:22 DEBUG:asyncio: Using selector: KqueueSelector
21:33:22 INFO:areq: Got response [200] for URL: https://www.mediamatters.org/
21:33:22 INFO:areq: Found 115 links for https://www.mediamatters.org/
21:33:22 INFO:areq: Got response [200] for URL: https://www.nytimes.com/guides/
21:33:22 INFO:areq: Got response [200] for URL: https://www.politico.com/tipsheets/morning-money
21:33:22 INFO:areq: Got response [200] for URL: https://www.ietf.org/rfc/rfc2616.txt
21:33:22 ERROR:areq: aiohttp exception for https://docs.python.org/3/this-url-will-404.html [404]: Not Found
21:33:22 INFO:areq: Found 120 links for https://www.nytimes.com/guides/
21:33:22 INFO:areq: Found 143 links for https://www.politico.com/tipsheets/morning-money
21:33:22 INFO:areq: Wrote results for source URL: https://www.mediamatters.org/
21:33:22 INFO:areq: Found 0 links for https://www.ietf.org/rfc/rfc2616.txt
21:33:22 INFO:areq: Got response [200] for URL: https://1.1.1.1/
21:33:22 INFO:areq: Wrote results for source URL: https://www.nytimes.com/guides/
21:33:22 INFO:areq: Wrote results for source URL: https://www.politico.com/tipsheets/morning-money
21:33:22 INFO:areq: Got response [200] for URL: https://www.bloomberg.com/markets/economics
21:33:22 INFO:areq: Found 3 links for https://www.bloomberg.com/markets/economics
21:33:22 INFO:areq: Wrote results for source URL: https://www.bloomberg.com/markets/economics
21:33:23 INFO:areq: Found 36 links for https://1.1.1.1/
21:33:23 INFO:areq: Got response [200] for URL: https://regex101.com/
21:33:23 INFO:areq: Found 23 links for https://regex101.com/
21:33:23 INFO:areq: Wrote results for source URL: https://regex101.com/
21:33:23 INFO:areq: Wrote results for source URL: https://1.1.1.1/

That’s not too shabby! As a sanity check, you can check the line-count on the output. In my case, it’s 626, though keep in mind this may fluctuate:

$ wc -l foundurls.txt
     626 foundurls.txt

$ head -n 3 foundurls.txt
source_url  parsed_url
https://www.bloomberg.com/markets/economics https://www.bloomberg.com/feedback
https://www.bloomberg.com/markets/economics https://www.bloomberg.com/notices/tos

Next Steps: If you’d like to up the ante, make this webcrawler recursive. You can use aio-redis to keep track of which URLs have been crawled within the tree to avoid requesting them twice, and connect links with Python’s networkx library.

Remember to be nice. Sending 1000 concurrent requests to a small, unsuspecting website is bad, bad, bad. There are ways to limit how many concurrent requests you’re making in one batch, such as using the semaphore objects of asyncio or using a pattern like this one. If you don’t heed this warning, you may get a massive batch of TimeoutError exceptions and only end up hurting your own program.
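
As a hedged sketch (not the pattern linked above), asyncio.Semaphore can cap the number of in-flight requests like this:

import asyncio
from aiohttp import ClientSession

async def fetch_throttled(url: str, session: ClientSession,
                          sem: asyncio.Semaphore) -> str:
    async with sem:  # Wait here whenever the cap is already reached
        resp = await session.request(method="GET", url=url)
        return await resp.text()

async def main(urls):
    sem = asyncio.Semaphore(10)  # At most 10 requests in flight at once
    async with ClientSession() as session:
        return await asyncio.gather(
            *(fetch_throttled(u, session, sem) for u in urls)
        )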

Async IO in Context

Now that you’ve seen a healthy dose of code, let’s step back for a minute and consider when async IO is an ideal option and how you can make the comparison to arrive at that conclusion or otherwise choose a different model of concurrency.

When and Why Is Async IO the Right Choice?

This tutorial is no place for an extended treatise on async IO versus threading versus multiprocessing. However, it’s useful to have an idea of when async IO is probably the best candidate of the three.

The battle over async IO versus multiprocessing is not really a battle at all. In fact, they can be used in concert. If you have multiple, fairly uniform CPU-bound tasks (a great example is a grid search in libraries such as scikit-learn or keras), multiprocessing should be an obvious choice.

Simply putting async before every function is a bad idea if all of the functions use blocking calls. (This can actually slow down your code.) But as mentioned previously, there are places where async IO and multiprocessing can live in harmony.

The contest between async IO and threading is a little bit more direct. I mentioned in the introduction that “threading is hard.” The full story is that, even in cases where threading seems easy to implement, it can still lead to infamous impossible-to-trace bugs due to race conditions and memory usage, among other things.

Threading also tends to scale less elegantly than async IO, because threads are a system resource with a finite availability. Creating thousands of threads will fail on many machines, and I don’t recommend trying it in the first place. Creating thousands of async IO tasks is completely feasible.

Async IO shines when you have multiple IO-bound tasks where the tasks would otherwise be dominated by blocking IO-bound wait time, such as:

- Network IO, whether your program is the server or the client side
- Serverless designs, such as a peer-to-peer, multi-user network like a group chatroom
- Read/write operations where you want to mimic a “fire-and-forget” style but worry less about holding a lock on whatever you’re reading and writing to

The biggest reason not to use it is that await only supports a specific set of objects that define a specific set of methods. If you want to do async read operations with a certain DBMS, you’ll need to find not just a Python wrapper for that DBMS, but one that supports the async/await syntax. Coroutines that contain synchronous calls block other coroutines and tasks from running.

For a shortlist of libraries that work with async/await, see the list at the end of this tutorial.

Async IO It Is, but Which One?

This tutorial focuses on async IO, the async/await syntax, and using asyncio for event-loop management and specifying tasks. asyncio certainly isn’t the only async IO library out there. This observation from Nathaniel J. Smith says a lot:

[In] a few years, asyncio might find itself relegated to becoming one of those stdlib libraries that savvy developers avoid, like urllib2.

What I’m arguing, in effect, is that asyncio is a victim of its own success: when it was designed, it used the best approach possible; but since then, work inspired by asyncio – like the addition of async/await – has shifted the landscape so that we can do even better, and now asyncio is hamstrung by its earlier commitments. (Source)

To that end, a few big-name alternatives that do what asyncio does, albeit with different APIs and different approaches, are curio and trio. Personally, I think that if you’re building a moderately sized, straightforward program, just using asyncio is plenty sufficient and understandable, and lets you avoid adding yet another large dependency outside of Python’s standard library.

But by all means, check out curio and trio, and you might find that they get the same thing done in a way that’s more intuitive for you as the user. Many of the package-agnostic concepts presented here should permeate to alternative async IO packages as well.

Odds and Ends

In these next few sections, you’ll cover some miscellaneous parts of asyncio and async/await that haven’t fit neatly into the tutorial thus far, but are still important for building and understanding a full program.

Other Top-Level asyncio Functions

In addition to asyncio.run(), you’ve seen a few other package-level functions such as asyncio.create_task() and asyncio.gather().

You can use create_task() to schedule the execution of a coroutine object, followed by asyncio.run():

>>>
>>> import asyncio

>>> async def coro(seq) -> list:
...     """'IO' wait time is proportional to the max element."""
...     await asyncio.sleep(max(seq))
...     return list(reversed(seq))
...
>>> async def main():
...     # This is a bit redundant in the case of one task
...     # We could use `await coro([3, 2, 1])` on its own
...     t = asyncio.create_task(coro([3, 2, 1]))  # Python 3.7+
...     await t
...     print(f't: type {type(t)}')
...     print(f't done: {t.done()}')
...
>>> t = asyncio.run(main())
t: type <class '_asyncio.Task'>
t done: True

There’s a subtlety to this pattern: if you don’t await t within main(), it may finish before main() itself signals that it is complete. Because asyncio.run(main()) calls loop.run_until_complete(main()), the event loop is only concerned (without await t present) that main() is done, not that the tasks that get created within main() are done. Without await t, the loop’s other tasks will be cancelled, possibly before they are completed. If you need to get a list of currently pending tasks, you can use asyncio.Task.all_tasks().

Note: asyncio.create_task() was introduced in Python 3.7. In Python 3.6 or lower, use asyncio.ensure_future() in place of create_task().

Separately, there’s asyncio.gather(). While it doesn’t do anything tremendously special, gather() is meant to neatly put a collection of coroutines (futures) into a single future. As a result, it returns a single future object, and, if you await asyncio.gather() and specify multiple tasks or coroutines, you’re waiting for all of them to be completed. (This somewhat parallels queue.join() from our earlier example.) The result of gather() will be a list of the results across the inputs:

>>>
>>> import time
>>> async def main():
...     t = asyncio.create_task(coro([3, 2, 1]))
...     t2 = asyncio.create_task(coro([10, 5, 0]))  # Python 3.7+
...     print('Start:', time.strftime('%X'))
...     a = await asyncio.gather(t, t2)
...     print('End:', time.strftime('%X'))  # Should be 10 seconds
...     print(f'Both tasks done: {all((t.done(), t2.done()))}')
...     return a
...
>>> a = asyncio.run(main())
Start: 16:20:11
End: 16:20:21
Both tasks done: True
>>> a
[[1, 2, 3], [0, 5, 10]]

You probably noticed that gather() waits on the entire result set of the Futures or coroutines that you pass it. Alternatively, you can loop over asyncio.as_completed() to get tasks as they are completed, in the order of completion. The function returns an iterator that yields tasks as they finish. Below, the result of coro([3, 2, 1]) will be available before coro([10, 5, 0]) is complete, which is not the case with gather():

>>>
>>> async def main():
...     t = asyncio.create_task(coro([3, 2, 1]))
...     t2 = asyncio.create_task(coro([10, 5, 0]))
...     print('Start:', time.strftime('%X'))
...     for res in asyncio.as_completed((t, t2)):
...         compl = await res
...         print(f'res: {compl} completed at {time.strftime("%X")}')
...     print('End:', time.strftime('%X'))
...     print(f'Both tasks done: {all((t.done(), t2.done()))}')
...
>>> a = asyncio.run(main())
Start: 09:49:07
res: [1, 2, 3] completed at 09:49:10
res: [0, 5, 10] completed at 09:49:17
End: 09:49:17
Both tasks done: True

Lastly, you may also see asyncio.ensure_future(). You should rarely need it, because it’s a lower-level plumbing API and largely replaced by create_task(), which was introduced later.

The Precedence of await

While they behave somewhat similarly, the await keyword has significantly higher precedence than yield. This means that, because it is more tightly bound, there are a number of instances where you’d need parentheses in a yield from statement that are not required in an analogous await statement. For more information, see examples of await expressions from PEP 492.
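
A contrived sketch makes the point; the generator-based analogue of the last line would need parentheses, as noted in the comment:

async def double(n: int) -> int:
    return n * 2

async def demo() -> int:
    # Parsed as (await double(2)) + 1 == 5; no parentheses needed.
    # A generator-based version would require them:
    #     result = (yield from double_gen(2)) + 1  # double_gen is hypothetical
    return await double(2) + 1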

Conclusion

You’re now equipped to use async/await and the libraries built off of it. Here’s a recap of what you’ve covered:

- Async IO as a language-agnostic model and a way to effect concurrency by letting coroutines indirectly communicate with each other
- The specifics of Python’s async and await keywords, used to mark and define coroutines
- asyncio, the Python package that provides the API to run and manage coroutines

Resources

Python Version Specifics

Async IO in Python has evolved swiftly, and it can be hard to keep track of what came when. Here’s a list of Python minor-version changes and introductions related to asyncio:

- 3.3: The yield from expression allows for generator delegation
- 3.4: asyncio was introduced in the standard library with a provisional API
- 3.5: async and await became part of the Python grammar, used to signify and wait on coroutines
- 3.6: Asynchronous generators and asynchronous comprehensions were introduced (PEP 525 and PEP 530)
- 3.7: async and await became reserved keywords, and asyncio.run() and asyncio.create_task() were introduced

If you want to be safe (and be able to use asyncio.run()), go with Python 3.7 or above to get the full set of features.

Articles

Here’s a curated list of additional resources, all of which came up earlier in this tutorial:

- PEP 342, where coroutines were formally introduced
- Brett Cannon’s How the Heck Does Async-Await Work in Python
- The PYMOTW writeup on asyncio

A few Python What’s New sections explain the motivation behind language changes in more detail:

- What’s New in Python 3.3 (the yield from expression)
- What’s New in Python 3.5 (the async/await syntax, PEP 492)
- What’s New in Python 3.6 (asynchronous generators, PEP 525)

From David Beazley:

- Curious Course on Coroutines and Concurrency, which dives deep into the mechanism by which coroutines run

YouTube talks:

- John Reese’s talk on running event loops across multiple cores, mentioned earlier in this tutorial

Libraries That Work With async/await

From aio-libs:

- aiohttp: the asynchronous HTTP client/server framework used in areq.py above
- aioredis: async IO Redis support

From magicstack:

- uvloop: the fast, Cython-based event loop implementation mentioned earlier
- asyncpg: a fast async IO PostgreSQL client

From other hosts:

- trio and curio: the alternative async libraries discussed above
- aiofiles: the async file IO package used in areq.py



January 16, 2019 02:00 PM UTC


Robin Wilson

Bid for a day’s work from me

Summary: I will do a day’s work for the highest bidder in this auction. This could mean you get a day’s work from me very cheaply. Please read all of this post carefully, and then submit your bid here before 5th Feb.

This experiment is based very heavily on David MacIver’s experiment in auctioning off a day’s work (see his blog posts introducing it, and summarising the results). It seemed to work fairly well for him, and I am interested to see how it will work for me.

So, if you win this auction, I will do one day (8 hours) of work for you, on a project of your choosing. If you’ve been following this blog then you’ll have a reasonable idea of what sort of things I can do – I am happy to do most things, although I will let you know if I think that I do not have the required expertise to do what you are requesting.

Rules

  1. The bid is only for me to work for 8 hours, so I strongly suggest either a short self-contained project, or something that can be stopped at any point and still be useful. If you want me to continue working past 8 hours then I would be happy to negotiate some further work – but this would be entirely outside of the bidding process.
  2. The 8 hours work will likely be split over multiple days: due to my health I find working for 8 hours straight to be very difficult, so I will probably do the work in two or three chunks. I am happy to do the work entirely independently, or to work in close collaboration with you.
  3. If I produce something tangible as part of this work (eg. some code, some documentation) then I will give you the rights to do whatever you wish with these (the only exception being work on my open-source projects, for which I will require you to agree to release the work under the same open-source license as the rest of the project).
  4. Following David’s lead, the auction will be a Vickrey Auction, where all bids are secret, and the highest bidder wins but pays the second highest bidder’s bid. For example, if the top two bids are 500 and 300, then the person who bid 500 wins and pays 300. This means that the mathematically best amount to bid is exactly the amount you are willing to pay for my time.
  5. If there is only one bidder, then you will get a day of my work and pay nothing for it.
  6. If there is a tie for top place then I will pick the work I most want to do, and charge the highest bid.
  7. The auction closes at 23:59 UTC on the 5th February 2019. Bids submitted after that time will be invalid.
  8. The day of work must be claimed by the end of March 2019. I will contact the winner to arrange dates and times. I will send an invoice after the work is completed, and this must be paid within 30 days.
  9. If your company wants to bid then I am happy to invoice them after the work is complete and, within reason, jump through the necessary hoops to get the invoice paid.
  10. If you wish me to work in-person then I will invoice you for travel costs on top of the bid payment. Work can only be carried out in a wheelchair accessible building, and in general I would prefer remote work.
  11. If you ask me to do something illegal, unethical, or just something that I firmly do not want to do, then I will delete your bid. If you would have been one of the top bidders then I will inform you of this.
  12. After the auction is over, and the work has been completed, I will post on this blog a summary of the bids received, the winning bid and so on.

To go ahead and submit your bid, please fill in the form here.

January 16, 2019 10:29 AM UTC


Codementor

How to build an API for a machine learning model in 5 minutes using Flask

This article explains how to create an API for your machine learning model in Python with Flask.

January 16, 2019 07:49 AM UTC


Wingware News

Wing Python IDE 6.1.4: January 16, 2019

This minor release fixes using typing.IO and similar classes as type hints, improves handling of editor splits in goto-definition, fixes failure to install the remote agent, and fixes failure to convert EOLs in the editor.

January 16, 2019 01:00 AM UTC

January 15, 2019


Will McGugan

PyFilesystem is greater than or equal to Pathlib

I was reading a post by Trey Hunner on why pathlib is great, where he makes the case that pathlib is a better choice than the standard library alternatives that preceded it. I wouldn't actually disagree with a word of it. He's entirely correct. You should probably be using pathlib where it fits.

Personally, however, I rarely use pathlib, because I find that for the most part, PyFilesystem is a better choice. I'd like to take some of the code examples from Trey's post and re-write them using PyFilesystem, just so we can compare.

Create a folder, move a file

The first example from Trey's post, creates a folder then moves a file into it. Here it is:

from pathlib import Path

Path('src/__pypackages__').mkdir(parents=True, exist_ok=True)
Path('.editorconfig').rename('src/.editorconfig')

The code above is straightforward, and hides the gory platform details which is a major benefit of pathlib over os.path.

The PyFilesystem version also does this, and the code is remarkably similar:

from fs import open_fs

with open_fs('.') as cwd:
    cwd.makedirs('src/__pypackages__', recreate=True)
    cwd.move('.editorconfig', 'src/.editorconfig')

The two lines that do the work are somewhat similar -- you can probably figure them out without looking at the docs. The first line of non-import code may need some explanation. In PyFilesystem the abstraction is not a path but a directory. So open_fs('.') returns a FS object for the current working directory. It's this object which contains methods for making directories and moving files etc.

Create a directory if it doesn't already exist, write a blank file

This next example from Trey's post, creates a directory then creates an empty file if it doesn't already exist:

from pathlib import Path


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filepath."""
    path = Path(dir_path, '.editorconfig')
    if not path.exists():
        path.parent.mkdir(exist_ok=True, parents=True)
        path.touch()
    return path

This function is tricky to compare, as it does things you might not consider doing in a project with PyFilesystem, but if I was to translate it literally, it would be something like the following:

def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filename."""
    with open_fs(dir_path, create=True) as fs:
        fs.touch(".editorconfig")
        # Grab the system path before the filesystem is closed
        return fs.getsyspath(".editorconfig")

The reason that you wouldn't write this code with PyFilesystem, is that you rarely need to pass around paths. You typically pass around FS objects which represent a subdirectory. It's perhaps not the best example to demonstrate this, but the PyFilesystem code would likely be more like the following:

def make_editorconfig(directory_fs):
    directory_fs.create(".editorconfig")

with open_fs("foo", create=True) as directory_fs:
    make_editorconfig(directory_fs)

Rather than a str or a Path object, the function expects an FS object. An advantage of this is that file / directory operations are sandboxed under that directory, unlike the pathlib version, which has access to the entire filesystem. For a trivial example, this won't matter. But if you have more complex code, it can prevent you from unintentionally deleting or overwriting files if there is a bug.

Counting files by extension

Next up, we have a short script which counts the Python files in a subdirectory using pathlib:

from pathlib import Path


extension = '.py'
count = 0
for filename in Path.cwd().rglob(f'*{extension}'):
    count += 1
print(f"{count} Python files found")

Nice and simple. PyFilesystem has glob functionality (although no rglob yet). The code looks quite similar:

from fs import open_fs

extension = '.py'

with open_fs('.') as fs:
    count = fs.glob(f"**/*{extension}").count().files
print(f"{count} Python files found")

There's no for loop in the code above, because there is built-in file counting functionality, but otherwise it is much the same.

I think Trey was using this example to compare performance. I haven't actually compared performance of PyFilesystem's globbing versus os.path or pathlib. That could be the subject for another post.

Write a file to the terminal if it exists

The next example is a simple one for both pathlib and PyFilesystem. Here's the pathlib version:

from pathlib import Path
import sys


directory = Path(sys.argv[1])
ignore_path = directory / '.gitignore'
if ignore_path.is_file():
    print(ignore_path.read_text(), end='')

And here's the PyFilesystem equivalent:

import sys
from fs import open_fs


with open_fs(sys.argv[1]) as fs:
    if fs.isfile(".gitignore"):
        print(fs.readtext('.gitignore'), end='')

Note that there's no equivalent of directory / '.gitignore'. You don't need to join paths in PyFilesystem as often, but when you do, you don't need to worry about platform details. All paths in PyFilesystem are a sort of idealized path with a common format.
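
On the occasions when you do need to join them, the fs.path module does the work, always with forward slashes:

from fs import path

print(path.join("foo", "bar", ".gitignore"))  # foo/bar/.gitignore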

Finding duplicates

Trey offered a fully working script to find duplicates in a subdirectory with and without pathlib. Coincidentally I'd recently added a similar example to PyFilesystem.

Here is Trey's pathlib version:

from collections import defaultdict
from hashlib import md5
from pathlib import Path


def find_files(filepath):
    for path in Path(filepath).rglob('*'):
        if path.is_file():
            yield path


file_hashes = defaultdict(list)
for path in find_files(Path.cwd()):
    file_hash = md5(path.read_bytes()).hexdigest()
    file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

And here we have equivalent functionality with PyFilesystem:

from collections import defaultdict
from hashlib import md5
from fs import open_fs

file_hashes = defaultdict(list)
with open_fs('.') as fs:
    for path in fs.walk.files():
        file_hash = md5(fs.readbytes(path)).hexdigest()
        file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

The PyFilesystem version compares quite favourably here (in terms of lines of code at least), mostly because an iterator over file paths was already built in.

Conclusion

First off, I would like to emphasise that I'm not suggesting you never use pathlib. It is better than the alternatives in the standard library. Pathlib also has the advantage that it is actually in the standard library, whereas PyFilesystem is a pip install fs away.

I would say that I think PyFilesystem results in cleaner code for the most part, which could just be down to the fact that I've been working with PyFilesystem for a lot longer and it 'fits my brain' better. I'll let you be the judge. Also note that as the primary author of PyFilesystem, there is obviously a bucket-load of bias here.

There is one area where I think PyFilesystem is a clear winner. The PyFilesystem code above would work virtually unaltered with files in an archive, in memory, on an FTP server, S3 etc. or any of the supported filesystems.
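
For instance (the S3 line assumes the fs-s3fs extension package is installed):

from fs import open_fs

local_fs = open_fs("osfs://.")        # The local filesystem
mem_fs = open_fs("mem://")            # An in-memory filesystem
zip_fs = open_fs("zip://backup.zip")  # Inside a zip archive
s3_fs = open_fs("s3://mybucket")      # Amazon S3, via fs-s3fs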

I'd like to apologise to Trey Hunner if I misrepresented anything he said in his post!

January 15, 2019 10:51 PM UTC


Mike Driscoll

Python 101: Episode #42 – Creating Executables with cx_Freeze

In this screencast, we will learn how to turn your Python code into a Windows executable file using the cx_Freeze project.

You can also read the chapter this video is based on here or get the book on Leanpub


January 15, 2019 09:24 PM UTC


PyCoder’s Weekly

Issue #351 (Jan. 15, 2019)

#351 – JANUARY 15, 2019



Speed Up Your Python Program With Concurrency

How and when to use concurrency in Python. You’ll see a simple, non-concurrent approach and then look into why and when you’d want to use threading, asyncio, or multiprocessing.
REAL PYTHON

What I’ve Learned About Optimizing Python

A nice overview of common Python performance pitfalls and how to avoid them. Mostly aimed at CPython 2.7 but much of it applies to Python 3 as well. (While you read the article, keep in mind that “premature optimization is the root of all evil.”)
GREGORY SZORC

Develop Your Python GUI Applications With Qt


Learn how to create fluid user interfaces and data visualizations with Qt for Python in this free hands-on webinar. Also included are free examples of ready-made 2D/3D data visualizations, controls, charts, and more to get you started. Code along!
QT COMPANY sponsor

Parsing Semi-Formatted Text in Python via TextFSM

TextFSM is a module for parsing text using a simple template language. Orkhan’s article offers a nice introduction and I found it more useful than TextFSM’s official website.
ORKHAN HASANLI

Dockerizing Python Applications

See how to write a simple Python web application using Flask and get it ready for “dockerizing”, followed by creating a Docker Image, and deploying it both to a test and production environment. Good step-by-step tutorial.
STACKABUSE.COM

Improve Your Code With Atomic Functions

You may have encountered the terms atomic function and the Don’t Repeat Yourself (DRY) principle. Sean’s article demonstrates how these concepts work together to make your Python code easier to test and more readable.
SEAN TULLIS

Enforcing Consistent Formatting for JSON Files With Python

Nice little “hack” for ensuring that JSON files checked into code repositories follow a common formatting style. Helps keep your git diffs nice and clean.
MOSHE ZADKA

A* Pathfinding Algorithm Explained With Python

The animations visualizing every step of the algorithm are amazing in this article. This was originally published in 2014 but continuously updated since then.
REDBLOBGAMES.COM

Discussions

Ruby Creator on Running an Open-Source Project

“You feel better. No cost. But it doesn’t help the community. It’s how Python lost Guido (I think)” and “Please be constructive. Don’t ruin our lives.”
TWITTER.COM/YUKIHIRO_MATZ

What Does a Basic Python Portfolio Look Like?

REDDIT

What Python Habits Do You Wish You Unlearned Earlier?

REDDIT

My Parents Kept Asking for Photos of My Daughter, So I Automated It

And that my friend, is the true power of Python ;-)
REDDIT

Python Jobs

Sr Enterprise Python Developer (Toronto, Canada)

Kognitiv

Senior Software Engineer (Santa Monica, CA)

GoodRX

Python Software Engineer(s) (Palo Alto, CA)

Rhythm Diagnostic Systems, Inc

Senior Python Developer (Vienna, Austria)

Adverity GmbH

More Python Jobs >>>

Articles & Tutorials

What Should Be in the Python Standard Library?

“Python has always touted itself as a ‘batteries included’ language; its standard library contains lots of useful modules, often more than enough to solve many types of problems quickly. From time to time, though, some have started to rethink that philosophy […]”
JAKE EDGE

Django Migrations: A Primer

Get comfortable with Django migrations and learn how to create database tables without writing any SQL, how to automatically modify your database after you changed your models, and how to revert changes made to your database.
REAL PYTHON

Python Basics: A Practical Introduction to Python 3


Make the leap from Beginner to Intermediate in Python with this complete curriculum freshly updated for Python 3.7. Includes exercises, interactive quizzes, and sample projects so you’ll always know what to focus on next. The first half of this book is a quick yet thorough overview of all the Python fundamentals. The second half focuses on solving interesting, real-world problems in a practical manner using Python. Last call for the Early Access discount (47% off) →
REAL PYTHON book sponsor

Machine Learning in Python

Are you struggling to get your start in machine learning using Python? In this step-by-step tutorial you’ll learn how to perform basic machine learning using Python, from scratch.
ADRIAN ROSEBROCK

Automated Machine Learning in Python

Can machine learning processes be automated? This article provides an overview of Python-based AutoML approaches and libraries.
DERRICK MWITTI

Filesystem Magic with Python

How to achieve various filesystem-related tasks, such as finding duplicates or deleting files, using the PyFilesystem library. The cool thing is that all examples work with local files, archives, and network filesystems like S3 or Google Cloud Storage.
WILL MCGUGAN

Probabilistic Programming in Python

Learn how to use PyMC3 to define and solve probabilistic models.
OSVALDO MARTIN
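For a taste of what that looks like, here is a minimal sketch (mine, not from the article) of fitting a normal distribution with PyMC3; the synthetic data and the prior choices are purely illustrative:

import numpy as np
import pymc3 as pm

data = np.random.normal(loc=1.0, scale=2.0, size=500)  # synthetic observations

with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=10)        # prior for the mean
    sigma = pm.HalfNormal('sigma', sd=10)    # prior for the spread
    pm.Normal('obs', mu=mu, sd=sigma, observed=data)
    trace = pm.sample(1000)                  # draw posterior samples

print(trace['mu'].mean(), trace['sigma'].mean())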

Kenneth Reitz’s Code Style™

A short post with Kenneth’s thoughts on PEP 8.
KENNETHREITZ.ORG • Shared by Ricky White

Projects & Code

json.tool: CLI to Validate and Pretty-Print JSON Objects

I had no idea this was part of the Python standard library. Live and learn!
PYTHON.ORG
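The module is invoked as python -m json.tool data.json (or fed JSON via stdin) and it both validates and pretty-prints. As a minimal sketch, the same step from Python code, with an illustrative input string:

import json

ugly = '{"b": 1, "a": 2}'
print(json.dumps(json.loads(ugly), indent=4, sort_keys=True))  # json.loads raises an error on invalid JSON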

Treepace: Tree Transformation Language

Treepace (“tree pattern replace”) is a Python library offering a concise, embeddable language for searching and replacing parts of tree structures.
MATÚŠ SULÍR

grumpy: Python 2.7 → Go Source Code Transcompiler and Runtime

Compiles Python source code to Go source code which is then compiled to native code, rather than to bytecode.
GITHUB.COM/GRUMPYHOME

Developing Privacy-Aware Applications With Python and I2P

Invisible Internet Project (I2P) provides a framework for developing privacy-aware applications. It is a virtual network working on top of the regular Internet, in which hosts can exchange data without disclosing their “real” IP addresses.
GETI2P.NET

TextFSM: Python Module for Parsing Semi-Structured Text

The engine takes two inputs, a template file and text input (such as command responses from the CLI of a device), and returns a list of records containing the data parsed from the text. This is pretty cool; check out this tutorial to see some examples.
GITHUB.COM/GOOGLE
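A minimal sketch of the idea, assuming the textfsm package is installed; the template and sample text here are illustrative, not taken from the tutorial:

import io
import textfsm

template = io.StringIO(
    "Value INTERFACE (\\S+)\n"
    "Value STATUS (up|down)\n"
    "\n"
    "Start\n"
    "  ^${INTERFACE} is ${STATUS} -> Record\n"
)

text = "eth0 is up\neth1 is down\n"
fsm = textfsm.TextFSM(template)
print(fsm.header)           # ['INTERFACE', 'STATUS']
print(fsm.ParseText(text))  # [['eth0', 'up'], ['eth1', 'down']]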

PyFilesystem: Filesystem Abstraction for Python

Work with files and directories in archives, memory, the cloud, and so on as easily as your local drive.
PYFILESYSTEM.ORG
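As a quick hedged sketch (assuming the fs package, i.e. PyFilesystem2; on older releases the text helpers are named settext/gettext), the same calls work whether the URL points at memory, an archive, or a cloud bucket:

import fs

mem_fs = fs.open_fs('mem://')   # an in-memory filesystem
mem_fs.makedirs('project/src')
mem_fs.writetext('project/src/app.py', 'print("hello")\n')

for path in mem_fs.walk.files(filter=['*.py']):
    print(path, mem_fs.readtext(path))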

Events

DjangoCon Europe 2019 (Last Call for Speakers)

The call for proposals only remains open till January 20, so be sure to get your talks submitted in time. DjangoCon Europe takes place in Copenhagen, Denmark from April 10–14th.
DJANGOPROJECT.COM

PyData Bristol Meetup

January 17, 2019
MEETUP.COM

Python Northwest

January 17, 2019
PYNW.ORG.UK

PyLadies Dublin

January 17, 2019
PYLADIES.COM

Karlsruhe Python User Group (KaPy)

January 18, 2019
BL0RG.NET

BangPypers

January 19, 2019
MEETUP.COM

PyDelhi User Group Meetup

January 19, 2019
MEETUP.COM


Happy Pythoning!
This was PyCoder’s Weekly Issue #351.



January 15, 2019 08:30 PM UTC


Trey Hunner

No really, pathlib is great

I recently published an article about Python’s pathlib module and how I think everyone should be using it.

I won some pathlib converts, but some folks also brought up concerns. Some noted that I seemed to be comparing pathlib to os.path in a disingenuous way. Others were concerned that pathlib will take a very long time to be widely adopted because os.path is so entrenched in the Python community. And there were also concerns about performance.

In this article I’d like to acknowledge and address these concerns. This will be both a defense of pathlib and a sort of love letter to PEP 519.

Comparing pathlib and os.path the right way

In my last article I compared this code which uses os and os.path:

import os
import os.path

os.makedirs(os.path.join('src', '__pypackages__'), exist_ok=True)
os.rename('.editorconfig', os.path.join('src', '.editorconfig'))

To this code, which uses pathlib.Path:

from pathlib import Path

Path('src/__pypackages__').mkdir(parents=True, exist_ok=True)
Path('.editorconfig').rename('src/.editorconfig')

This might seem like an unfair comparison because I used os.path.join in the first example to ensure the correct path separator is used on all platforms, but I didn’t do that in the second example. But this is in fact a fair comparison because the Path class normalizes path separators automatically.

We can prove this by looking at the string representation of this Path object on Windows:

>>> str(Path('src/__pypackages__'))
'src\\__pypackages__'

No matter whether we use the joinpath method, a / in a path string, the / operator (which is a neat feature of Path objects), or separate arguments to the Path constructor, we get the same representation in all cases:

>>> Path('src', '.editorconfig')
WindowsPath('src/.editorconfig')
>>> Path('src') / '.editorconfig'
WindowsPath('src/.editorconfig')
>>> Path('src').joinpath('.editorconfig')
WindowsPath('src/.editorconfig')
>>> Path('src/.editorconfig')
WindowsPath('src/.editorconfig')

That last expression caused some confusion among folks who assumed pathlib wouldn’t be smart enough to convert that / into a \ in the path string. Fortunately, it is!

With Path objects, you never have to worry about backslashes vs forward slashes again: specify all paths using forward slashes and you’ll get what you’d expect on all platforms.

Normalizing file paths shouldn’t be your concern

If you’re developing on Linux or Mac, it’s very easy to add bugs to your code that only affect Windows users. Unless you’re careful to use os.path.join to build your paths up or os.path.normcase to convert forward slashes to backslashes as appropriate, you may be writing code that breaks on Windows.

This is a Windows bug waiting to happen (we’ll get mixed backslashes and forward slashes here):

import sys
import os.path
directory = '.' if not sys.argv[1:] else sys.argv[1]
new_file = os.path.join(directory, 'new_package/__init__.py')

This just works on all systems:

import sys
from pathlib import Path
directory = '.' if not sys.argv[1:] else sys.argv[1]
new_file = Path(directory, 'new_package/__init__.py')

It used to be the responsibility of you the Python programmer to carefully join and normalize your paths, just as it used to be your responsibility in Python 2 land to use unicode whenever it was more appropriate than bytes. This is the case no more. The pathlib.Path class is careful to fix path separator issues before they even occur.

I don’t use Windows. I don’t own a Windows machine. But a ton of the developers who use my code likely use Windows and I don’t want my code to break on their machines.

If there’s a chance that your Python code will ever run on a Windows machine, you really need pathlib.

Don’t stress about path normalization: just use pathlib.Path whenever you need to represent a file path.

pathlib seems great, but I depend on code that doesn’t use it!

You have lots of code that works with path strings. Why would you switch to using pathlib when it means you’d need to rewrite all this code?

Let’s say you have a function like this:

import os
import os.path

def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filename."""
    filename = os.path.join(dir_path, '.editorconfig')
    if not os.path.exists(filename):
        os.makedirs(dir_path, exist_ok=True)
        open(filename, mode='wt').write('')
    return filename

This function accepts a directory to create a .editorconfig file in, like this:

>>> import os.path
>>> make_editorconfig(os.path.join('src', 'my_package'))
'src/my_package/.editorconfig'

But our code also works with a Path object:

>>> from pathlib import Path
>>> make_editorconfig(Path('src/my_package'))
'src/my_package/.editorconfig'

But… how??

Well os.path.join accepts Path objects (as of Python 3.6). And os.makedirs accepts Path objects too.

In fact, the built-in open function accepts Path objects, shutil does too, and anything in the standard library that previously accepted a path string is now expected to work with both Path objects and path strings.

This is all thanks to PEP 519, which called for an os.PathLike abstract base class and declared that Python utilities that work with file paths should now accept either path strings or path-like objects.
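To make the mechanism concrete, here is a minimal sketch (mine, not from PEP 519) of a homegrown path-like class; anything with a __fspath__ method is accepted by os.fspath and, since Python 3.6, by the standard library's path-consuming functions:

import os

class MyPath:
    """A toy path-like object: __fspath__ is all PEP 519 requires."""
    def __init__(self, *parts):
        self._path = os.path.join(*parts)
    def __fspath__(self):
        return self._path

p = MyPath('src', 'my_package')
print(os.fspath(p))       # 'src/my_package' (with backslashes on Windows)
print(os.path.exists(p))  # os.path functions accept path-like objects too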

But my favorite third-party library X has a better Path object!

You might already be using a third-party library that has a Path object which works differently than pathlib’s Path objects. Maybe you even like it better.

For example django-environ, path.py, plumbum, and visidata all have their own custom Path objects that represent file paths. Some of these pathlib alternatives predate pathlib and chose to inherit from str so they could be passed to functions that expected path strings. Thanks to PEP 519 both pathlib and its third-party alternatives can play nicely without needing to resort to the hack of inheriting from str.

Let’s say you don’t like pathlib because Path objects are immutable and you very much prefer using mutable Path objects. Well, thanks to PEP 519, you can create your own even-better-because-it-is-mutable Path class that also has a __fspath__ method. You don’t need to use pathlib to benefit from PEP 519.

Any homegrown Path object you make or find in a third-party library now has the ability to work natively with the Python built-ins and standard library modules that expect path-like objects. Even if you don’t like pathlib, its existence is a big win for third-party Path objects as well.

But Path objects and path strings don’t mix, do they?

You might be thinking: this is really wonderful, but won’t this sometimes-a-string and sometimes-a-path-object situation add confusion to my code?

The answer is yes, somewhat. But I’ve found that it’s pretty easy to work around.

PEP 519 added a couple other things along with path-like objects: one is a way to convert all path-like objects to path strings and the other is a way to convert all path-like objects to Path objects.

Given either a path string or a Path object (or anything with a __fspath__ method):

from pathlib import Path
import os.path
p1 = os.path.join('src', 'my_package')
p2 = Path('src/my_package')

The os.fspath function will now normalize both of these types of paths to strings:

>>> from os import fspath
>>> fspath(p1), fspath(p2)
('src/my_package', 'src/my_package')

And the Path class will now accept both of these types of paths and convert them to Path objects:

>>> Path(p1), Path(p2)
(PosixPath('src/my_package'), PosixPath('src/my_package'))

That means you could convert the output of the make_editorconfig function back into a Path object if you wanted to:

>>> from pathlib import Path
>>> Path(make_editorconfig(Path('src/my_package')))
PosixPath('src/my_package/.editorconfig')

Though of course a better long-term approach would be to rewrite the make_editorconfig function to use pathlib instead.

pathlib is too slow

I’ve heard this concern come up a few times: pathlib is just too slow.

It’s true that pathlib can be slow. Creating thousands of Path objects can make a noticeable impact on your code.

I decided to test the performance difference between pathlib and the alternative on my own machine using two different programs that both look for all .py files below the current directory.

Here’s the os.walk version:

from os import getcwd, walk


extension = '.py'
count = 0
for root, directories, filenames in walk(getcwd()):
    for filename in filenames:
        if filename.endswith(extension):
            count += 1
print(f"{count} Python files found")

Here’s the Path.rglob version:

from pathlib import Path


extension = '.py'
count = 0
for filename in Path.cwd().rglob(f'*{extension}'):
    count += 1
print(f"{count} Python files found")

Testing runtimes for programs that rely on filesystem accesses is tricky because runtimes vary greatly, so I reran each script 10 times and compared the best runtime of each.
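In case you want to reproduce this, here is a minimal sketch of that best-of-10 approach; the script name is illustrative:

import subprocess
import time

def best_runtime(cmd, runs=10):
    """Run cmd several times and return the fastest wall-clock time in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - start)
    return min(times)

print(best_runtime(['python', 'find_py_files.py']))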

Both scripts found 97,507 Python files in the directory I ran them in. The first one finished in 1.914 seconds (best out of 10 runs). The second one finished in 3.430 seconds (best out of 10 runs).

When I set extension = '' these find about 600,000 files and the differences spread a little further apart. The first runs in 1.888 seconds and the second in 7.485 seconds.

So the pathlib version of this program ran twice as slow for .py files and four times as slow for every file in my home directory. The pathlib code was indeed slower, much slower percentage-wise.

But in my case, this speed difference doesn’t matter much. I searched for every file in my home directory and lost 6 seconds to the slower version of my code. If I needed to scale this code to search 10 million files, I’d probably want to rewrite it. But that’s a problem I can get to if I experience it.

If you have a tight loop that could use some optimizing and pathlib.Path is one of the bottlenecks that’s slowing that loop down, abandon pathlib in that part of your code. But don’t optimize parts of your code that aren’t bottlenecks: it’s a waste of time and often results in less readable code for little gain.

Improving readability with pathlib

I’d like to wrap up these thoughts by ending with some pathlib refactorings. I’ve taken a couple small examples of code that work with files and refactored these examples to use pathlib instead. I’ll mostly leave these code blocks without comment and let you be the judge of which versions you like best.

Here’s the make_editorconfig function we saw earlier:

import os
import os.path


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filename."""
    filename = os.path.join(dir_path, '.editorconfig')
    if not os.path.exists(filename):
        os.makedirs(dir_path, exist_ok=True)
        open(filename, mode='wt').write('')
    return filename

And here’s the same function using pathlib.Path instead:

from pathlib import Path


def make_editorconfig(dir_path):
    """Create .editorconfig file in given directory and return filepath."""
    path = Path(dir_path, '.editorconfig')
    if not path.exists():
    path.parent.mkdir(exist_ok=True, parents=True)
        path.touch()
    return path

Here’s a command-line program that accepts a string representing a directory and prints the contents of the .gitignore file in that directory if one exists:

import os.path
import sys


directory = sys.argv[1]
ignore_filename = os.path.join(directory, '.gitignore')
if os.path.isfile(ignore_filename):
    with open(ignore_filename, mode='rt') as ignore_file:
        print(ignore_file.read(), end='')

This is the same code using pathlib.Path:

from pathlib import Path
import sys


directory = Path(sys.argv[1])
ignore_path = directory / '.gitignore'
if ignore_path.is_file():
    print(ignore_path.read_text(), end='')

And here’s some code that prints all groups of files in and below the current directory which are duplicates:

from collections import defaultdict
from hashlib import md5
from os import getcwd, walk
import os.path


def find_files(filepath):
    for root, directories, filenames in walk(filepath):
        for filename in filenames:
            yield os.path.join(root, filename)


file_hashes = defaultdict(list)
for path in find_files(getcwd()):
    with open(path, mode='rb') as my_file:
        file_hash = md5(my_file.read()).hexdigest()
        file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

This is the same code that uses pathlib.Path instead:

from collections import defaultdict
from hashlib import md5
from pathlib import Path


def find_files(filepath):
    for path in Path(filepath).rglob('*'):
        if path.is_file():
            yield path


file_hashes = defaultdict(list)
for path in find_files(Path.cwd()):
    file_hash = md5(path.read_bytes()).hexdigest()
    file_hashes[file_hash].append(path)

for paths in file_hashes.values():
    if len(paths) > 1:
        print("Duplicate files found:")
        print(*paths, sep='\n')

The changes here are subtle, but I think they add up. I prefer this pathlib-refactored version.

Start using pathlib.Path objects

Let’s recap.

The / separators in pathlib.Path strings are automatically converted to the correct path separator based on the operating system you’re on. This is a huge feature that can make for code that is more readable and more certain to be free of path-related bugs.

>>> path1 = Path('dir', 'file')
>>> path2 = Path('dir') / 'file'
>>> path3 = Path('dir/file')
>>> path3
WindowsPath('dir/file')
>>> path1 == path2 == path3
True

The Python standard library and built-ins (like open) also accept pathlib.Path objects now. This means you can start using pathlib, even if your dependencies don’t!

from shutil import move

def rename_and_redirect(old_filename, new_filename):
    move(old_filename, new_filename)
    with open(old_filename, mode='wt') as f:
        f.write(f'This file has moved to {new_filename}')

>>> from pathlib import Path
>>> old, new = Path('old.txt'), Path('new.txt')
>>> rename_and_redirect(old, new)
>>> old.read_text()
'This file has moved to new.txt'

And if you don’t like pathlib, you can use a third-party library that provides the same path-like interface. This is great because even if you’re not a fan of pathlib you’ll still benefit from the new changes detailed in PEP 519.

>>> from plumbum import Path
>>> my_path = Path('old.txt')
>>> with open(my_path) as f:
...     print(f.read())
...
This file has moved to new.txt

While pathlib is sometimes slower than the alternative(s), the cases where this matters are somewhat rare (in my experience at least) and you can always jump back to using path strings for parts of your code that are particularly performance sensitive.

And in general, pathlib makes for more readable code. Here’s a succinct and descriptive Python script to demonstrate my point:

from pathlib import Path
gitignore = Path('.gitignore')
if gitignore.is_file():
    print(gitignore.read_text(), end='')

The pathlib module is lovely: start using it!

January 15, 2019 07:20 PM UTC


codingdirectional

Turn video into black and white with python

Hi, in this quick article we are going to turn a color video into a black and white video using Python, with the help of the FFmpeg tool. We will use the eq (equalizer) filter from FFmpeg to get the job done. If you missed the previous steps to resize the video and add music to it, you can read the earlier articles first. Below are two pictures showing the video scene before and after applying the eq filter.

Before editing
After editing

Play around with FFmpeg’s eq filter by yourself to discover more effects. Below is the program which changes the color of the video.

from tkinter import *
from tkinter import filedialog
import os
import subprocess
import tkinter.ttk as tk

win = Tk() # Create instance
win.title("Manipulate Video") # Add a title
win.resizable(0, 0) # Disable resizing the GUI
win.configure(background='white') # change background color

#  Create a label
aLabel = Label(win, text="Select video size and video", anchor="center", padx=13, pady=10, relief=RAISED)
aLabel.grid(column=0, row=0, sticky=W+E)
aLabel.configure(foreground="black")
aLabel.configure(background="white")
aLabel.configure(wraplength=110)

# Create a combo box
vid_size = StringVar() # create a string variable
preferSize = tk.Combobox(win, textvariable=vid_size) 
preferSize['values'] = (1920, 1280, 854, 640) # video width in pixels
preferSize.grid(column=0, row=1) # the position of combo box
preferSize.current(0) # select item one 

# Open a video file
def openVideo():
        
        fullfilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Video file", "*.mp4; *.avi ")]) # select a video file from the hard drive
        audiofilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Audio file", "*.mp4; *.ogg ")]) # select a new audio file from the hard drive
        if(fullfilename != '' and audiofilename != ''):
                scale_vid = preferSize.get() # retrieve value from the combo box
                new_size = str(scale_vid)
                dir_path = os.path.dirname(os.path.realpath(fullfilename))
                os.chdir(dir_path)
                f = new_size  + '.mp4' # the new output file name/format
                f2 = f + '1.mp4' # video without audio
                #subprocess.call(['ffmpeg', '-stream_loop', '2', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize and loop the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', '-r', '24', f]) # resize and speed up the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize the video with ffmpeg
                #subprocess.call(['ffmpeg', '-i', f, '-ss', '00:02:30', '-y', f2]) # create animated gif starting from 2 minutes and 30 seconds to the end
                subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize the video with ffmpeg
                subprocess.call(['ffmpeg', '-i', f, '-c', 'copy', '-y', '-an', f2]) # remove audio from the original video
                subprocess.call(['ffmpeg', '-i', f2, '-i', audiofilename, '-shortest', '-c:v', 'copy', '-c:a', 'aac', '-b:a', '256k', '-y', f]) # add audio to the original video, trim either the audio or video depends on which one is longer
                subprocess.call(['ffmpeg', '-i', f, '-vf', 'eq=contrast=1.3:brightness=-0.03:saturation=0.01', '-y', f2]) # adjust the saturation, contrast, and brightness of the video

                
action_vid = Button(win, text="Open Video", command=openVideo, padx=2)
action_vid.grid(column=0, row=2, sticky=E+W)
action_vid.configure(background='black')
action_vid.configure(foreground='white')

win.mainloop()
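
By the way, the program above keeps a tiny bit of color (saturation=0.01). As a hedged standalone sketch, dropping saturation all the way to 0 should give a fully black and white result; the filenames here are illustrative:

import subprocess

subprocess.call(['ffmpeg', '-i', 'input.mp4',
                 '-vf', 'eq=saturation=0',
                 '-y', 'output.mp4'])  # saturation=0 removes all color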

We were supposed to edit the user interface of this video editing application today, but we will leave that to the next round since there are already enough things for you to learn in this chapter.

January 15, 2019 12:16 PM UTC


Reuven Lerner

Registration ends today for Weekly Python Exercise

If you want to join this month’s cohort of Weekly Python Exercise, you’d better act fast: Registration ends today!

If you want to improve your Python fluency, then Weekly Python Exercise is a great way to go.

Click here to join Weekly Python Exercise.

Questions? You can always e-mail me (reuven@lerner.co.il) or hit me up on Twitter (@reuvenmlerner). Or you can watch the Q&A webinar I held last night, answering questions from others interested in joining this cohort.

And don’t forget that I offer discounts for students, pensioners/retirees, and people living outside of the world’s 30 wealthiest countries.

Hundreds of other developers have improved their Python skills with WPE in the last 18 months. Improve your career, stop feeling stuck, stop searching Stack Overflow, and improve your Python fluency!

The post Registration ends today for Weekly Python Exercise appeared first on Lerner Consulting Blog.

January 15, 2019 12:10 PM UTC


Made With Mu

Happy Mu Year 2019!

New year, new Mu! We’re very happy to release the next version of Mu, version 1.0.2. Please update to this release.

This is a bug fix release with numerous contributions from our growing community of volunteer developers. Collectively, we’ve managed to fix and improve many of the things users have been telling us about. Please see the release notes (linked to above) for details of the fixes made.

As always, if you spot any bugs, please don’t hesitate to let us know.

Thanks especially to GitHub user @Linguini2004, Craig Steele, John Guan, Tiago Montes, Tim Golden, Carlos Pereira Atencio, GitHub user @wu6692776, Eberhard Fahle, Limor Fried, Tim McCurrach and Brent Rubell for their valuable contributions.

Onwards and upwards! Next stop 1.1.

January 15, 2019 12:00 PM UTC