
Planet Python

Last update: January 23, 2019 09:48 PM UTC

January 23, 2019


Python Engineering at Microsoft

Python in Visual Studio 2019 Preview 2

Today we are releasing Visual Studio 2019 Preview 2, which contains new features for Python developers that improve the experience of managing Python environments and enable you to work with Python code without having to create a Python project. We’ve also enabled Python support for Visual Studio Live Share.

We’ll take a closer look at these new features in the rest of this post.

Creating Python Environments

To make it easier for you to create virtual or conda Python environments for your project, the ability to create Python environments has been moved from the Python environments window to a new Add environment dialog that can be opened from various parts of Visual Studio. This improves discoverability and enables new capabilities such as the ability to create conda environments on-demand and support for Open Folder described later in this post.

For example, when you open a project that contains a requirements.txt or environment.yml file but no virtual or conda environment is found, you will be prompted to create an environment with a notification:

In this case clicking on ‘Create virtual environment’ will show a new Add environment dialog box, pre-configured to create a new virtual environment using the provided requirements.txt file:

You can also use the Add environment dialog to create a conda environment, using an environment.yml file, or by specifying a list of packages to install:

The dialog also allows you to add existing virtual environments on your machine, or to install new versions of Python.

After clicking on Create, you will see a progress notification in the status bar, and you can click the blue link to view progress in the Output window:

You can continue working while the environment is being created.

Switching from Anaconda to Miniconda

Previous versions of Visual Studio allowed you to install Anaconda through the Visual Studio Installer. While this made it easy to acquire Python data science packages, it resulted in long Visual Studio installation times and sometimes caused reliability issues when upgrading versions of Visual Studio. To address these problems, Anaconda has been removed in favor of the much smaller Miniconda, which is now installed as a default optional component for the Python workload:

Miniconda allows you to create conda environments on-demand using the Add environment dialog. If you still want a full install of Anaconda, you can install it yourself and continue working with it by selecting your Anaconda install as the active Python environment. Note that if both the Visual Studio bundled Miniconda and Anaconda are installed, we will use Miniconda to create conda environments. If you prefer to use your own conda version, you can specify the path to conda.exe in Tools > Options > Python > Conda.

Python Toolbar and Open Folder Support

In previous versions of Visual Studio we required you to create a Python project in order to work with Python code. We have added a new Python toolbar that allows you to work with Python code without having to create or open a Python project. The toolbar allows you to switch between Python environments, add new Python environments, or manage Python packages installed in the current environment:

The Python toolbar will appear whenever a Python file is open and allows you to select your Python interpreter when working with files in Open Folder workspaces or Python files included in C++ or C# projects.

In the case of Open Folder, your selection is stored in the .vs/PythonSettings.json file so that the same environment is selected the next time you open Visual Studio. By default, the debugger will debug the currently opened Python file, and will run the script using the currently selected Python environment:

To customize your debug settings, you can right-click a Python file and select “Debug and Launch Settings”:

This will generate a launch.vs.json file with Python settings which can be used to customize debug settings:

Note that in Preview 2, the editor and interactive window do not use the currently selected Python environment when using Open Folder. This functionality will be added in future previews of Visual Studio 2019.

Live Share Support for Python

In this release you can now use Visual Studio Live Share with Python files. Previously, you could only use Live Share with Python by hosting a session with Visual Studio Code. You can initiate a Live Share session by clicking the Live Share button on the upper right corner of Visual Studio:

Users who join your live share session will be able to see your Python files, see IntelliSense from your selected Python environment, and collaboratively debug through Python code:

Try it out!

Be sure to download Visual Studio 2019 Preview 2, install the Python workload, and give feedback on Visual Studio Developer Community.

January 23, 2019 09:10 PM UTC


Continuum Analytics Blog

RPM and Debian Repositories for Miniconda

Conda, the package manager from Anaconda, is now available as either a RedHat RPM or a Debian package. The packages are equivalent to the Miniconda installer, which contains only conda and its dependencies. You can use yum or apt-get to install, uninstall, and manage conda on your system. To install conda follow the …

The post RPM and Debian Repositories for Miniconda appeared first on Anaconda.

January 23, 2019 06:02 PM UTC


Python Does What?!

So a list and a tuple walk into a sum()

As a direct side effect of glom's 19.1.0 release, the authors here at PDW got to re-experience one of the more surprising behaviors of three of Python's most basic constructs:

Most experienced developers know the quickest way to combine a short list of short lists:
list_of_lists = [[1], [2], [3, 4]]
sum(list_of_lists, [])
# [1, 2, 3, 4]
Ah, nice and flat, much better.

But what happens when we throw a tuple into the mix:
list_of_seqs = [[1], [2], (3, 4)]
sum(list_of_seqs, [])
# TypeError: can only concatenate list (not "tuple") to list
This is kind of surprising! Especially when you consider this:
seq = [1, 2]
seq += (3, 4)
# [1, 2, 3, 4]
Why should sum() fail when addition succeeds?! We'll get to that.
new_list = [1, 2] + (3, 4)
# TypeError: can only concatenate list (not "tuple") to list
There's that error again!

The trick here is that Python has two addition operators: the simple "+" or "add" operator, used by sum(), and the more nuanced "+=" or "iadd" operator, add's in-place variant.

But why is it OK for one addition to error and the other to succeed?

Symmetry. And maybe commutativity if you remember that math class.

"+" in Python is symmetric: A + B and B + A should always yield the same result. To do otherwise would be more surprising than any of the surprises above. list and tuple cannot be added with this operator because in a mixed-type situation, the return type would change based on ordering.

Meanwhile, "+=" is asymmetric. The left side of the statement determines the type of the return completely. A += B keeps A's type. A straightforward, Pythonic reason if there ever was one.

Going back to the start of our story: by building on operator.iadd, glom's new flatten() function avoids sum()'s error-raising behavior and works wonders on all manner of nested iterables.
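
For reference, here is a minimal standard-library sketch of the same iadd-based approach, no glom required:

import operator
from functools import reduce

list_of_seqs = [[1], [2], (3, 4)]
# iadd mutates and returns its left operand, so the accumulating
# list absorbs each sequence regardless of its type
reduce(operator.iadd, list_of_seqs, [])
# [1, 2, 3, 4]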

January 23, 2019 09:00 AM UTC


Real Python

Python’s Requests Library (Guide)

The requests library is the de facto standard for making HTTP requests in Python. It abstracts the complexities of making requests behind a beautiful, simple API so that you can focus on interacting with services and consuming data in your application.

Throughout this article, you’ll see some of the most useful features that requests has to offer as well as how to customize and optimize those features for different situations you may come across. You’ll also learn how to use requests in an efficient way as well as how to prevent requests to external services from slowing down your application.

In this tutorial, you’ll learn how to:

Make GET requests and inspect the responses you get back
Customize your requests with query string parameters and headers
Use other HTTP methods such as POST, PUT, and DELETE
Make authenticated requests
Keep your application performant with timeouts, sessions, and transport adapters

Though I’ve tried to include as much information as you need to understand the features and examples included in this article, I do assume a very basic general knowledge of HTTP. That said, you may still be able to follow along fine anyway.

Now that that is out of the way, let’s dive in and see how you can use requests in your application!

Getting Started With requests

Let’s begin by installing the requests library. To do so, run the following command:

$ pip install requests

If you prefer to use Pipenv for managing Python packages, you can run the following:

$ pipenv install requests

Once requests is installed, you can use it in your application. Importing requests looks like this:

import requests

Now that you’re all set up, it’s time to begin your journey through requests. Your first goal will be learning how to make a GET request.

The GET Request

HTTP methods such as GET and POST determine which action you’re trying to perform when making an HTTP request. Besides GET and POST, there are several other common methods that you’ll use later in this tutorial.

One of the most common HTTP methods is GET. The GET method indicates that you’re trying to get or retrieve data from a specified resource. To make a GET request, invoke requests.get().

To test this out, you can make a GET request to GitHub’s Root REST API by calling get() with the following URL:

>>> requests.get('https://api.github.com')
<Response [200]>

Congratulations! You’ve made your first request. Let’s dive a little deeper into the response of that request.

The Response

A Response is a powerful object for inspecting the results of the request. Let’s make that same request again, but this time store the return value in a variable so that you can get a closer look at its attributes and behaviors:

>>> response = requests.get('https://api.github.com')

In this example, you’ve captured the return value of get(), which is an instance of Response, and stored it in a variable called response. You can now use response to see a lot of information about the results of your GET request.

Status Codes

The first bit of information that you can gather from Response is the status code. A status code informs you of the status of the request.

For example, a 200 OK status means that your request was successful, whereas a 404 NOT FOUND status means that the resource you were looking for was not found. There are many other possible status codes as well to give you specific insights into what happened with your request.

By accessing .status_code, you can see the status code that the server returned:

>>> response.status_code
200

.status_code returned a 200, which means your request was successful and the server responded with the data you were requesting.

Sometimes, you might want to use this information to make decisions in your code:

if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')

With this logic, if the server returns a 200 status code, your program will print Success!. If the result is a 404, your program will print Not Found.

requests goes one step further in simplifying this process for you. If you use a Response instance in a conditional expression, it will evaluate to True if the status code was between 200 and 400, and False otherwise.

Therefore, you can simplify the last example by rewriting the if statement:

if response:
    print('Success!')
else:
    print('An error has occurred.')

Technical Detail: This Truth Value Test is made possible because __bool__() is an overloaded method on Response.

This means that the default behavior of Response has been redefined to take the status code into account when determining the truth value of the object.
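
As a rough illustration (this mirrors the idea rather than reproducing the library’s exact source), the truth value lines up with the Response.ok attribute:

>>> response = requests.get('https://api.github.com')
>>> bool(response)  # invokes the overloaded __bool__()
True
>>> response.ok     # True for status codes below 400
True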

Keep in mind that this method is not verifying that the status code is equal to 200. The reason for this is that other status codes within the 200 to 400 range, such as 204 NO CONTENT and 304 NOT MODIFIED, are also considered successful in the sense that they provide some workable response.

For example, the 204 tells you that the response was successful, but there’s no content to return in the message body.

So, make sure you use this convenient shorthand only if you want to know if the request was generally successful and then, if necessary, handle the response appropriately based on the status code.

Let’s say you don’t want to check the response’s status code in an if statement. Instead, you want to raise an exception if the request was unsuccessful. You can do this using .raise_for_status():

import requests
from requests.exceptions import HTTPError

for url in ['https://api.github.com', 'https://api.github.com/invalid']:
    try:
        response = requests.get(url)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  # Python 3.6
    except Exception as err:
        print(f'Other error occurred: {err}')  # Python 3.6
    else:
        print('Success!')

If you invoke .raise_for_status(), an HTTPError will be raised for certain status codes. If the status code indicates a successful request, the program will proceed without that exception being raised.

Further Reading: If you’re not familiar with Python 3.6’s f-strings, I encourage you to take advantage of them as they are a great way to simplify your formatted strings.

Now, you know a lot about how to deal with the status code of the response you got back from the server. However, when you make a GET request, you rarely only care about the status code of the response. Usually, you want to see more. Next, you’ll see how to view the actual data that the server sent back in the body of the response.

Content

The response of a GET request often has some valuable information, known as a payload, in the message body. Using the attributes and methods of Response, you can view the payload in a variety of different formats.

To see the response’s content in bytes, you use .content:

>>> response = requests.get('https://api.github.com')
>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. response will do that for you when you access .text:

>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

Because the decoding of bytes to a str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:

>>> response.encoding = 'utf-8' # Optional: requests infers this internally
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'

If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():

>>> response.json()
{'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}', 'emails_url': 'https://api.github.com/user/emails', 'emojis_url': 'https://api.github.com/emojis', 'events_url': 'https://api.github.com/events', 'feeds_url': 'https://api.github.com/feeds', 'followers_url': 'https://api.github.com/user/followers', 'following_url': 'https://api.github.com/user/following{/target}', 'gists_url': 'https://api.github.com/gists{/gist_id}', 'hub_url': 'https://api.github.com/hub', 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}', 'issues_url': 'https://api.github.com/issues', 'keys_url': 'https://api.github.com/user/keys', 'notifications_url': 'https://api.github.com/notifications', 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}', 'organization_url': 'https://api.github.com/orgs/{org}', 'public_gists_url': 'https://api.github.com/gists/public', 'rate_limit_url': 'https://api.github.com/rate_limit', 'repository_url': 'https://api.github.com/repos/{owner}/{repo}', 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}', 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}', 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}', 'starred_gists_url': 'https://api.github.com/gists/starred', 'team_url': 'https://api.github.com/teams', 'user_url': 'https://api.github.com/users/{user}', 'user_organizations_url': 'https://api.github.com/user/orgs', 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}', 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}

The type of the return value of .json() is a dictionary, so you can access values in the object by key.
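
For example, using the payload shown above, you can look up a single endpoint URL by its key:

>>> json_response = response.json()
>>> json_response['emojis_url']
'https://api.github.com/emojis'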

You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.

Headers

The response headers can give you useful information, such as the content type of the response payload and a time limit on how long to cache the response. To view these headers, access .headers:

>>> response.headers
{'Server': 'GitHub.com', 'Date': 'Mon, 10 Dec 2018 17:49:54 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Status': '200 OK', 'X-RateLimit-Limit': '60', 'X-RateLimit-Remaining': '59', 'X-RateLimit-Reset': '1544467794', 'Cache-Control': 'public, max-age=60, s-maxage=60', 'Vary': 'Accept', 'ETag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'X-GitHub-Media-Type': 'github.v3; format=json', 'Access-Control-Expose-Headers': 'ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type', 'Access-Control-Allow-Origin': '*', 'Strict-Transport-Security': 'max-age=31536000; includeSubdomains; preload', 'X-Frame-Options': 'deny', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Referrer-Policy': 'origin-when-cross-origin, strict-origin-when-cross-origin', 'Content-Security-Policy': "default-src 'none'", 'Content-Encoding': 'gzip', 'X-GitHub-Request-Id': 'E439:4581:CF2351:1CA3E06:5C0EA741'}

.headers returns a dictionary-like object, allowing you to access header values by key. For example, to see the content type of the response payload, you can access Content-Type:

>>> response.headers['Content-Type']
'application/json; charset=utf-8'

There is something special about this dictionary-like headers object, though. The HTTP spec defines headers to be case-insensitive, which means we are able to access these headers without worrying about their capitalization:

>>> response.headers['content-type']
'application/json; charset=utf-8'

Whether you use the key 'content-type' or 'Content-Type', you’ll get the same value.

Now, you’ve learned the basics about Response. You’ve seen its most useful attributes and methods in action. Let’s take a step back and see how your responses change when you customize your GET requests.

Query String Parameters

One common way to customize a GET request is to pass values through query string parameters in the URL. To do this using get(), you pass data to params. For example, you can use GitHub’s Search API to look for the requests library:

import requests

# Search GitHub's repositories for requests
response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
)

# Inspect some attributes of the `requests` repository
json_response = response.json()
repository = json_response['items'][0]
print(f'Repository name: {repository["name"]}')  # Python 3.6+
print(f'Repository description: {repository["description"]}')  # Python 3.6+

By passing the dictionary {'q': 'requests+language:python'} to the params parameter of .get(), you are able to modify the results that come back from the Search API.

You can pass params to get() in the form of a dictionary, as you have just done, or as a list of tuples:

>>> requests.get(
...     'https://api.github.com/search/repositories',
...     params=[('q', 'requests+language:python')],
... )
<Response [200]>

You can even pass the values as bytes:

>>> requests.get(
...     'https://api.github.com/search/repositories',
...     params=b'q=requests+language:python',
... )
<Response [200]>

Query strings are useful for parameterizing GET requests. You can also customize your requests by adding or modifying the headers you send.

Request Headers

To customize headers, you pass a dictionary of HTTP headers to get() using the headers parameter. For example, you can change your previous search request to highlight matching search terms in the results by specifying the text-match media type in the Accept header:

import requests

response = requests.get(
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
    headers={'Accept': 'application/vnd.github.v3.text-match+json'},
)

# View the new `text-matches` array which provides information
# about your search term within the results
json_response = response.json()
repository = json_response['items'][0]
print(f'Text matches: {repository["text_matches"]}')

The Accept header tells the server what content types your application can handle. In this case, since you’re expecting the matching search terms to be highlighted, you’re using the header value application/vnd.github.v3.text-match+json, which is a proprietary GitHub Accept header where the content is a special JSON format.

Before you learn more ways to customize requests, let’s broaden the horizon by exploring other HTTP methods.

Other HTTP Methods

Aside from GET, other popular HTTP methods include POST, PUT, DELETE, HEAD, PATCH, and OPTIONS. requests provides a method, with a similar signature to get(), for each of these HTTP methods:

>>> requests.post('https://httpbin.org/post', data={'key':'value'})
>>> requests.put('https://httpbin.org/put', data={'key':'value'})
>>> requests.delete('https://httpbin.org/delete')
>>> requests.head('https://httpbin.org/get')
>>> requests.patch('https://httpbin.org/patch', data={'key':'value'})
>>> requests.options('https://httpbin.org/get')

Each function call makes a request to the httpbin service using the corresponding HTTP method. For each method, you can inspect the response in the same way you did before:

>>> response = requests.head('https://httpbin.org/get')
>>> response.headers['Content-Type']
'application/json'

>>> response = requests.delete('https://httpbin.org/delete')
>>> json_response = response.json()
>>> json_response['args']
{}

Headers, response bodies, status codes, and more are returned in the Response for each method. Next you’ll take a closer look at the POST, PUT, and PATCH methods and learn how they differ from the other request types.

The Message Body

According to the HTTP specification, POST, PUT, and the less common PATCH requests pass their data through the message body rather than through parameters in the query string. Using requests, you’ll pass the payload to the corresponding function’s data parameter.

data takes a dictionary, a list of tuples, bytes, or a file-like object. You’ll want to adapt the data you send in the body of your request to the specific needs of the service you’re interacting with.

For example, if your request’s content type is application/x-www-form-urlencoded, you can send the form data as a dictionary:

>>> requests.post('https://httpbin.org/post', data={'key':'value'})
<Response [200]>

You can also send that same data as a list of tuples:

>>> requests.post('https://httpbin.org/post', data=[('key', 'value')])
<Response [200]>
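
data also accepts a file-like object, which lets you stream the request body straight from disk. Here’s a sketch that assumes a local file named data.txt exists:

>>> with open('data.txt', 'rb') as f:
...     requests.post('https://httpbin.org/post', data=f)
...
<Response [200]>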

If, however, you need to send JSON data, you can use the json parameter. When you pass JSON data via json, requests will serialize your data and add the correct Content-Type header for you.

httpbin.org is a great resource created by the author of requests, Kenneth Reitz. It’s a service that accepts test requests and responds with data about the requests. For instance, you can use it to inspect a basic POST request:

>>> response = requests.post('https://httpbin.org/post', json={'key':'value'})
>>> json_response = response.json()
>>> json_response['data']
'{"key": "value"}'
>>> json_response['headers']['Content-Type']
'application/json'

You can see from the response that the server received your request data and headers as you sent them. requests also provides this information to you in the form of a PreparedRequest.

Inspecting Your Request

When you make a request, the requests library prepares the request before actually sending it to the destination server. Request preparation includes things like validating headers and serializing JSON content.

You can view the PreparedRequest by accessing .request:

>>> response = requests.post('https://httpbin.org/post', json={'key':'value'})
>>> response.request.headers['Content-Type']
'application/json'
>>> response.request.url
'https://httpbin.org/post'
>>> response.request.body
b'{"key": "value"}'

Inspecting the PreparedRequest gives you access to all kinds of information about the request being made such as payload, URL, headers, authentication, and more.

So far, you’ve made a lot of different kinds of requests, but they’ve all had one thing in common: they’re unauthenticated requests to public APIs. Many services you may come across will want you to authenticate in some way.

Authentication

Authentication helps a service understand who you are. Typically, you provide your credentials to a server by passing data through the Authorization header or a custom header defined by the service. All the request functions you’ve seen to this point provide a parameter called auth, which allows you to pass your credentials.

One example of an API that requires authentication is GitHub’s Authenticated User API. This endpoint provides information about the authenticated user’s profile. To make a request to the Authenticated User API, you can pass your GitHub username and password in a tuple to get():

>>> from getpass import getpass
>>> requests.get('https://api.github.com/user', auth=('username', getpass()))
<Response [200]>

The request succeeds if the credentials you pass in the tuple to auth are valid. If you try to make this request with no credentials, you’ll see that the status code is 401 Unauthorized:

>>> requests.get('https://api.github.com/user')
<Response [401]>

When you pass your username and password in a tuple to the auth parameter, requests is applying the credentials using HTTP’s Basic access authentication scheme under the hood.

Therefore, you could make the same request by passing explicit Basic authentication credentials using HTTPBasicAuth:

>>> from requests.auth import HTTPBasicAuth
>>> from getpass import getpass
>>> requests.get(
...     'https://api.github.com/user',
...     auth=HTTPBasicAuth('username', getpass())
... )
<Response [200]>

Though you don’t need to be explicit for Basic authentication, you may want to authenticate using another method. requests provides other methods of authentication out of the box such as HTTPDigestAuth and HTTPProxyAuth.
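
For example, httpbin exposes a Digest-protected endpoint you can experiment with. Here’s a quick sketch, where the username and password are simply whatever you embed in the URL:

>>> from requests.auth import HTTPDigestAuth
>>> requests.get(
...     'https://httpbin.org/digest-auth/auth/user/pass',
...     auth=HTTPDigestAuth('user', 'pass')
... )
<Response [200]>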

You can even supply your own authentication mechanism. To do so, you must first create a subclass of AuthBase. Then, you implement __call__():

import requests
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    """Implements a custom authentication scheme."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        """Attach an API token to a custom auth header."""
        r.headers['X-TokenAuth'] = f'{self.token}'  # Python 3.6+
        return r


requests.get('https://httpbin.org/get', auth=TokenAuth('12345abcde-token'))

Here, your custom TokenAuth mechanism receives a token, then includes that token in the X-TokenAuth header of your request.

Bad authentication mechanisms can lead to security vulnerabilities, so unless a service requires a custom authentication mechanism for some reason, you’ll always want to use a tried-and-true auth scheme like Basic or OAuth.

While you’re thinking about security, let’s consider dealing with SSL Certificates using requests.

SSL Certificate Verification

Any time the data you are trying to send or receive is sensitive, security is important. The way that you communicate with secure sites over HTTP is by establishing an encrypted connection using SSL, which means that verifying the target server’s SSL Certificate is critical.

The good news is that requests does this for you by default. However, there are some cases where you might want to change this behavior.

If you want to disable SSL Certificate verification, you pass False to the verify parameter of the request function:

>>> requests.get('https://api.github.com', verify=False)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning)
<Response [200]>

requests even warns you when you’re making an insecure request to help you keep your data safe!

Note: requests uses a package called certifi to provide Certificate Authorities. This lets requests know which authorities it can trust. Therefore, you should update certifi frequently to keep your connections as secure as possible.
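
Upgrading certifi takes a single pip command:

$ pip install --upgrade certifi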

Performance

When using requests, especially in a production application environment, it’s important to consider performance implications. Features like timeout control, sessions, and retry limits can help you keep your application running smoothly.

Timeouts

When you make an inline request to an external service, your system will need to wait upon the response before moving on. If your application waits too long for that response, requests to your service could back up, your user experience could suffer, or your background jobs could hang.

By default, requests will wait indefinitely on the response, so you should almost always specify a timeout duration to prevent these things from happening. To set the request’s timeout, use the timeout parameter. timeout can be an integer or float representing the number of seconds to wait on a response before timing out:

>>> requests.get('https://api.github.com', timeout=1)
<Response [200]>
>>> requests.get('https://api.github.com', timeout=3.05)
<Response [200]>

In the first request, the request will timeout after 1 second. In the second request, the request will timeout after 3.05 seconds.

You can also pass a tuple to timeout with the first element being a connect timeout (the time it allows for the client to establish a connection to the server), and the second being a read timeout (the time it will wait on a response once your client has established a connection):

>>> requests.get('https://api.github.com', timeout=(2, 5))
<Response [200]>

If the request establishes a connection within 2 seconds and receives data within 5 seconds of the connection being established, then the response will be returned as it was before. If the request times out, then the function will raise a Timeout exception:

import requests
from requests.exceptions import Timeout

try:
    response = requests.get('https://api.github.com', timeout=1)
except Timeout:
    print('The request timed out')
else:
    print('The request did not time out')

Your program can catch the Timeout exception and respond accordingly.

The Session Object

Until now, you’ve been dealing with high-level requests APIs such as get() and post(). These functions are abstractions of what’s going on when you make your requests. They hide implementation details, such as how connections are managed, so that you don’t have to worry about them.

Underneath those abstractions is a class called Session. If you need to fine-tune your control over how requests are being made or improve the performance of your requests, you may need to use a Session instance directly.

Sessions are used to persist parameters across requests. For example, if you want to use the same authentication across multiple requests, you could use a session:

import requests
from getpass import getpass

# By using a context manager, you can ensure the resources used by
# the session will be released after use
with requests.Session() as session:
    session.auth = ('username', getpass())

    # Instead of requests.get(), you'll use session.get()
    response = session.get('https://api.github.com/user')

# You can inspect the response just like you did before
print(response.headers)
print(response.json())

Once the session has been initialized with authentication credentials, those credentials will persist across every request you make with the session.

The primary performance optimization of sessions comes in the form of persistent connections. When your app makes a connection to a server using a Session, it keeps that connection around in a connection pool. When your app wants to connect to the same server again, it will reuse a connection from the pool rather than establishing a new one.
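
As a sketch, both requests below target the same host, so the second one can reuse the pooled connection instead of opening a new one:

import requests

with requests.Session() as session:
    # The first request opens a connection to api.github.com
    session.get('https://api.github.com')
    # The second request reuses that pooled connection
    session.get('https://api.github.com/events')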

Max Retries

When a request fails, you may want your application to retry the same request. However, requests will not do this for you by default. To apply this functionality, you need to implement a custom Transport Adapter.

Transport Adapters let you define a set of configurations per service you’re interacting with. For example, let’s say you want all requests to https://api.github.com to retry three times before finally raising a ConnectionError. You would build a Transport Adapter, set its max_retries parameter, and mount it to an existing Session:

import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectionError

github_adapter = HTTPAdapter(max_retries=3)

session = requests.Session()

# Use `github_adapter` for all requests to endpoints that start with this URL
session.mount('https://api.github.com', github_adapter)

try:
    session.get('https://api.github.com')
except ConnectionError as ce:
    print(ce)

When you mount the HTTPAdapter, github_adapter, to session, session will adhere to its configuration for each request to https://api.github.com.

Timeouts, Transport Adapters, and sessions are for keeping your code efficient and your application resilient.

Conclusion

You’ve come a long way in learning about Python’s powerful requests library.

You’re now able to:

Make requests using a variety of HTTP methods
Customize your requests’ headers, query strings, and message bodies
Inspect your requests and responses
Make authenticated requests
Tune your requests for performance with timeouts, sessions, and transport adapters

Because you learned how to use requests, you’re equipped to explore the wide world of web services and build awesome applications using the fascinating data they provide.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

January 23, 2019 06:00 AM UTC

January 22, 2019


PyCoder’s Weekly

Issue #352 (Jan. 22, 2019)

#352 – JANUARY 22, 2019



Test-Drive Programming Fonts in Your Browser

This is my new favorite thing. You can test-drive various coding fonts directly inside your browser, without cluttering up your local OS install. I’m still a fan of Luxi Mono :)
PROGRAMMINGFONTS.ORG

Async IO in Python: A Complete Walkthrough

This tutorial will give you a firm grasp of Python’s approach to async IO, which is a concurrent programming design that has received dedicated support in Python, evolving rapidly from Python 3.4 through 3.7 (and probably beyond).
REAL PYTHON

Become a Python Guru With PyCharm


PyCharm is the Python IDE for Professional Developers by JetBrains providing a complete set of tools for productive Python, Web and scientific development. Be more productive and save time while PyCharm takes care of the routine.
JETBRAINS sponsor

Pyxel, Python’s Most Underrated Game Engine for Beginners

Nice little tutorial on building a Pong game with Pyxel, the Python retro game engine. This could be a fun project if you’re looking to get into game development with Python or if you’re trying to get your kids interested in programming.
VOMKONSTANT.IN

Python Quirks: Lambdas

Details the history and shortcomings of lambdas in Python. Also includes some patterns and corresponding benchmarks.
PHILIP TRAUNER

Regression With Keras

Detailed tutorial on how to perform regression using Keras. Learn how to train a Keras neural network for regression and continuous value prediction, specifically in the context of house price prediction.
ADRIAN ROSEBROCK

How I Built a Python Web Framework and Became an Open Source Maintainer

Inspirational thoughts and tips on starting and managing an open source project, based on the author’s experience building Bocadillo, an asynchronous Python web framework.
FLORIMOND MANCA • Shared by Python Bytes FM

Python Lands on the Windows 10 App Store

Python 3.7 can now be installed via the Windows app store. Assuming this works as intended, this has the potential to make setting up Python much easier for beginners and teachers alike.
MEHEDI HASSAN • Shared by Python Bytes FM

Speeding Up Python With Nim

How to create Python extension modules with Nim and NimPy to speed up the execution of code that runs slowly in CPython. This is similar to writing a C extension (Nim compiles to C) but writing correct Nim code might be easier than writing the equivalent C code directly. Nice writeup.
ROBERT MCDERMOTT

NumPy 1.16.0 Is the Last Release to Support Python 2.7

GITHUB.COM/NUMPY

Discussions

PSF Request for Proposals for Paid Contract Work on PyPI Closes on January 31st

If you’re interested in implementing important Security, Accessibility, and Localization features for PyPI, reach out to the PSF.
TWITTER.COM/THEPSF

Potential Suspension of Django Channels Development

Andrew is looking for contributors: “I’ve been the sole maintainer of these projects for quite a while and it has become unsustainable - all of my energy is taken up fielding issues and support requests and I haven’t been able to even get myself to start looking at Django async stuff because of it.”
ANDREW GODWIN

Useful Python Libraries/Modules You Should Know?

REDDIT

Python Jobs

Senior Engineer Python & More (Winterthur, Switzerland)

DEEP IMPACT AG

Sr Enterprise Python Developer (Toronto, Canada)

Kognitiv

Senior Software Engineer (Santa Monica, CA)

GoodRX

Python Software Engineers (Palo Alto, CA)

Rhythm Diagnostic Systems, Inc

Senior Python Developer (Vienna, Austria)

Adverity GmbH

Python Tutorial Authors Wanted (100% Remote)

Real Python

More Python Jobs >>>

Articles & Tutorials

Speeding Up Your Python Tests Feedback Loop

Working with a dog-slow test suite is frustrating. Itamar’s article gives you some practical tips you can use to speed up your testing feedback loop. This is where I learned about pytest-testmon.
ITAMAR TURNER-TRAURING

A Letter to the Python Community in Africa

“Over the last 12 months, I’ve been enraptured with the creativity and passion of the Python community across the African continent and want to use this letter to explore and share some of the amazing things that are happening.”
ANTHONY SHAW

Python Tricks: A Buffet of Awesome Python Features


Discover Python’s best practices with simple examples and start writing even more beautiful + Pythonic code. “Python Tricks: The Book” shows you exactly how. You’ll master intermediate and advanced-level features in Python with practical examples and a clear narrative. Get the book + video bundle 33% off →
DAN BADER sponsor

Demystifying @decorators in Python

An introduction to decorators in Python.
SUMIT GHOSH

PSF Announces Fellow Members for Q4 2018

Congratulations!
PSF BLOG

Local Web Development vs Vagrant vs Docker

Web development is full of tools that claim to help you develop your perfect application. What’s the right tool? This post explores options like Docker, Vagrant, and honcho to see which tool can work for you on your next (or current) web app.
MATT LAYMAN • Shared by Matt Layman

Python and Finance: An Introductory Programming Tutorial

“Python provides many advantages over the traditionally popular VBA scripts for finance professionals looking to automate and enhance their work processes. This article explores how to use Python and finance together via a practical step-by-step tutorial.”
STEFAN THELIN

Top Seven Apps Built With Python

Find out why giants like Instagram and Spotify chose to run on Python.
DJANGOSTARS.COM

Working With Files in Python

In this tutorial, you’ll learn how you can work with files in Python by using built-in modules to perform practical tasks that involve groups of files, like renaming them, moving them around, archiving them, and getting their metadata.
REAL PYTHON

Python 2 Is Ending, We Need to Move to Python 3

“Open edX has nearly a million lines of Python code, and they all have to run on Python 3 by the end of the year. Much of the work is not hard, it’s just extensive, and can’t all be done automatically.”
NED BATCHELDER

Clean Architectures in Python

A pay-what-you-want book on software architecture.
LEONARDO GIORDANI book

Django Transaction Tests in Pytest

How to get the automatic rollback behavior of Django’s TransactionTestCase with Pytest.
ELENI LIXOURIOTI

Recursion vs Looping in Python

ETHAN JARRELL

Big O Notation and Algorithm Analysis With Python Examples

USMAN MALIK

Projects & Code

pytest-testmon: Taking TDD to a New Level With Testmon and Pytest for Python

A Pytest plug-in which automatically selects and re-executes only tests affected by recent changes.
GITHUB.COM/TARPAS

Django 2.2 Alpha 1 Released

New features and changes in the upcoming Django 2.2 can be found in the changelog.
DJANGOPROJECT.COM

Counteracting Code Complexity With Wily

Code complexity can reduce the maintainability of software projects. Wily is a Python tool to track and report on software complexity metrics to guide refactoring.
PODCASTINIT.COM podcast

awesome-flake8-extensions: Curated List of Flake8 Extensions

GITHUB.COM/DMYTROLITVINOV • Shared by Python Bytes FM

Mypy 0.660 Released

MYPY-LANG.BLOGSPOT.COM

nimpy: Nim → Python Bridge

GITHUB.COM/YGLUKHOV

mps-youtube: Terminal Based YouTube Player and Downloader

GITHUB.COM/MPS-YOUTUBE

Counting the Number of People Around You by Monitoring Wifi Signals

GITHUB.COM/SCHOLLZ

deep-learning-ocean: All You Need to Know About Deep Learning

Extensive list of learning resources.
GITHUB.COM/OSFORSCIENCE

drymail: Minimalist Email Framework for Python 3

GITHUB.COM/SKULLTECH

django-mfa2: Multi-Factor Authentication for Django

PYPI.ORG

Events

PyCon PH

February 23–24, 2019 in Makati, Philippines
PYTHON.PH • Shared by Angelica Lapastora

PythOnRio Meetup

January 26, 2019
PYTHON.ORG.BR

PyCoffee Porto Meetup

January 27, 2019
MEETUP.COM

Inland Empire Pyladies (CA, USA)

January 28, 2019
MEETUP.COM

Python Sheffield

January 29, 2019
GOOGLE.COM

Heidelberg Python Meetup

January 30, 2019
MEETUP.COM

PiterPy Breakfast

January 30, 2019
TIMEPAD.RU

SPb Python Drinkup

January 31, 2019
MEETUP.COM


Happy Pythoning!
This was PyCoder’s Weekly Issue #352.


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

January 22, 2019 08:30 PM UTC


Mike Driscoll

Sample Chapters from Creating wxPython Applications Book

The Kickstarter campaign for my latest book has been going quite well, so I thought it would be fun to share some sample chapters of the book with you. You can check out the first couple of chapters here as a PDF.

I have also been doing some experiments with some of the ideas that were suggested for additional chapters of the book as part of the Kickstarter’s stretch goals. I haven’t made any concrete decisions as of yet, but I do think that interacting with the NASA website’s API sounds fun and appears easy to do as well.

I will research the feasibility of the other ideas too.

Thanks so much for your support!

January 22, 2019 08:24 PM UTC


Codementor

AI introduction in healthcare

Lately, AI systems have sent colossal waves across healthcare, even reshaping how it operates. But the question remains whether AI experts will over the long haul...

January 22, 2019 03:49 PM UTC


codingdirectional

Create an animated image feature for the video editing application

It seems like there are lots of people reading about this video editing project and sharing it with their friends, which has really encouraged me to continue developing it. A few of you have asked me to include some new features in this application, and I will do that, but for now do allow me to tidy up the user interface for this project first before we proceed further.

In this chapter, we will create the input boxes that receive the time values which will then be used to extract an animated image from a video. The user enters the starting time at which the program should begin recording the animated image, and the total number of seconds the animated image should play from that starting time. Besides that, I have also included a counter which is used to name the new video or animated image file. If you have entered valid digits inside the time input boxes under the animated image part of the application, then only the animated image with the selected size will be created; otherwise, the new video will be created as usual. Below is the entire program.

from tkinter import *
from tkinter import filedialog
import os
import subprocess
import tkinter.ttk as tk

win = Tk() # Create tk instance
win.title("NeW Vid") # Add a title
win.resizable(0, 0) # Disable resizing the GUI
win.configure(background='white') # change background color

mainframe = Frame(win) # create a frame
mainframe.pack()

eqFrame = Frame(win) # create eq frame
eqFrame.pack(side = TOP, fill=X)

animatedFrame = Frame(win) # create animated frame
animatedFrame.pack(side = TOP, fill=X)

buttonFrame = Frame(win) # create a button frame
buttonFrame.pack(side = BOTTOM, fill=X, pady = 6)

# Create a label and scale box for eq
contrast_variable = DoubleVar()
contrast = Scale(eqFrame, from_=float(-2.00), to=float(2.00), orient=HORIZONTAL, label="CONTRAST", digits=3, resolution=0.01, variable=contrast_variable)
contrast.set(1)
contrast.pack(side = LEFT)
brightness_variable = DoubleVar()
brightness = Scale(eqFrame, from_=float(-1.00), to=float(1.00), orient=HORIZONTAL, label="BRIGHTNESS", digits=3, resolution=0.01, variable=brightness_variable)
brightness.pack(side = LEFT)
saturation_variable = DoubleVar()
saturation = Scale(eqFrame, from_=float(0.00), to=float(3.00), orient=HORIZONTAL, label="SATURATION", digits=3, resolution=0.01, variable=saturation_variable)
saturation.set(1)
saturation.pack(side = LEFT)
gamma_variable = DoubleVar()
gamma = Scale(eqFrame, from_=float(0.10), to=float(10.00), orient=HORIZONTAL, label="GAMMA", digits=4, resolution=0.01, variable=gamma_variable)
gamma.set(1)
gamma.pack(side = LEFT)
loop_variable = DoubleVar()
loop = Scale(eqFrame, from_=float(0), to=float(10), orient=HORIZONTAL, label="REPEAT", digits=2, resolution=1, variable=loop_variable)
loop.pack(side = LEFT)
fr_variable = DoubleVar()
fr = Scale(eqFrame, from_=float(9), to=float(60), orient=HORIZONTAL, label="FPS", digits=2, resolution=1, variable=fr_variable)
fr.set(24)
fr.pack(side = LEFT)

#create animated gif
anime = Label(animatedFrame, text="Create Animated Image from Video   ")
anime.pack(side = TOP)
anime.pack(side = LEFT)

from_ = Label(animatedFrame, text="Start From (hour : minute : second)  ")
from_.pack(side = BOTTOM)
from_.pack(side = LEFT)
from_t_h_varable = StringVar()
from_t_h = Entry(animatedFrame, width=3, textvariable=from_t_h_varable)
from_t_h.pack(side=BOTTOM)
from_t_h.pack(side=LEFT)
from_m = Label(animatedFrame, text=" : ")
from_m.pack(side = BOTTOM)
from_m.pack(side = LEFT)
from_t_m_varable = StringVar()
from_t_m = Entry(animatedFrame, width=3,textvariable=from_t_m_varable)
from_t_m.pack(side=BOTTOM)
from_t_m.pack(side=LEFT)
from_s = Label(animatedFrame, text=" : ")
from_s.pack(side = BOTTOM)
from_s.pack(side = LEFT)
from_t_s_varable = StringVar()
from_t_s = Entry(animatedFrame, width=3,textvariable=from_t_s_varable)
from_t_s.pack(side=BOTTOM)
from_t_s.pack(side=LEFT)

to_ = Label(animatedFrame, text="  To (in second)  ")
to_.pack(side = BOTTOM)
to_.pack(side = LEFT)
#to_t_h_varable = StringVar()
#to_t_h = Entry(animatedFrame, width=3,textvariable=to_t_h_varable)
#to_t_h.pack(side=BOTTOM)
#to_t_h.pack(side=LEFT)
#to_m = Label(animatedFrame, text=" : ")
#to_m.pack(side = BOTTOM)
#to_m.pack(side = LEFT)
#to_t_m_varable = StringVar()
#to_t_m = Entry(animatedFrame, width=3,textvariable=to_t_m_varable)
#to_t_m.pack(side=BOTTOM)
#to_t_m.pack(side=LEFT)
#to_s = Label(animatedFrame, text=" : ")
#to_s.pack(side = BOTTOM)
#to_s.pack(side = LEFT)
to_t_s_varable = StringVar()
to_t_s = Entry(animatedFrame, width=3,textvariable=to_t_s_varable)
to_t_s.pack(side=BOTTOM)
to_t_s.pack(side=LEFT)

# Create a combo box
vid_size = StringVar() # create a string variable
preferSize = tk.Combobox(mainframe, textvariable=vid_size) 
preferSize['values'] = (1920, 1280, 854, 640) # video width in pixels
preferSize.current(0) # select item one 
preferSize.pack(side = LEFT)

# Create a combo box
vid_format = StringVar() # create a string variable
preferFormat = tk.Combobox(mainframe, textvariable=vid_format) 
preferFormat['values'] = ('.mp4', '.webm', '.avi', '.wmv', '.mpg', '.ogv') # video format
preferFormat.current(0) # select item one 
preferFormat.pack(side = LEFT)

removeAudioVal = IntVar()
removeAudio = tk.Checkbutton(mainframe, text="Remove Audio", variable=removeAudioVal)
removeAudio.pack(side = LEFT, padx=3)

newAudio = IntVar()
aNewAudio = tk.Checkbutton(mainframe, text="New Audio", variable=newAudio)
aNewAudio.pack(side = LEFT, padx=2)

count = 0 # counter used to create unique names for multiple output videos

# Open a video file
def openVideo():
        
        fullfilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Video file", "*.mp4; *.avi ")]) # select a video file from the hard drive
        audiofilename = ''
        if(newAudio.get() == 1):
                audiofilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Audio file", "*.wav; *.ogg ")]) # select a new audio file from the hard drive
            
        if(fullfilename != ''): 
                global count # access the global count variable
                scale_vid = preferSize.get() # retrieve value from the combo box
                new_size = str(scale_vid)
                dir_path = os.path.dirname(os.path.realpath(fullfilename))

                file_extension = fullfilename.split('.')[-1] # extract the video format from the original video

                os.chdir(dir_path) # change the directory to the original file's directory

                f = '_new_vid_' + new_size  + '.' + file_extension # the new output file name
                f2 = str(count)+f # second video
                f_gif = str(count) + f + '.gif' # create animated gif

                count += 1 # increase video counter for new video

                # create animated image from video
                animi_from_hour = from_t_h_varable.get()
                animi_from_minute = from_t_m_varable.get()
                animi_from_second = from_t_s_varable.get()

                #animi_to_hour = to_t_h_varable.get()
                #animi_to_minute = to_t_m_varable.get()
                animi_to_second = to_t_s_varable.get()

                # the animated gif is created only if all the time fields are non-empty and numeric
                if((animi_from_hour != '' and animi_from_hour.isdigit()) and (animi_from_minute != '' and animi_from_minute.isdigit()) and (animi_from_second != '' and animi_from_second.isdigit()) and (animi_to_second != '' and animi_to_second.isdigit())):
                        subprocess.call(['ffmpeg', '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', f]) # resize video
                        subprocess.call(['ffmpeg', '-i', f, '-ss', animi_from_hour + ':' + animi_from_minute + ':' + animi_from_second, '-t', animi_to_second, '-y', f_gif]) # create animated gif starting at the given time, lasting the given number of seconds
                        os.remove(f)
                        return 0

                # video editing part start here
                noAudio = removeAudioVal.get() # get the checkbox state for audio 

                subprocess.call(['ffmpeg', '-stream_loop', str(loop_variable.get()), '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', '-r', str(fr_variable.get()), f]) # resize the video, set the frame rate, and loop it with ffmpeg
               
                if(noAudio == 1):
                        subprocess.call(['ffmpeg', '-i', f, '-c', 'copy', '-y', '-an', f2]) # remove audio from the original video
                
                if(audiofilename != '' and noAudio == 1 and newAudio.get() == 1):
                        subprocess.call(['ffmpeg', '-i', f2, '-i', audiofilename, '-shortest', '-c:v', 'copy', '-c:a', 'aac', '-b:a', '256k', '-y', f]) # add the new audio to the video; -shortest trims whichever of the audio or video is longer

                subprocess.call(['ffmpeg', '-i', f, '-vf', 'eq=contrast=' + str(contrast_variable.get()) +':brightness='+ str(brightness_variable.get()) +':saturation=' + str(saturation_variable.get()) +':gamma='+ str(gamma_variable.get()), '-y', f2]) # adjust the saturation, gamma, contrast and brightness of video
                f3 = f + vid_format.get() # The final video format

                if(f3.split('.')[-1] != f2.split('.')[-1]):
                        subprocess.call(['ffmpeg', '-i', f2, '-y', f3]) # convert the video to the selected format with ffmpeg
                        os.remove(f2) # remove the two intermediate videos
                        os.remove(f)
                else:
                        os.remove(f) # remove a video

action_vid = tk.Button(buttonFrame, text="Open Video", command=openVideo)
action_vid.pack(fill=X)

win.mainloop()

Below is the new user interface of the video editing software.

The animated image user interface

Here is the animated image from part of the wild life video.

http://islandstropicalman.tumblr.com/post/182216728822/animated-gif

The GIF file created from a video can be very large, so keep the duration of the animated image as short as possible.
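
One common way to shrink ffmpeg-generated GIFs considerably is the two-pass palette technique, in which a custom 256-color palette is generated first and then applied. Here is a minimal sketch of that idea in the same subprocess style as the program above; the file names are placeholders, not part of the original program:

import subprocess

# Pass 1: generate an optimized 256-color palette from the source video.
subprocess.call(['ffmpeg', '-i', 'input.mp4', '-vf', 'palettegen', '-y', 'palette.png'])

# Pass 2: build the GIF using that palette instead of the default one.
subprocess.call(['ffmpeg', '-i', 'input.mp4', '-i', 'palette.png',
                 '-filter_complex', '[0:v][1:v]paletteuse', '-y', 'output.gif'])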

January 22, 2019 01:14 PM UTC


Python Software Foundation

Python Software Foundation Fellow Members for Q4 2018

It's a new year and we are happy to announce our newest PSF Fellow Members!

Elana Hashman 
Github, Twitter, Blog

Alexander Hendorf 

Zachary Ware 
Github

Jeff Triplett
Github, Twitter, Website


Congratulations! Thank you for your continued contributions. We have added you to our Fellow roster online.

The above members have contributed to the Python ecosystem by maintaining popular libraries/tools, organizing Python events, contributing to CPython, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.

Let's continue to recognize Pythonistas all over the world for their impact on our community. Here's the criteria our Work Group uses to review nominations:

  • For those who have served the Python community by creating and/or maintaining various engineering/design contributions, the following statement should be true:
    • Nominated Person has served the Python community by making available code, tests, documentation, or design, either in a Python implementation or in a Python ecosystem project, that 1) shows technical excellence, 2) is an example of software engineering principles and best practices, and 3) has achieved widespread usage or acclaim.
  • For those who have served the Python community by coordinating, organizing, teaching, writing, and evangelizing, the following statement should be true:
    • Nominated Person has served the Python community through extraordinary efforts in organizing Python events, publicly promoting Python, and teaching and coordinating others. Nominated Person's efforts have shown leadership and resulted in long-lasting and substantial gains in the number and quality of Python users, and have been widely recognized as being above and beyond normal volunteering.
  • If someone is not accepted to be a fellow in the quarter they were nominated for, they will remain an active nominee for 1 year for future consideration.
  • It is suggested/recommended that the nominee have wide Python community involvement. Examples would be (not a complete list - just examples):
    • Someone who has received a Community Service Award or Distinguished Service Award
    • A developer that writes (more than one) documentation/books/tutorials for wider audience
    • Someone that helps translate (more than one) documentation/books/tutorials for better inclusivity
    • An instructor that teaches Python related tutorials in various regions
    • Someone that helps organize local meet ups and also helps organize a regional conference
  • Nominees should be aware of the Python community’s Code of Conduct and should have a record of fostering the community.
  • Sitting members of the PSF Board of Directors can be nominated if they meet the above criteria.

If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. We are accepting nominations for quarter 1 through February 20, 2019. More information is available at: https://www.python.org/psf/fellows/.


January 22, 2019 10:55 AM UTC


Vasudev Ram

Factorial one-liner using reduce and mul for Python 2 and 3


- By Vasudev Ram - Online Python training / SQL training / Linux training


Hi, readers,

A couple of days ago, I wrote this post for computing factorials using the reduce and operator.mul functions:

Factorial function using Python's reduce function

A bit later I realized that it can be made into a Python one-liner. Here is the one-liner - it works in both Python 2 and Python 3:
$ py -2 -c "from __future__ import print_function; from functools 
import reduce; from operator import mul; print(list(reduce(mul,
range(1, fact_num + 1)) for fact_num in range(1, 11)))"
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

$ py -3 -c "from __future__ import print_function; from functools
import reduce; from operator import mul; print(list(reduce(mul,
range(1, fact_num + 1)) for fact_num in range(1, 11)))"
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]

(I've split the commands above across multiple lines to avoid truncation while viewing, but if trying them out, enter each of the above commands on a single line.)

A small but interesting point is that one of the imports is not needed in Python 2, and the other is not needed in Python 3:

- importing print_function is not needed in Py 3, because print is already a function there, not a statement - but it is not an error to import it. The import exists so that Py 2 code can use print as a function for forward compatibility with Py 3 code, ha ha.

- importing reduce is not needed in Py 2, because in Py 2, reduce is both a built-in and also available in the functools module - so importing it is not an error.

Because of the above two points, the same one-liner works in both Py 2 and Py 3.
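
For readers who find the one-liner dense, here is the same computation unrolled into a short script; it is functionally equivalent and also runs on both Python 2 and 3:

from __future__ import print_function
from functools import reduce
from operator import mul

def factorial(n):
    # reduce(mul, [1, 2, ..., n]) multiplies the sequence together.
    return reduce(mul, range(1, n + 1))

print([factorial(n) for n in range(1, 11)])
# [1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]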

Can you think of a similar Python one-liner that gives the same output as the above (and for both Py 2 and 3), but can work without one of the imports above (but by removing the same import for both Py 2 and 3)? If so, type it in a comment on the post.

py is The Python launcher for Windows.

Enjoy.


- Vasudev Ram - Online Python training and consulting

I conduct online courses on Python programming, Unix / Linux commands and shell scripting and SQL programming and database design, with course material and personal coaching sessions.

The course details and testimonials are here.

Contact me for details of course content, terms and schedule.


January 22, 2019 10:48 AM UTC


Codementor

How to Install Anaconda on ECS

By Arslan Ud Din Shafiq, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud...

January 22, 2019 07:03 AM UTC


Peter Bengtsson

variable_cache_control - Django view decorator to set max_age in runtime

If you use the `django.views.decorators.cache.cache_control` decorator, consider this one instead to change the `max_age` depending on the request.
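
The post itself is a short teaser. As a rough sketch of the general idea only (this is not the post's actual implementation; the decorator name and structure here are assumptions), such a decorator can compute max_age per request and apply it with Django's django.utils.cache.patch_cache_control:

from functools import wraps

from django.utils.cache import patch_cache_control


def variable_cache_control(get_max_age):
    """Like cache_control, but max_age is computed from the request at runtime."""
    def decorator(view_func):
        @wraps(view_func)
        def wrapper(request, *args, **kwargs):
            response = view_func(request, *args, **kwargs)
            max_age = get_max_age(request)  # hypothetical callable supplied by the caller
            if max_age is not None:
                patch_cache_control(response, max_age=max_age)
            return response
        return wrapper
    return decorator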

January 22, 2019 03:22 AM UTC


Zaki Akhmad

Django Post Idea

Wow, it’s already 2019! And I just skipped 2018 without even a single post! For this post, I just want to keep it short.

So far, I have two ideas on writing a new post. They are:

Let’s hope I can write this soon. See you!

January 22, 2019 02:20 AM UTC

January 21, 2019


Stack Abuse

Big O Notation and Algorithm Analysis with Python Examples

There are multiple ways to solve a problem using a computer program. For instance, there are several ways to sort items in an array. You can use merge sort, bubble sort, insertion sort, etc. All these algorithms have their own pros and cons. An algorithm can be thought of as a procedure or formula to solve a particular problem. The question is, which algorithm should you use to solve a specific problem when multiple solutions exist?

Algorithm analysis refers to the analysis of the complexity of different algorithms and finding the most efficient algorithm to solve the problem at hand. Big-O notation is a mathematical notation used to describe the complexity of an algorithm.

In this article, we will briefly review algorithm analysis and Big-O notation. We will see how Big-O notation can be used to find algorithm complexity with the help of different Python functions.

Why is Algorithm Analysis Important?

To understand why algorithm analysis is important, we will take help of a simple example.

Suppose a manager gives a task to two of his employees to design an algorithm in Python that calculates the factorial of a number entered by the user.

The algorithm developed by the first employee looks like this:

def fact(n):  
    product = 1
    for i in range(n):
        product = product * (i+1)
    return product

print (fact(5))  

Notice that the algorithm simply takes an integer as an argument. Inside the fact function a variable named product is initialized to 1. A loop executes from 1 to N and during each iteration, the value in the product is multiplied by the number being iterated by the loop and the result is stored in the product variable again. After the loop executes, the product variable will contain the factorial.

Similarly, the second employee also developed an algorithm that calculates factorial of a number. The second employee used a recursive function to calculate the factorial of a program as shown below:

def fact2(n):  
    if n == 0:
        return 1
    else:
        return n * fact2(n-1)

print (fact2(5))  

The manager has to decide which algorithm to use. To do so, he has to find the complexity of the algorithm. One way to do so is by finding the time required to execute the algorithms.

In a Jupyter notebook, you can use the %timeit magic command followed by the function call to find the time taken by the function to execute. Look at the following script:

%timeit fact(50)

Output:

9 µs ± 405 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  

The output says that the algorithm takes 9 microseconds (plus or minus 405 nanoseconds) per loop.

Similarly, execute the following script:

%timeit fact2(50)

Output:

15.7 µs ± 427 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  

The second algorithm involving recursion takes about 15.7 microseconds (plus or minus 427 nanoseconds) per loop.

The execution times show that the first algorithm is faster than the second algorithm involving recursion. This example shows the importance of algorithm analysis. In the case of large inputs, the performance difference can become more significant.
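
If you are not working in a Jupyter notebook, the standard library's timeit module gives you the same kind of measurement; here is a sketch (absolute timings will differ from machine to machine):

import timeit

# Both implementations are defined in the setup string and timed over 100,000 calls.
setup = '''
def fact(n):
    product = 1
    for i in range(n):
        product = product * (i + 1)
    return product

def fact2(n):
    return 1 if n == 0 else n * fact2(n - 1)
'''

print('iterative:', timeit.timeit('fact(50)', setup=setup, number=100000))
print('recursive:', timeit.timeit('fact2(50)', setup=setup, number=100000))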

However, execution time is not a good metric for measuring the complexity of an algorithm, since it depends on the hardware. A more objective complexity analysis metric is needed. This is where Big-O notation comes into play.

Algorithm Analysis with Big-O Notation

Big-O notation is a metric used to express algorithm complexity. It signifies the relationship between the size of the input to the algorithm and the steps required to execute the algorithm. It is denoted by a big "O" followed by an opening and closing parenthesis. Inside the parentheses, the relationship between the input and the steps taken by the algorithm is presented using "n".

For instance, if there is a linear relationship between the input and the steps taken by the algorithm to complete its execution, the Big-O notation used will be O(n). Similarly, the Big-O notation for quadratic functions is O(n^2).

The following are some of the most common Big-O functions:

Name         Big O
Constant     O(c)
Linear       O(n)
Quadratic    O(n^2)
Cubic        O(n^3)
Exponential  O(2^n)
Logarithmic  O(log(n))
Log Linear   O(n log(n))

To get an idea of how Big-O notation is calculated, let's take a look at some examples of constant, linear, and quadratic complexity.

Constant Complexity (O(C))

The complexity of an algorithm is said to be constant if the steps required to complete the execution of an algorithm remain constant, irrespective of the number of inputs. The constant complexity is denoted by O(c) where c can be any constant number.

Let's write a simple algorithm in Python that finds the square of the first item in the list and then prints it on the screen.

def constant_algo(items):
    result = items[0] * items[0]
    print(result)

constant_algo([4, 5, 6, 8])

In the above script, irrespective of the input size, or the number of items in the input list items, the algorithm performs only 2 steps: Finding the square of the first element and printing the result on the screen. Hence, the complexity remains constant.

If you draw a line plot with the varying size of the items input on the x-axis and the number of steps on the y-axis, you will get a straight line. To visualize this, execute the following script:

import matplotlib.pyplot as plt  
import numpy as np

x = [2, 4, 6, 8, 10, 12]

y = [2, 2, 2, 2, 2, 2]

plt.plot(x, y, 'b')  
plt.xlabel('Inputs')  
plt.ylabel('Steps')  
plt.title('Constant Complexity')  
plt.show()  

Output:

Linear Complexity (O(n))

The complexity of an algorithm is said to be linear if the steps required to complete the execution of an algorithm increase or decrease linearly with the number of inputs. Linear complexity is denoted by O(n).

In this example, let's write a simple program that displays all items in the list to the console:

def linear_algo(items):  
    for item in items:
        print(item)

linear_algo([4, 5, 6, 8])  

The complexity of the linear_algo function is linear in the above example since the number of iterations of the for-loop will be equal to the size of the input items array. For instance, if there are 4 items in the items list, the for-loop will be executed 4 times, and so on.

The plot for linear complexity, with inputs on the x-axis and the number of steps on the y-axis, is as follows:

import matplotlib.pyplot as plt  
import numpy as np

x = [2, 4, 6, 8, 10, 12]

y = [2, 4, 6, 8, 10, 12]

plt.plot(x, y, 'b')  
plt.xlabel('Inputs')  
plt.ylabel('Steps')  
plt.title('Linear Complexity')  
plt.show()  

Output:

Another point to note here is that in case of a huge number of inputs the constants become insignificant. For instance, take a look at the following script:

def linear_algo(items):  
    for item in items:
        print(item)

    for item in items:
        print(item)

linear_algo([4, 5, 6, 8])  

In the script above, there are two for-loops that iterate over the input items list, so the complexity of the algorithm becomes O(2n). However, as the input grows arbitrarily large, the constant factor becomes insignificant (twice infinity is still infinity), so we can ignore the constant 2 and the complexity of the algorithm remains O(n).

We can further verify and visualize this by plotting the inputs on x-axis and the number of steps on y-axis as shown below:

import matplotlib.pyplot as plt  
import numpy as np

x = [2, 4, 6, 8, 10, 12]

y = [4, 8, 12, 16, 20, 24]

plt.plot(x, y, 'b')  
plt.xlabel('Inputs')  
plt.ylabel('Steps')  
plt.title('Linear Complexity')  
plt.show()  

In the script above, y = 2n, yet the resulting plot is still a straight line, which shows that the growth remains linear:

Quadratic Complexity (O(n^2))

The complexity of an algorithm is said to be quadratic when the steps required to execute an algorithm are a quadratic function of the number of items in the input. Quadratic complexity is denoted as O(n^2). Take a look at the following example to see a function with quadratic complexity:

def quadratic_algo(items):  
    for item in items:
        for item2 in items:
            print(item, ' ', item2)

quadratic_algo([4, 5, 6, 8])  

In the script above, you can see that we have an outer loop that iterates through all the items in the input list and then a nested inner loop, which again iterates through all the items in the input list. The total number of steps performed is n * n, where n is the number of items in the input array.

The following graph plots the number of inputs vs the steps for an algorithm with quadratic complexity.
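
The original article shows this graph as an image only; a script in the same style as the earlier constant and linear examples (a reconstruction, not the article's own code) would look like this:

import matplotlib.pyplot as plt

x = [2, 4, 6, 8, 10, 12]
y = [n ** 2 for n in x]  # steps grow with the square of the input size

plt.plot(x, y, 'b')
plt.xlabel('Inputs')
plt.ylabel('Steps')
plt.title('Quadratic Complexity')
plt.show()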

Finding the Complexity of Complex Functions

In the previous examples, we saw that only one function was being performed on the input. What if multiple functions are being performed on the input? Take a look at the following example.

def complex_algo(items):

    for i in range(5):
        print ("Python is awesome")

    for item in items:
        print(item)

    for item in items:
        print(item)

    print("Big O")
    print("Big O")
    print("Big O")

complex_algo([4, 5, 6, 8])  

In the script above, several tasks are performed. First, a string is printed 5 times on the console using the print statement. Next, the input list is printed twice on the screen, and finally, another string is printed three times on the console. To find the complexity of such an algorithm, we need to break the algorithm code down into parts and try to find the complexity of the individual pieces.

Let's break our script down into individual parts. In the first part we have:

    for i in range(5):
        print ("Python is awesome")

The complexity of this part is O(5), since five constant steps are performed in this piece of code irrespective of the input.

Next, we have:

    for item in items:
        print(item)

We know the complexity of the above piece of code is O(n).

Similarly, the complexity of the following piece of code is also O(n):

    for item in items:
        print(item)

Finally, in the following piece of code, a string is printed three times, hence the complexity is O(3):

    print("Big O")
    print("Big O")
    print("Big O")

To find the overall complexity, we simply have to add these individual complexities. Let's do so:

O(5) + O(n) + O(n) + O(3)  

Simplifying the above, we get:

O(8) + O(2n)  

We said earlier that when the input (which has length n in this case) becomes extremely large, the constants become insignificant, i.e., twice or half of infinity is still infinity. Therefore, we can ignore the constants. The final complexity of the algorithm is O(n).

Worst vs Best Case Complexity

Usually, when someone asks you about the complexity of an algorithm, they are asking about the worst case complexity. To understand best case and worst case complexity, look at the following script:

def search_algo(num, items):
    for item in items:
        if item == num:
            return True
    return False

nums = [2, 4, 6, 8, 10]

print(search_algo(2, nums))  

In the script above, we have a function that takes a number and a list of numbers as input. It returns true if the passed number is found in the list of numbers, otherwise it returns false. If you search for 2 in the list, it will be found in the first comparison. This is the best case for the algorithm: the searched item is found at the first index checked. The best case complexity, in this case, is O(1). On the other hand, if you search for 10, it will be found at the last searched index. The algorithm will have to search through all the items in the list, hence the worst case complexity becomes O(n).

In addition to best and worst case complexity, you can also calculate the average complexity of an algorithm, which tells you "given a random input, what is the expected time complexity of the algorithm"?
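
For the linear search above, you can estimate the average case empirically by averaging the number of comparisons over every possible position of the target. Here is a small sketch of that idea:

def comparisons(num, items):
    # Count how many items are examined before the search stops.
    count = 0
    for item in items:
        count += 1
        if item == num:
            break
    return count

nums = [2, 4, 6, 8, 10]
average = sum(comparisons(num, nums) for num in nums) / len(nums)
print(average)  # 3.0, i.e. about n/2 comparisons on average, which is still O(n)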

Space Complexity

In addition to the time complexity, where you count the number of steps required to complete the execution of an algorithm, you can also find the space complexity, which refers to the amount of memory you need to allocate during the execution of a program.

Have a look at the following example:

def return_squares(n):
    square_list = []
    for num in n:
        square_list.append(num * num)

    return square_list

nums = [2, 4, 6, 8, 10]  
print(return_squares(nums))  

In the script above, the function accepts a list of integers and returns a list with the corresponding squares of integers. The algorithm has to allocate memory for the same number of items as in the input list. Therefore, the space complexity of the algorithm becomes O(n).
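
If the caller only needs to iterate over the squares one at a time rather than keep the whole list, rewriting the function as a generator reduces the space complexity to O(1), since only one value exists in memory at any moment. A sketch:

def yield_squares(n):
    for num in n:
        yield num * num  # one square at a time; no list is built

nums = [2, 4, 6, 8, 10]
for square in yield_squares(nums):
    print(square)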

Conclusion

The Big-O notation is the standard metric used to measure the complexity of an algorithm. In this article, we studied what Big-O notation is and how it can be used to measure the complexity of a variety of algorithms. We also studied different types of Big-O functions with the help of different Python examples. Finally, we briefly reviewed the worst and best case complexity along with the space complexity.

January 21, 2019 04:39 PM UTC


Filipe Saraiva

Call for Answers: Survey About Task Assignment

Professor Igor Steinmacher, from Northern Arizona University, is a prominent researcher on several social dynamics in open source communities, such as support for newcomers, gender bias, open sourcing proprietary software, and more. Some of his papers can be found on his website.

Currently, Prof. Igor is inviting mentors from open source communities to answer a survey about task assignment in projects. See below for a description of the survey, and take some time to answer the questions – the knowledge gained here could be very valuable for all of us.

Hello,

My name is Igor Steinmacher, and I am a professor at Northern Arizona University.

Along with some other researchers, we are currently studying the strategies that mentors use to assign tasks to newcomers to Free/Open Source projects. Your experience is very important to us, given the limited number of people that mentor or guide newcomers in FOSS projects.

You are, therefore, a perfect person to provide feedback for our research.

We would really appreciate it if you could spare about 5 minutes of your time to answer a brief survey about your experiences.

The survey is here: https://goo.gl/forms/qCzgoG3Uc4O0w9da2

I would like to emphasize that, if shared, your insights will play a prominent role in creating a better understanding of the strategies mentors use to assign tasks to newcomers, serving as input for heuristics and helping other mentors. Thank you very much in advance for your time, and please contact me if you have any questions.

Regards,

Igor Steinmacher

January 21, 2019 02:39 PM UTC


Podcast.__init__

Counteracting Code Complexity With Wily


Summary

As we build software projects, complexity and technical debt are bound to creep into our code. To counteract these tendencies it is necessary to calculate and track metrics that highlight areas of improvement so that they can be acted on. To aid in identifying areas of your application that are breeding grounds for incidental complexity Anthony Shaw created Wily. In this episode he explains how Wily traverses the history of your repository and computes code complexity metrics over time and how you can use that information to guide your refactoring efforts.

Preface

  • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
  • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • And to keep track of how your team is progressing on building new features and squashing bugs, you need a project management system designed by software engineers, for software engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. Podcast.__init__ listeners get 2 months free on any plan by going to pythonpodcast.com/clubhouse today and signing up for a trial.
  • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions, I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com.
  • To help other people find the show please leave a review on iTunes, or Google Play Music, tell your friends and co-workers, and share it on social media.
  • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
  • Your host as usual is Tobias Macey and today I’m interviewing Anthony Shaw about Wily, a command-line application for tracking and reporting on complexity of Python tests and applications

Interview

  • Introductions
  • How did you get introduced to Python?
  • Can you start by describing what Wily is and what motivated you to create it?
  • What is software complexity and why should developers care about it?
    • What are some methods for measuring complexity?
  • I know that Python has the McCabe tool, but what other methods are there for determining complexity, both in Python and for other languages?
  • What kinds of useful signals can you derive from evaluating historical trends of complexity in a codebase?
  • What are some other useful metrics for tracking and maintaining the health of a software project?
  • Once you have established the points of complexity in your software, what are some strategies for remediating it?
  • What are your favorite tools for refactoring?
  • What are some of the aspects of developer-oriented tools that you have found to be most important in your own projects?
  • What are your plans for the future of Wily, or any other tools that you have in mind to aid in producing healthy software?

Keep In Touch

Picks

Links

The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

January 21, 2019 01:10 PM UTC


The Digital Cat

Clean Architectures in Python: the book

UPDATE: a Russian translation is in the works!

UPDATE: version 1.0.3 is out! Some readers unsurprisingly spotted typos and bad grammar and were so kind to submit fixes.

I'm excited to announce that the success of the post on clean architectures encouraged me to expand the subject and to write a book that I titled "Clean Architectures in Python. A practical approach to better software design".

The book contains a complete introduction to TDD and clean architectures, two topics that I believe are strictly interconnected. The book is 170 pages long and it is complete, at least for a first edition, but I am already planning to add content that could not fit in this release for several reasons (mostly because it is still unclear in my mind).

Cover

The book is available for free on Leanpub. If you enjoy it, please tweet about it with the #pycabook hashtag.

So far more than 3,100 readers have downloaded the book. Thank you all!


The book will soon be translated into Russian by Алексей Пыльцын (Alexey Pyltsyn), who already worked on the translation of technical books like "The Majesty of Vue.js 2" and "The Road to learn React".

Alexey is a web developer and maintainer of the official PHP documentation in Russian. His website is https://lex111.ru/.

Cover

If you are interested, you can show your support on the Leanpub page.

January 21, 2019 01:00 PM UTC


Codementor

Temperature Monitoring with Raspberry Pi and Alibaba Cloud IoT Platform

By Amit Maity, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud...

January 21, 2019 06:48 AM UTC


Mike Driscoll

PyDev of the Week: Nina Zakharenko

This week we welcome Nina Zakharenko (@nnja) as our PyDev of the Week! Nina has been active in the Python community for several years and has spoken or keynoted dozens of conferences. She has also contributed to the Python core language! If you’d like to see what she is up to, check out her blog. Let’s spend a few moments getting to know Nina!

Can you tell us a little about yourself (hobbies, education, etc):

I’m embarrassed to admit it, but the mid-90s cyberpunk movie Hackers – about a group of hackers framed for deploying a world-threatening computer virus – inspired me to become a programmer at a very young age. Even more embarrassing – I owned a pair of rollerblades growing up. When I was 12, I learned HTML to make websites by reading and deconstructing the source code of sites I visited, and I slowly became more engrossed in technology. As an adult, I studied Computer Science in college, and since then I’ve held a variety of exciting jobs, from writing software for satellite control computers at HBO to working at companies like Meetup and Reddit. This spring, I joined the incredible Cloud Developer Advocacy team at Microsoft as the first Advocate entirely devoted to Python. I love teaching and public speaking. In my spare time I like to snowboard, hike, travel, and tinker with microcontrollers and wearable electronics, such as these Python powered earrings. I tweet at @nnja and occasionally blog and post my talks at nnja.io.

Why did you start using Python?

I started using Python in 2012 for small scripts and internal tools and eventually started using Python to work on the meetup.com API. In 2013 I went to my first PyCon in Santa Clara. Back then, I was writing Java full-time. I was afraid that I’d get made fun of at the conference for being new to Python, but the opposite was true. So many members of the community welcomed me with open arms. I was totally blown away. I also realized that Python was for so much more than scripting – that it was a powerful first-class language used by some of the top companies in the world. Eventually, I quit my job as a Java Developer and spent a summer at the Recurse Center (known then as Hacker School, a developer retreat in NYC). I focused on learning new tools and languages, dabbling with machine learning, and teaching myself Python. At the Recurse Center, I was able to fall in love with programming all over again. Presently I’ve been writing Python professionally for five years for a variety of companies. The work is much more interesting, and the brevity of whitespace, naming conventions, and simplicity was a breath of fresh air after the verbosity of Java.

What other programming languages do you know and which is your favorite?

Professionally I’ve been lucky enough to work in a wide variety of languages, frameworks, and technologies. At various points in my career, I’ve been paid to write C++, Javascript, Java, Python, and Clojure. I’ve briefly dabbled in Go, Lisp, Haskell, and Octave for fun. Despite all my experience, Python is still hands down my favorite. It’s just fun to write, fun to learn, and is an exceptionally powerful teaching tool. No matter how long I work in it, I feel like there’s always more to learn.

What projects are you working on now?

Since starting as a developer advocate, I’m no longer in a position where I work on several large codebases. Instead, I’m working on a ton of different projects. I got my first (tiny) patch accepted into CPython while sprinting with Łukasz Langa at PyCon, and plan on working with a mentor to continue contributing to CPython in the future. I’ve been expanding and contributing to a project called loco to help bring our mostly remote team together, by displaying where in the world someone on the team is at during any given time. I’ve also been working on software and tools that take advantage of Azure and its Python friendly offerings, like Python serverless functions, or deploying web apps on Linux to App Service, a highly scalable, self-patching web hosting service.

Which Python libraries are your favorite (core or 3rd party)?

In no particular order:

  • ipdb — all the usefulness of the pdb debugger, but bonus features like tab completion and syntax highlighting!
  • ipython — ipython is a powerful interactive shell, complete with nice-to-have features missing from the default Python interactive interpreter such as excellent tab completion, the ability to easily edit classes and functions when editing via the up arrow, and magic functions that allow you to do things like paste chunks of code, open a full-fledged editor, or export your history.
  • black – black is the uncompromising Python code formatter. It’s a tool with practically no configuration options, so it removed any room for ambiguity. I love using the black formatter alongside the Python extension for VS Code.
  • micropython – MicroPython is an efficient implementation of Python 3 that includes a small subset of the standard library. The codebase is small enough to run on microcontrollers and can be used for incredible and educational electronics projects.
  • azure cli – the azure cli is a suite of command line tools for Azure, written in Python. Learn more about it, or contribute back to the open source project.
  • agithub – makes it so easy to prototype when working with a RESTful API. The project is looking for help and contributions. If you’re interested in getting started with open source, it’s a great opportunity.

How did you get started as a conference speaker on Python / tech topics

There’s a funny story behind how I got started that involves some mischievous friends. My second Python conference was PyCon Canada in 2013. I was chatting with friends about how impressed I was with the speakers, and how I wished I would have the nerve to speak on stage one day. I had presented my projects at a handful of local meetups, but never to such a large audience. At the conference, my friends found out that someone had dropped out of a lightning talk that was happening within the next hour. They wrote my name down in the empty slot, and told me I had an hour to put some slides together and get a 5-minute talk ready. I was so nervous! I shook with fear throughout the whole thing, and could barely breathe. When it was over, I realized that it wasn’t so bad. I got through it. The following year I spoke at DjangoCon 2014, followed by PyCon 2015 on technical debt. The talk went very well and was wildly popular. Since then, I’ve spoken at and keynoted dozens of conferences all over the world. That little voice in my head held me back and told me that I wasn’t a good public speaker, that I didn’t have anything to say, that everyone in the room knew more than me. It was a total fallacy, a myth I had perpetuated myself. Public speaking, like everything else in life, is something that you get better at with practice.

Do you have any advice for people who want to speak at conferences or user groups?

My advice is – Don’t be afraid to throw your hat into the ring. “I have nothing new to say, it’s all been covered” is a myth that I hear beginners perpetuate as a way to talk themselves out of speaking. That’s just not true. Just because a topic has been covered in a talk before doesn’t mean it’s off-limits. What an audience is genuinely interested in is your unique perspective, your story, and the way you tell it. Storytelling is as much a part of a great talk as technical knowledge. Brandon Rhodes is one of the best storytellers in our community, and his talks are a great resource for becoming familiar with the technique. Some of the most interesting talks center on how you came upon a problem, the steps (especially incorrect ones!) taken in an attempt to fix it, and how you finally came to the correct solution. Just like a regular story, it has a beginning, a middle, and an end. If you find large stages intimidating, start small at local meetups or events, or even in front of one or two friends. Lastly, don’t be afraid of rejection. Submit as many CFPs as you can. Always ask for feedback on rejected talks, and use those suggestions to improve your proposals and try again at other events. Rejection stings, but don’t take it personally. Dust yourself off and try again.

Second, many conferences offer scholarships to folks underrepresented in tech, both to attendees and to speakers. If you need financial assistance, don’t be afraid to ask and use that opportunity. If you’re on the opposite side of that coin and you or your company are in a position to, some conferences allow you to pay extra for a ticket to sponsor a diversity ticket, or to sponsor the conference itself. Accepting and welcoming diversity in age, gender, race, ability, experience, ethnicity, and many other factors is the future of our industry, and where fresh new ideas will come from. Our community grows stronger when we’re accepting and welcoming to other people.

Is there anything else you’d like to say?

I’ve been lucky enough to have the opportunity to be mentored throughout my career by an incredible group of caring people. I’m now in a position to do the same. If you’re an underrepresented person in tech and you need advice, help, or mentorship, my twitter DMs are open. I’m happy to talk about a wide variety of topics – imposter syndrome, career advice, tech help, help preparing a CFP or working on a talk, or walking through code. If I’m not able to help you, I’ll do my best to connect you with someone who can. If you’d like to hear more about my thoughts on mentorship, you can catch me as a guest on episode 44 of the Test and Code podcast.

Thanks for doing the interview, Nina!

January 21, 2019 06:05 AM UTC


Real Python

Working With Files in Python

Python has several built-in modules and functions for handling files. These functions are spread out over several modules such as os, os.path, shutil, and pathlib, to name a few. This article gathers in one place many of the functions you need to know in order to perform the most common operations on files in Python.

In this tutorial, you’ll learn how to:

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

Reading and Writing Data to Files in Python

Reading and writing data to files using Python is pretty straightforward. To do this, you must first open files in the appropriate mode. Here’s an example of how to open a text file and read its contents:

with open('data.txt', 'r') as f:
    data = f.read()

open() takes a filename and a mode as its arguments. r opens the file in read only mode. To write data to a file, pass in w as an argument instead:

with open('data.txt', 'w') as f:
    data = 'some data to be written to the file'
    f.write(data)

In the examples above, open() opens files for reading or writing and returns a file handle (f in this case) that provides methods that can be used to read or write data to the file. Read Working With File I/O in Python for more information on how to read and write to files.
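
open() also accepts a for append mode, which adds data to the end of a file without overwriting its existing contents:

with open('data.txt', 'a') as f:
    f.write('\nthis line is appended to the end of the file')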

Getting a Directory Listing

Suppose your current working directory has a subdirectory called my_directory that has the following contents:

.
├── file1.py
├── file2.csv
├── file3.txt
├── sub_dir
│   ├── bar.py
│   └── foo.py
├── sub_dir_b
│   └── file4.txt
└── sub_dir_c
    ├── config.py
    └── file5.txt

The built-in os module has a number of useful functions that can be used to list directory contents and filter the results. To get a list of all the files and folders in a particular directory in the filesystem, use os.listdir() in legacy versions of Python or os.scandir() in Python 3.x. os.scandir() is the preferred method to use if you also want to get file and directory properties such as file size and modification date.

Directory Listing in Legacy Python Versions

In versions of Python prior to Python 3, os.listdir() is the method to use to get a directory listing:

>>> import os
>>> entries = os.listdir('my_directory/')

os.listdir() returns a Python list containing the names of the files and subdirectories in the directory given by the path argument:

>>> os.listdir('my_directory/')
['sub_dir_c', 'file1.py', 'sub_dir_b', 'file3.txt', 'file2.csv', 'sub_dir']

A directory listing like that isn’t easy to read. Printing out the output of a call to os.listdir() using a loop helps clean things up:

>>> entries = os.listdir('my_directory/')
>>> for entry in entries:
...     print(entry)
...
sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir

Directory Listing in Modern Python Versions

In modern versions of Python, an alternative to os.listdir() is to use os.scandir() and pathlib.Path().

os.scandir() was introduced in Python 3.5 and is documented in PEP 471. os.scandir() returns an iterator as opposed to a list when called:

>>> import os
>>> entries = os.scandir('my_directory/')
>>> entries
<posix.ScandirIterator object at 0x7f5b047f3690>

The ScandirIterator points to all the entries in the current directory. You can loop over the contents of the iterator and print out the filenames:

import os

with os.scandir('my_directory/') as entries:
    for entry in entries:
        print(entry.name)

Here, os.scandir() is used in conjunction with the with statement because it supports the context manager protocol. Using a context manager closes the iterator and frees up acquired resources automatically after the iterator has been exhausted. The result is a print out of the filenames in my_directory/ just like you saw in the os.listdir() example:

sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir

Another way to get a directory listing is to use the pathlib module:

from pathlib import Path


entries = Path('my_directory/')
for entry in entries.iterdir():
    print(entry.name)

The objects returned by Path are either PosixPath or WindowsPath objects depending on the OS.

pathlib.Path() objects have an .iterdir() method for creating an iterator of all files and folders in a directory. Each entry yielded by .iterdir() contains information about the file or directory such as its name and file attributes. pathlib was first introduced in Python 3.4 and is a great addition to Python that provides an object oriented interface to the filesystem.

In the example above, you call pathlib.Path() and pass a path argument to it. Next is the call to .iterdir() to get a list of all files and directories in my_directory.

pathlib offers a set of classes featuring most of the common operations on paths in an easy, object-oriented way. Using pathlib is as efficient as, if not more efficient than, using the functions in os. Another benefit of using pathlib over os is that it reduces the number of imports you need to make to manipulate filesystem paths. For more information, read Python 3’s pathlib Module: Taming the File System.

Running the code above produces the following:

sub_dir_c
file1.py
sub_dir_b
file3.txt
file2.csv
sub_dir

Using pathlib.Path() or os.scandir() instead of os.listdir() is the preferred way of getting a directory listing, especially when you’re working with code that needs the file type and file attribute information. pathlib.Path() offers much of the file and path handling functionality found in os and shutil, and its methods are more efficient than some found in these modules. We will discuss how to get file properties shortly.

Here are the directory-listing functions again:

Function Description
os.listdir() Returns a list of all files and folders in a directory
os.scandir() Returns an iterator of all the objects in a directory including file attribute information
pathlib.Path.iterdir() Returns an iterator of all the objects in a directory including file attribute information

These functions return a list of everything in the directory, including subdirectories. This might not always be the behavior you want. The next section will show you how to filter the results from a directory listing.

Listing All Files in a Directory

This section will show you how to print out the names of files in a directory using os.listdir(), os.scandir(), and pathlib.Path(). To filter out directories and only list files from a directory listing produced by os.listdir(), use os.path:

import os


# List all files in a directory using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)

Here, the call to os.listdir() returns a list of everything in the specified path, and then that list is filtered by os.path.isfile() to only print out files and not directories. This produces the following output:

file1.py
file3.txt
file2.csv

An easier way to list files in a directory is to use os.scandir() or pathlib.Path():

import os


# List all files in a directory using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)

Using os.scandir() has the advantage of looking cleaner and being easier to understand than using os.listdir(), even though it is one line of code longer. Calling entry.is_file() on each item in the ScandirIterator returns True if the object is a file. Printing out the names of all files in the directory gives you the following output:

file1.py
file3.txt
file2.csv

Here’s how to list files in a directory using pathlib.Path():

from pathlib import Path

basepath = Path('my_directory/')
files_in_basepath = basepath.iterdir()
for item in files_in_basepath:
    if item.is_file():
        print(item.name)

Here, you call .is_file() on each entry yielded by .iterdir(). The output produced is the same:

file1.py
file3.txt
file2.csv

The code above can be made more concise if you combine the for loop and the if statement into a single generator expression. Dan Bader has an excellent article on generator expressions and list comprehensions.

The modified version looks like this:

from pathlib import Path


# List all files in directory using pathlib
basepath = Path('my_directory/')
files_in_basepath = (entry for entry in basepath.iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)

This produces exactly the same output as the example before it. This section showed that filtering files or directories using os.scandir() and pathlib.Path() feels more intuitive and looks cleaner than using os.listdir() in conjunction with os.path.

Listing Subdirectories

To list subdirectories instead of files, use one of the methods below. Here’s how to use os.listdir() and os.path():

import os


# List all subdirectories using os.listdir
basepath = 'my_directory/'
for entry in os.listdir(basepath):
    if os.path.isdir(os.path.join(basepath, entry)):
        print(entry)

Manipulating filesystem paths this way can quickly become cumbersome when you have multiple calls to os.path.join(). Running this on my computer produces the following output:

sub_dir_c
sub_dir_b
sub_dir

Here’s how to use os.scandir():

import os


# List all subdirectories using scandir()
basepath = 'my_directory/'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)

As in the file listing example, here you call .is_dir() on each entry returned by os.scandir(). If the entry is a directory, .is_dir() returns True, and the directory’s name is printed out. The output is the same as above:

sub_dir_c
sub_dir_b
sub_dir

Here’s how to use pathlib.Path():

from pathlib import Path


# List all subdirectory using pathlib
basepath = Path('my_directory/')
for entry in basepath.iterdir():
    if entry.is_dir():
        print(entry.name)

Calling .is_dir() on each entry of the basepath iterator checks if an entry is a file or a directory. If the entry is a directory, its name is printed out to the screen, and the output produced is the same as the one from the previous example:

sub_dir_c
sub_dir_b
sub_dir

Getting File Attributes

Python makes retrieving file attributes such as file size and modified times easy. This is done through os.stat(), os.scandir(), or pathlib.Path().

os.scandir() and pathlib.Path() retrieve a directory listing with file attributes combined. This can be potentially more efficient than using os.listdir() to list files and then getting file attribute information for each file.
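
You can get a rough feel for the difference on your own machine with the timeit module. This is an informal sketch rather than a rigorous benchmark, and it assumes the my_directory/ folder from the earlier examples exists:

import timeit

setup = 'import os'

# List the directory by name, then stat each entry separately.
listdir_stmt = """
for name in os.listdir('my_directory/'):
    os.stat(os.path.join('my_directory/', name))
"""

# scandir() retrieves much of the same information in a single pass.
scandir_stmt = """
with os.scandir('my_directory/') as entries:
    for entry in entries:
        entry.stat()
"""

print('listdir + stat:', timeit.timeit(listdir_stmt, setup=setup, number=1000))
print('scandir:       ', timeit.timeit(scandir_stmt, setup=setup, number=1000))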

The examples below show how to get the time the files in my_directory/ were last modified. The output is in seconds:

>>> import os
>>> with os.scandir('my_directory/') as dir_contents:
...     for entry in dir_contents:
...         info = entry.stat()
...         print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134

os.scandir() returns a ScandirIterator object. Each entry in a ScandirIterator object has a .stat() method that retrieves information about the file or directory it points to. .stat() provides information such as file size and the time of last modification. In the example above, the code prints out the st_mtime attribute, which is the time the content of the file was last modified.

The pathlib module has corresponding methods for retrieving file information that give the same results:

>>> from pathlib import Path
>>> current_dir = Path('my_directory')
>>> for path in current_dir.iterdir():
...     info = path.stat()
...     print(info.st_mtime)
...
1539032199.0052035
1539032469.6324475
1538998552.2402923
1540233322.4009316
1537192240.0497339
1540266380.3434134

In the example above, the code loops through the object returned by .iterdir() and retrieves file attributes through a .stat() call for each file in the directory list. The st_mtime attribute returns a float value that represents seconds since the epoch. To convert the values returned by st_mtime for display purposes, you could write a helper function to convert the seconds into a datetime object:

from datetime import datetime
from os import scandir

def convert_date(timestamp):
    d = datetime.utcfromtimestamp(timestamp)
    formatted_date = d.strftime('%d %b %Y')
    return formatted_date

def get_files():
    dir_entries = scandir('my_directory/')
    for entry in dir_entries:
        if entry.is_file():
            info = entry.stat()
            print(f'{entry.name}\t Last Modified: {convert_date(info.st_mtime)}')

This will first get a list of files in my_directory and their attributes and then call convert_date() to convert each file’s last modified time into a human readable form. convert_date() makes use of .strftime() to convert the time in seconds into a string.

The arguments passed to .strftime() are the following:

  • %d: the day of the month as a zero-padded decimal number
  • %b: the abbreviated month name
  • %Y: the year with century as a decimal number

Together, these directives produce output that looks like this:

>>> get_files()
file1.py        Last Modified: 04 Oct 2018
file3.txt       Last Modified: 17 Sep 2018
file2.csv       Last Modified: 17 Sep 2018

The syntax for converting dates and times into strings can be quite confusing. To read more about it, check out the official documentation on it. Another handy reference that is easy to remember is http://strftime.org/ .

Making Directories

Sooner or later, the programs you write will have to create directories in order to store data in them. os and pathlib include functions for creating directories. We’ll consider these:

Function Description
os.mkdir() Creates a single subdirectory
pathlib.Path.mkdir() Creates single or multiple directories
os.makedirs() Creates multiple directories, including intermediate directories

Creating a Single Directory

To create a single directory, pass a path to the directory as a parameter to os.mkdir():

import os


os.mkdir('example_directory/')

If a directory already exists, os.mkdir() raises FileExistsError. Alternatively, you can create a directory using pathlib:

from pathlib import Path


p = Path('example_directory/')
p.mkdir()

If the path already exists, mkdir() raises a FileExistsError:

>>>
>>> p.mkdir()
Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
  File '/usr/lib/python3.5/pathlib.py', line 1214, in mkdir
    self._accessor.mkdir(self, mode)
  File '/usr/lib/python3.5/pathlib.py', line 371, in wrapped
    return strfunc(str(pathobj), *args)
FileExistsError: [Errno 17] File exists: 'example_directory'

To avoid errors like this, catch the error when it happens and let your user know:

from pathlib import Path


p = Path('example_directory')
try:
    p.mkdir()
except FileExistsError as exc:
    print(exc)

Alternatively, you can ignore the FileExistsError by passing the exist_ok=True argument to .mkdir():

from pathlib import Path


p = Path('example_directory')
p.mkdir(exist_ok=True)

This will not raise an error if the directory already exists.

Creating Multiple Directories

os.makedirs() is similar to os.mkdir(). The difference between the two is that not only can os.makedirs() create individual directories, it can also be used to create directory trees. In other words, it can create any necessary intermediate folders in order to ensure a full path exists.

os.makedirs() is similar to running mkdir -p in Bash. For example, to create a group of directories like 2018/10/05, all you have to do is the following:

import os


os.makedirs('2018/10/05')

This will create a nested directory structure that contains the folders 2018, 10, and 05:

.
└── 2018
    └── 10
        └── 05

.makedirs() creates directories with default permissions. If you need to create directories with different permissions, call .makedirs() and pass in the mode you would like the directories to be created in:

import os


os.makedirs('2018/10/05', mode=0o770)

This creates the 2018/10/05 directory structure and gives the owner and group users read, write, and execute permissions. The default mode is 0o777, and the file permission bits of existing parent directories are not changed. For more details on file permissions, and how the mode is applied, see the docs.

Run tree to confirm that the right permissions were applied:

$ tree -p -i .
.
[drwxrwx---]  2018
[drwxrwx---]  10
[drwxrwx---]  05

This prints out a directory tree of the current directory. tree is normally used to list contents of directories in a tree-like format. Passing the -p and -i arguments to it prints out the directory names and their file permission information in a vertical list. -p prints out the file permissions, and -i makes tree produce a vertical list without indentation lines.

As you can see, all of the directories have 770 permissions. An alternative way to create directories is to use .mkdir() from pathlib.Path:

import pathlib


p = pathlib.Path('2018/10/05')
p.mkdir(parents=True)

Passing parents=True to Path.mkdir() makes it create the directory 05 and any parent directories necessary to make the path valid.

By default, os.makedirs() and Path.mkdir() raise an OSError if the target directory already exists. This behavior can be overridden (as of Python 3.2) by passing exist_ok=True as a keyword argument when calling each function.

Running the code above produces a directory structure like the one below in one go:

.
└── 2018
    └── 10
        └── 05
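
Since Path.mkdir() accepts both flags, you can combine parents=True and exist_ok=True for directory creation that is safe to run repeatedly. A minimal sketch, reusing the same path:

import pathlib


# Create the whole tree, and don't fail if part or all of it already exists
pathlib.Path('2018/10/05').mkdir(parents=True, exist_ok=True)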

I prefer using pathlib when creating directories because I can use the same function to create single or nested directories.

Filename Pattern Matching

After getting a list of files in a directory using one of the methods above, you will most probably want to search for files that match a particular pattern.

These are the methods and functions available to you:

  - String methods like .startswith() and .endswith()
  - fnmatch.fnmatch()
  - glob.glob()
  - pathlib.Path.glob()

Each of these is discussed below. The examples in this section will be performed on a directory called some_directory that has the following structure:

.
├── admin.py
├── data_01_backup.txt
├── data_01.txt
├── data_02_backup.txt
├── data_02.txt
├── data_03_backup.txt
├── data_03.txt
├── sub_dir
│   ├── file1.py
│   └── file2.py
└── tests.py

1 directory, 10 files

If you’re following along using a Bash shell, you can create the above directory structure using the following commands:

$ mkdir some_directory
$ cd some_directory/
$ mkdir sub_dir
$ touch sub_dir/file1.py sub_dir/file2.py
$ touch data_{01..03}.txt data_{01..03}_backup.txt admin.py tests.py

This will create the some_directory/ directory, change into it, and then create sub_dir. The next line creates file1.py and file2.py in sub_dir, and the last line creates all the other files using expansion. To learn more about shell expansion, visit this site.
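
If you'd rather not use the shell, here is a rough pathlib equivalent of the commands above (it assumes some_directory does not exist yet):

import pathlib


# Build the sample directory tree used in this section
base = pathlib.Path('some_directory')
(base / 'sub_dir').mkdir(parents=True)
for name in ('file1.py', 'file2.py'):
    (base / 'sub_dir' / name).touch()
for i in range(1, 4):
    (base / f'data_{i:02d}.txt').touch()
    (base / f'data_{i:02d}_backup.txt').touch()
for name in ('admin.py', 'tests.py'):
    (base / name).touch()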

Using String Methods

Python has several built-in methods for modifying and manipulating strings. Two of these methods, .startswith() and .endswith(), are useful when you’re searching for patterns in filenames. To do this, first get a directory listing and then iterate over it:

>>>
>>> import os

>>> # Get .txt files
>>> for f_name in os.listdir('some_directory'):
...     if f_name.endswith('.txt'):
...         print(f_name)

The code above finds all the files in some_directory/, iterates over them and uses .endswith() to print out the filenames that have the .txt file extension. Running this on my computer produces the following output:

data_01.txt
data_03.txt
data_03_backup.txt
data_02_backup.txt
data_02.txt
data_01_backup.txt

Simple Filename Pattern Matching Using fnmatch

String methods are limited in their matching abilities. fnmatch has more advanced functions and methods for pattern matching. We will consider fnmatch.fnmatch(), a function that supports the use of wildcards such as * and ? to match filenames. For example, in order to find all .txt files in a directory using fnmatch, you would do the following:

>>>
>>> import os
>>> import fnmatch


for file_name in os.listdir('some_directory/'):
    if fnmatch.fnmatch(file_name, '*.txt'):
        print(file_name)

This iterates over the list of files in some_directory and uses .fnmatch() to perform a wildcard search for files that have the .txt extension.
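
fnmatch also ships a filter() function that applies the same test to a whole list of names at once, which is handy when you want the matches as a list rather than printing them:

import os
import fnmatch


# fnmatch.filter() keeps only the names that match the pattern
txt_files = fnmatch.filter(os.listdir('some_directory/'), '*.txt')
print(txt_files)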

More Advanced Pattern Matching

Let’s suppose you want to find .txt files that meet certain criteria. For example, you could be only interested in finding .txt files that contain the word data, a number between a set of underscores, and the word backup in their filename. Something similar to data_01_backup, data_02_backup, or data_03_backup.

Using fnmatch.fnmatch(), you could do it this way:

>>>
>>> for filename in os.listdir('.'):
...     if fnmatch.fnmatch(filename, 'data_*_backup.txt'):
...         print(filename)

Here, you print only the names of files that match the data_*_backup.txt pattern. The asterisk in the pattern will match any number of characters, so running this will find all text files whose filenames start with the word data and end in backup.txt, as you can see from the output below:

data_03_backup.txt
data_02_backup.txt
data_01_backup.txt

Filename Pattern Matching Using glob

Another useful module for pattern matching is glob.

.glob() in the glob module works just like fnmatch.fnmatch(), but unlike fnmatch.fnmatch(), it treats files beginning with a period (.) as special.

UNIX and related systems translate name patterns with wildcards like ? and * into a list of files. This is called globbing.

For example, typing mv *.py python_files/ in a UNIX shell moves (mv) all files with the .py extension from the current directory to the directory python_files. The * character is a wildcard that means “any number of characters,” and *.py is the glob pattern. This shell capability is not available in the Windows Operating System. The glob module adds this capability in Python, which enables Windows programs to use this feature.

Here’s an example of how to use glob to search for all Python (.py) source files in the current directory:

>>>
>>> import glob
>>> glob.glob('*.py')
['admin.py', 'tests.py']

glob.glob('*.py') searches for all files that have the .py extension in the current directory and returns them as a list. glob also supports shell-style wildcards to match patterns:

>>>
>>> import glob
>>> for name in glob.glob('*[0-9]*.txt'):
...     print(name)

This finds all text (.txt) files that contain digits in the filename:

data_01.txt
data_03.txt
data_03_backup.txt
data_02_backup.txt
data_02.txt
data_01_backup.txt

glob makes it easy to search for files recursively in subdirectories too:

>>>
>>> import glob
>>> for file in glob.iglob('**/*.py', recursive=True):
...     print(file)

This example makes use of glob.iglob() to search for .py files in the current directory and its subdirectories. Passing recursive=True enables that recursive search, with the ** pattern matching any number of intermediate directories. The difference between glob.iglob() and glob.glob() is that .iglob() returns an iterator instead of a list.

Running the program above produces the following:

admin.py
tests.py
sub_dir/file1.py
sub_dir/file2.py

pathlib contains similar methods for making flexible file listings. The example below shows how you can use Path.glob() to list files whose file extension starts with the letter p:

>>>
>>> from pathlib import Path
>>> p = Path('.')
>>> for name in p.glob('*.p*'):
...     print(name)

admin.py
scraper.py
docs.pdf

Calling p.glob('*.p*') returns a generator object that points to all files in the current directory that start with the letter p in their file extension.
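
pathlib supports recursive patterns as well. Here's a minimal sketch using the same ** syntax as glob (Path.rglob('*.py') is an equivalent shortcut):

from pathlib import Path


# Recursively match .py files in the current directory and below
for name in Path('.').glob('**/*.py'):
    print(name)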

Path.glob() is similar to glob.glob() discussed above. As you can see, pathlib combines many of the best features of the os, os.path, and glob modules into one single module, which makes it a joy to use.

To recap, here is a table of the functions we have covered in this section:

Function Description
startswith() Tests if a string starts with a specified pattern and returns True or False
endswith() Tests if a string ends with a specified pattern and returns True or False
fnmatch.fnmatch(filename, pattern) Tests whether the filename matches the pattern and returns True or False
glob.glob() Returns a list of filenames that match a pattern
pathlib.Path.glob() Finds patterns in path names and returns a generator object

Traversing Directories and Processing Files

A common programming task is walking a directory tree and processing files in the tree. Let’s explore how the Python function os.walk() can be used to do this. os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up. For the purposes of this section, we’ll be manipulating the following directory tree:

.
├── folder_1
│   ├── file1.py
│   ├── file2.py
│   └── file3.py
├── folder_2
│   ├── file4.py
│   ├── file5.py
│   └── file6.py
├── test1.txt
└── test2.txt

The following is an example that shows you how to list all files and directories in a directory tree using os.walk().

os.walk() defaults to traversing directories in a top-down manner:

import os


# Walking a directory tree and printing the names of the directories and files
for dirpath, dirnames, files in os.walk('.'):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

os.walk() returns three values on each iteration of the loop:

  1. The name of the current folder

  2. A list of folders in the current folder

  3. A list of files in the current folder

On each iteration, it prints out the names of the subdirectories and files it finds:

Found directory: .
test1.txt
test2.txt
Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py

To traverse the directory tree in a bottom-up manner, pass in a topdown=False keyword argument to os.walk():

for dirpath, dirnames, files in os.walk('.', topdown=False):
    print(f'Found directory: {dirpath}')
    for file_name in files:
        print(file_name)

Passing the topdown=False argument will make os.walk() print out the files it finds in the subdirectories first:

Found directory: ./folder_1
file1.py
file3.py
file2.py
Found directory: ./folder_2
file4.py
file5.py
file6.py
Found directory: .
test1.txt
test2.txt

As you can see, the program started by listing the contents of the subdirectories before listing the contents of the root directory. This is very useful in situations where you want to recursively delete files and directories. You will learn how to do this in the sections below. By default, os.walk does not walk down into symbolic links that resolve to directories. This behavior can be overridden by calling it with a followlinks=True argument.
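
As a quick illustration, here is a minimal sketch that opts into following symbolic links (use it with care, since linked directories can introduce cycles):

import os


# followlinks=True makes os.walk() descend into directories that symlinks point to
for dirpath, dirnames, files in os.walk('.', followlinks=True):
    print(f'Found directory: {dirpath}')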

Making Temporary Files and Directories

Python provides a handy module for creating temporary files and directories called tempfile.

tempfile can be used to open and store data temporarily in a file or directory while your program is running. tempfile handles the deletion of the temporary files when your program is done with them.

Here’s how to create a temporary file:

from tempfile import TemporaryFile


# Create a temporary file and write some data to it
fp = TemporaryFile('w+t')
fp.write('Hello universe!')
# Go back to the beginning and read data from file
fp.seek(0)
data = fp.read()
# Close the file, after which it will be removed
fp.close()

The first step is to import TemporaryFile from the tempfile module. Next, create a file-like object by calling TemporaryFile() and passing it the mode you want to open the file in. This will create and open a file that can be used as a temporary storage area.

In the example above, the mode is 'w+t', which makes tempfile create a temporary text file in write mode. There is no need to give the temporary file a filename since it will be destroyed after the script is done running.

After writing to the file, you can read from it and close it when you’re done processing it. Once the file is closed, it will be deleted from the filesystem. If you need to name the temporary files produced using tempfile, use tempfile.NamedTemporaryFile().
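
Here is a minimal sketch of a named temporary file. The .name attribute holds the path the operating system assigned to it:

from tempfile import NamedTemporaryFile


# The file is visible in the filesystem under fp.name until it is closed
with NamedTemporaryFile('w+t', suffix='.txt') as fp:
    print('Created temporary file:', fp.name)
    fp.write('Hello universe!')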

The temporary files and directories created using tempfile are stored in a special system directory for storing temporary files. Python searches a standard list of directories to find one that the user can create files in.

On Windows, the directories are C:\TEMP, C:\TMP, \TEMP, and \TMP, in that order. On all other platforms, the directories are /tmp, /var/tmp, and /usr/tmp, in that order. As a last resort, tempfile will save temporary files and directories in the current directory.

.TemporaryFile() is also a context manager so it can be used in conjunction with the with statement. Using a context manager takes care of closing and deleting the file automatically after it has been read:

with TemporaryFile('w+t') as fp:
    fp.write('Hello universe!')
    fp.seek(0)
    fp.read()
# File is now closed and removed

This creates a temporary file and reads data from it. As soon as the file’s contents are read, the temporary file is closed and deleted from the file system.

tempfile can also be used to create temporary directories. Let’s look at how you can do this using tempfile.TemporaryDirectory():

>>>
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmpdir:
...     print('Created temporary directory ', tmpdir)
...     os.path.exists(tmpdir)
...
Created temporary directory  /tmp/tmpoxbkrm6c
True

>>> # Directory contents have been removed
...
>>> tmpdir
'/tmp/tmpoxbkrm6c'
>>> os.path.exists(tmpdir)
False

Calling tempfile.TemporaryDirectory() creates a temporary directory in the file system and returns an object representing this directory. In the example above, the directory is created using a context manager, and the name of the directory is stored in tmpdir. The third line prints out the name of the temporary directory, and os.path.exists(tmpdir) confirms if the directory was actually created in the file system.

After the context manager goes out of scope, the temporary directory is deleted, and a call to os.path.exists(tmpdir) returns False, which means that the directory was successfully deleted.

Deleting Files and Directories

You can delete single files, directories, and entire directory trees using the methods found in the os, shutil, and pathlib modules. The following sections describe how to delete files and directories that you no longer need.

Deleting Files in Python

To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().

os.remove() and os.unlink() are semantically identical. To delete a file using os.remove(), do the following:

import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.remove(data_file)

Deleting a file using os.unlink() is similar to how you do it using os.remove():

import os

data_file = 'C:\\Users\\vuyisile\\Desktop\\Test\\data.txt'
os.unlink(data_file)

Calling .unlink() or .remove() on a file deletes the file from the filesystem. These two functions will throw an OSError if the path passed to them points to a directory instead of a file. To avoid this, you can either check that what you’re trying to delete is actually a file and only delete it if it is, or you can use exception handling to handle the OSError:

import os


data_file = 'home/data.txt'

# If the file exists, delete it
if os.path.isfile(data_file):
    os.remove(data_file)
else:
    print(f'Error: {data_file} not a valid filename')

os.path.isfile() checks whether data_file is actually a file. If it is, it is deleted by the call to os.remove(). If data_file points to a folder, an error message is printed to the console.

The following example shows how to use exception handling to handle errors when deleting files:

import os


data_file = 'home/data.txt'

# Use exception handling
try:
    os.remove(data_file)
except OSError as e:
    print(f'Error: {data_file} : {e.strerror}')

The code above attempts to delete the file first before checking its type. If data_file isn’t actually a file, the OSError that is thrown is handled in the except clause, and an error message is printed to the console. The error message that gets printed out is formatted using Python f-strings.

Finally, you can also use pathlib.Path.unlink() to delete files:

from pathlib import Path


data_file = Path('home/data.txt')

try:
    data_file.unlink()
except IsADirectoryError as e:
    print(f'Error: {data_file} : {e.strerror}')

This creates a Path object called data_file that points to a file. Calling .unlink() on data_file will delete home/data.txt. If data_file points to a directory, an IsADirectoryError is raised. It is worth noting that the Python program above has the same permissions as the user running it. If the user does not have permission to delete the file, a PermissionError is raised.

Deleting Directories

The standard library offers the following functions for deleting directories:

  - os.rmdir()
  - pathlib.Path.rmdir()
  - shutil.rmtree()

To delete a single directory or folder, use os.rmdir() or pathlib.Path.rmdir(). These two functions only work if the directory you’re trying to delete is empty. If the directory isn’t empty, an OSError is raised. Here is how to delete a folder:

import os


trash_dir = 'my_documents/bad_dir'

try:
    os.rmdir(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

Here, the trash_dir directory is deleted by passing its path to os.rmdir(). If the directory isn’t empty, an error message is printed to the screen:

>>>
Traceback (most recent call last):
  File '<stdin>', line 1, in <module>
OSError: [Errno 39] Directory not empty: 'my_documents/bad_dir'

Alternatively, you can use pathlib to delete directories:

from pathlib import Path


trash_dir = Path('my_documents/bad_dir')

try:
    trash_dir.rmdir()
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

Here, you create a Path object that points to the directory to be deleted. Calling .rmdir() on the Path object will delete it if it is empty.

Deleting Entire Directory Trees

To delete non-empty directories and entire directory trees, Python offers shutil.rmtree():

import shutil


trash_dir = 'my_documents/bad_dir'

try:
    shutil.rmtree(trash_dir)
except OSError as e:
    print(f'Error: {trash_dir} : {e.strerror}')

Everything in trash_dir is deleted when shutil.rmtree() is called on it. There may be cases where you want to delete empty folders recursively. You can do this using one of the methods discussed above in conjunction with os.walk():

import os


for dirpath, dirnames, files in os.walk('.', topdown=False):
    try:
        os.rmdir(dirpath)
    except OSError:
        pass  # the directory isn't empty, so skip it

This walks down the directory tree and tries to delete each directory it finds. If the directory isn’t empty, an OSError is raised and that directory is skipped. The table below lists the functions covered in this section:

Function Description
os.remove() Deletes a file and does not delete directories
os.unlink() Is identical to os.remove() and deletes a single file
pathlib.Path.unlink() Deletes a file and cannot delete directories
os.rmdir() Deletes an empty directory
pathlib.Path.rmdir() Deletes an empty directory
shutil.rmtree() Deletes entire directory tree and can be used to delete non-empty directories

Copying, Moving, and Renaming Files and Directories

Python ships with the shutil module. shutil is short for shell utilities. It provides a number of high-level operations on files to support copying, archiving, and removal of files and directories. In this section, you’ll learn how to move and copy files and directories.

Copying Files in Python

shutil offers a couple of functions for copying files. The most commonly used functions are shutil.copy() and shutil.copy2(). To copy a file from one location to another using shutil.copy(), do the following:

import shutil


src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy(src, dst)

shutil.copy() is comparable to the cp command in UNIX based systems. shutil.copy(src, dst) will copy the file src to the location specified in dst. If dst is a file, the contents of that file are replaced with the contents of src. If dst is a directory, then src will be copied into that directory. shutil.copy() only copies the file’s contents and the file’s permissions. Other metadata like the file’s creation and modification times are not preserved.

To preserve all file metadata when copying, use shutil.copy2():

import shutil


src = 'path/to/file.txt'
dst = 'path/to/dest_dir'
shutil.copy2(src, dst)

Using .copy2() preserves details about the file such as last access time, permission bits, last modification time, and flags.
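
To see the difference, you can compare timestamps after the copy. A small sketch, using hypothetical file paths:

import os
import shutil


src = 'path/to/file.txt'
dst = 'path/to/file_copy.txt'

# copy2() carries the modification time over; plain copy() does not
shutil.copy2(src, dst)
print(os.stat(src).st_mtime == os.stat(dst).st_mtime)  # True, up to timestamp resolution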

Copying Directories

While shutil.copy() only copies a single file, shutil.copytree() will copy an entire directory and everything contained in it. shutil.copytree(src, dest) takes two arguments: a source directory and the destination directory where files and folders will be copied to.

Here’s an example of how to copy the contents of one folder to a different location:

>>>
>>> import shutil
>>> shutil.copytree('data_1', 'data1_backup')
'data1_backup'

In this example, .copytree() copies the contents of data_1 to a new location data1_backup and returns the destination directory. The destination directory must not already exist; it will be created, along with any missing parent directories. shutil.copytree() is a good way to back up your files.

Moving Files and Directories

To move a file or directory to another location, use shutil.move(src, dst).

src is the file or directory to be moved and dst is the destination:

>>>
>>> import shutil
>>> shutil.move('dir_1/', 'backup/')
'backup'

shutil.move('dir_1/', 'backup/') moves dir_1/ into backup/ if backup/ exists. If backup/ does not exist, dir_1/ will be renamed to backup.

Renaming Files and Directories

Python includes os.rename(src, dst) for renaming files and directories:

>>>
>>> os.rename('first.zip', 'first_01.zip')

The line above will rename first.zip to first_01.zip. If the destination path points to a directory, it will raise an OSError.

Another way to rename files or directories is to use rename() from the pathlib module:

>>>
>>> from pathlib import Path
>>> data_file = Path('data_01.txt')
>>> data_file.rename('data.txt')

To rename files using pathlib, you first create a pathlib.Path() object that contains a path to the file you want to replace. The next step is to call rename() on the path object and pass a new filename for the file or directory you’re renaming.

Archiving

Archives are a convenient way to package several files into one. The two most common archive types are ZIP and TAR. The Python programs you write can create, read, and extract data from archives. You will learn how to read and write to both archive formats in this section.

Reading ZIP Files

The zipfile module is a low level module that is part of the Python Standard Library. zipfile has functions that make it easy to open and extract ZIP files. To read the contents of a ZIP file, the first thing to do is to create a ZipFile object. ZipFile objects are similar to file objects created using open(). ZipFile is also a context manager and therefore supports the with statement:

import zipfile


with zipfile.ZipFile('data.zip', 'r') as zipobj:
    pass  # the archive is open here; the sections below show what you can do with it

Here, you create a ZipFile object, passing in the name of the ZIP file to open in read mode. After opening a ZIP file, information about the archive can be accessed through functions provided by the zipfile module. The data.zip archive in the example above was created from a directory named data that contains a total of 5 files and 1 subdirectory:

.
├── file1.py
├── file2.py
├── file3.py
└── sub_dir
    ├── bar.py
    └── foo.py

1 directory, 5 files

To get a list of files in the archive, call namelist() on the ZipFile object:

import zipfile


with zipfile.ZipFile('data.zip', 'r') as zipobj:
    zipobj.namelist()

This produces a list:

['file1.py', 'file2.py', 'file3.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']

.namelist() returns a list of names of the files and directories in the archive. To retrieve information about the files in the archive, use .getinfo():

import zipfile


with zipfile.ZipFile('data.zip', 'r') as zipobj:
    bar_info = zipobj.getinfo('sub_dir/bar.py')
    bar_info.file_size

Here’s the output:

15277

.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, you pass its path as an argument to .getinfo(). Using getinfo(), you’re able to retrieve information about archive members such as the date the files were last modified, their compressed sizes, and their full filenames. Accessing .file_size retrieves the file’s original size in bytes.

The following example shows how to retrieve more details about archived files in a Python REPL. Assume that the zipfile module has been imported and bar_info is the same object you created in previous examples:

>>>
>>> bar_info.date_time
(2018, 10, 7, 23, 30, 10)
>>> bar_info.compress_size
2856
>>> bar_info.filename
'sub_dir/bar.py'

bar_info contains details about bar.py such as its size when compressed and its full path.

The first line shows how to retrieve a file’s last modified date. The next line shows how to get the size of the file after compression. The last line shows the full path of bar.py in the archive.

ZipFile supports the context manager protocol, which is why you’re able to use it with the with statement. Doing this automatically closes the ZipFile object after you’re done with it. Trying to open or extract files from a closed ZipFile object will result in an error.

Extracting ZIP Archives

The zipfile module allows you to extract one or more files from ZIP archives through .extract() and .extractall().

These methods extract files to the current directory by default. They both take an optional path parameter that allows you to specify a different directory to extract files to. If the directory does not exist, it is automatically created. To extract files from the archive, do the following:

>>>
>>> import zipfile
>>> import os

>>> os.listdir('.')
['data.zip']
>>> data_zip = zipfile.ZipFile('data.zip', 'r')
>>> # Extract a single file to current directory
...
>>> data_zip.extract('file1.py')
'/home/terra/test/dir1/zip_extract/file1.py'
>>> os.listdir('.')
['file1.py', 'data.zip']
>>> # Extract all files into a different directory
...
>>> data_zip.extractall(path='extract_dir/')
>>> os.listdir('.')
['file1.py', 'extract_dir', 'data.zip']
>>> os.listdir('extract_dir')
['file1.py', 'file3.py', 'file2.py', 'sub_dir']
>>> data_zip.close()

The third line of code is a call to os.listdir(), which shows that the current directory has only one file, data.zip.

Next, you open data.zip in read mode and call .extract() to extract file1.py from it. .extract() returns the full file path of the extracted file. Since there’s no path specified, .extract() extracts file1.py to the current directory.

The next line prints a directory listing showing that the current directory now includes the extracted file in addition to the original archive. The line after that shows how to extract the entire archive into the extract_dir directory. .extractall() creates extract_dir and extracts the contents of data.zip into it. The last line closes the ZIP archive.

Extracting Data From Password Protected Archives

zipfile supports extracting password protected ZIPs. To extract password protected ZIP files, pass in the password to the .extract() or .extractall() method as an argument:

>>>
>>> import zipfile

>>> with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
...     # Extract from a password protected archive
...     pwd_zip.extractall(path='extract_dir', pwd=b'Quish3@o')

This opens the secret.zip archive in read mode. A password is supplied to .extractall(), and the archive contents are extracted to extract_dir. The archive is closed automatically after the extraction is complete thanks to the with statement.
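
If you plan to read several members, you can also register the password once with .setpassword() instead of passing it to every call:

import zipfile


with zipfile.ZipFile('secret.zip', 'r') as pwd_zip:
    # Register a default password for all subsequent reads from this archive
    pwd_zip.setpassword(b'Quish3@o')
    pwd_zip.extractall(path='extract_dir')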

Creating New ZIP Archives

To create a new ZIP archive, you open a ZipFile object in write mode (w) and add the files you want to archive:

>>>
>>> import zipfile

>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)

In the example, new_zip is opened in write mode and each file in file_list is added to the archive. When the with statement suite is finished, new_zip is closed. Opening a ZIP file in write mode erases the contents of the archive and creates a new archive.
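
Note that ZipFile stores members uncompressed by default (ZIP_STORED). To actually compress them, pass a compression argument when opening the archive. A sketch using deflate compression:

import zipfile


file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']

# ZIP_DEFLATED compresses each member with zlib as it is written
with zipfile.ZipFile('new.zip', 'w', compression=zipfile.ZIP_DEFLATED) as new_zip:
    for name in file_list:
        new_zip.write(name)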

To add files to an existing archive, open a ZipFile object in append mode and then add the files:

>>>
>>> # Open a ZipFile object in append mode
...
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')

Here, you open the new.zip archive you created in the previous example in append mode. Opening the ZipFile object in append mode allows you to add new files to the ZIP file without deleting its current contents. After adding files to the ZIP file, the with statement goes out of context and closes the ZIP file.

Opening TAR Archives

TAR files are file archives like ZIP, but they aren’t compressed on their own. They can be compressed using gzip, bzip2, and lzma compression methods. The TarFile class allows reading and writing of TAR archives.

Do this to read from an archive:

import tarfile

with tarfile.open('example.tar', 'r') as tar_file:
    print(tar_file.getnames())

You open tarfile objects much like other file-like objects: the tarfile module’s open() function takes a mode that determines how the archive is to be opened.

Use the 'r', 'w' or 'a' modes to open an uncompressed TAR file for reading, writing, and appending, respectively. To open compressed TAR files, pass in a mode argument to tarfile.open() that is in the form filemode[:compression]. The table below lists the possible modes TAR files can be opened in:

Mode Action
r Opens archive for reading with transparent compression
r:gz Opens archive for reading with gzip compression
r:bz2 Opens archive for reading with bzip2 compression
r:xz Opens archive for reading with lzma compression
w Opens archive for uncompressed writing
w:gz Opens archive for gzip compressed writing
w:xz Opens archive for lzma compressed writing
a Opens archive for appending with no compression

.open() defaults to 'r' mode. To read an uncompressed TAR file and retrieve the names of the files in it, use .getnames():

>>>
>>> import tarfile

>>> tar = tarfile.open('example.tar', mode='r')
>>> tar.getnames()
['CONTRIBUTING.rst', 'README.md', 'app.py']

This returns a list with the names of the archive contents.

Note: For the purposes of showing you how to use different tarfile object methods, the TAR file in the examples is opened and closed manually in an interactive REPL session.

Interacting with the TAR file this way allows you to see the output of running each command. Normally, you would want to use a context manager to open file-like objects.

The metadata of each entry in the archive can be accessed using special attributes:

>>>
>>> import time
>>> for entry in tar.getmembers():
...     print(entry.name)
...     print(' Modified:', time.ctime(entry.mtime))
...     print(' Size    :', entry.size, 'bytes')
...     print()
CONTRIBUTING.rst
 Modified: Sat Nov  1 09:09:51 2018
 Size    : 402 bytes

README.md
 Modified: Sat Nov  3 07:29:40 2018
 Size    : 5426 bytes

app.py
 Modified: Sat Nov  3 07:29:13 2018
 Size    : 6218 bytes

In this example, you loop through the list of files returned by .getmembers() and print out each file’s attributes. The objects returned by .getmembers() have attributes that can be accessed programmatically such as the name, size, and last modified time of each of the files in the archive. After reading or writing to the archive, it must be closed to free up system resources.

Extracting Files From a TAR Archive

In this section, you’ll learn how to extract files from TAR archives using the following methods:

  - .extract()
  - .extractall()
  - .extractfile()

To extract a single file from a TAR archive, use extract(), passing in the filename:

>>>
>>> tar.extract('README.md')
>>> os.listdir('.')
['README.md', 'example.tar']

The README.md file is extracted from the archive to the file system. Calling os.listdir() confirms that README.md file was successfully extracted into the current directory. To unpack or extract everything from the archive, use .extractall():

>>>
>>> tar.extractall(path="extracted/")

.extractall() has an optional path argument to specify where extracted files should go. Here, the archive is unpacked into the extracted directory. The following commands show that the archive was successfully extracted:

$ ls
example.tar  extracted  README.md

$ tree
.
├── example.tar
├── extracted
│   ├── app.py
│   ├── CONTRIBUTING.rst
│   └── README.md
└── README.md

1 directory, 5 files

$ ls extracted/
app.py  CONTRIBUTING.rst  README.md

To extract a file object for reading or writing, use .extractfile(), which takes a filename or TarInfo object to extract as an argument. .extractfile() returns a file-like object that can be read and used:

>>>
>>> f = tar.extractfile('app.py')
>>> f.read()
>>> tar.close()

Opened archives should always be closed after they have been read or written to. To close an archive, call .close() on the archive file handle or use the with statement when creating tarfile objects to automatically close the archive when you’re done. This frees up system resources and writes any changes you made to the archive to the filesystem.

Creating New TAR Archives

Here’s how to create a new TAR archive:

>>>
>>> import tarfile

>>> file_list = ['app.py', 'config.py', 'CONTRIBUTORS.md', 'tests.py']
>>> with tarfile.open('packages.tar', mode='w') as tar:
...     for file in file_list:
...         tar.add(file)

>>> # Read the contents of the newly created archive
>>> with tarfile.open('packages.tar', mode='r') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py

First, you make a list of files to be added to the archive so that you don’t have to add each file manually.

The next line uses the with context manager to open a new archive called packages.tar in write mode. Opening an archive in write mode ('w') enables you to write new files to the archive. Any existing files in the archive are deleted and a new archive is created.

After the archive is created and populated, the with context manager automatically closes it and saves it to the filesystem. The last three lines open the archive you just created and print out the names of the files contained in it.

To add new files to an existing archive, open the archive in append mode ('a'):

>>>
>>> with tarfile.open('packages.tar', mode='a') as tar:
...     tar.add('foo.bar')

>>> with tarfile.open('packages.tar', mode='r') as tar:
...     for member in tar.getmembers():
...         print(member.name)
app.py
config.py
CONTRIBUTORS.md
tests.py
foo.bar

Opening an archive in append mode allows you to add new files to it without deleting the ones already in it.

Working With Compressed Archives

tarfile can also read and write TAR archives compressed using gzip, bzip2, and lzma compression. To read or write to a compressed archive, use tarfile.open(), passing in the appropriate mode for the compression type.

For example, to read or write data to a TAR archive compressed using gzip, use the 'r:gz' or 'w:gz' modes respectively:

>>>
>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     tar.add('app.py')
...     tar.add('config.py')
...     tar.add('tests.py')

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py

The 'w:gz' mode opens the archive for gzip compressed writing and 'r:gz' opens the archive for gzip compressed reading. Opening compressed archives in append mode is not possible. To add files to a compressed archive, you have to create a new archive.

An Easier Way of Creating Archives

The Python Standard Library also supports creating TAR and ZIP archives using the high-level methods in the shutil module. The archiving utilities in shutil allow you to create, read, and extract ZIP and TAR archives. These utilities rely on the lower level tarfile and zipfile modules.

Working With Archives Using shutil.make_archive()

shutil.make_archive() takes at least two arguments: the name of the archive and an archive format.

By default, it archives all the files in the current directory into the archive format specified in the format argument. You can pass in an optional root_dir argument to archive the contents of a different directory. .make_archive() supports the zip, tar, bztar, and gztar archive formats.

This is how to create a TAR archive using shutil:

import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')

This archives everything in data/, creates an archive called backup.tar in the filesystem, and returns its name. To extract the archive, call .unpack_archive():

shutil.unpack_archive('backup.tar', 'extract_dir/')

Calling .unpack_archive() and passing in an archive name and destination directory extracts the contents of backup.tar into extract_dir/. ZIP archives can be created and extracted in the same way.
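
If you are unsure which formats your Python build supports, shutil can tell you:

import shutil


# Formats available for creating and for unpacking archives on this system
print(shutil.get_archive_formats())
print(shutil.get_unpack_formats())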

Reading Multiple Files

Python supports reading data from multiple input streams or from a list of files through the fileinput module. This module allows you to loop over the contents of one or more text files quickly and easily. Here’s the typical way fileinput is used:

import fileinput

for line in fileinput.input():
    process(line)  # process() stands in for whatever you do with each line

fileinput gets its input from command line arguments passed to sys.argv by default.
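
You can also pass filenames explicitly instead of relying on command line arguments. A minimal sketch, assuming the two text files used later in this section exist:

import fileinput


# Read the given files in sequence, as one logical stream of lines
for line in fileinput.input(files=('bacon.txt', 'cupcake.txt')):
    print(line, end='')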

Using fileinput to Loop Over Multiple Files

Let’s use fileinput to build a crude version of the common UNIX utility cat. The cat utility reads files sequentially, writing them to standard output. When given more than one file in its command line arguments, cat will concatenate the text files and display the result in the terminal:

# File: fileinput-example.py
import fileinput
import sys


files = fileinput.input()
for line in files:
    if fileinput.isfirstline():
        print(f'\n--- Reading {fileinput.filename()} ---')
    print(' -> ' + line, end='')
print()

Running this on two text files in my current directory produces the following output:

$ python3 fileinput-example.py bacon.txt cupcake.txt


--- Reading bacon.txt ---
 -> Spicy jalapeno bacon ipsum dolor amet in in aute est qui enim aliquip,
 -> irure cillum drumstick elit.
 -> Doner jowl shank ea exercitation landjaeger incididunt ut porchetta.
 -> Tenderloin bacon aliquip cupidatat chicken chuck quis anim et swine.
 -> Tri-tip doner kevin cillum ham veniam cow hamburger.
 -> Turkey pork loin cupidatat filet mignon capicola brisket cupim ad in.
 -> Ball tip dolor do magna laboris nisi pancetta nostrud doner.

--- Reading cupcake.txt ---
 -> Cupcake ipsum dolor sit amet candy I love cheesecake fruitcake.
 -> Topping muffin cotton candy.
 -> Gummies macaroon jujubes jelly beans marzipan.

fileinput allows you to retrieve more information about each line such as whether or not it is the first line (.isfirstline()), the line number (.lineno()), and the filename (.filename()). You can read more about it here.

Conclusion

You now know how to use Python to perform the most common operations on files and groups of files. You’ve learned about the different built-in modules used to read, find, and manipulate them.

You’re now equipped to use Python to:

  - Get directory contents and file properties
  - Create directories and directory trees
  - Find patterns in filenames
  - Create temporary files and directories
  - Move, rename, copy, and delete files or directories
  - Read and extract data from different types of archives
  - Read multiple files simultaneously using fileinput


January 21, 2019 06:00 AM UTC


codingdirectional

Further modifying the video editing application

Our video editing application project is nearly complete, so let us continue to modify it in this post. In this new edition of the code we include a repeat scale box that lets the user select how many times the video should repeat, an fps scale box for selecting the frame rate of the new video, and a video format combo box where the user can select the format of the new video. Below is the entire program.

from tkinter import *
from tkinter import filedialog
import os
import subprocess
import tkinter.ttk as tk

win = Tk() # Create tk instance
win.title("NeW Vid") # Add a title
win.resizable(0, 0) # Disable resizing the GUI
win.configure(background='white') # change background color

mainframe = Frame(win) # create a frame
mainframe.pack()

eqFrame = Frame(win) # create eq frame
eqFrame.pack(side = TOP, fill=X)

buttonFrame = Frame(win) # create a button frame
buttonFrame.pack(side = BOTTOM, fill=X)

# Create a label and scale box for eq
contrast_variable = DoubleVar()
contrast = Scale(eqFrame, from_=float(-2.00), to=float(2.00), orient=HORIZONTAL, label="CONTRAST", digits=3, resolution=0.01, variable=contrast_variable)
contrast.set(1)
contrast.pack(side = LEFT)
brightness_variable = DoubleVar()
brightness = Scale(eqFrame, from_=float(-1.00), to=float(1.00), orient=HORIZONTAL, label="BRIGHTNESS", digits=3, resolution=0.01, variable=brightness_variable)
brightness.pack(side = LEFT)
saturation_variable = DoubleVar()
saturation = Scale(eqFrame, from_=float(0.00), to=float(3.00), orient=HORIZONTAL, label="SATURATION", digits=3, resolution=0.01, variable=saturation_variable)
saturation.set(1)
saturation.pack(side = LEFT)
gamma_variable = DoubleVar()
gamma = Scale(eqFrame, from_=float(0.10), to=float(10.00), orient=HORIZONTAL, label="GAMMA", digits=4, resolution=0.01, variable=gamma_variable)
gamma.set(1)
gamma.pack(side = LEFT)
loop_variable = DoubleVar()
loop = Scale(eqFrame, from_=float(0), to=float(10), orient=HORIZONTAL, label="REPEAT", digits=2, resolution=1, variable=loop_variable)
loop.pack(side = LEFT)
fr_variable = DoubleVar()
fr = Scale(eqFrame, from_=float(9), to=float(60), orient=HORIZONTAL, label="FPS", digits=2, resolution=1, variable=fr_variable)
fr.set(24)
fr.pack(side = LEFT)

# Create a combo box
vid_size = StringVar() # create a string variable
preferSize = tk.Combobox(mainframe, textvariable=vid_size) 
preferSize['values'] = (1920, 1280, 854, 640) # video width in pixels
preferSize.current(0) # select item one 
preferSize.pack(side = LEFT)

# Create a combo box
vid_format = StringVar() # create a string variable
preferFormat = tk.Combobox(mainframe, textvariable=vid_format) 
preferFormat['values'] = ('.mp4', '.webm', '.avi', '.wmv', '.mpg', '.ogv') # video format
preferFormat.current(0) # select item one 
preferFormat.pack(side = LEFT)

removeAudioVal = IntVar()
removeAudio = tk.Checkbutton(mainframe, text="Remove Audio", variable=removeAudioVal)
removeAudio.pack(side = LEFT, padx=3)

newAudio = IntVar()
aNewAudio = tk.Checkbutton(mainframe, text="New Audio", variable=newAudio)
aNewAudio.pack(side = LEFT, padx=2)

# Open a video file
def openVideo():
        
        fullfilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Video file", "*.mp4; *.avi ")]) # select a video file from the hard drive
        audiofilename = ''
        if(newAudio.get() == 1):
                audiofilename = filedialog.askopenfilename(initialdir="/", title="Select a file", filetypes=[("Audio file", "*.wav; *.ogg ")]) # select a new audio file from the hard drive
                
        if(fullfilename != ''): 

                scale_vid = preferSize.get() # retrieve value from the combo box
                new_size = str(scale_vid)
                dir_path = os.path.dirname(os.path.realpath(fullfilename))

                file_extension = fullfilename.split('.')[-1] # extract the video format from the original video

                os.chdir(dir_path) # change the directory to the original file's directory

                f = new_size  + '.' + file_extension # the new output file name
                f2 = '0' + f # second video

                noAudio = removeAudioVal.get() # get the checkbox state for audio 

                subprocess.call(['ffmpeg', '-stream_loop', str(loop_variable.get()), '-i', fullfilename, '-vf', 'scale=' + new_size + ':-1', '-y', '-r', str(fr_variable.get()), f]) # resize, speedup and loop the video with ffmpeg
               
                #subprocess.call(['ffmpeg', '-i', f, '-ss', '00:02:30', '-y', f2]) # create animated gif starting from 2 minutes and 30 seconds to the end
                

                if(noAudio == 1):
                        subprocess.call(['ffmpeg', '-i', f, '-c', 'copy', '-y', '-an', f2]) # remove audio from the original video
                
                if(audiofilename != '' and noAudio == 1 and newAudio.get() == 1):
                        subprocess.call(['ffmpeg', '-i', f2, '-i', audiofilename, '-shortest', '-c:v', 'copy', '-c:a', 'aac', '-b:a', '256k', '-y', f]) # add audio to the original video, trim either the audio or video depends on which one is longer

                subprocess.call(['ffmpeg', '-i', f, '-vf', 'eq=contrast=' + str(contrast_variable.get()) +':brightness='+ str(brightness_variable.get()) +':saturation=' + str(saturation_variable.get()) +':gamma='+ str(gamma_variable.get()), '-y', f2]) # adjust the saturation, gamma, contrast and brightness of video
                f3 = f + vid_format.get() # The final video format

                if(f3.split('.')[-1] != f2.split('.')[-1]):
                        subprocess.call(['ffmpeg', '-i', f2, '-y', f3]) # converting the video with ffmpeg
                        os.remove(f2) # remove two videos
                        os.remove(f)
                else:
                        os.remove(f) # remove a video

action_vid = tk.Button(buttonFrame, text="Open Video", command=openVideo)
action_vid.pack(fill=X)

win.mainloop()

If you run the above program you will get the outcome below.

The new video editor user interface

The above program needs more edits, so stay tuned for the next chapter.

January 21, 2019 04:58 AM UTC


Vasudev Ram

Factorial function using Python's reduce function


- By Vasudev Ram - Online Python training / SQL training / Linux training



[This is a beginner-level Python post. I label such posts as "python-beginners" in the Blogger labels at the bottom of the post. You can get a sub-feed of all such posts for any label using the label (case-sensitive) in a URL of the form:

https://jugad2.blogspot.com/search/label/label_name where label_name is to be replaced by an actual label,

such as in:

jugad2.blogspot.com/search/label/python-beginners

and

jugad2.blogspot.com/search/label/python
]

Hi, readers,

The factorial function (Wikipedia article) is often implemented in programming languages as either an iterative or a recursive function. Both are fairly simple to implement.

For the iterative version, to find the value of n factorial (written n! in mathematics), you set a variable called, say, product, equal to 1, then multiply it in a loop by each value of a variable i that ranges from 1 to n.

For the recursive version, you define the base case as 0! = 1, and then for all higher values of n factorial, you compute them recursively as the product of n with (n - 1) factorial.

[ Wikipedia article about Iteration. ]

[ Wikipedia article about Recursion in computer_science. ]

Here is another way of doing it, which is also iterative, but uses no explicit loop; instead it uses Python's built-in reduce() function, which is part of the functional programming paradigm or style:
In [178]: from operator import mul

In [179]: for fact_num in range(1, 11):
     ...:     print reduce(mul, range(1, fact_num + 1))
     ...:
1
2
6
24
120
720
5040
40320
362880
3628800
The above snippet (run in the command-line version of IPython) loops over the values 1 to 10 and computes the factorial of each of those values, using reduce with operator.mul (which is a functional version of the multiplication operator). In more detail: the function call range(1, 11) returns a list with the values 1 to 10, and the for statement iterates over those values, passing each to the expression involving reduce and mul. Together they compute each value's factorial, using the iterable returned by the second range call, which produces all the numbers that have to be multiplied together to get the factorial of fact_num.

The Python docstring for reduce:
reduce.__doc__: reduce(function, sequence[, initial]) -> value

Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5). If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
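
The snippet above uses Python 2's print statement and built-in reduce(). On Python 3, reduce() lives in functools, so an equivalent version looks like this (the initial value 1, described in the docstring above, also makes 0! work):

from functools import reduce
from operator import mul


# Python 3 version: reduce() moved to functools; the initial value 1 handles 0! too
for fact_num in range(0, 11):
    print(reduce(mul, range(1, fact_num + 1), 1))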
Did you know that there are many different kinds of factorials? To learn more, check out this post:

Permutation facts

- Enjoy.


- Vasudev Ram - Online Python training and consulting

I conduct online courses on Python programming, Unix / Linux commands and shell scripting and SQL programming and database design, with course material and personal coaching sessions.

The course details and testimonials are here.

Contact me for details of course content, terms and schedule.



January 21, 2019 04:25 AM UTC


Kushal Das

That missing paragraph

In my last blog post, I wrote about a missing paragraph. I did not keep that text anywhere, I just deleted it while reviewing the post. Later Jason asked me in the comments to actually post that paragraph too.

So, I will write about it. 2018 was an amazing year, all told: good, great, and terrible moments all together. There were certain highs, and a few really low moments. Some things take time to heal, some moments make a lifelong impact.

The second part of 2018 went downhill at a pretty alarming rate, personally. Just after coming back from PyCon US 2018, from the end of May to the beginning of December, within 6 months we lost 4 family members. On the night of 30th May, my uncle called, telling me that my dad was admitted to the hospital, and the doctor wanted to talk to me. He told me to come back home as soon as possible. There was a very real chance that I wouldn’t be able to talk to him again. Anwesha and I managed to reach Durgapur by 9AM, and dad passed away within a few hours. From the time of that phone call, my brain suddenly became quite detached, very calm, thinking about next steps: things to be handled, official documents to be taken care of, what needed to be done next.

I felt a few times that I’d burst into tears, but the next thing that sprang to mind was that if I started crying, it would affect my mother and the rest of the family too. Somehow, I managed not to cry, and every time I got emotionally overwhelmed, I started thinking about the next logical steps. I made sure I did not talk about the whole incident much until recently, after things settled down. I also spent time in my village and then in Kolkata.

Over the next 4 months, there were 3 more deaths. Every time the news came, I did not show any reaction, but it hurt.

Our education system is supposed to help us grow in life. But I feel it is more likely that school is just training for society to work cohesively and to make sure that the machines are well oiled. Nothing prepares us to deal with real life incidents. Moreover, death is a taboo subject for most of us.

Coming back to the effect of these demises: for a moment it created a real panic in my brain. What if I just vanish tomorrow? In my mind, our physical bodies are some amazingly complex robots / programs. When one fails, the rest of them try to cope, try to fill in the gaps. But the nearby endpoints never stay the same. I am working as usual, but somehow my behavior has changed. I know that I have a long lasting problem with emails, but that has grown a little out of hand in the last 5 months. I am putting in a lot of extra effort to reply to the emails I actually manage to notice. Before that, I would open the editor to reply, but my mind would blank, and I could not type anything.

I don’t quite know how to end this post. The lines above are almost like a stream of consciousness in my mind, and I don’t even know if they make sense in the order I put them in. But, at the same time, it makes sense to write it down. At the end of the day, we are all human, we make mistakes, we all have emotions, and oftentimes it is okay to let it out.

In a future post, I will surely write about the changes I am bringing into my life to cope.

January 21, 2019 01:30 AM UTC

January 20, 2019


Carl Chenet

How I Switched Working From Office Full Time to Remote 3 Days A Week

Remote work is not for everyone. It depends a lot on your tastes. But it’s definitely for me. Here is why and how I switched from working full time in an office to 3 days of remote work a week.

TL;DR: After working from home for a few months, I was convinced remote work was my thing. I had to look for a new freelance contract that included remote work, and I had to refuse a lot of good offers that banned it. At least in my country (France), finding remote work, even part time, is still difficult. It greatly depends on the company culture. I’m lucky enough that my current client promotes remote work.


Foreword

If you follow my blog on a regular basis (RSS feed if you like it), you know I’ve been working remotely for a while, starting one day a week when I was working part time in order to be more productive on my side projects.

But this article explains why and how I decided to start working remotely, and what professional and personal choices I had to make in order to achieve this goal.

Why working remotely suits me

I've been a freelancer since 2012 and usually work at my clients' offices. Some time ago I had 2 years so intense that I needed a break. That's not optimal because, as you know, a freelancer does not earn money when not working. No paid vacation. Moreover, a freelancer cannot count on any unemployment compensation (at least in France). I've been working on side projects since 2015, but I'm far from being self-sufficient. After my previous mission, I took a 6-month break and, as you can guess, had serious personal finance issues afterwards. So it's obvious that if I want to remain freelance, I need to work on a regular basis.

But.

Paris is a quite crowded city. Public transportation is overcrowded and some subway lines are too old. It generates a lot of stress for everyone, transit workers and riders alike. When you go to work, especially if you live in the Paris suburbs, a chaotic ride from home usually means you haven't even started working but you're already stressed out. You also waste between 45 minutes and 1.5 hours on each ride: between 1.5 and 3 hours each day!

Given that I work on several side projects, helping communities grow and developing online services, I need time. Even lunch break time. I'm not a workaholic: I love playing squash, watching movies, reading, and playing poker, so I'm not going to work every day until 2 or 3 AM.

Playing Squash during lunch break on remote days (Paris, Charléty stadium)

Once my daily job is finished, I need my free time. And don't talk to me about waking up at 5 AM. Tried that. Once. Never again.

My main job is system architecture. I mostly work on complex issues, like the scalability of high-traffic websites or migrating old platforms to the cloud. I need peace and long periods of time without interruption in order to think.

An open space office wastes both my time and my focus, given that I cannot manage useless interruptions at the office as efficiently as I do while working remotely (I just don't reply). Some jobs need a lot of interaction with others (managers); others need silence and peace. It's a fact. Meetings are sometimes useful, but I don't need to be at the office 5 days a week for them, and online tools are now quite efficient for short meetings.

How I switched

During my 6-month break, I worked on my side projects from home. I knew I was ready.

When I started searching for a new contract, I looked for companies that allow remote work. In France it's not so common, and remote workers are sometimes seen as slackers. Given this reputation, I had to stand firm about it while applying.

Another issue comes from recruiters. Some of them are overoptimistic about remote work and tell candidates that it will be allowed soon at their company. That is often not the case, and even when it is, it can take months or years. Moreover, remote work is a culture; it's not so simple to set up. If the company culture is not ready, it's only a matter of time before it gets cancelled. Yahoo! famously banned working from home.

I finally chose the company that was the best fit for me. Given the price of office space in Paris and the fact that the company was still growing, they had to encourage working from home: a really good point for me. During the interview, my boss told me that members of the team were already working remotely on a regular basis, 1 or 2 times a week.

Of course I started slowly: for some months I worked full time at the office, then began with only one day a week at home. I was not bored to death working from home. I was still efficient, even more efficient on complex tasks. Fewer interruptions, less noise, and I was no longer forced to wear headphones.

At home, drinking tea all day long

I soon started to work from home 2 days a week. Working remotely is written into the DNA of this company and everybody is easily reachable. Being quite self-sufficient on my projects, I mostly need to go to the office to enjoy the team and for meetings. From a technical point of view, being at home or at the office is exactly the same thing: a laptop and a VPN. Most of the company's tools are Software as a Service (SaaS), reachable from anywhere in the world.

These days, depending on business and team meetings, I work up to 3 days a week from home, and I enjoy doing so.

To be continued

Working remotely is a great asset some positions can offer. It's definitely not for all kinds of jobs, but it addresses real issues like commuting and allows better personal time management. I guess the taste for remote work differs from person to person, but it suits my lifestyle, and I'll make it a requirement for my next jobs.

About The Author

Carl Chenet, Free Software Indie Hacker, founder of LinuxJobs.io, a job board dedicated to Free and Open Source Software jobs in the US (soon to be released).


January 20, 2019 11:00 PM UTC


Toshio Kuratomi

Optimizing Conway

Conway’s Game of Life seems to be a common programming exercise. I had to program it in Pascal in high school and in C in an intro college programming course. I remember that in college, since I had already programmed it before, I wanted to optimize the algorithm. However, the combination of writing in C and having only a week to work on it didn’t leave me enough time to implement anything fancy.

A couple years later, I hiked the Appalachian Trail. Seven months away from computers, just hiking day in and day out. One of the things I found myself contemplating when walking up and down hills all day was that pesky Game of Life algorithm and ways that I could improve it.

Fast forward through twenty intervening years of life and experience with a few other programming languages to last weekend. I needed a fun programming exercise to raise my spirits so I looked up the rules to Conway’s Game of Life, sat down with vim and python, and implemented a few versions to test out some of the ideas I’d had kicking around in my head for a quarter century.

This blog post will only contain a few snippets of code to illustrate the differences between each approach.  Full code can be checked out from this github repository.

The Naive Approach: Tracking the whole board

The naive branch approximates how I would have first coded Conway’s Game of Life back in that high school programming course. The grid of cells is what I would have called a two-dimensional array in my Pascal and C days; in Python, I’ve more often heard it called a list of lists. Each entry in the outer list is a row on the grid, itself represented by another list. Each entry in an inner list is a cell in that row of the grid. If a cell is populated, its list entry contains True; if not, it contains False.

One populated cell surrounded by empty cells would look like this:

board = [
         [False, False, False],
         [False, True, False],
         [False, False, False],
        ]

Looking up an individual cell’s status is a matter of indexing into two lists: first the y-index in the outer list, then the x-index in an inner list:


# Is there a populated cell at x-axis 0, y-axis 1?
if board[1][0]:
    pass

Checking for changes is done by looping through every cell on the board and checking whether each cell’s neighbors cause it to be populated or depopulated in the next iteration:

for y_idx, row in enumerate(board):
    for x_idx, cell in enumerate(row):
        if cell:
            if not check_will_live((x_idx, y_idx), board, max_x, max_y):
                next_board[y_idx][x_idx] = False
        else:
            if check_new_life((x_idx, y_idx), board, max_x, max_y):
                next_board[y_idx][x_idx] = True

This is a simple mapping of the two-dimensional grid that Conway’s Game of Life takes place on into a computer data structure, and then a literal translation of Conway’s ruleset onto those cells. However, it is dreadfully inefficient. Even in college I could see that there should be easy ways to speed this up; I just needed the time to implement them.
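
For reference, the snippets here call find_neighbors, check_will_live and check_new_life without showing them; the real implementations are in the repository. A minimal sketch of what they might look like for the list-of-lists board (board[y][x] indexing, as described above; count_live_neighbors is a helper invented for this sketch):

def find_neighbors(cell, max_x, max_y):
    # Return the coordinates of the (up to 8) cells adjacent to cell,
    # clipped to the edges of the grid.
    x, y = cell
    return tuple((nx, ny)
                 for nx in range(x - 1, x + 2)
                 for ny in range(y - 1, y + 2)
                 if (nx, ny) != (x, y)
                 and 0 <= nx < max_x and 0 <= ny < max_y)

def count_live_neighbors(cell, board, max_x, max_y):
    # Count the populated neighbors of cell.
    return sum(1 for (nx, ny) in find_neighbors(cell, max_x, max_y)
               if board[ny][nx])

def check_will_live(cell, board, max_x, max_y):
    # A populated cell survives with exactly 2 or 3 populated neighbors.
    return count_live_neighbors(cell, board, max_x, max_y) in (2, 3)

def check_new_life(cell, board, max_x, max_y):
    # An empty cell becomes populated with exactly 3 populated neighbors.
    return count_live_neighbors(cell, board, max_x, max_y) == 3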

Intermediate: Better checking

The intermediate branch rectifies inefficiencies in how the next generation of cells is checked. The naive branch checks every single cell in the grid; however, in most Conway setups, most of the cells are blank. If we can ignore most of the blank cells, we save a lot of work. We can’t ignore all blank cells, though: if a blank cell has exactly three populated neighbors, it becomes populated in the next generation.

The key to satisfying both of those is to realize that all the cells we’re going to need to change will either be populated (in which case, they could die and become empty in the next generation) or they will be a neighbor of a populated cell (in which case, they may become populated next generation). So we can loop through our board and ignore all of the unpopulated cells at first. If we find a populated cell, then we must both check that cell to see if it will die and also check its empty neighbors to see if they will be filled in the next generation.

The major change to implement that is here:

checked_cells = set()
# We still loop through every cell in the board, but the
# top-level code to do something when a cell is empty has
# been removed.
for y_idx, row in enumerate(board):
    for x_idx, cell in enumerate(row):
        if cell:
            if not check_will_live((x_idx, y_idx), board, max_x, max_y):
                next_board[y_idx][x_idx] = False
            # Instead, inside the conditional block that runs
            # when a cell is populated, a new loop checks all
            # of the populated cell's neighbors.
            for neighbor in (n for n in find_neighbors((x_idx, y_idx), max_x, max_y)
                             if n not in checked_cells):
                # If the neighbor is empty, check whether it
                # should be populated in the next generation.
                if not board[neighbor[1]][neighbor[0]]:
                    checked_cells.add(neighbor)
                    if check_new_life(neighbor, board, max_x, max_y):
                        next_board[neighbor[1]][neighbor[0]] = True

Observant readers might also notice that I’ve added a checked_cells set. This tracks which empty cells we’ve already examined to see if they will be populated next generation. Making use of that means that we will only check a specific empty cell once per generation no matter how many populated cells it’s a neighbor of.

These optimizations to the checking code made things about 6x as fast as the naive approach.
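
If you want to check numbers like that on your own machine, a rough timing harness might look like this (run_generation here is a hypothetical stand-in wrapping one of the update loops above and returning the next board):

import time

def benchmark(run_generation, board, generations=100):
    # Advance the board a fixed number of generations and report
    # the throughput; run_generation is a hypothetical wrapper
    # around one of the update loops shown above.
    start = time.perf_counter()
    for _ in range(generations):
        board = run_generation(board)
    elapsed = time.perf_counter() - start
    print(f"{generations} generations in {elapsed:.2f}s "
          f"({generations / elapsed:.1f} generations/sec)")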

Gridless: Only tracking populated cells

The principle behind the intermediate branch, operating only on populated cells and their neighbors, seemed like it should apply to the data structure storing the grid as well as to the checks. Instead of using fixed-length arrays to store both the populated and empty portions of the grid, I figured it should be possible to store only the populated portions and then use those for all of our operations.

However, C is a very anemic language when it comes to built-in data structures. If I had wanted to do that in my college class, I would have had to implement a linked list or a hash map before I even got to the rules of Conway’s Game of Life. Today, working in Python with its built-in data types, it was very quick to implement a data structure holding only the populated cells.

For the gridless branch, I replaced the 2d array with a set. The set contained tuples of (x-coordinate, y-coordinate) which defined the populated cells. One populated cell surrounded by empty cells would look like this:

board = set((
             (1,1),
           ))
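
With this representation, the rule helpers can work directly on coordinate tuples. A plausible set-based sketch (again, the real code is in the repository); note that check_new_life now returns the set of empty neighbors that will be born, matching how it is called in the main loop below:

def check_will_live(cell, board, max_x, max_y):
    # A populated cell survives with exactly 2 or 3 populated neighbors.
    live = sum(1 for n in find_neighbors(cell, max_x, max_y) if n in board)
    return live in (2, 3)

def check_new_life(cell, board, max_x, max_y):
    # Return the set of cell's empty neighbors that gain life: an
    # empty cell becomes populated when exactly 3 of its neighbors
    # are populated.
    babies = set()
    for neighbor in find_neighbors(cell, max_x, max_y):
        if neighbor not in board:
            live = sum(1 for n in find_neighbors(neighbor, max_x, max_y)
                       if n in board)
            if live == 3:
                babies.add(neighbor)
    return babies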

Using a set had all sorts of advantages:

  • The display loop can now iterate directly over the populated cells instead of scanning every coordinate on the grid:

    - for y in range(0, max_y):
    -     for x in range(0, max_x):
    -         if board[y][x]:
    -             screen.addstr(y, x, ' ', curses.A_REVERSE)
    + for (x, y) in board:
    +     screen.addstr(y, x, ' ', curses.A_REVERSE)

  • Instead of copying the old board and then changing the status of cells each generation, it is now feasible to simply populate a new set with populated cells every generation. This is because the set() is empty except for the populated cells, whereas the list of lists would need to have both the populated and the empty cells set to either True or False.

    - next_board = copy.deepcopy(board)
    + next_board = set()

  • Similar to the display loop, the main loop which checks what happens to the cells can now just loop through the populated cells instead of having to search the entire grid for them:

    # Perform checks and update the board
    for cell in board:
        if check_will_live(cell, board, max_x, max_y):
            next_board.add(cell)
        babies = check_new_life(cell, board, max_x, max_y)
        next_board.update(babies)
    board = next_board

  • Checking whether a cell is populated becomes a simple containment check on the set, which is faster than looking up list indexes in the 2d array:

    - if board[cell[1]][cell[0]]:
    + if cell in board:

Gridless made the program about 3x faster than intermediate, or about 20x faster than naive.

Master: Only checking cell changes once per iteration

Despite being 3x faster than intermediate, gridless was still doing some extra work. The code in the master branch corrects that.

The most important change was that empty cells in gridless were being checked once for every populated neighbor they had. Adding a checked_cells set, as the intermediate branch had, ensures that we only check whether an empty cell should be populated in the next generation once per generation:

checked_cells = set()
for cell in board:
    if check_will_live(cell, board, max_x, max_y):
        next_board.add(cell)
    checked_cells.add(cell)
    # Pass checked_cells into check_new_life so that checking
    # skips empty neighbors which have already been checked
    # this generation
    babies, barren = check_new_life(cell, board, checked_cells, max_x, max_y)
    checked_cells.update(babies)
    checked_cells.update(barren)
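
The modified check_new_life now also returns barren, the empty neighbors it examined that won't be born, so they can be marked as checked too. A sketch under the same assumptions as the earlier set-based helpers:

def check_new_life(cell, board, checked_cells, max_x, max_y):
    # Return (babies, barren): the empty neighbors of cell that will
    # be born next generation, and those checked that won't be.
    babies = set()
    barren = set()
    for neighbor in find_neighbors(cell, max_x, max_y):
        if neighbor in board or neighbor in checked_cells:
            continue
        live = sum(1 for n in find_neighbors(neighbor, max_x, max_y)
                   if n in board)
        if live == 3:
            babies.add(neighbor)
        else:
            barren.add(neighbor)
    return babies, barren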
    

The other, relatively small, optimization was to use Python's builtin least-recently-used cache decorator (functools.lru_cache) on the find_neighbors function. This lets us skip recomputing the set of neighboring cells when the same cell is looked up soon after a previous lookup. In the set-based code, find_neighbors is called for the same cell back to back quite frequently, so this had a noticeable impact.

+ @functools.lru_cache()
  def find_neighbors(cell, max_x, max_y):

These changes sped up the master branch an additional 30% over what gridless had achieved, or nearly 30x as fast as the naive implementation that we started with.
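
If you're curious how much the cache actually helps, functools.lru_cache exposes its statistics through cache_info(), so you can check the hit rate yourself:

# After running a few generations, inspect the cache statistics;
# the exact numbers depend on the board being simulated.
print(find_neighbors.cache_info())
# e.g. CacheInfo(hits=..., misses=..., maxsize=128, currsize=...)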

What have I learned?

January 20, 2019 10:42 PM UTC