
Planet Python

Last update: April 25, 2018 01:47 AM

April 24, 2018

Stack Abuse

Single Page Apps with Vue.js and Flask: Deployment

Deployment to a Virtual Private Server


Welcome to the seventh and final installment of this multi-part tutorial series on full-stack web development using Vue.js and Flask. In this post I will be demonstrating how to deploy the application built throughout this series.

The code for this post can be found on my GitHub account under the branch SeventhPost.

Series Content

  1. Setup and Getting to Know VueJS
  2. Navigating Vue Router
  3. State Management with Vuex
  4. RESTful API with Flask
  5. AJAX Integration with REST API
  6. JWT Authentication
  7. Deployment to a Virtual Private Server (you are here)

Overview of the Technologies

This tutorial will be covering several technologies necessary to deploy a distributed, multi-tier Flask REST API and Vue.js SPA application. Below I have listed the technologies and their uses:

  1. Ubuntu: the operating system of the virtual private server
  2. uWSGI: the WSGI container server that executes the Flask REST API
  3. Nginx: the web server that serves the built Vue.js SPA and reverse proxies API calls to uWSGI
  4. Node.js / npm: the toolchain used to build the production Vue.js bundle

Getting the Code Ready for Deployment

There are a couple of changes that need to be made to the code to make it more maintainable once the application has been deployed to my production environment.

For example, in api/index.js of the survey-spa Vue.js application I have hardcoded a variable called API_URL to point to the dev server. With that hardcoded value I would need to remember to change it to the production server's IP address every time I need to deploy.

Experience has taught me that there will always be changes to the application requiring future deployments, and I am likely to forget to update this IP address at some point. A better approach is to remove that risk entirely and utilize configuration in the build process to handle this for me, leaving less to remember (i.e., fewer steps needed) during deployment. This significantly reduces the risk of an unsuccessful deployment on future updates.

I accomplish this by moving over to the survey-spa/config directory and modifying the dev.env.js and prod.env.js files, defining in each a variable called API_URL which is assigned a value of http://localhost:5000/api for dev and http://${process.env.BASE_URL}/api for prod, as shown below:

// dev.env.js

'use strict'
const merge = require('webpack-merge')
const prodEnv = require('./prod.env')

module.exports = merge(prodEnv, {
  NODE_ENV: '"development"',
  API_URL: JSON.stringify(`http://localhost:5000/api`)
})

// prod.env.js

'use strict'
module.exports = {
  NODE_ENV: '"production"',
  API_URL: JSON.stringify(`http://${process.env.BASE_URL}/api`)
}

Note: the value of process.env.BASE_URL is an environment variable that I will add to the Ubuntu server's user .bash_profile and set it equal to the IP address of the server.

Then over in api/index.js I modify the line const API_URL = '' and set it equal to process.env.API_URL.

Next, over in the Flask application I need to add a new module called wsgi.py to serve as the entry point to the Flask REST API. The wsgi.py module looks quite similar to the existing entry-point module except that it does not make any calls to the run(...) method of the app object. This is because the app object will serve as a callable for the uWSGI container server to execute against using its fast binary protocol rather than the regular development server that gets created when run(...) is called.

# backend/wsgi.py

from surveyapi.application import create_app  
app = create_app()  

With this finished I can push my changes to version control and hop onto my production server to pull down the project and set up the programs I will use to run the application on the production server.

Readying the Ubuntu Server

Next I'll get onto my production Ubuntu virtual private server, which could be hosted by one of the many cloud services such as AWS, DigitalOcean, Linode, etc., and begin installing all of the goodies I listed in the Overview of the Technologies section.

$ apt-get update
$ apt-get install python3-pip python3-dev python3-venv nginx nodejs npm

With those installs out of the way I can now create a user called "survey" to execute the application under and house the code.

$ adduser survey
$ usermod -aG sudo survey
$ su survey
$ cd

I should now be in the "survey" user's home directory at /home/survey.

With the survey user created I can update the .bash_profile file to contain the IP address of my production server by adding this line to the end of the file. Note that 123.454.67.89 represents a fake IP address for my server. Replace it with your true IP address if you are following along.

export BASE_URL=123.454.67.89

Next I want to tell the firewall (ufw) that OpenSSH is acceptable and enable it.

$ sudo ufw allow OpenSSH
$ sudo ufw enable

With this done I will now clone the repo onto the server so I can build and deploy it.

$ git clone

Now I will cd into flask-vuejs-survey/frontend/survey-spa and install the frontend dependencies as well as build the production application.

$ cd flask-vuejs-survey/frontend/survey-spa
$ npm install
$ npm run build

This creates a new directory called "dist", which will contain an index.html page and a directory called "static" containing all the compiled CSS and JavaScript files. These are what I will have Nginx serve up to constitute the SPA's front-end application.

Next up I will create a virtual environment in the /home/survey directory for an isolated Python 3 interpreter to run the Python application. Once created, I activate it and move into the backend project directory to install the dependency packages specified in the requirements.txt file.

$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ cd flask-vuejs-survey/backend
(venv) $ pip install -r requirements.txt

Now I can initialize the sqlite database and run the migrations to create the various database tables required by the REST API.

(venv) $ python db upgrade

At this point I would like to fire up the Flask dev server to make sure that all is working as expected. Before doing so I need to tell the ufw service to allow traffic in on port 5000.

(venv) $ sudo ufw allow 5000
(venv) $ python

In a browser I can now go to http://123.454.67.89:5000/api/surveys/ and I should see a simple JSON response of [] because there are no surveys in this database yet, but this does indicate that a successful request was made. Additionally, in the terminal connected to the server there should be a logged message for the GET request issued from my browser.

I key in Ctrl+C in the terminal to kill the Flask dev server and move on to configuring uwsgi to control the execution of my Flask REST API. If you are wondering where uwsgi came from it is specified as a requirement in the requirements.txt file that I pip installed with earlier.

Setting up uWSGI Container Server

Similar to what I just did with the Flask dev server I will now test that the uWSGI server can serve up the application as follows.

(venv) $ uwsgi --socket 0.0.0.0:5000 --protocol=http -w wsgi:app

Again, going to my browser and refreshing the same request I made previously should return an empty JSON array response. Once satisfied with my progress I can again key Ctrl+C into the terminal and move on.

There are two more steps I would like to do to complete the configuration of the uWSGI container server. One step is to create a configuration file that uWSGI will read in which will replace many of those command line flags and arguments I used above. The second step is to create a systemd service file to manage the uWSGI container server as a service like many of the others already running on the Ubuntu server.

In the backend directory I make a file called surveyapi.ini and fill it with the following:

[uwsgi]
module = wsgi:app

master = true
processes = 4

socket = surveyapi.sock
chmod-socket = 660
vacuum = true

die-on-term = true

This config file lets uWSGI know that the callable is the app object inside of the wsgi.py module. It also tells it to spawn and use four processes to handle application requests communicated over a socket file called surveyapi.sock, which has loose enough permissions to allow the Nginx web server to read from and write to it. The vacuum and die-on-term settings ensure proper cleanup.

For the systemd service file I need to create a file called surveyapi.service in the /etc/systemd/system directory and add some descriptors plus access, write, and execution commands like so:

(venv) $ sudo nano /etc/systemd/system/surveyapi.service

Then populate it with the following:

[Unit]
Description=uWSGI Python container server
After=network.target

[Service]
User=survey
Group=www-data
WorkingDirectory=/home/survey/flask-vuejs-survey/backend
ExecStart=/home/survey/venv/bin/uwsgi --ini surveyapi.ini

[Install]
WantedBy=multi-user.target


Now I can start the service and check its status and make sure the backend directory now contains surveyapi.sock.

(venv) $ sudo systemctl start surveyapi
(venv) $ sudo systemctl status surveyapi
● surveyapi.service - uWSGI Python container server
   Loaded: loaded (/etc/systemd/system/surveyapi.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2018-04-23 19:23:01 UTC; 2min 28s ago
 Main PID: 11221 (uwsgi)
    Tasks: 6
   Memory: 28.1M
      CPU: 384ms
   CGroup: /system.slice/surveyapi.service
           ├─11221 /home/survey/venv/bin/uwsgi --ini surveyapi.ini
           ├─11226 /home/survey/venv/bin/uwsgi --ini surveyapi.ini
           ├─11227 /home/survey/venv/bin/uwsgi --ini surveyapi.ini
           ├─11228 /home/survey/venv/bin/uwsgi --ini surveyapi.ini
           ├─11229 /home/survey/venv/bin/uwsgi --ini surveyapi.ini
           └─11230 /home/survey/venv/bin/uwsgi --ini surveyapi.ini

Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: mapped 437520 bytes (427 KB) for 5 cores  
Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: *** Operational MODE: preforking ***  
Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0x8b4c30 pid: 112  
Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: *** uWSGI is running in multiple interpreter mode ***  
Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: spawned uWSGI master process (pid: 11221)  
Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: spawned uWSGI worker 1 (pid: 11226, cores: 1)  
Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: spawned uWSGI worker 2 (pid: 11227, cores: 1)  
Apr 23 19:23:01 ubuntu-s-1vcpu-2gb-sfo2-01 uwsgi[11221]: spawned uWSGI worker 3 (pid: 11228, cores: 1)  
(venv) $ ls -l /home/survey/flask-vuejs-survey/backend
-rw-rw-r-- 1 survey survey     201 Apr 23 18:18
-rw-rw-r-- 1 survey survey     745 Apr 23 17:55
drwxrwxr-x 4 survey survey    4096 Apr 23 18:06 migrations  
drwxrwxr-x 2 survey survey    4096 Apr 23 18:52 __pycache__  
-rw-rw-r-- 1 survey survey     397 Apr 23 18:46 requirements.txt
drwxrwxr-x 3 survey survey    4096 Apr 23 18:06 surveyapi  
-rw-rw-r-- 1 survey survey     133 Apr 23 19:04 surveyapi.ini
srw-rw---- 1 survey www-data     0 Apr 23 19:23 surveyapi.sock  
-rw-r--r-- 1 survey survey   10240 Apr 23 18:19 survey.db
-rw-rw-r-- 1 survey survey      84 Apr 23 18:42

Excellent! The last thing I should do is enable the service to start automatically each time the system boots, ensuring that the application is always up.

(venv) $ sudo systemctl enable surveyapi

Setting Up Nginx

I will utilize Nginx to serve static content such as HTML, CSS, and JavaScript as well as to reverse proxy REST API calls to the Flask / uWSGI application. To set up nginx to accomplish these things I will need to create a config file which defines how to manage these various requests.

Over in /etc/nginx/sites-available I will create a file called survey which will contain the following:

server {
    listen 80;

    location /api {
        include uwsgi_params;
        uwsgi_pass unix:/home/survey/flask-vuejs-survey/backend/surveyapi.sock;
    }

    location / {
        root /home/survey/flask-vuejs-survey/frontend/survey-spa/dist;
        try_files $uri $uri/ /home/survey/flask-vuejs-survey/frontend/survey-spa/dist/index.html;
    }
}
This file creates a new server block configuration which listens on the standard HTTP port of 80. It then says to look for any URI paths beginning with /api and reverse proxy those to the Flask / uWSGI REST API server using the previously defined socket file. Lastly, the config says to catch everything else under / and serve up the index.html file in the dist directory created earlier when I built the Vue.js front-end SPA.

With this config file created I need to let Nginx know that it is an available site by creating a symbolic link to the /etc/nginx/sites-enabled directory like so:

$ sudo ln -s /etc/nginx/sites-available/survey /etc/nginx/sites-enabled 

To allow traffic over the HTTP port and bind to Nginx I will issue the following update to ufw as well as close the previously opened 5000 port.

$ sudo ufw delete allow 5000
$ sudo ufw allow 'Nginx Full'

Following this command I will need to restart the Nginx service like so for the updates to take effect.

$ sudo systemctl restart nginx

Now I can go to my browser again and visit http://123.454.67.89 and I am presented with the survey application I've shown in prior articles.


Conclusion

Well, this is the concluding post to this multi-part tutorial series on how to utilize Flask and Vue.js to build a REST-API-enabled SPA application. I have attempted to cover most of the important topics that are common to many web application use cases, assuming very little prior knowledge of the Flask and Vue.js technologies used.

I thank you for following along with this series and please do not be shy about commenting or critiquing below.

April 24, 2018 02:09 PM

Real Python

Pipenv: A Guide to the New Python Packaging Tool

Pipenv is a packaging tool for Python that solves some common problems associated with the typical workflow using pip, virtualenv, and the good old requirements.txt.

In addition to addressing some common issues, it consolidates and simplifies the development process to a single command line tool.

This guide will go over what problems Pipenv solves and how to manage your Python dependencies with Pipenv. Additionally, it will cover how Pipenv fits in with previous methods for package distribution.

Free Bonus: Click here to get access to a free 5-day class that shows you how to avoid common dependency management issues with tools like Pip, PyPI, Virtualenv, and requirements files.

Problems that Pipenv Solves

To understand the benefits of Pipenv, it’s important to walk through the current methods for packaging and dependency management in Python.

Let’s start with a typical situation of handling third-party packages. We’ll then build our way towards deploying a complete Python application.

Dependency Management with requirements.txt

Imagine you’re working on a Python project that uses a third-party package like flask. You’ll need to specify that requirement so that other developers and automated systems can run your application.

So you decide to include the flask dependency in a requirements.txt file:

flask
Great, everything works fine locally, and after hacking away on your app for a while, you decide to move it to production. Here’s where things get a little scary…

The above requirements.txt file doesn’t specify which version of flask to use. In this case, pip install -r requirements.txt will install the latest version by default. This is okay unless there are interface or behavior changes in the newest version that break our application.

For the sake of this example, let’s say that a new version of flask got released. However, it isn’t backward compatible with the version you used during development.

Now, let’s say you deploy your application to production and do a pip install -r requirements.txt. Pip gets the latest, not-backward-compatible version of flask, and just like that, your application breaks… in production.

“But hey, it worked on my machine!”—I’ve been there myself, and it’s not a great feeling.

At this point, you know that the version of flask you used during development worked fine. So, to fix things, you try to be a little more specific in your requirements.txt. You add a version specifier to the flask dependency. This is also called pinning a dependency:

flask==0.12.1
Pinning the flask dependency to a specific version ensures that a pip install -r requirements.txt sets up the exact version of flask you used during development. But does it really?

Keep in mind that flask itself has dependencies as well (which pip installs automatically). However, flask itself doesn’t specify exact versions for its dependencies. For example, it allows any version of Werkzeug>=0.14.

Again, for the sake of this example, let’s say a new version of Werkzeug got released, but it introduces a show-stopper bug to your application.

When you do pip install -r requirements.txt in production this time, you will get flask==0.12.1 since you’ve pinned that requirement. However, unfortunately, you’ll get the latest, buggy version of Werkzeug. Again, the product breaks in production.

The real issue here is that the build isn’t deterministic. What I mean by that is that, given the same input (the requirements.txt file), pip doesn’t always produce the same environment. At the moment, you can’t easily replicate the exact environment you have on your development machine in production.

The typical solution to this problem is to use pip freeze. This command allows you to get exact versions for all 3rd party libraries currently installed, including the sub-dependencies pip installed automatically. So you can freeze everything in development to ensure that you have the same environment in production.
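The kind of snapshot pip freeze records can be illustrated from within Python itself. This is a rough sketch using the standard library (importlib.metadata, available since Python 3.8) rather than pip's internals:

```python
from importlib import metadata

# Collect "name==version" pins for every installed distribution,
# mirroring the lines `pip freeze` writes out for requirements.txt.
pins = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
)
print("\n".join(pins))
```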

Executing pip freeze results in pinned dependencies you can add to a requirements.txt:

click==6.7
Flask==0.12.1
itsdangerous==0.24
Jinja2==2.10
MarkupSafe==1.0
Werkzeug==0.14.1
With these pinned dependencies, you can ensure that the packages installed in your production environment match those in your development environment exactly, so your product doesn’t unexpectedly break. This “solution,” unfortunately, leads to a whole new set of problems.

Now that you’ve specified the exact versions of every third-party package, you are responsible for keeping these versions up to date, even though they’re sub-dependencies of flask. What if there’s a security hole discovered in Werkzeug==0.14.1 that the package maintainers immediately patched in Werkzeug==0.14.2? You really need to update to Werkzeug==0.14.2 to avoid any security issues arising from the earlier, unpatched version of Werkzeug.

First, you need to be aware that there’s an issue with the version you have. Then, you need to get the new version in your production environment before someone exploits the security hole. So, you have to change your requirements.txt manually to specify the new version Werkzeug==0.14.2. As you can see in this situation, the responsibility of staying up to date with necessary updates falls on you.

The truth is that you really don’t care what version of Werkzeug gets installed as long as it doesn’t break your code. In fact, you probably want the latest version to ensure that you’re getting bug fixes, security patches, new features, more optimization, and so on.

The real question is: “How do you allow for deterministic builds for your Python project without gaining the responsibility of updating versions of sub-dependencies?”

Spoiler alert: The easy answer is using Pipenv.

Development of Projects with Different Dependencies

Let’s switch gears a bit to talk about another common issue that arises when you’re working on multiple projects. Imagine that ProjectA needs django==1.9, but ProjectB needs django==1.10.

By default, Python tries to store all your third-party packages in a system-wide location. This means that every time you want to switch between ProjectA and ProjectB, you have to make sure the right version of django is installed. This makes switching between projects painful because you have to uninstall and reinstall packages to meet the requirements for each project.

The standard solution is to use a virtual environment that has its own Python executable and third-party package storage. That way, ProjectA and ProjectB are adequately separated. Now you can easily switch between projects since they’re not sharing the same package storage location. PackageA can have whatever version of django it needs in its own environment, and PackageB can have what it needs totally separate. A very common tool for this is virtualenv (or venv in Python 3).
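To make that isolation concrete, here is a small check (a sketch, not part of virtualenv's API) showing how an interpreter can tell whether it is running inside a virtual environment:

```python
import sys

# Inside a virtual environment, sys.prefix points at the env directory
# while sys.base_prefix still points at the system interpreter.
in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
print("inside a virtual environment:", in_venv)
```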

Pipenv has virtual environment management built in so that you have a single tool for your package management.

Dependency Resolution

What do I mean by dependency resolution? Let’s say you’ve got a requirements.txt file that looks something like this:

package_a
package_b
Let’s say package_a has a sub-dependency package_c, and package_a requires a specific version of this package: package_c>=1.0. In turn, package_b has the same sub-dependency but needs package_c<=2.0.

Ideally, when you try to install package_a and package_b, the installation tool would look at the requirements for package_c (being >=1.0 and <=2.0) and select a version that fulfills those requirements. You’d hope that the tool resolves the dependencies so that your program works in the end. This is what I mean by “dependency resolution.”
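The resolution step described above can be sketched in a few lines of Python; the function names and tuple-based versions here are purely illustrative, not pip's or Pipenv's internals:

```python
# Naive resolver: pick the newest candidate version that satisfies
# every constraint collected from the packages that require it.

def satisfies(version, constraint):
    op, bound = constraint
    if op == ">=":
        return version >= bound
    if op == "<=":
        return version <= bound
    raise ValueError(f"unsupported operator: {op}")

def resolve(candidates, constraints):
    for version in sorted(candidates, reverse=True):
        if all(satisfies(version, c) for c in constraints):
            return version
    return None  # conflicting constraints: no version fits

# package_a needs package_c>=1.0; package_b needs package_c<=2.0.
# Available versions of package_c: 1.0, 2.0, 3.1.
picked = resolve([(1, 0), (2, 0), (3, 1)], [(">=", (1, 0)), ("<=", (2, 0))])
print(picked)  # (2, 0): the newest version satisfying both requirements
```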

Unfortunately, pip itself doesn’t have real dependency resolution at the moment, but there’s an open issue to support it.

The way pip would handle the above scenario is as follows:

  1. It installs package_a and looks for a version of package_c that fulfills the first requirement (package_c>=1.0).

  2. Pip then installs the latest version of package_c to fulfill that requirement. Let’s say the latest version of package_c is 3.1.

This is where the trouble (potentially) starts.

If the version of package_c selected by pip doesn’t fit future requirements (such as package_b needing package_c<=2.0), the installation will fail.

The “solution” to this problem is to specify the range required for the sub-dependency (package_c) in the requirements.txt file. That way, pip can resolve this conflict and install a package that meets those requirements:

package_a
package_b
package_c>=1.0,<=2.0
Just like before though, you’re now concerning yourself directly with sub-dependencies (package_c). The issue with this is that if package_a changes their requirement without you knowing, the requirements you specified (package_c>=1.0,<=2.0) may no longer be acceptable, and installation may fail… again. The real problem is that once again, you’re responsible for staying up to date with requirements of sub-dependencies.

Ideally, your installation tool would be smart enough to install packages that meet all the requirements without you explicitly specifying sub-dependency versions.

Pipenv Introduction

Now that we’ve addressed the problems, let’s see how Pipenv solves them.

First, let’s install it:

$ pip install pipenv

Once you’ve done that, you can effectively forget about pip since Pipenv essentially acts as a replacement. It also introduces two new files, the Pipfile (which is meant to replace requirements.txt) and the Pipfile.lock (which enables deterministic builds).

Pipenv uses pip and virtualenv under the hood but simplifies their usage with a single command line interface.

Example Usage

Let’s start over with creating your awesome Python application. First, spawn a shell in a virtual environment to isolate the development of this app:

$ pipenv shell

This will create a virtual environment if one doesn’t already exist. Pipenv creates all your virtual environments in a default location. If you want to change Pipenv’s default behavior, there are some environmental variables for configuration.

You can force the creation of a Python 2 or 3 environment with the arguments --two and --three respectively. Otherwise, Pipenv will use whatever default virtualenv finds.

Sidenote: If you require a more specific version of Python, you can provide a --python argument with the version you require. For example: --python 3.6

Now you can install the 3rd party package you need, flask. Oh, but you know that you need version 0.12.1 and not the latest version, so go ahead and be specific:

$ pipenv install flask==0.12.1

You should see something like the following in your terminal:

Adding flask==0.12.1 to Pipfile's [packages]…
Pipfile.lock not found, creating…

You’ll notice that two files get created, a Pipfile and Pipfile.lock. We’ll take a closer look at these in a second. Let’s install another 3rd party package, numpy, for some number-crunching. You don’t need a specific version so don’t specify one:

$ pipenv install numpy

If you want to install something directly from a version control system (VCS), you can! You specify the locations similarly to how you’d do so with pip. For example, to install the requests library from version control, do the following:

$ pipenv install -e git+

Note the -e argument above to make the installation editable. Currently, this is required for Pipenv to do sub-dependency resolution.

Let’s say you also have some unit tests for this awesome application, and you want to use pytest for running them. You don’t need pytest in production so you can specify that this dependency is only for development with the --dev argument:

$ pipenv install pytest --dev

Providing the --dev argument will put the dependency in a special [dev-packages] location in the Pipfile. These development packages only get installed if you specify the --dev argument with pipenv install.

The different sections separate dependencies needed only for development from ones needed for the base code to actually work. Typically, this would be accomplished with additional requirements files like dev-requirements.txt or test-requirements.txt. Now, everything is consolidated in a single Pipfile under different sections.

Okay, so let’s say you’ve got everything working in your local development environment and you’re ready to push it to production. To do that, you need to lock your environment so you can ensure you have the same one in production:

$ pipenv lock

This will create/update your Pipfile.lock, which you’ll never need to (and are never meant to) edit manually. You should always use the generated file.

Now, once you get your code and Pipfile.lock in your production environment, you should install the last successful environment recorded:

$ pipenv install --ignore-pipfile

This tells Pipenv to ignore the Pipfile for installation and use what’s in the Pipfile.lock. Given this Pipfile.lock, Pipenv will create the exact same environment you had when you ran pipenv lock, sub-dependencies and all.

The lock file enables deterministic builds by taking a snapshot of all the versions of packages in an environment (similar to the result of a pip freeze).

Now let’s say another developer wants to make some additions to your code. In this situation, they would get the code, including the Pipfile, and use this command:

$ pipenv install --dev

This installs all the dependencies needed for development, which includes both the regular dependencies and those you specified with the --dev argument during install.

When an exact version isn’t specified in the Pipfile, the install command gives the opportunity for dependencies (and sub-dependencies) to update their versions.

This is an important note because it solves some of the previous problems we discussed. To demonstrate, let’s say a new version of one of your dependencies becomes available. Because you don’t need a specific version of this dependency, you don’t specify an exact version in the Pipfile. When you pipenv install, the new version of the dependency will be installed in your development environment.

Now you make your changes to the code and run some tests to verify everything is still working as expected. (You do have unit tests, right?) Now, just as before, you lock your environment with pipenv lock, and an updated Pipfile.lock will be generated with the new version of the dependency. Just as before, you can replicate this new environment in production with the lock file.

As you can see from this scenario, you no longer have to force exact versions you don’t truly need to ensure your development and production environments are the same. You also don’t need to stay on top of updating sub-dependencies you “don’t care about.” This workflow with Pipenv, combined with your excellent testing, fixes the issues of manually doing all your dependency management.

Pipenv’s Dependency Resolution Approach

Pipenv will attempt to install sub-dependencies that satisfy all the requirements from your core dependencies. However, if there are conflicting dependencies (package_a needs package_c>=1.0, but package_b needs package_c<1.0), Pipenv will not be able to create a lock file and will output an error like the following:

Warning: Your dependencies could not be resolved. You likely have a mismatch in your sub-dependencies.
  You can use $ pipenv install --skip-lock to bypass this mechanism, then run $ pipenv graph to inspect the situation.
Could not find a version that matches package_c>=1.0,package_c<1.0

As the warning says, you can also show a dependency graph to understand your top-level dependencies and their sub-dependencies:

$ pipenv graph

This command will print out a tree-like structure showing your dependencies. Here’s an example:

flask==0.12.1
  - click [required: >=2.0, installed: 6.7]
  - itsdangerous [required: >=0.21, installed: 0.24]
  - Jinja2 [required: >=2.4, installed: 2.10]
    - MarkupSafe [required: >=0.23, installed: 1.0]
  - Werkzeug [required: >=0.7, installed: 0.14.1]
numpy
pytest==3.4.1
  - attrs [required: >=17.2.0, installed: 17.4.0]
  - funcsigs [required: Any, installed: 1.0.2]
  - pluggy [required: <0.7,>=0.5, installed: 0.6.0]
  - py [required: >=1.5.0, installed: 1.5.2]
  - setuptools [required: Any, installed: 38.5.1]
  - six [required: >=1.10.0, installed: 1.11.0]
requests
  - certifi [required: >=2017.4.17, installed: 2018.1.18]
  - chardet [required: >=3.0.2,<3.1.0, installed: 3.0.4]
  - idna [required: >=2.5,<2.7, installed: 2.6]
  - urllib3 [required: <1.23,>=1.21.1, installed: 1.22]

From the output of pipenv graph, you can see the top-level dependencies we installed previously (Flask, numpy, pytest, and requests), and underneath you can see the packages they depend on.

Additionally, you can reverse the tree to show the sub-dependencies with the parent that requires it:

$ pipenv graph --reverse

This reversed tree may be more useful when you are trying to figure out conflicting sub-dependencies.

The Pipfile

Pipfile intends to replace requirements.txt. Pipenv is currently the reference implementation for using Pipfile. It seems very likely that pip itself will be able to handle these files. Also, it’s worth noting that Pipenv is even the official package management tool recommended by Python itself.

The syntax for the Pipfile is TOML, and the file is separated into sections. [dev-packages] for development-only packages, [packages] for minimally required packages, and [requires] for other requirements like a specific version of Python. See an example file below:

[[source]]
url = ""
verify_ssl = true
name = "pypi"

[dev-packages]
pytest = "*"

[packages]
flask = "==0.12.1"
numpy = "*"
requests = {git = "", editable = true}

[requires]
python_version = "3.6"

Ideally, you shouldn’t have any sub-dependencies in your Pipfile. What I mean by that is you should only include the packages you actually import and use. No need to keep chardet in your Pipfile just because it’s a sub-dependency of requests. (Pipenv will install it automatically.) The Pipfile should convey the top-level dependencies your package requires.

The Pipfile.lock

This file enables deterministic builds by specifying the exact requirements for reproducing an environment. It contains exact versions for packages and hashes to support more secure verification, which pip itself now supports as well. An example file might look like the following. Note that the syntax for this file is JSON and that I’ve excluded parts of the file with ...:

{
    "_meta": {
        ...
    },
    "default": {
        "flask": {
            "hashes": [
                ...
            ],
            "version": "==0.12.1"
        },
        ...
        "requests": {
            "editable": true,
            "git": "",
            "ref": "4ea09e49f7d518d365e7c6f7ff6ed9ca70d6ec2e"
        },
        ...
        "werkzeug": {
            "hashes": [
                ...
            ],
            "version": "==0.14.1"
        }
    },
    "develop": {
        "pytest": {
            "hashes": [
                ...
            ],
            "version": "==3.4.1"
        },
        ...
    }
}
Note the exact version specified for every dependency. Even the sub-dependencies like werkzeug that aren’t in our Pipfile appear in this Pipfile.lock. The hashes are used to ensure you’re retrieving the same package as you did in development.

It’s worth noting again that you should never change this file by hand. It is meant to be generated with pipenv lock.
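The idea behind hash checking is simple to sketch: recompute the digest of the downloaded artifact and compare it against the pinned "sha256:&lt;hex&gt;" value from the lock file. This is an illustration of the concept only, not pip's or Pipenv's actual code, and the artifact bytes below are made up:

```python
import hashlib

def verify_artifact(data: bytes, expected: str) -> bool:
    """Check downloaded bytes against a Pipfile.lock-style 'sha256:<hex>' hash."""
    algo, _, hexdigest = expected.partition(":")
    actual = hashlib.new(algo, data).hexdigest()
    return actual == hexdigest

# Hypothetical artifact and its pinned hash (illustrative, not real lock data)
artifact = b"fake wheel contents"
pinned = "sha256:" + hashlib.sha256(artifact).hexdigest()

print(verify_artifact(artifact, pinned))              # True
print(verify_artifact(b"tampered contents", pinned))  # False
```

If even one byte of the artifact changes, the digest no longer matches and installation is refused.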

Pipenv Extra Features

Open a third-party package in your default editor with the following command:

$ pipenv open flask

This will open the flask package in the default editor, or you can specify a program with the EDITOR environment variable. For example, I use Sublime Text, so I just set EDITOR=subl. This makes it super simple to dig into the internals of a package you’re using.

You can run a command in the virtual environment without launching a shell:

$ pipenv run <insert command here>

Check for security vulnerabilities (and PEP 508 requirements) in your environment:

$ pipenv check

Now, let’s say you no longer need a package. You can uninstall it:

$ pipenv uninstall numpy

Additionally, let’s say you want to completely wipe all the installed packages from your virtual environment:

$ pipenv uninstall --all

You can replace --all with --all-dev to just remove dev packages.

Pipenv supports the automatic loading of environment variables when a .env file exists in the top-level directory. That way, when you run pipenv shell to open the virtual environment, it loads your environment variables from the file. The .env file just contains key-value pairs, one per line.

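The core of that behavior is easy to sketch. The following toy loader (not Pipenv's actual implementation, which also handles quoting, comments in more positions, and `export` prefixes) reads KEY=VALUE lines into an environment mapping; the variable names are made up:

```python
import os

def load_dotenv(text, environ=os.environ):
    """Minimal sketch of .env loading: one KEY=VALUE pair per line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        environ.setdefault(key.strip(), value.strip())

env = {}
load_dotenv("API_TOKEN=abc123\nDEBUG=1\n# a comment\n", environ=env)
print(env)  # {'API_TOKEN': 'abc123', 'DEBUG': '1'}
```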
Finally, here are some quick commands to find out where stuff is. How to find out where your virtual environment is:

$ pipenv --venv

How to find out where your project home is:

$ pipenv --where

Package Distribution

You may be asking how this all works if you intend to distribute your code as a package.

Yes, I need to distribute my code as a package

How does Pipenv work with setup.py files?

There are a lot of nuances to that question. First, a setup.py file is necessary when you’re using setuptools as your build/distribution system. This has been the de facto standard for a while now, but recent changes have made the use of setuptools optional.

This means that projects like flit can use the new pyproject.toml to specify a different build system that doesn’t require a setup.py.

All that being said, for the near future setuptools and an accompanying setup.py will still be the default choice for many people.

Here’s a recommended workflow for when you are using a setup.py as a way to distribute your package:

To clarify, put your minimum requirements in setup.py instead of directly with pipenv install. Then use the pipenv install '-e .' command to install your package as editable. This gets all the requirements from setup.py into your environment. Then you can use pipenv lock to get a reproducible environment.
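To make the split concrete: abstract, loosely pinned requirements belong in setup.py's install_requires, while the exact pins live in Pipfile.lock. A hypothetical minimal setup.py (the project name and versions are made up) would pass arguments along these lines:

```python
# Sketch of the arguments a minimal setup.py would pass to setuptools.setup().
SETUP_KWARGS = {
    "name": "myproject",    # hypothetical project name
    "version": "0.1.0",
    "install_requires": [
        "flask>=0.12",      # abstract minimum, not an exact pin
        "requests",         # no pin at all: any version is acceptable
    ],
}

# A real setup.py would end with:
#   from setuptools import setup, find_packages
#   setup(packages=find_packages(), **SETUP_KWARGS)
print(SETUP_KWARGS["install_requires"])
```

Then pipenv install '-e .' pulls these loose requirements into your environment, and pipenv lock records the exact versions that were resolved.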

I don’t need to distribute my code as a package

Great! If you are developing an application that isn’t meant to be distributed or installed (a personal website, a desktop application, a game, or similar), you don’t really need a setup.py.

In this situation, you could use the Pipfile/Pipfile.lock combo for managing your dependencies, with the flow described previously, to deploy a reproducible environment in production.

I already have a requirements.txt. How do I convert to a Pipfile?

If you run pipenv install it should automatically detect the requirements.txt and convert it to a Pipfile, outputting something like the following:

requirements.txt found, instead of Pipfile! Converting…
Warning: Your Pipfile now contains pinned versions, if your requirements.txt did.
We recommend updating your Pipfile to specify the "*" version, instead.

Take note of the above warning.

If you have pinned exact versions in your requirements.txt file, you’ll probably want to change your Pipfile to only specify exact versions you truly require. This will allow you to gain the real benefits of transitioning. For example, let’s say you have the following but really don’t need that exact version of numpy:

numpy = "==1.14.1"

If you don’t have any specific version requirements for your dependencies, you can use the wildcard character * to tell Pipenv that any version can be installed:

numpy = "*"

If you feel nervous about allowing any version with the *, it’s typically a safe bet to specify greater than or equal to the version you’re already on so you can still take advantage of new versions:

numpy = ">=1.14.1"

Of course, staying up to date with new releases also means you’re responsible for ensuring your code still functions as expected when packages change. This means a test suite is essential to this whole Pipenv flow if you want to ensure functioning releases of your code.

You allow packages to update, run your tests, ensure they all pass, lock your environment, and then you can rest easy knowing that you haven’t introduced breaking changes. If things do break because of a dependency, you’ve got some regression tests to write and potentially some more restrictions on versions of dependencies.

For example, if numpy==1.15 gets installed after running pipenv install and it breaks your code, which you hopefully either notice during development or during your tests, you have a couple options:

  1. Update your code to function with the new version of the dependency.

    If backward compatibility with previous versions of the dependency isn’t possible, you’ll also need to bump your required version in your Pipfile:

    numpy = ">=1.15"
  2. Restrict the version of the dependency in the Pipfile to be < the version that just broke your code:

    numpy = ">=1.14.1,<1.15"

Option 1 is preferred as it ensures that your code is using the most up-to-date dependencies. However, Option 2 takes less time and doesn’t require code changes, just restrictions on dependencies.
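The specifier in Option 2 simply carves out a half-open version interval. As a toy model of what ">=1.14.1,<1.15" means (real tools implement the full PEP 440 rules; this sketch handles only dotted integer versions):

```python
def parse(version):
    """Turn '1.14.1' into a comparable tuple (toy model, not PEP 440)."""
    return tuple(int(part) for part in version.split("."))

def in_range(version, minimum=None, below=None):
    """Toy check for a specifier like '>=minimum,<below'."""
    v = parse(version)
    if minimum is not None and v < parse(minimum):
        return False
    if below is not None and v >= parse(below):
        return False
    return True

# numpy = ">=1.14.1,<1.15" from Option 2 above
print(in_range("1.14.2", minimum="1.14.1", below="1.15"))  # True
print(in_range("1.15", minimum="1.14.1", below="1.15"))    # False
```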

You can also install from requirement files with the same -r argument pip takes:

$ pipenv install -r requirements.txt

If you have a dev-requirements.txt or something similar, you can add those to the Pipfile as well. Just add the --dev argument so it gets put in the right section:

$ pipenv install -r dev-requirements.txt --dev

Additionally, you can go the other way and generate requirements files from a Pipfile:

$ pipenv lock -r > requirements.txt
$ pipenv lock -r -d > dev-requirements.txt

What’s next?

It appears to me that a natural progression for the Python ecosystem would be a build system that uses the Pipfile to install the minimally required dependencies when retrieving and building a package from a package index (like PyPI). It is important to note again that the Pipfile design specification is still in development, and Pipenv is just a reference implementation.

That being said, I could see a future where the install_requires section of setup.py doesn’t exist, and the Pipfile is referenced for minimal requirements instead. Or the setup.py is gone entirely, and you get metadata and other information in a different manner, still using the Pipfile to get the necessary dependencies.

Is Pipenv worth checking out?

Definitely. Even if it’s just as a way to consolidate the tools you already use (pip & virtualenv) into a single interface. However, it’s much more than that. With the addition of the Pipfile, you only specify the dependencies you truly need.

You no longer have the headache of managing the versions of everything yourself just to ensure you can replicate your development environment. With the Pipfile.lock, you can develop with peace of mind knowing that you can exactly reproduce your environment anywhere.

In addition to all that, it seems very likely that the Pipfile format will get adopted and supported by official Python tools like pip, so it’d be beneficial to be ahead of the game. Oh, and make sure you’re updating all your code to Python 3 as well: 2020 is coming up fast.


April 24, 2018 02:00 PM

Davide Moro

Hello pytest-play!

pytest-play is a rec&play (rec not yet available) pytest plugin that lets you execute a set of actions and assertions using commands serialized in JSON format. It tries to make test automation more affordable for non-programmers or non-Python programmers for browser, functional, API, integration, or system testing, thanks to its pluggable architecture and third-party plugins that let you interact with the most common databases and systems.

In addition, it provides some facilitations for writing browser UI actions (e.g., implicit waits before interacting with an input element; the Cypress framework was a great source of inspiration for me) and asynchronous checks (e.g., wait until a certain condition is true).

You can use pytest-play programmatically (e.g., use the pytest-play engine as a library for pytest-play standalone scenarios or using the pytest-play API implementing BDD steps).

Starting from pytest-play>1.4.x, a new experimental feature was introduced that lets you use pytest-play as a framework, creating Python-free automated tests based on a JSON serialization format for actions and assertions (in the near future the more user-friendly YAML format will be supported).

So now depending on your needs and skills you can choose to use pytest-play as a library or as a framework.

In this article I'm going to show how to implement a Plone CMS based login test using the Python-free approach, without having to write a single line of Python code.

What is pytest-play and why it exists

In this section I'm going to add more information about the pytest-play approach and other considerations: if you want to see right away how to implement our Python-free automated login test, jump to the next section!

Hyper specialized tool problems

There are many commercial products or tools that offer solutions for API-only testing or browser-only testing. Sometimes hyper-specialized tools might fit your needs (e.g., a content management system based web application), but sometimes they are not helpful for other distributed applications.

For example, an API-only platform is not effective for testing a CQRS based application. It is not enough to test only for an HTTP 200 OK response; you should also test that all the expected commands are generated on the event store (e.g., Cassandra) or other side effects.

Another example: IoT applications and UI/browser-only testing platforms. You cannot test reactive web apps with a browser alone; you should also control simulated device activities (e.g., MQTT, queues, API calls for messages/alarms/reports) or any other web-based interactions performed by other users (e.g., HTTP calls), and you might need to check the expected results asynchronously on web sockets, rather than through a real browser, when some actions are performed.

What is pytest-play

In other words, pytest-play is an open source testing solution based on the pytest framework that lets you define actions and assertions using a serialization format (JSON at this time of writing, YAML in the near future) that should be more affordable for non-technical testers, non-programmers, or programmers with no Python knowledge.

Potentially you will be able to share and execute a new scenario, not yet included in your test library, by copying and pasting a pytest-play JSON into a Jenkins "build with parameters" form like the following one (see the PLAY textarea):


In addition, if you are a technical user you can extend it by writing your own plugins, you can provide integrations with external tools (e.g., test management tools, software metrics engines, etc.), and you can decide the test abstraction depending on deadlines/skills/strategy (e.g., plain JSON files, a programmatic approach based on JSON scenarios, or BDD steps based on pytest-play).

What pytest-play is not

For example, pytest-play doesn't provide a test scenario recorder; instead it encourages users to understand what they are doing.

It requires a little programming knowledge for writing some assertions as simple code expressions, but with a little training it is still approachable by non-programmers (you don't have to learn a programming language, just some basic assertions).

It is not feature complete but it is free software.

If you want to know more, I've covered these topics in a previous article.

A pytest-play example: parametrized login (featuring Plone CMS)

In this example we'll see how to write and execute pure JSON pytest-play scenarios, with test data decoupled from the test implementation, plus test parametrization. I'm using the Plone 5 demo site available online, kindly hosted by Andreas Jung.

The project is available here:
The tests can be launched as a normal pytest project once you have installed pytest and the dependencies (there is a requirements.txt file, see the above link):

$ pytest --variables env-ALPHA.yml --splinter-webdriver firefox --splinter-screenshot-dir /tmp -x
where you can have multiple environment/variable files, e.g., env-ALPHA.yml containing the alpha base url and any other variables:
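A hypothetical env-ALPHA.yml might look like this (keys and values are illustrative; pytest-variables makes them available to scenarios as $-prefixed variables):

```yaml
base_url: https://plone-demo.example.com
username: siteadmin
password: siteadmin
```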
Our login test_login.json scenario contains the following (as you can see there are NO asynchronous waits, because they are not needed for basic examples, so you can focus on actions and assertions thanks to implicit waits):
{
    "steps": [
        {
            "comment": "visit base url",
            "type": "get",
            "url": "$base_url"
        },
        {
            "comment": "click on login link",
            "locator": {
                "type": "id",
                "value": "personaltools-login"
            },
            "type": "clickElement"
        },
        {
            "comment": "provide a username",
            "locator": {
                "type": "id",
                "value": "__ac_name"
            },
            "text": "$username",
            "type": "setElementText"
        },
        {
            "comment": "provide a password",
            "locator": {
                "type": "id",
                "value": "__ac_password"
            },
            "text": "$password",
            "type": "setElementText"
        },
        {
            "comment": "click on login submit button",
            "locator": {
                "type": "css",
                "value": ".pattern-modal-buttons > input[name=submit]"
            },
            "type": "clickElement"
        },
        {
            "comment": "wait for page loaded",
            "locator": {
                "type": "css",
                "value": ".icon-user"
            },
            "type": "waitForElementVisible"
        }
    ]
}
Plus an optional test scenario metadata file test_login.ini that contains pytest keywords and decoupled test data:

[pytest]
markers =
    login
test_data =
    {"username": "siteadmin", "password": "siteadmin"}
    {"username": "editor", "password": "editor"}
    {"username": "reader", "password": "reader"}
Thanks to the metadata file you have just one scenario, and it will be executed 3 times (as many times as there are test data rows)!
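Under the hood, the $base_url/$username/$password placeholders are filled in from the variables file and the test data before each step runs. The substitution idea (this is not pytest-play's actual engine, and the URL below is hypothetical) can be sketched with the Python stdlib:

```python
import json
from string import Template

# One raw scenario step with a $-placeholder, as found in test_login.json
scenario = '{"type": "get", "url": "$base_url"}'
variables = {"base_url": "https://plone-demo.example.com"}  # hypothetical value

# Substitute variables first, then parse the resulting JSON step
step = json.loads(Template(scenario).substitute(variables))
print(step["url"])  # https://plone-demo.example.com
```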

Et voilà, let's see our scenario in action, without having written a single line of Python code:

There is only a warning I have to remove, but it worked, and we got exactly 3 different test runs for our login scenario, as expected!

pytest-play status

pytest-play should still be considered experimental software, and many features need to be implemented or refactored:

PyCon Nove @ Florence

If you are going to attend the next PyCon Nove in Florence, don't miss the following pytest-play talk presented by Serena Martinetti:

Do you like pytest-play?

Tweets about pytest-play happen on @davidemoro.
Positive or negative feedback is always appreciated. If you find the concepts behind pytest-play interesting, let me know with a tweet, add a new pytest-play adapter, and/or add a GitHub star if you liked it:


April 24, 2018 10:40 AM

Test automation framework thoughts and examples with Python, pytest and Jenkins

In this article I'll share some personal thoughts about Test Automation Frameworks; you can take inspiration from them if you are going to evaluate different test automation platforms or assess your current test automation solution (or solutions).

Although this is a generic article about test automation, you'll find many examples explaining how to address some common needs using the Python based test framework named pytest and the Jenkins automation server: use the information contained here as a comparison, and feel free to comment, sharing alternative methods or ideas coming from different worlds.

It contains references to some well (or less) known pytest plugins or testing libraries too.

Before talking about automation and test automation framework features and characteristics let me introduce the most important test automation goal you should always keep in mind.

Test automation goals: ROI

You invest in automation for a future return on investment.
Simpler approaches let you start more quickly, but in the long term they don't perform well in terms of ROI, and vice versa. In addition, the initial complexity due to a higher level of abstraction may produce better results in the medium or long term: better ROI and some benefits for non-technical testers too. Have a look at the test automation engineer ISTQB certification syllabus for more information:

So what I mean is that test automation is not easy: it doesn't mean just recording some actions or writing some automated test procedures, because how you decide to automate things affects the ROI. Your test automation strategy should consider your testers' technical skills now and their future evolution, how to improve your system's testability (is your software testable?), good test design, and architecture/system/domain knowledge. In other words, beware of vendors selling "silver bullet" solutions promising smooth test automation for everyone, especially rec&play solutions: there are no silver bullets.

Test automation solution features and characteristics

A test automation solution should be generic and flexible enough, otherwise there is the risk of having to adopt different and maybe incompatible tools for different kinds of tests. Try to imagine the mess of the following situation: one tool or commercial service for browser based tests only, based on rec&play; one tool for API testing only; performance test frameworks that don't let you reuse existing scenarios; one tool for BDD-only scenarios; different Jenkins jobs with different settings for each different tool; no test management tool integration; etc. A unique solution, if possible, would be better: something that lets you choose the level of abstraction and doesn't force your hand. Something that lets you start simple and follows your future needs and the skill evolution of your testers.
That's one of the reasons why I prefer pytest over a hyper-specialized solution like behave, for example: if you combine pytest+pytest-bdd you can write BDD scenarios too, and you are not forced to use a BDD-only capable test framework (while keeping the pytest flexibility and tons of additional plugins).

And now, after this preamble, an unordered list of features or characteristics that you may consider for your test automation software selection.
Typically a test automation engineer will be able to drive automated test runs using the framework's command line interface (CLI) during test development, but you'll find out very soon that you need an automation server for long running tests, scheduled builds, and CI: here comes Jenkins. Jenkins can also be used by non-technical testers for launching test runs or initializing an environment with some test data.


What is Jenkins? From the Jenkins website:
Continuous Integration and Continuous Delivery. As an extensible automation server, Jenkins can be used as a simple CI server or turned into the continuous delivery hub for any project.
So thanks to Jenkins everyone can launch a parametrized automated test session just using a browser: no command line and nothing installed on your personal computer. So more power to non technical users thanks to Jenkins!

With Jenkins you can easily schedule recurring automated test runs, start parametrized test runs remotely via external software, implement CI, and many other things. In addition, as we will see, Jenkins is quite easy to configure and manage thanks to its through-the-web configuration and/or Jenkins pipelines.

Basically Jenkins is very good at starting builds and generally jobs. In this case Jenkins will be in charge of launching our parametrized automated test runs.

And now let's talk a little bit of Python and the pytest test framework.

Python for testing

I don't know if there are any articles on the net with statistics about the correlation between Test Automation Engineer job offers and the Python programming language, compared with other programming languages. If you find a similar resource, please share it with me!

My personal feeling, after observing many Test Automation Engineer job offers (or any similar QA job with some automation flavor) for a while, is that the Python word is very common. Most of the time it is one of the nice-to-have requirements, and other times it is mandatory.

Let's see why the programming language of choice for many QA departments is Python, even for companies that are not using Python for building their product or solutions.

Why Python for testing

Why is Python becoming so popular for test automation? Probably because it is more affordable for people with no or little programming knowledge compared to other languages. In addition, the Python community is very supportive and friendly, especially with newcomers, so if you are planning to attend any Python conference, be prepared to fall in love with this fantastic community and make new friends (friends, not only connections!). For example, at this time of writing you are still in time to attend PyCon Nove 2018 in beautiful Florence (even better if you like history, good wine, good food and meeting great people).
You can just compare the most classical hello world, for example in Java:
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}
and compare it with the Python version now:
print("Hello, World!")
Do you see any difference? If you are trying to explain to a non-programmer how to print a line in the terminal window with Java, you'll have to introduce public, static, void, class, System, installing a runtime environment (choosing from different versions), installing an IDE, running javac, etc., and only at the end will you be able to see something printed on the screen. With Python, which often comes preinstalled in many distributions, you just focus on what you need to do. Requirements: a text editor and Python installed. If you are not experienced you start with a simple approach, and later you can progressively learn more advanced testing approaches.

And what about test assertions? Compare, for example, a JavaScript based assertion:
expect(b).not.toEqual(c);
with the Python version:
assert b != c
So no expect(a).not.toBeLessThan(b), expect(c >= d).toBeTruthy() or expect(e).toBeLessThan(f): with Python you just say assert a >= 0, so there is nothing to remember for assertions!

Python is a big fat and very powerful programming language but it follows a "pay only for what you eat" approach.

Why pytest

If Python is the language of your choice you should consider the pytest framework and its high quality community plugins and I think it is a good starting point for building your own test automation solution.

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries.

Most important pytest features:
I strongly suggest having a look at the pytest documentation, but I'd like to show some examples of fixtures, code reuse, test parametrization and the improved maintainability of your tests. If you are not a technical reader you can skip this section.

I'm trying to explain fixtures with practical examples based on answers and questions:
Here you can see an example of fixture parametrization (test_smtp will be executed twice because there are 2 different fixture configurations):
import pytest
import smtplib

@pytest.fixture(scope="module",
                params=["smtp.gmail.com", "mail.python.org"])
def smtp(request):
    smtp = smtplib.SMTP(request.param, 587, timeout=5)
    yield smtp
    print("finalizing %s" % smtp)

def test_smtp(smtp):
    # use the smtp fixture (e.g., smtp.sendmail(...))
    # and make some assertions.
    # The same test will be executed twice (2 different params)
    pass

And now an example of test parametrization:
import pytest

@pytest.mark.parametrize("test_input,expected", [
    ("3+5", 8),
    ("2+4", 6),
    ("6*9", 42),
])
def test_eval(test_input, expected):
    assert eval(test_input) == expected
For more info see the pytest documentation on parametrization.
This is only pytest; as we will see, there are many pytest plugins that extend its core features.

Pytest plugins

There are hundreds of pytest plugins, the ones I am using more frequently are:
Python libraries for testing:
Scaffolding tools:

Pytest + Jenkins together

We've discussed Python, pytest and Jenkins, the main ingredients for our cocktail recipe (shaken, not stirred). Optional ingredients: integration with external test management tools and Selenium grid providers.

Thanks to pytest and its plugins you have a rich command line interface (CLI); with Jenkins you can schedule automated builds, set up CI, and let non-technical users or other stakeholders execute parametrized test runs or build always-fresh test data on the fly for manual testing, etc. You just need a browser; nothing is installed on your computer.

Here you can see what our recipe looks like:

Now let's walk through all the features provided by the Jenkins "build with parameters" graphical interface, explaining option by option when and why they are useful.

Target environment (ENVIRONMENT)

In this article we are not talking about regular unit tests, the basis for your testing pyramid. Instead we are talking about system, functional, API, integration, performance tests to be launched against a particular instance of an integrated system (e.g., dev, alpha or beta environments).

You know, unit tests are good but they are not sufficient: it is important to verify that the integrated system (sometimes different complex systems developed by different teams under the same or third-party organizations) works as it is supposed to. It is important because it might happen that 100% unit tested systems don't play well together after integration, for many different reasons. So with unit tests you take care of your code quality; with higher test levels you take care of your product quality. Thanks to these tests you can confirm an expected product behavior or criticize your product.

So thanks to the ENVIRONMENT option you will be able to choose one of the target environments. It is important to be able to reuse all your tests and launch them against different environments without having to change your testware code. Under the hood the pytest launcher will be able to switch between different environments thanks to the pytest-variables parametrization using the --variables command line option, where each available option in the ENVIRONMENT select element is bound to a variables files (e.g., DEV.yml, ALPHA.yml, etc) containing what the testware needs to know about the target environment.

Generally speaking, you should be able to reuse your tests without any modification thanks to a parametrization mechanism. If your test framework doesn't let you change the target environment and forces you to modify your code, change framework.

Browser settings (BROWSER)

This option makes sense only if you are going to launch browser based tests otherwise it will be ignored for other type of tests (e.g., API or integration tests).

You should be able to select a particular browser version (latest or a specific version) if any of your tests require a real browser (not needed for API tests, to give just one example), and preferably you should be able to integrate with a cloud system that allows you to use any combination of real browsers and OS systems (not only a minimal subset of versions and only Firefox and Chrome, like several online test platforms do). Thanks to the BROWSER option you can choose which browser and version to use for your browser based tests. Under the hood the pytest launcher will use the --variables command line option provided by the pytest-variables plugin, where each option is bound to a file containing the browser type, version and capabilities (e.g., FIREFOX.yml, FIREFOX-xy.yml, etc). Thanks to pytest, or any other code based testing framework, you will be able to combine browser interactions with non-browser actions or assertions.

A lot of big fat warnings about rec&play online platforms for browser testing, or about implementing your testing strategy using only (or too many) browser based tests. You shouldn't consider only whether they provide a wide range of OSes, versions and the most common browsers. They should also let you perform non-browser based actions or assertions (interaction with queues, database interaction, HTTP POST/PUT/etc. calls, etc). What I mean is that sometimes a browser alone is not sufficient for testing your system: it might be good for a CMS, but if you are testing an IoT platform you don't have enough control, and you will write completely useless or low value tests (e.g., pure UI checks instead of testing reactive side effects depending on external triggers, reports, device activity simulations causing some effects on the web platform under test, etc).

In addition, be aware that some browser based online testing platforms don't use Selenium as their browser automation engine under the hood. For example, during a software selection I found an online platform using some JavaScript injection to implement user interactions inside the browser, and this might be very dangerous. Let's consider a login page whose input elements take a while to become ready to accept user input, until some conditions are met. If for some reason a bug never unlocks the disabled login form behind a spinner icon, your users won't be able to log in to that platform. Using Selenium you'll get a failing result due to a timeout error (the test will wait for elements that will never be ready to interact with, and after a few seconds it will raise an exception), and that's absolutely correct. Using that platform the test was green, because under the hood the input element interaction was implemented using DOM actions, with the final result of having all your users stuck: how can you trust such a platform?

OS settings (OS)

This option is useful for browser based tests too. Many Selenium grid vendors provide real browsers on real OS systems, and you can choose the desired combination of versions.

Resolution settings (RESOLUTION)

Same as the above options: many vendor solutions let you choose the desired screen resolution for automated browser based testing sessions.

Select tests by names expressions (KEYWORDS)

Pytest lets you select the tests you are going to launch, selecting a subset of tests that match a pattern language based on test and module names.

For example, I find it very useful to add the test management tool reference to test names; this way you will be able to launch exactly that test:
Or for example all test names containing the login word but not c92411:
login and not c92411
Or if you organize your tests in different modules you can just specify the folder name and you'll select all the tests that live under that module:
Under the hood the pytest command will be launched with -k "EXPRESSION", for example
-k "c93466"
It is used in combination with markers, a sort of test tags.

Select tests to be executed by tag expressions (MARKERS)

Markers can be used alone or in conjunction with keyword expressions. They are a sort of tag expression that let you select just the minimum set of tests for your test run.

Under the hood the pytest launcher uses the command line syntax -m "EXPRESSION".

For example, here is a marker expression that selects all tests marked with the edit tag, excluding the ones marked with CANBusProfileEdit:
edit and not CANBusProfileEdit
Or execute only negative edit tests:
edit and negative
Or all integration tests.
It's up to you to create granular markers for features and whatever else you need to select your tests (e.g., functional, integration, fast, negative, ci, etc.).
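Putting the selection options together: the values of the build-with-parameters form end up as pytest command line flags. A minimal sketch of that translation (the `build_pytest_args` helper is illustrative, not part of any plugin; the parameter names mirror the options described in this post):

```python
def build_pytest_args(params):
    """Translate build-with-parameters values into a pytest command line."""
    args = ["pytest"]
    if params.get("KEYWORDS"):
        args += ["-k", params["KEYWORDS"]]          # name-based selection
    if params.get("MARKERS"):
        args += ["-m", params["MARKERS"]]           # tag-based selection
    if params.get("BLOCK_FIRST_FAILURE"):
        args.append("-x")                           # stop at first failure
    if params.get("COUNT"):
        args += ["--count", str(params["COUNT"])]   # pytest-repeat
    if params.get("PARALLEL_SESSIONS"):
        args += ["-n", str(params["PARALLEL_SESSIONS"])]  # pytest-xdist
    return args

print(build_pytest_args({"MARKERS": "edit and not CANBusProfileEdit", "COUNT": 5}))
# -> ['pytest', '-m', 'edit and not CANBusProfileEdit', '--count', '5']
```

Jenkins exposes the form values as environment variables, so the real launcher only has to do this kind of mapping before shelling out to pytest.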

Test management tool integration (TESTRAIL_ENABLE)

All my tests are decorated with the test case identifier provided by the test management tool, in my company we are using TestRail.

If this option is enabled the test results of executed tests will be reported in the test management tool.

Implemented using the pytest-testrail plugin.

Enable debug mode (DEBUG)

The debug mode enables verbose logging.

In addition, for browser-based tests, it opens the Selenium grid sessions with debug capabilities enabled (for example verbose browser console logs, video recordings, screenshots for each step, etc.). In my company we are using a local installation of Zalenium and BrowserStack Automate.

Block on first failure (BLOCK_FIRST_FAILURE)

This option is very useful for the following needs:
The first usage lets you gain confidence with a new build, stopping at the very first failure so you can analyze what happened.

The second usage is very helpful for:
As you can imagine, you may combine this option with COUNT, PARALLEL_SESSIONS, RANDOM_ENABLE and DEBUG depending on your needs. You can test the robustness of your tests too.

Under the hood this is implemented using pytest's -x option.

Parallel test executions (PARALLEL_SESSIONS)

Under the hood this is implemented with pytest-xdist's command line option -n NUM, which lets you execute your tests with the desired parallelism level.

pytest-xdist is very powerful and provides more advanced options as well as network-distributed execution. See its documentation for further options.

Switch from different selenium grid providers (SELENIUM_GRID_URL)

For browser-based testing, by default your tests will be launched on a remote grid URL. If you don't touch this option the default grid will be used (a local Zalenium or any other provider), but in case of need you can easily switch provider without having to change anything in your testware.

If you want, you can save money by maintaining and using a local Zalenium as the default option; Zalenium can be configured as a Selenium grid router that dispatches the capabilities it is not able to satisfy itself. This way you can save money and increase the parallelism level a little without having to change plan.

Repeat test execution for a given amount of times (COUNT)

Already discussed above; this is often used in conjunction with BLOCK_FIRST_FAILURE (pytest's core -x option).

If you are trying to diagnose an intermittent failure, it can be useful to run the same test or group of tests over and over again until you get a failure. You can use py.test's -x option in conjunction with pytest-repeat to force the test runner to stop at the first failure.

Based on pytest-repeat's --count=COUNT command line option.

Enable random test ordering execution (RANDOM_ENABLE)

This option enables random test execution order.

At the moment I'm using the pytest-randomly plugin, but there are three or four similar alternatives I still have to try out.

By randomly ordering the tests, the risk of surprising inter-test dependencies is reduced.

Specify a random seed (RANDOM_SEED)

If you get a failure executing a random test, it should be possible to reproduce it systematically by rerunning the same test order with the same test data.

Again from the pytest-randomly readme:
By resetting the random seed to a repeatable number for each test, tests can create data based on random numbers and yet remain repeatable, for example factory boy’s fuzzy values. This is good for ensuring that tests specify the data they need and that the tested system is not affected by any data that is filled in randomly due to not being specified.
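The idea is the same as seeding any pseudo-random generator: the seed fully determines the sequence, so a failing order or data set can be replayed exactly. A stdlib-only sketch (the `make_fixture_data` helper is hypothetical, standing in for factory-style fuzzy data):

```python
import random

def make_fixture_data(seed, size=3):
    """Generate 'random' but reproducible test data from a seed."""
    rng = random.Random(seed)  # a dedicated generator, isolated from global state
    return [rng.randint(0, 100) for _ in range(size)]

# Rerunning with the seed printed by a failing build reproduces the same data:
assert make_fixture_data(1234) == make_fixture_data(1234)
print(make_fixture_data(1234))
```

This is why the RANDOM_SEED build parameter matters: paste the seed from a failed run into the form, and the "random" run becomes deterministic.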

Play option (PLAY)

This option will be discussed in a dedicated blog post I am going to write.

Basically, you paste in a JSON serialization of actions and assertions, and the pytest runner executes your test procedure.

You just need a computer with a browser to run any test (API, integration, system, UI, etc.). You can paste the steps to reproduce a bug into a JIRA issue, and everyone will then be able to paste them into the Jenkins build-with-parameters form.

See pytest-play for further information.

If you are going to attend the next PyCon in Florence, don't miss the following pytest-play talk presented by Serena Martinetti:
UPDATE 20180424:

How to create a pytest project

If you are a little bit curious about how to install pytest or how to create a pytest runner with Jenkins, you can have a look at the following scaffolding tool:
It provides a hello world example that lets you start with the test technique most suitable for you: plain Selenium scripts, BDD, or pytest-play JSON test procedures. If you want, you can also install a page objects library. So you can create a QA project in minutes.

Your QA project will be shipped with a Jenkinsfile that requires a tox-py36 Docker executor, which provides a Python 3.6 environment with tox already installed; unfortunately tox-py36 is not yet public, so at the moment you would have to implement it on your own.
Once you provide a tox-py36 Docker executor, the Jenkinsfile will automatically create the build-with-parameters Jenkins form for your project on the very first build.


I hope you'll find some useful information in this article: nice-to-have features for test frameworks or platforms, a little bit of curiosity about the Python world, or a new pytest plugin you had never heard about.

Feedback and contributions are always welcome.

Tweets about test automation and new articles happen here:

April 24, 2018 10:40 AM

Techiediaries - Django

Beginner's Angular 4|5 Tutorial Series for Django Developers

In the previous tutorial, we've learned how to integrate Angular 4 with Python & Django. This tutorial will be dedicated to how to get started with Angular 4|5. Throughout this beginner's series, you'll learn how you can use Angular 4|5 to build client side web applications for mobile and desktop with a Django backend.

This tutorial is a part of a tutorial series that contains the following tutorials:

Angular 5 has been released (in October 2017), so this tutorial series has been updated to reflect the changes. This tutorial will provide you with all of the fundamentals to help you quickly get started developing Angular 5 applications without prior knowledge of Angular.

Angular is a powerful front-end JavaScript/TypeScript framework developed by Google. It allows you to build structured client-side applications and PWAs (Progressive Web Apps).

Prior knowledge of Angular is not required for this tutorial series but you'll need to have a few requirements:

Angular 4 Features

Angular 4 is available and comes with many improvements and new features such as:

<div *ngIf="ready; else loading">
    <p>Hello Angular 4</p>
</div>
<ng-template #loading>Still loading</ng-template>

If the ready variable is false Angular will show the loading template.

You can also assign and use local variables inside both *ngIf and *ngFor expressions, for example:

<div *ngFor="let el of list as users">
    {{ el }}
</div>

You can find more information in the official Angular blog posts Angular 4.0.0 now available and Angular 4.1.0 now available.

Getting Started with Angular 4 / Angular 5

If you want to get started developing Angular 4/5 web applications, you have multiple options:

Before you can install Angular you need to have Node.js and NPM installed on your development machine.

So go ahead and open your terminal and type the following

node -v

If you get the version number of an installed Node.js, you already have the platform installed. If your terminal doesn't recognize the command, you need to install Node.js.

Installing Node.js is easy and straightforward, you just need to visit their official website then grab the installer for your operating system and follow the instructions.

Now if you open your terminal under Linux/MAC or command prompt under Windows and execute

node -v 

You should get an output displaying your Node.js installed version

Updating to Angular 4 from Angular 2

If you already have an Angular 2 project and want to update it to Angular 4, you can do that by simply installing a few npm packages.


Windows

Copy and paste the following command into your command prompt:

npm install @angular/common@latest @angular/compiler@latest @angular/compiler-cli@latest @angular/core@latest @angular/forms@latest @angular/http@latest @angular/platform-browser@latest @angular/platform-browser-dynamic@latest @angular/platform-server@latest @angular/router@latest @angular/animations@latest typescript@latest --save

Linux and MAC

Copy and execute this on your terminal

npm install @angular/{common,compiler,compiler-cli,core,forms,http,platform-browser,platform-browser-dynamic,platform-server,router,animations}@latest typescript@latest --save 

Installing the Angular CLI

The Angular CLI is a handy command line utility built by the Angular team to easily and quickly generate new Angular applications and serve them locally. It can also be used to generate different Angular constructs such as components, services and pipes etc.

Before you can use the Angular CLI, you need to install it via npm, so go ahead and open your terminal or your command prompt then simply enter:

npm install -g @angular/cli

To check the version of your installed Angular CLI, type:

ng -v
You can also run ng -v from inside an Angular project to get the version of Angular

Generating an Angular 4 / Angular 5 Project Using the Angular CLI

Using the Angular CLI, you can generate an Angular 4+ project with a few commands; the CLI will take care of generating the project files and installing all the required dependencies.

Open your terminal or your command prompt then run:

ng new angular4-project 

After finishing the installation enter:

cd angular4-project 
ng serve 

Your project will be served locally from http://localhost:4200.

Generating an Angular 4 Project from a GitHub Repository

You can also clone a quick-start Angular project from GitHub to generate a new Angular 4 project.

So make sure you have Git installed then run the following:

git clone  my-proj
cd my-proj
npm install
npm start

You can find more information here.

Angular 5 Features

Angular 5, code-named pentagonal-donut, was just released. It brings new features and internal changes which make Angular applications faster and smaller. In this section we will go over the most important changes, with instructions on how to upgrade your existing Angular 2+ project to the latest version.

ng serve --aot

Changes before Upgrading

If you have an existing Angular 2 or Angular 4 project, you need to make sure you apply some changes to your project's source code before you can upgrade to Angular 5. This is the list of changes that need to be done.

$ npm install @angular/{animations,common,compiler,compiler-cli,core,forms,http,platform-browser,platform-browser-dynamic,platform-server,router}@5.0.0

Getting Started with Angular 5 from Scratch

Fortunately for you, if you already have a previous working experience with Angular 2 or Angular 4, starting a new Angular 5 project is very much the same process.

In case you don't have any previous experience with Angular framework just follow the instructions below to install Angular 5 from scratch.


Before you can install Angular 5, you need to have some prerequisites.

Don't worry: both requirements can be installed by going to the official website and downloading the installer for your operating system.

Next install the latest CLI from npm by running the following command from your terminal:

npm install @angular/cli -g

Once Angular CLI v1.5.0 is installed on your system, you can create Angular 5 applications using the ng command.

You can check for the installed version of the Angular CLI using:

$ ng -v

You should get an output like:

Angular CLI: 1.5.0
Node: 6.11.4
OS: linux ia32

You can create your first Angular 5 project using one command:

$ ng new a-new-project --style=scss --routing

Notice the two flags at the end: --style=scss, which instructs the Angular CLI to use SCSS for styling, and --routing, which adds basic routing support to the new Angular project.

Once the project is scaffolded, you can navigate inside your project then serve it.

$ cd a-new-project
$ ng serve

That's it, you now have a new Angular 5 project ready for you to build your next awesome Angular application.

Just like Angular 4, you can also use the quick start project from Github to generate Angular 5 projects.

git clone angular5project
cd angular5project 
npm install
npm start


Thanks to Angular CLI v1.5.0 you can get started with Angular 5 by generating a new project quickly with a variety of flags to customize and control the generation process.

Now that we have created a new project, in the next tutorial, we're going to start learning about the fundamentals of Angular 5 starting with components.

In the previous section we have seen different ways to create a new Angular 4 project or update an existing Angular 2+ project to use Angular 4.

April 24, 2018 12:00 AM

Django 2 Tutorial for Beginners: Building a CRM

Throughout this beginner's tutorial for Django 2.0, we are going to learn to build web applications with Python and Django. This tutorial assumes no prior experience with Django, so we'll be covering the basic concepts and elements of the Django framework by emphasizing essential theory with practice.

Basically, we are going to learn Django fundamental concepts while building a real world real estate web application starting from the idea to database design to full project implementation and deployment.

This tutorial doesn't only cover the fundamentals of Django but also advanced concepts such as how to use and integrate Django with modern front-end frameworks like Angular 2+, Vue and React.

What's Django?

Django is an open source Python based web framework for building web applications quickly.

What's MVC?

MVC is a software architectural design pattern which encourages the separation of concerns and effective collaboration between designers and developers when working on the same project. It basically divides or separates your app into three parts:

Thanks to MVC, you as a developer can work on the model and controller parts without being concerned with the user interface (which is left to designers), so if anything changes on the designers' side of the user interface, you can rest assured that you will not be affected.

Introduction to Python

Python is a general purpose programming language that's suitable for developing all kinds of applications, including web applications. Python is known for its clean syntax and large standard library, which contains a wide range of modules that developers can use to build their applications instead of reinventing the wheel.

Here is a list of features and characteristics of Python:

For more information you can head to the official Python website, where you can also download Python binaries for supported systems.

For Linux and MAC, Python is included by default so you don't have to install it. For Windows, just head over to the official Python website and grab your installer. Just like any normal Windows program, the installation process is easy and straightforward.

Why Using Django?

Due to its popularity and large community, Python has numerous web frameworks, Django among them. So what makes Django the right choice for you or your next project?

Django is a batteries-included framework

Django includes a set of batteries that can be used to solve common web problems without reinventing the wheel such as:

The Django ORM

Django has a powerful ORM (Object-Relational Mapper) which allows developers to use Python OOP classes and methods instead of SQL tables and queries to work with SQL-based databases. Thanks to the Django ORM, developers can work with any database system such as MySQL or PostgreSQL without knowing anything about SQL. At the same time, the ORM doesn't get in the way: you can write custom SQL anytime you want, especially if you need to optimize the queries against your database server for increased performance.

Support for Internationalization: i18n

You can use Django to write web applications in languages other than English with ease, thanks to its powerful support for internationalization; you can also create multilingual websites.

The Admin Interface

Django is a very suitable framework for quickly building prototypes thanks to its auto-generated admin interface.

You can generate a full-fledged admin application that can be used to do all sorts of CRUD operations against the database models you have registered with the admin module, using only a few lines of code.

Community and Extensive Documentation

Django has a great community that has contributed all sorts of awesome things to Django from tutorials and books to reusable open source packages that extend the core framework to include solutions for even more web development problems without reinventing the wheel or wasting time implementing what other developers have already created.

Django also has some of the most extensive and useful documentation on the web, which can get you up and running with Django in no time.

In conclusion, if you are looking for a web framework full of features, one that makes building web applications fun and easy and that has everything you can expect from a modern framework, Django is the right choice for you if you are a Python developer.

In this tutorial part, we are going to see how to install Python and Django on the major available operating systems i.e Windows, Linux and MAC.

Installing Python

Depending on your operating system you may or may not need to install Python. In Linux and MAC OS Python is included by default. You may only need to update it if the installed version is outdated.

Installing Python On Windows

Python is not installed by default on Windows, so you'll need to grab the official installer from the Python website. Next, launch the installer and follow the wizard to install Python just like any other Windows program.

Also make sure to add the Python root folder to the system PATH environment variable so you can execute the Python interpreter from any directory using the command prompt.

Next, open a command prompt and type python. You should be presented with the Python interactive shell, printing the current version of Python and prompting you to enter Python commands (Python is an interpreted language).

Installing Python on Linux

If you are using a Linux system, there is a great chance that you already have Python installed but you may have an old version. In this case you can very easily update it via your terminal depending on your Linux distribution.

For Debian-based distributions like Ubuntu, you can use the apt package manager:

sudo apt-get install python

This will update your Python version to the latest available version.

For other Linux distributions you should look for the equivalent commands to install or update Python. This is not a daunting task: if you already use a package manager to install packages for your system, just follow the same process to install or update Python.

Installing Python on MAC OS

Just like Linux, Python is included by default on MAC, but in case you have an old version you should be able to update it by going to the official Python website and grabbing a Python installer for MAC.

Now, if you have managed to install or update Python on your system, or verified that you already have an updated version installed, let's continue by installing Django.

Installing PIP

PIP is a Python package manager which is used to install Python packages from the Python Package Index. It is more advanced than easy_install, the default Python package manager that's installed when you install Python.

You should use PIP instead of easy_install whenever you can, but for installing PIP itself you should use easy_install. So let's first install PIP:

Open your terminal and enter:

$ sudo easy_install pip

You can now install Django on your system using pip

$ sudo pip install django

While you can do this to install Django globally on your system, it's strongly not recommended. Instead you should use a virtual environment to install packages.


virtualenv is a tool that allows you to work on multiple Python projects with different (and often conflicting) requirements on the same system without any problems, by creating multiple isolated virtual environments for Python packages.

Now let's first install virtualenv using pip:

$ sudo pip install virtualenv      

Or you can install virtualenv before even installing pip from its official website.

In this case, you don't need to install pip because it comes installed with virtualenv and gets copied into any virtual environment you create.

Creating a Virtual Environment

After installing virtualenv you can now create your first virtual environment using your terminal:

$ cd ~/where-ever-you-want 
$ virtualenv env 

Next you should activate your virtual environment:

$ source env/bin/activate 

Now you can install any Python package using pip inside your created virtual environment.

Let's install Django!

Installing Django

After creating a new virtual environment and activating it, it's time to install Django using pip:

$ pip install django 

Django will only be installed in the activated virtual environment, not globally.

Now let's summarize what we have done:

Now that we have installed the required development tools, including the Django framework, it's time for the first real step: start building our real estate application while learning Django essentials from scratch.

The Django framework includes a bunch of very useful utilities to create and manage projects. They can be accessed from a Python script called, which becomes available when we first install Django.

In this section we are going to see how to:

Create a new Django project

Creating a new Django project is easy and quick so open your terminal or command prompt then enter:

$ startproject crm

This command will take care of creating a bunch of necessary files for the project.

Executing the tree command in the root of our created project will show us the files that were created.

    ├── crm
    │   ├──
    │   ├──
    │   ├──
    │   └──

 is the Python way to mark the containing folder as a Python package, which means a Django project is a Python package. is the project configuration file; you can use it to specify every configuration option of your project such as the installed apps, site language and database options. is a special Django file which maps all your web app urls to the views. is necessary for starting a WSGI application server. is another Django utility for managing the project, including creating the database and starting the local development server.

These are the basic files that you will find in every Django project. Now the next step is to set up and create the database.

Setting Up the Database

Using your favorite code editor or IDE, open your project's file and let's configure the database.

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        }
    }

Django works with multiple database systems, from simple to advanced (both open source and proprietary), such as SQLite, MySQL, PostgreSQL, SQL Server and Oracle.

Also, you can switch to any database system whenever you want, even after you have started developing your web app, without any problems, thanks to the Django ORM which abstracts how you work with any database system.

For the sake of simplicity, we'll be using SQLite since it comes already installed with Python so we actually have our database configuration already set up for development. Next for deployment you can use an advanced database system such as MySQL or PostgreSQL by just editing this configuration option.
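For instance, switching the deployed site to PostgreSQL would only mean editing that same setting. A hypothetical production configuration (the database name, user, password and host below are placeholders, not values from this tutorial):

```python
# Hypothetical production settings: same project, different database backend.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'crm',
        'USER': 'crm_user',
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}
```

Nothing else in the project's code has to change; the ORM generates the right SQL for the configured backend.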

Finally we need to tell Django to actually create the database and tables. Even if we haven't created actual code or data for our app yet, Django needs to create many tables for its internal use. So let's create the database.

Creating the database and the tables is a matter of issuing this one command.

$ python migrate 

You should get an output like:

    Operations to perform:
    Apply all migrations: admin, auth, contenttypes, sessions
    Running migrations:
    Applying contenttypes.0001_initial... OK
    Applying auth.0001_initial... OK
    Applying admin.0001_initial... OK
    Applying admin.0002_logentry_remove_auto_add... OK
    Applying contenttypes.0002_remove_content_type_name... OK
    Applying auth.0002_alter_permission_name_max_length... OK
    Applying auth.0003_alter_user_email_max_length... OK
    Applying auth.0004_alter_user_username_opts... OK
    Applying auth.0005_alter_user_last_login_null... OK
    Applying auth.0006_require_contenttypes_0002... OK
    Applying auth.0007_alter_validators_add_error_messages... OK
    Applying auth.0008_alter_user_username_max_length... OK
    Applying sessions.0001_initial... OK

Since we are using a SQLite database, you should also find a sqlite file under the current directory:

    ├── db.sqlite3
    ├── crm
    │   ├──
    │   ├── __init__.pyc
    │   ├──
    │   ├── settings.pyc
    │   ├──
    │   ├── urls.pyc
    │   └──

Starting the local development server

Django comes with a local development server that you can use while developing your project. It's a simple and primitive server, suitable only for development, not for production.

To start the local server for your project, you can simply issue the following command inside your project root directory:

$ python runserver

Next navigate to http://localhost:8000/ with a web browser.

You should see a web page with a message:

It worked!
Congratulations on your first Django-powered page.

Next, start your first app by running python startapp [app_label].

You're seeing this message because you have DEBUG = True in your Django settings file and you haven't configured any URLs. Get to work!


To conclude this tutorial, let's summarize what we have done: we have created a new Django project, created and migrated a SQLite database, and started a local development server. In the next tutorial we are going to start creating our crm prototype.

April 24, 2018 12:00 AM

April 23, 2018

Yasoob Khalid

Reverse Engineering Facebook: Public Video Downloader

In the last post we took a look at downloading songs from Soundcloud. In this post we will take a look at Facebook and how we can create a downloader for Facebook videos. It all started with me wanting to download a video from Facebook which I had the copyrights to. I wanted to automate the process so that I could download multiple videos with just one command. Now there are tools like youtube-dl which can do this job for you but I wanted to explore Facebook’s API myself. Without any further ado let me show you step by step how I approached this project. In this post we will cover downloading public videos. In the next post I will take a look at downloading private videos.

Step 1: Finding a Video

Find a video which you own and have copyrights to. Now there are two types of videos on Facebook. The main type is the public videos which can be accessed by anyone and then there are private videos which are accessible only by a certain subset of people on Facebook. Just to keep things easy, I initially decided to use a public video with plans on expanding the system for private videos afterwards.

Step 2: Recon

In this step we will open up the video in a new tab where we aren’t logged in just to see whether we can access these public videos without being logged in or not. I tried doing it for the video in question and this is what I got:

Apparently we can’t access even the globally shared video without logging in. However, I remembered that I had recently watched a video without being logged in, and that piqued my interest. I decided to explore the original video a bit more.

I right-clicked on the original video just to check its source and to figure out whether the video url could be reconstructed from the original page url. Instead of finding the video source, I found a different url which can be used to share this video. Take a look at these pictures to get a better understanding of what I am talking about:

I tried opening this url in a new window without being logged in and boom! The video opened! Now I am not sure whether it worked by sheer luck or whether it really is a valid way to view a video without being logged in. I tried this on multiple videos and it worked every single time. Either way, we have a way to access the video without logging in, and now it’s time to intercept the requests Facebook makes when we try to play the video.

Open up Chrome developer tools and click on the XHR button just like this:

XHR stands for XMLHttpRequest and is used by websites to request additional data using JavaScript once the webpage has been loaded. The Mozilla docs have a good explanation of it:

Use XMLHttpRequest (XHR) objects to interact with servers. You can retrieve data from a URL without having to do a full page refresh. This enables a Web page to update just part of a page without disrupting what the user is doing. XMLHttpRequest is used heavily in Ajax programming.

Filtering requests using XHR allows us to cut down the number of requests we have to look through. It doesn’t always work, so if you don’t see anything interesting after filtering requests with XHR, take a look at the “all” tab.

The XHR tab was interesting: it did not contain any API request. Instead, the very first requested link was the mp4 video itself.

This was surprising because companies like Facebook usually use an intermediate server so that they don’t have to hardcode the mp4 links in the webpage. However, if it makes things easier for me then who am I to complain?

My very next step was to search for this url in the original source of the page and luckily I found it:

This confirmed my suspicions: Facebook hardcodes the video url in the original page if you view the page without signing in. We will later see how this is different when you are signed in. The url in the current case is found in a <script> tag.


Step 3: Automating it

Now let’s write a Python script to download public videos. The script is pretty simple. Here is the code:

import requests as r
import re
import sys

# The video page url is passed as the last command line argument
url = sys.argv[-1]
html = r.get(url)

# The HD video source is embedded in the page as: hd_src:"..."
video_url ='hd_src:"(.+?)"', html.text).group(1)
print(video_url)

Save the above code in a file and use it like this:

$ python video_url

Don’t forget to replace video_url with actual video url of this form:

The script takes the video page url from the command line. It then fetches the video page using requests and uses a regular expression to parse the actual video url out of the page. This might not work if the video isn’t available in HD. I leave it up to you to figure out how to handle that case.

That is all for today. I will cover the downloading of your private videos in the next post. That is a bit more involved and requires you logging into Facebook. Follow the blog and stay tuned! If you have any questions/comments/suggestions please use the comment form or email me.

Have a great day!

April 23, 2018 10:09 PM

Reverse Engineering Facebook API: Private Video Downloader

Welcome back! This is the third post in the reverse engineering series. The first post was reverse engineering Soundcloud API and the second one was reverse engineering Facebook API to download public videos. In this post we will take a look at downloading private videos. We will reverse engineer the API calls made by Facebook and will try to figure out how we can download videos in the HD format (when available).

Step 1: Recon

The very first step is to open up a private video in an incognito tab just to make sure we cannot access it without logging in. This should be the response from Facebook:

This confirms that we cannot access the video without logging in. Sometimes this is pretty obvious but it doesn’t hurt to check.

We know our first step: figure out a way to log into Facebook using Python. Only then can we access the video. Let’s log in using the browser and check what information is required.

I won’t go into much detail for this step. The gist is that the desktop and mobile websites require roughly the same POST parameters while logging in, but interestingly, if you log in using the mobile website you don’t have to supply a lot of the additional information the desktop website requires. You can get away with a POST request to the following URL with your username and password:

We will later see that the subsequent API requests will require a fb_dtsg parameter. The value of this parameter is embedded in the HTML response and can easily be extracted using regular expressions or a DOM parsing library.
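For example, with the regex approach (the HTML snippet below is a made-up stand-in for the real page, which embeds a hidden input named fb_dtsg):

```python
import re

# hypothetical snippet standing in for Facebook's real HTML response
html = '<input type="hidden" name="fb_dtsg" value="AQHxyz123:456" autocomplete="off" />'

# same regex that we will use in the script below
fb_dtsg = re.search(r'name="fb_dtsg" value="(.+?)"', html).group(1)
print(fb_dtsg)  # AQHxyz123:456
```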

Let’s continue exploring the website and the video API and see what we can find.

Just like what we did in the last post, open up the video, monitor the XHR requests in the Developer Tools and search for the MP4 request.

Next step is to figure out where the MP4 link is coming from. I tried searching the original HTML page but couldn’t find the link. This means that Facebook is using an XHR API request to get the URL from the server. We need to search through all of the XHR API requests and check their responses for the video URL. I did just that and the response of the third API request contained the MP4 link:

The API request was a POST request and the url was:

I tried to deconstruct the URL. The major dynamic parts of the URL seem to be the originalmediaid and storyidentifier. I searched the original HTML page and found that both of these were there in the original video page. We also need to figure out the POST data sent with this request. These are the parameters which were sent:

__user: <---redacted-->
__a: 1
__dyn: <---redacted-->
__req: 3
__be: 1
__rev: <---redacted-->
fb_dtsg: <---redacted-->
jazoest: <---redacted-->
__spin_r:  <---redacted-->
__spin_b:  <---redacted-->
__spin_t:  <---redacted-->

I have redacted most of the stuff so that my personal information is not leaked. But you get the idea. I again searched the HTML page and was able to find most of the information in the page. Certain information, like jazoest, was not in the HTML page, but as we move along you will see that we don’t really need it to download the video. We can simply send an empty string in its place.

It seems like we have all the pieces we need to download a video. Here is an outline:

  1. Open the Video after logging in
  2. Search for the parameters in the HTML response to craft the API url
  3. Open the API url with the required POST parameters
  4. Search for hd_src or sd_src in the response of the API request

Now let’s create a script to automate these tasks for us.

Step 2: Automate it

The very first step is to figure out how the login takes place. In the recon phase I mentioned that you can easily log in using the mobile website. We will do exactly that. We will log in using the mobile website and then open the homepage using the authenticated cookies so that we can extract the fb_dtsg parameter from the homepage for subsequent requests.

import requests
import re
import urllib.parse

email = ""
password = ""

session = requests.session()
session.headers.update({
  'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:39.0) Gecko/20100101 Firefox/39.0'
})
response = session.get('')
response = session.post('', data={
  'email': email,
  'pass': password
}, allow_redirects=False)

Replace the email and password variables with your email and password and this script should log you in. How do we know whether we have successfully logged in? We can check for the presence of the ‘c_user’ key in the cookies. If it exists then the login has been successful.

Let’s check that and extract the fb_dtsg from the homepage. While we are at that let’s extract the user_id from the cookies as well because we will need it later.

if 'c_user' in response.cookies:
    # login was successful
    homepage_resp = session.get('')
    fb_dtsg = re.search('name="fb_dtsg" value="(.+?)"', homepage_resp.text).group(1)
    user_id = response.cookies['c_user']

So now we need to open up the video page, extract all of the required API POST arguments from it and do the POST request.

if 'c_user' in response.cookies:
    # login was successful
    homepage_resp = session.get('')
    fb_dtsg = re.search('name="fb_dtsg" value="(.+?)"', homepage_resp.text).group(1)
    user_id = response.cookies['c_user']
    video_url = ""
    video_id = re.search('videos/(.+?)/', video_url).group(1)

    video_page = session.get(video_url)
    identifier = re.search('ref=tahoe","(.+?)"', video_page.text).group(1)
    final_url = "{0}/?chain=true&isvideo=true&originalmediaid={0}&playerorigin=permalink&playersuborigin=tahoe&ispermalink=true&numcopyrightmatchedvideoplayedconsecutively=0&storyidentifier={1}&dpr=2".format(video_id, identifier)
    data = {'__user': user_id,
            '__a': '',
            '__dyn': '',
            '__req': '',
            '__be': '',
            '__pc': '',
            '__rev': '',
            'fb_dtsg': fb_dtsg,
            'jazoest': '',
            '__spin_r': '',
            '__spin_b': '',
            '__spin_t': '',
    }
    api_call = session.post(final_url, data=data)
    try:
        final_video_url = re.search('hd_src":"(.+?)",', api_call.text).group(1)
    except AttributeError:
        final_video_url = re.search('sd_src":"(.+?)"', api_call.text).group(1)

You might be wondering what the data dictionary is doing and why there are a lot of keys with empty values. Like I said during the recon process, I tried making successful POST requests using the minimum amount of data. As it turns out Facebook only cares about fb_dtsg and the __user key. You can let everything else be an empty string. Make sure that you do send these keys with the request though. It doesn’t work if the key is entirely absent.

At the very end of the script we first search for the HD source and then the SD source of the video. If HD source is found we output that and if not then we output the SD source.

Our final script looks something like this:

import requests
import re
import urllib.parse
import sys

email = sys.argv[-2]
password = sys.argv[-1]

print("Email: "+email)
print("Pass:  "+password)

session = requests.session()
session.headers.update({
  'User-Agent': 'Mozilla/5.0 (X11; Linux i686; rv:39.0) Gecko/20100101 Firefox/39.0'
})
response = session.get('')
response = session.post('', data={
  'email': email,
  'pass': password
}, allow_redirects=False)

if 'c_user' in response.cookies:
    # login was successful
    homepage_resp = session.get('')
    fb_dtsg = re.search('name="fb_dtsg" value="(.+?)"', homepage_resp.text).group(1)
    user_id = response.cookies['c_user']
    video_url = sys.argv[-3]
    print("Video url:  "+video_url)
    video_id = re.search('videos/(.+?)/', video_url).group(1)

    video_page = session.get(video_url)
    identifier = re.search('ref=tahoe","(.+?)"', video_page.text).group(1)
    final_url = "{0}/?chain=true&isvideo=true&originalmediaid={0}&playerorigin=permalink&playersuborigin=tahoe&ispermalink=true&numcopyrightmatchedvideoplayedconsecutively=0&storyidentifier={1}&dpr=2".format(video_id, identifier)
    data = {'__user': user_id,
            '__a': '',
            '__dyn': '',
            '__req': '',
            '__be': '',
            '__pc': '',
            '__rev': '',
            'fb_dtsg': fb_dtsg,
            'jazoest': '',
            '__spin_r': '',
            '__spin_b': '',
            '__spin_t': '',
    }
    api_call = session.post(final_url, data=data)
    try:
        final_video_url = re.search('hd_src":"(.+?)",', api_call.text).group(1)
    except AttributeError:
        final_video_url = re.search('sd_src":"(.+?)"', api_call.text).group(1)
    print(final_video_url)


I made a couple of changes to the script. I used sys.argv to get video_url, email and password from the command line. You can hardcode your username and password if you want.

Save the above file and run it like this:

$ python video_url email password

Replace video_url with the actual video url, and email and password with your actual email and password.

After running this script, it will output the source url of the video to the terminal. You can open the URL in your browser and from there you should be able to right-click and download the video easily.

I hope you guys enjoyed this quick tutorial on reverse engineering the Facebook API for making a video downloader. If you have any questions/comments/suggestions please put them in the comments below or email me. I will look at reverse engineering a different website for my next post. Follow my blog to stay updated!

Thanks! Have a great day!


April 23, 2018 10:08 PM

Tryton News

New Tryton release 4.8

We are proud to announce the 4.8 release of Tryton. This is the last release that will support Python 2; as decided at the last Tryton Unconference, the next versions will only be Python 3 compatible.

This release introduces a new way of doing dynamic reporting. For now it's only available in the sale module, but the plan is to extend it to more modules in future releases. The effort to make all the desktop client features available in the web client has continued in this release, which resulted in fixing many small details and adding some missing features to the web client. Of course this release also includes many bug fixes and performance improvements. We also added Persian as an official language for Tryton.

As usual the migration from previous series is fully supported. Some manual operation may be required, see migration from 4.6 to 4.8.

Major changes for the user

  • The clients show a toggle button next to the search input for all models that can be deactivated. This allows the user to search for deactivated records and to know that the model is deactivable.
Sao list with inactive records option
  • Until now, when changes to a record made in a pop-up window were cancelled, the client reset it using the value stored on the server, or deleted it if it was not yet stored. Now the clients restore the record to the state it had before the pop-up was opened, which is the behaviour users expect.
  • It's no longer possible to expand a node that has too many records. This prevents the client from consuming all of its resources. In such a case the client switches to the form view, where the children are normally displayed in a list view that supports loading records on the fly while scrolling.
  • To help companies comply with the GDPR's right to erasure, a new wizard to erase a party has been developed. It erases personal information linked to the party like the name, addresses, contact mechanisms etc. It also removes those data from the history tables. Each module adds checks to prevent erasure if pending documents for the party still exist.
  • A name has been added to the contact mechanism. It can be used for example to indicate the name of the recipient of an email address or to distinguish between the phone number of reception, accounting and warehouse.
  • The default search on party will now also use contact mechanism values.
  • Similar to the design of the addresses which can be flagged for invoice or delivery usage, the contact mechanism received the same feature. So the code may now request a contact mechanism of a specific type. For example, it is now possible to define which email address of a party should be used to send the invoices.
  • All the matching criteria against product categories have been unified between all the modules. Any product category will match against itself or any parent category. This is the chosen behavior because it is the least astonishing.


  • The desktop client already had mnemonics for all buttons, but now they are also added to all field labels. This allows jumping quickly to any visible field by pressing ALT + <the underlined key>.
Tryton form with mnemonic
  • The desktop client has a new option to check if a new bug fix version has been published. It takes care of the notification on Windows and MacOS.
Tryton with the notification of a new version available
  • The Many2One fields in editable tree now show the icons to open the related record or clear the selection. This unifies the behaviour with the form view.
Tryton with Many2One icons on editable tree


  • Numeric values are now formatted with the user locale and use an input of type 'number' for editing. This provides the right virtual keyboard on mobile devices.
  • The web client finally receives the label that positions the selected record in the list and shows the number of records in the list.
Sao toolbar with selected record label
  • The spell checking is now activated by the browser, so fields with the spell attribute defined will have spell checking activated.
  • The buttons of widgets are now skipped from tab navigation in the web client. The actions of those buttons are available via keyboard shortcuts.
  • The management of editable list/tree has been completely reworked. Now the full row becomes editable on the first click. The focus is kept on the line if it is not valid. The editing is stopped when the user clicks anywhere outside the view.
  • The sum list feature has been implemented in sao.
Sao sum of move's credit and debit
  • The same shortcuts of the Date and DateTime widgets available on tryton can now be used on web client.
  • Many2One fields are displayed on tree view as clickable link which opens the related record form view to allow quick edition.

We have pushed many small improvements which fix small jumps of elements on the page:


  • The general ledger accounts are now opened from the Income Statement rows. This allows seeing the details of the computed amounts.
  • It happens that users need to customize the configuration of the chart of accounts that comes from a template. Until now, this would prevent any further update without losing the customization. Now, records that are synchronized with a template are read-only by default. A check box allows editing the value, which removes the record from the update process.
  • Users may create a second chart of accounts by mistake. There are very rare cases when such a creation is valid. As correcting such a mistake is a complex task, we added a warning when creating a second chart of accounts.
  • Now an error is raised when closing a period if there are still asset lines running for it.
  • Until now, only one tax code was allowed per tax. This was too restrictive: for some countries it was necessary to create null child taxes to allow more complex reporting. Now, tax codes are no longer defined on the tax; instead they contain a list of tax lines. Those lines can define the base or the tax amount. On the report, the lines of each tax code are summed per period. All charts of accounts have been updated to follow this design.

Tax report on cash basis

The new account_tax_cash module allows to report taxes based on cash. The groups of taxes to report on cash basis are defined on the Fiscal Year or Period. But they can also be defined on the invoices per supplier.

The implementation of this new module also improved existing modules: the tax lines of closed periods are verified against modification, and the registration of payments on an invoice is limited to the amount of the invoice.

Spanish chart of account

The module, which was published for the first time in the last 4.6 series, needs a deep cleaning. The latest changes in the accounting modules raised concerns about choices made for this chart. So it was decided to temporarily exclude the module from the release process and not guarantee a migration path. The work to fix the module has started and we expect to be able to release a fixed version soon.


  • The description on the invoice line is now optional. If a product is set, the invoice report will show the product name instead of the line description.
  • The reconciliation date is now shown on the invoice instead of the Boolean reconciled. This provides more information for a single field.
  • The Move lines now show which invoice they pay.
  • An error is raised when trying to overpay an invoice.


A cron task has been added that will automatically post the clearing moves after a delay. The delay is configured on the payment journal.

Stripe Payments

  • If the customer disputes the payment, a dispute status will be updated on the Tryton payment. When the dispute is closed, the payment is updated according to whether the company wins or loses.
  • Some missing charge events have been added, in particular the refund event, which may update the payment amount when the refund is partial.



This new module adds the automatic import of OFX files as statements. The OFX format is a common format supported in various countries, like the United States.


  • The stock quantity was only computed per product because that is the column stored in the stock move. But in some cases it is useful to compute the stock quantity per template. Indeed, products from the same template share the same default unit of measure, so their quantities can be summed. This release adds to the products_by_location method the possibility to group by product columns like the template, as well as a relate action from the template which shows the quantity per location.
  • When there is a very large number of locations, the tree Locations Quantity becomes difficult to use, especially if you are searching for the location a product is in. So we added the option to open this view as a list; this way it is possible to search locations by product quantity.
List of locations with product stock
  • We found that there are two different expectations from users about the default behavior of the inventory when the quantity is not explicitly set. Some expect that the product quantity should be considered 0, and others expect that the product quantity is not changed. So we added an option on the inventory to choose the behavior when an inventory line has no quantity.
  • Until now, the assignation process for supplier return shipments did not use child locations, but this did not work if the location was a view. Now we assign using the children if the location is a view.
  • The supplier shipment now supports receiving the goods directly in the storage location. This way the inventory step is skipped.


  • Until now, only sub-projects having the same party were invoiced. Now an invoice will be created for each different party.


  • The description on the sale line is now optional. This avoids copying the product name into the sale description, as the product name is now shown on the sale report.
  • If the invoice method is based on shipment and a different product is shipped to the customer, the shipped product will be used for the invoice. Previously the initially sold product was always used.
  • Now it is possible to edit the header fields thanks to the new Wizard, which takes care of recomputing the lines according to the changes.
Sale with modify wizard running
  • Reports on aggregated data have been added to the sale module. The report engine allows browsing the revenue and number of sales per:

    • Customer
    • Product
    • Category
    • Country > Subdivision

    The data are aggregated over Period, Company and Warehouse. The reports also show a sparkline of the revenue trend which can be drilled down.

Sales per customer with sparklines Sale revenue per customer graph
  • The sale with the shipment method On Invoice Paid will create the purchase requests and/or drop shipments when the lines are fully paid. Before they were created directly on validation.
  • The shipment cost is no longer computed when returning goods.


This new module allows to create coupons that are used to apply a promotion on the sale. The coupon can be configured to be usable only a specific number of times globally or per party.


  • The product supplier can now be used on the purchase line. This allows displaying the supplier's definition of this product.
  • Now it is possible to edit the header fields thanks to the new Wizard, which takes care of recomputing the lines according to the changes.
  • The description on the purchase line is now optional. This avoids copying the product name into the purchase description, as the product name is now shown on the purchase report. The same change has been applied to purchase requests and requisitions.
  • If the invoice method is based on shipment and a different product is received from the supplier, the received product will be used on the invoice. Previously the purchased product was always used.
  • The user is warned if he tries to confirm a purchase order for a different warehouse than the warehouse of the linked purchase request.

Purchase request quotation

  • This new module allows to manage requests for quotation to different suppliers. Each request will collect quotation information from the supplier. The preferred quotation will be used to create the purchase.


  • Now it is possible to filter which type of email to use for sending the notification.
  • The email notification skips recipients whose target field is empty. For example, if a notification is defined on the Invoice with the Party as recipient and the Party does not have an email address, the invoice will not be sent. By adding a fallback recipient, the email is instead sent to a specific user's address, which could be a secretary in charge of forwarding it, a mailbox for a printer which prints it automatically, etc.


The following modules have received tooltips:

  • account_credit_limit
  • account_dunning
  • carrier
  • carrier_weight
  • currency
  • product_attribute

Major changes for the developer

  • Starting from this release the tryton client will only support version 3 of GTK+. This will allow migrating it to Python 3.
  • The group widget can be defined as expandable by adding the attribute expandable. If the value is 1, it starts expanded and if the value is 0, it starts unexpanded. Both clients support it.
  • To ensure that all buttons may have their access rights configured, a new test has been added. We also added the string, help and confirm attributes to ir.model.button, so they can be shared between different views.
  • The monetary format is now defined on the language instead of the currency. According to user experience best practices, the amount should be displayed in the user's language format even if it's a foreign currency.
  • It's now possible to manually define an exceptional parent of a language. This allows using a custom monetary format for each country of the Latin American language.
  • Dict fields are now stored using their canonical representation. This makes equality comparisons between them possible.
  • The language formatting has been simplified to expose the instance methods: format, currency and strftime. A classmethod get is added to return the language instance of the code or the transaction language.
  • The previous API for session management was based on the ORM methods. This made it more complicated to implement alternative session managers. We created a simplified API agnostic of the ORM: new, remove, check and reset.
  • If the database has the required features (for PostgreSQL: the unaccent function), the ilike search will be performed on unaccented strings by default on all Char fields. This can be deactivated by setting the attribute Char.search_unaccented to False.
  • We have added support for EXCLUDE constraints. An EXCLUDE constraint is a kind of extension of the UNIQUE constraint which can be applied to a subset of the rows and to expressions instead of only columns. For more information, please read the EXCLUDE documentation of PostgreSQL.
  • It is now possible for a module to register classes to the pool only if a specified set of modules is activated. This replaces the previous silent skip. Existing modules that were relying on the old behaviour must be updated to use the depends keyword, otherwise they will crash at start up.
  • Sometimes a module depends optionally on another but it may need to fill, from an XML record, a value for a field that is defined in the optional module. We added a depends keyword on the field which ignores it if the listed modules are not activated.
  • The clients now support the definition of a specific order and context when searching from a field like Many2One, Reference, One2Many etc. This is useful to have preferred records at the top of the list of found records.
  • A new mixin has been added to add logical suppression to a Model. It also ensures that the client is aware that the model is deactivable. All the modules have been updated to use this new mixin.
  • The context model name is now available on the screen context. This allows for example to change the behaviour of a wizard depending on the context model.
  • Tryton prevents by default modifying records that are part of the setup (those created by XML files in the modules). This required a query on the ModelData table for each modification or deletion. But usually only a few models have such records, so we now keep in memory the list of models that should be checked. This allows skipping the query for most models and thus saves some queries.
  • Buttons can be defined with PYSON expressions for the invisible or readonly attributes. Sometimes the developer wants to be sure that the fields used in the PYSON expressions are read by the client. A depends attribute has been added which ensures that the listed fields will be included in all the views where the button is displayed.
  • The administrator can now reset the password of a user with a simple click. The server generates a random reset password, available for 1 day by default, and sends it by email to the user. This reset password is only valid until the user has set a new password. It is also possible to reset the admin password this way using the trytond-admin command line tool.
  • The context has a new optional key _request which contains some information like the remote address, the HTTP host etc. Those values are correctly set up if the server runs behind a proxy which sets the X-Forwarded headers.
  • A malicious hacker could flood the LoginAttempt table by sending failing requests for different logins, even though the size of the record is limited and the records are purged frequently. The server now also limits the number of attempts per IP network. The size of the network can be configured for each version (IPv4 and IPv6). This is only a last level of protection; it is still recommended to use a proxy and to set up an IDS.
  • The name attribute of the tag image can now be a field. In this case, it will display the icon from the value of the field.
  • Now it is possible to extend with the same Mixin all the existing pool objects that are subclasses of a target. A usage example is extending the Report.convert method of all existing reports to add support for another engine.
  • We have decided to remove the MySQL backend from the core of Tryton. The back-end was not tested on our continuous integration server and so has many failures, like not supporting Python 3. The current code has been moved to its own repository. This module will not be part of the release process until some volunteer makes it green on the test server.
  • The current form buttons are automatically added to the toolbar under the action menu for fast access. Now the developer can define under which toolbar menu the button will appear.
  • Tryton now uses LibreOffice instead of unoconv for converting between report formats. There were some issues with unoconv which were fixed by using LibreOffice directly. We now also publish docker images with the suffix -office which contain all the requirements for the report conversion.
  • A new Currency.currency_rate_sql method has been added which returns a SQL query that produces for each currency the rate, start_date and end_date. This is useful to get a currency rate in a larger SQL query. This method uses the window functions if available on the database back-end to produce optimized query.
  • Since the introduction of context management in proteus, the client library, the context was taken from different places in an inconsistent way. We changed the library to always use the context and configuration at the time the instance was created. Some testing scenarios may need adjustment as they could rely on the previous behavior.
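The canonical-representation change for Dict fields above can be illustrated with plain JSON (an illustration only, not Tryton's actual storage code):

```python
import json

a = {'size': 'L', 'color': 'blue'}
b = {'color': 'blue', 'size': 'L'}

# a canonical form sorts the keys, so equal dicts always serialize
# to the same string and the stored values can be compared directly
canonical_a = json.dumps(a, sort_keys=True)
canonical_b = json.dumps(b, sort_keys=True)
print(canonical_a == canonical_b)  # True
```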
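The per-network login limit above can be sketched with the standard ipaddress module (the prefix lengths here are illustrative; as noted, the sizes Tryton actually uses are configurable per IP version):

```python
import ipaddress

def network_bucket(addr, v4_prefix=24, v6_prefix=64):
    # map an address to the network used as the rate-limiting bucket
    ip = ipaddress.ip_address(addr)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    return ipaddress.ip_network('{}/{}'.format(addr, prefix), strict=False)

# two addresses in the same /24 share one bucket
print(network_bucket('203.0.113.45'))  # 203.0.113.0/24
print(network_bucket('203.0.113.45') == network_bucket('203.0.113.200'))  # True
```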


  • The previous API to reconcile lines allowed creating only one reconciliation at a time. But as this can trigger, for example, the processing of the invoice, it could become a bottleneck when reconciling a lot of different lines, as a statement can do. So the API has been improved, in the most backward compatible way, to allow creating many reconciliations at once.
  • The invoice now has a method to add and remove payments which should always be used.


  • The limit of authentication requests per network is also applied to web users.
  • Thanks to the implementation of the exclude constraint, the uniqueness of the web user's email only applies to active users.

April 23, 2018 06:00 PM

Sandipan Dey

Implementing a Soft-Margin Kernelized Support Vector Machine Binary Classifier with Quadratic Programming in R and Python

In this article, a couple of implementations of the support vector machine binary classifier with quadratic programming libraries (in R and Python respectively) and their application on a few datasets are going to be discussed.  The following video lectures / tutorials / links have been very useful for the implementation: this one from MIT AI course this … Continue reading Implementing a Soft-Margin Kernelized Support Vector Machine Binary Classifier with Quadratic Programming in R and Python

April 23, 2018 04:51 PM

Real Python

Python 3's pathlib Module: Taming the File System

Have you struggled with file path handling in Python? In Python 3.4 and above, the struggle is now over! You no longer need to scratch your head over code like:

>>> path.rsplit('\\', maxsplit=1)[0]

Or cringe at the verbosity of:

>>> os.path.isfile(os.path.join(os.path.expanduser('~'), 'realpython.txt'))

In this tutorial, you will see how to work with file paths—names of directories and files—in Python. You will learn new ways to read and write files, manipulate paths and the underlying file system, as well as see some examples of how to list files and iterate over them. Using the pathlib module, the two examples above can be rewritten using elegant, readable, and Pythonic code like:

>>> path.parent
>>> (pathlib.Path.home() / 'realpython.txt').is_file()

Free PDF Download: Python 3 Cheat Sheet

The Problem With Python File Path Handling

Working with files and interacting with the file system are important for many different reasons. The simplest cases may involve only reading or writing files, but sometimes more complex tasks are at hand. Maybe you need to list all files in a directory of a given type, find the parent directory of a given file, or create a unique file name that does not already exist.

Traditionally, Python has represented file paths using regular text strings. With support from the os.path standard library, this has been adequate although a bit cumbersome (as the second example in the introduction shows). However, since paths are not strings, important functionality is spread all around the standard library, including libraries like os, glob, and shutil. The following example needs three import statements just to move all text files to an archive directory:

import glob
import os
import shutil

for file_name in glob.glob('*.txt'):
    new_path = os.path.join('archive', file_name)
    shutil.move(file_name, new_path)

With paths represented by strings, it is possible, but usually a bad idea, to use regular string methods. For instance, instead of joining two paths with + like regular strings, you should use os.path.join(), which joins paths using the correct path separator on the operating system. Recall that Windows uses \ while Mac and Linux use / as a separator. This difference can lead to hard-to-spot errors, such as our first example in the introduction working for only Windows paths.
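The separator difference can be checked on any system: the platform-specific flavors of os.path are exposed as the posixpath and ntpath standard library modules, so a quick sketch shows the same join producing different results:

```python
import ntpath
import posixpath

# The same join produces different separators per path flavor
posix_style = posixpath.join('archive', 'file.txt')   # forward slash
windows_style = ntpath.join('archive', 'file.txt')    # backslash

print(posix_style)    # archive/file.txt
print(windows_style)  # archive\file.txt
```

On a given platform, os.path.join() simply delegates to whichever of these flavors matches the running operating system.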

The pathlib module was introduced in Python 3.4 (PEP 428) to deal with these challenges. It gathers the necessary functionality in one place and makes it available through methods and properties on an easy-to-use Path object.

Early on, other packages still used strings for file paths, but as of Python 3.6, the pathlib module is supported throughout the standard library, partly due to the addition of a file system path protocol. If you are stuck on legacy Python, there is also a backport available for Python 2.

Time for action: let us see how pathlib works in practice.

Creating Paths

All you really need to know about is the pathlib.Path class. There are a few different ways of creating a path. First of all, there are classmethods like .cwd() (Current Working Directory) and .home() (your user’s home directory):

>>> import pathlib
>>> pathlib.Path.cwd()

Note: Throughout this tutorial, we will assume that pathlib has been imported, without spelling out import pathlib as above. As you will mainly be using the Path class, you can also do from pathlib import Path and write Path instead of pathlib.Path.

A path can also be explicitly created from its string representation:

>>> pathlib.Path(r'C:\Users\gahjelle\realpython\file.txt')

A little tip for dealing with Windows paths: on Windows, the path separator is a backslash, \. However, in many contexts, backslash is also used as an escape character in order to represent non-printable characters. To avoid problems, use raw string literals to represent Windows paths. These are string literals that have an r prepended to them. In raw string literals the \ represents a literal backslash: r'C:\Users'.
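The escape problem is easy to demonstrate: in a normal string literal the sequence \n collapses into a single newline character, while a raw literal keeps the backslash and the n as two separate characters:

```python
plain = 'C:\new'   # '\n' is interpreted as a newline: 5 characters
raw = r'C:\new'    # backslash and 'n' are preserved: 6 characters

assert len(plain) == 5 and '\n' in plain
assert len(raw) == 6 and '\\' in raw
```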

A third way to construct a path is to join the parts of the path using the special operator /. The forward slash operator is used independently of the actual path separator on the platform:

>>> pathlib.Path.home() / 'python' / 'scripts' / ''

The / can join several paths or a mix of paths and strings (as above) as long as there is at least one Path object. If you do not like the special / notation, you can do the same thing with the .joinpath() method:

>>> pathlib.Path.home().joinpath('python', 'scripts', '')

Note that in the preceding examples, the pathlib.Path is represented by either a WindowsPath or a PosixPath. The actual object representing the path depends on the underlying operating system. (That is, the WindowsPath example was run on Windows, while the PosixPath examples have been run on Mac or Linux.) See the section Operating System Differences for more information.

Reading and Writing Files

Traditionally, the way to read or write a file in Python has been to use the built-in open() function. This is still true as the open() function can use Path objects directly. The following example finds all headers in a Markdown file and prints them:

path = pathlib.Path.cwd() / ''
with open(path, mode='r') as fid:
    headers = [line.strip() for line in fid if line.startswith('#')]

An equivalent alternative is to call .open() on the Path object:

with'r') as fid:

In fact, is calling the built-in open() behind the scenes. Which option you use is mainly a matter of taste.

For simple reading and writing of files, there are a couple of convenience methods in the pathlib library: .read_text(), .read_bytes(), .write_text(), and .write_bytes().

Each of these methods handles the opening and closing of the file, making them trivial to use, for instance:

>>> path = pathlib.Path.cwd() / ''
>>> path.read_text()
<the contents of the file>

Paths can also be specified as simple file names, in which case they are interpreted relative to the current working directory. The following example is equivalent to the previous one:

>>> pathlib.Path('').read_text()
<the contents of the file>

The .resolve() method will find the full path. Below, we confirm that the current working directory is used for simple file names:

>>> path = pathlib.Path('')
>>> path.resolve()
>>> path.resolve().parent == pathlib.Path.cwd()

Note that when paths are compared, it is their representations that are compared. In the example above, path.parent is not equal to pathlib.Path.cwd(), because path.parent is represented by '.' while pathlib.Path.cwd() is represented by '/home/gahjelle/realpython/'.
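This comparison-by-representation can be seen directly with pure paths (used here so the example does not depend on the local file system): the parent of a bare file name is the relative path '.', which never equals an absolute representation of the same directory:

```python
from pathlib import PurePosixPath

# The parent of a bare file name is the relative path '.'
assert PurePosixPath('file.txt').parent == PurePosixPath('.')

# ...which is not equal to any absolute path, even if they point
# at the same directory on disk
assert PurePosixPath('file.txt').parent != PurePosixPath('/home/gahjelle/realpython')
```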

Picking Out Components of a Path

The different parts of a path are conveniently available as properties. Basic examples include .name (the file name without any directory), .parent (the directory containing the file), .stem (the file name without the suffix), .suffix (the file extension), and .anchor (the part of the path before the directories).

Here are these properties in action:

>>> path
>>> path.stem
>>> path.suffix
>>> path.parent
>>> path.parent.parent
>>> path.anchor

Note that .parent returns a new Path object, whereas the other properties return strings. This means for instance that .parent can be chained as in the last example or even combined with / to create completely new paths:

>>> path.parent.parent / ('new' + path.suffix)

The excellent Pathlib Cheatsheet provides a visual representation of these and other properties and methods.

Moving and Deleting Files

Through pathlib, you also have access to basic file system level operations like moving, updating, and even deleting files. For the most part, these methods do not give a warning or wait for confirmation before information or files are lost. Be careful when using these methods.

To move a file, use either .rename() or .replace(). The difference between the two methods is that the latter will overwrite the destination path if it already exists, while the behavior of .rename() is more subtle. An existing file will be overwritten if you have permission to overwrite it.

When you are renaming files, useful methods might be .with_name() and .with_suffix(). They both return the original path but with the name or the suffix replaced, respectively.

For instance:

>>> path
>>> path.with_suffix('.py')

Directories and files can be deleted using .rmdir() and .unlink() respectively. (Again, be careful!)
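Since .replace() overwrites unconditionally, a common pattern is to check for the destination first. Here is a minimal sketch of such a helper (my own example, not from the pathlib API; note that this check-then-act sequence is not atomic, so a race with another process is still possible):

```python
import tempfile
from pathlib import Path

def safe_move(src, dest):
    # Refuse to clobber an existing destination file
    if dest.exists():
        raise FileExistsError(f'{dest} already exists')
    src.replace(dest)

# Demonstrate in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'a.txt'
    src.write_text('hello')
    dest = Path(tmp) / 'b.txt'
    safe_move(src, dest)
    moved = dest.read_text()

print(moved)  # hello
```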



In this section, you will see some examples of how to use pathlib to deal with simple challenges.

Counting Files

There are a few different ways to list many files. The simplest is the .iterdir() method, which iterates over all files in the given directory. The following example combines .iterdir() with the collections.Counter class to count how many files there are of each filetype in the current directory:

>>> import collections
>>> collections.Counter(p.suffix for p in pathlib.Path.cwd().iterdir())
Counter({'.md': 2, '.txt': 4, '.pdf': 2, '.py': 1})

More flexible file listings can be created with the methods .glob() and .rglob() (recursive glob). For instance, pathlib.Path.cwd().glob('*.txt') returns all files with a .txt suffix in the current directory. The following only counts filetypes starting with p:

>>> import collections
>>> collections.Counter(p.suffix for p in pathlib.Path.cwd().glob('*.p*'))
Counter({'.pdf': 2, '.py': 1})

Display a Directory Tree

The next example defines a function, tree(), that will print a visual tree representing the file hierarchy, rooted at a given directory. Here, we want to list subdirectories as well, so we use the .rglob() method:

def tree(directory):
    print(f'+ {directory}')
    for path in sorted(directory.rglob('*')):
        depth = len(path.relative_to(directory).parts)
        spacer = '    ' * depth
    print(f'{spacer}+ {}')

Note that we need to know how far away from the root directory a file is located. To do this, we first use .relative_to() to represent a path relative to the root directory. Then, we count the number of directories (using the .parts property) in the representation. When run, this function creates a visual tree like the following:

>>> tree(pathlib.Path.cwd())
+ /home/gahjelle/realpython
    + directory_1
    + directory_2
        + file_b.pdf
    + file_1.txt
    + file_2.txt

Note: The f-strings only work in Python 3.6 and later. In older Pythons, the expression f'{spacer}+ {}' can be written '{0}+ {1}'.format(spacer,

Find the Last Modified File

The .iterdir(), .glob(), and .rglob() methods are great fits for generator expressions and list comprehensions. To find the file in a directory that was last modified, you can use the .stat() method to get information about the underlying files. For instance, .stat().st_mtime gives the time of last modification of a file:

>>> from datetime import datetime
>>> time, file_path = max((f.stat().st_mtime, f) for f in directory.iterdir())
>>> print(datetime.fromtimestamp(time), file_path)
2018-03-23 19:23:56.977817 /home/gahjelle/realpython/test001.txt

You can even get the contents of the file that was last modified with a similar expression:

>>> max((f.stat().st_mtime, f) for f in directory.iterdir())[1].read_text()
<the contents of the last modified file in directory>

The timestamp returned from the different .stat().st_ properties represents seconds since January 1st, 1970. In addition to datetime.fromtimestamp, time.localtime or time.ctime may be used to convert the timestamp to something more usable.
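For example, passing an explicit timezone to datetime.fromtimestamp makes the conversion deterministic (timestamp 0 is the Unix epoch):

```python
from datetime import datetime, timezone

# Timestamp 0 corresponds to midnight, January 1st, 1970 UTC
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

print(epoch)  # 1970-01-01 00:00:00+00:00
```

Without the tz argument, the result would be expressed in local time instead.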

Create a Unique File Name

The last example will show how to construct a unique numbered file name based on a template. First, specify a pattern for the file name, with room for a counter. Then, check the existence of the file path created by joining a directory and the file name (with a value for the counter). If it already exists, increase the counter and try again:

def unique_path(directory, name_pattern):
    counter = 0
    while True:
        counter += 1
        path = directory / name_pattern.format(counter)
        if not path.exists():
            return path

path = unique_path(pathlib.Path.cwd(), 'test{:03d}.txt')

If the directory already contains the files test001.txt and test002.txt, the above code will set path to test003.txt.

Operating System Differences

Earlier, we noted that when we instantiated pathlib.Path, either a WindowsPath or a PosixPath object was returned. The kind of object will depend on the operating system you are using. This feature makes it fairly easy to write cross-platform compatible code. It is possible to ask for a WindowsPath or a PosixPath explicitly, but you will only be limiting your code to that system without any benefits. A concrete path like this can not be used on a different system:

>>> pathlib.WindowsPath('')
NotImplementedError: cannot instantiate 'WindowsPath' on your system

There might be times when you need a representation of a path without access to the underlying file system (in which case it could also make sense to represent a Windows path on a non-Windows system or vice versa). This can be done with PurePath objects. These objects support the operations discussed in the section on Path Components but not the methods that access the file system:

>>> path = pathlib.PureWindowsPath(r'C:\Users\gahjelle\realpython\file.txt')
>>> path.parent
>>> path.exists()
AttributeError: 'PureWindowsPath' object has no attribute 'exists'

You can directly instantiate PureWindowsPath or PurePosixPath on all systems. Instantiating PurePath will return one of these objects depending on the operating system you are using.

Paths as Proper Objects

In the introduction, we briefly noted that paths are not strings, and one motivation behind pathlib is to represent the file system with proper objects. In fact, the official documentation of pathlib is titled pathlib — Object-oriented filesystem paths. The Object-oriented approach is already quite visible in the examples above (especially if you contrast it with the old os.path way of doing things). However, let me leave you with a few other tidbits.

Independently of the operating system you are using, paths are represented in Posix style, with the forward slash as the path separator. On Windows, you will see something like this:

>>> pathlib.Path(r'C:\Users\gahjelle\realpython\file.txt')

Still, when a path is converted to a string, it will use the native form, for instance with backslashes on Windows:

>>> str(pathlib.Path(r'C:\Users\gahjelle\realpython\file.txt'))

This is particularly useful if you are using a library that does not know how to deal with pathlib.Path objects. This is a bigger problem on Python versions before 3.6. For instance, in Python 3.5, the configparser standard library can only use string paths to read files. The way to handle such cases is to do the conversion to a string explicitly:

>>> from configparser import ConfigParser
>>> path = pathlib.Path('config.txt')
>>> cfg = ConfigParser()
>>>                     # Error on Python < 3.6
TypeError: 'PosixPath' object is not iterable
>>>                # Works on Python >= 3.4

In Python 3.6 and later it is recommended to use os.fspath() instead of str() if you need to do an explicit conversion. This is a little safer as it will raise an error if you accidentally try to convert an object that is not pathlike.
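A short sketch of that safety difference: os.fspath() passes strings through and converts path objects, but unlike str() it rejects objects that are not path-like:

```python
import os
import pathlib

# Strings pass through unchanged; path objects are converted
assert os.fspath('config.txt') == 'config.txt'
assert os.fspath(pathlib.PurePosixPath('/etc/config.txt')) == '/etc/config.txt'

# str(42) would silently give '42'; os.fspath raises instead
rejected = False
try:
    os.fspath(42)
except TypeError:
    rejected = True
assert rejected
```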

Possibly the most unusual part of the pathlib library is the use of the / operator. For a little peek under the hood, let us see how that is implemented. This is an example of operator overloading: the behavior of an operator is changed depending on the context. You have seen this before. Think about how + means different things for strings and numbers. Python implements operator overloading through the use of double underscore methods (a.k.a. dunder methods).

The / operator is defined by the .__truediv__() method. In fact, if you take a look at the source code of pathlib, you’ll see something like:

class PurePath(object):

    def __truediv__(self, key):
        return self._make_child((key,))
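The same mechanism works for any class you write. Here is a toy illustration (my own example, not pathlib's actual implementation) of overloading / to join parts:

```python
class Route:
    """Toy class demonstrating __truediv__ overloading."""

    def __init__(self, *parts):
        self.parts = [str(p) for p in parts]

    def __truediv__(self, other):
        # route / 'segment' returns a new, longer Route
        return Route(*self.parts, other)

    def __str__(self):
        return '/'.join(self.parts)

route = Route('home', 'gahjelle') / 'python' / 'scripts'
print(route)  # home/gahjelle/python/scripts
```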


Since Python 3.4, pathlib has been available in the standard library. With pathlib, file paths can be represented by proper Path objects instead of plain strings as before. These objects make code dealing with file paths easier to read and more robust.

In this tutorial, you have seen how to create Path objects, read and write files, manipulate paths and the underlying file system, as well as some examples of how to iterate over many file paths.

Free PDF Download: Python 3 Cheat Sheet

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

April 23, 2018 02:00 PM

Mike Driscoll

PyDev of the Week: Stacy Morse

This week we welcome Stacy Morse (@geekgirlbeta) as our PyDev of the Week! Stacy loves Python and has been writing about it on her blog as well as giving talks at various user groups and conferences. You can catch her at PyCon 2018 in Ohio this year where she will be talking about code reviews. Let’s take a few moments to get to know her better!

Can you tell us a little about yourself (hobbies, education, etc):

I have a degree in Art, concentration in Photography and design. I like to spend as much time as I can hiking and taking macro photographs of moss and the natural life cycle of the forest.

I also like to build. Anything from projects using micro-controllers to elaborate sewing projects.

Why did you start using Python?

I started using Python as a way to light my photography out in the woods. I need a lot of control to illuminate tiny scenes. Micro Python allowed me to make small custom LED arrays and have a lot of control over them.

What other programming languages do you know and which is your favorite?

JavaScript, Python, and I’m dabbling in Clojure. I have to say, Python is by far my favorite. The language and community has everything to do with it. I’ve made some amazing friends all over the world because of Python.

What projects are you working on now?

One of the more interesting and fun projects I’m working on is a Bluetooth controller for presentations. I’m hoping to have it finished by the time I give my talk about code reviews at PyCon 2018. When it’s finished I’ll install the programmed micro-controllers into a Lightsaber hilt. I’ll have the ability to control the forward, backward clicks as well as turn on and off sound effects that will be triggered by a gyroscope. Time permitting I’ll throw in a laser pointer.

There are other projects, but this is the one I’m most excited to talk about.

Which Python libraries are your favorite (core or 3rd party)?

I really enjoyed using TensorFlow and matplotlib. I would like to get to use them in more projects.

I’d have to also mention the Hashids open source library. I went as far as refactoring some of my first Python code just to use it and write a blog post about it. It’s one of those topics I’d like to see covered more, especially for the newcomers to Python.

Is there anything else you’d like to say?

I’d like to thank the entire Python community, they are a very inspiring group. I’ve always felt very welcome and encouraged within it.

Thanks for doing the interview!

April 23, 2018 12:30 PM

Martin Fitzpatrick


Calculators are one of the simplest desktop applications, found by default on every windowing system. Over time these have been extended to support scientific and programmer modes, but fundamentally they all work the same.

In this short write up we implement a working standard desktop calculator using PyQt. This implementation uses a 3-part logic — including a short stack, operator and state. Basic memory operations are also implemented.

While this is implemented for Qt, you could easily convert the logic to work in hardware with MicroPython or a Raspberry Pi.


The full source code for Calculon is available in the 15 minute apps repository. You can download/clone to get a working copy, then install requirements using:

pip3 install -r requirements.txt

You can then run Calculon with:


Read on for a walkthrough of how the code works.

User interface

The user interface for Calculon was created in Qt Designer. The layout of the mainwindow uses a QVBoxLayout with the LCD display added to the top, and a QGridLayout to the bottom.

The grid layout is used to position all the buttons for the calculator. Each button takes a single space on the grid, except for the equals sign which is set to span two squares.

Calculon in Qt Designer.

Each button is defined with a keyboard shortcut to trigger a .pressed signal — e.g. 3 for the 3 key. The actions for each button are defined in code and connected to this signal.

If you want to edit the design in Qt Designer, remember to regenerate the file using pyuic5 mainwindow.ui -o


To make the buttons do something we need to connect them up to specific handlers. The connections defined are shown first below, and then the handlers covered in detail.

First we connect all the numeric buttons to the same handler. In Qt Designer we named all the buttons using a standard format, as pushButton_nX where X is the number. This makes it simple to iterate over each one and connect it up.

We use a function wrapper on the signal to send additional data with each trigger — in this case the number which was pressed.

    for n in range(0, 10):
        getattr(self, 'pushButton_n%s' % n).pressed.connect(lambda v=n: self.input_number(v))

The next block of signals to connect are for standard calculator operations, including add, multiply, subtract and divide. Again these are hooked up to the same slot, and consist of a wrapped signal to transmit the operation (a specific Python operator type).

    self.pushButton_add.pressed.connect(lambda: self.operation(operator.add))
    self.pushButton_sub.pressed.connect(lambda: self.operation(operator.sub))
    self.pushButton_mul.pressed.connect(lambda: self.operation(operator.mul))
    self.pushButton_div.pressed.connect(lambda: self.operation(operator.truediv))  # operator.div for Python2.7
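The operator module functions used above are plain two-argument callables, which is exactly what makes them easy to store in a variable and apply later:

```python
import operator

# Each operator.* function is an ordinary two-argument callable
assert operator.add(2, 3) == 5
assert operator.sub(2, 3) == -1
assert operator.mul(2, 3) == 6
assert operator.truediv(3, 2) == 1.5

# Because they are first-class values, they can be stashed and
# invoked later — the same trick the calculator uses for current_op
current_op = operator.add
result = current_op(10, 5)
print(result)  # 15
```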

In addition to the numbers and operators, we have a number of custom behaviours to wire up — percentage (to convert the previously typed number to a percentage amount), equals, reset and memory actions.




Now the buttons and actions are wired up, we can implement the logic in the slot methods for handling these events.


Calculator operations are handled using three components — the stack, the state and the current operation.

The stack

The stack is a short memory store of maximum 2 elements, which holds the numeric values with which we're currently calculating. When the user starts entering a new number it is added to the end of the stack (which, if the stack is empty, is also the beginning). Each numeric press multiplies the current stack end value by 10, and adds the value pressed.

def input_number(self, v):
    if self.state == READY:
        self.state = INPUT
        self.stack[-1] = v
    else:
        self.stack[-1] = self.stack[-1] * 10 + v


This has the effect of numbers filling from the right as expected, e.g.

Value pressed Calculation Stack
2 0 * 10 + 2 2
3 2 * 10 + 3 23
5 23 * 10 + 5 235
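The shift-and-add behavior in the table can be verified in isolation:

```python
stack_end = 0
for digit in (2, 3, 5):
    # Each key press shifts the existing value left one decimal
    # place and appends the new digit on the right
    stack_end = stack_end * 10 + digit

print(stack_end)  # 235
```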

The state

A state flag, to toggle between ready and input states. This affects the behaviour while entering numbers. In ready mode, the value entered is set direct onto the stack at the current position. In input mode the above shift+add logic is used.

This is required so it is possible to type over a result of a calculation, rather than have new numbers added to the result of the previous calculation.

def input_number(self, v):
    if self.state == READY:
        self.state = INPUT
        self.stack[-1] = v
    else:
        self.stack[-1] = self.stack[-1] * 10 + v


You'll see switches between READY and INPUT states elsewhere in the code.

The current_op

The current_op variable stores the currently active operation, which will be applied when the user presses equals. If an operation is already in progress, we first calculate the result of that operation, pushing the result onto the stack, and then apply the new one.

Starting a new operation also pushes 0 onto the stack, making it now length 2, and switches to INPUT mode. This ensures any subsequent number input will start from zero.

def operation(self, op):
    if self.current_op:  # Complete the current operation
        self.equals()

    self.stack.append(0)
    self.state = INPUT
    self.current_op = op

The operation handler for percentage calculation works a little differently. This instead operates directly on the current contents of the stack. Triggering the operation_pc takes the last value in the stack and divides it by 100.

def operation_pc(self):
    self.state = INPUT
    self.stack[-1] *= 0.01


The core of the calculator is the handler which actually does the maths. All operations (with the exception of percentage) are handled by the equals handler, which is triggered either by pressing the equals key, Enter or another operation key while an op is in progress.

The equals handler takes the current_op and applies it to the values in the stack (2 values, unpacked using *self.stack) to get the result. The result is put back in the stack as a single value, and we return to a READY state.

def equals(self):
    # Support to allow '=' to repeat previous operation
    # if no further input has been added.
    if self.state == READY and self.last_operation:
        s, self.current_op = self.last_operation
        self.stack.append(s)

    if self.current_op:
        self.last_operation = self.stack[-1], self.current_op

        self.stack = [self.current_op(*self.stack)]
        self.current_op = None
        self.state = READY

Support has also been added for repeating previous operations by pressing the equals key again. This is done by storing the value and operator when equals is triggered, and re-using them if equals is pressed again without leaving READY mode (no user input).
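Distilling that equals logic into a standalone sketch (a hypothetical, stripped-down Calc class of my own, not the full app) shows how typing 2 + 3 and then pressing = three times yields 5, 8, 11:

```python
import operator

class Calc:
    """Simplified sketch of the calculator's equals/repeat logic."""

    def __init__(self):
        self.stack = [0]
        self.current_op = None
        self.last_operation = None
        self.ready = True  # stands in for the READY/INPUT state flag

    def equals(self):
        # Repeat the previous operation if '=' is pressed again
        # with no new input
        if self.ready and self.last_operation:
            s, self.current_op = self.last_operation
            self.stack.append(s)
        if self.current_op:
            self.last_operation = (self.stack[-1], self.current_op)
            self.stack = [self.current_op(*self.stack)]
            self.current_op = None
            self.ready = True

calc = Calc()
calc.stack = [2, 3]            # user typed: 2  +  3
calc.current_op = operator.add
calc.ready = False             # INPUT mode

results = []
for _ in range(3):             # press '=' three times
    calc.equals()
    results.append(calc.stack[-1])

print(results)  # [5, 8, 11]
```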


Finally, we can define the handlers for the memory actions. For Calculon we've defined only two memory actions — store and recall. Store takes the current value from the LCD display, and copies it to self.memory. Recall takes the value in self.memory and puts in the final place on our stack.

def memory_store(self):
    self.memory = self.lcdNumber.value()

def memory_recall(self):
    self.state = INPUT
    self.stack[-1] = self.memory

By setting the mode to INPUT and updating the display this behaviour is the same as for entering a number by hand.

Future ideas

The current implementation of Calculon only supports basic math operations. Most GUI desktop calculators also include support for scientific (and sometimes programmer) modes, which add a number or alternative functions.

In Calculon you could define these additional operations as a set of lambdas, which each accept the two parameters to operate on.

Switching modes (e.g. between normal and scientific) on the calculator will be tricky with the current QMainWindow-based layout. You may be able to rework the calculator layout in Qt Designer to use a QWidget base. Each view is just a widget, and switching modes can be performed by swapping out the central widget on your running main window.

April 23, 2018 06:00 AM

April 22, 2018

Ian Ozsvald

AHL Python Data Hackathon

Yesterday I got to attend Man AHL’s first London Python Data hackathon (21-22 April). I went with the goal of publishing my ipython_memory_usage tool from GitHub to PyPI (success!), updating the docs (success!) and starting to work on the YellowBrick project (partial-success).

This is AHL’s first crack at running a public Python hackathon – from my perspective it went flawlessly. They use Python internally and they’ve been hosting my PyDataLondon meetup for a couple of years (and, all going well, for years to come), they support the Python ecosystem with public open source contributions and this hackathon was another method for them to contribute back. This is lovely (since so many companies aren’t so good at contributing and only consume from open source) and should be encouraged.

Here’s Bernd of AHL introducing the hackathon. We had 85 or so folk (10% women) in the room:

Bernd introducing Python Data hackathon at AHL

I (and 10 or so others) then introduced our projects. I was very happy to have 6 new contributors volunteer to my project. I introduced the goals, got everyone up to speed and then we split the work to fix the docs and to publish to the test PyPI server and then finally to the official PyPI public server.

This took around 3 hours, most of the team had some knowledge of a git workflow but none had seen my project before. With luck one of my colleagues will post a conda-forge recipe soon too. Here’s my team in action (photo taken by AHL’s own CTO Gary Collier):

Team at AHL hackathon

Many thanks to Hetal, Takuma, Robin, Lucija, Preyesh and Pav.

Robin had recently published his own project to PyPI so he had some handy links. Specifically we used twine and these notes. In addition the Pandas Sprint guide was useful for things like pulling the upstream master between our collaborative efforts (along with Robin’s notes).

This took about 3 hours. Next we had a crack at the sklearn-visualiser YellowBrick – first to get it running and tested and then to fix the docs on a recent code contribution I’d made (making a sklearn-compatible wrapper for statsmodels’ GLM) with some success. It turns out that we might need to work on the “get the tests running” process, as the tests didn’t run well for a couple of us – this alone will make for a nice contribution once we’ve fixed it.

Overall this effort helped 6 people contribute to two new projects, where 5 of the collaborators had only some prior experience (as best I remember!) with making an open source contribution. I’m very happy with our output – thanks everyone!

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

The post AHL Python Data Hackathon appeared first on Entrepreneurial Geekiness.

April 22, 2018 07:27 PM


Code of Conduct Updates for PyCon 2018

With PyCon 2018 approaching, I’m excited to share some work that the PyCon staff, volunteers, and Python Software Foundation have undertaken in preparation!
The Python community and thus the PyCon community is always evolving and growing. In light of this it is crucial that we periodically reconsider and adjust the way that we interact within these communities.
In 2012 PyCon adopted a Code of Conduct to foster a more welcoming environment at the conference and provide policies and procedures for addressing violations. Each year since, changes have been adopted to work towards our goal of an enjoyable and fulfilling conference for all attendees.
For 2018, we took a larger step and worked with a third party expert from Otter Tech to reevaluate and improve our Code of Conduct, reporting procedures, and staff response procedures. Sage from Otter Tech came with a resounding recommendation from North Bay Python organizers.
In this post, I’d like to summarize the changes we’re bringing this year. If you have any questions or concerns please contact and/or

Staff and Volunteer Incident Response Training

The first undertaking was to ensure that the PyCon Staff and volunteers are better equipped to respond to any incident that may arise during the event. Volunteer incident responders, Python Software Foundation board members, and the PyCon staff agreed to take part in training ahead of the event; this training included a mixture of classroom instruction and role playing.
In addition, an orientation session will be held on site at PyCon ahead of the conference to reinforce and prepare our team.

Procedures and Guides

We also worked to update and overhaul our Attendee Procedure For Reporting Code of Conduct Incidents and Staff Procedure For Incident Response. These documents codify and make clear the expectations, responsibilities, and timelines for all parties involved in an incident.
In addition, a complete Incident Response Guide was prepared for volunteers and staff to ensure that the resources they need to best respond are available, that protocols and procedures are clearly stated, and that they understand the specific roles of our entire team in supporting attendees who report an incident.

Code of Conduct

The changes this year are the biggest update to our Code of Conduct since its introduction. Drawing heavily on the experience and collective work of the community at large, we included language and themes from Contributor Covenant, Django Project Code of Conduct, Rust Code of Conduct, Citizen Code of Conduct, and Affect Conf Code of Conduct.
In totality, the refreshed document better represents and protects the full diversity of our community. It is our goal that we can do everything possible to make PyCon welcoming to every person.

Conflicts of Interest

New this year is also a neutral third-party as part of our Lead Incident Response team. In the past the Chair of PyCon and Director of Operations for the Python Software Foundation acted as the only decision makers.
With the intent of ensuring that a conflict of interest does not arise in handling a report, we’ve asked Sage of Otter Tech to be on site for PyCon 2018 to participate in any incident response.

Incident Reporting

Finally, we revamped how people can report incidents when our Code of Conduct has been violated. In the past, there were multiple phone numbers and email addresses made available for reporting.
To reduce this confusion, and once again drawing from the community, we’ll be deploying PyCascade’s coc-hotline application to offer a single number for people to reach one of our lead responders via telephone or SMS, and to keep track of our response. We’ve also rolled out a single email address that will notify all lead responders.
In the event that an attendee has any reason to know who they will reach, individual contacts for the lead incident responders will still be made available.

April 22, 2018 03:06 PM

April 21, 2018

Weekly Python StackOverflow Report

(cxxii) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2018-04-21 19:50:03 GMT

  1. Why is a False value (0) smaller in bytes than True (1)? - [17/1]
  2. Finding longest run in a list - [13/6]
  3. Efficiently accessing arbitrarily deep dictionaries - [10/4]
  4. stop python script without losing data - [10/2]
  5. Elegant alternative to long exception chains? - [8/3]
  6. Efficient random generator for very large range (in python) - [8/3]
  7. How do I keep track of the time the CPU is used vs the GPUs for deep learning? - [8/1]
  8. Can I make type aliases for type constructors in python using the typing module? - [8/1]
  9. How to make "keyword-only" fields with dataclasses? - [8/1]
  10. How to generate random numbers to satisfy a specific mean and median in python? - [7/4]

April 21, 2018 07:51 PM

Import Python

171 - Python This week

Worthy Read

Docker and Kubernetes provide the platform for organizations to get software to market quickly. In this webinar, you will get a practical guide in designing a Docker based CD pipeline on Kubernetes with GoCD.

In this article, I’d like to share with you the articles I found most interesting and insightful (inspiring) last year and this year (so far). My other goal was to create a comprehensive list of the most valuable pieces for my Python students.
python articles

Python 3.7 is set to be released this summer, let’s have a sneak peek at some of the new features! If you’d like to play along at home with PyCharm, make sure you get PyCharm 2018.1 (or later if you’re reading this from the future). There are many new things in Python 3.7: various character set improvements, postponed evaluation of annotations, and more. One of the most exciting new features is support for the dataclass decorator.
data classes
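A minimal example of the dataclass decorator described above (this snippet assumes Python 3.7, where the module landed):

```python
from dataclasses import dataclass

@dataclass
class InventoryItem:
    """__init__, __repr__ and __eq__ are generated from the annotations."""
    name: str
    unit_price: float
    quantity: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity

item = InventoryItem("widget", 2.5, quantity=4)
print(item)               # InventoryItem(name='widget', unit_price=2.5, quantity=4)
print(item.total_cost())  # 10.0
```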

PySide2 – the bindings from Python to Qt – changes skin this spring. We have re-branded it as Qt for Python on a solution level, as we wanted the name to reflect the use of Qt in Python applications. Under the hood it is still PySide2 – just better.

4 Ways to Improve Your DevOps Testing: 4-part eBook to learn how to detect problems earlier in your DevOps processes.

tl;dr Python comprehensions can have duplicate function calls (e.g. [foo(x) for x in ... if foo(x)]). If these function calls are expensive, we need to rewrite our comprehensions to avoid the cost of calling them multiple times. In this post, we solve this by writing a decorator that converts a function into an AST, optimizes away duplicate function calls, and compiles it at runtime in ~200 lines of code.
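The AST rewrite itself is beyond a quick example, but the duplicate-call problem and the manual fix are easy to show (foo here is a hypothetical expensive function, not from the post):

```python
calls = []

def foo(x):
    """Hypothetical expensive function; logs each call."""
    calls.append(x)
    return x * 2 if x % 2 else 0

xs = range(5)

# Naive: foo appears in both the condition and the expression,
# so it runs up to twice per element.
naive = [foo(x) for x in xs if foo(x)]
naive_calls = len(calls)

calls.clear()
# Rewritten: an inner generator evaluates foo exactly once per element.
once = [y for y in (foo(x) for x in xs) if y]
```

Both versions produce the same list, but the rewritten one calls foo exactly len(xs) times.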

If you look at the new Datadog Agent, you might notice most of the codebase is written in Go, although the checks we use to gather metrics are still written in Python. This is possible because the Datadog Agent, a regular Go binary, embeds a CPython interpreter that can be called whenever it needs to execute Python code. This process can be made transparent using an abstraction layer so that you can still write idiomatic Go code even when there’s Python running under the hood.

We describe some essential hacks and tricks for practicing machine learning with Python.
machine learning

In this tutorial, we provide step-by-step instructions to go from loading a pre-trained Convolutional Neural Network model to creating a containerized web application that is hosted on a Kubernetes cluster with GPUs on Azure Container Service (AKS). AKS makes it quick and easy to deploy and manage containerized applications without much expertise in managing a Kubernetes environment. It eliminates the complexity and operational overhead of maintaining the cluster by provisioning, upgrading, and scaling resources on demand, without taking the applications offline. AKS reduces the cost and complexity of using a Kubernetes cluster by managing the master nodes, for which the user does not incur a cost.
deep learning



I don’t particularly enjoy writing tests, but having a proper testing suite is one of the fundamental building blocks that differentiate hacking from software engineering. Sort of like sending your application to the gym, if you do it right, it might not be a pleasant experience, but you’ll reap the benefits continuously. At work we are especially big fans of the testing pyramid, and having dozens of unit tests give us the support that we need to deliver high quality software with rapid delivery to production.


Copy into your site-packages directory or straight into your project. Don't bother using pip, requirements.txt and all that crap.

BoomMine - 63 Stars, 22 Fork
BoomMine - A CV-Based Minesweeper Cheat

tweet-generator - 29 Stars, 3 Fork
Train a neural network optimized for generating tweets based off of any number of Twitter users.

PyKoSpacing - 27 Stars, 4 Fork
Automatic Korean word spacing with Python

rnn-text-classification-tf - 21 Stars, 4 Fork
Tensorflow Implementation of Recurrent Neural Network (LSTM, GRU) for Text Classification

certificates - 14 Stars, 1 Fork
script to generate event certificates easily

QtPyConvert - 11 Stars, 2 Fork
An automatic Python Qt binding transpiler to the abstraction layer.

louisPy - 10 Stars, 1 Fork
A collection of handy python utilities

pygrape - 8 Stars, 2 Fork
pygrape is a python library for updating terminal output in realtime

MyPythonCNN - 4 Stars, 0 Fork
Writing some CNN layers and the computation graph in Python

progress_bar - 3 Stars, 0 Fork
A simple python progress bar tool to help show your job's progress

Search the most similar strings against the query in Python 3. State-of-the-art algorithms and data structures are adopted for the best efficiency. For both flexibility and efficiency, only set-based similarities are supported right now, including Jaccard and Tversky.
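The library's own API isn't shown here, but the set-based Jaccard similarity it mentions is simple to sketch (the bigram helper is an illustrative assumption, not the package's interface):

```python
def jaccard(a, b):
    """Set-based Jaccard similarity: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def bigrams(s):
    """Illustrative helper: turn a string into its set of character bigrams."""
    return {s[i:i + 2] for i in range(len(s) - 1)}

score = jaccard(bigrams("python"), bigrams("pythons"))
print(score)  # 5 shared bigrams out of 6 total -> 0.833...
```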

April 21, 2018 07:29 PM

April 20, 2018

Python Sweetness

Crowdfunding Mitogen: day 46

This is the second update on the status of developing the Mitogen extension for Ansible, only 2 weeks late!

Too long, didn’t read

Gearing up to remove the scary warning labels and release a beta! Running a little behind, but not terribly. Every major risk is solved except file transfer, which should be addressed this week.

23 days, 257 commits, 186 files changed, 7292 insertions(+), 1503 deletions(-)

Just tuning in?

Started: Python 3 Support

A very rough branch exists for this, and I’m landing volleys of fixes when I have downtime between bigger pieces of work. Ideally this should have been ready for the end of April, but it may take a few weeks more.

I originally hoped to have a clear board before starting this, instead it is being interwoven as busywork when I need a break from whatever else I’m working on.

Done: multiplexer throughput

The situation has improved massively. Hybrid TTY/socketpair mode is a thing and as promised it significantly helps, but just not quite as much as I hoped.

Today on a 2011-era Macbook Pro Mitogen can pump an SSH client/daemon at around 13MB/sec, whereas scp in the same configuration hits closer to 19MB/sec. In the case of SSH, moving beyond this is not possible without a patched SSH installation, since SSH hard-wires its buffer sizes around 16KB, with no ability to override them at runtime.

With multiple SSH connections that 13MB should cleanly multiply up, since every connection can be served in a single IO loop iteration.

A bunch of related performance fixes were landed, including removal of yet another special case for handling deferred function calls, only taking locks when necessary, and reducing the frequency of the stream implementations modifying the status of their descriptors’ readability/writeability.

As we’re in the ballpark of existing tools, I’m no longer considering this as much of a priority as before. There is definitely more low-hanging fruit, but out-of-the-box behaviour should no longer raise eyebrows.

Done: task isolation

As before, by default each script is compiled once, however it is now re-executed in a spotless namespace prior to each invocation, working around any globals/class variable sharing issues that may be present. The cost of this is negligible, on the order of 100 usec.

When this is insufficient, a mitogen_task_isolation=fork per-task variable exists to allow explicitly forcing a particular module to run in a new process. Enabling this by default causes something on the order of a 33% slowdown, which is much better than expected, but still not good enough to enable forking by default.

Aside from building up a blacklist of modules that should always be forked, task isolation is pretty much all done, with just a few performance regressions remaining to fix in the forking case.

Done: exotic module support

Every style of Ansible module is supported aside from the prehistoric “module replacer” type. That means today all of these work and are covered by automated tests:

Python module support was updated to remove the monkey-patching in use before. Instead, sys.stdin, sys.stdout and sys.stderr are redirected to StringIO objects, allowing a much larger variety of custom user scripts to be run in-process even when they don’t use the new-style Ansible module APIs.
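A stripped-down illustration of that redirection technique (not the extension's actual code) might look like:

```python
import sys
from io import StringIO

def run_captured(fn, *args):
    """Run fn with the standard streams swapped for StringIO objects,
    then restore them -- the technique the post describes for running
    user scripts in-process (illustrative only)."""
    saved = sys.stdin, sys.stdout, sys.stderr
    sys.stdin = StringIO()
    sys.stdout, sys.stderr = StringIO(), StringIO()
    try:
        fn(*args)
        return sys.stdout.getvalue(), sys.stderr.getvalue()
    finally:
        sys.stdin, sys.stdout, sys.stderr = saved

out, err = run_captured(print, "hello from the module")
```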

Done: free strategy support

The “free” strategy can now be used by specifying ANSIBLE_STRATEGY=mitogen_free. The mitogen strategy is now an alias of mitogen_linear.

Done: temporary file handling

This should be identical to Ansible’s handling in all cases.

Done: interpreter recycling

An upper bound exists to prevent a remote machine from being spammed with thousands of Python interpreters, which was previously possible when e.g. using a with_items loop that templatized become_user.

Once 20 interpreters exist, the extension shuts down the most recently created interpreter before starting a new one. This strategy isn’t perfect, but it should suffice to avoid raised eyebrows in most common cases for the time being.

Done: precise standard IO emulation

Ansible’s complex semantics for when it does/does not merge stdout and stderr during module runs are respected in every case, including emulation of extraneous \r characters. This may seem like a tiny and pointless nit, however it is almost certainly the difference between a tested real-world playbook succeeding under the extension or breaking horribly.

Done: async tasks

We’re on the third iteration of asynchronous tasks, and I really don’t want to waste any more time on it. The new implementation works a lot more like Ansible’s existing implementation, for as much as that implementation can be said to “work” at all.

Done: better error messages

Connection errors no longer crash with an inscrutable stack trace, but trigger Ansible’s internal error handling by raising the right exception types.

Mitogen’s logging integration with the Ansible display framework is much improved, and errors and warnings correctly show up on the console in red without having to specify -vvv.

Still more work to do on this when internal RPCs fail, but that’s less likely to be triggered than a connection error.

New debugging mode

An “emergency” debugging mode has been added, in the form of MITOGEN_DUMP_THREAD_STACKS=1. When this is present, every interpreter will dump the stack of every thread into the logging framework every 5 seconds, allowing hangs to be more easily diagnosed directly from the controller machine’s logs.
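Mitogen's actual dump code isn't shown in the post, but a minimal version of the idea can be built on sys._current_frames (an assumption about the general approach, not the extension's implementation):

```python
import sys
import threading
import traceback

def dump_thread_stacks():
    """Return a formatted stack trace for every live thread,
    suitable for periodic emission into a logging framework."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        header = "Thread %s (%s):" % (ident, names.get(ident, "unknown"))
        chunks.append(header + "\n" + "".join(traceback.format_stack(frame)))
    return "\n".join(chunks)

print(dump_thread_stacks())
```

Scheduling this every 5 seconds (e.g. from a daemon threading.Timer) reproduces the behaviour described above.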

While adding this, it struck me that there is a really sweet piece of functionality missing here that would be easy to add – an interactive debugger. This might turn up in the form of an in-process web server allowing viewing the full context hierarchy, and running code snippets against remotely executing stacks, much like Werkzeug’s interactive debugger.

Performance regressions

In addition to simply not being my focus recently, a lot of the new functionality has introduced import statements that impact code running in the target, and so performance has likely slipped a little from the original posted benchmarks, most likely during run startup in the presence of a high latency network.

I will be back to investigate these problems (and fix those for which no investigation is required – the module loader!) once all remaining functionality is stable.

File Transfer

This seemingly simple function has required the greatest deal of thought out of every issue I’ve encountered so far. The initial problem relates to flow control, and the absence of any natural mechanism to block a producer (the file server) while intermediary pipe buffers (i.e. the SSH connection) are filled.

Even when flow control exists, an additional problem arises since with Mitogen there is no guarantee that one SSH connection = one target machine, especially once connection delegation is implemented. Some kind of bandwidth sharing mechanism must also exist, without poorly reimplementing the entirety of TCP/IP in a Python script.

For the initial release I have settled on a basic design that should ensure the available bandwidth is fully utilized, with each upload target having its file data served on a first-come-first-served basis.

When any file transfer is active, one of the service threads in the associated connection multiplexer process (the same ones used for setting up connections) will be dedicated to a long-running loop that monitors every connected stream’s transmit queue size, enqueuing additional file chunks as the queue drains.

Files are served one-at-a-time to make it more likely that if a run is interrupted, rather than having every partial file transfer thrown away, at least a few targets will have received the full file, allowing that copy to be skipped when the play is restarted.

The initial implementation will almost certainly be replaced eventually, but this basic design should be sufficient for what is needed today, and should continue to suffice when connection delegation is implemented.

Testing / CI

The smattering of unit and integration tests that exist are running and passing under Travis CI. In preparation for a release, master is considered always-healthy and my development has moved to a new dmw branch.

I’m taking a “mostly top down” approach to testing, written in the form of Ansible playbooks, as this gives the widest degree of coverage, ensuring that high level Ansible behaviour is matched with/without the extension installed. For each new test written, the result must pass under regular Ansible in addition to Ansible with the extension.

“Bottom up” type tests are written as needs arise, usually when Ansible’s user interface doesn’t sufficiently expose whatever is being tested.

Also visible in Travis is a debops_common target: this is running all 255 tasks from DebOps common.yml against a Docker instance. It’s the first in what should be 4-5 similar DebOps jobs, deploying real software with the final extension.

I have begun exploring integrating the extension with Ansible’s own integration tests, but it looks likely this is too large a job for Travis. Work here is ongoing.


A few items have been chipped off the list.

Notably absent is unidirectional routing mode. I will make time to finish that shortly.

User bug fixes


Super busy, slightly behind! Until next time..

April 20, 2018 05:41 PM

Weekly Python Chat

itertools and more

The itertools library is one of my favorite standard library modules.

During this chat I'll answer your questions about iterables, iterators, itertools, and related libraries and helper functions for working with lazy iterables in Python.

All experience levels are welcome in this chat. Don't be afraid of asking bad/dumb/silly/easy/hard questions. This chat is for you and your questions belong here.

April 20, 2018 05:00 PM

Python Does What?!

DISappearing and

Python has a very rich set of operators that can be overloaded.  From __get__ to __getattr__, __repr__ to __format__, and __complex__ to __iadd__ you can modify almost every behavior of your type.  Conspicuously absent however, are the boolean operators.

This is why Django ORM and SQLAlchemy use the bitwise & and | operators to represent SQL AND / OR.

Let's take a closer look at how the Python compiler treats these operators:

>>> import dis
>>> dis.dis(lambda: a & b)
  1           0 LOAD_GLOBAL              0 (a)
              3 LOAD_GLOBAL              1 (b)
              6 BINARY_AND
              7 RETURN_VALUE
>>> dis.dis(lambda: a and b)
  1           0 LOAD_GLOBAL              0 (a)
              3 JUMP_IF_FALSE_OR_POP     9
              6 LOAD_GLOBAL              1 (b)
        >>    9 RETURN_VALUE

Not only can you not override the and operator, the Python VM doesn't even have an opcode for it.

In return, Python gives you the semantics that a or b returns not True or False, but either a or b itself: a if a is truthy, otherwise b.
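Both halves are easy to demonstrate: the short-circuit semantics of and/or, and the bitwise-operator overloading route that Django ORM and SQLAlchemy take (the Expr class here is a made-up illustration, not either library's API):

```python
class Expr:
    """Made-up illustration of overloading & and | to build SQL text,
    since `and` / `or` themselves cannot be overloaded."""
    def __init__(self, text):
        self.text = text
    def __and__(self, other):
        return Expr("(%s AND %s)" % (self.text, other.text))
    def __or__(self, other):
        return Expr("(%s OR %s)" % (self.text, other.text))

# and/or short-circuit and return one of their operands unchanged:
assert (0 or "fallback") == "fallback"     # a falsy: `or` returns b
assert ("first" and "second") == "second"  # a truthy: `and` returns b
assert (0 and "never") == 0                # a falsy: `and` returns a

# & binds tighter than |, so this groups as (a & b) | c
q = Expr("a = 1") & Expr("b = 2") | Expr("c = 3")
print(q.text)  # ((a = 1 AND b = 2) OR c = 3)
```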

April 20, 2018 03:54 PM

Roberto Alsina

Code from the PyDay Talk "How to Build a REST API in Python, Spec First"

On 4/4/2018 I gave a talk at a PyDay on how to implement a REST API from a specification written in Swagger/OpenAPI, using Connexion.


Photo taken by Yamila Cuestas

While I wasn't able to record the talk (someone in the audience did, but never gave me the video! Send me the video, audience person!) and there are no slides, here is the code I showed, which is relatively simple and easy to follow.

Code from the talk

Ask me anything.

PS: Yes, I could redo the talk as a new video. Yes, I'm too lazy.

April 20, 2018 12:55 PM

Talk Python to Me

#158 Quantum Computing and Python

You've surely heard of quantum computing and quantum computers. They are based on the (often) non-intuitive nature of very small particles described by quantum mechanics. So how do they work and what will they mean for us as a society and as developers?

April 20, 2018 08:00 AM

Artem Golubin

Writing a simple SOCKS server in Python

This article explains how to write a tiny and basic SOCKS 5 server in Python 3.6. I am assuming that you already have a basic understanding of proxy servers.


SOCKS is a generic proxy protocol that relays TCP connections from one point to another through an intermediate connection (the SOCKS server). Originally, SOCKS proxies were mostly used as circuit-level gateways, that is, a firewall between local and external resources (the internet). Nowadays, however, they are also popular in censorship circumvention and web scraping.

Throughout the article, I will be referring to the RFC 1928 specification, which describes the SOCKS protocol.

Before reading this article, I recommend cloning a completed version of the implementation so you can see the full picture.

TCP sessions handling

The SOCKS protocol is implemented on top of the TCP stack, in such a way that the client must establish a separate TCP connection with the SOCKS server for each remote server it wants to exchange data with.

So, first of all, we need to create a regular TCP session handler. Python has a built-in
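A per-connection handler built on the standard library's socketserver module might be sketched like this (a minimal skeleton, not the article's full implementation; the echo body is a placeholder for the real SOCKS negotiation):

```python
import socketserver

class SocksProxy(socketserver.StreamRequestHandler):
    """Skeleton per-connection handler. In a real SOCKS5 server,
    handle() would perform the RFC 1928 greeting, method negotiation
    and CONNECT relay; here it just echoes bytes back (placeholder)."""
    def handle(self):
        data = self.connection.recv(4096)
        self.connection.sendall(data)

class ThreadingTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    # one thread per client, so each SOCKS session gets its own handler
    allow_reuse_address = True
    daemon_threads = True

# to serve: ThreadingTCPServer(("127.0.0.1", 1080), SocksProxy).serve_forever()
```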

April 20, 2018 12:56 AM

April 19, 2018

Peter Bengtsson

Best EXPLAIN ANALYZE benchmark script

tl;dr; Use to benchmark a SQL query in Postgres.

I often benchmark SQL by extracting the relevant SQL string, prefix it with EXPLAIN ANALYZE, putting it into a file (e.g. benchmark.sql) and then running psql mydatabase < benchmark.sql. That spits out something like this:

                                                           QUERY PLAN
 Index Scan using main_song_ca949605 on main_song  (cost=0.43..237.62 rows=1 width=4) (actual time=1.586..1.586 rows=0 loops=1)
   Index Cond: (artist_id = 27451)
   Filter: (((name)::text % 'Facing The Abyss'::text) AND (id 
   Rows Removed by Filter: 170
 Planning time: 3.335 ms
 Execution time: 1.701 ms
(6 rows)

Cool. So you study the steps of the query plan and look for "Seq Scan" and various sub-optimal uses of heaps and indices etc. But often, you really want to just look at the Execution time milliseconds number. Especially if you have two slightly different SQL queries to compare and contrast.

However, as you might have noticed, the number on the Execution time varies between runs. You might think nothing's changed but Postgres might have warmed up some internal caches or your host might be more busy or less busy. To remedy this, you run the EXPLAIN ANALYZE select ... a couple of times to get a feeling for an average. But there's a much better way!

Check this out:

Download it into your ~/bin/ and chmod +x ~/bin/ I wrote it just this morning so don't judge!

Now, when you run it, it runs that thing 10 times (by default) and reports the best Execution time, its mean and its median. Example output:

▶ songsearch dummy.sql
    BEST    1.229ms
    MEAN    1.489ms
    MEDIAN  1.409ms
    BEST    1.994ms
    MEAN    4.557ms
    MEDIAN  2.292ms

The "BEST" is an important metric. More important than mean or median.

Raymond Hettinger explains it better than I do. His context is for benchmarking Python code but it's equally applicable:

"Use the min() rather than the average of the timings. That is a recommendation from me, from Tim Peters, and from Guido van Rossum. The fastest time represents the best an algorithm can perform when the caches are loaded and the system isn't busy with other tasks. All the timings are noisy -- the fastest time is the least noisy. It is easy to show that the fastest timings are the most reproducible and therefore the most useful when timing two different implementations."
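A sketch of such a wrapper (an illustrative sketch, not the author's actual script) shells out to psql, pulls the Execution time out of each run, and reports best/mean/median:

```python
import re
import statistics
import subprocess

EXEC_RE = re.compile(r"Execution time: ([\d.]+) ms")

def parse_execution_time(plan_output):
    """Pull the 'Execution time' figure (in ms) out of EXPLAIN ANALYZE output."""
    return float(EXEC_RE.search(plan_output).group(1))

def benchmark(sql_file, database, times=10):
    """Run the EXPLAIN ANALYZE file repeatedly through psql and summarise."""
    timings = []
    for _ in range(times):
        with open(sql_file) as fh:
            result = subprocess.run(
                ["psql", database], stdin=fh,
                stdout=subprocess.PIPE, universal_newlines=True, check=True)
        timings.append(parse_execution_time(result.stdout))
    return {"best": min(timings),            # the least-noisy number, per the quote above
            "mean": statistics.mean(timings),
            "median": statistics.median(timings)}
```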

April 19, 2018 05:55 PM

Reuven Lerner

Announcing my new Python course, “Comprehending Comprehensions”

What’s the hardest part of Python to understand?

For nearly 20 years, I’ve been teaching Python to engineers at companies around the world. And if I had to say what most confuses my students, it’s list comprehensions.

And yes, if you’re wondering, set and dict comprehensions are equally confusing.

Comprehensions are both powerful and compact. The syntax, however, is far from obvious. Moreover, it’s not always clear just when or why you should use comprehensions, and when you should instead use a regular “for” loop.
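For readers wondering what the fuss is about, the loop-versus-comprehension equivalence looks like this:

```python
words = ["hello", "out", "there"]

# A regular "for" loop builds the list by repeated appending:
lengths = []
for word in words:
    lengths.append(len(word))

# The comprehension expresses the same thing as a single expression:
lengths2 = [len(word) for word in words]

# Set and dict comprehensions share the same shape:
unique_lengths = {len(word) for word in words}
length_by_word = {word: len(word) for word in words}
```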

Experienced Python developers know that there is a difference between comprehensions and “for” loops, even if it isn’t obvious to newcomers. Those experts also know that comprehensions are an essential part of a Python developer’s toolbox. It’s a rare day on which I won’t use a comprehension in my Python programming.

Among other things, comprehensions can:

So, how can we help newcomers, who see comprehensions as some combination of weird, unnecessary, and hard to understand? If you know me, then you already know the answer: Clear explanations, followed by lots of practice.

I’m thus delighted to announce the launch of “Comprehending comprehensions,” an online course that teaches Python developers how and why to use comprehensions.

This is an Internet version of a class which I have taught more than 100 times at companies around the world. Through nearly two hours of video and more than 15 exercises, you’ll learn how, when, and why to write and use comprehensions.

If you’re familiar with Python’s basic data types (strings, lists, tuples, dicts, and sets), reading from files and writing simple functions, but don’t yet feel comfortable using comprehensions, then this course is for you: You’ll come out of the course knowing how to write faster, cleaner, and more robust Python code. You’ll know how to handle common situations with data structures and files. And you’ll be better prepared to read and debug code written by others.

In addition to the videos and exercises, this course comes with the slides I use at my in-person training, and the input files you’ll need to solve the exercises.

My goal, as always, is to make you a more fluent Python programmer. If you take this course, you will have made a major step forward on that journey.

As always, I offer discounts to students, pensioners, and people living outside of the 30 wealthiest countries in the world — just e-mail me to request an appropriate coupon code. There are also group discounts, if you want to buy five or more copies for your organization.

Click here to sign up for “Comprehending Comprehensions”!

Questions? Comments? Just leave a comment on this blog or send me e-mail; I’ll respond with an answer right away.

The post Announcing my new Python course, “Comprehending Comprehensions” appeared first on Lerner Consulting Blog.

April 19, 2018 08:00 AM