
Planet Python

Last update: August 16, 2018 01:48 AM UTC

August 15, 2018

Continuum Analytics Blog

Introducing Skein: Deploy Python on Apache YARN the Easy Way

By Jim Crist. This post is reprinted with permission from Jim Crist’s blog. The original post can be found here. In this post, I introduce Skein, a new tool and library for deploying applications on Apache YARN. I provide background on why this work was necessary, and demonstrate deploying a simple Python application on a YARN cluster. Introduction …

The post Introducing Skein: Deploy Python on Apache YARN the Easy Way appeared first on Anaconda.

August 15, 2018 07:15 PM UTC

Peter Bengtsson

django-pipeline and Zopfli

tl;dr; I wrote my own extension to django-pipeline that uses Zopfli to create .gz files from static assets collected in Django. Here's the code.

Nginx and Gzip

What I wanted was to continue to use django-pipeline, which does a great job of reading a settings.BUNDLES setting and generating things like /static/js/myapp.min.a206ec6bd8c7.js. It has configurable options to not just make those files but also generate /static/js/myapp.min.a206ec6bd8c7.js.gz, which means that with gzip_static in Nginx, Nginx doesn't have to Gzip compress static files on the fly but can basically just read them from disk. Nginx doesn't care how the file got there, but an immediate advantage of preparing the file on disk is that the compression can be higher (smaller .gz files). That means smaller responses to be sent to the client and less CPU work needed from Nginx. Your job is to set gzip_static on; in your Nginx config (per location) and make sure every compressible file exists on disk with the same name but with the .gz suffix.

In other words, when the client sends a GET request, Nginx quickly checks the file system to see whether ROOT/static/foo.js.gz exists and, if so, returns that. If the file doesn't exist and you have gzip on; in your config, Nginx will read ROOT/static/foo.js into memory, compress it (usually with a lower compression level), and return that. Nginx decides dynamically whether to do any of this by reading the Accept-Encoding header from the request.
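To make the decision above concrete, here is a rough Python model of it. This is not how Nginx is implemented (that logic lives in Nginx's own C code); pick_file_to_serve() is a hypothetical helper, and it simplifies header parsing by ignoring q-values.

```python
from pathlib import Path


def pick_file_to_serve(root: Path, request_path: str, accept_encoding: str) -> Path:
    """Return the file an Nginx-like server would send for request_path.

    Simplified model of `gzip_static on;`: if the client accepts gzip and a
    precompressed "<file>.gz" exists next to the original, serve that.
    Otherwise fall back to the original file (which a server with `gzip on;`
    might still compress on the fly).
    """
    original = root / request_path.lstrip("/")
    precompressed = original.with_name(original.name + ".gz")
    # Crude Accept-Encoding parsing: keep each token's name, ignore q-values.
    accepted = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in accepted and precompressed.is_file():
        return precompressed
    return original
```

The point of the model is that the server only does a cheap existence check; all the expensive compression work happened earlier, when the .gz file was written.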


The best solution today to generate these .gz files is Zopfli. Zopfli is slower than good old regular gzip, but the files get smaller. To compress a file manually, you can install the zopfli executable (e.g. brew install zopfli or apt install zopfli) and then run zopfli $ROOT/static/foo.js, which creates a $ROOT/static/foo.js.gz file.

So your task is to build some pipelining code that generates a .gz version of every static file your Django server creates.
At first I tried django-static-compress, which extends Django's default staticfiles storage.

But I wanted more. I wanted all the good bits from django-pipeline (minification, hashes in filenames, concatenation, etc.). Also, in django-static-compress you can't control the parameters to zopfli, such as the number of iterations. And with django-static-compress you have to install Brotli, which I can't use because I don't want to compile my own Nginx.


So I wrote my own little mashup. I took some ideas from how django-pipeline does regular gzip compression as a post-process step. And in my case, I never want to bother with any of the other files that are put into the settings.STATIC_ROOT directory from the collectstatic command.

Here's my implementation: Check it out. It's very tailored to my personal preferences and use case, but it works great. To use it, I have this in my settings: STATICFILES_STORAGE = ""
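The author's storage class isn't reproduced in this post. As a rough, standalone illustration of the core idea only, the sketch below walks a collected static directory and writes a .gz file next to each compressible asset. It uses Python's built-in gzip module as a stand-in for Zopfli (both produce gzip-format files, but Zopfli usually compresses smaller at a much higher CPU cost), and the names precompress_tree() and COMPRESSIBLE_SUFFIXES are hypothetical.

```python
import gzip
import shutil
from pathlib import Path
from typing import List

# Extensions worth compressing. Already-compressed formats (PNG, WOFF2, ...)
# gain little. This set is an assumption; adjust it for your project.
COMPRESSIBLE_SUFFIXES = {".css", ".js", ".svg", ".html", ".json", ".txt"}


def precompress_tree(static_root: Path, compresslevel: int = 9) -> List[Path]:
    """Write "<name>.gz" next to every compressible file under static_root.

    Uses the stdlib gzip module instead of Zopfli; the output format is the
    same, only the compression ratio and speed differ.
    """
    written = []
    # sorted() materializes the listing first, so the .gz files created
    # below are never picked up by this loop.
    for path in sorted(static_root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in COMPRESSIBLE_SUFFIXES:
            continue
        target = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb", compresslevel=compresslevel) as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

In a Django project you might run something like this after collectstatic, pointed at settings.STATIC_ROOT; hooking it into a storage backend's post-processing step, as the author did, is the more integrated option.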

I know what you're thinking

Why not try to get this into django-pipeline or into django-static-compress? The answer is, frankly, laziness. Hopefully someone else can pick up this task. I have fewer and fewer projects where I use Django to handle static files. These days most of my projects are single-page apps that are 100% static and use Django for XHR requests to get the data.

August 15, 2018 04:04 PM UTC

Real Python

The Ultimate Guide to Django Redirects

When you build a Python web application with the Django framework, you’ll at some point have to redirect the user from one URL to another.

In this guide, you’ll learn everything you need to know about HTTP redirects and how to deal with them in Django. At the end of this tutorial, you’ll:

This tutorial assumes that you’re familiar with the basic building blocks of a Django application, like views and URL patterns.

Django Redirects: A Super Simple Example

In Django, you redirect the user to another URL by returning an instance of HttpResponseRedirect or HttpResponsePermanentRedirect from your view. The simplest way to do this is to use the function redirect() from the module django.shortcuts. Here’s an example:

from django.shortcuts import redirect

def redirect_view(request):
    response = redirect('/redirect-success/')
    return response

Just call redirect() with a URL in your view. It will return an HttpResponseRedirect instance, which you then return from your view.

A view returning a redirect has to be added to your URL configuration, like any other view:

from django.urls import path

from .views import redirect_view

urlpatterns = [
    path('redirect/', redirect_view),
    # ... more URL patterns here
]

Assuming this is the main URL configuration of your Django project, the URL /redirect/ now redirects to /redirect-success/.

To avoid hard-coding the URL, you can call redirect() with the name of a view or URL pattern, or with a model instance. You can also create a permanent redirect by passing the keyword argument permanent=True.

This article could end here, but then it could hardly be called “The Ultimate Guide to Django Redirects.” We will take a closer look at the redirect() function in a minute and also get into the nitty-gritty details of HTTP status codes and different HttpResponseRedirect classes, but let’s take a step back and start with a fundamental question.

Why Redirect

You might wonder why you’d ever want to redirect a user to a different URL in the first place. To get an idea where redirects make sense, have a look at how Django itself incorporates redirects into features that the framework provides by default:

What would an alternative implementation without redirects look like? If a user has to log in to view a page, you could simply display a page that says something like “Click here to log in.” This would work, but it would be inconvenient for the user.

URL shorteners are another example of where redirects come in handy: you type a short URL into the address bar of your browser and are then redirected to a page with a long, unwieldy URL.

In other cases, redirects are not just a matter of convenience. Redirects are an essential instrument to guide the user through a web application. After performing some kind of operation with side effects, like creating or deleting an object, it’s a best practice to redirect to another URL to prevent accidentally performing the operation twice.

One example of this use of redirects is form handling, where a user is redirected to another URL after successfully submitting a form. Here’s a code sample that illustrates how you’d typically handle a form:

from django import forms
from django.shortcuts import redirect, render

def send_message(name, message):
    # Code for actually sending the message goes here
    pass

class ContactForm(forms.Form):
    name = forms.CharField()
    message = forms.CharField(widget=forms.Textarea)

def contact_view(request):
    # The request method 'POST' indicates
    # that the form was submitted
    if request.method == 'POST':  # 1
        # Create a form instance with the submitted data
        form = ContactForm(request.POST)  # 2
        # Validate the form
        if form.is_valid():  # 3
            # If the form is valid, perform some kind of
            # operation, for example sending a message
            send_message(
                form.cleaned_data['name'],
                form.cleaned_data['message'],
            )
            # After the operation was successful,
            # redirect to some other page
            return redirect('/success/')  # 4
    else:  # 5
        # Create an empty form instance
        form = ContactForm()

    return render(request, 'contact_form.html', {'form': form})

The purpose of this view is to display and handle a contact form that allows the user to send a message. Let’s follow it step by step:

  1. First the view looks at the request method. When the user visits the URL connected to this view, the browser performs a GET request.

  2. If the view is called with a POST request, the POST data is used to instantiate a ContactForm object.

  3. If the form is valid, the form data is passed to send_message(). This function is not relevant in this context and therefore not shown here.

  4. After sending the message, the view returns a redirect to the URL /success/. This is the step we are interested in. For simplicity, the URL is hard-coded here. You’ll see later how you can avoid that.

  5. If the view receives a GET request (or, to be precise, any kind of request that is not a POST request), it creates an instance of ContactForm and uses django.shortcuts.render() to render the contact_form.html template.

If the user now hits reload, only the /success/ URL is reloaded. Without the redirect, reloading the page would re-submit the form and send another message.

Behind the Scenes: How an HTTP Redirect Works

Now you know why redirects make sense, but how do they work? Let’s have a quick recap of what happens when you enter a URL in the address bar of your web browser.

A Quick Primer on HTTP

Let’s assume you’ve created a Django application with a “Hello World” view that handles the path /hello/. You are running your application with the Django development server, so the complete URL is

When you enter that URL in your browser, it connects to port 8000 on the server with the IP address and sends an HTTP GET request for the path /hello/. The server replies with an HTTP response.

HTTP is text-based, so it’s relatively easy to look at the back and forth between the client and the server. You can use the command line tool curl with the option --include to have a look at the complete HTTP response including the headers, like this:

$ curl --include
HTTP/1.1 200 OK
Date: Sun, 01 Jul 2018 20:32:55 GMT
Server: WSGIServer/0.2 CPython/3.6.3
Content-Type: text/html; charset=utf-8
X-Frame-Options: SAMEORIGIN
Content-Length: 11

Hello World

As you can see, an HTTP response starts with a status line that contains a status code and a status message. The status line is followed by an arbitrary number of HTTP headers. An empty line indicates the end of the headers and the start of the response body, which contains the actual data the server wants to send.
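The structure described above can be sketched with a tiny parser. parse_http_response() is a hypothetical helper written for illustration; it handles only simple, single-valued headers like the ones in this example and is no substitute for a real HTTP library.

```python
def parse_http_response(raw):
    """Split a raw HTTP/1.x response into (status_code, reason, headers, body)."""
    # The first empty line separates the headers from the body.
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    status_line, *header_lines = head.split("\n")
    # Status line: protocol version, status code, status message.
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only: header values (like Date) may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Sun, 01 Jul 2018 20:32:55 GMT\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 11\r\n"
    "\r\n"
    "Hello World"
)
```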

HTTP Redirect Status Codes

What does a redirect response look like? Let’s assume the path /redirect/ is handled by redirect_view(), shown earlier. If you access with curl, your console looks like this:

$ curl --include
HTTP/1.1 302 Found
Date: Sun, 01 Jul 2018 20:35:34 GMT
Server: WSGIServer/0.2 CPython/3.6.3
Content-Type: text/html; charset=utf-8
Location: /redirect-success/
X-Frame-Options: SAMEORIGIN
Content-Length: 0

The two responses might look similar, but there are some key differences. The redirect:

The primary differentiator is the status code. The specification of the HTTP standard says the following:

The 302 (Found) status code indicates that the target resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client ought to continue to use the effective request URI for future requests. The server SHOULD generate a Location header field in the response containing a URI reference for the different URI. The user agent MAY use the Location field value for automatic redirection. (Source)

In other words, whenever the server sends a status code of 302, it says to the client, “Hey, at the moment, the thing you are looking for can be found at this other location.”

A key phrase in the specification is “MAY use the Location field value for automatic redirection.” It means that you can’t force the client to load another URL. The client can choose to wait for user confirmation or decide not to load the URL at all.

The key takeaway here is that an HTTP redirect is like any old HTTP response, but with an empty body, a 3xx status code, and a Location header.
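If you don't have a Django project at hand, you can watch the same exchange with Python's standard library. This sketch starts a throwaway http.server that answers /redirect/ with a 302 and a Location header, then fetches it with http.client, which does not follow redirects, much like curl without --location. DemoHandler and fetch_without_following() are hypothetical names, and the handler only imitates the Django view shown earlier.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in for the Django views: /redirect/ answers with a 302."""

    def do_GET(self):
        if self.path == "/redirect/":
            self.send_response(302)  # status line: HTTP/1.x 302 Found
            self.send_header("Location", "/redirect-success/")
            self.send_header("Content-Length", "0")  # empty body
            self.end_headers()
        else:
            body = b"Hello World"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_without_following(path):
    """Start a throwaway server, GET path once, return (status, Location)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", path)
        response = conn.getresponse()  # http.client never follows redirects
        response.read()
        conn.close()
        return response.status, response.getheader("Location")
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_without_following("/redirect/") returns the 302 status and the Location value; it is up to the client to decide whether to request that location next.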

That’s it. We’ll tie this back into Django momentarily, but first let’s take a look at two types of redirects in that 3xx status code range and see why they matter when it comes to web development.

Temporary vs. Permanent Redirects

The HTTP standard specifies several redirect status codes, all in the 3xx range. The two most common status codes are 301 Moved Permanently and 302 Found.

A status code 302 Found indicates a temporary redirect. A temporary redirect says, “At the moment, the thing you’re looking for can be found at this other address.” Think of it like a store sign that reads, “Our store is currently closed for renovation. Please go to our other store around the corner.” As this is only temporary, you’d check the original address the next time you go shopping.

Note: In HTTP 1.0, the message for status code 302 was Moved Temporarily. The message was changed to Found in HTTP 1.1.

As the name implies, permanent redirects are supposed to be permanent. A permanent redirect tells the browser, “The thing you’re looking for is no longer at this address. It’s now at this new address, and it will never be at the old address again.”

A permanent redirect is like a store sign that reads, “We moved. Our new store is just around the corner.” This change is permanent, so the next time you want to go to the store, you’d go straight to the new address.

Note: Permanent redirects can have unintended consequences. Finish this guide before using a permanent redirect or jump straight to the section “Permanent redirects are permanent.”

Browsers behave similarly when handling redirects: when a URL returns a permanent redirect response, this response is cached. The next time the browser encounters the old URL, it remembers the redirect and directly requests the new address.

Caching a redirect saves an unnecessary request and makes for a better and faster user experience.
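The following toy model illustrates that caching difference. ToyBrowser is hypothetical and deliberately simplified: it remembers 301 and 308 responses for its whole lifetime and ignores cache headers, which real browsers take into account.

```python
class ToyBrowser:
    """Toy model of how a browser treats permanent vs. temporary redirects.

    `server` maps a path to (status_code, location); any path missing from
    the map is treated as a normal 200 OK page.
    """

    def __init__(self, server):
        self.server = server
        self.permanent_cache = {}  # old path -> new path, filled by 301/308
        self.requests_sent = []  # every path actually requested from the server

    def visit(self, path, max_hops=10):
        for _ in range(max_hops):
            if path in self.permanent_cache:
                # Cached permanent redirect: skip the request for the old URL.
                path = self.permanent_cache[path]
                continue
            self.requests_sent.append(path)
            status, location = self.server.get(path, (200, None))
            if status in (301, 308):
                self.permanent_cache[path] = location
            elif status not in (302, 303, 307):
                return path  # not a redirect: this is the page we end up on
            path = location
        raise RuntimeError("too many redirects")
```

Note the last step of the usage below: even after the server drops the 301, the toy browser keeps following its cached copy.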

Furthermore, the distinction between temporary and permanent redirects is relevant for Search Engine Optimization.

Redirects in Django

Now you know that a redirect is just an HTTP response with a 3xx status code and a Location header.

You could build such a response yourself from a regular HttpResponse object:

from django.http import HttpResponse

def hand_crafted_redirect_view(request):
    response = HttpResponse(status=302)
    response['Location'] = '/redirect/success/'
    return response

This solution is technically correct, but it involves quite a bit of typing.

The HttpResponseRedirect Class

You can save yourself some typing with the class HttpResponseRedirect, a subclass of HttpResponse. Just instantiate the class with the URL you want to redirect to as the first argument, and the class will set the correct status and Location header:

from django.http import HttpResponseRedirect

def redirect_view(request):
    return HttpResponseRedirect('/redirect/success/')

You can play with the HttpResponseRedirect class in the Python shell to see what you’re getting:

>>> from django.http import HttpResponseRedirect
>>> redirect = HttpResponseRedirect('/redirect/success/')
>>> redirect.status_code
302
>>> redirect['Location']
'/redirect/success/'

There is also a class for permanent redirects, which is aptly named HttpResponsePermanentRedirect. It works the same as HttpResponseRedirect; the only difference is that it has a status code of 301 (Moved Permanently).

Note: In the examples above, the redirect URLs are hard-coded. Hard-coding URLs is bad practice: if the URL ever changes, you have to search through all your code and change any occurrences. Let’s fix that!

You could use django.urls.reverse() to build a URL, but there is a more convenient way as you will see in the next section.

The redirect() Function

To make your life easier, Django provides the versatile shortcut function you’ve already seen in the introduction: django.shortcuts.redirect().

You can call this function with:

It will take the appropriate steps to turn the arguments into a URL and return an HttpResponseRedirect. If you pass permanent=True, it will return an instance of HttpResponsePermanentRedirect, resulting in a permanent redirect.

Here are three examples to illustrate the different use cases:

  1. Passing a model:

    from django.shortcuts import redirect
    def model_redirect_view(request):
        product = Product.objects.filter(featured=True).first()
        return redirect(product)

    redirect() will call product.get_absolute_url() and use the result as redirect target. If the given class, in this case Product, doesn’t have a get_absolute_url() method, this will fail with a TypeError.

  2. Passing a URL name and arguments:

    from django.shortcuts import redirect
    def fixed_featured_product_view(request):
        product_id = settings.FEATURED_PRODUCT_ID
        return redirect('product_detail', product_id=product_id)

    redirect() will try to use its given arguments to reverse a URL. This example assumes your URL patterns contain a pattern like this:

    path('product/<product_id>/', product_detail_view, name='product_detail')
  3. Passing a URL:

    from django.shortcuts import redirect
    def featured_product_view(request):
        return redirect('/products/42/')

    redirect() will treat any string containing a / or . as a URL and use it as redirect target.
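Internally, redirect() hands its arguments to django.shortcuts.resolve_url(). The following is a simplified, Django-free sketch of the order in which those three cases are tried. URL_NAMES, fake_reverse() and resolve_target() are hypothetical stand-ins, and Django's real implementation handles extra details such as positional arguments and relative URLs starting with ./ or ../.

```python
# Hypothetical stand-in for a URLconf: URL name -> path template.
URL_NAMES = {"product_detail": "/product/{product_id}/"}


class NoReverseMatch(Exception):
    """Stand-in for django.urls.NoReverseMatch."""


def fake_reverse(name, **kwargs):
    """Very rough imitation of reverse(): look up a name and fill in kwargs."""
    try:
        return URL_NAMES[name].format(**kwargs)
    except KeyError:
        raise NoReverseMatch(name) from None


def resolve_target(to, **kwargs):
    """Simplified order in which redirect() interprets its first argument."""
    # 1. Anything with get_absolute_url() is treated like a model instance.
    if hasattr(to, "get_absolute_url"):
        return to.get_absolute_url()
    # 2. Otherwise, try to reverse it as a URL (or view) name.
    try:
        return fake_reverse(to, **kwargs)
    except NoReverseMatch:
        # 3. If it "looks like" a URL, use it as-is; otherwise re-raise.
        if "/" not in to and "." not in to:
            raise
    return to
```

In this sketch, an object without get_absolute_url() falls through to the `in` checks in step 3, which raise a TypeError for non-string objects, matching the failure described in the first example.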

The RedirectView Class-Based View

If you have a view that does nothing but return a redirect, you could use the class-based view django.views.generic.base.RedirectView.

You can tailor RedirectView to your needs through various attributes.

If the class has a .url attribute, it will be used as a redirect URL. String formatting placeholders are replaced with named arguments from the URL:

from django.urls import path
from .views import SearchRedirectView

urlpatterns = [
    path('search/<term>/', SearchRedirectView.as_view()),
]

from django.views.generic.base import RedirectView

class SearchRedirectView(RedirectView):
    url = ''

The URL pattern defines an argument term, which is used in SearchRedirectView to build the redirect URL. The path /search/kittens/ in your application will redirect you to

Instead of subclassing RedirectView to override the url attribute, you can also pass the keyword argument url to as_view() in your urlpatterns:
from django.views.generic.base import RedirectView

urlpatterns = [

You can also override get_redirect_url() to get a completely custom behavior:

from random import choice
from django.views.generic.base import RedirectView

class RandomAnimalView(RedirectView):

    animal_urls = ['/dog/', '/cat/', '/parrot/']
    # RedirectView's attribute is "permanent" (default False). Keep the
    # redirect temporary, because the target changes on every request.
    permanent = False

    def get_redirect_url(self, *args, **kwargs):
        return choice(self.animal_urls)

This class-based view redirects to a URL picked randomly from .animal_urls.

django.views.generic.base.RedirectView offers a few more hooks for customization. Here is the complete list:

Note: Class-based views are a powerful concept but can be a bit difficult to wrap your head around. Unlike regular function-based views, where it’s relatively straightforward to follow the flow of the code, class-based views are made up of a complex hierarchy of mixins and base classes.

A great tool to make sense of a class-based view class is the website Classy Class-Based Views.

You could implement the functionality of RandomAnimalView from the example above with this simple function-based view:

from random import choice
from django.shortcuts import redirect

def random_animal_view(request):
    animal_urls = ['/dog/', '/cat/', '/parrot/']
    return redirect(choice(animal_urls))

As you can see, the class-based approach does not provide any obvious benefit while adding some hidden complexity. That raises the question: when should you use RedirectView?

If you want to add a redirect directly in your URL configuration, using RedirectView makes sense. But if you find yourself overriding get_redirect_url(), a function-based view might be easier to understand and more flexible for future enhancements.

Advanced Usage

Once you know that you probably want to use django.shortcuts.redirect(), redirecting to a different URL is quite straightforward. But there are a couple of advanced use cases that are not so obvious.

Passing Parameters with Redirects

Sometimes, you want to pass some parameters to the view you’re redirecting to. Your best option is to pass the data in the query string of your redirect URL, which means redirecting to a URL like this:

Let’s assume you want to redirect from some_view() to product_view(), but pass an optional parameter category:

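from urllib.parse import urlencode

from django.shortcuts import redirect, render
from django.urls import reverse

def some_view(request):
    category_id = 42  # Hypothetical value; in practice it comes from your data
    base_url = reverse('product_view')  # 1 /products/
    query_string = urlencode({'category': category_id})  # 2 category=42
    url = '{}?{}'.format(base_url, query_string)  # 3 /products/?category=42
    return redirect(url)  # 4

def product_view(request):
    category_id = request.GET.get('category')  # 5
    # Do something with category_id (the template name is hypothetical)
    return render(request, 'products.html', {'category_id': category_id})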

The code in this example is quite dense, so let’s follow it step by step:

  1. First, you use django.urls.reverse() to get the URL mapping to product_view().

  2. Next, you have to build the query string. That’s the part after the question mark. It’s advisable to use urllib.parse.urlencode() for that, as it will take care of properly encoding any special characters.

  3. Now you have to join base_url and query_string with a question mark. A format string works fine for that.

  4. Finally, you pass url to django.shortcuts.redirect() or to a redirect response class.

  5. In product_view(), your redirect target, the parameter will be available in the request.GET dictionary. The parameter might be missing, so you should use request.GET.get('category') instead of request.GET['category']. The former returns None when the parameter does not exist, while the latter would raise an exception.
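You can try steps 2, 3, and 5 without Django, using only urllib.parse. build_redirect_url() and read_param() are hypothetical helpers; read_param() roughly imitates request.GET.get() on the receiving side. Note that query string values always arrive as strings, so the category ID needs converting (and validating) before you use it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_redirect_url(base_url, params):
    """Append params to base_url as a properly encoded query string."""
    return "{}?{}".format(base_url, urlencode(params))


def read_param(url, name):
    """Return the first value of `name` in url's query string, or None."""
    values = parse_qs(urlsplit(url).query).get(name)
    return values[0] if values else None
```

For example, build_redirect_url("/search/", {"q": "cats & dogs"}) escapes the ampersand, so it cannot be mistaken for a separator between two parameters.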

Note: Make sure to validate any data you read from query strings. It might seem like this data is under your control because you created the redirect URL.

In reality, the redirect URL could be manipulated by the user and must not be trusted, like any other user input. Without proper validation, an attacker might be able to gain unauthorized access.

Special Redirect Codes

Django provides HTTP response classes for the status codes 301 and 302. Those should cover most use cases, but if you ever have to return status codes 303, 307, or 308, you can quite easily create your own response class. Simply subclass HttpResponseRedirectBase and override the status_code attribute:

from django.http.response import HttpResponseRedirectBase

class HttpResponseTemporaryRedirect(HttpResponseRedirectBase):
    status_code = 307

Alternatively, you can use the django.shortcuts.redirect() function to create a response object and change its status code. This approach makes sense when you have the name of a view or URL or a model you want to redirect to:

def temporary_redirect_view(request):
    response = redirect('success_view')
    response.status_code = 307
    return response
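The choice among these codes mostly matters when the original request was not a GET, for example after a form POST. Python's http.HTTPStatus enum knows the official reason phrases; the method-handling summary in this sketch is paraphrased from the HTTP specification (RFC 7231 and RFC 7538), and describe() is a hypothetical helper.

```python
from http import HTTPStatus

# Redirects that clients may cache and reuse.
PERMANENT = {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.PERMANENT_REDIRECT}

# What a client does with the original request method (e.g. POST)
# when it follows the redirect.
METHOD_RULE = {
    HTTPStatus.MOVED_PERMANENTLY: "may change POST to GET (historical behavior)",
    HTTPStatus.FOUND: "may change POST to GET (historical behavior)",
    HTTPStatus.SEE_OTHER: "follow-up request uses GET",
    HTTPStatus.TEMPORARY_REDIRECT: "method and body preserved",
    HTTPStatus.PERMANENT_REDIRECT: "method and body preserved",
}


def describe(code):
    """Return a one-line summary of a redirect status code."""
    status = HTTPStatus(code)
    kind = "permanent" if status in PERMANENT else "not permanent"
    return "{} {} ({}): {}".format(status.value, status.phrase, kind, METHOD_RULE[status])


for code in sorted(METHOD_RULE):
    print(describe(code))
```

This is why 307 is the natural choice when a temporary redirect must resend a POST unchanged, while 303 is meant for "go GET the result page" after a form submission.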

Note: There is actually a third class with a status code in the 3xx range: HttpResponseNotModified, with the status code 304. It indicates that the content at the URL has not changed and that the client can use a cached version.

One could argue that a 304 Not Modified response redirects to the cached version of a URL, but that’s a bit of a stretch. Consequently, it is no longer listed in the “Redirection 3xx” section of the HTTP standard.


Redirects That Just Won’t Redirect

The simplicity of django.shortcuts.redirect() can be deceiving. The function itself doesn’t perform a redirect: it just returns a redirect response object. You must return this response object from your view (or in a middleware). Otherwise, no redirect will happen.

But even if you know that just calling redirect() is not enough, it’s easy to introduce this bug into a working application through a simple refactoring. Here’s an example to illustrate that.

Let’s assume you are building a shop and have a view that is responsible for displaying a product. If the product does not exist, you redirect to the homepage:

def product_view(request, product_id):
    try:
        product = Product.objects.get(pk=product_id)
    except Product.DoesNotExist:
        return redirect('/')
    return render(request, 'product_detail.html', {'product': product})

Now you want to add a second view to display customer reviews for a product. It should also redirect to the homepage for non-existing products, so as a first step, you extract this functionality from product_view() into a helper function get_product_or_redirect():

def get_product_or_redirect(product_id):
    try:
        return Product.objects.get(pk=product_id)
    except Product.DoesNotExist:
        return redirect('/')

def product_view(request, product_id):
    product = get_product_or_redirect(product_id)
    return render(request, 'product_detail.html', {'product': product})

Unfortunately, after the refactoring, the redirect does not work anymore.

The result of redirect() is returned from get_product_or_redirect(), but product_view() does not return it. Instead, it is passed to the template.

Depending on how you use the product variable in the product_detail.html template, this might not result in an error message and just display empty values.

Redirects That Just Won’t Stop Redirecting

When dealing with redirects, you might accidentally create a redirect loop by having URL A return a redirect that points to URL B, which returns a redirect to URL A, and so on. Most HTTP clients detect this kind of redirect loop and will display an error message after a number of requests.

Unfortunately, this kind of bug can be tricky to spot because everything looks fine on the server side. Unless your users complain about the issue, the only indication that something might be wrong is that you’ve got a number of requests from one client that all result in a redirect response in quick succession, but no response with a 200 OK status.

Here’s a simple example of a redirect loop:

def a_view(request):
    return redirect('another_view')

def another_view(request):
    return redirect('a_view')
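HTTP clients typically guard against loops by capping the number of redirects they follow. On the server side, if you can express some of your redirects as a simple mapping, a small check can find cycles before your users do. find_redirect_loop() is a hypothetical helper; it cannot catch loops that depend on database state, like the one in the next example.

```python
def find_redirect_loop(redirects, start):
    """Follow `redirects` (path -> target path) from start.

    Return the looping part of the chain (first and last entries equal) if a
    path repeats, or None if the chain ends at a path that doesn't redirect.
    """
    seen = []
    path = start
    while path in redirects:
        if path in seen:
            return seen[seen.index(path):] + [path]
        seen.append(path)
        path = redirects[path]
    return None
```

Applied to the two views above as {"/a/": "/b/", "/b/": "/a/"}, it reports the chain /a/ -> /b/ -> /a/ instead of spinning forever.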

This example illustrates the principle, but it’s overly simplistic. The redirect loops you’ll encounter in real-life are probably going to be harder to spot. Let’s look at a more elaborate example:

def featured_products_view(request):
    featured_products = Product.objects.filter(featured=True)
    if len(featured_products) == 1:
        return redirect('product_view', product_id=featured_products[0].id)
    return render(request, 'featured_products.html', {'featured_products': featured_products})

def product_view(request, product_id):
    try:
        product = Product.objects.get(pk=product_id, in_stock=True)
    except Product.DoesNotExist:
        return redirect('featured_products_view')
    return render(request, 'product_detail.html', {'product': product})

featured_products_view() fetches all featured products, in other words Product instances with .featured set to True. If only one featured product exists, it redirects directly to product_view(). Otherwise, it renders a template with the featured_products queryset.

The product_view looks familiar from the previous section, but it has two minor differences:

This logic works fine until your shop becomes a victim of its own success and the one featured product you currently have goes out of stock. If you set .in_stock to False but forget to set .featured to False as well, then any visitor to your featured_products_view() will now be stuck in a redirect loop.

There is no bullet-proof way to prevent this kind of bug, but a good starting point is to check if the view you are redirecting to uses redirects itself.

Permanent Redirects Are Permanent

Permanent redirects can be like bad tattoos: they might seem like a good idea at the time, but once you realize they were a mistake, it can be quite hard to get rid of them.

When a browser receives a permanent redirect response for a URL, it caches this response indefinitely. Any time you request the old URL in the future, the browser doesn’t bother loading it and directly loads the new URL.

It can be quite tricky to convince a browser to load a URL that once returned a permanent redirect. Google Chrome is especially aggressive when it comes to caching redirects.

Why can this be a problem?

Imagine you want to build a web application with Django. You register your domain at As a first step, you install a blog app at to build a launch mailing list.

Your site’s homepage at is still under construction, so you redirect to You decide to use a permanent redirect because you heard that permanent redirects are cached and caching makes things faster, and faster is better because speed is a factor for ranking in Google search results.

As it turns out, you’re not only a great developer, but also a talented writer. Your blog becomes popular, and your launch mailing list grows. After a couple of months, your app is ready. It now has a shiny homepage, and you finally remove the redirect.

You send out an announcement email with a special discount code to your sizeable launch mailing list. You lean back and wait for the sign-up notifications to roll in.

To your horror, your mailbox fills with messages from confused visitors who want to visit your app but are always being redirected to your blog.

What has happened? Your blog readers had visited when the redirect to was still active. Because it was a permanent redirect, it was cached in their browsers.

When they clicked on the link in your launch announcement mail, their browsers never bothered to check your new homepage and went straight to your blog. Instead of celebrating your successful launch, you’re busy instructing your users how to fiddle with chrome://net-internals to reset the cache of their browsers.

The permanent nature of permanent redirects can also bite you while developing on your local machine. Let’s rewind to the moment when you implemented that fateful permanent redirect for

You start the development server and open As intended, your app redirects your browser to Satisfied with your work, you stop the development server and go to lunch.

You return with a full belly, ready to tackle some client work. The client wants some simple changes to their homepage, so you load the client’s project and start the development server.

But wait, what is going on here? The homepage is broken; it now returns a 404! Due to the afternoon slump, it takes you a while to notice that you’re being redirected to, which doesn’t exist in the client’s project.

To the browser, it doesn’t matter that the URL now serves a completely different application. All that matters to the browser is that this URL once in the past returned a permanent redirect to

The takeaway from this story is that you should only use permanent redirects on URLs that you’ve no intention of ever using again. There is a place for permanent redirects, but you must be aware of their consequences.

Even if you’re confident that you really need a permanent redirect, it’s a good idea to implement a temporary redirect first and only switch to its permanent cousin once you’re 100% sure everything works as intended.

Unvalidated Redirects Can Compromise Security

From a security perspective, redirects are a relatively safe technique. An attacker cannot hack a website with a redirect. After all, a redirect just redirects to a URL that an attacker could just type in the address bar of their browser.

However, if you use some kind of user input, like a URL parameter, without proper validation as a redirect URL, this could be abused by an attacker for a phishing attack. This kind of redirect is called an open or unvalidated redirect.

There are legitimate use cases for redirecting to a URL that is read from user input. A prime example is Django’s login view. It accepts a URL parameter next that contains the URL of the page the user is redirected to after login. To redirect the user to their profile after login, the URL might look like this:

Django does validate the next parameter, but let’s assume for a second that it doesn’t.

Without validation, an attacker could craft a URL that redirects the user to a website under their control, for example:

The website might then display an error message and trick the user into entering their credentials again.

The best way to avoid open redirects is to not use any user input when building a redirect URL.
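One way to follow that advice while still letting a request choose where to go is to accept a short key instead of a URL and map it to a fixed internal path. This is a minimal sketch of that alternative design, not how Django's login view works; the `REDIRECT_TARGETS` table, the key names, and the helper name are hypothetical:

```python
from urllib.parse import parse_qs

# Hypothetical whitelist: the user picks a key, never the URL itself.
REDIRECT_TARGETS = {
    "profile": "/profile/",
    "orders": "/orders/",
}
DEFAULT_TARGET = "/"


def redirect_target_from_query(query_string):
    """Map the `next` query parameter to a known internal URL, else a default."""
    values = parse_qs(query_string).get("next", [])
    key = values[0] if values else ""
    # Anything not in the table, including a full external URL, falls back.
    return REDIRECT_TARGETS.get(key, DEFAULT_TARGET)
```

Because the redirect URL always comes from your own table, a crafted `next=https://...` value simply lands the user on the default page.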

If you cannot be sure that a URL is safe for redirection, you can use the function django.utils.http.is_safe_url() to validate it. The docstring explains its usage quite well:

is_safe_url(url, host=None, allowed_hosts=None, require_https=False)

Return True if the url is a safe redirection (i.e. it doesn’t point to a different host and uses a safe scheme). Always return False on an empty url. If require_https is True, only ‘https’ will be considered a valid scheme, as opposed to ‘http’ and ‘https’ with the default, False. (Source)

Let’s look at some examples.

A relative URL is considered safe:

>>> # Import the function first.
>>> from django.utils.http import is_safe_url
>>> is_safe_url('/profile/')
True

A URL pointing to another host is generally not considered safe:

>>> is_safe_url('')

A URL pointing to another host is considered safe if its host is provided in allowed_hosts:

>>> is_safe_url('',
...             allowed_hosts={''})

If the argument require_https is True, a URL using the http scheme is not considered safe:

>>> is_safe_url('',
...             allowed_hosts={''},
...             require_https=True)
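Inside a Django project you should rely on is_safe_url() itself. Purely to show the kind of checks the docstring describes (host and scheme), here is a simplified, standard-library-only sketch. The name is_safe_redirect is hypothetical, and this is not Django's implementation, which handles more edge cases:

```python
from urllib.parse import urlparse


def is_safe_redirect(url, allowed_hosts=frozenset(), require_https=False):
    """Return True if `url` stays on an allowed host and uses an acceptable scheme."""
    if not url:
        return False
    # Browsers treat a backslash like a forward slash, so normalize it first.
    url = url.strip().replace("\\", "/")
    # Browsers may read "///host" as "//host", i.e. another site.
    if url.startswith("///"):
        return False
    parts = urlparse(url)
    # A scheme without a host (e.g. "javascript:...") is never a safe target.
    if parts.scheme and not parts.netloc:
        return False
    valid_schemes = {"https"} if require_https else {"http", "https"}
    if parts.scheme and parts.scheme not in valid_schemes:
        return False
    # Relative URLs have no host; absolute and protocol-relative ones must be allowed.
    return not parts.netloc or parts.netloc in allowed_hosts
```

Note that protocol-relative URLs such as //evil.example/ have a host even without a scheme, which is why the final check looks at netloc rather than only at the scheme.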


This wraps up this guide on HTTP redirects with Django. Congratulations: you have now touched on every aspect of redirects all the way from the low-level details of the HTTP protocol to the high-level way of dealing with them in Django.

You learned how an HTTP redirect looks under the hood, what the different status codes are, and how permanent and temporary redirects differ. This knowledge is not specific to Django and is valuable for web development in any language.

You can now perform a redirect with Django, either by using the redirect response classes HttpResponseRedirect and HttpResponsePermanentRedirect, or with the convenience function django.shortcuts.redirect(). You saw solutions for a couple of advanced use cases and know how to steer clear of common pitfalls.

If you have any further questions about HTTP redirects, leave a comment below. In the meantime, happy redirecting!



August 15, 2018 02:00 PM UTC

Import Python

ImportPython - Issue 182

Worthy Read

This blog series from Sheroy Marker covers the principles of CD of microservices. Get a practical guide on designing CD workflows for microservices, testing strategies, trunk based development, feature toggles and environment plans.

Several modern programming languages have so-called "null-coalescing" or "null-aware" operators, including C#, Dart, Perl, Swift, and PHP (starting in version 7). These operators provide syntactic sugar for common patterns involving null references.

In this review, we’ll be taking a look at our favorite options and explain which ones to use.
static analysis

Recently, I was given a dataset that contained sensitive information about customers and that should not under any circumstances be made public. The dataset resided on one of our servers, which I deem to be a reasonably secure location. I wanted to copy the data to my local drive in order to work with it more comfortably, without having to fear that the data would be less safe. So, I wrote a little script that changes the data while still preserving some key information. I will detail all the steps that I have taken, and highlight some handy tricks along the way.

I am creating a series of blog posts to help you develop, deploy and run (mostly) Python applications on AWS Lambda using Serverless Framework.
aws lambda

Implementer’s Guide to Scalable and Robust Internet Telephony with Session Initiation Protocol in Client-Server and Peer-to-Peer modes in Python

Python extends its lead, and Assembly enters the Top Ten

We illustrate the application of two linear compression algorithms in python: Principal component analysis (PCA) and least-squares feature selection. Both can be used to compress a passed array, and they both work by stripping out redundant columns from the array. The two differ in that PCA operates in a particular rotated frame, while the feature selection solution operates directly on the original columns. As we illustrate below, PCA always gives a stronger compression. However, the feature selection solution is often comparably strong, and its output has the benefit of being relatively easy to interpret — a virtue that is important for many applications.
data science

In this tutorial you will learn how to build a “people counter” with OpenCV and Python. Using OpenCV, we’ll count the number of people who are heading “in” or “out” of a department store in real-time.
image processing

I have been experimenting with keyword extraction techniques against the NIPS Papers dataset, consisting of titles, abstracts and full text of all papers from the Neural Information Processing Systems (NIPS) conference from 1987 to 2017, and contributed by Ben Hamner. The collection has 7239 papers written by 9785 authors. I preferred this dataset to others such as Reuters or Medline because it is smaller, because I can be both programmer and domain expert, and because I might learn interesting things while combing through the text of the papers looking for patterns to exploit.
topic modeling

In this article, we will be building queries for Wikidata with Python and SPARQL by taking a look at where mayors in Europe are born.


Deep-Learning-World - 1373 Stars, 95 Fork
Organized Resources for Deep Learning Researchers and Developers

kefir - 288 Stars, 21 Fork
Kefir is a natural language processing kit for Turkic languages

zalo_landmark - 139 Stars, 19 Fork
Zalo landmark identification challenge, 103 classes, > 100k images (PyTorch)

SMBetray - 135 Stars, 15 Fork
SMB MiTM tool with a focus on attacking clients through file content swapping, lnk swapping, as well as compromising any data passed over the wire in cleartext.

img_term - 76 Stars, 5 Fork
Display image and video camera in your ANSI terminal!

PaperTTY - 72 Stars, 3 Fork
PaperTTY - Python module to render a TTY on e-ink

gluon-reid - 72 Stars, 4 Fork
A code gallery for person re-identification with mxnet-gluon, and I will reproduce many SOTA algorithms.

fagan - 36 Stars, 11 Fork
A variant of the Self Attention GAN named: FAGAN (Full Attention GAN)

django-vue-template - 32 Stars, 3 Fork
Django Rest + Vue JS Template

django-deployment-book - 20 Stars, 4 Fork
The Unix system administration guide for Django developers

decli - 9 Stars, 0 Fork
Minimal, easy-to-use, declarative cli tool

aira - 8 Stars, 0 Fork
Aira is a simple script language based on python3

csv-position-reader - 5 Stars, 0 Fork
A custom CSV reader implementation with direct file access

cookiecutter-django-shop - 3 Stars, 0 Fork
Cookiecutter django-SHOP is a blueprint for an e-commerce site based on django-CMS.

ews - 3 Stars, 1 Fork
Ethereum Web Service

August 15, 2018 12:13 PM UTC

Python Bytes

#91 Will there be a PyBlazor?

August 15, 2018 08:00 AM UTC


Taming Snakes inside a Container

In this post, let's talk about taming snakes inside a container. The article is a summary of lessons learned while dockerizing python microservices. In case you want to see a detailed...

August 15, 2018 06:33 AM UTC


How Promotions work in Large Corporations

We are stoked to have Cristian Medina ( deliver our first soft skills article. He will go into depth on the topic of promotions and how to better position yourself as a developer. He will discuss performance reviews, the role your manager can play, networking and much more. Enjoy and keep challenging yourself! Enter Cris ...


I'm going on year 16 of my professional engineering career. Most of it spent in large and mid-size corporations. In this time, I was exposed to a number of interesting situations and processes related to performance reviews and promotions. And while I was at Pycon 2018, the topic came up in some hallway conversations. Specifically, what does one need to do to get promoted?

Anthony Shaw actually covered the basics in his Can we talk about tech salaries article about negotiating better pay. But the hallway discussion left me thinking that there are a few more things I can add to the list. Perhaps some that might help explain the process better for folks who have never been through it, especially how it works in larger corporations.

When the guys from Pybites suggested collaborating on a new post on soft-skills, I thought this would be a good topic to cover. So here we are.

The Not-So-Obvious Obvious

First things first: be good at what you do (i.e. your primary job), otherwise the conversation is over before it even starts. Where it gets tricky, though, is how you measure "good". Each company has multiple ways of doing that, usually different between business units.

Second thing: Don't just stop at your job responsibilities, make it your mission to learn about how to improve your environment. I know this is broad, but that's the point. If it means learning a new programming language, a new tool, or new methodologies, it's on you to stay up to date with your chosen profession and how you can apply recent developments to your environment. Not only is it good for you, but keeping your department and organization up-to-date can even make it resilient to future complications.

Some businesses have a core set of values against which they'll evaluate what you're delivering. These tend to be "esoteric" things like "innovation that matters", usually very abstract and hard to measure (this is likely on purpose). To use this phrase as an example, who is the innovation supposed to matter to? your clients? your coworkers? the "business"?

Other organizations will rate you on specific measurable criteria. This brings the problem of understanding which criteria matter to your job role. For example, while it might mean a lot to you personally that you've made 5000 commits (more than anyone else) to the most important codebase the company owns, maybe the company wants to optimize dollars spent building code. In this case you just cost the company more money than everyone else, and therefore had the worst job performance.

Other companies will instead have a list of specific criteria for each of the "steps" in your career ladder. Keep in mind though that this ladder is for the career that your job category falls into, not necessarily the one that YOU want to climb. The criteria could also be abstract concepts, like step 1 would be "implements the vision", while step 2 is "interprets the vision", and step 3 is "has vision". And yes, there are all kinds of jokes you can make about what you need to do to have visions, but this is a real thing I've seen in several places.

It's very important that you first understand how to provide value inside your company. That is not to say that it works the same way inside your organization, or even your department. Usually there's other "flavoring" added to each of those items depending on where you work, who you work for, who they report to, and even who that person reports to.

Make sure you find mentors or other folks in the organization that have gone through several steps in the ladder WHILE WORKING AT YOUR COMPANY. They can help you understand what matters. Your manager can help point you to these folks, and if not, a peer manager should have some input as well.

Performance Reviews

Ok, now that you've determined how value is measured, it's important to make your performance reviews, especially the written ones, all about how you deliver on that value. Sometimes you won't be able to put things in the same terms, but that's where you talk to your manager and ask for advice. Don't forget to mention any research or studies you completed, even if their conclusions were not what you expected.

Performance reviews are not a place to hold back; this is where you get to be a rockstar. Being humble will not help you here. You don't need to write an essay, bullet lists are usually better. You could categorize the bullets by your organization's criteria, if it helps.

Your Manager

Managing people is NOT easy. In general, management chains tend to get a bad rep for bad decisions, but people hardly ever talk about the good ones, though I suppose this is how it should be. I haven't been a manager myself, but I have been in team-lead or "coaching" roles in different organizations, and even outside of business environments. You learn real quick that people aren't easy. Keeping track of who has what problem, when, why, who can fix it, where they can fix it and with what kind of help is NOT a simple thing.

YOUR job is not only to be good at what you do, but to make your manager's job easy. If your manager gets a ping from someone external to your department about something dumb you did, that's yet one more thing that they have to deal with in their day. If they show up at some higher-up meeting and are asked something that you did not prep them for, they might look stupid without a good answer. That's one more thing they have to worry about next time they present your project.

If you can't point to things you do to make your manager's job easier, then moving up the chain gets a little harder.


Promotions DO NOT imply more money. Sometimes they do, sometimes they don't. When you're looking for a promotion, make sure to understand what you're getting yourself into. Sometimes there's a promise of money, but it winds up being a 1% raise for 50% more responsibility. If you don't ask, you'll find out the hard way. Don't make your life more complicated than it needs to be.

Back to an earlier point about the steps in the career ladder: it's important to understand the salary range for each of those steps. Usually there's a very strict range for each step, and where you are on that range is very important. If you're on the lower end, then your priority is to keep doing what you're doing and look for a raise. If you're on the higher end, there's no point in having a conversation about a raise, because you need to be promoted first before you can get one. Your manager can usually tell you where you stand; some companies even require them to do so.


On top of all this, there's "tribal knowledge". The grapevine is a real thing, and it always has information about what certain management chains may or may not want, who might be leaving their position soon, who might be wanting to come in, and who might be on the outs with their manager. This is NOT about gossip or hearsay; instead it's about taking the pulse of your organization. You need to understand it so that you can gain insight into the opportunities that may or may not interest you.

Sometimes it's not until you have these conversations with your coworkers that you realize that things aren't heading in the direction you need them to go. This can help you determine whether it's better to spend time vying for a promotion, or to start looking for another job.

Networking also helps you find other jobs within the same company that may have the career ladders you'd prefer to climb. Or different environments where you think you can better excel. They might even be in departments with peer managers, which makes life easier because you already understand the organization.

The Meeting

Large corporations don't tend to go around thinking: "Oh! This guy did a great job! Promote him!" It doesn't matter how much they want you to think they do, that's not how it works. It's all a numbers game.

For example, the business may have a percentage of the budget set aside for promotions, which they usually equate to a count of how many people they can promote for the year. Then that number gets distributed amongst all the business units, which then divide it by the different steps in the ladders (e.g. we can do 50 step 0-to-1 promotions, 25 1-to-2, 10 2-to-3, etc.) They then trickle it down to the organizations and departments, normally stopping at the 2nd-line manager level. The distribution method varies greatly between companies: some base it on how the business units did against their goals for the year, some base it on % revenue generated by the units, etc.

At this point, there's usually a meeting where your 2nd-line gathers his troops (your manager and his peers), to decide who gets what promotion to which step. Now comes the tricky part. Each manager brings a list of his candidates, and they all discuss each candidate and their accomplishments.

Some folks don't really understand the significance of this point: Every peer-manager in your organization will likely have a say on whether you get a promotion or not. On top of that, as we discussed earlier, a promotion to each step of the ladder has its own set of rules, which also involves approvals. The higher the step in the ladder, the higher up the management chain you go for approvals. That's why it's sometimes easy to get the very first promotion, which only takes approval from your direct manager and his manager.

What does this mean for you? It's not enough to do a good job for YOUR manager; you should also do things to help your coworkers in other departments. If the peer-managers haven't even heard of you, why would they be ok with giving up their own candidate's promotion to you? Back to making your manager's job easy, this is a key aspect of it.

What can you do to improve your chances?

If you help other people, make note of it in your performance reviews. Remember, it's not about bragging, it's about making your manager's job easy. When a peer manager doesn't know who you are, your manager can say: that's the guy who helped you with the XYZ task you were stuck on last month.

Let's say you were super helpful and your colleagues want to take you out for lunch. They want to thank you for this cool thing you made for them that greatly simplifies their lives. Tell them you'd love to go out for lunch with them, but they should save their money and instead of buying lunch, email your manager AND their manager thanking you for the work. On the flip side, you should do the same for them! When you think someone did a great job at something, email them and copy your managers. It helps everyone.

Finding opportunities to help

Helping other organizations or departments doesn't have to be complicated. As a programmer, this might be simpler than you think. Here are a few quick ideas:

When I was naive and early in my career, I definitely wish someone had sat me down and gone over these points. It would've saved me lots of frustration and heartache. I hope you find them useful.

Keep Calm and Code in Python!


August 15, 2018 06:12 AM UTC

Mike Driscoll

Face Detection Using Python and OpenCV

Machine Learning, artificial intelligence and face recognition are big topics right now. So I thought it would be fun to see how easy it is to use Python to detect faces in photos. This article will focus on just detecting faces, not face recognition, which is actually assigning a name to a face. The most popular and probably the simplest way to detect faces using Python is by using the OpenCV package. OpenCV is a computer vision library that’s written in C++ and has Python bindings. It can be kind of complicated to install depending on which OS you are using, but for the most part you can just use pip:

pip install opencv-python

I have had issues with OpenCV on older versions of Linux where I just can’t get the newest version to install correctly. But this works fine on Windows and seems to work okay for the latest versions of Linux right now. For this article, I am using the 3.4.2 version of OpenCV’s Python bindings.

Finding Faces

There are basically two primary ways to find faces using OpenCV:

  • Haar Classifier
  • LBP Cascade Classifier

Most tutorials use Haar because it is more accurate, but it is also much slower than LBP. I am going to stick with Haar for this tutorial. The OpenCV package actually has all the data you need to use Haar effectively. Basically you just need an XML file with the right face data in it. You could create your own if you knew what you were doing, or you can just use what comes with OpenCV. I am not a data scientist, so I will be using the built-in classifier. In this case, you can find it in your OpenCV library that you installed. Just go to the /Lib/site-packages/cv2/data folder in your Python installation and look for haarcascade_frontalface_alt.xml. I copied that file out and put it in the same folder I wrote my face detection code in.

Haar works by looking at a series of positive and negative images. Basically someone went and tagged the features in a bunch of photos as either relevant or not and then ran it through a machine learning algorithm or a neural network. Haar looks at edge, line and four-rectangle features. There’s a pretty good explanation over on the OpenCV site. Once you have the data, you don’t need to do any further training unless you need to refine your detection algorithm.
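The Viola-Jones approach that Haar cascades come from makes these rectangle features cheap by first computing an integral image (a summed-area table), after which the sum of any rectangle takes four lookups. Here is a simplified, pure-Python illustration of a two-rectangle edge feature. It is a teaching sketch with hypothetical function names, not OpenCV's actual implementation:

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    ii[r][c] holds the sum of img[0..r-1][0..c-1], so any rectangle sum
    needs only four lookups.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii, x, y, w, h):
    """Two-rectangle (edge) feature: left half minus right half (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# A tiny 2x4 "image": bright on the left, dark on the right.
patch = [
    [10, 10, 0, 0],
    [10, 10, 0, 0],
]
ii = integral_image(patch)
print(edge_feature(ii, 0, 0, 4, 2))  # 40: strong vertical edge
```

A uniform patch gives a feature value of zero, while a bright/dark boundary gives a large one. A trained cascade evaluates many such features at different positions and scales and compares them against learned thresholds.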

Now that we have the preliminaries out of the way, let’s write some code:

The first thing we do here is our imports. The OpenCV bindings are called cv2 in Python. Then we create a function that accepts a path to an image file. We use OpenCV’s imread function to read the image file and then we create a copy of it to prevent us from accidentally modifying the original image. Next we convert the image to grayscale. You will find that computer vision almost always works better with gray than it does in color, or at least that is the case with OpenCV.

The next step is to load up the Haar classifier using OpenCV’s XML file. Now we can attempt to find faces in our image using the classifier object’s detectMultiScale method. I print out the number of faces that we found, if any. The detectMultiScale method returns a sequence of rectangles, one per face found, each holding the x/y coordinates of the face’s top-left corner as well as its width and height. We use this information to draw a rectangle around each face using OpenCV’s rectangle function. Finally we show the result:

That worked pretty well with a photo of myself looking directly at the camera. Just for fun, let’s try running this royalty-free image I found through our code:

When I ran this image in the code, I ended up with the following:

As you can see, OpenCV only found two of the four faces, so that particular cascade file isn’t good enough for finding all the faces in the photo.

Finding Eyes in Photos

OpenCV also has a Haar Cascade eye XML file for finding the eyes in photos. If you do a lot of photography, you probably know that when you do portraiture, you want to try to focus on the eyes. In fact, some cameras even have an eye autofocus capability. For example, I know Sony has been bragging about their eye focus function for a couple of years now and it actually works pretty well in my tests of one of their cameras. It may well use something like Haar cascades itself to find the eyes in real time.

Anyway, we need to modify our code a bit to make an eye finder script:

import cv2
import os


def find_faces(image_path):
    image = cv2.imread(image_path)
    # Make a copy to prevent us from modifying the original
    color_img = image.copy()
    filename = os.path.basename(image_path)
    # OpenCV works best with gray images
    gray_img = cv2.cvtColor(color_img, cv2.COLOR_BGR2GRAY)
    # Use OpenCV's built-in Haar classifiers for faces and eyes
    haar_classifier = cv2.CascadeClassifier('haarcascade_frontalface_alt.xml')
    eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
    faces = haar_classifier.detectMultiScale(gray_img, scaleFactor=1.1, minNeighbors=5)
    print('Number of faces found: {faces}'.format(faces=len(faces)))
    for (x, y, width, height) in faces:
        # Draw a green rectangle around each face
        cv2.rectangle(color_img, (x, y), (x+width, y+height), (0, 255, 0), 2)
        # Only search for eyes inside the face region
        roi_gray = gray_img[y:y+height, x:x+width]
        roi_color = color_img[y:y+height, x:x+width]
        eyes = eye_cascade.detectMultiScale(roi_gray)
        for (ex, ey, ew, eh) in eyes:
            # Eye coordinates are relative to the face region, so draw on roi_color
            cv2.rectangle(roi_color, (ex, ey), (ex+ew, ey+eh), (255, 0, 0), 2)
    # Show the faces / eyes found and wait for a key press before closing
    cv2.imshow(filename, color_img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()


if __name__ == '__main__':
    find_faces('headshot.jpg')  # hypothetical file name; use your own photo

Here we add a second cascade classifier object. This time around, we use OpenCV’s built-in haarcascade_eye.xml file. The other change is in our loop where we loop over the faces found. Here we also attempt to find the eyes and loop over them while drawing rectangles around them. I tried running my original headshot image through this new example and got the following:

This did a pretty good job, although it didn’t draw the rectangle that well around the eye on the right.
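One detail in the eye loop is easy to miss: because detectMultiScale runs on roi_gray, the eye rectangles it returns are relative to the face region, not the whole photo. Drawing on roi_color works because that NumPy slice shares memory with color_img, but if you want absolute coordinates (for example, to log or crop the eyes), you need to add the face's offset. A tiny helper, with the hypothetical name to_image_coords, shows the arithmetic:

```python
def to_image_coords(face, eye):
    """Translate an eye rectangle found inside a face ROI into full-image coordinates.

    `face` and `eye` are (x, y, width, height) tuples; the eye's x/y are relative
    to the top-left corner of the face region it was detected in.
    """
    fx, fy, _, _ = face
    ex, ey, ew, eh = eye
    # Width and height are unchanged; only the origin shifts.
    return (fx + ex, fy + ey, ew, eh)
```

For a face at (100, 50) and an eye found at (10, 20) inside it, to_image_coords((100, 50, 80, 80), (10, 20, 15, 10)) returns (110, 70, 15, 10).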

Wrapping Up

OpenCV has lots of power to get you started doing computer vision with Python. You don’t need to write very many lines of code to create something useful. Of course, you may need to do a lot more work than this tutorial shows, such as training on your own data and refining your dataset, to make this kind of code work properly. My understanding is that the training portion is the really time-consuming part. Anyway, I highly recommend checking out OpenCV and giving it a try. It’s a really neat library with decent documentation.


August 15, 2018 05:05 AM UTC

Vasudev Ram

pyperclip, a cool Python clipboard module

By Vasudev Ram

I recently came across this neat Python library, pyperclip, while browsing the net. It provides programmatic copy-and-paste functionality. It's by Al Sweigart.

pyperclip is very easy to use.

I whipped up a couple of simple programs to try it out.

Here's the first one:
from __future__ import print_function
import pyperclip as ppc
import json

d1 = {}
keys = ("TS", "TB")
vals = [
    ["Tom Sawyer", "USA", "North America"],
    ["Tom Brown", "England", "Europe"],
]
for k, v in zip(keys, vals):
    d1[k] = v
for k in keys:
    print("{}: {}".format(k, d1[k]))

# Serialize d1 as JSON and put it on the clipboard
ppc.copy(json.dumps(d1))
print("Data of dict d1 copied as JSON to clipboard.")
# Paste the JSON back from the clipboard and deserialize it
d2 = json.loads(ppc.paste())
print("Data from clipboard copied as Python object to dict d2.")
print("d1 == d2:", d1 == d2)
The program creates a dict, d1, with some values, converts it to JSON and copies that JSON data to the clipboard using pyperclip.
Then it pastes the clipboard data into a Python string and converts that to a Python dict, d2.

Here's a run of the program:
$ python
TS: ['Tom Sawyer', 'USA', 'North America']
TB: ['Tom Brown', 'England', 'Europe']
Data of dict d1 copied as JSON to clipboard.
Data from clipboard copied as Python object to dict d2.
d1 == d2: True
Comparing d1 and d2 shows they are equal, which means the copy from Python program to clipboard and paste back to Python program worked okay.

Here's the next program:
from __future__ import print_function
import pyperclip as ppc

# Grab whatever plain text is currently on the clipboard
text = ppc.paste()
words = text.split()
print("Text copied from clipboard:")
print(text)
print("Stats for text:")
# The line count is the number of newline characters in the pasted text
print("Words:", len(words), "Lines:", text.count("\n"))

'''
the quick brown fox
jumped over the lazy dog
and then it flew over the rising moon
'''
The program pastes the current clipboard content into a string, then finds and prints the number of words and lines in that string. No copy in this case, just a paste, so your clipboard should already have some text in it.

Here are two runs of the program. Notice the three lines of text in a triple-quoted comment at the end of the program above. That's my test data. For the first run below, I selected the first two lines of that comment in my editor (gvim on Windows) and copied them to the clipboard with Ctrl-C. Then I ran the program. For the second run, I selected all three lines and pressed Ctrl-C again. You can see from the results that it worked; each time, it counted the lines and words in the text it pasted from the clipboard.
$ python
Text copied from clipboard:
the quick brown fox
jumped over the lazy dog

Stats for text:
Words: 9 Lines: 2

$ python
Text copied from clipboard:
the quick brown fox
jumped over the lazy dog
and then it flew over the rising moon

Stats for text:
Words: 17 Lines: 3
So we can see that pyperclip, as used in this second program, can be useful for a quick word and line count of any text you are working on, such as a blog post or article. You just need that text to be in the clipboard, which you can arrange by selecting it in whatever app and pressing Ctrl-C. Then you run the above program.

Of course, this technique will be limited by the capacity of the clipboard, so it may not work for large text files. That limit could be found by trial and error, e.g. by copying successively larger chunks of text to the clipboard, pasting them back somewhere else, comparing the two, and checking whether or not the whole text was preserved across the copy-paste. There could be a workaround, and I thought of a partial solution. It would involve accumulating the stats for each paste into variables, e.g. total_words += words and total_lines += lines. The user would need to keep copying successive chunks of text to the clipboard. How to sync the two, user and this modified program? I need to think it through, and it might be a bit clunky. Anyway, this was just a proof of concept.
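Here is one possible way to sketch that accumulation idea, assuming Python 3 and a simple "press Enter after each copy" protocol to keep the user and the program in sync. The helper names accumulate_stats and clipboard_chunks are hypothetical; the paste function is passed in, so you would hand it pyperclip's paste in real use:

```python
def accumulate_stats(chunks):
    """Accumulate word and line counts over successive clipboard pastes."""
    total_words = 0
    total_lines = 0
    for chunk in chunks:
        total_words += len(chunk.split())
        total_lines += chunk.count("\n")
    return total_words, total_lines


def clipboard_chunks(paste, prompt=input):
    """Yield one paste each time the user presses Enter; typing q stops."""
    while True:
        answer = prompt("Copy the next chunk, then press Enter (or type q to stop): ")
        if answer.strip().lower() == "q":
            return
        yield paste()


# Hypothetical usage with pyperclip:
# import pyperclip as ppc
# words, lines = accumulate_stats(clipboard_chunks(ppc.paste))
```

One caveat: if a chunk boundary falls in the middle of a word, that word is counted twice, so chunks are best split at line ends.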

As the pyperclip docs say, it only supports plain text from the clipboard, not rich text or other kinds of data. But even with that limitation, it is a useful library.

The image at the top of the post is a partial screenshot of my vim editor session, showing the menu icons for cut, copy and paste.

You can read about the history and evolution of cut, copy and paste here:

Cut, copy, and paste

New to vi/vim and want to learn its basics fast? Check out my vi quickstart tutorial. I first wrote it for a couple of Windows sysadmin friends of mine, who needed to learn vi to administer Unix systems they were given charge of. They said the tutorial helped them to quickly grasp the basics of text editing with vi.

Of course, vi/vim is present, or just a download away, on many other operating systems by now, including Windows, Linux and macOS. In fact, it is pretty ubiquitous, which is why vi is a good skill to have: you can edit text files on almost any machine with it.

- Enjoy.

- Vasudev Ram - Online Python training and consulting


August 15, 2018 01:47 AM UTC

Mike Driscoll

Book Contest: ReportLab: PDF Processing with Python

I recently released a new book entitled ReportLab: PDF Processing with Python. In celebration of a successful launch, I have decided to do a little contest.


  • Post a comment telling me why you would want a copy
  • The most clever or heartfelt commenter will be chosen by me

The contest runs from now until Friday, August 17th @ 11:59 p.m. CST.

Runners up will receive a free copy of the eBook. The grand prize will be a signed paperback copy + the eBook version!

August 15, 2018 01:24 AM UTC

August 14, 2018

Python Engineering at Microsoft

Python in Visual Studio 2017 version 15.8

We have released the 15.8 update to Visual Studio 2017. You will see a notification in Visual Studio within the next few days, or you can download the new installer from

In this post, we're going to look at some of the new features we have added for Python developers: IntelliSense with typeshed definitions, faster debugging, and support for Python 3.7. For a list of all changes in this release, check out the Visual Studio release notes.

Faster debugging, on by default

We first released a preview of our ptvsd 4.0 debug engine in the 15.7 release of Visual Studio. In the 15.8 release it is now the default, offering faster and more reliable debugging for all users.

If you encounter issues with the new debug engine, you can revert to the previous debug engine by selecting Use legacy debugger from Tools > Options > Python > Debugging.

Richer IntelliSense

We are continuing to make improvements to IntelliSense for Python in Visual Studio 2017. In this release you will notice completions that are faster, more reliable, and have better understanding of the surrounding code, and tooltips with more focused and useful information. Go To Definition and Find All References are better at taking you to the module a value was imported from, and Python packages that include type annotations will provide richer completions. These changes were made as part of our ongoing effort to make our Python analysis from Visual Studio available as an independent Microsoft Python Language Server.

For example, hovering over the os module shows a richer tooltip in 15.8 than it did in 15.7.

We have also added initial support for using typeshed definitions to provide more completions where our static analysis cannot infer complete information. We are still working through some known issues with this, so results may be limited; expect better typeshed support in future releases.

Support for Python 3.7

We have updated Visual Studio so that all of our features work with Python 3.7, which was recently released. Most functionality already worked with Python 3.7 in the 15.7 release; in 15.8 we made specific fixes so that debug attach, profiling, and mixed-mode (cross-language) debugging also work with Python 3.7.

Give Feedback

Be sure to download the latest version of Visual Studio and try out the above improvements. If you encounter any issues, please use the Report a Problem tool to let us know (this can be found under Help, Send Feedback) or continue to use our GitHub page. Follow our Python blog to make sure you hear about our updates first, and thank you for using Visual Studio!

August 14, 2018 04:54 PM UTC


How to build your own blockchain for a financial product


Technologies are changing fast; people are not. – Jakob Nielsen

Blockchain is a relatively new technology that many people associate only with buying Bitcoin. Enthusiasts try to apply it in whatever sphere comes to mind, whether fashion, education or healthcare. I would say that is okay: too little time has passed to determine which area of human activity can benefit most from this technology. To understand the practical application of blockchain, we must first define why it appeared, and then study cases where blockchain can make a significant difference.

Note: This article does not explain the blockchain concepts; instead, it focuses on developing a fintech application using this technology. I will explain why fintech can already adopt the blockchain, and most importantly, focus on developing a decentralized application using this technology.

Industries That Are Ready For Blockchain

Don Norman once wrote that many products failed because they were released at the wrong time. I would extend this statement: many technologies fail to find practical applications. When the Internet became widely available at the beginning of the ’90s, every sphere tried to apply it to its business. It was a catastrophe, and its consequences are still visible in thousands of never-visited websites with horrible interfaces, clumsily created by anyone who had a computer. We are now witnessing virtually the same situation: the most promising technology of the decade is associated with speculation on crypto exchanges. It is widely used for financial scams, although it was created for the opposite purpose.

(Image source: Microsoft Azure)

An attempt to exclude the human factor from business was one reason blockchain appeared. That is why the industries most likely to implement blockchain successfully are those that (1) depend heavily on human activity and (2) suffer most from human error, like finance.

Important: Blockchain is being applied to various products from different industries; we just need more daring entrepreneurs who are willing to put a lot at stake.

Fintech deals with a very thorny matter: money. This is exactly where most fraud takes place. The desire to become richer is one fundamental mechanism that pushes people to do things, often bad things. Fintech startups aim to improve on traditional financial institutions, for example, by excluding the human factor from financial activities.

Using blockchain removes third parties from financial transactions, such as a bank that verifies the parties to a transaction. It can also be used for managing inventory and logistics, trading goods, streamlining identity verification, tracking transactions and more.

Read: What you need to consider before building a fintech product

It does not mean that every fintech product can easily adopt blockchain. Here are some cases when you might want to use blockchain:

I do not suggest implementing blockchain in the following cases:

What you can do tomorrow, or even while reading this article, is build a simple blockchain. That is the focus of part 2: I will tell you about the main components required to build a blockchain for fintech products, propose some tools, and show real pieces of code with explanations.

How To Apply Blockchain In Fintech

‘Frameworks’ to use


CryptoNote

CryptoNote is an open-source project that allows you to create cryptocurrencies. It has a simple, step-by-step guide to creating one. To launch it, you will need two nodes to run the Monero server.

Useful links:

How to create a coin

How to create a wallet


Ethereum

Ethereum is a popular open-source platform for building decentralized applications. Its focus is running the code of your blockchain-based app. Quoting the Ethereum website: “Ethereum is a decentralized platform that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference.”


ZeroNet

ZeroNet is used for creating decentralized websites. It uses the Bitcoin addressing and verification mechanisms, and the BitTorrent distributed content delivery network to create sites that cannot be censored, forged or blocked.

Build simple blockchain

Now that you know the tools (CryptoNote, Ethereum and ZeroNet), let's move on to building a basic blockchain of our own. I will be using Python in this example, but if it is not your primary language, you will still understand the logic and be able to write it in another language.

First, I will explain the fundamental elements required to build a block: date of creation, nonce, checksum and transaction data. To keep the code simple, the transaction data in our case is just a string.

Date of creation

It is the current date and time as a Unix timestamp. It is needed for the future development of your blockchain: when there are many running nodes and you add a new block to your branch, the node will decide which block to use based on the date of creation.


Nonce

It is a set of symbols that we add to the block data so that the resulting checksum meets the requirement. The nonce is a counter: for example, if the nonce value is 5, we append the nonce symbol 5 times to the block data before calculating the checksum (with the ‘Z’ symbol used later in this article, that is ZZZZZ).


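Checksum

Also sometimes referred to as a hash value, hash code, or simply a hash. It is the SHA-256 hash of the block data with the nonce, plus the checksum of the previous block. Because every checksum depends on the previous block's checksum, SHA-256 protects the chain from being rewritten: changing an old block changes its checksum, and the blocks after it no longer match.
How it works: a node calculates the checksum and compares it with the one in the new block; if they match, the block is added to the blockchain.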
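To make the chaining concrete, here is a minimal sketch of a checksum built the way this section describes: creation time, data and the previous block's checksum, joined with '|', plus the nonce symbol repeated nonce times, as in the Block class in this article. The `block_checksum` helper and the example timestamp and data are hypothetical.

```python
from hashlib import sha256


def block_checksum(timestamp, data, prev_checksum, nonce, nonce_symbol="Z"):
    """Hypothetical helper: SHA-256 over timestamp|data|prev_checksum plus nonce symbols."""
    payload = "|".join([str(timestamp), str(data), prev_checksum])
    payload += nonce_symbol * nonce
    return sha256(payload.encode("utf-8")).hexdigest()


prev = "00000453880b6f9179c0661bdf8ea06135f1575aa372e0e70a19b04de0d4cbc7"
original = block_checksum(1534250000.0, "Alice pays Bob 10", prev, nonce=0)

# Changing anything upstream (here, one character of the previous checksum)
# yields a completely different checksum for this block, so a rewritten
# earlier block no longer matches what later blocks were built on.
tampered_prev = "1" + prev[1:]
changed = block_checksum(1534250000.0, "Alice pays Bob 10", tampered_prev, nonce=0)
print(original != changed)  # True
```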


Transaction data

It is a set of data that will be stored in the block and signed. It can contain any sort of data: e.g., Bitcoin stores a list of transactions, not only the last transaction; or you can store information about the computer that created the block, like its MAC address; or you can have a more detailed date of creation, say, one that includes the time zone.

Proof of work

Proof of work (PoW) is a consensus algorithm used in blockchain networks. It is used to validate operations and the creation of new blocks in the network. The main idea of PoW is to make building a block expensive for whoever creates it, while keeping verification cheap for everyone else. For example, if the rule is that the checksum must have 5 leading zeros, we keep increasing the nonce until the checksum has 5 leading zeros.
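The asymmetry between mining and verifying can be seen in a small sketch. It assumes the same scheme as this article (the nonce is the 'Z' symbol appended nonce times) but uses only 3 leading zeros so it runs quickly; `mine` and `verify` are hypothetical names.

```python
from hashlib import sha256


def mine(payload, lead_zeros, nonce_symbol="Z"):
    """Increase the nonce until the SHA-256 hex digest starts with lead_zeros zeros."""
    target = "0" * lead_zeros
    nonce = 0
    while True:
        digest = sha256((payload + nonce_symbol * nonce).encode("utf-8")).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(payload, nonce, digest, lead_zeros, nonce_symbol="Z"):
    """Checking a claimed nonce costs a single hash."""
    recomputed = sha256((payload + nonce_symbol * nonce).encode("utf-8")).hexdigest()
    return recomputed == digest and digest.startswith("0" * lead_zeros)


# 3 leading hex zeros take about 16**3 = 4096 attempts on average;
# each extra zero multiplies the expected work by 16.
nonce, digest = mine("some block data", lead_zeros=3)
print(nonce, digest)
print(verify("some block data", nonce, digest, lead_zeros=3))  # True
```

Because the appended string grows with the nonce, each attempt in this scheme hashes a little more data than the last; appending the nonce as a number instead is a common alternative that keeps every attempt the same size.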

Let’s start with a code

First of all, I will create a class for a block. It will be a very simple class with a constructor, a method for calculating the checksum and a property to check that the block is valid. We will have two constants: one for the number of leading zeros in the checksum, and a second identifying the symbol we use for the nonce.

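import time
from hashlib import sha256


class Block:
    CHECKSUM_LEAD_ZEROS = 5  # leading zeros a valid checksum must have
    NONCE_SYMBOL = 'Z'       # appended once per nonce step before hashing

    def __init__(self, prev_block, data):
        self._prev_block = prev_block
        self.data = data
        self.checksum = None
        self.nonce = 0
        self.timestamp = time.time()

    @property
    def is_valid(self):
        # Recompute the checksum; it must meet the PoW rule and match the stored one
        checksum = self.calculate_checksum()

        return (
            checksum[:self.CHECKSUM_LEAD_ZEROS] == '0' * self.CHECKSUM_LEAD_ZEROS
            and checksum == self.checksum
        )

    def calculate_checksum(self):
        # Pack the time mark, the data and the previous block's checksum...
        data = '|'.join([
            str(self.timestamp),
            str(self.data),
            self._prev_block.checksum,
        ])
        # ...then add the nonce as NONCE_SYMBOL repeated `nonce` times
        data += self.NONCE_SYMBOL * self.nonce

        return sha256(bytes(data, 'utf-8')).hexdigest()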

The constructor accepts only two parameters: the previous block and the current block's data. It also records the time mark and sets the nonce to zero as its initial value.

Is valid

A property that recalculates the checksum and checks that it equals the stored checksum and has the right number of leading zeros.

Calculate checksum

The most complex method in our code. It packs the time mark, the data, and the checksum of the previous block into one string. Then it appends the nonce string, which in our case is the ‘Z’ symbol repeated nonce times, and calculates the checksum of the resulting string.

Now we have a simple yet fully functional block. I will move on to creating a chain of blocks. For now, it will be a simple chain without the ability to store and load data, but it will convey the main idea.

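import json


class Chain:

    def __init__(self):
        # Every chain starts with the genesis block
        self._chain = [self._get_genesis_block()]

    def is_valid(self):
        prev_block = self._chain[0]
        for block in self._chain[1:]:
            # Each block must be linked to the block stored just before it...
            if block._prev_block.checksum != prev_block.checksum:
                return False
            # ...and must itself be valid (correct checksum with leading zeros)
            if not block.is_valid:
                return False
            prev_block = block
        return True

    def add_block(self, data):
        block = Block(self._chain[-1], data)
        block = self._find_nonce(block)
        self._chain.append(block)

        return block

    def printchain(self):
        # Block objects are not JSON serializable, so dump their fields instead
        print(json.dumps([
            {
                'timestamp': block.timestamp,
                'data': block.data,
                'nonce': block.nonce,
                'checksum': block.checksum,
            }
            for block in self._chain
        ], indent=4))

    @staticmethod
    def _get_genesis_block():
        genesis_block = Block(None, None)
        genesis_block.checksum = '00000453880b6f9179c0661bdf8ea06135f1575aa372e0e70a19b04de0d4cbc7'

        return genesis_block

    def _find_nonce(self, block):
        # About 16**5 (~1 million) attempts on average for 5 leading zeros,
        # and each attempt hashes a longer string, so this can be slow.
        beginning = '0' * Block.CHECKSUM_LEAD_ZEROS
        while True:
            checksum = block.calculate_checksum()
            if checksum[:Block.CHECKSUM_LEAD_ZEROS] == beginning:
                block.checksum = checksum
                break
            block.nonce += 1

        return block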

Let’s take a look at methods in our chain class:


Constructor

It creates a chain with only one block, the genesis block. The genesis block is the first block of the chain and has only a checksum. It is required for adding the first real block, because a real block needs the checksum of the last block in the chain.

Adding a new block

It has only one parameter: the data for the new block. This method creates a new block with the given data and runs _find_nonce to find a correct nonce value. Only then does it append the new block to the chain.

Find the nonce

This method finds the right nonce for a block. It runs an infinite loop in which it increases the nonce and calculates a new checksum, then checks the checksum against the rules (for now, only the number of leading zeros), stopping as soon as they are met.

Validate the chain

This method tells us if the chain is valid. It goes through all blocks in the chain and checks if each block is valid.
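Validation can also be illustrated without the classes above. This is a minimal, self-contained sketch using plain dicts and a deliberately low difficulty (2 leading zeros) so it runs instantly; `add_block`, `chain_is_valid` and the transaction strings are hypothetical. It applies the same rules as the Chain class (each block must reference the previous checksum, and its own checksum must recompute and meet the leading-zero rule) and shows that editing an old block's data is detected.

```python
import time
from hashlib import sha256

LEAD_ZEROS = 2        # kept low so the sketch runs instantly
NONCE_SYMBOL = "Z"


def checksum(block):
    payload = "|".join([str(block["timestamp"]), str(block["data"]), block["prev_checksum"]])
    payload += NONCE_SYMBOL * block["nonce"]
    return sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    block = {"timestamp": time.time(), "data": data,
             "prev_checksum": chain[-1]["checksum"], "nonce": 0}
    # Proof of work: bump the nonce until the checksum has enough leading zeros
    while not checksum(block).startswith("0" * LEAD_ZEROS):
        block["nonce"] += 1
    block["checksum"] = checksum(block)
    chain.append(block)


def chain_is_valid(chain):
    for prev, block in zip(chain, chain[1:]):
        if block["prev_checksum"] != prev["checksum"]:
            return False
        recomputed = checksum(block)
        if recomputed != block["checksum"] or not recomputed.startswith("0" * LEAD_ZEROS):
            return False
    return True


chain = [{"checksum": "0" * 64}]  # stand-in genesis block: only a checksum
add_block(chain, "Alice pays Bob 10")
add_block(chain, "Bob pays Carol 4")
print(chain_is_valid(chain))  # True

chain[1]["data"] = "Alice pays Bob 1000"  # try to rewrite history
print(chain_is_valid(chain))  # False
```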

Bottom Line

In this article, I attempted to prove that building a simple yet working blockchain is not as difficult as it may seem. My general advice is to take the lesson you learned here and start playing with blockchain by experimenting with blocks and data. All great products, including blockchain, were experiments once.

If you are from the fintech industry, I suggest you study more about the products that are using blockchain. Some things about them are certain: they are more secure, more attractive for investments, and, if they succeed in the global market, will be called game changers. The first step of adopting the new technology has been taken. The next step is to spread the knowledge and educate people about the features of blockchain.

Stay tuned for the next part about blockchain and fintech, most likely with more complex pieces of code and suggestions on its practical application in fintech.

August 14, 2018 02:32 PM UTC

Peter Bengtsson

Django lock decorator with django-redis

Here's the code. It's quick-n-dirty but it works wonderfully:

import functools
import hashlib

from django.core.cache import cache
from django.utils.encoding import force_bytes

def lock_decorator(key_maker=None):
    """
    When you want to lock a function from more than 1 call at a time.
    """

    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            if key_maker:
                key = key_maker(*args, **kwargs)
            else:
                # Default key: built only from the call's arguments
                key = str(args) + str(kwargs)
            lock_key = hashlib.md5(force_bytes(key)).hexdigest()
            # cache.lock() is provided by django-redis
            with cache.lock(lock_key):
                return func(*args, **kwargs)

        return inner

    return decorator

How To Use It

This has saved my bacon more than once. I use it on functions that really need to be made synchronous. For example, suppose you have a function like this:

def fetch_remote_thing(name):
    try:
        return Thing.objects.get(name=name).result
    except Thing.DoesNotExist:
        # Need to go out and fetch this
        result = some_internet_fetching(name)  # Assume this is sloooow
        Thing.objects.create(name=name, result=result)
        return result

That function is quite dangerous because if it is executed by two concurrent web requests, for example, they will trigger two "identical" calls to some_internet_fetching. If the database didn't have the name already, it will most likely trigger two calls to Thing.objects.create(name=name, ...), which could lead to integrity errors; or, if it doesn't, the whole function breaks down because it assumes that there are only 1 or 0 of these Thing records.

Easy to solve, just add the lock_decorator:

@lock_decorator()
def fetch_remote_thing(name):
    try:
        return Thing.objects.get(name=name).result
    except Thing.DoesNotExist:
        # Need to go out and fetch this
        result = some_internet_fetching(name)  # Assume this is sloooow
        Thing.objects.create(name=name, result=result)
        return result

Now, thanks to Redis distributed locks, each call is always allowed to finish before another call with the same lock key starts. All the hairy locking (in particular, the waiting) is implemented deep down in Redis, which is rock solid.

Bonus Usage

Another use that has also saved my bacon is functions that aren't necessarily called with the same input argument but each call is so resource intensive that you want to make sure it only does one of these at a time. Suppose you have a Django view function that does some resource intensive work and you want to stagger the calls so that it only runs it one at a time. Like this for example:

from django import http


def api_stats_calculations(request, part):
    if part == 'users-per-month':
        data = _calculate_users_per_month()  # expensive
    elif part == 'pageviews-per-week':
        data = _calculate_pageviews_per_week()  # intensive
    elif part == 'downloads-per-day':
        data = _calculate_download_per_day()  # slow
    else:
        # ...and so on for other parts (you get the idea)
        raise http.Http404(part)

    return http.JsonResponse({'data': data})

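If you just put @lock_decorator() on this Django view function and you have some (almost) concurrent calls to it, for example from a uWSGI server running with threads and multiple processes, then it will not synchronize the calls. That's because the default lock key is built from str(args) + str(kwargs), so calls with a different part get different keys and therefore different locks.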
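The effect of the default key can be checked without Django or Redis. This sketch reproduces only the key computation of lock_decorator (MD5 of str(args) + str(kwargs), encoded as UTF-8 the way force_bytes does for a str); `FakeRequest` is a hypothetical stand-in for a request object.

```python
import hashlib


def lock_key(key):
    # Same hashing as lock_decorator: MD5 hex digest of the UTF-8 encoded key
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def default_key(*args, **kwargs):
    # The key lock_decorator builds when no key_maker is given
    return lock_key(str(args) + str(kwargs))


class FakeRequest:
    """Hypothetical stand-in for a Django request object."""

    def __repr__(self):
        return "<FakeRequest: GET '/api/stats/'>"


request = FakeRequest()

# Different `part` values produce different keys, hence different locks:
print(default_key(request, "users-per-month") == default_key(request, "pageviews-per-week"))  # False

# A constant key_maker sends every call to the same lock:
key_maker = lambda request, part: "api_stats_calculations"
print(lock_key(key_maker(request, "users-per-month")) == lock_key(key_maker(request, "downloads-per-day")))  # True
```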

The solution to this is to write your own function for generating the lock key, like this for example:

@lock_decorator(
    key_maker=lambda request, part: 'api_stats_calculations'
)
def api_stats_calculations(request, part):
    if part == 'users-per-month':
        data = _calculate_users_per_month()  # expensive
    elif part == 'pageviews-per-week':
        data = _calculate_pageviews_per_week()  # intensive
    elif part == 'downloads-per-day':
        data = _calculate_download_per_day()  # slow
    else:
        # ...and so on for other parts (you get the idea)
        raise http.Http404(part)

    return http.JsonResponse({'data': data})

Now it works.

How Time-Expensive Is It?

Perhaps you worry that 99% of your calls to the function never have the problem of being called concurrently. How much does the overhead of this lock cost you? I wondered that too and set up a simple stress test with a really simple Django view function. It looked something like this:

@lock_decorator(key_maker=lambda request: 'samekey')
def sample_view_function(request):
    return http.HttpResponse('Ok\n')

I started a Django server with uWSGI with multiple processes and threads enabled. Then I bombarded this function with a simple concurrent stress test and observed the requests per minute. The cost was extremely tiny and almost negligible (compared to not using the lock decorator). Granted, in this test I used Redis on redis://localhost:6379/0, but generally the conclusion was that the call is extremely fast and not something to worry too much about. But your mileage may vary, so do your own experiments for your context.

What's Needed

You need to use django-redis as your Django cache backend. I've blogged before about using django-redis, for example Fastest cache backend possible for Django and Fastest Redis configuration for Django.

August 14, 2018 02:08 PM UTC

Ned Batchelder

SQLite data storage for

I’m starting to make some progress on Who Tests What. The first task is to change how records the data it collects during execution. Currently, all of the data is held in memory, and then written to a JSON file at the end of the process.

But Who Tests What is going to increase the amount of data. If your test suite has N tests, you will have roughly N times as much data to store. Keeping it all in memory will become unwieldy. Also, since the data is more complicated, you’ll want a richer way to access the data.

To solve both these problems, I’m switching over to using SQLite to store the data. This will give us a way to write the data as it is collected, rather than buffering it all to write at the end. BTW, there’s a third side-benefit to this: we would be able to measure processes without having to control their ending.

When running with --parallel, coverage adds the process id and a random number to the name of the data file, so that many processes can be measured independently. With JSON storage, we didn’t need to decide on this filename until the end of the process. With SQLite, we need it at the beginning. This has required a surprising amount of refactoring. (You can follow the carnage on the data-sqlite branch.)

There’s one problem I don’t know how to solve: a process can start coverage measurement, then fork, and continue measurement in both of the child processes, as described in issue 56. With JSON storage, the in-memory data is naturally forked when the processes fork, and then each copy proceeds on its way. When each process ends, it writes its data to a file that includes the (new) process id, and all the data is recorded.

How can I support that use case with SQLite? The file name will be chosen before the fork, and data will be dribbled into the file as it happens. After the fork, both child processes will be trying to write to the same database file, which will not work (SQLite is not good at concurrent access).

Possible solutions:

  1. Even with SQLite, buffer all the data in memory. This imposes a memory penalty on everyone just for the rare case of measuring forking processes, and loses the extra benefit of measuring non-ending processes.
  2. Make buffer-it-all be an option. This adds to the complexity of the code, and will complicate testing. I don’t want to run every test twice, with buffering and not. Does pytest offer tools for conveniently doing this only for a subset of tests?
  3. Keep JSON storage as an option. This doesn’t have an advantage over #2, and has all the complications.
  4. Somehow detect that two processes are now writing to the same SQLite file, and separate them then?
  5. Use a new process just to own the SQLite database, with coverage talking to it over IPC. That sounds complicated.
  6. Monkeypatch os.fork so we can deal with the split? Yuck.
  7. Some other thing I haven’t thought of?
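Option 4 could be approached by remembering which process opened the database and comparing that with os.getpid() before each write: a changed pid means we are in a forked child, which then switches to its own file, mirroring the pid-in-filename naming used with --parallel. This is a hypothetical sketch (the `PidAwareStore` class, the `line` table and the file naming are invented for illustration), not coverage.py's code, and it only notices the fork at the child's first write.

```python
import os
import sqlite3


class PidAwareStore:
    """Hypothetical sketch: one SQLite file per process, switched lazily after a fork."""

    def __init__(self, basename):
        self.basename = basename
        self._pid = None
        self._con = None

    def _connection(self):
        pid = os.getpid()
        if pid != self._pid:
            # First write, or first write in a forked child: the parent's
            # connection is not used for writes; open this process's own file.
            self._pid = pid
            self._con = sqlite3.connect(f"{self.basename}.{pid}")
            self._con.execute(
                "CREATE TABLE IF NOT EXISTS line (file TEXT, lineno INTEGER)"
            )
        return self._con

    def add_line(self, filename, lineno):
        con = self._connection()
        with con:  # commits the insert immediately
            con.execute("INSERT INTO line VALUES (?, ?)", (filename, lineno))
```

Detecting the fork lazily avoids monkeypatching os.fork (option 6); on Python 3.7+, os.register_at_fork(after_in_child=...) is another standard-library hook that could reset the connection eagerly.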

Expect to see an alpha of in the next few weeks with SQLite data storage, and please test it. I’m sure there are other use cases that might experience some turbulence...

August 14, 2018 01:41 PM UTC

Semaphore Community

Getting Started with Mocking in Python

This article is brought with ❤ to you by Semaphore.


Mocking is simply the act of replacing the part of the application you are testing with a dummy version of that part called a mock.

Instead of calling the actual implementation, you would call the mock, and then make assertions about what you expect to happen.

What are the benefits of mocking?


You will need to have Python 3.3 or higher installed. Get the correct version for your platform here. I will be using version 3.6.0 for this tutorial.

Once you have that installed, set up a virtual environment:

python3 -m venv mocking

Activate the virtual environment by running:

source mocking/bin/activate

After that, add a file where our code will reside and a file for our tests.


Basic Usage

Imagine a simple class:

class Calculator:
    def sum(self, a, b):
        return a + b

This class implements one method, sum, that takes two arguments, the numbers to be added, a and b. It returns a + b.

A simple test case for this could be as follows:

from unittest import TestCase
from main import Calculator

class TestCalculator(TestCase):
    def setUp(self):
        self.calc = Calculator()

    def test_sum(self):
        answer = self.calc.sum(2, 4)
        self.assertEqual(answer, 6)

You can run this test case using the command:

python -m unittest

You should see output that looks approximately like this:


Ran 1 test in 0.003s


Pretty fast, right?

Now, imagine the code looked like this:

import time

class Calculator:
    def sum(self, a, b):
        time.sleep(10) # long running process
        return a + b

Since this is a simple example, we are using time.sleep() to simulate a long running process. The previous test case now produces the following output:


Ran 1 test in 10.003s


That process has just considerably slowed down our tests. It is clearly not a good idea to call the sum method as is every time we run tests. This is a situation where we can use mocking to speed up our tests and avoid an undesired effect at the same time.

Let's refactor the test case so that instead of calling sum every time the test runs, we call a mock sum function with well defined behavior.

from unittest import TestCase
from unittest.mock import patch

class TestCalculator(TestCase):
    @patch('main.Calculator.sum', return_value=9)
    def test_sum(self, sum):
        self.assertEqual(sum(2,3), 9)

We are importing the patch decorator from unittest.mock. It replaces the actual sum function with a mock function that behaves exactly how we want. In this case, our mock function always returns 9. During the lifetime of our test, the sum function is replaced with its mock version. Running this test case, we get this output:


Ran 1 test in 0.001s


While this may seem counter-intuitive at first, remember that mocking allows you to provide a so-called fake implementation of the part of your system you are testing. This gives you a lot of flexibility during testing. You'll see how to provide a custom function to run when your mock is called instead of hard coding a return value in the section titled Side Effects.
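A variation not used in this tutorial is to patch only the slow dependency, time.sleep, so the real sum logic still runs and the test checks the actual addition. Calculator is repeated here so the snippet is self-contained, and the test class name is hypothetical. patch("time.sleep") replaces sleep on the time module for the duration of the test; this works because Calculator looks up time.sleep each time sum is called.

```python
import time
from unittest import TestCase
from unittest.mock import patch


class Calculator:
    def sum(self, a, b):
        time.sleep(10)  # long running process
        return a + b


class TestCalculatorWithoutSleep(TestCase):
    @patch("time.sleep")  # replaced with a MagicMock during this test
    def test_sum_runs_real_logic(self, mock_sleep):
        self.assertEqual(Calculator().sum(2, 3), 5)  # the real addition runs
        mock_sleep.assert_called_once_with(10)       # the delay was requested, not waited
```

Run it with python -m unittest as before; it finishes immediately because no real sleep happens.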

A More Advanced Example

In this example, we'll be using the requests library to make API calls. You can get it via pip install.

pip install requests

Our code under test looks as follows:

import requests

class Blog:
    def __init__(self, name):
        self.name = name

    def posts(self):
        response = requests.get("")  # jsonplaceholder API endpoint URL

        return response.json()

    def __repr__(self):
        return '<Blog: {}>'.format(self.name)

This code defines a class Blog with a posts method. Invoking posts on the Blog object will trigger an API call to jsonplaceholder, a JSON generator API service.

In our test, we want to mock out the unpredictable API call and only test that a Blog object's posts method returns posts. To do that, we patch the Blog class itself as follows.

from unittest import TestCase
from unittest.mock import patch

class TestBlog(TestCase):
    @patch('main.Blog')
    def test_blog_posts(self, MockBlog):
        blog = MockBlog()

        blog.posts.return_value = [
            {
                'userId': 1,
                'id': 1,
                'title': 'Test Title',
                'body': 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.'
            }
        ]

        response = blog.posts()
        self.assertIsInstance(response[0], dict)

You can see from the code snippet that the test_blog_posts function is decorated with the @patch decorator. When a function is decorated using @patch, a mock of the class, method or function passed as the target to @patch is returned and passed as an argument to the decorated function.

In this case, @patch is called with the target main.Blog and returns a Mock which is passed to the test function as MockBlog. It is important to note that the target passed to @patch should be importable in the environment @patch is being invoked from. In our case, an import of the form from main import Blog should be resolvable without errors.

Also, note that MockBlog is just the variable name for the created mock; you can name it however you want.

Calling blog.posts() on our mock blog object returns our predefined JSON. Running the tests should pass.


Ran 1 test in 0.001s


Note that testing the mocked value instead of an actual blog object allows us to make extra assertions about how the mock was used.

For example, a mock allows us to test how many times it was called, the arguments it was called with and even whether the mock was called at all. We'll see additional examples in the next section.

Other Assertions We Can Make on Mocks

Using the previous example, we can make some more useful assertions on our Mock blog object.

import main

from unittest import TestCase
from unittest.mock import patch

class TestBlog(TestCase):
    @patch('main.Blog')
    def test_blog_posts(self, MockBlog):
        blog = MockBlog()

        blog.posts.return_value = [
            {
                'userId': 1,
                'id': 1,
                'title': 'Test Title',
                'body': 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.'
            }
        ]

        response = blog.posts()
        self.assertIsInstance(response[0], dict)

        # Additional assertions
        assert MockBlog is main.Blog # While patched, main.Blog is the mock

        assert MockBlog.called # The mock was called

        blog.posts.assert_called_with() # We called the posts method with no arguments

        blog.posts.assert_called_once_with() # We called the posts method once with no arguments

        # blog.posts.assert_called_with(1, 2, 3) - This assertion is False and will fail since we called blog.posts with no arguments

        blog.reset_mock() # Reset the mock object

        blog.posts.assert_not_called() # After resetting, posts has not been called.

As stated earlier, the mock object allows us to test how it was used by checking the way it was called and which arguments were passed, not just the return value.

Mock objects can also be reset to a pristine state, i.e. as if the mock had not been called yet. This is especially useful when you make multiple calls to your mock and want each check to start from a clean call history.

Side Effects

These are the things that you want to happen when your mock function is called. Common examples are calling another function or raising exceptions.

Let us revisit our sum function. What if, instead of hard coding a return value, we wanted to run a custom sum function? Our custom function leaves out the slow time.sleep call and keeps only the summing functionality we want to test. We can simply define it as a side_effect in our test.

from unittest import TestCase
from unittest.mock import patch

def mock_sum(a, b):
    # mock sum function without the long running time.sleep
    return a + b

class TestCalculator(TestCase):
    @patch('main.Calculator.sum', side_effect=mock_sum)
    def test_sum(self, sum):
        self.assertEqual(sum(2,3), 5)
        self.assertEqual(sum(7,3), 10)

Running the tests should pass:


Ran 1 test in 0.001s
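Raising exceptions is mentioned above as a common side effect but not shown. Besides a replacement function, side_effect in unittest.mock also accepts an exception (raised when the mock is called) or an iterable (one item returned per call). A short sketch; `fetch` and `roll` are illustrative names:

```python
from unittest.mock import Mock

# An exception as side_effect: calling the mock raises it
fetch = Mock(side_effect=ConnectionError("API is down"))
try:
    fetch()
except ConnectionError as exc:
    print("raised:", exc)  # raised: API is down

# An iterable as side_effect: each call returns the next item
roll = Mock(side_effect=[4, 2, 6])
print(roll(), roll(), roll())  # 4 2 6
print(roll.call_count)  # 3
```

The exception form is handy for testing error handling, e.g. what Blog.posts callers do when the API is unreachable.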


Continuous Integration Using Semaphore CI

Adding continuous integration with Semaphore is very easy. Once you have everything committed and pushed to GitHub or Bitbucket, go here and create a new account or sign into an existing account. We'll be using a GitHub repo containing the Blog class example and test.

From your dashboard, click Add New Project.


You will be asked to select either GitHub or Bitbucket as a source. Pick a source as per your preference.


After selecting a source, select the repository.


Next, select the branch to build from.


Semaphore will analyze the project and show you the build settings:


Customize your plan to look as follows:


After that, click Build with these settings at the bottom of that page.


Once your build passes, that's it. You have successfully set up continuous integration on Semaphore CI.


In this article, we have gone through the basics of mocking with Python. We have covered using the @patch decorator and also how to use side effects to provide alternative behavior to your mocks. We also covered how to run a build on Semaphore.

You should now be able to use Python's built-in mocking capabilities to replace parts of your system under test to write better and faster tests.

For more detailed information, the official docs are a good place to start.

Please feel free to leave your comments and questions in the comments section below.

This article is brought with ❤ to you by Semaphore.

August 14, 2018 11:41 AM UTC

Mike Driscoll

Python 101: Episode #20 – The sys module

In this screencast, you will learn the basics of Python’s sys module from the standard library. You can also read the chapter this screencast is based on here or on Leanpub.

Check out the entire Python 101 Screencast playlist.

August 14, 2018 05:05 AM UTC

Spyder IDE

Spyder 3.3.0 and 3.3.1 released!

We're pleased to release the next significant update in the stable Spyder 3 line, 3.3.0, along with its follow-on bugfix point release, 3.3.1, which is now live on PyPI and conda. As always, you can update with conda update spyder in the Anaconda Prompt/Terminal/command line (on Windows/macOS/Linux, respectively) if on Anaconda (recommended), or pip install --upgrade spyder otherwise. If you run into any trouble, please carefully read our new installation documentation and consult our Troubleshooting Guide, which contains straightforward solutions to the vast majority of install-related issues users have reported.

As a new minor version (3.3), it makes several substantial changes to Spyder's underpinnings that deserve some explanation, particularly the newly modular and portable console system that's now separated into its own spyder-kernels package, opening up several new options for users running Spyder in different environments. There's also a brand-new error reporting process, new options in the IPython console, usability and performance improvements for the Variable Explorer, multiple new and changed dependency requirements and more, so there's plenty to go over. Finally, we'd like to briefly share a few final notes on this release and the latest on our plans going forward.

Modular, flexible Console architecture

The biggest single change with version 3.3.0/3.3.1 is a major overhaul of how IPython Consoles are started and managed in Spyder. More precisely, we've moved all the kernel-related code from the Spyder core into a new modular package, spyder-kernels, available on the same distribution channels as Spyder itself (and installed automatically when updating to >=3.3.0). While the most dramatic differences are under the hood, there's plenty for everyone to like (and a few things to be aware of).

Most importantly, for our everyday users, this makes Spyder much more flexible and powerful when working with multiple Python environments. With the changes, Spyder itself does not need to be present in every environment you'd like to launch a kernel in; you can install the full IDE in whatever manner you prefer, and then set it to run code and consoles in any Anaconda environment, virtualenv/venv, or even a totally separate Python installation, so long as it has the spyder-kernels package available. Just set the path under Tools -> Preferences -> Python interpreter -> Use the following Python interpreter to the desired Python executable, and any new Console you open will start in the selected environment. Check out our new wiki page on using environments with Spyder for more details and tips on the subject, and keep an eye out for the further improvements coming in Spyder 4, which will greatly simplify the process and include full GUI-based project, package and environment management functionality built right in.

Python interpreter pane of the Spyder preferences dialog

Furthermore, the new package allows you to independently launch a kernel from anywhere (on your local computer, or a remote machine, server or even supercomputing cluster), connect to it with Spyder, and use it just like a "natively" started one. After installing spyder-kernels in the host environment, you can start one with python -m spyder_kernels.console, and then enter the kernel's 4-digit ID (and SSH connection details, if it is on a remote machine) in Spyder's Connect to an existing kernel dialog (under the IPython Console pane's context or "gear" menu). For more information on the process, see the Connecting to a Console section in our new documentation.

A remote kernel running in a system console alongside Spyder's connect to kernel dialog

Best of all, no matter how or where a kernel is started, every console now supports the full suite of Spyder's features, including completion, the Variable Explorer, interactive Help and more, which was not the case before. You can even mix and match internal, external and remote kernels in different environments, all in the same Spyder session, by changing the Python interpreter preference before starting each console, by starting and connecting to multiple consoles externally, or both! Finally, for those of us (and those of you!) who help develop Spyder, the changes also make it easier to maintain and improve the code, and open the door to one of the biggest features coming in Spyder 4: a new, full-featured debugging kernel that many of you have been asking for.

The one key thing to remember: make sure you install the appropriate version of spyder-kernels for your version of Spyder. For most users, that will be spyder-kernels 0.x (currently 0.2.6) to match Spyder 3 on our stable 3.x branch; if testing a Spyder 4 beta or Github clone of the master branch, you'll want the latest 1.x version of spyder-kernels (currently 1.1.0). To install the correct build, you can use the following conda command,

conda install spyder-kernels=<0 or 1>.*

or with pip,

pip install spyder-kernels==<0 or 1>.*

replacing <0 or 1> with the major version number (0 or 1) to match your Spyder version. Further details specific to installing a development build can be found in our Contributing Guide or our install documentation.
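If you script your environment setup, the version rule above can be captured in a small helper. This is a hypothetical function (spyder_kernels_spec is not part of Spyder or spyder-kernels) that encodes only the mapping stated here: Spyder 3.x pairs with spyder-kernels 0.x, and Spyder 4 pairs with 1.x.

```python
def spyder_kernels_spec(spyder_version: str, tool: str = "pip") -> str:
    """Build an install spec for the spyder-kernels series matching a Spyder version.

    Hypothetical helper: it only encodes the rule from these release notes
    (Spyder 3.x -> spyder-kernels 0.x, Spyder 4 -> spyder-kernels 1.x).
    """
    # "3.3.1" -> 3, "4.0.0b1" -> 4
    major = int(spyder_version.split(".")[0])
    kernels_major = {3: 0, 4: 1}.get(major)
    if kernels_major is None:
        raise ValueError(f"no known spyder-kernels series for Spyder {spyder_version}")
    # conda matches versions with a single "=", pip uses "=="
    if tool == "conda":
        return f"spyder-kernels={kernels_major}.*"
    if tool == "pip":
        return f"spyder-kernels=={kernels_major}.*"
    raise ValueError(f"unsupported tool: {tool!r}")


print(spyder_kernels_spec("3.3.1"))                  # spyder-kernels==0.*
print(spyder_kernels_spec("4.0.0b1", tool="conda"))  # spyder-kernels=1.*
```

When pasting the result into a shell, quoting it (for example pip install "spyder-kernels==0.*") keeps shells such as zsh from trying to expand the *.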

New IPython Console completion and plotting features

Advanced tab of the IPython console pane of Spyder's preferences, with the new Jedi completion section highlighted

Spyder's IPython Consoles can now use an advanced jedi-based completion engine that, similar to the Editor, analyzes your code without actually having to run it first. This allows for advanced completion functionality on objects not yet assigned to a variable, similar to the existing "greedy" completion option, but without the need for dynamic evaluation. It can be slow when working with very large Pandas DataFrames, so it is disabled by default, but you can activate it under Tools -> Preferences -> IPython console -> Advanced Settings -> Jedi completion. The descriptive text for the "greedy" completion option (also off by default) was also clarified, particularly to explain an IPython bug (not present in the jedi completer) with the feature and a consequent workaround.

Graphics tab of the IPython console pane of Spyder's preferences

We've also added a new plotting setting, Use a tight layout for inline plots, for the Inline Matplotlib graphics backend. By default (as in previous Spyder versions), Spyder sets bbox_inches to "tight" in Matplotlib calls when drawing to the inline backend. However, if you'd rather have your own bbox_inches argument respected even when plots are rendered in the Console, you can now uncheck this option under Tools -> Preferences -> IPython console -> Graphics -> Inline backend.

Comparison of inline plots in Spyder's IPython Console, with and without tight layout

Better Variable Explorer usability and performance

We've made several changes and optimizations to greatly improve performance and efficiency in the Variable Explorer, to make it much faster and use less memory when opening and editing large objects. In particular, we've fixed several major memory leaks when saving edited objects and closing the Variable Explorer dialogs through better length validation and garbage collection, and now skip the whole saving process entirely if the object was not modified (or cannot be modified). We've also changed the names and functions of the Cancel and Ok buttons in Variable Explorer dialogs to be easier to understand and use. They now feature a Close button which exits the dialog without saving any edits to the object's contents, and a Save and Close button—automatically enabled once modifications are made—that commits the changes back to the kernel.

A Variable Explorer DataFrame editor dialog

Streamlined error reporting experience

While we hope you never need to use it, Spyder 3.3.0 includes a brand-new error handling backend that can submit bug reports directly through the Github API. Based on Colin Duquesnoy's excellent QCrash framework, this is a major improvement in speed, functionality, reliability and user convenience over the old approach (essentially just opening a link in a web browser). Just as before, we won't send anything without your explicit consent, you need a Github account (you can create one for free), and you can view and edit the report on Github at any time.

The new authentication dialogs for submitting a Github report, with a username/password and a token option

You will need to enter your Github credentials the first time you submit a report. For this, you can create an app token which only grants the very limited permissions needed to create a public issue report, can be easily revoked and re-created, and works with two-factor authentication (which you should be using); however, if you have not yet enabled 2FA, it also offers the option to enter your Github username and password. Either way, Spyder can securely remember your login using the keyring package, so you only have to do this once on any given machine (if you select the "remember" option).

The new error reporting interface, with a title field, more descriptive text, and a polished UI

The dialog itself has also been made more functional and user-friendly, designed to encourage high-quality, useful reports, and with more accessible, descriptive text. The reports themselves also contain more useful data about the problem, and there is now a --safe-mode command-line option that starts Spyder in a clean, temporary config directory, so you can test whether the problem recurs without the hassle of a spyder --reset, and play around with other settings without affecting your main configuration. Finally, we've fixed over 40 bugs in this release and further improved our documentation and troubleshooting material, so hopefully you'll see this dialog less often.

Cleaner under the hood and more

Alongside the aforementioned internal changes, we've also made a number of other under-the-hood changes to clean out old cruft and improve maintainability, readability and performance of our codebase. In particular, we've officially dropped support for Python 3.3, PyQt4, and PyQt5 < 5.5, all versions which have been end-of-life for years, and (aside from PyQt4) have minimal or no remaining Spyder users. Furthermore, dropping PyQt4 in particular allows us to avoid or resolve a number of unfixable bugs specific to that version that have been causing problems for users, and opens the door to easier development in the future. Finally, we moved our legacy documentation (and its many associated images) from the main Spyder codebase to its own repo, executed a major overhaul to greatly modernize and expand the text, images, style, and presentation, and now deploy them onto their own subdomain of our site, all of which we will discuss in a separate post coming soon.

Even more fixes and refinements with Spyder 3.3.1

As a quick follow-on to the 3.3.0 release, Spyder 3.3.1 fixed a handful of bugs and minor issues with the new functionality and cleaned up several other existing ones, as well as making a number of lower-level maintenance and development-oriented changes (over two dozen in all). Furthermore, several user-visible enhancements made it into the release, primarily aimed at improving usability. To make it easier for users to manage multiple environments, the selection UI under Preferences > Python interpreter > Use the following Python interpreter now remembers the executables you've previously selected and allows quick switching between them.

Python interpreter pane of Spyder's preferences, showing the new environment selection UI

In the Console, mundane ipdb commands are now automatically filtered from the history, and the Editor now supports syntax highlighting for the new numeric literal syntax introduced in Python 3.6. Spyder's tutorial has been rewritten for modern Spyder and to be clearer and more understandable, and overhauled for better formatting and visuals consistent with the rest of our documentation. Finally, our update checker now consults the Anaconda defaults channel rather than PyPI to determine if an update is available, so it doesn't bug the majority of our users on Anaconda until they can actually acquire the package.
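For reference, the Python 3.6 syntax in question is PEP 515's underscores in numeric literals. A quick illustration in plain Python, independent of Spyder:

```python
# Requires Python 3.6+ (PEP 515): underscores may group digits in numeric literals
budget = 1_000_000      # decimal integer
mask = 0xFF_FF          # hexadecimal
flags = 0b_1010_0001    # binary; an underscore may follow the base prefix
ratio = 3.141_592       # float
big = 1e1_0             # the exponent may contain underscores too

assert budget == 1000000
assert mask == 0xFFFF == 65535
assert flags == 0b10100001 == 161
assert ratio == 3.141592
assert big == 1e10

# The same PEP added "_" as a grouping option in format specs
print(f"{budget:_}")    # 1_000_000
# and lets int() parse strings that use underscores
print(int("1_000"))     # 1000
```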

What to know and what's next

If you have any questions, problems or feedback, we'd love to hear from you (just make sure you read our documentation, Troubleshooting Guide and the other previously-mentioned resources first)! For general questions or install issues that aren't addressed by the above, our Google Group and Gitter live chat are a good place to ask, while our Github is the place to report bugs, request features, or help develop Spyder itself (though make sure to search our issues list to ensure it hasn't already been submitted). Finally, you can follow our Facebook and Twitter for the latest Spyder news, releases, previews and tips, and help support Spyder development on OpenCollective.

There's plenty to look forward to in the coming days, with the official release of our all-new documentation (that's already live now), Spyder 4 beta 1 having just been released on PyPI, conda-forge and our own spyder-ide channel (with a blog post coming soon), an upcoming article on our official Spyder 4 feature roadmap and more, so stay tuned! In the meantime, happy Spydering and enjoy the new 3.3.1!

August 14, 2018 12:00 AM UTC

August 13, 2018


How to Build a Data Science Portfolio

How do you get a job in data science? Knowing enough statistics, machine learning, programming, etc. to be able to get a job is difficult. One thing I have found lately is that quite a few people may have the required skills to get a job, but no portfolio. While a resume matters, having a portfolio of public evidence of your data science skills can do wonders for your job prospects. Even if you have a referral, the ability to show potential employers what you can do instead of just telling them you can do something is important.

August 13, 2018 11:08 PM UTC

Peter Bengtsson

django-html-validator now supports Django 2.x

django-html-validator is a Django project that can validate your generated HTML. It does so by sending the HTML to a validation service, or you can start your own Java server locally with vnu.jar. You can have validation errors printed to stdout or written as .txt files in a temporary directory. You can also include it in your test suite so that tests fail if invalid HTML is generated during rendering in Django unit tests.

The project seems to have become a lot more popular than I thought it would. It started as a one-evening-hack and because there was interest I wrapped it up in a proper project with "docs" and set up CI for future contributions.

I kind of forgot about the project since almost all my current projects generate JSON on the server and generate the DOM on-the-fly with client-side JavaScript, but apparently a lot of issues and PRs were filed related to making it work in Django 2.x. So I took the time last night to tidy up the tox.ini etc. and make the necessary compatibility fixes so it works with Django 1.8 up to Django 2.1. Pull request here.

Thank you all who contributed! I'll try to do a better job of noticing filed issues in the future.

August 13, 2018 03:21 PM UTC

Real Python

Advanced Git Tips for Python Developers

If you’ve done a little work in Git and are starting to understand the basics we covered in our introduction to Git, but you want to learn to be more efficient and have more control, then this is the place for you!

In this tutorial, we’ll talk about how to address specific commits and entire ranges of commits, using the stash to save temporary work, comparing different commits, changing history, and how to clean up the mess if something doesn’t work out.

This article assumes you’ve worked through our first Git tutorial or at a minimum understand the basics of what Git is and how it works.

There’s a lot of ground to cover, so let’s get going.

Revision Selection

There are several options to tell Git which revision (or commit) you want to use. We’ve already seen that we can use a full SHA (25b09b9ccfe9110aed2d09444f1b50fa2b4c979c) and a short SHA (25b09b9cc) to indicate a revision.

We’ve also seen how you can use HEAD or a branch name to specify a particular commit as well. There are a few other tricks that Git has up its sleeve, however.

Relative Referencing

Sometimes it’s useful to be able to indicate a revision relative to a known position, like HEAD or a branch name. Git provides two operators that, while similar, behave slightly differently.

The first of these is the tilde (~) operator. Git uses tilde to point to a parent of a commit, so HEAD~ indicates the revision before the last one committed. To move back further, you use a number after the tilde: HEAD~3 takes you back three levels.

This works great until we run into merges. Merge commits have two parents, so the ~ just selects the first one. While that works sometimes, there are times when you want to specify the second or later parent. That’s why Git has the caret (^) operator.

The ^ operator moves to a specific parent of the specified revision. You use a number to indicate which parent. So HEAD^2 tells Git to select the second parent of the last one committed, not the “grandparent.” It can be repeated to move back further: HEAD^2^^ takes you back three levels, selecting the second parent on the first step. If you don’t give a number, Git assumes 1.

Note: Those of you using Windows will need to escape the ^ character on the DOS command line by using a second ^.

To make life even more fun and less readable, I’ll admit, Git allows you to combine these methods, so 25b09b9cc^2~3^3 is a valid way to indicate a revision if you’re walking back a tree structure with merges. It takes you to the second parent, then back three revisions from that, and then to the third parent.
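The ~ and ^ rules can be modeled in a few lines of Python. This is a simplified sketch, not Git's actual resolver: it uses a small hypothetical history (the PARENTS map, where merge commit M has B as its first parent and X as its second) and only understands a commit name followed by ~n and ^n steps.

```python
import re

# Hypothetical history, oldest to newest:
#
#   A --- B --- M --- C      M is a merge commit:
#    \         /             first parent B, second parent X
#     X ------
#
# Each commit maps to its parents, first parent first.
PARENTS = {
    "A": [],
    "B": ["A"],
    "X": ["A"],
    "M": ["B", "X"],
    "C": ["M"],
}


def resolve(rev, parents=PARENTS):
    """Resolve a name followed by ~n / ^n steps, e.g. 'C~1^2' (simplified sketch)."""
    match = re.fullmatch(r"([^~^]+)((?:[~^]\d*)*)", rev)
    if match is None:
        raise ValueError(f"unsupported revision: {rev!r}")
    commit, steps = match.groups()
    for op, digits in re.findall(r"([~^])(\d*)", steps):
        n = int(digits) if digits else 1  # a bare ~ or ^ means 1
        if op == "~":
            # ~n: walk back n generations, always following the first parent
            for _ in range(n):
                commit = parents[commit][0]
        elif n > 0:
            # ^n: jump to the n-th parent (1-based); ^0 is the commit itself
            commit = parents[commit][n - 1]
    return commit


print(resolve("C~"))    # M
print(resolve("C~2"))   # B  (C -> M -> B, first parents only)
print(resolve("C^^2"))  # X  (C's parent is M, then M's second parent)
print(resolve("M^2~"))  # A
```

Asking for a parent that does not exist (C^2, say) raises an IndexError here, whereas Git reports an unknown revision.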

Revision Ranges

There are a couple of different ways to specify ranges of commits for commands like git log. These don’t work exactly like slices in Python, however, so be careful!

Double Dot Notation

The “double dot” method for specifying ranges looks like it sounds: git log b05022238cdf08..60f89368787f0e. It’s tempting to think of this as saying “show me all commits after b05022238cdf08 up to and including 60f89368787f0e” and, if b05022238cdf08 is a direct ancestor of 60f89368787f0e, that’s exactly what it does.

Note: For the rest of this section, I will be replacing the SHAs of individual commits with capital letters as I think that makes the diagrams a little easier to follow. We’ll use this “fake” notation later as well.

It’s a bit more powerful than that, however. The double dot notation actually shows you all commits that are reachable from the second commit but not from the first. Let’s look at a few diagrams to clarify:

Diagram: branch1 is A->B->C; branch2 is A->D->E->F

As you can see, we have two branches in our example repo, branch1 and branch2, which diverged after commit A. For starters, let’s look at the simple situation. I’ve modified the log output so that it matches the diagram:

$ git log --oneline D..F
E "Commit message for E"
F "Commit message for F"

D..F gives you all of the commits on branch2 after commit D.

A more interesting example, and one I learned about while writing this tutorial, is the following:

$ git log --oneline C..F
D "Commit message for D"
E "Commit message for E"
F "Commit message for F"

This shows the commits that are part of commit F that are not part of commit C. Because of the structure here, there is not a before/after relationship to these commits because they are on different branches.

What do you think you’ll get if you reverse the order of C and F?

$ git log --oneline F..C
B "Commit message for B"
C "Commit message for C"

Triple Dot

Triple dot notation uses, you guessed it, three dots between the revision specifiers. This works in a similar manner to the double dot notation except that it shows all commits that are reachable from either revision but not from both. For our diagram above, using C...F shows you this:

$ git log --oneline C...F
D "Commit message for D"
E "Commit message for E"
F "Commit message for F"
B "Commit message for B"
C "Commit message for C"
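Both range notations boil down to set operations on the commits reachable from each end. The sketch below models the two-branch diagram as a hypothetical PARENTS map; the results are sets, so unlike git log they carry no ordering.

```python
# The two-branch diagram as a parent map (hypothetical, using the same letters):
# branch1: A -> B -> C    branch2: A -> D -> E -> F
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["D"], "F": ["E"]}


def reachable(commit, parents=PARENTS):
    """Return every commit reachable from `commit` by following parents, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def double_dot(a, b):
    # a..b: reachable from b, but not from a
    return reachable(b) - reachable(a)


def triple_dot(a, b):
    # a...b: reachable from one side or the other, but not from both
    return reachable(a) ^ reachable(b)


print(sorted(double_dot("D", "F")))  # ['E', 'F']
print(sorted(double_dot("C", "F")))  # ['D', 'E', 'F']
print(sorted(double_dot("F", "C")))  # ['B', 'C']
print(sorted(triple_dot("C", "F")))  # ['B', 'C', 'D', 'E', 'F']
```

The four results match the git log output shown above, which is a handy way to predict what a range will select before running it.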

Double and triple dot notation can be quite powerful when you want to use a range of commits for a command, but they’re not as straightforward as many people think.

Branches vs. HEAD vs. SHA

This is probably a good time to review what branches are in Git and how they relate to SHAs and HEAD.

HEAD is the name Git uses to refer to “where your file system is pointing right now.” Most of the time, this will be pointing to a named branch, but it does not have to be. To look at these ideas, let’s walk through an example. Suppose your history looks like this:

Four Commits With No Branches

At this point, you discover that you accidentally committed a Python logging statement in commit B. Rats. Now, most people would add a new commit, E, push that to master and be done. But you are learning Git and want to fix this the hard way and hide the fact that you made a mistake in the history.

So you move HEAD back to B using git checkout B, which looks like this:

Four Commits, HEAD Points to Second Commit

You can see that master hasn’t changed position, but HEAD now points to B. In the Intro to Git tutorial, we talked about the “detached HEAD” state. This is that state again!

Since you want to commit changes, you create a new branch with git checkout -b temp:

New Branch temp Points To Second Commit

Now you edit the file and remove the offending log statement. Once that is done, you use git add and git commit --amend to modify commit B:

New Commit B' Added

Whoa! There’s a new commit here called B'. Just like B, it has A as its parent, but C doesn’t know anything about it. Now we want master to be based on this new commit, B'.

Because you have a sharp memory, you remember that the rebase command does just that. So you get back to the master branch by typing git checkout master:

HEAD Moved Back To master

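Once you’re on master, you can rebase to replay C and D on top of B'. Because B itself is not part of temp, a plain git rebase temp would try to replay B as well, so use git rebase --onto temp B, which replays only the commits after B: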

master Rebased On B'

You can see that the rebase created commits C' and D'. C' still has the same changes that C has, and D' has the same changes as D, but they have different SHAs because they are now based on B' instead of B.

As I mentioned earlier, you normally wouldn’t go to this much trouble just to fix an errant log statement, but there are times when this approach could be useful, and it does illustrate the differences between HEAD, commits, and branches.


Git has even more tricks up its sleeve, but I’ll stop here as I’ve rarely seen the other methods used in the wild. If you’d like to learn how to do similar operations with more than two branches, check out the excellent write-up on Revision Selection in the Pro Git book.

Handling Interruptions: git stash

One of the Git features I use frequently and find quite handy is the stash. It provides a simple mechanism to save the files you’re working on but are not ready to commit so you can switch to a different task. In this section, you’ll walk through a simple use case first, looking at each of the different commands and options, then you will wrap up with a few other use cases in which git stash really shines.

git stash save and git stash pop

Suppose you’re working on a nasty bug. You’ve got Python logging code in two files, file1 and file2, to help you track it down, and you’ve added file3 as a possible solution.

In short, file1 is modified and staged, file2 is modified but not staged, and file3 is a new, untracked file.

You do a git status to confirm the condition of the repo:

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

   modified:   file1

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

   modified:   file2

Untracked files:
  (use "git add <file>..." to include in what will be committed)

   file3

Now a coworker (aren’t they annoying?) walks up and tells you that production is down and it’s “your turn.” You know you can break out your mad git stash skills to save you some time and save the day.

You haven’t finished with the work on files 1, 2, and 3, so you really don’t want to commit those changes but you need to get them off of your working directory so you can switch to a different branch to fix that bug. This is the most basic use case for git stash.

You can use git stash save to “put those changes away” for a little while and return to a clean working directory. The default option for stash is save, so this is usually written as just git stash.

When you save something to stash, it creates a unique storage spot for those changes and returns your working directory to the state of the last commit. It tells you what it did with a cryptic message:

$ git stash save
Saved working directory and index state WIP on master: 387dcfc adding some files
HEAD is now at 387dcfc adding some files

In that output, master is the name of the branch, 387dcfc is the SHA of the last commit, adding some files is the commit message for that commit, and WIP stands for “work in progress.” The output on your repo will likely be different in those details.

If you do a status at this point, it will still show file3 as an untracked file, but file1 and file2 are no longer there:

$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

   file3

nothing added to commit but untracked files present (use "git add" to track)

At this point, as far as Git is concerned, your working directory is “clean,” and you are free to do things like check out a different branch, cherry-pick changes, or anything else you need to.

You go and check out the other branch, fix the bug, earn the admiration of your coworkers, and are now ready to return to this work.

How do you get the last stash back? git stash pop!

Using the pop command at this point looks like this:

$ git stash pop
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

   modified:   file1
   modified:   file2

Untracked files:
  (use "git add <file>..." to include in what will be committed)

   file3

no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (71d0f2469db0f1eb9ee7510a9e3e9bd3c1c4211c)

Now you can see at the bottom that it has a message about “Dropped refs/stash@{0}”. We’ll talk more about that syntax below, but it’s basically saying that it applied the changes you had stashed and got rid of the stash itself. Before you ask, yes, there is a way to use the stash and not get rid of it, but let’s not get ahead of ourselves.

You’ll notice that file1 used to be in the index but no longer is. By default, git stash pop doesn’t maintain the status of changes like that. There is an option to tell it to do so, of course. Add file1 back to the index and try again:

$ git add file1
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

   modified:   file1

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

   modified:   file2

Untracked files:
  (use "git add <file>..." to include in what will be committed)

   file3

$ git stash save "another try"
Saved working directory and index state On master: another try
HEAD is now at 387dcfc adding some files
$ git stash pop --index
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

   modified:   file1

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

   modified:   file2

Untracked files:
  (use "git add <file>..." to include in what will be committed)

   file3

Dropped refs/stash@{0} (aed3a02aeb876c1137dd8bab753636a294a3cc43)

You can see that the second time we added the --index option to the git stash pop command, which tells it to try to maintain the status of whether or not a file is in the index.

In the previous two attempts, you probably noticed that file3 was not included in your stash. You might want to keep file3 together with those other changes. Fortunately, there is an option to help you with that: --include-untracked.

Assuming we’re back to where we were at the end of the last example, we can re-run the command:

$ git stash save --include-untracked "third attempt"
Saved working directory and index state On master: third attempt
HEAD is now at 387dcfc adding some files
$ git status
On branch master
nothing to commit, working directory clean

This put the untracked file3 into the stash with our other changes.

Before we move on, I just want to point out that save is the default option for git stash. Unless you’re specifying a message, which we’ll discuss later, you can simply use git stash, and it will do a save.

git stash list

One of the powerful features of git stash is that you can have more than one of them. Git stores stashes in a stack, which means that by default it always works with the most recently saved stash. The git stash list command will show you the stack of stashes in your local repo. Let’s create a couple of stashes so we can see how this works:

$ echo "editing file1" >> file1
$ git stash save "the first save"
Saved working directory and index state On master: the first save
HEAD is now at b3e9b4d adding file3
$ # you can see that stash save cleaned up our working directory
$ # now create a few more stashes by "editing" files and saving them
$ echo "editing file2" >> file2
$ git stash save "the second save"
Saved working directory and index state On master: the second save
HEAD is now at b3e9b4d adding file3
$ echo "editing file3" >> file3
$ git stash save "the third save"
Saved working directory and index state On master: the third save
HEAD is now at b3e9b4d adding file3
$ git status
On branch master
nothing to commit, working directory clean

You now have three different stashes saved. Fortunately, Git has a system that makes multiple stashes easy to manage. The first step of the system is the git stash list command:

$ git stash list
stash@{0}: On master: the third save
stash@{1}: On master: the second save
stash@{2}: On master: the first save

List shows you the stack of stashes you have in this repo, newest first. Notice the stash@{n} syntax at the start of each entry? That’s the name of that stash. The other git stash subcommands use that name to refer to a specific stash. If you don’t give a name, they assume you mean the most recent stash, stash@{0}. You’ll see more of this in a bit.
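The naming scheme behaves like a stack whose top is always stash@{0}. Here is a toy Python model of that numbering; StashStack is hypothetical and says nothing about how Git stores stashes internally, but it shows why the indices of older entries shift when a newer one is added or one in the middle is removed.

```python
class StashStack:
    """Toy model of how `git stash list` numbers entries (not Git's implementation)."""

    def __init__(self):
        self._entries = []  # index 0 is always the newest stash

    def save(self, message):
        # A new stash becomes stash@{0}; every older one moves down one slot
        self._entries.insert(0, message)

    def listing(self):
        # Render entries the way `git stash list` labels them
        return [f"stash@{{{i}}}: {msg}" for i, msg in enumerate(self._entries)]

    def pop(self, n=0):
        # Remove stash@{n} (newest by default); older entries move up one slot
        return self._entries.pop(n)


stashes = StashStack()
for message in ("On master: the first save",
                "On master: the second save",
                "On master: the third save"):
    stashes.save(message)

print("\n".join(stashes.listing()))
# stash@{0}: On master: the third save
# stash@{1}: On master: the second save
# stash@{2}: On master: the first save

stashes.pop(1)  # analogous to `git stash pop stash@{1}`
print("\n".join(stashes.listing()))
# stash@{0}: On master: the third save
# stash@{1}: On master: the first save
```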

Another thing I’d like to point out here is that you can see the message we used when we did the git stash save "message" command in the listing. This can be quite helpful if you have a number of things stashed.

As we mentioned above, the save [name] portion of the git stash save [name] command is not required. You can simply type git stash, and it defaults to a save command, but the auto-generated message doesn’t give you much information:

$ echo "more editing file1" >> file1
$ git stash
Saved working directory and index state WIP on master: 387dcfc adding some files
HEAD is now at 387dcfc adding some files
$ git stash list
stash@{0}: WIP on master: 387dcfc adding some files
stash@{1}: On master: the third save
stash@{2}: On master: the second save
stash@{3}: On master: the first save

The default message is WIP on <branch>: <SHA> <commit message>, which doesn’t tell you much. If we had done that for the first three stashes, they all would have had the same message. That’s why, for the examples here, I use the full git stash save <message> syntax.
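If you want to reproduce the stack above, here is a minimal sketch that builds a throwaway repository in a temporary directory. The file names and the user identity are placeholders. It assumes Git 2.13 or later and uses git stash push -m, which the Git documentation recommends over the now-deprecated git stash save. The resulting stack is the same.

```shell
#!/bin/sh
set -e
# Throwaway repo; the identity is a placeholder that Git needs to create commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for f in file1 file2 file3; do echo "$f" > "$f"; done
git add file1 file2 file3
git commit -q -m "adding files"

# Save three stashes, one per edited file.
echo "editing file1" >> file1
git stash push -q -m "the first save"
echo "editing file2" >> file2
git stash push -q -m "the second save"
echo "editing file3" >> file3
git stash push -q -m "the third save"

# Newest first: stash@{0} is "the third save".
git stash list
```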

git stash show

Okay, so now you have a bunch of stashes, and you might even have meaningful messages describing them, but what if you want to see exactly what’s in a particular stash? That’s where the git stash show command comes in. With the default options, it tells you which files changed and how many lines were inserted or deleted:

$ git stash show stash@{2}
 file1 | 1 +
 1 file changed, 1 insertion(+)

The default options do not tell you what the changes were, however. Fortunately, you can add the -p/--patch option, and it will show you the diffs in “patch” format:

$ git stash show -p stash@{2}
diff --git a/file1 b/file1
index e212970..04dbd7b 100644
--- a/file1
+++ b/file1
@@ -1 +1,2 @@
+editing file1

Here it shows you that the line “editing file1” was added to file1. If you’re not familiar with the patch format for displaying diffs, don’t worry. When you get to the git difftool section below, you’ll see how to bring up a visual diff tool on a stash.
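git stash show accepts the same diff options as git diff, so you can also ask for just the file names. The sketch below uses a throwaway repo with placeholder files to compare --name-only and -p on the older of two stashes.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > file1
echo two > file2
git add file1 file2
git commit -q -m "adding files"

echo "editing file1" >> file1
git stash push -q -m "the first save"
echo "editing file2" >> file2
git stash push -q -m "the second save"

# Only the names of the files touched by the older stash
git stash show --name-only 'stash@{1}'
# The full patch for the same stash
git stash show -p 'stash@{1}'
```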

git stash pop vs. git stash apply

You saw earlier how to pop the most recent stash back into your working directory by using the git stash pop command. You probably guessed that the stash name syntax we saw earlier also applies to the pop command:

$ git stash list
stash@{0}: On master: the third save
stash@{1}: On master: the second save
stash@{2}: On master: the first save
$ git stash pop stash@{1}
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
   modified:   file2

no changes added to commit (use "git add" and/or "git commit -a")
Dropped stash@{1} (84f7c9890908a1a1bf3c35acfe36a6ecd1f30a2c)
$ git stash list
stash@{0}: On master: the third save
stash@{1}: On master: the first save

You can see that the git stash pop stash@{1} put “the second save” back into our working directory and collapsed our stack so that only the first and third stashes are there. Notice how “the first save” changed from stash@{2} to stash@{1} after the pop.
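Here is a small sketch of that renumbering, run in a throwaway repo with placeholder files. It pops the middle entry by name and then checks which message now sits at stash@{1}.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for f in file1 file2 file3; do echo "$f" > "$f"; done
git add file1 file2 file3
git commit -q -m "adding files"

# Three stashes: "the first save" (file1) ... "the third save" (file3)
i=1
for label in first second third; do
  echo "editing file$i" >> "file$i"
  git stash push -q -m "the $label save"
  i=$((i + 1))
done

# Pop the middle entry by name; Git drops it and renumbers the rest.
git stash pop -q 'stash@{1}'
git stash list
```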

It’s also possible to put a stash onto your working directory but leave it in the stack as well. This is done with git stash apply:

$ git stash list
stash@{0}: On master: the third save
stash@{1}: On master: the first save
$ git stash apply stash@{1}
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

   modified:   file1
   modified:   file2

no changes added to commit (use "git add" and/or "git commit -a")
$ git stash list
stash@{0}: On master: the third save
stash@{1}: On master: the first save

This can be handy if you want to apply the same set of changes multiple times. I recently used this while working on prototype hardware. The code needed some changes to work on the particular hardware on my desk, but not on any of the other units. I used git stash apply to apply these changes each time I brought down a new copy of master.
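A rough sketch of that workflow follows. The config.py file and its BOARD values are hypothetical stand-ins for a hardware-specific tweak. The tweak is applied, thrown away, and applied again, and the stash entry survives both applies.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo 'BOARD = "generic"' > config.py
git add config.py
git commit -q -m "adding config"

# Hypothetical tweak that only makes sense for the board on my desk
echo 'BOARD = "my-desk-prototype"' > config.py
git stash push -q -m "local hardware tweak"

git stash apply -q          # use the tweak...
git checkout -- config.py   # ...throw the working copy away...
git stash apply -q          # ...and apply it again; the entry is still stacked
git stash list
```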

git stash drop

The last stash subcommand to look at is drop. This is useful when you want to throw away a stash and not apply it to your working directory. It looks like this:

$ git status
On branch master
nothing to commit, working directory clean
$ git stash list
stash@{0}: On master: the third save
stash@{1}: On master: the first save
$ git stash drop stash@{1}
Dropped stash@{1} (9aaa9996bd6aa363e7be723b4712afaae4fc3235)
$ git stash drop
Dropped refs/stash@{0} (194f99db7a8fcc547fdd6d9f5fbffe8b896e2267)
$ git stash list
$ git status
On branch master
nothing to commit, working directory clean

This dropped the last two stashes, and Git did not change your working directory. There are a couple of things to notice in the above example. First, the drop command, like most of the other git stash commands, can use the optional stash@{n} names. If you don’t supply it, Git assumes stash@{0}.

The other interesting thing is that the output from the drop command gives you a SHA. Like other SHAs in Git, you can make use of it. If, for example, you really meant to do a pop and not a drop on stash@{1} above, you can create a new branch from the SHA it showed you (9aaa9996):

$ git branch tmp 9aaa9996
$ git status
On branch master
nothing to commit, working directory clean
$ # use git log <branchname> to see commits on that branch
$ git log tmp
commit 9aaa9996bd6aa363e7be723b4712afaae4fc3235
Merge: b3e9b4d f2d6ecc
Author: Jim Anderson <>
Date:   Sat May 12 09:34:29 2018 -0600

    On master: the first save
[rest of log deleted for brevity]

Once you have that branch, you can use git merge or other techniques to get those changes back onto your branch. If you didn’t save the SHA from the git stash drop command, there are other methods to attempt to recover the changes, but they can get complicated. You can read more about it here.
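The sketch below captures the SHA from the drop output and recovers the stash in two ways: on a branch named tmp, as above, and by passing the SHA straight to git stash apply, which accepts any commit that looks like a stash. It uses a throwaway repo and sets LC_ALL=C so that Git’s “Dropped …” message stays in English for the sed pattern.

```shell
#!/bin/sh
set -e
export LC_ALL=C   # keep Git's messages in English so the sed pattern matches
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > file1
git add file1
git commit -q -m "adding file1"

echo "editing file1" >> file1
git stash push -q -m "the first save"

# drop prints "Dropped refs/stash@{0} (<full SHA>)"; keep the SHA.
out=$(git stash drop)
echo "$out"
sha=$(echo "$out" | sed -n 's/^Dropped .* (\([0-9a-f]*\))$/\1/p')

git branch tmp "$sha"      # option 1: park the dropped stash on a branch
git stash apply -q "$sha"  # option 2: apply it directly
```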

git stash Example: Pulling Into a Dirty Tree

Let’s wrap up this section on git stash by looking at one of its uses that wasn’t obvious to me at first. Frequently when you’re working on a shared branch for a longer period of time, another developer will push changes to the branch that you want to get to your local repo. You’ll remember that we use the git pull command to do this. However, if you have local changes in files that the pull will modify, Git refuses with an error message explaining what went wrong:

error: Your local changes to the following files would be overwritten by merge:
   <list of files that conflict>
Please, commit your changes or stash them before you can merge.

You could commit your changes and then do a pull, but that would create a merge commit, and you might not be ready to commit those files yet. Now that you know git stash, you can use it instead:

$ git stash
Saved working directory and index state WIP on master: b25fe34 Cleaned up when no TOKEN is present. Added ignored tasks
HEAD is now at <SHA> <commit message>
$ git pull
Updating <SHA1>..<SHA2>
  <more info here>
$ git stash pop
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
   <rest of stash pop output trimmed>

It’s entirely possible that doing the git stash pop command will produce a merge conflict. If that’s the case, you’ll need to hand-edit the conflict to resolve it, and then you can proceed. We’ll discuss resolving merge conflicts below.
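To try this without a real server, the sketch below uses a local repository named upstream to stand in for the shared remote and a clone named mine for your copy. Both names, the file, and the identities are placeholders. The upstream edit (line 1) and your local edit (line 5) touch separate parts of the file, so the final pop applies cleanly.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# "upstream" stands in for the shared remote repository.
git init -q upstream
cd upstream
git config user.name "Other Dev"
git config user.email "other@example.com"
printf 'line 1\nline 2\nline 3\nline 4\nline 5\n' > shared.txt
git add shared.txt
git commit -q -m "initial"
cd ..

# Your clone of it.
git clone -q upstream mine
cd mine
git config user.name "Example User"
git config user.email "user@example.com"
git config pull.rebase false
cd ..

# Another developer changes line 1 upstream...
cd upstream
sed 's/^line 1$/line 1 (upstream edit)/' shared.txt > shared.tmp
mv shared.tmp shared.txt
git commit -q -am "upstream edit"
cd ../mine

# ...while you have an uncommitted edit to line 5 of the same file.
sed 's/^line 5$/line 5 (my local edit)/' shared.txt > shared.tmp
mv shared.tmp shared.txt

# A plain pull refuses because it would overwrite shared.txt.
if git pull -q 2>/dev/null; then echo "pull unexpectedly succeeded"; else echo "pull refused"; fi

git stash push -q -m "work in progress"
git pull -q
git stash pop -q
```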

Comparing Revisions: git diff

The git diff command is a powerful feature that you’ll find yourself using quite frequently. I looked up the list of things it can compare and was surprised by how long it was. Try typing git diff --help if you’d like to see for yourself. I won’t cover all of those use cases here, as many of them aren’t too common.

This section covers several use cases for the diff command, which displays its output on the command line. The next section shows how you can set Git up to use a visual diff tool like Meld, WinDiff, Beyond Compare, or even extensions in your IDE. The options for diff and difftool are the same, so most of the discussion in this section applies there too, but it’s easier to show the output of the command-line version.

The most common use of git diff is to see what you have modified in your working directory:

$ echo "I'm editing file3 now" >> file3
$ git diff
diff --git a/file3 b/file3
index faf2282..c5dd702 100644
--- a/file3
+++ b/file3
@@ -1,3 +1,4 @@
[other contents of file3]
+I'm editing file3 now

As you can see, diff shows you the diffs in a “patch” format right on the command line. Once you work through the format, you can see that the + characters indicate that a line has been added to the file, and, as you’d expect, the line I'm editing file3 now was added to file3.

By default, git diff shows the changes in your working directory that are not yet in your index (the staging area). If you add the above change to the index and then run git diff, it shows that there are no diffs:

$ git add file3
$ git diff
[no output here]

I found this confusing for a while, but I’ve grown to like it. To see the changes that are in the index and staged for the next commit, use the --staged option (--cached is a synonym):

$ git diff --staged
diff --git a/file3 b/file3
index faf2282..c5dd702 100644
--- a/file3
+++ b/file3
@@ -1,3 +1,4 @@
+I'm editing file3 now
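The sketch below runs the same sequence in a throwaway repo with a placeholder file3 and prints only file names. git diff compares the working tree with the index, git diff --staged compares the index with HEAD, and git diff HEAD covers both.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'a\nb\nc\n' > file3
git add file3
git commit -q -m "adding file3"

echo "I'm editing file3 now" >> file3
echo "unstaged:"; git diff --name-only         # file3

git add file3
echo "unstaged:"; git diff --name-only         # nothing left
echo "staged:"; git diff --staged --name-only  # file3
echo "vs HEAD:"; git diff HEAD --name-only     # file3
```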

The git diff command can also be used to compare any two commits in your repo. This can show you the changes between two SHAs:

$ git diff b3e9b4d 387dcfc
diff --git a/file3 b/file3
deleted file mode 100644
index faf2282..0000000
--- a/file3
+++ /dev/null
@@ -1,3 +0,0 @@

You can also use branch names to see the full set of changes between one branch and another:

$ git diff master tmp
diff --git a/file1 b/file1
index e212970..04dbd7b 100644
--- a/file1
+++ b/file1
@@ -1 +1,2 @@
+editing file1

You can even use any mix of the revision naming methods we looked at above:

$ git diff master^ master
diff --git a/file3 b/file3
new file mode 100644
index 0000000..faf2282
--- /dev/null
+++ b/file3
@@ -0,0 +1,3 @@

When you compare two revisions, git diff shows you all of the changes between them. Frequently, you only want to see the diffs for a single file. You can restrict the output to that file by listing it after a -- (two hyphens) separator:

$ git diff HEAD~3 HEAD
diff --git a/file1 b/file1
index e212970..04dbd7b 100644
--- a/file1
+++ b/file1
@@ -1 +1,2 @@
+editing file1
diff --git a/file2 b/file2
index 89361a0..91c5d97 100644
--- a/file2
+++ b/file2
@@ -1,2 +1,3 @@
+editing file2
diff --git a/file3 b/file3
index faf2282..c5dd702 100644
--- a/file3
+++ b/file3
@@ -1,3 +1,4 @@
+I'm editing file3 now
$ git diff HEAD~3 HEAD -- file3
diff --git a/file3 b/file3
index faf2282..c5dd702 100644
--- a/file3
+++ b/file3
@@ -1,3 +1,4 @@
+I'm editing file3 now
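Here is a reproducible sketch of that range with placeholder files and commit messages. It makes three commits that each edit one file and then compares the whole range with the same range limited to file3. The -- separates revisions from paths, which matters if a file ever shares its name with a branch.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
for f in file1 file2 file3; do echo "$f" > "$f"; done
git add file1 file2 file3
git commit -q -m "adding files"

# Three commits, each editing one file
for f in file1 file2 file3; do
  echo "editing $f" >> "$f"
  git commit -q -am "edit $f"
done

git diff --stat HEAD~3 HEAD      # all three files
git diff HEAD~3 HEAD -- file3    # only file3
```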

There are many, many options for git diff, and I won’t go into them all, but I do want to explore one more use case that I rely on frequently: showing the files that were changed in a commit.

In your current repo, the most recent commit on master added a line of text to file1. You can see that by comparing HEAD with HEAD^:

$ git diff HEAD^ HEAD
diff --git a/file1 b/file1
index e212970..04dbd7b 100644
--- a/file1
+++ b/file1
@@ -1 +1,2 @@
+editing file1

That’s fine for this small example, but frequently the diffs for a commit can be several pages long, and it can get quite difficult to pull out the filenames. Of course, Git has an option to help with that:

$ git diff HEAD^ HEAD --name-only

The --name-only option shows you the list of filenames that were changed between two commits, but not what changed in those files.
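A quick sketch with placeholder files a.txt, b.txt, and c.txt, where the last commit touches two of them. It also shows --name-status, a related git diff option that adds a status letter (M for modified, A for added, D for deleted) before each name.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
printf 'x\n' > a.txt
printf 'y\n' > b.txt
printf 'z\n' > c.txt
git add a.txt b.txt c.txt
git commit -q -m "initial"

echo more >> a.txt
echo more >> c.txt
git commit -q -am "touch two files"

git diff --name-only HEAD^ HEAD     # a.txt and c.txt
git diff --name-status HEAD^ HEAD   # M<TAB>a.txt, M<TAB>c.txt
```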

As I said above, there are many options and use cases covered by the git diff command, and you’ve just scratched the surface here. Once you have the commands listed above figured out, I encourage you to look at git diff --help and see what other tricks you can find. I definitely learned new things preparing this tutorial!

git difftool

Git has a mechanism to use a visual diff tool to show diffs instead of just using the command line format we’ve seen thus far. All of the options and features you looked at with git diff still work here, but it will show the diffs in a separate window, which many people, myself included, find easier to read. For this example, I’m going to use meld as the diff tool because it’s available on Windows, Mac, and Linux.

git difftool is much easier to use once you’ve configured it. Git has a set of config options that control the defaults for difftool, and you can set them from the shell using the git config command:

$ git config --global diff.tool meld
$ git config --global difftool.prompt false

The prompt option is one I find important. If you do not set it, Git will prompt you every time before it launches the external diff tool. This can be quite annoying, as it does this for every file in a diff, one at a time:

$ git difftool HEAD^ HEAD
Viewing (1/1): 'python-git-intro/'
Launch 'meld' [Y/n]: y

Setting prompt to false forces Git to launch the tool without asking, speeding up your process and making you that much better!
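If you want to check the settings without touching your real configuration, the sketch below writes them to a throwaway repository’s local config instead of using --global and then reads them back. For a one-off run, git difftool -y (short for --no-prompt) skips the prompt without changing any config, and git difftool --tool-help lists the tools Git knows about on your machine.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Same keys as above, but stored in this repo only (no --global).
git config diff.tool meld
git config difftool.prompt false

git config --get diff.tool
git config --get difftool.prompt
```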

In the diff discussion above, you covered most of the features of difftool, but I wanted to add one thing I learned while researching for this article. Do you remember above when you were looking at the git stash show command? I mentioned that there was a way to see what is in a given stash visually, and difftool is that way. All of the syntax we learned for addressing stashes works with difftool:

$ git difftool stash@{1}

As with all stash subcommands, if you just want to see the latest stash, you can use the stash shortcut:

$ git difftool stash
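One caveat: git difftool stash@{1}, like git diff stash@{1}, compares the stash with your current working tree, so any unrelated edits you have also show up. To see only what the stash itself recorded, which is what git stash show -p reports, compare the stash with its first parent: git difftool 'stash@{1}^1' 'stash@{1}'. The sketch below uses plain git diff --name-only so it runs without a GUI. The file names are placeholders.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo one > file1
echo two > file2
git add file1 file2
git commit -q -m "adding files"

echo "editing file1" >> file1
git stash push -q -m "the first save"
echo "unrelated edit" >> file2    # uncommitted, not part of the stash

git diff --name-only 'stash@{0}'               # file1 and file2: stash vs working tree
git diff --name-only 'stash@{0}^1' 'stash@{0}' # file1 only: what the stash recorded
```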

Many IDEs and editors have tools that can help with viewing diffs. There is a list of editor-specific tutorials at the end of the Introduction to Git tutorial.

Changing History

One feature of Git that frightens some people is that it has the ability to change commits. While I can understand their concern, this is part of the tool, and, like any powerful tool, you can cause trouble if you use it unwisely.

We’ll talk about several ways to modify commits, but before we do, let’s discuss when this is appropriate. In previous sections you saw the difference between your local repo and a remote repo. Commits that you have created but have not pushed are in your local repo only. Commits that other developers have pushed but you have not pulled are in the remote repo only. Doing a push or a pull will get these commits into both repos.

The only time you should be thinking about modifying a commit is if it exists in your local repo and not on the remote. If you modify a commit that has already been pushed to the remote, you are very likely to have a difficult time pushing or pulling from that remote, and your coworkers will be unhappy with you if you succeed.

That caveat aside, let’s talk about how you can modify commits and change history!

git commit --amend

What do you do if you just made a commit but then realize that flake8 has an error when you run it? Or you spot a typo in the commit message you just entered? Git will allow you to “amend” a commit:

$ git commit -m "I am bad at spilling"
[master 63f74b7] I am bad at spilling
 1 file changed, 4 insertions(+)
$ git commit --amend -m "I am bad at spelling"
[master 951bf2f] I am bad at spelling
 Date: Tue May 22 20:41:27 2018 -0600
 1 file changed, 4 insertions(+)

Now if you look at the log after the amend, you’ll see that there was only one commit, and it has the correct message:

$ git log
commit 951bf2f45957079f305e8a039dea1771e14b503c
Author: Jim Anderson <>
Date:   Tue May 22 20:41:27 2018 -0600

    I am bad at spelling

commit c789957055bd81dd57c09f5329c448112c1398d8
Author: Jim Anderson <>
Date:   Tue May 22 20:39:17 2018 -0600

    new message
[rest of log deleted]

If you had modified and added files before the amend, those would have been included in the single commit as well. You can see that this is a handy tool for fixing mistakes. I’ll warn you again that doing a commit --amend modifies the commit. If the original commit was pushed to a remote repo, someone else may already have based changes on it. That would be a mess, so only use this for commits that are local-only.
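The sketch below repeats the message fix in a throwaway repo and then covers the “forgot a file” case. The forgotten file is staged and folded into the same commit with --amend --no-edit, which keeps the existing message. The file names are placeholders.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "print('hello')" > app.py
git add app.py
git commit -q -m "initial"

echo "x = 1" > feature.py
git add feature.py
git commit -q -m "I am bad at spilling"
git commit -q --amend -m "I am bad at spelling"   # replaces the commit

# Forgot a file? Stage it and fold it into the same commit.
echo "# tests" > test_feature.py
git add test_feature.py
git commit -q --amend --no-edit
git log --oneline
```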

git rebase

A rebase operation is similar to a merge, but it can produce a much cleaner history. When you rebase, Git finds the common ancestor of your current branch and the specified branch. It then takes all of the changes made on your branch after that common ancestor and “replays” them on top of the other branch. The result looks as though you made all of your changes after the latest commit on the other branch.

This can be a little hard to visualize, so let’s look at some actual commits. For this exercise, I’m going to use the --oneline option on the git log command to cut down on the clutter. Let’s start with a feature branch you’ve been working on called my_feature_branch. Here’s the state of that branch:

$ git log --oneline
143ae7f second feature commit
aef68dc first feature commit
2512d27 Common Ancestor Commit

You can see that the --oneline option, as you might expect, shows just the SHA and the commit message for each commit. Your branch has two commits after the one labeled 2512d27 Common Ancestor Commit.

You need a second branch if you’re going to do a rebase, and master seems like a good choice. Here’s the current state of the master branch:

$ git log --oneline master
23a558c third master commit
5ec06af second master commit
190d6af first master commit
2512d27 Common Ancestor Commit

There are three commits on master after 2512d27 Common Ancestor Commit. While you still have my_feature_branch checked out, you can do a rebase to put the two feature commits after the three commits on master:

$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: first feature commit
Applying: second feature commit
$ git log --oneline
cf16517 second feature commit
69f61e9 first feature commit
23a558c third master commit
5ec06af second master commit
190d6af first master commit
2512d27 Common Ancestor Commit

There are two things to notice in this log listing:

1) As advertised, the two feature commits are after the three master commits.

2) The SHAs of those two feature commits have changed.

The SHAs are different because a commit’s SHA is computed from its contents, including its parent commit. The rebased commits represent the same changes to the files, but since they now sit on top of the commits already in master, their parents and the resulting state of the repo are different, so they get new SHAs.

If you had done a merge instead of a rebase, there would have been a new commit with the message Merge branch 'master' into my_feature_branch, and the SHAs of the two feature commits would be unchanged. Doing a rebase avoids the extra merge commit and makes your revision history cleaner.
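Here is a sketch that rebuilds this example in a throwaway repo. The commit messages mirror the log above. The file names are placeholders, and git branch -M forces the base branch to be called master regardless of your Git defaults. After the rebase, the feature commits sit on top of master, their SHAs have changed, and there is no merge commit.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > common.txt
git add common.txt
git commit -q -m "Common Ancestor Commit"
git branch -M master

git checkout -q -b my_feature_branch
for n in first second; do
  echo "$n" > "feature_$n.txt"
  git add "feature_$n.txt"
  git commit -q -m "$n feature commit"
done
old_tip=$(git rev-parse HEAD)

git checkout -q master
for n in first second third; do
  echo "$n" > "master_$n.txt"
  git add "master_$n.txt"
  git commit -q -m "$n master commit"
done

git checkout -q my_feature_branch
git rebase -q master
git log --oneline
```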

git pull -r

Using a rebase can be a handy tool when you’re working on a branch with another developer, too. If there are changes on the remote, and you have local commits to the same branch, you can use the -r option on the git pull command. Where a normal git pull merges the remote branch into yours, git pull -r rebases your local commits on top of the changes from the remote.
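A minimal sketch follows. As before, a local repo named upstream stands in for the remote and mine is your clone. Both sides commit to different files, and git pull -r leaves a linear history with your commit on top. If you want this behavior by default, git config pull.rebase true does that.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

git init -q upstream
cd upstream
git config user.name "Other Dev"
git config user.email "other@example.com"
echo a > a.txt
git add a.txt
git commit -q -m "initial"
cd ..

git clone -q upstream mine
cd mine
git config user.name "Example User"
git config user.email "user@example.com"
cd ..

# A commit lands on the remote...
cd upstream
echo "upstream" >> a.txt
git commit -q -am "upstream commit"
cd ../mine

# ...while you make a local commit on the same branch.
echo "local" > b.txt
git add b.txt
git commit -q -m "local commit"

git pull -q -r
git log --oneline
```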

git rebase -i

The rebase command has another method of operation. There is a -i flag you can add to the rebase command that will put it into interactive mode. While this seems confusing at first, it is an amazingly powerful feature that lets you have full control over the list of commits before you push them to a remote. Please remember the warning about not changing the history of commits that have been pushed.

These examples show a basic interactive rebase, but be aware that there are more options and more use cases. The git rebase --help command will give you the list and actually does a good job of explaining them.

For this example, imagine you’ve been working on your Python library, committing several times to your local repo as you implement a solution, test it, discover a problem, and fix it. At the end of this process, you have a chain of commits in your local repo that are all part of the new feature. Once you’ve finished the work, you look at your git log:

$ git log --oneline
8bb7af8 implemented feedback from code review
504d520 added unit test to cover new bug
56d1c23 more flake8 clean up
d9b1f9e restructuring to clean up
08dc922 another bug fix
7f82500 pylint cleanup
a113f67 found a bug fixing
3b8a6f2 First attempt at solution
af21a53 [older stuff here]

There are several commits here that don’t add value to other developers or even to you in the future. You can use rebase -i to create a “squash commit” and put all of these into a single point in history.

To start the process, you run git rebase -i af21a53, which will bring up an editor with a list of commits and some instructions:

pick 3b8a6f2 First attempt at solution
pick a113f67 found a bug fixing
pick 7f82500 pylint cleanup
pick 08dc922 another bug fix
pick d9b1f9e restructuring to clean up
pick 56d1c23 more flake8 clean up
pick 504d520 added unit test to cover new bug
pick 8bb7af8 implemented feedback from code review

# Rebase af21a53..8bb7af8 onto af21a53 (8 command(s))
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
# These lines can be re-ordered; they are executed from top to bottom.
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
# Note that empty commits are commented out

You’ll notice that the commits are listed in the opposite order from git log, oldest first. This is the order in which Git will replay the commits on top of af21a53. If you just save the file at this point, nothing will change. The same is true if you delete all the text and save the file, because Git then aborts the rebase.

Also, there are several lines starting with a # reminding you how to edit this file. These comments can be removed but do not need to be.

But you want to squash all of these commits into one so that “future you” will know that this is the commit that completely added the feature. To do that, you can edit the file to look like this:

pick 3b8a6f2 First attempt at solution
squash a113f67 found a bug fixing
s 7f82500 pylint cleanup
s 08dc922 another bug fix
s d9b1f9e restructuring to clean up
s 56d1c23 more flake8 clean up
s 504d520 added unit test to cover new bug
s 8bb7af8 implemented feedback from code review

You can use either the full word for each command or, as in the lines after the first two, the single-character version. The example above picks the oldest commit and squashes each of the subsequent commits into it. If you save and exit the editor, Git will proceed to combine all of those commits into one and then will bring up the editor again, listing all of the commit messages for the squashed commit:

# This is a combination of 8 commits.
# The first commit's message is:
First attempt at solution

# This is the 2nd commit message:

found a bug fixing

# This is the 3rd commit message:

pylint cleanup

# This is the 4th commit message:

another bug fix

[the rest trimmed for brevity]

By default a squash commit will have a long commit message with all of the messages from each commit. In your case it’s better to reword the first message and delete the rest. Doing that and saving the file will finish the process, and your log will now have only a single commit for this feature:

$ git log --oneline
9a325ad Implemented feature ABC
af21a53 [older stuff here]

Cool! You just hid any evidence that you had to do more than one commit to solve this issue. Good work! Be warned that deciding when to do a squash merge is frequently more difficult than the actual process. There’s a great article that does a nice job of laying out the complexities.
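If you want to script this squash, for example to try it out safely, you can replace both editor sessions with environment variables. Git runs GIT_SEQUENCE_EDITOR on the todo list and GIT_EDITOR on the commit message. In this sketch, sed turns every pick after the first into squash, and the message editor is replaced by a one-liner that writes “Implemented feature ABC”. The repo, file, and messages are placeholders, and sed -i.bak is used because both GNU and BSD sed accept it.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "older" > lib.py
git add lib.py
git commit -q -m "older stuff here"
base=$(git rev-parse HEAD)

for msg in "First attempt at solution" "found a bug fixing" \
           "pylint cleanup" "implemented feedback from code review"; do
  echo "$msg" >> lib.py
  git commit -q -am "$msg"
done

# Todo list: keep the first pick, squash the rest.
# Commit message: overwrite it with a single new line.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$ s/^pick /squash /'" \
GIT_EDITOR="echo 'Implemented feature ABC' >" \
  git rebase -q -i "$base"
git log --oneline
```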

As you probably guessed, git rebase -i will allow you to do far more complex operations. Let’s look at one more example.

In the course of a week, you’ve worked on three different issues, committing changes at various times for each. There’s also a commit in there that you regret and would like to pretend never happened. Here’s your starting log:

$ git log --oneline
2f0a106 feature 3 commit 3
f0e14d2 feature 2 commit 3
b2eec2c feature 1 commit 3
d6afbee really rotten, very bad commit
6219ba3 feature 3 commit 2
70e07b8 feature 2 commit 2
c08bf37 feature 1 commit 2
c9747ae feature 3 commit 1
fdf23fc feature 2 commit 1
0f05458 feature 1 commit 1
3ca2262 older stuff here

Your mission is to get this into three clean commits and remove that one bad one. You can follow the same process, git rebase -i 3ca2262, and Git presents you with the command file:

pick 0f05458 feature 1 commit 1
pick fdf23fc feature 2 commit 1
pick c9747ae feature 3 commit 1
pick c08bf37 feature 1 commit 2
pick 70e07b8 feature 2 commit 2
pick 6219ba3 feature 3 commit 2
pick d6afbee really rotten, very bad commit
pick b2eec2c feature 1 commit 3
pick f0e14d2 feature 2 commit 3
pick 2f0a106 feature 3 commit 3

Interactive rebase lets you not only specify what to do with each commit but also rearrange them. So, to get to your three commits, you edit the file to look like this:

pick 0f05458 feature 1 commit 1
s c08bf37 feature 1 commit 2
s b2eec2c feature 1 commit 3
pick fdf23fc feature 2 commit 1
s 70e07b8 feature 2 commit 2
s f0e14d2 feature 2 commit 3
pick c9747ae feature 3 commit 1
s 6219ba3 feature 3 commit 2
s 2f0a106 feature 3 commit 3
# pick d6afbee really rotten, very bad commit

The commits for each feature are grouped together with only one of them being “picked” and the rest “squashed.” Commenting out the bad commit will remove it, but you could have just as easily deleted that line from the file to the same effect.

When you save that file, you’ll get a separate editor session to create the commit message for each of the three squashed commits. If you call them feature 1, feature 2, and feature 3, your log will now have only those three commits, one for each feature:

$ git log --oneline
f700f1f feature 3
443272f feature 2
0ff80ca feature 1
3ca2262 older stuff here

Just like any rebase or merge, you might run into conflicts in this process, which you will need to resolve by editing the file, getting the changes correct, git add-ing the file, and running git rebase --continue.
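The “remove the bad commit” part of this example can also be scripted. This sketch deletes the todo line whose subject contains “really rotten”, which drops that commit exactly as deleting the line by hand would. It does not attempt the regrouping. The commits touch separate placeholder files, so no conflicts arise.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo old > base.txt
git add base.txt
git commit -q -m "older stuff here"
base=$(git rev-parse HEAD)

echo f1 > feature1.txt
git add feature1.txt
git commit -q -m "feature 1 commit 1"
echo junk > junk.txt
git add junk.txt
git commit -q -m "really rotten, very bad commit"
echo f2 > feature2.txt
git add feature2.txt
git commit -q -m "feature 2 commit 1"

# Deleting a line from the todo list drops that commit.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '/really rotten/d'" git rebase -q -i "$base"
git log --oneline
```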

I’ll end this section by pointing out a few things about rebase:

1) Creating squash commits is a “nice to have” feature, but you can still work successfully with Git without using it.

2) Merge conflicts on large interactive rebases can be confusing. None of the individual steps are difficult, but there can be a lot of them.

3) We’ve just scratched the surface on what you can do with git rebase -i. There are more features here than most people will ever discover.

git revert vs. git reset: Cleaning Up

Unsurprisingly, Git provides you several methods for cleaning up when you’ve made a mess. These techniques depend on what state your repo is in and whether or not the mess is local to your repo or has been pushed to a remote.

Let’s start by looking at the easy case. You’ve made a commit that you don’t want, and it hasn’t been pushed to remote. Start by creating that commit so you know what you’re looking at:

$ ls >> file_i_do_not_want
$ git add file_i_do_not_want
$ git commit -m "bad commit"
[master baebe14] bad commit
 2 files changed, 31 insertions(+)
 create mode 100644 file_i_do_not_want
$ git log --oneline
baebe14 bad commit
443272f feature 2
0ff80ca feature 1
3ca2262 older stuff here

The example above created a new file, file_i_do_not_want, and committed it to the local repo. It has not been pushed to the remote repo yet. The rest of the examples in this section will use this as a starting point.

To manage commits that are on the local repo only, you can use the git reset command. There are two options to explore: --soft and --hard.

The git reset --soft <SHA> command tells Git to move HEAD back to the specified SHA. It doesn’t change the local file system, and it doesn’t change the index. I’ll admit when I read that description, it didn’t mean much to me, but looking at the example definitely helps:

$ git reset --soft HEAD^
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

   new file:   file_i_do_not_want

$ git log --oneline
443272f feature 2
0ff80ca feature 1
3ca2262 older stuff here

In the example, we reset HEAD to HEAD^. Remember that ^ tells Git to step back one commit. The --soft option told Git to not change the index or the local file system, so the file_i_do_not_want is still in the index in the “Changes to be committed:” state. The git log command shows that the bad commit was removed from the history, though.
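The sketch below reproduces the --soft case in a throwaway repo. The earlier commit message and the file contents are placeholders. After the reset, the commit is gone from the log, but the file is still on disk and still staged.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > keep.txt
git add keep.txt
git commit -q -m "feature 2"

echo oops > file_i_do_not_want
git add file_i_do_not_want
git commit -q -m "bad commit"

git reset --soft HEAD^
git status --short   # A  file_i_do_not_want
```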

That’s what the --soft option does. Now let’s look at the --hard option. Let’s go back to your original state so that bad commit is in the repo again and try --hard:

$ git log --oneline
2e9d704 bad commit
443272f feature 2
0ff80ca feature 1
3ca2262 older stuff here
$ git reset --hard HEAD^
HEAD is now at 443272f feature 2
$ git status
On branch master
nothing to commit, working directory clean
$ git log --oneline
443272f feature 2
0ff80ca feature 1
3ca2262 older stuff here

There are several things to notice here. First, the reset command actually gives you feedback with the --hard option, whereas it does not with --soft. I’m not sure why this is, quite honestly. Also, when we do the git status and git log afterwards, you see that not only is the bad commit gone, but the changes that were in that commit have also been wiped out. The --hard option resets you completely back to the SHA you specified.
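The same setup with --hard is shown below, again in a throwaway repo with placeholder contents. This time the file tracked by the bad commit is removed from disk, and the working tree ends up clean.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > keep.txt
git add keep.txt
git commit -q -m "feature 2"

echo oops > file_i_do_not_want
git add file_i_do_not_want
git commit -q -m "bad commit"

git reset --hard HEAD^   # prints "HEAD is now at ... feature 2"
git status --short       # prints nothing
```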

Now, if you remember the last section about changing history in Git, it may have dawned on you that doing a reset on a branch you’ve already pushed to a remote might be a bad idea. It changes the history, and that can really mess up your co-workers.

Git, of course, has a solution for that. The git revert command allows you to easily remove the changes from a given commit but does not change history. It does this by doing the inverse of the commit you specify. If you added a line to a file, git revert will remove that line from the file. It does this and automatically creates a new “revert commit” for you.

Once again, reset the repo back to the point that bad commit is the most recent commit. First confirm what the changes are in bad commit:

$ git diff HEAD^
diff --git a/file_i_do_not_want b/file_i_do_not_want
new file mode 100644
index 0000000..6fe5391
--- /dev/null
+++ b/file_i_do_not_want
@@ -0,0 +1,6 @@

You can see that we’ve simply added the new file_i_do_not_want to the repo. The lines below @@ -0,0 +1,6 @@ are the contents of that new file. Now, assuming that this time you’ve pushed that bad commit to master and you don’t want your co-workers to hate you, use revert to fix that mistake:

$ git revert HEAD
[master 8a53ee4] Revert "bad commit"
 1 file changed, 6 deletions(-)
 delete mode 100644 file_i_do_not_want

When you run that command, Git will pop up an editor window allowing you to modify the commit message for the revert commit:

Revert "bad commit"

This reverts commit 1fec3f78f7aea20bf99c124e5b75f8cec319de10.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# On branch master
# Changes to be committed:
#  deleted:    file_i_do_not_want

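Unlike commit, git revert does not have an option for specifying the commit message on the command line. You can use --no-edit to skip the message editing step and tell Git to simply use the default message. (Note that -n means something different for revert: it is short for --no-commit, which applies the inverse changes without creating a commit.)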

After we revert the bad commit, our log shows a new commit with that message:

$ git log --oneline
8a53ee4 Revert "bad commit"
1fec3f7 bad commit
443272f feature 2
0ff80ca feature 1
3ca2262 older stuff here

The “bad commit” is still there. It needs to be there because you don’t want to change history in this case. There’s a new commit, however, which “undoes” the changes that are in that commit.
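The sketch below runs the revert in a throwaway repo with placeholder contents and uses --no-edit so that no editor opens. The bad commit remains in history as the parent of the new revert commit, and the file it added is gone from the working tree.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo base > keep.txt
git add keep.txt
git commit -q -m "feature 2"

echo oops > file_i_do_not_want
git add file_i_do_not_want
git commit -q -m "bad commit"
bad=$(git rev-parse HEAD)

# Accept the default "Revert ..." message without opening an editor.
git revert --no-edit HEAD
git log --oneline
```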

git clean

There’s another “clean up” command that I find useful, but I want to present it with a caution.

Caution: git clean can permanently delete files that are not committed to the repo, and you will not be able to recover them.

git clean does what you guess it would: it cleans up your local working directory. I’ve found this quite useful when something large goes wrong and I end up with several files on my file system that I do not want.

In its simplest form, git clean removes files that are not “under version control.” This means that files that show up in the Untracked files section when you look at git status will be removed from the working tree. There is no way to recover them if you do this accidentally, as those files were never in version control.

That’s handy, but what if you want to remove all of the pyc files created with your Python modules? Those are in your .gitignore file, so they don’t show up as Untracked and they don’t get deleted by git clean.

The -x option tells git clean to remove untracked and ignored files, so git clean -x will take care of that problem. Almost.

Git is a little conservative with the clean command and won’t remove untracked directories unless you tell it to do so. Python 3 likes to create __pycache__ directories, and it’d be nice to clean these up, too. To solve this, you would add the -d option. git clean -xd will clean up all of the untracked and ignored files and directories.

Now, if you’ve raced ahead and tested this out, you’ve noticed that it doesn’t actually work. Remember that warning I gave at the beginning of this section? Git tries to be cautious when it comes to deleting files that you can’t recover. So, if you try the above command, you see an error message:

$ git clean -xd
fatal: clean.requireForce defaults to true and neither -i, -n, nor -f given; refusing to clean

While it’s possible to change your git config files to not require it, most people I’ve talked to simply get used to using the -f option along with the others:

$ git clean -xfd
Removing file_to_delete

Again, be warned that git clean -xfd will remove files that you will not be able to get back, so please use this with caution!
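If all you want to remove is Python bytecode, a narrower tool may feel safer than git clean -xfd. The sketch below is a swapped-in, pathlib-based alternative, not a Git feature: it looks only at file names, so unlike git clean it neither knows nor cares what is tracked or ignored. remove_bytecode is a hypothetical helper, and it defaults to a dry run that only prints what it would delete, similar in spirit to git clean -n:

```python
import shutil
from pathlib import Path


def remove_bytecode(root, dry_run=True):
    """Find, and optionally delete, __pycache__ dirs and stray .pyc files under root."""
    root = Path(root)
    cache_dirs = sorted(p for p in root.rglob("__pycache__") if p.is_dir())
    # .pyc files next to their sources (Python 2 style); ones inside __pycache__
    # are removed together with their directory.
    stray_pyc = sorted(
        p for p in root.rglob("*.pyc")
        if p.is_file() and "__pycache__" not in p.relative_to(root).parts
    )
    targets = stray_pyc + cache_dirs
    for path in targets:
        if dry_run:
            print("Would remove", path)
        elif path.exists():  # may already be gone along with a parent __pycache__
            print("Removing", path)
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return targets
```

Run it once with the default dry_run=True, check the list, and only then call it with dry_run=False.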

Resolving Merge Conflicts

When you’re new to Git, merge conflicts seem like a scary thing, but with a little practice and a few tricks, they can become much easier to deal with.

Let’s start with some of the tricks that can make this easier. The first one changes the format of how conflicts are shown.

diff3 Format

We’ll walk through a simple example to see what Git does by default and what options we have to make it easier. To do this, create a new file,, that looks like this:

def display():
    print("Welcome to my project!")

Add and commit this file to your master branch; this will be your baseline commit. You’ll create branches that modify this file in different ways, and then you’ll see how to resolve the merge conflict.

You now need to create separate branches that will have conflicting changes. You’ve seen how this is done before, so I won’t describe it in detail:

$ git checkout -b mergebranch
Switched to a new branch 'mergebranch'
$ vi # edit file to change 'project' to 'program'
$ git add
$ git commit -m "change project to program"
[mergebranch a775c38] change project to program
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git status
On branch mergebranch
nothing to commit, working directory clean
$ git checkout master
Switched to branch 'master'
$ vi # edit file to add 'very cool' before project
$ git add
$ git commit -m "added description of project"
[master ab41ed2] added description of project
 1 file changed, 1 insertion(+), 1 deletion(-)

At this point you have conflicting changes on mergebranch and master. Using the show-branch command we learned in our Intro tutorial, you can see this visually on the command line:

$ git show-branch master mergebranch
* [master] added description of project
 ! [mergebranch] change project to program
*  [master] added description of project
 + [mergebranch] change project to program
*+ [master^] baseline for merging

You’re on branch master, so let’s try to merge in mergebranch. Since you’ve made the changes with the intent of creating a merge conflict, let’s hope that happens:

$ git merge mergebranch
CONFLICT (content): Merge conflict in
Automatic merge failed; fix conflicts and then commit the result.

As you expected, there’s a merge conflict. If you look at status, there’s a good deal of useful information there. Not only does it say that you’re in the middle of a merge (You have unmerged paths), but it also shows you which files are modified:

$ git status
On branch master
You have unmerged paths.
  (fix conflicts and run "git commit")

Unmerged paths:
  (use "git add <file>..." to mark resolution)

   both modified:

no changes added to commit (use "git add" and/or "git commit -a")

You have done all that work to get to the point of having a merge conflict. Now you can start learning about how to resolve it! For this first part, you’ll be working with the command line tools and your editor. After that, you’ll get fancy and look at using visual diff tools to solve the problem.

When you open in your editor, you can see what Git produced:

def display():
<<<<<<< HEAD
    print("Welcome to my very cool project!")
=======
    print("Welcome to my program!")
>>>>>>> mergebranch

Git marks the conflict with the same conflict markers used by the classic Unix diff3 and merge tools. The top portion, between <<<<<<< HEAD and =======, comes from HEAD, which in your case is master. The bottom portion, between ======= and >>>>>>> mergebranch, comes from, you guessed it, mergebranch.

Now, in this very simple example, it’s pretty easy to remember which changes came from where and how we should merge this, but there’s a setting you can enable which will make this easier.

The diff3 setting modifies the output of merge conflicts to more closely approximate a three-way merge, meaning in this case that it will show you what’s in master, followed by what it looked like in the common ancestor, followed by what it looks like in mergebranch:

def display():
<<<<<<< HEAD
    print("Welcome to my very cool project!")
||||||| merged common ancestors
    print("Welcome to my project!")
=======
    print("Welcome to my program!")
>>>>>>> mergebranch

Now that you can see the starting point, “Welcome to my project!”, you can see exactly what change was made on master and what change was made on mergebranch. This might not seem like a big deal on such a simple example, but it can make a huge difference on large conflicts, especially merges where someone else made some of the changes.

You can set this option in Git globally by issuing the following command:

$ git config --global merge.conflictstyle diff3
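Seeing the common ancestor is what makes a three-way merge decidable. As a rough Python sketch (parse_diff3_conflict and auto_resolve are hypothetical helpers, and the parser assumes a single, well-formed conflict region in Git’s diff3 layout: <<<<<<<, |||||||, =======, >>>>>>>), the classic rule is: if only one side changed the base, take that side; if both made the same change, take either; otherwise a human has to decide:

```python
def parse_diff3_conflict(lines):
    """Split one diff3-style conflict region into ours/base/theirs line lists."""
    sides = {"ours": [], "base": [], "theirs": []}
    current = None  # which side the following lines belong to
    for line in lines:
        if line.startswith("<<<<<<<"):
            current = "ours"  # e.g. "<<<<<<< HEAD"
        elif line.startswith("|||||||"):
            current = "base"  # e.g. "||||||| merged common ancestors"
        elif line.startswith("======="):
            current = "theirs"
        elif line.startswith(">>>>>>>"):
            break  # e.g. ">>>>>>> mergebranch": end of the region
        elif current is not None:
            sides[current].append(line)
    return sides


def auto_resolve(sides):
    """Apply the three-way rule; return merged lines, or None if a human must decide."""
    ours, base, theirs = sides["ours"], sides["base"], sides["theirs"]
    if ours == theirs:
        return ours  # both sides made the same change
    if ours == base:
        return theirs  # only their side changed the base
    if theirs == base:
        return ours  # only our side changed the base
    return None  # both sides changed the base differently: a real conflict
```

On the conflict in this section, auto_resolve returns None because master and mergebranch both changed the same base line, which is exactly why the file has to be fixed by hand.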

Okay, so you understand how to see the conflict. Let’s go through how to fix it. Start by editing the file, removing all of the markers Git added, and making the one conflicting line correct:

def display():
    print("Welcome to my very cool program!")

You then add your modified file to the index and commit your merge. This will finish the merge process and create the new merge commit:

$ git add
$ git commit
[master a56a01e] Merge branch 'mergebranch'
$ git log --oneline
a56a01e Merge branch 'mergebranch'
ab41ed2 added description of project
a775c38 change project to program
f29b775 baseline for merging

Merge conflicts can happen while you’re cherry-picking, too. The process when you are cherry-picking is slightly different. Instead of using the git commit command, you use the git cherry-pick --continue command. Don’t worry, Git will tell you in the status message which command you need to use. You can always go back and check that to be sure.

git mergetool

Similar to git difftool, Git will allow you to configure a visual diff tool to deal with three-way merges. It knows about several different tools on different operating systems. You can see the list of tools it knows about on your system by using the command below. On my Linux machine, it shows the following:

$ git mergetool --tool-help
'git mergetool --tool=<tool>' may be set to one of the following:

The following tools are valid, but not currently available:

Some of the tools listed above only work in a windowed
environment. If run in a terminal-only session, they will fail.

Also similar to difftool, you can configure the mergetool options globally to make it easier to use:

$ git config --global merge.tool meld
$ git config --global mergetool.prompt false

The final option, mergetool.prompt, tells Git not to prompt you each time it opens a window. This might not sound annoying, but when your merge involves several files it will prompt you between each of them.


You’ve covered a lot of ground in these tutorials, but there is so much more to Git. If you’d like to take a deeper dive into Git, I can recommend these resources:


August 13, 2018 02:00 PM UTC

Test and Code

45: David Heinemeier Hansson - Software Development and Testing, TDD, and exploratory QA

David Heinemeier Hansson is the creator of Ruby on Rails, founder & CTO at Basecamp (formerly 37signals). He's a best selling author, public speaker, and even a Le Mans class winning racing driver.

All of that, of course, is awesome. But that's not why I asked him on the show. In 2014, during a RailsConf keynote, he started a discussion about damage caused by TDD. This was followed by a few blog posts, and then a series of recorded hangouts with Martin Fowler and Kent Beck. This is what I wanted to talk with David about; this unconventional yet practical and intuitive view of how testing and development work together.

It's a great discussion. I think you'll get a lot out of it.

Special Guest: David Heinemeier Hansson.

Sponsored By:

PyCharm: If you value your time, you owe it to yourself to try PyCharm. The team has set up a link just for Test & Code listeners. If you use the link, you can try PyCharm Professional for free for 3 months. This offer is only good until Sept 1, so don't forget. Plus using the link (I'll also have it in the show notes) lets PyCharm know that supporting Test & Code is a good thing.

Support Test and Code

Links:

  - Is TDD dead? - Part 1
  - My reaction to "Is TDD Dead?", including links to the other parts of the video series
  - RailsConf 2014 - Keynote: Writing Software by David Heinemeier Hansson - YouTube
  - TDD is dead. Long live testing. (DHH)
  - Test-induced design damage (DHH)
  - Slow database test fallacy (DHH)

August 13, 2018 01:45 PM UTC

Fabio Zadrozny

Profiling pytest startup

I'm a fan of pytest, yet it seems that the startup time for running tests locally in the app I'm working on is slowly ramping up, so I decided to profile it to see if there was anything I could do to improve that.

The first thing I did was create a simple test and launch it from the PyDev profile view, which lets any launch done in PyDev show its performance profile in PyVmMonitor.

Note that this is an integration test that starts up a big application, so the total time just to start up all the fixtures that bring the application to life and then shut them down is 15 seconds (quite a lot IMHO).
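Fabio used PyDev and PyVmMonitor for this. If you don't use those tools, the standard library's cProfile and pstats modules give a rough, text-only equivalent. A minimal sketch; profile_call is a hypothetical helper, not part of pytest or PyVmMonitor:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Cumulative time surfaces expensive call trees, such as slow fixture setup.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For example, after import pytest, profile_call(pytest.main, ["--assert=plain", "tests/test_startup.py"]) would run that test in-process under the profiler; the test path is a placeholder.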

The first thing I noticed looking at the profile is that 14% of that time seems to be spent creating a session temp dir.

After investigating a bit more, it seems there were two problems: the fixture passed a unicode path to make_numbered_dir when it should have passed a str on Python 2, and make_numbered_dir had an issue where big paths were not removed.

So, pytest always visited my old files every time I launched any test and that accounted for 1-2 seconds (I reported this particular error in:

Ok, down from 15 to 13 seconds after manually removing old files with big paths and using the proper API with str on Py2.

Profiling again with that change showed another pytest-related slowdown: rewriting test modules.

This comes from a pytest feature that rewrites test files so that assertion failures produce more informative messages.

So, I passed --assert=plain to pytest and saved 3 more seconds (from 13 down to 10). It seems all imports are a bit faster with assertion rewriting disabled, so I got an overall improvement, not only in that specific part of the code. That's probably not what I want on CI, where I want more information, but it's a nice plus locally, where I run many tests manually: the time saved on those runs will definitely be worth it, even with less information when some assertion fails.

Now, with that disabled, the next culprit seems to be loading its plugins.

But alas, it uses setuptools, and I know from previous experience that it's very hard to improve that (it is very greedy in the way it handles loading metadata, so stay away unless you're OK with wasting a lot of time on your imports). The remainder of the time seems to be spread out importing many modules. The app already tries to load things as lazily as possible; I think I'll be able to improve on that by delaying some imports, but Python libraries are really hard to fix, as everyone imports everything at the top of the module.
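One common way to trim import time is to move a module-level import into the function that needs it. A minimal sketch, with a hypothetical parse_config helper and json standing in for a genuinely heavy dependency. To find which imports are worth deferring, python -X importtime -c "import yourapp" (Python 3.7+, with yourapp as a placeholder) prints how long each import takes:

```python
def parse_config(text):
    """Hypothetical helper: parse a JSON config string.

    The import is deferred to the first call, so merely importing the module
    that defines this function (for example, during pytest collection) does
    not pay for it. Later calls only do a cheap sys.modules lookup.
    """
    import json  # deferred import
    return json.loads(text)
```

The trade-off is that import errors now surface on first use rather than at startup.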

Well, I guess going from 15 s to 10 s with just a few changes is already an improvement in my case for an integration test that starts up the whole app (although it could certainly be better), and I think I'll still be able to trim some of that time by doing more imports lazily. That's no longer really pytest-related, though, so that's it for this post ;)

August 13, 2018 12:19 PM UTC

Mike Driscoll

Only 2 Days Left for Jupyter Notebook 101

There are only two days left to join the Kickstarter for my latest book, Jupyter Notebook 101. It’s also one of the best times to help out, as you get to help shape the book right now. I always take my readers’ feedback into consideration when writing my books and have added lots of extra information to them because of their requests.

August 13, 2018 11:28 AM UTC

Red Hat Developers

How to install Python 3 on RHEL

This article shows how to install Python 3, pip, venv, virtualenv, and pipenv on Red Hat Enterprise Linux 7. After following the steps in this article, you should be in a good position to follow many Python guides and tutorials using RHEL.

Using Python virtual environments is a best practice to isolate project-specific dependencies and create reproducible environments. Other tips and FAQs for working with Python and software collections on RHEL 7 are also covered.

There are a number of different ways to get Python 3 installed on RHEL. This article uses Red Hat Software Collections because these give you a current Python installation that is built and supported by Red Hat. During development, support might not seem that important to you. However, support is important to those who have to deploy and operate the applications you write. To understand why this is important, consider what happens when your application is in production and a critical security vulnerability in a core library (for example SSL/TLS) is discovered. This type of scenario is why many enterprises use Red Hat.

Python 3.6 is used in this article. It was the most recent stable release when this was written. However, you should be able to use these instructions for any of the versions of Python in Red Hat Software Collections including 2.7, 3.4, 3.5, and future collections such as 3.7.

In this article, the following topics are discussed:

  1. TL;DR (summary of steps)
  2. Why use Red Hat Software Collections
  3. Full installation steps with explanations
  4. How to use Python 3 through Red Hat Software Collections
  5. Working with Python virtual environments
    1. Should I use venv or virtualenv or something else?
    2. Using venv
    3. Using virtualenv
    4. Managing application dependencies using pipenv
  6. General tips for working with Python
  7. Tips for working with software collections
    1. Enable the Python collection *before* the virtual environment
    2. How to permanently enable a software collection
    3. How to use Python 3 from RHSCL in the #! (shebang) line of a script
    4. How to tell which software collections are enabled
    5. How to see which software collections are installed
  8. Troubleshooting
  9. More information: Developing in Python on Red Hat Platforms


Here are the basic steps so you can just get going. See below for explanations and more details.

How to install Python 3 on RHEL

  1. Become root.
  2. Enable the rhscl and optional software repos using subscription-manager.
  3. Use yum to install @development. This makes sure you’ve got GCC, make, git, etc. so you can build any modules that contain compiled code.
  4. Use yum to install rh-python36.
  5. Optional: Use yum to install python-tools, numpy, scipy, and six from RHSCL RPMs.

$ su -
# subscription-manager repos --enable rhel-7-server-optional-rpms \
  --enable rhel-server-rhscl-7-rpms
# yum -y install @development
# yum -y install rh-python36

# yum -y install rh-python36-numpy \
 rh-python36-scipy \
 rh-python36-python-tools \
 rh-python36-python-six

# exit

Using Python 3 on RHEL

  1. Under your normal user ID, run scl enable to add python 3 to your path(s).
  2. Create a Python virtual environment and activate it. (Note: your prompt has changed to show the virtual environment.)
  3. Install whatever additional modules you need with pip in an isolated environment without being root.

$ scl enable rh-python36 bash
$ python3 -V
Python 3.6.3

$ python -V  # python now also points to Python3 
Python 3.6.3

$ mkdir ~/pydev
$ cd ~/pydev

$ python3 -m venv py36-venv
$ source py36-venv/bin/activate

(py36-venv) $ python3 -m pip install ...some modules...

If you start a new session, here are the steps for using your virtual environment:

$ scl enable rh-python36 bash

$ cd ~/pydev
$ source py36-venv/bin/activate


Why use Red Hat Software Collections

The benefit of using Red Hat Software Collections is that you can have multiple versions of Python installed at the same time along with the base Python 2.7 that shipped with RHEL 7. You can easily switch between versions with scl enable.

Note: The latest stable packages for .NET Core, Go, Rust, PHP 7, Ruby 2.5, GCC, Clang/LLVM, Nginx, MongoDB, MariaDB, PostgreSQL, and more are all yum-installable as software collections, so you should take the time to get comfortable with software collections.

Using software collections requires an extra step because you have to enable the collection you want to use. Enabling just adds the necessary paths (PATH, MANPATH, LD_LIBRARY_PATH) to your environment. Once you get the hang of it, software collections are fairly easy to use. It really helps to understand the way that environment-variable changes work in Linux/UNIX. Changes can be made only to the current process. When a child process is created, it inherits the environment of the parent. Any environment changes made in the parent after the child has been created will have no effect on the child. Therefore, the changes made by scl enable will affect only the current terminal session or anything started from it. This article also shows how you can permanently enable a software collection for your user account.
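The inheritance rule is easy to check from Python. In this sketch (the variable name SCL_DEMO_VAR is made up for the demo, and the code sticks to Python 3.6-compatible subprocess arguments), a child started before the parent changes a variable keeps the old value, a child started afterwards sees the new one, and a change made inside a child never reaches the parent. That last point is why scl enable runs a new shell (or other command) instead of modifying the shell you type it into:

```python
import os
import subprocess
import sys

VAR = "SCL_DEMO_VAR"  # hypothetical variable name used only for this demo
READ_VAR = "import os; print(os.environ.get({!r}))".format(VAR)


def inheritance_demo():
    os.environ[VAR] = "before"
    # The child gets a copy of the parent's environment when it is started...
    child = subprocess.Popen([sys.executable, "-c", READ_VAR],
                             stdout=subprocess.PIPE, universal_newlines=True)
    # ...so changing the parent's environment afterwards does not reach it.
    os.environ[VAR] = "after"
    early_child, _ = child.communicate()

    # A child started now inherits the new value.
    late_child = subprocess.run([sys.executable, "-c", READ_VAR],
                                stdout=subprocess.PIPE, universal_newlines=True,
                                check=True).stdout

    # A change made inside a child never flows back to the parent.
    subprocess.run([sys.executable, "-c",
                    "import os; os.environ[{!r}] = 'set-by-child'".format(VAR)],
                   check=True)
    return early_child.strip(), late_child.strip(), os.environ[VAR]
```

Calling inheritance_demo() returns ("before", "after", "after").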


Installation Prerequisites

Install development tools including GCC, make, and git

If you install modules that depend on compiled code you’ll need the tools to compile them. If you haven’t already installed development tools run the following command:

$ su -
# yum install @development

Enable repos with additional developer tools

While the default/base RHEL software repos have many development tools, these are the older versions that are shipped with the OS and are supported for the full 10-year life of the OS. Packages that are updated more frequently and have a different support lifecycle are distributed in other repos that aren’t enabled by default.

Red Hat Software Collections are in the rhscl repo. RHSCL packages have some dependencies on packages in the optional-rpms repo, so you need to enable both.

To enable the additional repos, run the following commands as root:

$ su -
# subscription-manager repos \
 --enable rhel-7-server-optional-rpms \
 --enable rhel-server-rhscl-7-rpms


To see which repos are available for your current subscription, run the following command:

# subscription-manager repos --list

To see which repos are enabled, use --list-enabled:

# subscription-manager repos --list-enabled

Install Python 3

You can now install Python 3.6 (or other versions in RHSCL) with yum:

# yum install rh-python36


Install additional packages

Optionally, you may want to install the following RPM packages that are part of the software collection:


# yum install rh-python36-numpy \
 rh-python36-scipy \
 rh-python36-python-tools \
 rh-python36-python-six


Note: By default, system modules will not be used with Python virtual environments. Use the option --system-site-packages when creating the virtual environment to include system modules.


How to use Python 3 (scl enable)

Python 3 is now installed. You no longer need to run under the root user ID. The rest of the commands should be executed using your normal user account.

As previously mentioned, software collections are installed under /opt/rh and aren’t automatically added to your PATH, MANPATH, and LD_LIBRARY_PATH. The command scl enable will make the necessary changes and run a command. Because of the way environment variables work in Linux (and UNIX), the changes will take effect only for the command run by scl enable. You can use bash as the command to start an interactive session. This is one of the most common ways (but not the only way) of working with software collections.

$ scl enable rh-python36 bash
$ python3 -V
Python 3.6.3
$ python -V # python now points to Python 3
Python 3.6.3

$ which python

Note: Enabling the Python collection makes the python in your path, with no version number, point to Python 3. /usr/bin/python will still be Python 2. You can still run Python 2 by typing python2, python2.7, or /usr/bin/python. It is recommended that you use a version number to avoid any ambiguity about what python means. This also applies to other Python commands in .../bin such as pip, pydoc, python-config, pyvenv, and virtualenv. For more information, see PEP 394.

NOTE: See How to permanently enable a software collection below to permanently put Python 3 in your path.


Create a Python virtual environment (best practice)

Using Python virtual environments is a best practice to isolate project-specific dependencies and create reproducible environments. In other words, it’s a way to avoid conflicting dependencies that lead to dependency hell. Using a virtual environment will let you use pip to install whatever modules you need for your project in an isolated directory under your normal user ID. You can easily have multiple projects with different dependencies. To work on a specific project, you activate the virtual environment, which adds the right directories to your path(s).

Using virtual environments along with pip list, pip freeze, and a requirements.txt file gives you a path to a reproducible environment to run your code in. Others who need to run your code can use the requirements.txt file you generate to create a matching environment.
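pip freeze is the usual way to produce requirements.txt (for example, python3.6 -m pip freeze > requirements.txt). To show what it records, here is a rough sketch that lists installed distributions as name==version pins using importlib.metadata (Python 3.8+; on older versions such as 3.6, the importlib_metadata backport from PyPI provides the same API). It is only an approximation: pip freeze also handles editable and VCS installs and, by default, leaves out a few packaging tools such as pip and setuptools. frozen_requirements is a hypothetical name:

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older Pythons: pip install importlib_metadata
    import importlib_metadata as metadata


def frozen_requirements():
    """Return sorted 'name==version' lines for distributions this interpreter can see."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            pins.add("{}=={}".format(name, dist.version))
    return sorted(pins, key=str.lower)
```

Run inside an activated virtual environment, print("\n".join(frozen_requirements())) lists only what that environment can see, which is exactly why the isolation matters for reproducibility.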

By default, virtual environments will not use any system installed modules, or modules installed under your home directory. From an isolation perspective and for creating reproducible environments this is generally considered the correct behavior. However, you can change that by using the argument --system-site-packages.

Should I use venv or virtualenv or something else?

When you install Python 3 from Red Hat Software Collections, venv, virtualenv, and pip will be installed, so you are ready to install whatever modules you choose. “Installing Python Modules” in the current Python documentation says this:

So for all the recent versions of Python 3, venv is preferred.

If you work with Python 2.7, you’ll need to use virtualenv.

The commands to create the virtual environments differ only in the module name used. Once created, the command to activate the virtual environment is the same.

Note: for virtualenv, using python3.6 -m virtualenv is recommended instead of using the virtualenv command. See Avoid using Python wrapper scripts below for more information.

Create and activate a virtual environment with venv

If you haven’t already done so, enable the rh-python36 collection:

$ scl enable rh-python36 bash

Now create the virtual environment. To avoid any surprises, use an explicit version number for running Python:

$ python3.6 -m venv myproject1

Anytime you need to activate the virtual environment, run the following command.

$ source myproject1/bin/activate

Note: once you’ve activated a virtual environment, your prompt will change to remind you that you are working in a virtual environment. Example:

(myproject1) $ 

Note: When you log in again, or start a new session, you will need to activate the virtual environment using the source command again. Run scl enable first, before activating the virtual environment.

For more information, see Virtual Environments and Packages in the Python 3 tutorial at
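The venv module can also be driven from Python, which can help in setup scripts. A minimal sketch, roughly equivalent to python3.6 -m venv myproject1; make_env is a hypothetical helper, and its system_site_packages argument corresponds to the --system-site-packages option. Creating the environment does not activate it: you still run source myproject1/bin/activate in your shell.

```python
import venv
from pathlib import Path


def make_env(path, system_site_packages=False, with_pip=True):
    """Create a virtual environment at path and return its POSIX activate script."""
    venv.create(path, system_site_packages=system_site_packages, with_pip=with_pip)
    # On Windows the scripts live in Scripts\ instead of bin/.
    return Path(path) / "bin" / "activate"
```

For example, print("Run: source", make_env("myproject1")) creates the environment and prints the activation command.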

Create and activate a virtual environment with virtualenv

If you haven’t already done so, enable the rh-python36 collection:

$ scl enable rh-python36 bash

Now create the virtual environment. To avoid any surprises, use an explicit version number for running Python:

$ python3.6 -m virtualenv myproject1

Anytime you need to activate the virtual environment, run the following command. Note: you should already have run scl enable before activating the virtual environment.

$ source myproject1/bin/activate

Note: once you’ve activated a virtual environment, your prompt will change to remind you that you are working in a virtual environment. Example:

(myproject1) $ 

Note: When you log in again, or start a new session, you will need to activate the virtual environment using the source command again. Run scl enable first, before activating the virtual environment.

For more information, see Installing packages using pip and virtualenv in the Python Packaging User Guide.

Managing application dependencies with pipenv

From the Python Packaging User Guide tutorial, Managing Application Dependencies:

“Pipenv is a dependency manager for Python projects. If you’re familiar with Node.js’ npm or Ruby’s bundler, it is similar in spirit to those tools. While pip alone is often sufficient for personal use, Pipenv is recommended for collaborative projects as it’s a higher-level tool that simplifies dependency management for common use cases.”

With pipenv you no longer need to use pip and virtualenv separately. pipenv isn’t currently part of the standard Python 3 library or Red Hat Software Collections. You can install it using pip. (Note: see the recommendation below about not running pip install as root.) Since pipenv uses virtualenv to manage environments, you should install pipenv without having any virtual environment activated. However, don’t forget to enable the Python 3 software collection first.

$ scl enable rh-python36 bash # if you haven't already done so
$ python3.6 -m pip install --user pipenv

Creating and using isolated environments with pipenv works a bit differently than venv or virtualenv. A virtual environment will automatically be created if no Pipfile exists in the current directory when you install the first package. However, it’s a good practice to explicitly create an environment with the specific version of Python you want to use.

$ scl enable rh-python36 bash # if you haven't already done so 
$ mkdir -p ~/pydev/myproject2
$ cd ~/pydev/myproject2
$ pipenv --python 3.6
$ pipenv install requests

To activate a Pipenv environment, cd into that directory and run pipenv shell.

$ scl enable rh-python36 bash # if you haven't already done so 
$ cd ~/pydev/myproject2
$ pipenv shell

Pipenv is similar to scl enable in that it doesn’t try to modify the current environment with source; instead, it starts a new shell. To deactivate, exit the shell. You can also run a command in the pipenv environment by using pipenv run command.

For more information see:


General tips for working with Python

The python command: Avoid surprises by using a version number

To avoid surprises, don’t type python. Use an explicit version number in the command, such as python3.6 or python2.7.

At a minimum, always use python3 or python2. If you are reading this article, you’ve got more than one version of Python installed on your system. Depending on your path, you might get different versions. Activating and deactivating virtual environments, as well as enabling a software collection, changes your path, so it can be easy to be confused about what version you’ll get from typing python.

The same problem occurs with any of the Python utilities such as pip or pydoc. Using version numbers, for example, pip3.6, is recommended. At a minimum use the major version number: pip3. See the next section for a more robust alternative.

Use which to determine which Python version will be run

Use the which command to determine the full path that will be used when you type a command. This will help you understand which version of python is in your path first and will get run when you type python.


$ which python # before scl enable
$ scl enable rh-python36 bash

$ which python
$ source ~/pydev/myproject1/bin/activate
(myproject1) $ which python
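From inside Python you can ask the same questions. A small sketch with hypothetical helper names: shutil.which resolves command names against your PATH much like which does, while sys.executable and sys.version_info describe the interpreter that is actually running. The virtual-environment check compares sys.prefix with sys.base_prefix, which works for venv and recent virtualenv; older virtualenv releases set sys.real_prefix instead, so the sketch checks both:

```python
import shutil
import sys


def describe_pythons(names=("python", "python2", "python3", "python3.6")):
    """Map each command name to the executable PATH resolves it to (None if absent)."""
    return {name: shutil.which(name) for name in names}


def current_interpreter():
    """Describe the interpreter actually running this code."""
    in_venv = hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "in_virtual_environment": in_venv,
    }
```

Comparing the two outputs before and after scl enable, or before and after activating a virtual environment, shows exactly how your path changed.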


Avoid Python wrapper scripts such as virtualenv: Use the module name

Some Python utilities are put in your path as a wrapper script in a .../bin directory. This is convenient because you can just type pip or virtualenv. Most Python utilities are actually just Python modules with wrapper scripts to start Python and run the code in the module.

The problem with wrapper scripts is the same ambiguity that happens when typing python: which version of pip or virtualenv will you get when you type the command without a version number? For things to work correctly, there is the additional complication that the utility needs to match the version of Python you intend to use. Some subtle (hard to diagnose) problems can occur if you wind up unintentionally mixing versions.

Note: There are several directories that wrapper scripts can reside in. Which version you get is dependent on your path, which changes when you enable software collections and/or activate virtual environments. Modules installed with pip --user put their wrapper scripts in ~/.local/bin, which can get obscured by activating the software collection or a virtual environment.

You can avoid the surprises from the path issues by running the module directly from a specific version of Python by using -m modulename. While this involves more typing, it is a much safer approach.
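As a rough illustration of the -m approach from inside Python, this sketch runs a module with an explicitly named interpreter through subprocess. The helper run_module is hypothetical, and sys.executable stands in for whichever interpreter you would name explicitly, such as python3.6.

```python
import subprocess
import sys

def run_module(interpreter, module, *args):
    """Run `<interpreter> -m <module> [args...]` and return its combined output."""
    cmd = [interpreter, "-m", module] + list(args)
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (also works on Python 3.6)
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout

if __name__ == "__main__":
    interpreter = sys.executable  # replace with an explicit name such as "python3.6"
    try:
        print(run_module(interpreter, "pip", "--version"))
    except subprocess.CalledProcessError as exc:
        print("pip is not installed for", interpreter)
        print(exc.output)
```

Because the interpreter is named explicitly, the pip that runs is the one installed for that interpreter, regardless of which wrapper scripts happen to come first in your path.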


Do not run pip install as root (or with sudo)

Running pip install as root either directly or by using sudo is a bad idea and will cause you problems at some point. Some of the problems that you may encounter are:

Using virtual environments will allow you to isolate the modules you install for each project from the modules that are part of the Python installation from Red Hat. Using virtual environments is considered a best practice to create isolated environments that provide the dependencies needed for a specific purpose. You don’t need to use --user when running pip in a virtual environment since it will default to installing in the virtual environment, which you should have write access to.

If you aren’t using virtual environments, or need a module/tool to be available outside of a virtual environments, use pip --user to install modules under your home directory.
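The advice above can be encoded as a small guard. This is a sketch, not part of pip: pip_install_args is a hypothetical helper, and the virtual environment check relies on the standard behavior that sys.prefix differs from sys.base_prefix inside a venv (older virtualenv releases set sys.real_prefix instead).

```python
import os
import sys

def in_virtualenv():
    """True when this interpreter is running inside a venv or virtualenv."""
    # venv makes sys.prefix differ from sys.base_prefix;
    # legacy virtualenv (before 20.0) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.base_prefix != sys.prefix

def pip_install_args(packages):
    """Hypothetical helper: build a pip command that follows the advice above."""
    # os.geteuid() only exists on POSIX systems.
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        raise RuntimeError("refusing to run pip as root; use a virtual environment or --user")
    args = [sys.executable, "-m", "pip", "install"]
    if not in_virtualenv():
        args.append("--user")  # no virtual environment: install under ~/.local
    return args + list(packages)

if __name__ == "__main__":
    try:
        print(" ".join(pip_install_args(["requests"])))
    except RuntimeError as exc:
        print(exc)
```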

In case you think this is overly dire, see this xkcd comic. Don’t forget to hover so you see the alt text.


Use virtual environments instead of pip --user

Some guides recommend using pip --user. While this is preferred over running pip as root, using virtual environments is much better practice for properly isolating the modules you need for a given project or set of projects. pip --user installs use ~/.local, which can be obscured by enabling software collections and/or activating virtual environments. For modules that install wrapper scripts in ~/.local/bin, this can cause a mismatch between the wrapper script and the module.

The exception to this advice is modules and tools that you need to use outside of virtual environments. The primary example is pipenv. You should use pip install --user pipenv to install pipenv. That way, you’ll have pipenv in your path without any virtual environments.


Don’t use the system Python for your own projects

The Python version installed in /usr/bin/python and /usr/bin/python2 is part of the operating system. RHEL was tested with a specific Python release (2.7.5) that will be maintained for the full ten-year supported life of the OS. Many of the built-in administration tools are actually written in Python. Trying to change the version of Python in /usr/bin might actually break some of the OS functionality.

At some point, you might want to run your code on a different version of the OS. That OS will likely have a different version of Python installed as /usr/bin/python, /usr/bin/python2, or even /usr/bin/python3. The code you write may have dependencies on a specific version that can be best managed through virtual environments and/or software collections.

The one exception to the above is if you are writing system administration tools. In that case, you should use the Python in /usr/bin because it has the correct modules and libraries installed for the APIs in the OS. Note: If you are writing system administration or management tools in Python, you might want to take a look at Ansible. Ansible is written in Python, uses Jinja2 for templating, and provides higher-level abstractions for many system tasks.

Tip: If you need to work with Python 2.7, install the python27 software collection. Follow the installation steps above but use python27 instead of rh-python36. You can enable both collections at the same time, so you’ll have both the newer python2.7 and python3.6 in your path. Note: the collection you enable last is the one that will be first in your path, which determines the version you get when you type a command like python or pip without an explicit version number.

Don’t change or overwrite /usr/bin/python, /usr/bin/python2, or /usr/bin/python2.7

As mentioned above, the system Python is part of Red Hat Enterprise Linux 7 and is used by critical system utilities such as yum. (Yes, yum is written in Python.) So overwriting the system Python is likely to break your system badly. If you compile Python from source, do not run make install (as root) with a prefix of /usr, or it can overwrite the system Python in /usr/bin. Use a different prefix, such as the default /usr/local.


Software collection tips

Enable the Python collection *before* the virtual environment

You should always enable the Python software collection before using any of the Python virtual environment utilities to create or activate an environment. In order for things to work correctly, you need to have your desired version of Python in your path because it will be needed by the Python virtual environment. A number of problems, some of which are subtle, come up if you try to enable/activate in the wrong order.

Example for venv:

$ scl enable rh-python36 bash
$ python3.6 -m venv myproject1
$ source myproject1/bin/activate

When reactivating later in a new shell:

$ scl enable rh-python36 bash
$ source myproject1/bin/activate

Example for virtualenv:

$ scl enable rh-python36 bash
$ python3.6 -m virtualenv myproject1
$ source myproject1/bin/activate

When reactivating later in a new shell:

$ scl enable rh-python36 bash
$ source myproject1/bin/activate
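The order matters because a virtual environment is tied to the interpreter that created it: venv writes a home entry into the environment’s pyvenv.cfg file pointing at the directory of the base interpreter. The sketch below uses the standard library venv module directly, with hypothetical helper names and a throwaway temporary directory, to show which Python an environment is bound to.

```python
import os
import tempfile
import venv
from pathlib import Path

def create_env(path):
    """Create a virtual environment bound to the interpreter running this code."""
    # with_pip=False keeps the sketch quick; use with_pip=True for a real project.
    builder = venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt"))
    builder.create(str(path))
    return Path(path)

def recorded_home(env_path):
    """Return the `home` entry from pyvenv.cfg: where the base interpreter lives."""
    cfg = Path(env_path) / "pyvenv.cfg"
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "home":
            return value.strip()
    return None

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = create_env(Path(tmp) / "myproject1")
        print("myproject1 is bound to the Python in:", recorded_home(env))
```

When run with python3.6 after scl enable rh-python36 bash, the recorded home should be the collection’s /opt/rh/rh-python36/root/usr/bin directory; run with some other interpreter, it records that interpreter instead.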

How to permanently enable a software collection

To permanently add Python 3 to your path(s), you can add an scl_source command to the “dot files” for your specific user ID. The benefit of this approach is that the collection is already enabled at every login. If you are using a graphical desktop, everything that you start from the menu will already have the collection enabled.

There are a few caveats with this approach:

Using your preferred text editor, add the following line to your ~/.bashrc:

# Add RHSCL Python 3 to my login environment
source scl_source enable rh-python36

Note: you could also add the scl_source line to the start of a build script to select the desired Python for the build. If your build script isn’t written as a shell/bash script, you could just wrap it in a shell script that has the source scl_source command and then runs your build script.
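If the build script is itself Python, one alternative to a shell wrapper is to launch it through scl enable <collection> -- <command>, the same form the #! (shebang) tip uses. This is a sketch with hypothetical names (scl_command, run_in_collection, build.py); it assumes the scl tool from Red Hat Software Collections is on your PATH.

```python
import shutil
import subprocess

def scl_command(collection, *command):
    """Build `scl enable <collection> -- <command...>` as an argument list."""
    return ["scl", "enable", collection, "--"] + list(command)

def run_in_collection(collection, *command):
    """Run `command` with `collection` enabled; requires the scl tool on PATH."""
    if shutil.which("scl") is None:
        raise RuntimeError("scl not found; are Red Hat Software Collections installed?")
    return subprocess.run(scl_command(collection, *command), check=True)

if __name__ == "__main__":
    # build.py is a hypothetical build script; on a RHEL system you would call
    # run_in_collection("rh-python36", "python3", "build.py") instead of printing.
    print(" ".join(scl_command("rh-python36", "python3", "build.py")))
```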


How to use Python 3 from RHSCL in the #! (shebang) line of a script

You can create a script that will use Python from the software collection without a requirement for scl enable to be manually run first. This can be done by using /usr/bin/scl enable as the interpreter for the script:

#!/usr/bin/scl enable rh-python36 -- python3
import sys

version = "Python %d.%d" % (sys.version_info.major, sys.version_info.minor)
print("You are running", version)

Note: You may be tempted to try using just the full path to .../root/usr/bin/python without the scl enable. In many cases, this won’t work. The behavior is dependent on the specific software collection. For most collections, this will fail with a shared library error, since LD_LIBRARY_PATH isn’t set correctly. The python27 collection doesn’t give an error, but it finds the wrong shared library, so you get the wrong version of Python, which can be surprising. However, rh-python36 can be referenced directly without setting LD_LIBRARY_PATH, but it is currently the only Python collection that works that way. There is no guarantee that future collections will work the same way.

How to see which software collections are installed

You can use the command scl -l to see what software collections are installed. This will show all software collections that are installed, whether they are enabled or not.

$ scl -l

How to tell which software collections are enabled

The environment variable X_SCLS contains a list of the software collections that are currently enabled.

$ echo $X_SCLS
$ for scl in $X_SCLS; do echo $scl; done

In scripts, you can use scl_enabled collection-name to test if a specific collection is enabled.
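In a Python script, the same information can be read from the environment. This sketch, with hypothetical helper names, splits X_SCLS on whitespace the way the shell for loop above does; collection_enabled is only a rough Python counterpart to scl_enabled, not a wrapper around it.

```python
import os

def enabled_collections(environ=None):
    """Return the software collections listed in X_SCLS (empty if none are enabled)."""
    environ = os.environ if environ is None else environ
    return environ.get("X_SCLS", "").split()

def collection_enabled(name, environ=None):
    """True if `name` appears in X_SCLS."""
    return name in enabled_collections(environ)

if __name__ == "__main__":
    print("Enabled collections:", enabled_collections() or "none")
    if not collection_enabled("rh-python36"):
        print("Hint: run 'scl enable rh-python36 bash' first")
```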

How can I find a list of Red Hat Software Collections and how long they are supported?

See Red Hat Software Collections Product Life Cycle on the Red Hat Customer Portal. It has a list of Red Hat Software Collections packages and support information.

You can also check the release notes for the most recent release of Red Hat Software Collections.

Find additional RPM packages and see other available versions

You can use yum search to search for additional packages and see the other versions that are available:

To search for other packages that are part of the rh-python36 collection:

# yum search rh-python36

Starting with the Python 3.4 collection, the collection and package names are all prefixed with rh-. So you can use the following command to see all of the rh-python packages and, therefore, see what collections are available.

# yum search rh-python

Note: to see the available packages in the Python 2.7 collection, search for python27.

# yum search python27

You can, of course, just search for python and get a list of every available RPM that has python in the name or description. It will be a very long list, so it’s best to redirect the output to a file and use grep or a text editor to search the file. The packages that start with python- (without a version number) are part of the base RHEL Python 2.7.5 packages that are installed in /usr/bin.



Python: error while loading shared libraries

This error occurs when you are trying to run a binary but the shared libraries it depends on can’t be found. Typically this occurs when trying to run python from a software collection without enabling it first. In addition to setting PATH, scl enable also sets LD_LIBRARY_PATH. This adds the directory containing the software collection’s shared objects to the library search path.

To see what environment variables are modified, take a look at /opt/rh/rh-python36/enable.

$ cat /opt/rh/rh-python36/enable 
export PATH=/opt/rh/rh-python36/root/usr/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/opt/rh/rh-python36/root/usr/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MANPATH=/opt/rh/rh-python36/root/usr/share/man:$MANPATH
export PKG_CONFIG_PATH=/opt/rh/rh-python36/root/usr/lib64/pkgconfig${PKG_CONFIG_PATH:+:${PKG_CONFIG_PATH}}
export XDG_DATA_DIRS="/opt/rh/rh-python36/root/usr/share:${XDG_DATA_DIRS:-/usr/local/share:/usr/share}"
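To make the effect of the first two lines concrete, here is a sketch that builds the same PATH and LD_LIBRARY_PATH changes in Python, including the ${VAR:+:${VAR}} rule that adds a separator only when the variable already has a value. It is for illustration only, with hypothetical helper names; in practice, let scl enable (or scl_source) set up the environment, since the enable script also sets MANPATH, PKG_CONFIG_PATH, and XDG_DATA_DIRS.

```python
import os

SCL_ROOT = "/opt/rh/rh-python36/root"  # taken from the enable script above

def prepend(environ, var, directory):
    """Mimic VAR=dir${VAR:+:${VAR}}: add ':old' only when the old value is non-empty."""
    current = environ.get(var, "")
    environ[var] = directory + (":" + current if current else "")

def scl_like_environment(base=None, root=SCL_ROOT):
    """Return a copy of `base` with the collection's bin and lib64 dirs prepended."""
    env = dict(os.environ if base is None else base)
    prepend(env, "PATH", root + "/usr/bin")
    prepend(env, "LD_LIBRARY_PATH", root + "/usr/lib64")
    return env

if __name__ == "__main__":
    env = scl_like_environment({"PATH": "/usr/local/bin:/usr/bin"})
    print(env["PATH"])             # /opt/rh/rh-python36/root/usr/bin:/usr/local/bin:/usr/bin
    print(env["LD_LIBRARY_PATH"])  # /opt/rh/rh-python36/root/usr/lib64
```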

Wrong version of Python when running python

First, running python with no version number is likely to give you an unexpected version of Python at some point. The result is dependent on your PATH, which depends on whether you’ve enabled the software collection and/or activated the virtual environment. If you use a version number such as python3.6 and you haven’t enabled/activated the right environment, you’ll get a clean and easy-to-understand “command not found” error.

Second, you can also get the wrong version if you’ve forgotten to enable the software collection. Enabling the software collection puts the collection’s /bin directory in your path first, so it will hide all of the other versions of commands with the same name.

The software collection needs to be enabled even if you give the full path to the python binary. For most of the collections, you’ll get a shared library error (see above) without the library path being set correctly. However, if you try this with the python27 collection, you’ll get Python 2.7.5 (the default version) instead of Python 2.7.13 as you’d expect. This is because the shared library dependency is satisfied out of /lib instead of from the software collection, so you pick up the system Python.
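One way to turn a silent wrong-version problem into a clear error is to check sys.version_info at the top of your script. This is a sketch with a hypothetical helper, and the minimum version is an example value. Passing a micro version, such as (2, 7, 13), would also catch the python27 case above, where the system 2.7.5 is picked up instead.

```python
import sys

def check_python(required=(3, 6), info=None):
    """Exit with a clear message if the running Python is older than `required`."""
    info = tuple(sys.version_info if info is None else info)
    found = info[:len(required)]  # compare only as many fields as were requested
    if found < tuple(required):
        raise SystemExit(
            "Need Python {}+, but {} is Python {}".format(
                ".".join(map(str, required)),
                sys.executable,
                ".".join(map(str, found)),
            )
        )
    return found

if __name__ == "__main__":
    check_python((3, 6))  # example minimum; pick what your project needs
    print("OK:", sys.executable, sys.version.split()[0])
```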

Error running pip: ImportError cannot import name ‘main’

If you run pip install --upgrade --user pip, as some guides suggest, the pip command will no longer work. The problem is a path issue combined with an incompatibility between versions. The user installation of pip placed a new pip command in ~/.local/bin. However, ~/.local/bin is in your path *after* the software collection. So you get the older wrapper script that is incompatible with the newer module.

This can be worked around in several ways:

Note: To uninstall the upgraded pip that was installed in ~/.local, run the following command under your regular user ID (not root):

$ python3.6 -m pip uninstall pip
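To confirm this kind of shadowing, it helps to list every pip on your path in search order, similar to which -a. The sketch below is a hypothetical helper using only the standard library; the first entry it returns is normally the one your shell runs when you type pip.

```python
import os

def which_all(name, path=None):
    """Return every executable called `name` on PATH, in search order."""
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH entry means the current directory.
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in hits:
            hits.append(candidate)
    return hits

if __name__ == "__main__":
    for position, found in enumerate(which_all("pip"), start=1):
        print(position, found)
```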


Can’t find virtualenv3.6

The rh-python36 software collection includes the virtualenv wrapper script but does not have a link for virtualenv3.6. There are two workarounds for this, but first I should point out that venv is now the preferred tool for virtual environments in Python 3.

The preferred workaround is to avoid the wrapper script entirely and invoke the module directly:

$ python3.6 -m virtualenv myproject1

Alternatively, you could create your own symlink in your ~/bin directory:

$ ln -s /opt/rh/rh-python36/root/usr/bin/virtualenv ~/bin/virtualenv3.6

More information: Developing in Python on Red Hat Platforms

Nick Coghlan and Graham Dumpleton gave a talk Developing in Python on Red Hat Platforms at DevNation 2016. The talk is chock full of information and still very relevant. They include information on building Python applications using containers, using s2i, and deploying to Red Hat OpenShift. I recommend watching the video or at least reviewing the slides.


After reading this article you’ve learned:



The post How to install Python 3 on RHEL appeared first on RHD Blog.

August 13, 2018 11:00 AM UTC

Simple is Better Than Complex

How to Use Bootstrap 4 Forms With Django

This is a quick tutorial to get you started with django-crispy-forms and never look back. Crispy-forms is a great application that gives you control over how you render Django forms, without breaking the default behavior. This tutorial is going to be tailored towards Bootstrap 4, but it can also be used with older Bootstrap versions as well as with the Foundation framework.

The main reason why I like to use it on my projects is because you can simply render a Django form using `` and it will be nicely rendered with Bootstrap 4, with very minimal setup. It’s a real life saver.


Install it using pip:

pip install django-crispy-forms

Add it to your INSTALLED_APPS and select which styles to use:
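As a minimal sketch, assuming the django-crispy-forms 1.x releases that were current when this was written and an app named people (the app used below), the relevant part of settings.py would look something like this. Newer 2.x releases moved the Bootstrap 4 templates into a separate crispy-bootstrap4 package, so check the documentation for the version you install.

```python
# settings.py (excerpt)
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "crispy_forms",  # django-crispy-forms
    "people",        # the tutorial's app
]

# Render |crispy and {% crispy %} output with the Bootstrap 4 template pack.
CRISPY_TEMPLATE_PACK = "bootstrap4"
```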




Setup Bootstrap

You can either download the latest Bootstrap 4 version from the Bootstrap website, in which case go to the download page and get the compiled CSS and JS version.

Or you can use the hosted Bootstrap CDN:

<link rel="stylesheet" href="" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
<script src="" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>

For simplicity, I will be using the CDN version. Here is my base.html template that will be referenced in the following examples:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <link rel="stylesheet" href="" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
    <title>Django People</title>
  </head>
  <body>
    <div class="container">
      <div class="row justify-content-center">
        <div class="col-8">
          <h1 class="mt-2">Django People</h1>
          <hr class="mt-0 mb-4">
          {% block content %}
          {% endblock %}
        </div>
      </div>
    </div>
  </body>
</html>

I only added the CSS file because we won’t be using any JavaScript features.

Basic Usage

Suppose we have a model named Person as follows:

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=130)
    email = models.EmailField(blank=True)
    job_title = models.CharField(max_length=30, blank=True)
    bio = models.TextField(blank=True)

Let’s say we wanted to create a view to add new Person objects. In that case we could use the built-in CreateView:

from django.views.generic import CreateView
from .models import Person

class PersonCreateView(CreateView):
    model = Person
    fields = ('name', 'email', 'job_title', 'bio')

Without any further change, Django will try to use a template named people/person_form.html. Here, “people” is the name of my Django app:


{% extends 'base.html' %}

{% block content %}
  <form method="post">
    {% csrf_token %}
    {{ form }}
    <button type="submit" class="btn btn-success">Save person</button>
  </form>
{% endblock %}

This is a very basic form rendering: as it is, Django will render it with no style, just plain form fields.


To render the same form using Bootstrap 4 CSS classes you can do the following:


{% extends 'base.html' %}

{% load crispy_forms_tags %}

{% block content %}
  <form method="post" novalidate>
    {% csrf_token %}
    {{ form|crispy }}
    <button type="submit" class="btn btn-success">Save person</button>
  </form>
{% endblock %}

Now the result looks much better, with the fields styled by Bootstrap 4.

There are some cases where you may want more freedom to render your fields. You can do so by rendering the fields manually and using the as_crispy_field template filter:

{% extends 'base.html' %}

{% load crispy_forms_tags %}


{% block content %}
  <form method="post" novalidate>
    {% csrf_token %}
    <div class="row">
      <div class="col-6">
        {{|as_crispy_field }}
      </div>
      <div class="col-6">
        {{|as_crispy_field }}
      </div>
    </div>
    {{ form.job_title|as_crispy_field }}
    {{|as_crispy_field }}
    <button type="submit" class="btn btn-success">Save person</button>
  </form>
{% endblock %}

And the result is a layout with the first two fields side by side in a two-column row and the remaining fields at full width below.

Form Helpers

The django-crispy-forms app has a special class named FormHelper to make your life easier and to give you complete control over how you want to render your forms.

Here is an example of an update view:

from django import forms
from crispy_forms.helper import FormHelper
from crispy_forms.layout import Submit
from people.models import Person

class PersonForm(forms.ModelForm):
    class Meta:
        model = Person
        fields = ('name', 'email', 'job_title', 'bio')

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.helper = FormHelper()
        self.helper.form_method = 'post'
        self.helper.add_input(Submit('submit', 'Save person'))

The job is done inside the __init__() method. The rest is just a regular Django model form. Here I’m defining that this form should handle the request using the POST method and that the form should have a submit button with the label “Save person”.

Now our view, just regular Django code:

from django.views.generic import UpdateView
from people.models import Person
from people.forms import PersonForm

class PersonUpdateView(UpdateView):
    model = Person
    form_class = PersonForm
    template_name = 'people/person_update_form.html'

Then in our template:


{% extends 'base.html' %}

{% load crispy_forms_tags %}

{% block content %}
  {% crispy form %}
{% endblock %}

Here we can simply call the {% crispy %} template tag and pass our form instance as a parameter.

And that’s all you need to render the form.


That’s pretty much it for the basics. Honestly, that’s about all that I use. Usually I don’t even go for the FormHelper objects. But there is much more to it. If you are interested, you can check their official documentation.

If you are not sure about where you should create a certain file, or want to explore the sample project I created for this tutorial, you can grab the source code on GitHub.

August 13, 2018 10:00 AM UTC