
Planet Python

Last update: June 25, 2017 10:46 AM

June 24, 2017

Stephen Ferg

Python Decorators

In August 2009, I wrote a post titled Introduction to Python Decorators. It was an attempt to explain Python decorators in a way that I (and I hoped, others) could grok.

Recently I had occasion to re-read that post. It wasn’t a pleasant experience — it was pretty clear to me that the attempt had failed.

That failure — and two other things — have prompted me to try again.

There is an old saying to the effect that “Every stick has two ends, one by which it may be picked up, and one by which it may not.” I believe that most explanations of decorators fail because they pick up the stick by the wrong end.

In this post I will show you what the wrong end of the stick looks like, and point out why I think it is wrong. And I will show you what I think the right end of the stick looks like.


The wrong way to explain decorators

Most explanations of Python decorators start with an example of a function to be decorated, like this:

def aFunction():
    print("inside aFunction")

and then add a decoration line, which starts with an @ sign:

@myDecorator
def aFunction():
    print("inside aFunction")

At this point, the author of the introduction often defines a decorator as the line of code that begins with the “@”. (In my older post, I called such lines “annotation” lines. I now prefer the term “decoration” line.)

For instance, in 2008 Bruce Eckel wrote on his Artima blog

A function decorator is applied to a function definition by placing it on the line before that function definition begins.

and in 2004, Phillip Eby wrote in an article in Dr. Dobb’s Journal

Decorators may appear before any function definition…. You can even stack multiple decorators on the same function definition, one per line.

Now there are two things wrong with this approach to explaining decorators. The first is that the explanation begins in the wrong place. It starts with an example of a function to be decorated and a decoration line, when it should begin with the decorator itself. The explanation should end, not start, with the decorated function and the decoration line. The decoration line is, after all, merely syntactic sugar; it is not at all an essential element in the concept of a decorator.

The second is that the term “decorator” is used incorrectly (or ambiguously) to refer both to the decorator and to the decoration line. For example, in his Dr. Dobb’s Journal article, after using the term “decorator” to refer to the decoration line, Phillip Eby goes on to define a “decorator” as a callable object.

But before you can do that, you first need to have some decorators to stack. A decorator is a callable object (like a function) that accepts one argument—the function being decorated.

So… it would seem that a decorator is both a callable object (like a function) and a single line of code that can appear before the line of code that begins a function definition. This is sort of like saying that an “address” is both a building (or apartment) at a specific location and a set of lines (written in pencil or ink) on the front of a mailing envelope. The ambiguity may be almost invisible to someone familiar with decorators, but it is very confusing for a reader who is trying to learn about decorators from the ground up.


The right way to explain decorators

So how should we explain decorators?

Well, we start with the decorator, not the function to be decorated.

We start with the basic notion of a function — a function is something that generates a value based on the values of its arguments.

We note that in Python, functions are first-class objects, so they can be passed around like other values (strings, integers, objects, etc.).
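As a small illustration of what "first-class" means here, the sketch below binds a function to a second name, stores it in a list, and passes it to another function. The names shout and apply_twice are hypothetical, chosen only for this example.

```python
def shout(text):
    """Return the text in upper case with an exclamation mark."""
    return text.upper() + "!"


def apply_twice(func, value):
    """Call func on value, then call func again on the result."""
    return func(func(value))


# A function object can be bound to another name...
alias = shout
# ...stored in a data structure...
handlers = [shout, str.strip]
# ...and passed as an argument like any other value.
result = apply_twice(shout, "hi")

print(alias("hello"))  # HELLO!
print(result)          # HI!!
```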

We note that because functions are first-class objects in Python, we can write functions that both (a) accept function objects as argument values, and (b) return function objects as return values. For example, here is a function foobar that accepts a function object original_function as an argument and returns a function object new_function as a result.

def foobar(original_function):

    # make a new function
    def new_function():
        pass  # some code

    return new_function

We define “decorator”.

A decorator is a function (such as foobar in the above example) that takes a function object as an argument, and returns a function object as a return value.

So there we have it — the definition of a decorator. Anything else that we say about decorators is a refinement of, or an expansion of, or an addition to, this definition of a decorator.

We show what the internals of a decorator look like. Specifically, we show different ways that a decorator can use the original_function in the creation of the new_function. Here is a simple example.

def verbose(original_function):

    # make a new function that prints a message when original_function starts and finishes
    def new_function(*args, **kwargs):
        print("Entering", original_function.__name__)
        result = original_function(*args, **kwargs)
        print("Exiting ", original_function.__name__)
        return result  # hand back whatever original_function returned

    return new_function

We show how to invoke a decorator — how we can pass into a decorator one function object (its input) and get back from it a different function object (its output). In the following example, we pass the widget_func function object to the verbose decorator, and we get back a new function object to which we assign the name talkative_widget_func.

def widget_func():
    pass  # some code

talkative_widget_func = verbose(widget_func)
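To see the invocation in action, here is a runnable sketch of the same steps. The body of widget_func is a hypothetical stand-in for "some code", and this version of verbose also passes the return value of the original function through.

```python
def verbose(original_function):
    # Wrap the call to original_function in two messages.
    def new_function(*args, **kwargs):
        print("Entering", original_function.__name__)
        result = original_function(*args, **kwargs)
        print("Exiting ", original_function.__name__)
        return result  # pass the original return value through

    return new_function


def widget_func():
    # Hypothetical body, standing in for "some code".
    print("building a widget")
    return 42


talkative_widget_func = verbose(widget_func)

# The original is untouched; the new function object adds the messages.
value = talkative_widget_func()
print(value)                                 # 42
print(talkative_widget_func is widget_func)  # False
```

Calling widget_func() directly still prints only "building a widget"; the decorator did not modify it, it built a second function around it.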

We point out that decorators are often used to add features to the original_function. Or more precisely, decorators are often used to create a new_function that does roughly what original_function does, but also does things in addition to what original_function does.

And we note that the output of a decorator is typically used to replace the original function that we passed in to the decorator as an argument. A typical use of decorators looks like this. (Note the change to line 4 from the previous example.)

def widget_func():
    pass  # some code

widget_func = verbose(widget_func)

So for all practical purposes, in a typical use of a decorator we pass a function (widget_func) through a decorator (verbose) and get back an enhanced (or souped-up, or “decorated”) version of the function.

We introduce Python’s “decoration syntax” that uses the “@” to create decoration lines. This feature is basically syntactic sugar that makes it possible to re-write our last example this way:

@verbose
def widget_func():
    pass  # some code

The result of this example is exactly the same as the previous example — after it executes, we have a widget_func that has all of the functionality of the original widget_func, plus the functionality that was added by the verbose decorator.

Note that in this way of explaining decorators, the “@” and decoration syntax is one of the last things that we introduce, not one of the first.

And we absolutely do not refer to line 1 as a “decorator”. We might refer to line 1 as, say, a “decorator invocation line” or a “decoration line” or simply a “decoration”… whatever. But line 1 is not a “decorator”.

Line 1 is a line of code. A decorator is a function — a different animal altogether.
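To make the distinction concrete, the following sketch applies the same decorator both ways: once by explicit reassignment and once with a decoration line. The function names widget_a and widget_b are hypothetical. In both cases the name ends up bound to the function that the decorator returned.

```python
def verbose(original_function):
    def new_function(*args, **kwargs):
        print("Entering", original_function.__name__)
        result = original_function(*args, **kwargs)
        print("Exiting ", original_function.__name__)
        return result

    return new_function


# Form 1: explicit reassignment.
def widget_a():
    return "widget"


widget_a = verbose(widget_a)


# Form 2: decoration line.
@verbose
def widget_b():
    return "widget"


# Both names now refer to a new_function object built by the decorator.
print(widget_a.__name__, widget_b.__name__)  # new_function new_function
```

The printed names show that the wrapper has replaced the original under its own name. If that matters, functools.wraps from the standard library can copy the original function's name and docstring onto the wrapper.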


Once we’ve nailed down these basics, there are a few advanced features to be covered.

A decorators cookbook

The material that we’ve covered up to this point is what any basic introduction to Python decorators would cover. But a Python programmer needs something more in order to be productive with decorators. He (or she) needs a catalog of recipes, patterns, examples, and commentary that describes / shows / explains when and how decorators can be used to accomplish specific tasks. (Ideally, such a catalog would also include examples and warnings about decorator gotchas and anti-patterns.) Such a catalog might be called “Python Decorator Cookbook” or perhaps “Python Decorator Patterns”.

So that’s it. I’ve described what I think is wrong (well, let’s say suboptimal) about most introductions to decorators. And I’ve sketched out what I think is a better way to structure an introduction to decorators.

Now I can explain why I like Matt Harrison’s e-book Guide to: Learning Python Decorators. Matt’s introduction is structured in the way that I think an introduction to decorators should be structured. It picks up the stick by the proper end.

The first two-thirds of the Guide hardly talk about decorators at all. Instead, Matt begins with a thorough discussion of how Python functions work. By the time the discussion gets to decorators, we have been given a strong understanding of the internal mechanics of functions. And since most decorators are functions (remember our definition of decorator), at that point it is relatively easy for Matt to explain the internal mechanics of decorators.

Which is just as it should be.

Revised 2012-11-26 — replaced the word “annotation” with “decoration”, following terminology ideas discussed in the comments.

June 24, 2017 06:52 PM


On Starting with Summer Training at #dgplug

I started out with a very vague idea, of learning programming last year.

I went to Pycon India, fell in love with the community, decided to learn software, and came home all charged up.
(Btw, I was so intimidated, I did not speak to a single soul)

The plan was to sort personal issues, tackle a couple of major work projects so that I could then focus on learning, clear the decks and go full steam ahead come April.

While I made headway, I was also missing the hum and bustle of Pycon that had so charged me. But I did remember that the one session that had left me smiling was a sponsored talk, of all things, by a certain Mr. Das. Off the cuff, naturally, warmly delivered.

So as I was looking for … someone to talk to, somewhere to belong, who comes along but Santa Das.

While that trip didn't quite happen due to personal reasons, we still kept in touch.
(Why he would do that with a newbie, know nothing like me, I do not know. The man has a large heart)

And when the new session of #dgplug was announced, I jumped at the chance!

To close, here’s a little something, something about me [1]

  1. Yes, I am obviously hiding my big, fat tummy in the pic :) [2]
  2. I’m like a poor man’s, still failing James Altucher.
  3. Yes, I’m a lot older than most of you :) [3]
  4. I’ve been at this IT thing a long time. (since 1997, in fact) [4]
  5. And yes, only now do I get the bright idea to learn software.
  6. I love the fact, that I get you to be my plus-minus-equal
  7. You folks make me feel all warm and enthusiastic and welcoming and make me feel like I found my tribe!
  8. I’m still head over heels in love with my better half

I look to learn so much from you and know so much more of you over the coming months.
I wish you all make good art!

  [1] My grandma says that :)
  [2] dropped 7 kgs to 89. Only another 20 to go!
  [3] not necessarily wiser :P
  [4] land line telephone fixer man, hardware tech support at small firm, hardware tech support at huge firm, freelance engineer, consulting engineer, consulting manager.

June 24, 2017 06:33 PM

Weekly Python StackOverflow Report

(lxxix) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2017-06-24 17:55:47 GMT

  1. Loop through a list in Python and modify it - [40/4]
  2. How can I take two tuples to produce a dictionary? - [12/2]
  3. How to receive an update notification when a user enables 2-step verification? - [10/0]
  4. Python: Byte code of a compiled script differs based on how it was compiled - [7/1]
  5. Check if a string contains the list elements - [6/6]
  6. Creating a dictionary for each word in a file and counting the frequency of words that follow it - [6/5]
  7. Pandas: Count the first consecutive True values - [6/4]
  8. Does declaring variables in a function called from __init__ still use a key-sharing dictionary? - [6/2]
  9. What exactly does super() return in Python 3? - [6/2]
  10. Scipy sparse matrix exponentiation: a**16 is slower than a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a? - [6/1]

June 24, 2017 06:11 PM

Peter Bengtsson

How to do performance micro benchmarks in Python

Suppose that you have a function and you wonder, "Can I make this faster?" Well, you might already have thought that and you might already have a theory. Or two. Or three. Your theory might be sound and likely to be right, but before you go anywhere with it you need to benchmark it first. Here are some tips and scaffolding for doing Python function benchmark comparisons.


  1. Internally, Python will warm up and it's likely that your function depends on other things such as databases or IO. So it's important that you don't test function1 first and then function2 immediately after because function2 might benefit from a warm up painfully paid for by function1. So mix up the order of them or cycle through them enough that they all pay for or gain from warm ups.

  2. Look at the median first. The mean (aka. average) is often tainted by spikes and these spikes of slow-down can be caused by your local Spotify client deciding to reindex itself or some such. Sometimes those spikes matter. For example, garbage collection is inevitable and will have an effect that matters.

  3. Run your functions many times. So many times that the whole benchmark takes a while. Like tens of seconds or more. Also, if you run it for long enough, it's likely that all candidates get punished by the same environmental effects such as garbage collection or the CPU being reassigned to something else intensive on your computer.

  4. Try to take your benchmark into different, and possibly more realistic environments. For example, don't rely on reading a file like /Users/peterbe/only/on/my/macbook when, likely, the end destination for your code is an Ubuntu server in AWS. Write your code so that it's easy to copy and paste around, like into a vi/jed editor in an ssh session somewhere.

  5. Sanity check each function before benchmarking them. No need for pytest or anything fancy but just make sure that you test them in some basic way. But the assertion testing is likely to add to the total execution time so don't do it when running the functions.

  6. Avoid "prints" inside the time measured code. A print() is I/O, an "external resource" that can make a comparison of CPU-bound performance very unfair.

  7. Don't fear testing many different functions. If you have multiple ideas of doing a function differently, it's cheap to pile them on. But be careful how you "report" because if there are many different ways of doing something you might accidentally compare different fruit without noticing.

  8. Make sure your functions take at least one parameter. I'm no Python core developer or C hacker but I know there are "murks" within a compiler and interpreter that might do what a regular memoizer might do. Also, the performance difference can be reversed on tiny inputs compared to really large ones.

  9. Be humble with the fact that 0.01 milliseconds difference when doing 10,000 iterations is probably not worth writing a more complex and harder-to-debug function.
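Several of these tips can be folded into one small helper. What follows is only a sketch under stated assumptions: the names benchmark, sum_loop and sum_builtin are hypothetical, the order is shuffled on every round (a variation on picking a random function each time), and time.perf_counter() is used as the clock.

```python
import random
import statistics
import time


def benchmark(functions, make_input, rounds=1000):
    """Time each function on fresh input, in shuffled order per round.

    Returns {function name: [elapsed milliseconds, ...]}.
    """
    times = {f.__name__: [] for f in functions}
    for _ in range(rounds):
        data = make_input()
        order = list(functions)
        random.shuffle(order)  # no function always pays for the warm up
        for func in order:
            t0 = time.perf_counter()
            func(data)
            t1 = time.perf_counter()
            times[func.__name__].append((t1 - t0) * 1000)
    return times


def sum_loop(numbers):
    total = 0
    for n in numbers:
        total += n
    return total


def sum_builtin(numbers):
    return sum(numbers)


# Sanity check first, outside the measured region.
sample = list(range(100))
assert sum_loop(sample) == sum_builtin(sample) == 4950

results = benchmark([sum_loop, sum_builtin], lambda: list(range(1000)), rounds=200)
for name, numbers in results.items():
    print(name, "median ms:", statistics.median(numbers))
```

Shuffling per round means every candidate is called the same number of times, which makes the per-function sample sizes directly comparable.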

The Boilerplate

Let's demonstrate with an example:

# The functions to compare
import math

def f1(degrees):
    return math.cos(degrees)

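def f2(degrees):
    e = 2.718281828459045
    # Euler's formula: cos(x) = (e**(ix) + e**(-ix)) / 2; keep the real part
    return (
        (e**(degrees * 1j) + e**-(degrees * 1j)) / 2
    ).real

# Assertions (isclose, because the two formulas can differ in the last digits)
assert math.isclose(f1(100), f2(100)) and math.isclose(f1(100), 0.862318872287684)
assert math.isclose(f1(1), f2(1)) and math.isclose(f1(1), 0.5403023058681398)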

# Reporting
import time
import random
import statistics

functions = f1, f2
times = {f.__name__: [] for f in functions}

for i in range(100000):  # adjust accordingly so whole thing takes a few sec
    func = random.choice(functions)
    t0 = time.time()
    func(i)  # the call being timed
    t1 = time.time()
    times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))

Let's break that down a bit.

You run that and get something like this:

FUNCTION: f1 Used 49990 times
    MEDIAN 0.0
    MEAN   0.00045161219591330375
    STDEV  0.0011268475946446341
FUNCTION: f2 Used 50010 times
    MEDIAN 0.00095367431640625
    MEAN   0.0009188626294516487
    STDEV  0.000642871632138125
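A median of exactly 0.0 suggests that the clock could not resolve calls this short. One possible workaround, sketched here and not part of the boilerplate above, is to switch to time.perf_counter() and to time a whole batch of calls per sample. The helper name time_batch and the batch size are hypothetical.

```python
import math
import statistics
import time


def time_batch(func, arg, batch=1000):
    """Return the average milliseconds per call over one batch of calls."""
    t0 = time.perf_counter()
    for _ in range(batch):
        func(arg)
    t1 = time.perf_counter()
    return (t1 - t0) * 1000 / batch


samples = [time_batch(math.cos, 1.0) for _ in range(50)]
print("median ms per call:", statistics.median(samples))

# The resolution each clock reports on this machine, in seconds.
print("time.time resolution:   ", time.get_clock_info("time").resolution)
print("perf_counter resolution:", time.get_clock_info("perf_counter").resolution)
```

The batch loop itself adds a little overhead to every sample, so the numbers are best used for comparing candidates rather than as absolute costs.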

More Examples

The example above already broke one of the tenets in that these functions were simply too fast. Doing rather basic mathematics is just too fast to compare with such a trivial benchmark. Here are some other examples:

Remove duplicates from list without losing order

# The functions to compare

def f1(seq):
    checked = []
    for e in seq:
        if e not in checked:
            checked.append(e)
    return checked

def f2(seq):
    checked = []
    seen = set()
    for e in seq:
        if e not in seen:
            checked.append(e)
            seen.add(e)
    return checked

def f3(seq):
    checked = []
    [checked.append(i) for i in seq if not checked.count(i)]
    return checked

def f4(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]

def f5(seq):
    def generator():
        seen = set()
        for x in seq:
            if x not in seen:
                seen.add(x)  # remember x so later repeats are skipped
                yield x

    return list(generator())

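# Assertion
import random
import string

def _random_seq(length):
    seq = []
    for _ in range(length):
        # assumption: the sampled alphabet was lost from the listing; any
        # small alphabet that forces duplicates serves the purpose
        seq.append(random.choice(string.ascii_letters))
    return seq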

L = list('abca')
assert f1(L) == f2(L) == f3(L) == f4(L) == f5(L) == list('abc')
L = _random_seq(10)
assert f1(L) == f2(L) == f3(L) == f4(L) == f5(L)

# Reporting
import time
import statistics

functions = f1, f2, f3, f4, f5
times = {f.__name__: [] for f in functions}

for i in range(3000):
    seq = _random_seq(i)
    for _ in range(len(functions)):
        func = random.choice(functions)
        t0 = time.time()
        func(seq)  # the call being timed
        t1 = time.time()
        times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))


FUNCTION: f1 Used 3029 times
    MEDIAN 0.6871223449707031
    MEAN   0.6917867380307822
    STDEV  0.42611748137761174
FUNCTION: f2 Used 2912 times
    MEDIAN 0.054955482482910156
    MEAN   0.05610262627130026
    STDEV  0.03000829926668248
FUNCTION: f3 Used 2985 times
    MEDIAN 1.4472007751464844
    MEAN   1.4371055654145566
    STDEV  0.888658217522005
FUNCTION: f4 Used 2965 times
    MEDIAN 0.051975250244140625
    MEAN   0.05343245816673035
    STDEV  0.02957275548477728
FUNCTION: f5 Used 3109 times
    MEDIAN 0.05507469177246094
    MEAN   0.05678296204202234
    STDEV  0.031521596461048934


The fastest by median is f4:

def f4(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]
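The comprehension in f4 is compact enough to deserve a second look. The sketch below spells out why it works, and adds one alternative that is not from the post: dict.fromkeys, which relies on dictionaries keeping insertion order (guaranteed from Python 3.7). The function names dedupe_f4 and dedupe_dict are hypothetical.

```python
def dedupe_f4(seq):
    seen = set()
    # set.add() returns None, so "not seen.add(x)" is always True. It is only
    # evaluated when "x not in seen" held, and records x as a side effect.
    return [x for x in seq if x not in seen and not seen.add(x)]


def dedupe_dict(seq):
    # Alternative, not from the post: dict keys are unique and ordered.
    return list(dict.fromkeys(seq))


letters = list("abca")
print(dedupe_f4(letters))      # ['a', 'b', 'c']
print(dedupe_dict(letters))    # ['a', 'b', 'c']
print(set().add("x") is None)  # True
```

Both versions need hashable items. Whether the dict variant is faster would have to be measured with the same boilerplate, not assumed.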

Fastest way to count the number of lines in a file

# The functions to compare
import codecs
import subprocess

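def f1(filename):
    count = 0
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        for line in f:
            count += 1
    return count

def f2(filename):
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        # assumption: the argument of len() was lost from the listing;
        # counting the lines of the whole decoded text matches f1
        return len(f.read().splitlines())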

def f3(filename):
    return int(subprocess.check_output(['wc', '-l', filename]).split()[0])

# Assertion
filename = 'big.csv'
assert f1(filename) == f2(filename) == f3(filename) == 9999

# Reporting
import time
import statistics
import random

functions = f1, f2, f3
times = {f.__name__: [] for f in functions}

filenames = '', 'hacker_news_data.txt', 'yarn.lock', 'big.csv'
for _ in range(200):
    for fn in filenames:
        for func in functions:
            t0 = time.time()
            func(fn)  # the call being timed
            t1 = time.time()
            times[func.__name__].append((t1 - t0) * 1000)

for name, numbers in times.items():
    print('FUNCTION:', name, 'Used', len(numbers), 'times')
    print('\tMEDIAN', statistics.median(numbers))
    print('\tMEAN  ', statistics.mean(numbers))
    print('\tSTDEV ', statistics.stdev(numbers))


FUNCTION: f1 Used 800 times
    MEDIAN 5.852460861206055
    MEAN   25.403797328472137
    STDEV  37.09347378640582
FUNCTION: f2 Used 800 times
    MEDIAN 0.45299530029296875
    MEAN   2.4077045917510986
    STDEV  3.717931526478758
FUNCTION: f3 Used 800 times
    MEDIAN 2.8804540634155273
    MEAN   3.4988239407539368
    STDEV  1.3336427480808102


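The fastest by median is f2:

def f2(filename):
    with codecs.open(filename, encoding='utf-8', errors='ignore') as f:
        # assumption: the argument of len() was lost from the listing
        return len(f.read().splitlines())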


No conclusion really. Just wanted to point out that this is just a hint of a decent start when doing performance benchmarking of functions.

There is also the built-in timeit module, which "provides a simple way to time small bits of Python code", but it has the disadvantage that your functions are not allowed to be as complex. Also, it's harder to generate multiple different fixtures to feed your functions without that fixture generation affecting the times.
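For comparison, here is a minimal sketch of what the timeit route could look like. timeit.repeat accepts a callable, so a fixture can be built once, outside the timed call. The fixture data and the numbers of runs are arbitrary choices for this example.

```python
import timeit


def f1(seq):
    checked = []
    for e in seq:
        if e not in checked:
            checked.append(e)
    return checked


def f4(seq):
    seen = set()
    return [x for x in seq if x not in seen and not seen.add(x)]


data = list("abcabcxyz" * 50)  # fixture built once, outside the timed call

for func in (f1, f4):
    # repeat() returns one total (in seconds) for each run of `number` calls
    runs = timeit.repeat(lambda: func(data), number=200, repeat=5)
    print(func.__name__, "best of 5:", min(runs) / 200 * 1000, "ms per call")
```

Unlike the boilerplate, this times each function in one uninterrupted stretch, so it does not interleave the candidates.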

There are a lot of things that could be improved in this boilerplate, such as sorting by winner, showing percentage comparisons against the fastest, ASCII graphs, memory allocation differences, etc. That's up to you.
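As one possible starting point for the first two of those improvements, here is a sketch of a report function that sorts by median and shows each candidate relative to the fastest. The function name report and the timings fed into it are made up for illustration.

```python
import statistics


def report(times):
    """Print functions sorted by median, each as a percentage of the fastest."""
    medians = {name: statistics.median(nums) for name, nums in times.items()}
    fastest = min(medians.values())
    rows = []
    for name, median in sorted(medians.items(), key=lambda item: item[1]):
        # Guard against a 0.0 median caused by a coarse clock.
        percent = median / fastest * 100 if fastest else float("nan")
        rows.append((name, median, percent))
        print("{:<6} median {:.4f} ms {:7.0f}% of fastest".format(name, median, percent))
    return rows


# Hypothetical timings in milliseconds, made up for this example.
example_times = {
    "f1": [0.70, 0.68, 0.69],
    "f2": [0.055, 0.054, 0.056],
    "f4": [0.050, 0.052, 0.051],
}
rows = report(example_times)
```

It takes the same times dictionary that the boilerplate builds, so it can replace the final printing loop.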

June 24, 2017 01:50 PM

June 23, 2017

Patrick Kennedy

Using Docker for Flask Application Development (not just Production!)


I’ve been using Docker for my staging and production environments, but I’ve recently figured out how to make Docker work for my development environment as well.

When I work on my personal web applications, I have three environments: development, staging, and production.

While having a development environment that is significantly different (i.e. not using Docker) from the staging/production environments is not an issue, I’ve really enjoyed the switch to using Docker for development.

The key aspects that were important to me when deciding to switch to Docker for my development environment were:

  1. Utilize the Flask development server instead of a production web server (Gunicorn)
  2. Allow easy access to my database (Postgres)
  3. Maintain my unit/integration testing capability

This blog post shows how to configure Docker and Docker Compose for creating a development environment that you can easily use on a day-to-day basis for developing a Flask application.

For reference, my Flask project that is the basis for this blog post can be found on GitLab.


The architecture for this Flask application is illustrated in the following diagram:

Docker Application Architecture

Each key component has its own sub-directory in the repository:

$ tree
├── docker-compose.yml
├── nginx
│   ├── Dockerfile
├── postgresql
│   └── Dockerfile  * Not included in git repository
└── web
    ├── Dockerfile
    ├── instance
    ├── project
    ├── requirements.txt

Configuration of Dockerfiles and Docker Compose for Production

The setup for my application utilizes separate Dockerfiles for the web application, Nginx, and Postgres; these services are integrated together using Docker Compose.

Web Application Service

Originally, I had been using the python-*:onbuild image for my web application image, as this seemed like a convenient and reasonable option (it provided the standard configurations for a python project). However, according to the notes on the python page on Docker Hub, the use of the python-*:onbuild images is no longer recommended.

Therefore, I created a Dockerfile that I use for my web application:

FROM python:3.6.1
MAINTAINER Patrick Kennedy <>

# Create the working directory
RUN mkdir -p /usr/src/app/web
WORKDIR /usr/src/app/web

# Install the package dependencies (this step is separated
# from copying all the source code to avoid having to
# re-install all python packages defined in requirements.txt
# whenever any source code change is made)
COPY requirements.txt /usr/src/app/web
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source code into the container
COPY . /usr/src/app/web

It may seem odd or out of sequence to copy the requirements.txt file from the local system into the container separately from the entire repository, but this is intentional. If you copy over the entire repository and then ‘pip install’ all the packages in requirements.txt, any change in the repository will cause all the packages to be re-installed (this can take a long time and is unnecessary) when you build this container. A better approach is to first just copy over the requirements.txt file and then run ‘pip install’. If changes are made to the repository (not to requirements.txt), then the cached intermediate container (or layer in your service) will be utilized. This is a big time saver, especially during development. Of course, if you make a change to requirements.txt, this will be detected during the next build and all the python packages will be re-installed in the intermediate container.

Nginx Service

Here is the Dockerfile that I use for my Nginx service:

FROM nginx:1.11.3
RUN rm /etc/nginx/nginx.conf
COPY nginx.conf /etc/nginx/
RUN rm /etc/nginx/conf.d/default.conf
COPY family_recipes.conf /etc/nginx/conf.d/

There is a lot of complexity when it comes to configuring Nginx, so please refer to my blog post entitled ‘How to Configure Nginx for a Flask Web Application‘.

Postgres Service

The Dockerfile for the postgres service is very simple, but I actually use a python script to auto-generate it based on the credentials of my postgres database. The structure of the Dockerfile is:

FROM postgres:9.6

# Set environment variables
ENV POSTGRES_USER <postgres_user>
ENV POSTGRES_PASSWORD <postgres_password>
ENV POSTGRES_DB <postgres_database>

Docker Compose

Docker Compose is a great tool for connecting different services (i.e. containers) to create a fully functioning application. The configuration of the application is defined in the docker-compose.yml file:

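version: '2'

# Key names (services, expose, volumes, depends_on, ...) are reconstructed;
# the extracted listing preserved only the values.
services:
  web:
    restart: always
    build: ./web
    expose:
      - "8000"
    volumes:
      - /usr/src/app/web/project/static
    command: /usr/local/bin/gunicorn -w 2 -b :8000 project:app
    depends_on:
      - postgres

  nginx:
    restart: always
    build: ./nginx
    ports:
      - "80:80"
    volumes:
      - /www/static
    volumes_from:
      - web
    depends_on:
      - web

  data:
    image: postgres:9.6
    volumes:
      - /var/lib/postgresql
    command: "true"

  postgres:
    restart: always
    build: ./postgresql
    volumes_from:
      - data
    expose:
      - "5432"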

The following commands need to be run to build and then start these containers:

  docker-compose build
  docker-compose -f docker-compose.yml up -d

Additionally, I utilize a script to re-initialize the database, which is frequently used in the staging environment:

  docker-compose run --rm web python ./instance/

To see the application, open your favorite web browser and navigate to http://ip_of_docker_machine/. The command ‘docker-machine ip’ will tell you the IP address to use.

Changes Needed for Development Environment

The easiest way to make the necessary changes for the development environment is to create the changes in the docker-compose.override.yml file.

Docker Compose automatically checks for docker-compose.yml and docker-compose.override.yml when the ‘up’ command is used. Therefore, in development use ‘docker-compose up -d’ and in production or staging use ‘docker-compose -f docker-compose.yml up -d’ to prevent the loading of docker-compose.override.yml.

Here are the contents of the docker-compose.override.yml file:

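version: '2'

# Key names (services, ports, environment, volumes) are reconstructed;
# the extracted listing preserved only the values.
services:
  web:
    build: ./web
    ports:
      - "5000:5000"
    environment:
      - FLASK_DEBUG=1
    volumes:
      - ./web/:/usr/src/app/web
    command: flask run --host=

  postgres:
    ports:
      - "5432:5432"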

Each line in the docker-compose.override.yml overrides the applicable setting from docker-compose.yml.
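To picture what "overrides the applicable setting" means, here is a deliberately simplified model in Python. It is not how Docker Compose is implemented: it assumes that single values are replaced and list values are appended, whereas the real tool has more detailed rules. The function merge_service, the key names, and the shortened "flask run" command are assumptions made for this sketch.

```python
def merge_service(base, override):
    """Very simplified model of how an override file changes one service.

    Assumption: single values are replaced and list values are appended.
    Real Docker Compose has more detailed rules (for example, it merges
    environment entries by variable name).
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


# Values taken from the two files in this post; key names are assumed.
base_web = {
    "restart": "always",
    "build": "./web",
    "command": "/usr/local/bin/gunicorn -w 2 -b :8000 project:app",
}
override_web = {
    "build": "./web",
    "ports": ["5000:5000"],
    "environment": ["FLASK_DEBUG=1"],
    "command": "flask run",  # host option left out of this sketch
}

dev_web = merge_service(base_web, override_web)
print(dev_web["restart"])  # always (kept from the base file)
print(dev_web["ports"])    # ['5000:5000'] (added by the override)
print(dev_web["command"])  # flask run (replaces the Gunicorn command)
```

The point of the model is only that settings missing from the override file, such as restart, survive from the base file.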

Web Application Service

For the web application container, the web server is being switched from Gunicorn (used in production) to the Flask development server. The Flask development server allows auto-reloading of the application whenever a change is made and has debugging capability right in the browser when an exception occurs. These are great features to have during development. Additionally, port 5000 is now accessible from the web application container. This allows the developer to gain access to the Flask web server by navigating to http://ip_of_docker_machine:5000.

Postgres Service

For the postgres container, the only change that is made is to allow access to port 5432 by the host machine instead of just other services. For reference, here is a good explanation of the use of ‘ports’ vs. ‘expose’ from Stack Overflow.

This change allows direct access to the postgres database using the psql shell. When accessing the postgres database, I prefer specifying the URI:

psql postgresql://<username>:<password>@<postgres_database>

This allows you access to the postgres database, which will come in really handy at some point during development (almost a guarantee).
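If the same URI is needed from a script, it can be assembled in Python. The sketch below assumes the usual user:password@host:port/database layout of a PostgreSQL connection URI. Every value in it, and the helper name postgres_uri, is a hypothetical placeholder. Percent-encoding the credentials keeps characters such as "@" or "/" in a password from breaking the URI.

```python
from urllib.parse import quote


def postgres_uri(user, password, host, port, database):
    """Build a postgresql:// URI, percent-encoding the credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


# All values below are hypothetical placeholders.
uri = postgres_uri("app_user", "p@ss/word", "localhost", 5432, "app_db")
print(uri)  # postgresql://app_user:p%40ss%2Fword@localhost:5432/app_db
```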

Nginx Service

While there are no override commands for the Nginx service, this service will be basically ignored during development, as the web application is accessed directly through the Flask web server by navigating to http://ip_of_docker_machine:5000/. I have not found a clear way to disable a service, so the Nginx service is left untouched.

Running the Development Application

The following commands should be run to build and run the containers:

docker-compose stop    # If there are existing containers running, stop them
docker-compose build
docker-compose up -d

Since you are running in a development environment with the Flask development server, you will need to navigate to http://ip_of_docker_machine:5000/ to access the application. The command ‘docker-machine ip’ will tell you the IP address to use.

Another helpful command that allows quick access to the logs of a specific container is:

docker-compose logs <service>

For example, to see the logs of the web application, run ‘docker-compose logs web’. In the development environment, you should see something similar to:

$ docker-compose logs web
Attaching to flaskrecipeapp_web_1
web_1 | * Running on (Press CTRL+C to quit)
web_1 | * Restarting with stat
web_1 | * Debugger is active!
web_1 | * Debugger pin code: ***-***-***


Docker is an amazing product that I have really come to enjoy using for my development environment. I really feel that using Docker makes you think about your entire architecture, as Docker provides such an easy way to start integrating complex services, like web services, databases, etc.

Using Docker for a development environment does require a good deal of setup, but once you have the configuration working, it’s a great way for quickly developing your application while still having that one foot towards the production environment.


Docker Compose File (version 2) Reference

Dockerizing Flask With Compose and Machine – From Localhost to the Cloud
NOTE: This was the blog post that got me really excited to learn about Docker!

Docker Compose for Development and Production – GitHub – Antonis Kalipetis
Also, check out Antonis’ talk from DockerCon17 on YouTube.

Overview of Docker Compose CLI

Docker Command Reference

Start or Re-start Docker Machine:
$ docker-machine start default
$ eval $(docker-machine env default)

Build all of the images in preparation for running your application:
$ docker-compose build

Using Docker Compose to run the multi-container application (in daemon mode):
$ docker-compose up -d
$ docker-compose -f docker-compose.yml up -d

View the logs from the different running containers:
$ docker-compose logs
$ docker-compose logs web # or whatever service you want

Stop all of the containers that were started by Docker Compose:
$ docker-compose stop

Run a command in a specific container:
$ docker-compose run --rm web python ./instance/
$ docker-compose run web bash

Check the containers that are running:
$ docker ps

Stop all running containers:
$ docker stop $(docker ps -a -q)

Delete all running containers:
$ docker rm $(docker ps -a -q)

Delete all untagged Docker images:
$ docker rmi $(docker images | grep "^" | awk '{print $3}')

June 23, 2017 03:31 PM

Django Weekly

Django Weekly Issue 44 - Django vs Flask, Translation, Kubernetes, Google Authentication and more

Worthy Read

This is the first part of a series about Django performance optimization. It will cover logging, debug toolbar, for testing, Silk etc.

This analysis is a comparison of 2 python frameworks, Flask and Django. It discusses their features and how their technical philosophies impact software developers. It is based on my experience using both, as well as time spent personally admiring both codebases.
web framework

Gartner’s Recommendations for Long-Term Pipeline Success.

After reading this article, you have a basic understanding of translating a Django app.

Health checks are a great way to help Kubernetes help your app to have high availability, and that includes Django apps.

Illustration of request processing in Django from browser to back.

How to build a simple google authentication app on Django framework.

Automation in Django is a developer's dream. Tedious work such as creating database backups, reporting annual KPIs, or even blasting email can be made a breeze. Celery, a well-known Python tool for delegating tasks, makes such actions possible.

Django For Beginners Book


django-admin-env-notice - 24 Stars, 0 Fork
Visually distinguish environments in Django Admin. Based on great advice from post: 5 ways to make Django Admin safer by hakibenita.

Scrum - 0 Stars, 0 Fork
Now work in a far more efficient and organized manner! The project allows users to list their tasks in a scrum order, monitor and update their progress.

django-base - 0 Stars, 0 Fork
A Dockerized Django project template with NGINX and PostgreSQL ready to go

June 23, 2017 12:50 PM


PyData EuroPython 2017

We are excited to announce a complete PyData track at EuroPython 2017 in Rimini, Italy from the 9th to 16th July.



The PyData track will be part of EuroPython 2017, so you won’t need to buy an extra ticket to attend. Most talks and trainings are scheduled for Wednesday, Thursday and Friday (July 12-14), with a few on other days as well.

We will have over 40 talks, 5 trainings, and 2 keynotes dedicated to PyData. If you’d like to attend PyData EuroPython 2017, please register for EuroPython 2017 soon.


EuroPython 2017 Team
EuroPython Society
EuroPython 2017 Conference

June 23, 2017 10:29 AM

Fabio Zadrozny

mu-repo: Dealing with multiple git repositories

It's been a while since I've commented about mu-repo, so, now that 1.6.0 is available, I decided to give some more details on the latest additions ;)

-- if you're reading this and don't know what mu-repo is, it's a tool (written in Python) that helps when dealing with multiple git repositories (providing a way to call git commands on multiple repositories at once, along with some other bells and whistles). has more details.

The last 2 major things that were introduced were:

1. A workflow for creating code-reviews in multiple repositories at once.

2. The possibility of executing non-git commands on multiple repositories at once.

For #1, the command mu open-url was created. Mostly, it'll compare the current branch against a different branch and open browser tabs making replacements in the url passed with the name of the repository ( has more info and examples on how to use this for common git hosting platforms).

For #2, it's possible to execute a given command in the multiple tracked repositories by using the mu sh command. Mostly, call mu sh and pass the command you want to issue in the multiple tracked repositories.

e.g.: calling mu sh python develop will call python develop on each of the tracked repository directories.
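To make the idea behind mu sh concrete, here is a minimal sketch of running one command in several repository directories. It is not mu-repo's implementation; the function name `run_in_each` and its return shape are assumptions made for illustration, and only the standard library is used.

```python
import subprocess
import sys


def run_in_each(repo_dirs, command):
    """Run `command` (a list of arguments) once per directory.

    Hypothetical helper: returns {directory: (exit_code, stdout)}.
    """
    results = {}
    for repo in repo_dirs:
        completed = subprocess.run(
            command,
            cwd=str(repo),        # execute inside the repository directory
            capture_output=True,  # collect output instead of printing it
            text=True,
        )
        results[str(repo)] = (completed.returncode, completed.stdout.strip())
    return results


if __name__ == "__main__":
    # Example: print the working directory from inside each given directory.
    for path, (code, out) in run_in_each(["."], [sys.executable, "-c", "import os; print(os.getcwd())"]).items():
        print(path, code, out)
```

Running the commands one after another keeps the output easy to read; a real tool may well run them in parallel.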

That's it... enjoy!

June 23, 2017 09:07 AM

Brad Lucas

Python Virtualenv


Virtualenv supports the creation of isolated Python environments. This allows you to create your project with all of its dependencies in one place. Not only does this allow for a simpler deployment path when you release your project, but it also makes trying different versions of libraries and experimenting safer.

The following is a good intro for virtualenv.


Step 1 is to install virtualenv. When you run the following you'll be installing virtualenv on your machine.

$ pip install virtualenv

You should be able to execute the command virtualenv afterwards.

Once you can, move on.


Here is how I use virtualenv

I create my project directory such as:

$ cd /home/brad/projects/
$ mkdir example
$ cd example

Then I set up a virtual environment for this project with the following command:

$ virtualenv env

This differs from the site mentioned above. I do the same for every project so when I'm in each of my project directories they all have an env directory. Then everything is the same. This keeps things simple and easily remembered.

To start the virtual env, run the following from the project directory:

$ source ./env/bin/activate

Once your virtual environment is activated anything you install will be put into the environment. If you look you'll notice new directories inside of ./env/lib/python2.7/site-packages after each install.

Now, install what you need. For example,

$ pip install zipline

To see all the libraries in your environment you can run pip list.

$ pip list

When you run python in your active environment you'll have access to all the libraries you've just installed.
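If you are ever unsure whether the python you are running belongs to an activated environment, a small check like the one below can help. This is only a sketch: the function name `in_virtualenv` is made up for illustration, and it relies on `sys.real_prefix` (set by older virtualenv releases) and `sys.base_prefix` (Python 3.3+).

```python
import sys


def in_virtualenv():
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases record the original interpreter prefix in
    # sys.real_prefix; newer tools make sys.base_prefix differ from sys.prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtualenv())
```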

Lastly, to get out of the virtual env you can run deactivate.

$ deactivate

June 23, 2017 04:00 AM

Hynek Schlawack

Sharing Your Labor of Love: PyPI Quick and Dirty

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

June 23, 2017 12:00 AM

June 22, 2017

Tarek Ziade

Advanced Molotov example

Last week, I blogged about how to drive Firefox from a Molotov script using Arsenic.

It is pretty straightforward if you are doing some isolated interactions with Firefox and if each worker in Molotov lives its own life.

However, if you need to have several "users" (==workers in Molotov) running in a coordinated way on the same web page, it gets a little bit tricky.

Each worker is its own coroutine and triggers the execution of one scenario by calling the coroutine that was decorated with @scenario.

Let's consider this simple use case: we want to run five workers in parallel that all visit the same etherpad lite page with their own Firefox instance through Arsenic.

One of them is adding some content in the pad and all the others are waiting on the page to check that it is updated with that content.

So we want four workers to wait on a condition (=pad written) before they check that they can see the new content.

Moreover, since Molotov can call a scenario many times in a row, we need to make sure that everything was done in the previous round before changing the pad content again. That is, four workers did check the content of the pad.

To do all that synchronization, Python's asyncio offers primitives that are similar to the ones you would use with threads. asyncio.Event can be used for instance to have readers waiting for the writer and vice-versa.

In the example below, a class wraps two Events and exposes simple methods to do the syncing by making sure readers and writer are waiting for each other:

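import asyncio


class Notifier(object):
    def __init__(self, readers=5):
        # the counter starts at 1, so the readers event fires
        # after (readers - 1) calls to one_read()
        self._current = 1
        self._until = readers
        self._readers = asyncio.Event()
        self._writer = asyncio.Event()

    def _is_set(self):
        return self._current == self._until

    async def wait_for_writer(self):
        await self._writer.wait()

    async def one_read(self):
        if self._is_set():
            # every expected read was already counted
            return
        self._current += 1
        if self._current == self._until:
            # last reader: unblock wait_for_readers()
            self._readers.set()

    def written(self):
        # unblock every reader waiting in wait_for_writer()
        self._writer.set()

    async def wait_for_readers(self):
        await self._readers.wait()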

Using this class, the writer can call written() once it has filled the pad and the readers can wait for that event by calling wait_for_writer() which blocks until the write event is set.

one_read() is then called for each read. This second event is used by the next writer to make sure it can change the pad content after every reader did read it.
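To see the handshake without Molotov or Firefox, here is a small stand-alone sketch with one writer and four readers. The Notifier class is restated so the example runs on its own; the `pad` dictionary and the `reader`, `writer`, `demo` and `run_demo` names are stand-ins invented for illustration, not part of the real test.

```python
import asyncio


class Notifier:
    """Two-event handshake: readers wait for the writer, the writer waits for readers."""

    def __init__(self, readers=5):
        self._current = 1
        self._until = readers
        self._readers = asyncio.Event()
        self._writer = asyncio.Event()

    async def wait_for_writer(self):
        await self._writer.wait()

    async def one_read(self):
        if self._current == self._until:
            return
        self._current += 1
        if self._current == self._until:
            self._readers.set()

    def written(self):
        self._writer.set()

    async def wait_for_readers(self):
        await self._readers.wait()


async def reader(notifier, pad, seen):
    await notifier.wait_for_writer()   # block until the pad was written
    seen.append(pad["text"])           # "read" the pad
    await notifier.one_read()          # count this read


async def writer(notifier, pad):
    pad["text"] = "hello"              # "write" in the pad
    notifier.written()                 # wake up the readers
    await notifier.wait_for_readers()  # block until all four have read


async def demo():
    notifier = Notifier(readers=5)     # 5 workers: 1 writer + 4 readers
    pad, seen = {"text": None}, []
    await asyncio.gather(
        writer(notifier, pad),
        *(reader(notifier, pad, seen) for _ in range(4)),
    )
    return seen


def run_demo():
    return asyncio.run(demo())


if __name__ == "__main__":
    print(run_demo())
```

The gather() call only returns once the writer has seen all four reads, which is the same guarantee the next round of the Molotov scenario relies on.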

So how do we use this class in a Molotov test? There are several options and the simplest one is to create one Notifier instance per run and set it in a variable:

import molotov


@molotov.scenario(weight=1)
async def example(session):
    get_var = molotov.get_var
    # one Notifier per step; the factory creates it on first access
    notifier = get_var('notifier' + str(session.step),
                       factory=Notifier)
    wid = session.worker_id

    if wid != 4:
        # I am NOT worker 4! I read the pad

        # wait for worker #4 to edit the pad
        await notifier.wait_for_writer()

        # <.. pad reading here...>

        # notify that we've read it
        await notifier.one_read()
    else:
        # I am worker 4! I write in the pad
        if session.step > 1:
            # waiting for the previous readers to have finished
            # before we start a new round
            previous_notifier = get_var('notifier' + str(session.step - 1))
            await previous_notifier.wait_for_readers()

        # <... writes in the pad...>

        # informs that the write task was done
        notifier.written()

A lot is going on in this scenario. Let's look at each part in detail. First of all, the notifier is created as a var via set_var(). Its name contains the session step.

The step value is incremented by Molotov every time a worker is running a scenario, and we can use that value to create one distinct Notifier instance per run. It starts at 1.

Next, the session.worker_id value gives each distinct worker a unique id. If you run molotov with 5 workers, you will get values from 0 to 4.

We are making the last worker (worker id== 4) the one that will be in charge of writing in the pad.

For the other workers (=readers), they just use wait_for_writer() to sit and wait for worker 4 to write the pad. worker 4 notifies them with a call to written().

The last part of the script allows Molotov to run the script several times in a row using the same workers. When the writer starts its work, if the step value is greater than one, it means that we have already run the test at least one time.

The writer, in that case, gets back the Notifier from the previous run and verifies that all the readers did their job before changing the pad.

All of this syncing work sounds complicated, but once you understand the pattern, it lets you run advanced scenarios in Molotov where several concurrent "users" need to collaborate.

You can find the full script at

June 22, 2017 10:00 PM

Stephen Ferg

Unicode for dummies — Encoding

Another entry in an irregular series of posts about Unicode.
Typos fixed 2012-02-22. Thanks Anonymous, and Clinton, for reporting the typos.

This is a story about encoding and decoding, with a minor subplot involving Unicode.

As our story begins — on a dark and stormy night, of course — we find our protagonist deep in thought. He is asking himself “What is an encoding?”

What is an encoding?

The basic concepts are simple. First, we start with the idea of a piece of information — a message — that exists in a representation that is understandable (perspicuous) to a human being. I’m going to call that representation “plain text”. For English-language speakers, for example, English words printed on a page, or displayed on a screen, count as plain text.

Next, (for reasons that we won’t explore right now) we need to be able to translate a message in a plain-text representation into some other representation (let’s call that representation the “encoded text”), and we need to be able to translate the encoded text back into plain text. The translation from plain text to encoded text is called “encoding”, and the translation of encoded text back into plain text is called “decoding”.

There are three points worth noting about this process.

The first point is that no information can be lost during encoding or decoding. It must be possible for us to send a message on a round-trip journey — from plain text to encoded text, and then back again from encoded text to plain text — and get back exactly the same plain text that we started with. That is why, for instance, we can’t use one natural language (Russian, Chinese, French, Navaho) as an encoding for another natural language (English, Hindi, Swahili). The mappings between natural languages are too loose to guarantee that a piece of information can make the round-trip without losing something in translation.

The requirement for a lossless round-trip means that the mapping between the plain text and the encoded text must be very tight, very exact. And that brings us to the second point.

In order for the mapping between the plain text and the encoded text to be very tight — which is to say: in order for us to be able to specify very precisely how the encoding and decoding processes work — we must specify very precisely what the plain text representation looks like.

Suppose, for example, we say that plain text looks like this: the 26 upper-case letters of the Anglo-American alphabet, plus the space and three punctuation symbols: period (full stop), question mark, and dash (hyphen). This gives us a plain-text alphabet of 30 characters. If we need numbers, we can spell them out, like this: “SIX THOUSAND SEVEN HUNDRED FORTY-THREE”.

On the other hand, we may wish to say that our plain text looks like this: 26 upper-case letters, 26 lower-case letters, 10 numeric digits, the space character, and a dozen types of punctuation marks: period, comma, double-quote, left parenthesis, right parenthesis, and so on. That gives us a plain-text alphabet of 75 characters.

Once we’ve specified exactly what a plain-text representation of a message looks like — a finite sequence of characters from our 30-character alphabet, or perhaps our 75-character alphabet — then we can devise a system (a code) that can reliably encode and decode plain-text messages written in that alphabet. The simplest such system is one in which every character in the plain-text alphabet has one and only one corresponding representation in the encoded text. A familiar example is Morse code, in which “SOS” in plain text corresponds to

                ... --- ...

in encoded text.
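As a tiny illustration of such a one-to-one code, here is a sketch of a round trip through Morse code. The table covers only four letters to keep it short, and the function names `encode` and `decode` are chosen just for this example.

```python
# A deliberately tiny plain-text alphabet: four letters only.
MORSE = {"S": "...", "O": "---", "E": ".", "T": "-"}
# Decoding is the same table read in the other direction.
PLAIN = {code: letter for letter, code in MORSE.items()}


def encode(plain_text):
    """Plain text -> encoded text, one Morse symbol per character."""
    return " ".join(MORSE[ch] for ch in plain_text)


def decode(encoded_text):
    """Encoded text -> plain text."""
    return "".join(PLAIN[symbol] for symbol in encoded_text.split())


if __name__ == "__main__":
    message = "SOS"
    encoded = encode(message)
    print(encoded)                     # ... --- ...
    print(decode(encoded) == message)  # True: nothing was lost on the round trip
```

Because every letter has exactly one symbol and every symbol exactly one letter, the round trip is lossless, which is the requirement described above.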

In the real world, of course, the selection of characters for the plain-text alphabet is influenced by technological limitations on the encoded text. Suppose we have several available technologies for storing encoded messages: one technology supports an encoded alphabet of 256 characters, another technology supports only 128 encoded characters, and a third technology supports only 64 encoded characters. Naturally, we can make our plain-text alphabet much larger if we know that we can use a technology that supports a larger encoded-text alphabet.

And the reverse is also true. If we know that our plain-text alphabet must be very large, then we know that we must find — or devise — a technology capable of storing a large number of encoded characters.

Which brings us to Unicode.


Unicode was devised to be a system capable of storing encoded representations of every plain-text character of every human language that has ever existed. English, French, Spanish. Greek. Arabic. Hindi. Chinese. Assyrian (cuneiform characters).

That’s a lot of characters.

So the first task of the Unicode initiative was simply to list all of those characters, and count them. That’s the first half of Unicode, the Universal Character Set. (And if you really want to “talk Unicode”, don’t call plain-text characters “characters”. Call them “code points”.)

Once you’ve done that, you’ve got to figure out a technology for storing all of the corresponding encoded-text characters. (In Unicode-speak, the encoded-text characters are called “code values”.)

In fact Unicode defines not one but several methods of mapping code points to code values. Each of these methods has its own name. Some of the names start with “UTF”, others start with “UCS”: UTF-8, UTF-16, UTF-32, UCS-2, UCS-4, and so on. The naming convention is “UTF-” followed by the number of bits in a code value, and “UCS-” followed by the number of bytes in a code value. Some (e.g. UCS-4 and UTF-32) are functionally equivalent. See the Wikipedia article on Unicode.

The most important thing about these methods is that some are fixed-width encodings and some are variable-width encodings. The basic idea is that the fixed-width encodings are very long: UCS-4 and UTF-32 are 4 bytes (32 bits) long, long enough to hold the biggest code value that we will ever need.

In contrast, the variable-width encodings are designed to be short, but expandable. UTF-8, for example, can use as few as 8 bits (one byte) to store the code points of basic Latin (ASCII) characters. But it also has a sort of “continued on the next byte” mechanism that allows it to use 2, 3, or even 4 bytes if it needs to (as it might, for Chinese characters). For Western programmers, that means that UTF-8 is both efficient and flexible, which is why UTF-8 is the de facto standard encoding for exchanging Unicode text.
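The difference between fixed-width and variable-width encodings is easy to see in Python 3. The sketch below counts the bytes each encoding needs for a few sample characters; the helper name `widths` and the choice of samples are mine.

```python
def widths(character):
    """Return (bytes needed in UTF-8, bytes needed in UTF-32) for one character."""
    # "utf-32-be" is used so that no byte-order mark is added to the count.
    return len(character.encode("utf-8")), len(character.encode("utf-32-be"))


if __name__ == "__main__":
    for ch in ["A", "é", "中", "😀"]:
        utf8, utf32 = widths(ch)
        print(repr(ch), "UTF-8:", utf8, "bytes", "UTF-32:", utf32, "bytes")
```

UTF-32 always spends four bytes, while UTF-8 grows from one byte for "A" to four for the emoji.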

There is, then, no such thing as THE Unicode encoding system or method. There are several encoding methods, and if you want to exchange text with someone, you need explicitly to specify which encoding method you are using.

Is it, say, this.

Or this.

Or something else.

Which brings us back to something I said earlier.

Why encode something in Unicode?

At the beginning of this post I said

We start with the idea of a piece of information — a message — that exists in a representation that is understandable (perspicuous) to a human being.

Next, (for reasons that we won’t explore right now) we need to be able to translate a message in a plain-text representation into some other representation. The translation from plain text to encoded text is called “encoding”, and the translation of encoded text back into plain text is called “decoding”.

OK. So now it is time to explore those reasons. Why might we want to translate a message in a plain-text representation into some other representation?

One reason, of course, is that we want to keep a secret. We want to hide the plain text of our message by encrypting and decrypting it — basically, by keeping the algorithms for encoding and decoding secret and private.

But that is a completely different subject. Right now, we’re not interested in keeping secrets; we’re Python programmers and we’re interested in Unicode. So:

Why — as a Python programmer — would I need to be able to translate a plain-text message into some encoded representation… say, a Unicode representation such as UTF-8?

Suppose you are happily sitting at your PC, working with your favorite text editor, writing the standard Hello World program in Python (specifically, in Python 3+). This single line is your entire program.

                   print("Hello, world!")

Here, “Hello, world!” is plain text. You can see it on your screen. You can read it. You know what it means. It is just a string and you can (if you wish) do standard string-type operations on it, such as taking a substring (a slice).

But now suppose you want to put this string — “Hello, world!” — into a file and save the file on your hard drive. Perhaps you plan to send the file to a friend.

That means that you must eject your poor little string from the warm, friendly, protected home in your Python program, where it exists simply as plain-text characters. You must thrust it into the cold, impersonal, outside world of the file system. And out there it will exist not as characters, but as mere 1’s and 0’s, a jumble of dits and dots, charged and uncharged particles. And that means that your happy little plain-text string must be represented by some specific configuration of 1s and 0s, so that when somebody wants to retrieve that collection of 1s and 0s and convert it back into readable plain text, they can.

The process of converting a plain text into a specific configuration of 1s and 0s is a process of encoding. In order to write a string to a file, you must encode it using some encoding system (such as UTF-8). And to get it back from a file, you must read the file and decode the collection of 1s and 0s back into plain text.
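Here is a minimal sketch of that round trip in Python 3: encode a string to bytes, write the bytes to a file, read them back, and decode them. The function name `round_trip` and the use of a temporary file are assumptions made to keep the example self-contained.

```python
import os
import tempfile


def round_trip(text, encoding="utf-8"):
    """Encode text, store it in a temporary file, read it back and decode it."""
    encoded = text.encode(encoding)      # plain text -> bytes (1s and 0s)
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:   # binary mode: we write raw bytes
            f.write(encoded)
        with open(path, "rb") as f:
            raw = f.read()               # what is really on the disk
    finally:
        os.remove(path)
    return raw, raw.decode(encoding)     # bytes -> plain text again


if __name__ == "__main__":
    raw, text = round_trip("Hello, world!")
    print(raw)   # b'Hello, world!'
    print(text)  # Hello, world!
```

Opening a file in text mode, for example open(path, "w", encoding="utf-8"), does the same encoding step for you behind the scenes.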

The need to encode/decode strings when writing/reading them to/from files isn’t something new; it is not an additional burden imposed by Python 3’s new support for Unicode. It is something you have always done. But it wasn’t always so obvious. In earlier versions of Python, the encoding scheme was ASCII. And because, in those olden times, ASCII was pretty much the only game in town, you didn’t need to specify that you wanted to write and read your files in ASCII. Python just assumed it by default and did it. But, whether or not you realized it, whenever one of your programs wrote or read strings from a file, Python was busy behind the scenes, doing the encoding and decoding for you.

So that’s why you — as a Python programmer — need to be able to encode and decode text into, and out of, UTF-8 (or some other encoding: UTF-16, ASCII, whatever). You need to encode your strings as 1s and 0s so you can put those 1s and 0s into a file and send the file to someone else.

What is plain text?

Earlier, I said that there were three points worth noting about the encoding/decoding process, and I discussed the first two. Here is the third point.

The distinction between plain text and encoded text is relative and context-dependent.

As programmers, we think of plain text as being written text. But it is possible to look at matters differently. For instance, we can think of spoken text as the plain text, and written text as the encoded text. From this perspective, writing is encoded speech. And there are many different encodings for speech as writing. Think of Egyptian hieroglyphics, Mayan hieroglyphics, the Latin alphabet, the Greek alphabet, Arabic, Chinese ideograms, wonderfully flowing Devanagari देवनागरी, sharp pointy cuneiform wedges, even shorthand. These are all written encodings for the spoken word. They are all, as Thomas Hobbes put it, “Marks by which we may remember our thoughts”.

Which reminds us that, in a different context, even speech itself — language — may be regarded as a form of encoding. In much of early modern philosophy (think of Hobbes and Locke) speech (or language) was basically considered to be an encoding of thoughts and ideas. Communication happens when I encode my thought into language and say something — speak to you. You hear the sound of my words and decode it back into ideas. We achieve communication when I successfully transmit a thought from my mind to your mind via language. You understand me when — as a result of my speech — you have the same idea in your mind as I have in mine. (See Ian Hacking, Why Does Language Matter to Philosophy?)

Finally, note that in other contexts, the “plain text” isn’t even text. Where the plain text is soundwaves (e.g. music), it can be encoded as an mp3 file. Where the plain text is an image, it can be encoded as a gif, or png, or jpg file. Where the plain text is a movie, it can be encoded as a wmv file. And so on.

Everywhere, we are surrounded by encoding and decoding.


I’d like to recommend Eli Bendersky’s recent post on The bytes/str dichotomy in Python 3, which prodded me — finally — to put these thoughts into writing. I especially like this passage in his post.

Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we’re living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don’t care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).

I strongly recommend Charles Petzold’s wonderful book Code: The Hidden Language of Computer Hardware and Software.

And finally, I’ve found Stephen Pincock’s Codebreaker: The History of Secret Communications a delightful read. It will tell you, among many other things, how the famous WWII Navaho codetalkers could talk about submarines and dive bombers… despite the fact that there are no Navaho words for “submarine” or “dive bomber”.

June 22, 2017 09:27 PM

Philip Semanchuk

Analyzing the Anglo-Saxonicity of the Baby BNC


This is a followup to an earlier post about using Python to measure the “Anglo-Saxonicity” of a text. I’ve used my code to analyze the Baby version of the British National Corpus, and I’ve found some interesting results.

How to Measure Anglo-Saxonicity – With a Ruler or Yardstick?


Thanks to a suggestion from Ben Sizer, I decided to analyze the British National Corpus. I started with the ‘baby’ corpus which, as you might imagine, is smaller than the full corpus.

The full corpus is described as a “100 million word snapshot of British English at the end of the twentieth century”. The Baby BNC categorizes text samples into four groups: academic, conversations, fiction, and news. Below are stack plots showing the percentage of Anglo-Saxon, non-Anglo-Saxon, and unknown words for each document in each of the four groups. The Y axis shows the percentage of words in each category. The numbers along the X axis identify individual documents within the group.

I’ve deliberately given the charts non-specific names of Group A, B, C, and D so that we can play a game. :-)

Before we get to the game, here are the averages for each group in table form. (The numbers might not add exactly to 100% due to rounding.)

           Anglo-Saxon (%)   Non-Anglo-Saxon (%)   Unknown (%)
Group A    67.0              17.7                  15.3
Group B    56.1              25.8                  18.1
Group C    72.9              13.2                  13.9
Group D    58.6              22.0                  19.3

Keep in mind that “unknown” words represent shortcomings in my database more than anything else.
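For readers who want a feel for how such percentages come about, here is a toy sketch. It is not the code behind these results: the six-word lookup table, the category labels and the function name `profile` are all invented for illustration, and any word missing from the table counts as unknown.

```python
from collections import Counter

# Hypothetical, tiny etymology table; a real one needs many thousands of entries.
TOY_ETYMOLOGY = {
    "the": "anglo-saxon",
    "king": "anglo-saxon",
    "house": "anglo-saxon",
    "water": "anglo-saxon",
    "justice": "non-anglo-saxon",
    "government": "non-anglo-saxon",
}

CATEGORIES = ("anglo-saxon", "non-anglo-saxon", "unknown")


def profile(words, etymology=TOY_ETYMOLOGY):
    """Return the percentage of words in each category, rounded to one decimal."""
    counts = Counter(etymology.get(word.lower(), "unknown") for word in words)
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: round(100.0 * counts[category] / total, 1) for category in CATEGORIES}


if __name__ == "__main__":
    sample = "The king demanded justice from the government quickly".split()
    print(profile(sample))
```

In this sample, three of the eight words are missing from the toy table, which shows how a thin database inflates the unknown column.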

The Game

The Baby BNC is organized into groups of academic, conversations, fiction, and news. Groups A, B, C, and D each represent one of those groups. Which do you think is which?

Click below to reveal the answer to the game and a discussion of the results.


                    Anglo-Saxon (%)   Non-Anglo-Saxon (%)   Unknown (%)
A = Fiction         67.0              17.7                  15.3
B = Academic        56.1              25.8                  18.1
C = Conversations   72.9              13.2                  13.9
D = News            58.6              22.0                  19.3


With the hubris that only 20/20 hindsight can provide, I’ll say that I don’t find these numbers terribly surprising. Conversations have the highest proportion of Anglo-Saxon (72.9%) and the lowest of non-Anglo-Saxon (13.2%). Conversations are apt to use common words, and the 100 most common words in English are about 95% Anglo-Saxon. The relatively fast pace of conversation doesn’t encourage speakers to pause to search for those uncommon words lest they bore their listener or lose their chance to speak. I think the key here is not the fact that conversations are spoken, but that they’re impromptu. (Impromptu if you’re feeling French, off-the-cuff if you’re more Middle-English-y, or extemporaneous if you want to go full bore Latin.)

Academic writing is on the opposite end of the statistics, with the lowest portion of Anglo-Saxon words (56.1%) and the highest non-Anglo-Saxon (25.8%). Academic writing tends to be more ambitious and precise. Stylistically, it doesn’t shy away from more esoteric words because its audience is, by definition, well-educated. It doesn’t need to stick to the common core of English to get its point across. In addition, those who shaped academia were the educated members of society, and for many centuries education was tied to the church or limited to the gentry, and both spoke a lot of Latin and French. That has probably influenced even the modern day culture of academic writing.

Two of the academic samples managed to use fewer than half Anglo-Saxon words. They are a sample from Colliding Plane Waves in General Relativity (a subject Anglo-Saxons spent little time discussing, I’ll wager) and a sample from The Lancet, the British medical journal (49% and 47% Anglo-Saxon, respectively). It’s worth noting that these samples also displayed the highest and 5th highest percentage of words of unknown etymology (26% and 21%, respectively) of the 30 samples in this category. A higher proportion of unknowns depresses the results in the other two categories.

Fiction rests in the middle of this small group of 4 categories, and I’m a little surprised that the percentage of Anglo-Saxon is as high as it is. I feel like fiction lends itself to the kind of description that tends to use more non-Anglo-Saxon words, but in this sample it’s not all that different from conversation.

News stands out for having barely more Anglo-Saxon words than academic writing, and also the highest percentage of words of unknown etymological origin. The news samples are drawn principally from The Independent, The Guardian, The Daily Telegraph, The Belfast Telegraph, The Liverpool Daily Post and Echo, The Northern Echo, and The Scotsman. It would be interesting to analyze each of these groups independently to see if they differ significantly.


My hypothesis that conversations have a high percentage of Anglo-Saxon words because they’re off-the-cuff rather than because they’re spoken is something I can challenge with another experiment. Speeches are also spoken, but they’re often written in advance, without the pressure of immediacy, so the author would have time to reach for a thesaurus. I predict speeches will have an Anglo-Saxon/non-Anglo-Saxon profile closer to that of fiction than of either of the extremes in this data. It might vary dramatically based on speaker and audience, so I’ll have to choose a broad sample to smooth out biases.

I would also like to work with the American National Corpus.

Stay tuned, and let me know in the comments if you have observations or suggestions!





June 22, 2017 06:46 PM

Django Weblog

DjangoCon US Schedule Is Live

We are less than two months away from DjangoCon US in Spokane, WA, and we are pleased to announce that our schedule is live! We received an amazing number of excellent proposals, and the reviewers and program team had a difficult job choosing the final talks. We think you will love them. Thank you to everyone who submitted a proposal or helped to review them.

Tickets for the conference are still on sale! Check out our website for more information on which ticket type to select. We have also announced our tutorials. They are $150 each, and may be purchased at the same place as the conference tickets.

DjangoCon US will be held August 13-18 at the gorgeous Hotel RL in downtown Spokane. Our hotel block rate expires July 11, so reserve your room today!

June 22, 2017 05:46 PM

Mike Driscoll

Book Review: Software Architecture with Python

Packt Publishing approached me about being a technical reviewer for the book, Software Architecture with Python by Anand Balachandran Pillai. It sounded pretty interesting so I ended up doing the review for Packt. They ended up releasing the book in April 2017.

Quick Review

  • Why I picked it up: Packt Publishing asked me to do a technical review of the book
  • Why I finished it: Frankly because this was a well written book covering a broad range of topics
  • I’d give it to: Someone who is learning how to put together a large Python based project or application

    Book Formats

    You can get this as a physical soft cover, Kindle on Amazon or various other eBook formats via Packt Publishing’s website.

    Book Contents

    This book has 10 chapters and is 556 pages long.

    Full Review

    The focus of this book is to educate the reader on how they might design and architect a highly scalable, robust application in Python. The first chapter starts off by going over the author’s ideas on the “principles of software architecture” and what they are. This chapter has no code examples whatsoever. It is all theory and sets up what the rest of the book will be covering.

    Chapter two is all about writing readable code that is easy to modify. It teaches some techniques for writing readable code and touches on recommendations regarding documentation, PEP8, refactoring, etc. It also teaches the fundamentals of writing modifiable code. Some of the techniques demonstrated in this chapter include abstracting common services, using inheritance, and late binding. It also discusses the topic of code smells.

    Chapter three clocks in at almost 50 pages and is focused on making testable code. While you can’t really teach testing in just one chapter, it does talk about such things as unit testing, using nose2 and py.test, code coverage, mocking and doctests. There is also a section on test driven development.

    In chapter four, we learn about getting good performance from our code. This chapter is about timing code, code profiling and high performance containers in Python. It covers quite a few modules / packages, such as cProfile, line profiler, memory profiler, objgraph and Pympler.

    For chapter five, we dig into the topic of writing applications that can scale. This chapter has some good examples and talks about the differences between concurrency and parallelism, multithreading versus multiprocessing, and Python’s new asyncio module. It also discusses the Global Interpreter Lock (GIL) and how it affects Python’s performance in certain situations. Finally the reader will learn about scaling for the web and using queues, such as Celery.

    If you happen to be interested in security in Python, then chapter 6 is for you. It covers various types of security vulnerabilities in software in general and then talks about what the author sees as security problems in Python itself. It also discusses various coding strategies that help the developer write secure code.

    Chapter seven delves into the subject of design patterns and is over 70 pages long. You will learn about such things as the singleton, factory, prototype, adapter, facade, proxy, iterator, observer and state patterns. This chapter does a nice job of giving an overview of design patterns, but I think a book that devotes a chapter to each design pattern would be really interesting and would help drive the point home.

    Moving on, we get to chapter 8 which talks about “architectural patterns”. Here we learn about Model View Controller (MVC), which is pretty popular in the web programming sphere. We also learn a bit about event driven programming using twisted, eventlet, greenlet and Gevent. I usually think of a user interface using something like PyQt or wxPython when I think of event driven programming, but either way the concepts are the same. There is also a section on microservices in this chapter.

    Chapter nine’s focus is on deploying your Python applications. Here you will learn about using pip, virtualenv, PyPI, and PyPA. You will also learn a little about Fabric and Ansible in this chapter.

    The last chapter covers the techniques for debugging your applications. He starts with the basic print statement and moves on to using mocks and the logging module. He also talks about using pdb and similar tools such as iPdb and pdb++. The chapter is rounded out with sections on the trace module, the lptrace package and the strace package.

    This book is a bit different from your usual Python book in that it’s not really focused on the beginner. Instead we have a professional software developer with nearly two decades of experience outlining some of the techniques he has used in creating his own applications at big companies. While there are a few minor grammatical issues here and there, overall I found this to be a pretty interesting book. I’m not saying that because I was a technical reviewer of the book. I have panned some of the books I have been a technical reviewer for in the past. This one is actually quite good and I would recommend it to anyone who wants to learn more about real world software development. It’s also good for people who want to learn about concurrency or design patterns.

    Software Architecture with Python

    by Anand Balachandran Pillai

    Amazon, Packt Publishing

    Other Book Reviews

    June 22, 2017 05:15 PM


    EuroPython 2017: Call for on-site volunteers

    Would you like to be more than a participant and contribute to make this 2017 edition of EuroPython a smooth success? Help us!

    We have a few tasks that are open for attendees who would like to volunteer: fancy helping at the registration desk? Willing to chair a session? Find out how you can contribute and which task you can commit to. 


    What kind of qualifications do you need?

    English is a requirement. More languages are an advantage. Check our webpage or write us for any further information. 

    The conference ticket is a requirement. We cannot give you a free ticket, but we would like to thank you with one of our volunteer perks.

    How do you sign up?

    You can sign up for activities on our EuroPython 2017 Volunteer App.

    We really appreciate your help!


    EuroPython 2017 Team
    EuroPython Society
    EuroPython 2017 Conference

    June 22, 2017 03:29 PM


    PyCharm Edu 4 EAP: Integration with Stepik for Educators

    PyCharm Educational Edition rolls out an Early Access Program update – download PyCharm Edu 4 EAP2 (build 172.3049).

    Integration with Stepik for Educators

    In 2016 we partnered with Stepik, a learning management and MOOC platform, to announce the Adaptive Python course. If you want to create your own Python course with the help of PyCharm Edu, integration with Stepik can help you keep your learning materials up to date and share them with your students.

    Let’s take a simple example based on the Creating a Course with Subtasks tutorial and look at the integration features in more detail.

    Uploading a New Course

    Assume you’ve created a new course, added some lessons and checked the tasks:


    Now you want to test the new course and share it with your students. Using Stepik as a course platform is a great choice, thanks to its integration with PyCharm Edu. First, you’ll need to create an account and log in:


    Going back to PyCharm Edu, you can now see a special Stepik icon in the Status Bar:


    Use the link Log in to Stepik to be redirected to and authorize PyCharm Edu:


    The Stepik Status Bar icon will be enabled after you authorize the course:


    Now you can upload the course to Stepik:


    Updating a Course

    Once a course is created and uploaded to Stepik, you can always add or change lessons or add subtasks to it, as we do in our example:


    The whole course, a lesson or just a single task can be updated any time you want to save your changes on Stepik:


    Sharing a Course with Learners

    Stepik allows educators to manage their courses: you can make your course visible to everyone, or invite your students privately (students need to have a Stepik account):


    Learners that have been invited to join the course can go to PyCharm Edu Welcome Screen | Browse Courses and log in to Stepik with a special link:


    The course is now available in the list:


    There you go. Let us know how you like this workflow! Share your feedback here in the comments or report your findings on YouTrack, to help us improve PyCharm Edu.

    To get all EAP builds as soon as we publish them, set your update channel to EAP (go to Help | Check for Updates, click the ‘Updates’ link, and then select ‘Early Access Program’ in the drop-down). To keep all your JetBrains tools updated, try JetBrains Toolbox App!

    Your PyCharm Edu Team

    June 22, 2017 02:52 PM

    Python Anywhere

    The PythonAnywhere API: beta now available for all users

    We've been slowly developing an API for PythonAnywhere, and we've now enabled it so that all users can try it out if they wish. Head over to your accounts page and find the "API Token" tab to get started.

    The API is still very much in beta, and it's by no means complete! We've started out with a few endpoints that we thought we ourselves would find useful, and some that we needed internally.

    [Screenshot of the API page] Yes, I have revoked that token :)

    Probably the most interesting functionality that it exposes at the moment is the ability to create, modify, and reload a webapp remotely, but there are a few other things in there as well, like file and console sharing. You can find a full list of endpoints here:

    We're keen to hear your thoughts and suggestions!
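As a rough illustration of reloading a webapp remotely, here is a minimal sketch that only builds the HTTP request and does not send it. The base URL, the `/webapps/<domain>/reload/` path and the `Token` authorization scheme are assumptions that should be checked against the published endpoint list; the username, domain and token are hypothetical.

```python
from urllib.request import Request

# Assumed base URL for the beta API; verify against the official endpoint list.
API_BASE = "https://www.pythonanywhere.com/api/v0"


def build_reload_request(username, domain_name, token):
    """Build (but do not send) a POST request asking for a webapp reload."""
    # Assumed endpoint layout: /user/<username>/webapps/<domain>/reload/
    url = f"{API_BASE}/user/{username}/webapps/{domain_name}/reload/"
    # Assumed auth scheme: the API token in an Authorization header.
    return Request(url, method="POST",
                   headers={"Authorization": f"Token {token}"})


# Hypothetical account details, for illustration only.
req = build_reload_request("alice", "alice.pythonanywhere.com", "not-a-real-token")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a network connection or a real token.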

    June 22, 2017 12:12 PM

    A. Jesse Jiryu Davis

    New Driver Features for MongoDB 3.6

    At MongoDB World this week, we announced the big features in our upcoming 3.6 release. I’m a driver developer, so I’ll share the details about the driver improvements that are coming in this version. I’ll cover six new features—the first two are performance improvements with no driver API changes. The next three are related to a new idea, “MongoDB sessions”, and for dessert we’ll have the new Notification API.

    Wire Protocol Compression

    Since 3.4, MongoDB has used wire protocol compression for traffic between servers. This is especially important for secondaries streaming the primary’s oplog: we found that oplog data can be compressed 20x, allowing secondaries to replicate four times faster in certain scenarios. The server uses the Snappy algorithm, which is a good tradeoff between speed and compression.

    In 3.6 we want drivers to compress their conversations with the server, too. Some drivers can implement Snappy, but since zlib is more widely available we’ve added it as an alternative. When the driver and server connect they negotiate a shared compression format. (If you’re a security wonk and you know about CRIME, rest easy: we never compress messages that include the username or password hash.)

    In the past, we’ve seen that the network is the bottleneck for some applications running on bandwidth-constrained machines, such as small EC2 instances or machines talking to very distant MongoDB servers. Compressing traffic between the client and server removes that bottleneck.
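The negotiation step can be pictured with a small sketch. This is not the driver's actual handshake: the function names and the idea of a simple preference list are assumptions made for illustration, and only zlib (from the Python standard library) is implemented.

```python
import zlib


def negotiate_compressor(client_prefs, server_supported):
    """Return the first compressor the client prefers that the server also offers."""
    for name in client_prefs:
        if name in server_supported:
            return name
    return None  # no shared format: fall back to uncompressed messages


def compress_message(payload, compressor):
    """Compress a message body with the negotiated compressor."""
    if compressor == "zlib":
        return zlib.compress(payload)
    return payload  # no compressor negotiated, send as-is


# Hypothetical client that only has zlib, talking to a server offering both.
chosen = negotiate_compressor(["zlib"], {"snappy", "zlib"})

# A repetitive payload, standing in for a batch of similar documents.
payload = b'{"insert": "my_collection", "n": 0}' * 100
compressed = compress_message(payload, chosen)

print(chosen, len(payload), len(compressed))
```

Repetitive payloads like this one shrink dramatically, which is why bandwidth-constrained clients stand to benefit most.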


    This is feature #2. We’re introducing a new wire protocol message called OP_MSG. This will be a modern, high-performance replacement for all our messy old wire protocol. To explain our motive, let’s review the history.

    Ye Olde Wire Protocol

    First, we had three kinds of write messages, all unacknowledged, and also a message for disposing a cursor:

    [Figure: Ye Olde Wire Protocol]

    There were also two kinds of messages that expected a reply from the server: one to create a cursor with a query, and another to get more results from the cursor:

    [Figure: Ye Olde Wire Protocol]

    Soon we added another kind of message: commands. We reused OP_QUERY, defining a command as a query on the fake $cmd collection. We also realized that our users wanted acknowledged writes, which we implemented as a write message immediately followed by a getLastError command. That brings us to this picture of Ye Olde Wire Protocol:

    [Figure: Ye Olde Wire Protocol]

    This protocol served us remarkably well for years: it implemented the features we wanted and it was quite fast. But its messiness made our lives hard when we wanted to innovate. We couldn’t add all the features we wanted to the wire protocol, so long as we were stuck with these old message types.

    Middle Wire Protocol

    In MongoDB 2.6 through 3.2 we unified all the message types. Now we have the Middle Wire Protocol, in which everything is a command:

    [Figure: Middle Wire Protocol]

    This is the wire protocol we use now. It’s uniform and flexible and has allowed us to rapidly add features, but it has some disadvantages:

    Why is it less efficient? Let’s see how a bulk insert is formatted in the Middle Wire Protocol:

    [Figure: Middle Wire Protocol]

    The “insert” command has a standard message header, followed by the command body as a single BSON document. In order to include a batch of documents in the command body, they must be subdocuments of a BSON array. This is a bit expensive for the client to assemble, and for the server to disassemble before it can insert the documents. The same goes for query replies: the server must assemble a BSON array of query results, and the client must disassemble it.

    Compare this to Ye Olde Wire Protocol’s OP_INSERT message:

    [Figure: Ye Olde Wire Protocol]

    Ye Olde Wire Protocol’s bulk insert message was simple and efficient: just a message header followed by a stream of documents catenated end-to-end with no delimiter. How can we get back to this simpler time?
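A stream of documents needs no delimiter because every BSON document begins with its own total length as a little-endian 32-bit integer. The sketch below imitates that idea with a simplified stand-in format (a length prefix plus raw bytes, not real BSON); the function names are made up for illustration.

```python
import struct


def encode_doc(payload):
    """Prefix a payload with its total size (prefix included) as a little-endian int32."""
    return struct.pack("<i", len(payload) + 4) + payload


def decode_stream(data):
    """Split a delimiter-free stream of length-prefixed documents into payloads."""
    docs = []
    offset = 0
    while offset < len(data):
        # Read this document's total size, then slice out its body.
        (size,) = struct.unpack_from("<i", data, offset)
        docs.append(data[offset + 4:offset + size])
        offset += size
    return docs


payloads = [b"doc-one", b"doc-two", b"doc-three"]

# Documents are simply concatenated end-to-end, as in OP_INSERT.
stream = b"".join(encode_doc(p) for p in payloads)
decoded = decode_stream(stream)
print(decoded)
```

The sender only concatenates and the receiver only follows length prefixes; neither side has to build or unpack an enclosing array.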

    Modern Wire Protocol

    This winter, we’ll release MongoDB 3.6 with a new wire protocol message:

    [Figure: Modern Wire Protocol]

    For the first time, both the client and the server use the same message type. The new OP_MSG will combine the best of the old and the middle wire protocols. It was mainly designed by Mathias Stearn, a MongoDB veteran who applied the lessons he learned from our past protocols to make a robust new format that will stand the test of time.


    OP_MSG is the one message type to rule them all. We’ll implement it in stages. For 3.6, drivers will use document streams for efficient bulk writes. In subsequent releases the server will use document streams for query results, and we’ll add exhaust cursors, checksums, and unacknowledged writes. OP_MSG is extensible, so it will probably be our last big protocol change: all future changes to the protocol will work within the OP_MSG framework.

    You’ll get faster network communications from compression and OP_MSG as soon as you upgrade your driver and server, without any code changes. The remaining four features, however, do require new code.


    The new versions of all our drivers will introduce “sessions”, an idea that MongoDB drivers have never had before. Here’s how you’ll use a session in PyMongo:

    client = MongoClient('mongodb://srv1,srv2,srv3/?replicaSet=rs')
    # Start a session.
    with client.start_session() as session:
        collection = session.my_db.my_collection

    Now, instead of getting your database and collection objects from the MongoClient, you can choose to get them from a session instead. So far I haven’t shown you anything useful, though. Let’s look at two new features that make use of sessions.

    Retryable Writes

    Currently, statements like this are very convenient with MongoDB but they’re a bit unsafe:

    collection.update({'_id': 1}, {'$inc': {'x': 1}})

    Here’s the problem: what happens if your program tries to send this “update” message to the server, and it gets a network error before it can read the response? Perhaps there was a brief network glitch, or perhaps the primary stepped down. It’s not always possible to know whether the server received the update before the network error or not, so you can’t safely retry it; you risk executing the update twice.

    Last year at MongoDB World, I gave a 35-minute talk about how to make your operations idempotent so they could be safely retried, and I wrote a 3000-word article. All that is about to be obsolete. In MongoDB 3.6, the most common write messages will be retryable, safely and automatically.

    # Start a session for retryable writes.
    with client.start_session(retry_writes=True) as session:
        collection = session.my_db.my_collection
        result = collection.insert_one({'_id': 1, 'n': 0})
        result = collection.update_one({'_id': 1},
                                       {'$inc': {'n': 1}})
        result = collection.replace_one({'_id': 1},
                                        {'n': 42})
        result = collection.delete_one({'_id': 1})

    If any of these write operations fails from a network error, the driver automatically retries it once. This is safe to do now, because MongoDB 3.6 stores an operation ID and the outcome of any write in a session. If the driver sends the same operation twice, the server ignores the second operation, but replies with the stored outcome of the first attempt. This means we can do an update like $inc exactly once, even if there’s a flaky network or a primary stepdown.
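To make the deduplication idea concrete, here is a toy in-memory model. It is not how the server is implemented: the class, its method and the shape of the stored outcome are all hypothetical, and only the principle (remember each operation ID's outcome and replay it on a retry) comes from the description above.

```python
class ToyServer:
    """In-memory stand-in for a server that remembers write outcomes per session."""

    def __init__(self):
        self.docs = {}
        self.outcomes = {}  # (session_id, op_id) -> stored outcome

    def inc(self, session_id, op_id, doc_id, field, amount):
        key = (session_id, op_id)
        if key in self.outcomes:
            # Same operation seen before: do not apply it again,
            # just reply with the outcome of the first attempt.
            return self.outcomes[key]
        doc = self.docs.setdefault(doc_id, {})
        doc[field] = doc.get(field, 0) + amount
        outcome = {"ok": 1, "value": doc[field]}
        self.outcomes[key] = outcome
        return outcome


server = ToyServer()
first = server.inc("session-1", 1, doc_id=1, field="n", amount=1)

# Simulate the driver retrying because the first reply was lost on the network.
retry = server.inc("session-1", 1, doc_id=1, field="n", amount=1)

print(first, retry, server.docs)
```

The counter ends up incremented once even though the operation was sent twice, which is the exactly-once behavior the retry relies on.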

    Most of the write methods you use day-to-day will be retryable in 3.6, like the CRUD methods above or the three findAndModify wrappers:

    # Start a session for retryable writes.
    with client.start_session(retry_writes=True) as session:
        collection = session.my_db.my_collection
        old_doc = collection.find_one_and_update({'_id': 2},
                                                 {'$inc': {'n': 1}})
        old_doc = collection.find_one_and_replace({'_id': 2},
                                                  {'n': 42})
        old_doc = collection.find_one_and_delete({'_id': 2})

    Operations that affect many documents, like update_many or delete_many, won’t be retryable for a while. But bulk inserts will be retryable in 3.6:

    # Start a session for retryable writes.
    with client.start_session(retry_writes=True) as session:
        collection = session.my_db.my_collection
        # insert_many is ok if "ordered" is True, the default.
        result = collection.insert_many([{'_id': 1}, {'_id': 2}])

    Causally Consistent Reads

    Besides retryable writes, sessions also allow “causally consistent” reads: mainly, this means you can reliably read your writes and do monotonic reads, even when reading from secondaries.

    # Start a session for causally consistent reads.
    with client.start_session(causally_consistent_reads=True) as session:
        collection = session.my_db.my_collection
        result = collection.insert_one({'_id': 3})
        # Assumed completion of the truncated call: read from a secondary.
        secondary_collection = session.my_db.get_collection(
            'my_collection', read_preference=ReadPreference.SECONDARY)
        # Guaranteed to be available on secondary,
        # may block until then.
        doc = secondary_collection.find_one({'_id': 3})

    Right now, you can’t guarantee that a secondary has replicated a write by the time your program reads from it after writing to the primary. It’s unpredictable whether querying the secondary will return fresh data or stale, so programs that need this guarantee must only read from the primary. Furthermore, if you spread your reads among secondaries, then results from different secondaries might jump back and forth in time.

    In 3.6, a causally consistent session will let you read your writes and guarantee monotonically increasing reads from secondaries. It even works in sharded clusters!

    The design, by Misha Tyulenev, Randolph Tan, and Andy Schwerin, uses a Lamport Clock to partially order events across all servers in a sharded cluster. Whenever the client sends a write operation to a server, the server notes the Lamport Clock value when the write was executed, and returns that value to the client. Then, if the client’s next message is a query, it asks the server to return query data that is causally after that Lamport Clock value:

    [Figure: Causal ordering of events by Lamport Clock values]

    If you query a secondary that hasn’t yet caught up to that point in time, according to the Lamport Clock, then your query blocks until the secondary replicates to that point. Yes, I know that in the diagram above we call the same value either “logicalTime”, “operationTime”, or “clusterTime” depending on the context. It’s complicated. Here’s the gist: this feature ensures that you can read your writes when reading from secondaries, and that each secondary query returns data causally after the previous one.
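The read-your-writes mechanism can be sketched with a toy replica set. Everything here is hypothetical (the class, the method names, and the `after_time` parameter), and where a real secondary would block until replication catches up, this toy simply applies pending oplog entries.

```python
class ToyReplicaSet:
    """One primary and one lagging secondary, ordered by a logical clock."""

    def __init__(self):
        self.clock = 0            # logical clock, advanced on each write
        self.oplog = []           # (time, doc_id, doc) entries from the primary
        self.primary_docs = {}
        self.secondary_docs = {}
        self.secondary_time = 0   # last oplog time the secondary has applied

    def write(self, doc_id, doc):
        """Apply a write on the primary and return its logical time."""
        self.clock += 1
        self.primary_docs[doc_id] = doc
        self.oplog.append((self.clock, doc_id, doc))
        return self.clock

    def replicate_one(self):
        """Apply the next unapplied oplog entry on the secondary, if any."""
        for time, doc_id, doc in self.oplog:
            if time > self.secondary_time:
                self.secondary_docs[doc_id] = doc
                self.secondary_time = time
                return True
        return False

    def read_secondary(self, doc_id, after_time=0):
        """Read from the secondary, but only once it has reached after_time."""
        while self.secondary_time < after_time:
            if not self.replicate_one():
                raise RuntimeError("requested time is ahead of the primary")
        return self.secondary_docs.get(doc_id)


rs = ToyReplicaSet()
write_time = rs.write(3, {"_id": 3})

stale = rs.read_secondary(3)                         # no causal guarantee
fresh = rs.read_secondary(3, after_time=write_time)  # causally after the write
print(stale, fresh)
```

The first read misses the document because the secondary has not replicated it yet; the second read carries the write's logical time, so it cannot return until the secondary has caught up.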

    Notification API

    Now it’s time for dessert! MongoDB 3.6 will have a new event notification API. You can subscribe to changes in a collection or a database, and you can filter or transform the events using an aggregation pipeline before you process them in your program:

    cursor = client.my_db.my_collection.changes([
        {'$match': {
            'operationType': {'$in': ['insert', 'replace']}
        }},
        {'$match': {
            'newDocument.n': {'$gte': 1}
        }}
    ])

    # Loops forever.
    for change in cursor:
        print(change)  # placeholder: process each change event here

    This code shows how you’ll use the API in PyMongo. The collection object will have a new method, “changes”, which takes an aggregation pipeline. In this example we listen for all inserts and replaces in a collection, and filter these events to include only documents whose “n” field is greater than or equal to 1. The result of the “changes” method is a cursor that emits changes as they occur.
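The effect of the two filter stages can be imitated in plain Python, without a server. This is only a sketch of the filtering logic: the generator, its parameters and the sample events are hypothetical, and the event field names simply follow the example above.

```python
def matches(event, op_types, min_n):
    """Mimic the two $match stages: operation type, then newDocument.n >= min_n."""
    if event["operationType"] not in op_types:
        return False
    n = event.get("newDocument", {}).get("n", float("-inf"))
    return n >= min_n


def changes(events, op_types=("insert", "replace"), min_n=1):
    """Yield only the events that pass the filter, in arrival order."""
    for event in events:
        if matches(event, op_types, min_n):
            yield event


# Hypothetical change events, shaped like the fields used in the pipeline.
events = [
    {"operationType": "insert", "newDocument": {"_id": 1, "n": 0}},
    {"operationType": "insert", "newDocument": {"_id": 2, "n": 5}},
    {"operationType": "delete", "documentKey": {"_id": 1}},
    {"operationType": "replace", "newDocument": {"_id": 2, "n": 1}},
]

filtered = list(changes(events))
for change in filtered:
    print(change)
```

In the real API the filtering would run on the server as part of the aggregation pipeline, so unwanted events never cross the network.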

    In the past we’ve seen that MongoDB users deploy a Redis server on the side, just for the sake of event notifications, or they use Meteor to get notifications from MongoDB. In 3.6, these notifications will be a built-in MongoDB feature. If you were maintaining both Redis and MongoDB, soon MongoDB alone will meet your needs. And if you want event notifications but you aren’t using JavaScript and don’t want to rely on Meteor, you can write your own event system in any language without reinventing Meteor.

    Event notifications have two main uses. First, if you’re keeping a second system like Lucene synchronized with MongoDB, you don’t have to tail the oplog anymore. The event notification system is a well-designed API that lets you subscribe to changes which you can apply to your second system. The other use is for collaboration: when one user changes a shared piece of data, you want to update other users’ views. The new event notification API will make collaborative applications easy to build.

    I’m a coder, not a bullshitter, so I wouldn’t say this if it weren’t true: MongoDB 3.6 is the most significant advance in client-server features since I started working here more than five years ago. Compression will remove bottlenecks for bandwidth-limited systems, and OP_MSG paves the way for much more efficient queries and bulk operations. Retryable writes and causally consistent reads add guarantees that MongoDB users have long wished for. And the new event notification API makes MongoDB a natural platform for collaborative applications.

    June 22, 2017 08:12 AM

    Python Meeting Düsseldorf - 2017-06-28

    The following text was originally published in German, since we're announcing a regional user group meeting in Düsseldorf, Germany.


    The next Python Meeting Düsseldorf will take place on the following date:

    June 28, 2017, 18:00
    Room 1, 2nd floor, Bürgerhaus Stadtteilzentrum Bilk
    Düsseldorfer Arcaden, Bachstr. 145, 40217 Düsseldorf


    Talks already registered

    Matthias Endler
            "Grumpy - Python to Go source code transcompiler and runtime"

    Tom Engemann
            "BeautifulSoup as a test framework for HTML"

    Jochen Wersdörfer
            "Machine Learning: Categorizing FAQs"

    Linus Deike
            "Introduction to Machine Learning: Predicting quality from sensor data"

    Andreas Bresser
            "Image recognition with OpenCV"

    Philipp v.d. Bussche & Marc-Andre Lemburg
            "Telegram bot as a Twitter interface: TwitterBot"

    Further talks are welcome. If you are interested, please get in touch.

    Start time and location

    We meet at 18:00 in the Bürgerhaus in the Düsseldorfer Arcaden.

    The Bürgerhaus shares its entrance with the swimming pool and is located on the side of the Düsseldorfer Arcaden where the underground parking garage entrance is.

    Above the entrance there is a large "Schwimm’ in Bilk" logo. Once through the door, turn immediately left to the two elevators and go up to the 2nd floor. The entrance to Room 1 is immediately on the left as you leave the elevator.

    >>> Entrance in Google Street View


    The Python Meeting Düsseldorf is a regular event in Düsseldorf aimed at Python enthusiasts from the region.

    Our PyDDF YouTube channel, where we publish videos of the talks after each meeting, gives a good overview of the talks.

    The meeting is organized by the GmbH, Langenfeld, in cooperation with Clark Consulting & Research, Düsseldorf:


    The Python Meeting Düsseldorf uses a mix of (lightning) talks and open discussion.

    Talks can be registered in advance or brought along spontaneously during the meeting. A projector with XGA resolution is available.

    Please register (lightning) talks informally by email.


    The Python Meeting Düsseldorf is organized by Python users for Python users.

    Since the meeting room, projector, internet and drinks incur costs, we ask participants for a contribution of EUR 10.00 incl. 19% VAT. Pupils and students pay EUR 5.00 incl. 19% VAT.

    We ask all participants to bring the amount in cash.


    Since we only have seating for about 20 people, we ask you to register by email. Registering does not commit you to anything, but it makes planning easier for us.

    Please register for the meeting informally by email.

    Further information

    Further information is available on the meeting's website:


    Have fun!

    Marc-Andre Lemburg,

    June 22, 2017 08:00 AM

    June 21, 2017

    Continuum Analytics News

    It’s Getting Hot, Hot, Hot: Four Industries Turning Up The Data Science Heat

    Wednesday, June 21, 2017
    Christine Doig
    Sr. Data Scientist, Product Manager

    Summer 2017 has officially begun. As temperatures continue to rise, so does the use of data science across dozens of industries. In fact, IBM predicts the demand for data scientists will increase by 28 percent in just three short years, and our own survey recently revealed that 96 percent of company executives conclude data science is critical to business success. While it’s clear that health care providers, financial institutions and retail organizations are harnessing the growing power of data science, it’s time for more industries to turn up the data science heat. We take a peek below at some of the up and comers.  
    Aviation and Aerospace
    As data science continues to reach for the sky, it’s only fitting that the aviation industry is also on track to leverage this revolutionary technology. Airlines and passengers generate an abundance of data every day, but are not currently harnessing the full potential of this information. Through advanced analytics and artificial intelligence driven by data science, fuel consumption, flight routes and air congestion could be optimized to improve the overall flight experience. What’s more, technology fueled by data science could help aviation proactively avoid some of the delays and inefficiencies that burden both staff and passengers; airlines just need to take a chance and fly with it!

    In addition to aviation, cybersecurity has become an increasingly hot topic during the past few years. The global cost of handling cyberattacks is expected to rise from $400 billion in 2015 to $2.1 trillion by 2019, but implementing technology driven by data science can help secure business data and reduce these attacks. By focusing on the abnormalities, using all available data and automating whenever possible, companies will have a better chance at standing up to threatening attacks. Not to mention, artificial intelligence software is already being used to defend cyber infrastructure. 
    While improving data security is essential, the construction industry is another space that should take advantage of data science tools to improve business outcomes. As an industry that has long resisted change, some companies are now turning to data science technology to manage large teams, improve efficiency in the building process and reduce project delivery time, ultimately increasing profit margins. By embracing data analytics and these new technologies, the construction industry will also have more room to successfully innovate. 
    From aviation to cybersecurity to construction, it’s clear that product-focused industries are on track to leverage data science. But what about the more natural side of things? One example suggests ecologists can learn more about ocean ecosystems through the use of technology driven by data science. Through coding and the use of other data science tools, these environmental scientists found they could conduct better, more effective oceanic research in significantly less time. Our hope is for other scientists to continue these methods and unearth more pivotal information about our planet. 
    So there you have it. Four industries that are beginning to harness the power of data science to help transform business processes, drive innovation and ultimately change the world. Who will the next four be?



    June 21, 2017 05:56 PM


    PyCharm 2017.2 EAP 4

    The fourth early access program (EAP) version of PyCharm 2017.2 is now available! Go to our website to download it now.

    New in this version:

    Please let us know how you like it! Users who actively report about their experiences with the EAP can win prizes in our EAP competition. To participate: just report your findings on YouTrack, and help us improve PyCharm.

    To get all EAP builds as soon as we publish them, set your update channel to EAP (go to Help | Check for Updates, click the ‘Updates’ link, and then select ‘Early Access Program’ in the dropdown). If you’d like to keep all your JetBrains tools updated, try JetBrains Toolbox!

    -PyCharm Team
    The Drive to Develop

    June 21, 2017 05:00 PM


    Enthought Announces Canopy 2.1: A Major Milestone Release for the Python Analysis Environment and Package Distribution

    Python 3 and multi-environment support, new state of the art package dependency solver, and over 450 packages now available free for all users

    Enthought is pleased to announce the release of Canopy 2.1, a significant feature release that includes Python 3 and multi-environment support, a new state of the art package dependency solver, and access to over 450 pre-built and tested scientific and analytic Python packages completely free for all users. We highly recommend that all current Canopy users upgrade to this new release.

    Ready to dive in? Download Canopy 2.1 here.

    For those currently familiar with Canopy, in this blog we’ll review the major new features in this exciting milestone release, and for those of you looking for a tool to improve your workflow with Python, or perhaps new to Python from a language like MATLAB or R, we’ll take you through the key reasons that scientists, engineers, data scientists, and analysts use Canopy to enable their work in Python.

    First, let’s talk about the latest and greatest in Canopy 2.1!

    1. Support for Python 3 user environments: Canopy can now be installed with a Python 3.5 user environment. Users can benefit from all the Canopy features already available for Python 2.7 (syntax checking, debugging, etc.) in the new Python 3 environments. Python 3.6 is also available (and will be the standard Python 3 in Canopy 2.2).
    2. All 450+ Python 2 and Python 3 packages are now completely free for all users: Technical support, full installers with all packages for offline or shared installation, and the premium analysis environment features (graphical debugger and variable browser and Data Import Tool) remain subscriber-exclusive benefits. See subscription options here to take advantage of those benefits.
    3. Built in, state of the art dependency solver (EDM or Enthought Deployment Manager): the new EDM back end (which replaces the previous enpkg) provides additional features for robust package compatibility. EDM integrates a specialized dependency solver which automatically ensures you have a consistent package set after installation, removal, or upgrade of any packages.
    4. Environment bundles, which allow users to easily share environments directly with co-workers, or across various deployment solutions (such as the Enthought Deployment Server, continuous integration processes like Travis-CI and Appveyor, cloud solutions like AWS or Google Compute Engine, or deployment tools like Ansible or Docker). EDM environment bundles not only allow the user to replicate the set of installed dependencies but also support persistence for constraint modifiers, the list of manually installed packages, and the runtime version and implementation.
    5. Multi-environment support: with the addition of Python 3 environments and the new EDM back end, Canopy now also supports managing multiple Python environments from the user interface. You can easily switch between Python 2.7 and 3.5, or between multiple 2.7 or 3.5 environments. This is ideal especially for those migrating legacy code to Python 3, as it allows you to test as you transfer and also provides access to historical snapshots or libraries that aren’t yet available in Python 3.

    Why Canopy is the Python platform of choice for scientists and engineers

    Since 2001, Enthought has focused on making the scientific Python stack accessible and easy to use for both enterprises and individuals. For example, Enthought released the first scientific Python distribution in 2004, added robust and corporate support for NumPy on 64-bit Windows in 2011, and released Canopy 1.0 in 2013.

    Since then, with its MATLAB-like experience, Canopy has enabled countless engineers, scientists and analysts to perform sophisticated analysis, build models, and create cutting-edge data science algorithms. Canopy’s all-in-one package distribution and analysis environment for Python has also been widely adopted in organizations who want to provide a single, unified platform that can be used by everyone from data analysts to software engineers.

    Here are five of the top reasons that people choose Canopy as their tool for enabling data analysis, data modelling, and data visualization with Python:

    1. Canopy provides a complete, self-contained installer that gets you up and running with Python and a library of scientific and analytic tools – fast

    Canopy has been designed to provide a fast installation experience which not only installs the Canopy analysis environment but also the Python version of your choice (e.g. 2.7 or 3.5) and a core set of curated Python packages. The installation process can be executed in your home directory and does not require administrative privileges.

    In just minutes, you’ll have a fully working Python environment with the primary tools for doing your work pre-installed: Jupyter, Matplotlib, NumPy and SciPy optimized with the latest MKL from Intel, Scikit-learn, and Pandas, plus instant access to over 450 additional pre-built and tested scientific and analytic packages to customize your toolset.

    No command line, no complex multi-stage setups! (Although if you do prefer a flat, standalone command line interface for package and environment management, we offer that too via the EDM tool.)

    2. Access to a curated, quality assured set of packages managed through Canopy’s intuitive graphical package manager

    The scientific Python ecosystem is gigantic and vibrant. Enthought is continuously updating its Enthought Python Distribution package set to provide the most recent “Enthought approved” versions of packages, with rigorous testing and quality assessment by our experts in the Python packaging ecosystem before release.

    Our users can’t afford to take chances with the stability of their software and applications, and using Canopy as their gateway to the Python ecosystem helps take the risk out of the “wild west” of open source software. With more than 450 tested, pre-built and approved packages available in the Enthought Python Distribution, users can easily access both the most current stable version and historical versions of the libraries in the scientific Python stack.

    Consistent with our focus on ease-of-use, Canopy provides a graphical package manager to easily search, install and remove packages from the user environment. You can also easily roll back to earlier versions of a package. The underlying EDM back end takes care of complex dependency management when installing, updating, and removing packages to ensure nothing breaks in the process.

    3. Canopy is designed to be extensible for the enterprise

    Canopy not only provides a consistent Python toolset for all 3 major operating systems and support for a wide variety of use cases (from data science to data analysis to modelling and even application development), but it is also extensible with other tools.

    Canopy can easily be integrated with other software tools in use at enterprises, such as with Excel via PyXLL or with LabVIEW from National Instruments using the Python Integration Toolkit for LabVIEW. The built-in Canopy Data Import Tool helps you automate your data ingestion steps and automatically import tabular data files such as CSVs into Pandas DataFrames.

    But it doesn’t stop there. If an enterprise has Python embedded in another software application, Canopy can be directly connected to that application to provide coding and debugging capabilities. Canopy itself can even be customized or embedded to provide a sophisticated Python interface for your applications. Contact us to learn more about these options.

    Finally, in addition to accessing the libraries in the Enthought Python Distribution from Canopy, users can use the same tools to share and deploy their own internal, private packages by adding the Enthought Deployment Server. The Enthought Deployment Server also allows enterprises to have a private, onsite copy of the full Enthought Python Distribution on their own approved servers and compliant with their existing security protocols.


    5. Canopy’s straightforward analysis environment, specifically tailored to the needs and workflow of scientists, analysts, and engineers

    Three integrated features of the Canopy analysis environment combine to create a powerful, yet streamlined platform: (1) a code editor, (2) an interactive graphical debugger with variable browser, and (3) an IPython window.

    Finally, having package documentation at your fingertips in Canopy helps you code faster. Canopy not only integrates online documentation and examples for many of the most used packages for data visualization, numerical analysis, machine learning, and more, but also lets you easily extract and execute code from that documentation so you can get started quickly.

    We’re very excited for this major release and all of the new capabilities that it will enable for both individuals and enterprises, and encourage you to download or update to the new Canopy 2.1 today.

    Have feedback on your experience with Canopy?

    We’d love to hear about it! Contact the product development team at

    Additional Resources:

    Blog: New Year, New Enthought Products (Jan 2017)

    Blog: Enthought Presents the Canopy Platform at the 2017 American Institute of Chemical Engineers (AIChE) Spring Meeting

    Product pages:

    The post Enthought Announces Canopy 2.1: A Major Milestone Release for the Python Analysis Environment and Package Distribution appeared first on Enthought Blog.

    June 21, 2017 04:30 PM


    New Course: Deep Learning in Python (first Keras 2.0 online course!)

    Hello there! We have a special course released today: Deep Learning in Python by Dan Becker. This happens to be one of the first online interactive courses providing instruction in Keras 2.0, which now supports TensorFlow integration; the new API will stay consistent in the coming years. So you've come to the right deep learning course.

    About the course:

    Artificial neural networks (ANN) are a biologically-inspired set of models that facilitate computers learning from observed data. Deep learning is a set of algorithms that use especially powerful neural networks. It is one of the hottest fields in data science, and most state-of-the-art results in robotics, image recognition and artificial intelligence (including the famous AlphaGo) use deep learning. In this course, you'll gain hands-on, practical knowledge of how to use neural networks and deep learning with Keras 2.0, the latest version of a cutting edge library for deep learning in Python.

    Take me to Chapter 1!

    Deep Learning in Python features interactive exercises that combine high-quality video, in-browser coding, and gamification for an engaging learning experience that will make you a master in deep learning in Python!

    What you'll learn:

    In the first chapter, you'll become familiar with the fundamental concepts and terminology used in deep learning, and understand why deep learning techniques are so powerful today. You'll build simple neural networks yourself and generate predictions with them. You can take this chapter here for free.

    In chapter 2, you'll learn how to optimize the predictions generated by your neural networks. You'll do this using a method called backward propagation, which is one of the most important techniques in deep learning. Understanding how it works will give you a strong foundation to build from in the second half of the course.
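    For readers who have not seen backward propagation before, here is a minimal sketch of the idea in plain Python. It is not taken from the course: the tiny two-weight network, the single training example, and every name in it (train, w1, w2, learning_rate) are assumptions chosen purely for illustration.

```python
import math


def train(x=1.0, y=0.5, w1=0.3, w2=0.2, learning_rate=0.1, steps=200):
    """Fit y_hat = w2 * tanh(w1 * x) to a single (x, y) example.

    Illustrative toy only: all names and values here are hypothetical.
    Returns the loss before the first update and the loss after training.
    """
    losses = []
    for _ in range(steps):
        # Forward pass: compute the prediction and the squared error.
        h = math.tanh(w1 * x)
        y_hat = w2 * h
        losses.append((y_hat - y) ** 2)

        # Backward pass: apply the chain rule from the loss back to each weight.
        d_y_hat = 2.0 * (y_hat - y)   # dLoss/dy_hat
        d_w2 = d_y_hat * h            # dLoss/dw2
        d_h = d_y_hat * w2            # dLoss/dh
        d_z = d_h * (1.0 - h * h)     # derivative of tanh(z) is 1 - tanh(z)^2
        d_w1 = d_z * x                # dLoss/dw1

        # Gradient descent step: move each weight against its gradient.
        w1 -= learning_rate * d_w1
        w2 -= learning_rate * d_w2

    final_loss = (w2 * math.tanh(w1 * x) - y) ** 2
    return losses[0], final_loss


initial_loss, final_loss = train()
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

    Libraries such as Keras compute these gradients automatically; writing them out by hand for a toy model simply makes the chain-rule bookkeeping visible.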

    In the third chapter, you'll use the Keras library to build deep learning models for both regression and classification! You'll learn about the Specify-Compile-Fit workflow that you can use to make predictions, and by the end of this chapter, you'll have all the tools necessary to build deep neural networks!

    Finally, you'll learn how to optimize your deep learning models in Keras. You'll learn how to validate your models, understand the concept of model capacity, and experiment with wider and deeper networks. Enjoy!

    Dive into Deep Learning Today

    June 21, 2017 02:10 PM


    EuroPython 2017: Conference App available

    We are pleased to announce our very own mobile app for the EuroPython 2017 conference:


    EuroPython 2017 Conference App

    Engage with the conference and its attendees

    The mobile app gives you access to the conference schedule (even offline), helps you plan your conference experience (create your personal schedule) and provides a rich social engagement platform for all attendees.

    You can create a profile within the app (or link this to your existing social accounts), share messages and photos, and easily reach out to fellow attendees - all from within the app.

    Vital for all EuroPython attendees

    We will again use the conference app to keep you informed, sending schedule updates and important announcements via push notifications, so please consider downloading it.

    Many useful features

    Please see our EuroPython 2017 Conference App page for more details on features and guides on how to use them.

    Don’t forget to get your EuroPython ticket

    If you want to join the EuroPython fun, be sure to get your tickets as soon as possible, since ticket sales have picked up quite a bit after we announced the schedule.


    EuroPython 2017 Team
    EuroPython Society
    EuroPython 2017 Conference

    June 21, 2017 11:56 AM