Planet Python

Last update: May 23, 2018 01:46 PM

May 23, 2018

Reinout van Rees

Djangocon: can packaging improve django deployments? - Markus Zapke-Gründemann

(One of my summaries of a talk at the 2018 european djangocon.)

Markus started creating websites at the end of the 1990s. He calls himself an open source software developer. Open source is how he learned programming, so he likes giving back.

How can packaging make deployments easier, faster and more reliable?

He showed a django project structure, modified for packaging it with python. Extra files include setup.cfg, and the actual project code is in a subdirectory. If you have apps, he suggests putting them in a yourproject/apps/ subdirectory.

Many people use requirements.txt files to list their dependencies. They only use setup.py for libraries or apps that they publish. But why not use it for projects, too? There's an install_requires key you can use for listing your dependencies.

Note that if your setuptools is new enough, you can put everything into setup.cfg instead of having it all in a weird function in setup.py. This includes the dependencies. Your setup.py will now look like this:

from setuptools import setup

setup()

As we're deploying a project (instead of a library), we can even pin the requirements to specific versions.
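As a rough illustration of what such a setup.cfg with pinned dependencies could look like, here is a small sketch. The project name, version and pinned packages are made up for the example (they are not from the talk). The python part only reads the install_requires list back with the standard library's configparser, to show that it is ordinary declarative configuration.

```python
import configparser

# Hypothetical setup.cfg contents: name, version and pins are assumptions.
SAMPLE_SETUP_CFG = """\
[metadata]
name = yourproject
version = 1.0.0

[options]
packages = find:
install_requires =
    Django==2.0.5
    psycopg2==2.7.4
"""


def read_install_requires(cfg_text):
    """Return the install_requires entries from setup.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    raw = parser.get("options", "install_requires")
    # The value is a multi-line string: one requirement per indented line.
    return [line.strip() for line in raw.splitlines() if line.strip()]


if __name__ == "__main__":
    for requirement in read_install_requires(SAMPLE_SETUP_CFG):
        print(requirement)
```

Pinning with == is reasonable here because the package is a deployable project; for a library you would normally keep the version ranges loose.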

He mentioned bumpversion to update your version number.

Regarding the MANIFEST.in: it lists the files you want to include in the package. You can install the check-manifest package to test your manifest.

Building the package: python setup.py bdist_wheel. You get a python wheel out of this.

What he uses for deployment is pip-tools. There are more modern alternatives, but he thinks they're not ready yet. With pip-tools he generates a constraints.txt. Basically a requirements.txt, but then for the dependencies of the packages in your requirements.txt. You can pass the constraints file to pip install with -c constraints.txt.

How to serve the package? You can use "devpi" to host your own private python package index.

How to change settings? Use environment variables. Except for the secrets. Use a vault for this if possible. There are some python packages that can help you with environment variables.
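Without any helper package, reading settings from the environment could look roughly like the sketch below. It uses only the standard library. The variable names (DJANGO_DEBUG, DJANGO_ALLOWED_HOSTS, DATABASE_URL) and the defaults are assumptions for the example, not something shown in the talk. Secrets are deliberately left out, as those should come from a vault.

```python
import os


def env_bool(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default=(), environ=os.environ):
    """Interpret a comma-separated environment variable as a list."""
    value = environ.get(name, "")
    items = [item.strip() for item in value.split(",") if item.strip()]
    return items or list(default)


def load_settings(environ=os.environ):
    """Build a settings dict; all variable names here are hypothetical."""
    return {
        "DEBUG": env_bool("DJANGO_DEBUG", False, environ),
        "ALLOWED_HOSTS": env_list("DJANGO_ALLOWED_HOSTS", ["localhost"], environ),
        "DATABASE_URL": environ.get("DATABASE_URL", "sqlite:///db.sqlite3"),
    }
```

In a real settings.py you would assign these values to module-level names instead of returning a dict. Passing environ as a parameter just makes the functions easy to try out with a plain dict.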


  • If you package your django project as a python package, you're hosting provider independent.
  • And you use tools you already know: it prevents the not-invented-here syndrome.
  • It improves deployment to many servers.
  • The same release is used everywhere: dev, CI, staging, production.
  • And: rollback is easy.
  • Nice: a built distribution requires no build steps.

His slides are at

Some notes/comments from myself:

  • I'm starting to like pipenv, which does the requirements/constraints handling automatically. And in a more elegant way. The big advantage is that you don't have to remember to pass the correct arguments all the time. Much safer that way (in my opinion).
  • He mentioned "bumpversion". I myself wrote and use zest.releaser, which also updates your changelog, has a plugin mechanism, etc. It is widely used.

Photo explanation: when illuminated, this is what you see of the office.

May 23, 2018 10:45 AM

Djangocon: it's not a bug, it's a bias - Anna-Livia Gomart

(One of my summaries of a talk at the 2018 european djangocon.)

If you have a bad experience with a program, is it a bug or can there be a bias in the software? Apple's Siri, at first, had answers for questions like "where do I hide a body?", but it didn't have an answer for "where can I get an abortion?"...

Are algorithms neutral or can they be biased? You can have biased programs, biased users and biased data.

Programs are made by programmers. Programmers are humans. Humans have experiences. Experience is often colored...

We, as programmers, can read the algorithms. So it is our role (or our duty) to critique them. Look at computer games. Often there is "procedural rhetoric" in them: some worldview or agenda that is embedded in the game's environment or rules.

Facebook has a definition of "friend". So in fact they're re-defining "friend"...

A problem you often have: normalcy. As designer/programmer/manager, you have certain assumptions. These might not be true for the user. "Everyone lives in a EU city and has 4G". "No-one has a last name called Null". "Everybody reads English perfectly".

An important bias: the negativity bias. Negative reviews are noticed much more than positive reviews. One bad review can ruin a small business.

Machines can be biased, too. Say, you notice that you yourself are biased when hiring people. That's not good. So you let the computer make the decision based on machine learning. You feed it the list of current employees..... And you have a biased algorithm.

To make it worse... Imagine turning it into a website that is used by many companies.... Suddenly there's an algorithm trained on biased data that makes or breaks people.

How do you discover biases? Catch alienating experiences, for instance. Look in support emails. "My last name really is one character long".

What helps? Have a diverse team. Various backgrounds. Various ages. Various life experiences. And check your normalcy. Check if your assumptions are true. (There apparently is a list on github about "things developers get wrong", but I can't find the right one).

Photo explanation: I'm making interiors, too, so I need to paint figures. 1:87

May 23, 2018 10:45 AM

Djangocon: writing code.... evolve it instead - Emma Gordon

(One of my summaries of a talk at the 2018 european djangocon.)

Emma is a member of the Cambridge programmers' study group. One of the topics there at one time was machine learning. Cool. But she started thinking about computers learning to write code themselves. What would that mean for us, programmers, regarding job security?

Automation/mechanisation has eliminated lots of jobs already. Self-checkouts at supermarkets. Agricultural mechanisation. Will the same happen to programmers?

IBM created a huge computer called "Watson" that could win at some game show (jeopardy). Machine learning, natural language processing, etc. What is interesting about that, for Emma, is that "Watson" could be used for "reading" medical journal papers. The computer could amass knowledge (especially of edge cases) which a human doctor could not do: how many papers can you read per week? Same with legal stuff.

Human doctors and lawyers won't be replaced, but parts of their jobs could be done/assisted by computers.

So what about us, programmers? Code is text.

You could look at genetic algorithms. Random search, guided random search. It takes its inspiration from biology. You could generate random strings and hope they'll match this talk's title (for instance). You need a "fitness function" that tells you whether the random solution is any good (note that this is quite different from regular biology, as we're moving towards a single "good" solution).

Programmatically, you do some randomness, some combination of reasonable candidates and you loop through it. She showed a demo that actually did find the title.
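The loop described above (random candidates, a fitness function, combining reasonable candidates, repeat) can be sketched as follows. This is not the code from the demo: the population size, the mutation rate and the "keep the best fifth as parents" rule are assumptions chosen to keep the example short.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "


def fitness(candidate, target):
    """Count the positions where the candidate already matches the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(candidate, rng, rate=0.05):
    """Replace each character by a random one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in candidate
    )


def crossover(a, b, rng):
    """Combine two parents: the start of one, the end of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, population_size=200, max_generations=2000, seed=0):
    """Evolve random strings towards `target` (at least 2 characters long)."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in target)
        for _ in range(population_size)
    ]
    for generation in range(max_generations):
        population.sort(key=lambda c: fitness(c, target), reverse=True)
        if population[0] == target:
            return population[0], generation
        parents = population[: population_size // 5]
        # Keep the best candidate unchanged, breed the rest from the parents.
        population = [population[0]] + [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - 1)
        ]
    return population[0], max_generations


if __name__ == "__main__":
    best, generations = evolve("evolve it instead")
    print(best, "after", generations, "generations")
```

Note that the target string has to be typed into the call, which is exactly the weakness of this approach when you want to generate something you don't know yet.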

But..... Emma still had to type in the title herself in the fitness function. Generating program code that you don't know yet... You at least need a language for it.

"That ought to be a solved problem". Yes it is. Kory Becker has a website for it. It uses a minimal turing complete language with just a few symbols (+-><[]^ (and I probably missed one)). She showed the standard "hello world" program and it looked like +++++++^[<--++ for about 400 characters.

Unreadable for us, but it is something we could feed to a genetic algorithm. She showed a demo where she tried to evolve to the two-character word hi. It took a horribly long time to finish.

So our jobs are fine.

You can look at But take it with a grain of salt. Computer programmer is at risk with 48%. But software developers are safe with 4.2%. That's because they treat "programmer" as the one that builds the programs designed by the "software developer".

What is easy to automate? Rote and routine jobs. What is hard to automate? Varied/unpredictable tasks. Human interaction. Creativity.

Take self-driving cars. You're on the public road system. There can be road works. There can be a football rolling onto the road. Lots of variation. That is hard.

There have been changes in jobs because of automation in the past.

  • Positive: increased productivity, so decreased costs, so increased demand, so more jobs.

    Negative: lower wages.

  • Positive: there's creation of new job types! If you don't have to be a factory worker anymore, you've got people available for other jobs.

    Negative: growing inequality. The ones losing their jobs probably aren't the ones most qualified for newer jobs that probably need more skills.

  • Positive: productivity increase per worker needed to offset ageing population.

    Negative: potential mass unemployment.

There have been people thinking about what society's response could be:

  • Universal income?
  • Re-training? Lifelong education?
  • Tax on robot productivity?

There are also moral and legal issues:

  • Privacy concerns.
  • Legal: who is responsible if a self-driving car kills someone? The driver? The company? The programmer?

Lots of thinkwork to be done.

Photo explanation: I'm constructing a new building for my model railway.

May 23, 2018 10:45 AM

May 22, 2018

Python Sweetness

Mitogen for Ansible status, 23 May

This is the third update on the status of developing Mitogen for Ansible.

Too long, didn’t read

A beta is coming soon! Aside from async tasks, the master branch is looking great. Since last update there have been many features and fixes, but with important forks in the road ahead, particularly around efficient support for many-host. Read on..

Just tuning in?

Done: File Transfer

File transfer previously worked by constructing one RPC representing the complete file, which for large files resulted in an explosion in memory usage on each machine as the message was enqueued and transferred, with communication at each hop blocked until the message was delivered. This has required a rewrite since the original code was written, but a simple solution proved elusive.

Today file transfer is all but solved: files are streamed in 128KiB-sized messages, using a dedicated service that aggregates pending transfers by their most directly connected stream, serving one file at a time before progressing to the next transfer. An initial burst of 128KiB chunks is generated to fill a link with a 1MiB BDP, with further chunks sent as acknowledgements begin to arrive from the receiver. As an optimization, files 32KiB or smaller are still delivered in a single RPC, avoiding one roundtrip in a common scenario.
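The flow-control idea in that paragraph (an initial burst that fills the window, then one more chunk per acknowledgement) can be illustrated with a short sketch. This is not Mitogen's implementation or API: the send and wait_for_ack callables and the FakeLink class are hypothetical stand-ins for a real transport, and only the 128KiB chunk size and 1MiB window are taken from the text.

```python
import io
from collections import deque

CHUNK_SIZE = 128 * 1024      # 128 KiB per message, as described above
WINDOW_SIZE = 1024 * 1024    # bytes allowed in flight (the 1 MiB BDP)


def stream_file(fileobj, send, wait_for_ack):
    """Send fileobj in chunks, keeping about WINDOW_SIZE bytes unacknowledged.

    send(chunk) transmits one chunk. wait_for_ack() blocks until the receiver
    acknowledges a chunk and returns the number of bytes acknowledged.
    Both callables are hypothetical stand-ins for a real transport.
    """
    in_flight = 0
    eof = False
    while not eof or in_flight:
        # Initial burst and later top-ups: send while the window has room.
        while not eof and in_flight < WINDOW_SIZE:
            chunk = fileobj.read(CHUNK_SIZE)
            if not chunk:
                eof = True
                break
            send(chunk)
            in_flight += len(chunk)
        # Window full (or file exhausted): wait for one acknowledgement.
        if in_flight:
            in_flight -= wait_for_ack()


class FakeLink:
    """In-memory receiver that acknowledges chunks in order (demo only)."""

    def __init__(self):
        self.pending = deque()
        self.received = []
        self.max_pending = 0

    def send(self, chunk):
        self.pending.append(chunk)
        self.max_pending = max(self.max_pending, len(self.pending))

    def wait_for_ack(self):
        chunk = self.pending.popleft()
        self.received.append(chunk)
        return len(chunk)


if __name__ == "__main__":
    link = FakeLink()
    stream_file(io.BytesIO(b"x" * (3 * 1024 * 1024)), link.send, link.wait_for_ack)
    print(len(link.received), "chunks,", link.max_pending, "in flight at most")
```

Because memory use is bounded by the window rather than by the file size, a large file no longer has to be held in a single message at every hop.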

Compared to sftp(1) or scp(1), the new service has vastly lower setup overhead (1 RTT vs. 5) and far better safety properties, ensuring concurrent use of the API by unrelated ansible-playbook runs cannot create a situation where an inconsistent file may be observed by users, or a corrupt file is deployed with no indication a problem exists.

Since file transfer is implemented in terms of Mitogen’s message bus, it is agnostic to Connection Delegation, allowing streaming file transfers between proxied targets regardless of how the connection is set up.

Some minor problems remain: the scheduler cannot detect a timed out transfer, risking a cascading hang when Connection Delegation is in use. This is not a regression compared to previously, as Ansible does not support this operation mode. In both cases during normal operation, the timeout will eventually be noticed when the underlying SSH connection times out.

Connection Delegation

Connection Delegation enables Ansible to use one or more intermediary machines to reach a target machine or container, with connections and code uploads deduplicated at each hop in the path. For an Ansible run against many containers on one target host, only one SSH connection to the target need exist, and module code need only be uploaded once on that connection.

While not yet complete, this feature exists today and works well, however some important functionality is still missing. Presently intermediary connection setup is single threaded, non-Python (i.e. Ansible) module uploads are duplicated, and the code to infer intermediary connection configurations using the APIs available in Ansible is.. hairy at best.

Fixing deduplication and single-threaded connection setup entails starting a service thread pool within each interpreter that will act as an intermediary. This requires some reworking of the nascent service framework, also making it easier to use for non-Ansible programs, and lays the groundwork for Topology-aware File Synchronization.

Custom module_utils

From the department of surprises, this one is a true classic. Ansible supports an undocumented (upstream docs patch) but nonetheless commonly used mechanism for bundling third party modules and overriding built-in support modules as part of the ZIP file deployed to the target. It implements this by virtualizing a core Ansible package namespace: ansible.module_utils, causing what Python finds there to vary on a per-task basis, and crucially, to have its implementation diverge entirely from the equivalent import in the Ansible controller process.

Suffice it to say I nearly lost my mind on discovering this “feature”, not due to the functionality it provides, but the manner in which it opts to provide it. Rather than loading a core package namespace as a regular Python package using Mitogen’s built-in mechanism, every Ansible module must undergo additional dependency scanning using its unique search path, and any dependencies found must correctly override existing loaded modules appearing in the target interpreter’s namespace at runtime.

Given Mitogen’s intended single-reusable-interpreter design, there is no way to support this without tempting strange behaviours appearing across tasks whose ansible.module_utils search path varies. While it is easy to arrange for ansible.module_utils.third_party_module to be installed, it is impossible to uninstall it while ensuring every reference to the previous implementation, including instances of every type defined by it, are extricated from the reusable interpreter post-execution, which is necessary if the next module to use the interpreter imports an entirely distinct implementation of ansible.module_utils.third_party_module.

Today, the interpreter instead forks when an extended or overridden module is found, and a custom importer is used to implement the overrides. This introduces an unavoidable inefficiency when the feature is in use, but it is still far better than always forking, or running the risk of varying module_utils search paths causing unfixable crashes.

Container Connections

To aid a common use-case for Connection Delegation, new connection types were added to support Linux containers and FreeBSD jails. It is now possible to run Ansible within a remote container reached via SSH, solving a common upstream feature request.

Presently although the container must have Python installed, matching Ansible’s existing behaviour, it occurred to me that when the host machine has Python installed, there is no reason why Python needs to exist within the container. This would make a powerful feature made easy through Mitogen’s design, and in a common use case, would support the ability to run auditing/compliance playbooks against app containers that were otherwise never customized for use with Ansible.

Su Become Method Support

Low-hanging fruit from the original crowdfunding plan. Now su(1) may be used for privilege escalation as easily as sudo(1).

Sudo/Su Connection Types

To support testing and somewhat uncommon use cases where a large number of user accounts may be targeted for parallel deployment on a small number of machines, there now exist explicit mitogen_sudo and mitogen_su connection types that, in combination with Connection Delegation, allow a single SSH connection to exist to a remote machine while exposing user accounts as individual (and therefore parallelizable) targets in Ansible inventory.

This sits somewhere between “hack” and “gorgeous”, I really have no idea which, however it does make it simple to exploit Ansible’s parallelism in certain setups, such as traditional web hosting where each customer exists as a UNIX account on a small number of machines.


Unidirectional Routing exists and is always enabled for Ansible. This prohibits what was previously a new communication style available to targets, that, although ideally benign and potentially very powerful, fundamentally altered Ansible’s security model and risked solution acceptance. It was possible for targets to send each other messages, and although permission checks occur on reception and thus should be harmless, represented the ability for otherwise air-gapped networks to be temporarily bridged for the duration of a run.

Secrets Masking

Mitogen supports new Blob() and Secret() string wrappers whose repr() contains a substitute for the actual value. These are employed in the Ansible extension, ensuring passwords and bulk file transfer data are no longer logged when verbose output is enabled. The types are preserved on deserialization, ensuring log messages generated by targets receive identical treatment.
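The general technique is small enough to sketch. The classes below are an illustration of the idea only, not Mitogen's actual code: the exact placeholder strings are assumptions, and the serialization side (preserving the types across the wire) is not shown.

```python
class Secret(str):
    """A string whose repr() hides the value, e.g. for passwords."""

    def __repr__(self):
        return "[secret]"


class Blob(bytes):
    """A byte string whose repr() shows only its size, e.g. for file data."""

    def __repr__(self):
        return "[blob: %d bytes]" % len(self)


if __name__ == "__main__":
    password = Secret("hunter2")
    payload = Blob(b"\x00" * 4096)
    # Logging call arguments with %r no longer leaks the values.
    print("connect(password=%r, data=%r)" % (password, payload))
```

The wrappers still behave like ordinary str and bytes everywhere else, so only code that formats values with repr() (such as verbose logging of call arguments) sees the placeholder.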

User/misc bug fixes

Asynchronous Tasks (.. again, and again)

Ongoing work on the asynchronous task implementation has caused it to evolve once again, this time to make use of a new subtree detachment feature in the core library. The new approach is about 70% of what is needed for the final design, with one major hitch remaining.

Since an asynchronous task must outlive its parent, it must have a copy of every dependency needed by the module it will execute prior to disconnecting from the parent. This is exorbitantly fiddly work, interacting with many aspects including not least custom module_utils, and represents the last major obstacle in producing a functionally complete extension release.

Industrial grade multiplexing

Mitogen now supports swapping select(2) for epoll(4) or kqueue(2) depending on the host operating system, blasting through the maximum file descriptor limit of select(2), and ensuring this is no longer a hindrance for many-target runs. Children initially use the select(2) multiplexer (tiny and guaranteed available) until they become parents, when the implementation is transparently swapped for the real deal.

In future some interface tweaks are desirable to make full use of the new multiplexers: at least epoll(4) supports options that significantly reduce the system calls necessary to configure it. Although I have not measured a performance regression due to these calls, their presence is bothersome.
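For readers unfamiliar with the distinction: Python's standard library exposes the same choice through the selectors module, whose DefaultSelector picks epoll(4) or kqueue(2) where available and falls back to select(2) otherwise. The sketch below uses that module purely as an illustration. Mitogen ships its own multiplexer classes, which are not shown here, and wait_readable is a made-up helper name.

```python
import selectors
import socket


def wait_readable(socks, timeout=1.0):
    """Return the sockets from `socks` that are readable within `timeout`."""
    # DefaultSelector is the most efficient implementation on this platform.
    with selectors.DefaultSelector() as sel:
        for sock in socks:
            sel.register(sock, selectors.EVENT_READ)
        return [key.fileobj for key, _events in sel.select(timeout)]


if __name__ == "__main__":
    left, right = socket.socketpair()
    right.send(b"ping")
    print(type(selectors.DefaultSelector()).__name__)
    print(wait_readable([left]) == [left])
    left.close()
    right.close()
```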

Many-Target Performance

Some expected growing pains appeared when real multiplexing was implemented. For testing I adopted a network of VMs running DebOps common.yml, with a quota for up to 500 targets, but so far, it is not possible to approach that without drowning in the kinks that start to appear. While some of these almost certainly lie on the Mitogen side, when profiling with only 40 targets enabled, inefficiencies in Mitogen are buried in the report by extreme inefficiencies present in Ansible itself.

Among the problems:

And with that we reach a nexus: we have almost exhausted what can be accomplished working from the bottom-up, profiling on a micro scale is no longer sufficient to meet project goals, while fixing problems identified through profiling on a macro scale exceeds the project scope. Therefore, (lightning bolts, wild cackles), a new plan emerges..

Branching for a beta

With the exception of async tasks I consider the master branch to be in excellent health - for smaller target counts. For larger runs, wider-reaching work is necessary, but it does not make sense to disrupt the existing design due to it. Therefore master will be branched with the new branch kept open for fixes, not least the final pieces of async, while continuing work in parallel on a new increment.

Extension v2

Vanilla Ansible forks each time it executes a task, with the corresponding action plug-in gaining control of the main thread until completion, upon which all state aside from the task result is lost. When running under the extension, a connection multiplexer process is forked once at startup, and a separate broker thread exists in each forked task subprocess that connects back to the connection multiplexer process over a UNIX socket - necessary in the current design to have a persistent location to manage connections.

The new design comes in the form of a complete reworking of the Ansible linear strategy. Today’s extension wraps Ansible’s strategies while preserving their process and execution model. To implement the enhancements above sensibly, additional persistence is required and it becomes necessary to tackle a strategy implementation head-on.

The old desire for per-CPU connection multiplexers is incorporated, but moves those multiplexers back into Ansible, much like the pre-crowdfund extension. The top-level controller process gains a Mitogen broker thread with per-CPU forked children acting as connection multiplexers, and hosting service threads on which action plug-ins can sleep. Unlike vanilla Ansible, these processes exist for the duration of the run rather than per-task.

From the vantage point of only $ncpus processes, it is easy to fix template precompilation, plug-in path caching, connection caching, target<->worker affinity, and ensuring task variable generation is parallelized. Some sizeable obstacles exist, not least:

Can’t this be done upstream?

It should, but I’ve experimented and there simply isn’t time. If >1 week is reasonable to add missing documentation, there is no hope real patches will land before full-time work must conclude. For upstreaming to happen the onus lies with the 20+ strong permanent team, it’s simply not possible to commit unbounded time to land even trivial changes, a far cry from occasional patches to a privately controlled repository.

At least 16k words have been spent since conversations started around September 2017, and while they bore some fruit over time, few actionable outcomes have resulted, and the detectable level of team-originated engagement regarding the work has been minimal. There is no expectation of fireworks, however it may be helpful to realize after 3 months no evidence exists of any member testing the code and experiencing success or failure, let alone a report of such.

It’s sufficient to say after so long I find this increasingly troublesome, and while I cannot hope to understand internal priorities, as an outside contributor funded by end users, soliciting engagement on a well-documented enhancement that in some scenarios nets an order of magnitude performance improvement to a commercial product, some rather basic questions come to mind.

Code Quality

There is a final uneasy aspect to upstreaming, and it is that of being left with the task of cleaning up, with no guarantee the mess won’t simply return. Some of this code is in an abject (253 LOC, 37 locals) state (279 LOC, 24 locals) of sin (306 LOC, 38 locals), for 2018 and in a product less than 72 months old, that has been funded almost since inception. While I have begun refactoring the strategy plug-in within the confines of the Mitogen repository, responsibility for benefitting from that work in mainline rests with others.

Until next time!

May 22, 2018 11:32 PM

Reinout van Rees

I'm in Heidelberg for the djangocon 2018 (plus other info)

Yes, I'm in Heidelberg (Germany) for the annual European djangocon. That's not a particularly shocking discovery, as I've been to quite a number :-) There are however some reasons for me to write this entry.

  • The most important reason is that I've set up my old blog software on a brand new laptop. "Old software" means "python 2" and it is "python 3" now. And I've modernized the setup on my server a bit.

    So: this entry is my first test if my setup also works when writing a blog entry. That's not something I want to test when I'm making summaries tomorrow.

  • New laptop? Two months ago I asked advice: mac/linux. Both had advantages.

    What I type this on is a linux laptop. A powerful Dell precision 5520! I'll probably write about my new setup later :-)

  • New laptop? Yes, because of a new job. The company is also quite new (from February this year), it doesn't have a website yet: "Triple S transformations". I'd describe my job as senior developer with a big focus on python and open source.

    Triple S is part of VolkerWessels, the second biggest Dutch construction firm. Our aim is to replace expensive and non-customizable commercial packages with open source throughout the company. With open source, we can adjust everything to our needs, be more effective and lower costs.

    We intend to be a good open source citizen, giving back most of our work. One of my responsibilities will be to help with that. I do have some experience at that within the python/plone/django world :-) Currently everything is a tad hidden away in a private gitlab instance. We'll fix that.

  • Django at the new job? Not immediately, but I'm pretty sure we'll use lots of it. The python part that's now in use is Odoo (formerly OpenERP), so I'll probably look at that first.

  • "Triple S"? 3xs: secure, scalable, simple. A bit of a motto. Security is a necessity. Scalability also, if you start small (for one of the 120 companies belonging to VolkerWessels) and want to roll out the open source goodies for all 120 of them!

  • Hurray, they're paying my ticket. So they're paying for all the nice summaries you're all going to get :-)

    Should you like working for such a new we're-just-starting company with an explicit open source agenda... We're located a few meters to the south of Utrecht (NL). I only started working there halfway last week, so I don't know about our hiring process/desires yet, so mail me if interested and I'll ask around.

    (Note that we're mostly speaking Dutch at the office.)

  • I went to Heidelberg with the train. For those interested in those kinds of things: I took the ICE to Köln, then through the beautiful Eifel region via Gerolstein to Trier. From there along the river Saar to Saarbrücken and on to Mannheim. Then in 10 minutes to Heidelberg.

    On the way back, I'll go along the river Rhein ('Rhine') to Koblenz. Then along the river Lahn to Limburg-am-Lahn. Through the (hopefully beautiful) Westerwald via Westerburg to Au. Then on to Köln and back home again.

Now I'll prepare the template for tomorrow's summaries and go to bed afterwards.

May 22, 2018 08:01 PM

Possibility and Probability

A lite refactoring with PyCharm

Inspired by @levels I decided to try adding a Telegram integration to my current project RemoteMatcher. After seeing it work in production I decided to expand on the idea a little bit and do some refactoring with PyCharm. Let me …

May 22, 2018 07:27 PM

Stack Abuse

Implementing LDA in Python with Scikit-Learn

In our previous article Implementing PCA in Python with Scikit-Learn, we studied how we can reduce dimensionality of the feature set using PCA. In this article we will study another very important dimensionality reduction technique: linear discriminant analysis (or LDA). But first let's briefly discuss how PCA and LDA differ from each other.

PCA vs LDA: What's the Difference?

Both PCA and LDA are linear transformation techniques. However, PCA is an unsupervised technique while LDA is a supervised dimensionality reduction technique.

PCA has no concern with the class labels. In simple words, PCA summarizes the feature set without relying on the output. PCA tries to find the directions of maximum variance in the dataset. In a large feature set, many features are merely duplicates of other features or are highly correlated with them. Such features are basically redundant and can be ignored. The role of PCA is to find such highly correlated or duplicate features and to come up with a new feature set whose features are uncorrelated with each other and ordered by how much of the variance they capture. Since the variance of the features doesn't depend on the output, PCA doesn't take the output labels into account.

Unlike PCA, LDA tries to reduce dimensions of the feature set while retaining the information that discriminates output classes. LDA tries to find a decision boundary around each cluster of a class. It then projects the data points to new dimensions in a way that the clusters are as separate from each other as possible and the individual elements within a cluster are as close to the centroid of the cluster as possible. The new dimensions are ranked on the basis of their ability to maximize the distance between the clusters and minimize the distance between the data points within a cluster and their centroids. These new dimensions form the linear discriminants of the feature set.
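For the special case of two classes, this idea reduces to Fisher's linear discriminant: the projection direction is w = S_W^-1 (m_a - m_b), where m_a and m_b are the class means and S_W is the within-class scatter matrix. The sketch below works this out in plain Python for two-dimensional points, so that the "far apart between clusters, tight within clusters" trade-off is visible. The data points are made up for the example, and Scikit-Learn's implementation handles the general multi-class case differently.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]


def within_scatter(classes):
    """Sum over all classes of (x - class mean)(x - class mean)^T."""
    scatter = [[0.0, 0.0], [0.0, 0.0]]
    for points in classes:
        m = mean(points)
        for p in points:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    scatter[i][j] += d[i] * d[j]
    return scatter


def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant: w = S_W^-1 (m_a - m_b)."""
    sw = within_scatter([class_a, class_b])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    ma, mb = mean(class_a), mean(class_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(point, w):
    """Project a 2-D point onto the discriminant direction w."""
    return point[0] * w[0] + point[1] * w[1]


# Made-up example data: two clusters with the same shape, shifted apart.
CLASS_A = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (2.0, 1.5)]
CLASS_B = [(5.0, 6.0), (6.0, 7.5), (7.0, 7.0), (6.0, 5.5)]

if __name__ == "__main__":
    w = fisher_direction(CLASS_A, CLASS_B)
    print("direction:", w)
    print("class A:", [project(p, w) for p in CLASS_A])
    print("class B:", [project(p, w) for p in CLASS_B])
```

After projection each point is a single number, and the two classes occupy non-overlapping ranges on that one axis, which is what a single linear discriminant is meant to achieve.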

Let us now see how we can implement LDA using Python's Scikit-Learn.

Implementing LDA with Scikit-Learn

Like PCA, the Scikit-Learn library contains built-in classes for performing LDA on the dataset. In this section we will apply LDA on the Iris dataset since we used the same dataset for the PCA article and we want to compare results of LDA with PCA. The information about the Iris dataset is available at the following link:

The rest of the sections follows our traditional machine learning pipeline:

Importing Libraries

import numpy as np  
import pandas as pd  

Importing the Dataset

url = ""  
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']  
dataset = pd.read_csv(url, names=names)  

Data Preprocessing

Once the dataset is loaded into a pandas data frame object, the first step is to divide the dataset into features and corresponding labels and then divide the resultant dataset into training and test sets. The following code divides the data into labels and feature set:

X = dataset.iloc[:, 0:4].values  
y = dataset.iloc[:, 4].values  

The above script assigns the first four columns of the dataset, i.e. the feature set, to the X variable, while the values in the fifth column (the labels) are assigned to the y variable.

The following code divides data into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  

Feature Scaling

As was the case with PCA, we need to perform feature scaling for LDA too. Execute the following script to do so:

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()  
X_train = sc.fit_transform(X_train)  
X_test = sc.transform(X_test)  

Performing LDA

It requires only four lines of code to perform LDA with Scikit-Learn. The LinearDiscriminantAnalysis class of the sklearn.discriminant_analysis module can be used to perform LDA in Python. Take a look at the following script:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda = LDA(n_components=1)  
X_train = lda.fit_transform(X_train, y_train)  
X_test = lda.transform(X_test)  

In the script above the LinearDiscriminantAnalysis class is imported as LDA. Like PCA, we have to pass the value for the n_components parameter of the LDA, which refers to the number of linear discriminants that we want to retrieve. In this case we set n_components to 1, since we first want to check the performance of our classifier with a single linear discriminant. Finally we execute the fit and transform methods to actually retrieve the linear discriminants.

Notice that, in the case of LDA, the fit_transform method takes two parameters: X_train and y_train. In the case of PCA, however, fit_transform only requires one parameter, i.e. X_train. This reflects the fact that LDA takes the output class labels into account while selecting the linear discriminants, while PCA doesn't depend upon the output labels.

Training and Making Predictions

Since we want to compare the performance of LDA with one linear discriminant to the performance of PCA with one principal component, we will use the same Random Forest classifier that we used to evaluate performance of PCA-reduced algorithms.

Execute the following code:

from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)  

Evaluating the Performance

As always, the last step is to evaluate performance of the algorithm with the help of a confusion matrix and find the accuracy of the prediction. Execute the following script:

from sklearn.metrics import confusion_matrix  
from sklearn.metrics import accuracy_score

cm = confusion_matrix(y_test, y_pred)
print(cm)
print('Accuracy ' + str(accuracy_score(y_test, y_pred)))

The output of the script above looks like this:

[[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]
Accuracy 1.0  

You can see that with one linear discriminant, the algorithm achieved an accuracy of 100%, which is greater than the accuracy achieved with one principal component, which was 93.33%.
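
As a quick sanity check, the accuracy can be recomputed by hand from the confusion matrix printed above: correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the total. A minimal sketch in plain Python:

```python
# Confusion matrix copied from the output above
cm = [
    [11, 0, 0],
    [0, 13, 0],
    [0, 0, 6],
]

# Diagonal entries are the correctly classified samples
correct = sum(cm[i][i] for i in range(len(cm)))
total = sum(sum(row) for row in cm)

accuracy = correct / total
print(correct, total, accuracy)
```

With 30 test samples, the 93.33% reported for PCA would correspond to 28 correct predictions.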

PCA vs LDA: What to Choose for Dimensionality Reduction?

In the case of uniformly distributed data, LDA almost always performs better than PCA. However, if the data is highly skewed (irregularly distributed), PCA is the safer choice, since LDA can be biased towards the majority class.

Finally, it is beneficial that PCA can be applied to labeled as well as unlabeled data since it doesn't rely on the output labels. On the other hand, LDA requires output classes for finding linear discriminants and hence requires labeled data.

For more info on how you can utilize Python to handle your data science tasks, you should check out more in-depth resources like Data Science in Python, Pandas, Scikit-learn, Numpy, Matplotlib.

May 22, 2018 03:37 PM

Mike Driscoll

Filling PDF Forms with Python

Fillable forms have been a part of Adobe’s PDF format for years. One of the most famous examples of fillable forms in the United States are documents from the Internal Revenue Service. There are lots of government forms that use fillable forms. There are many different approaches for filling in these forms programmatically. The most time consuming method I have heard about is to just recreate the form in ReportLab by hand and then fill it in. Frankly I think this is probably the worst idea, except when your company is in charge of creating the PDFs itself. Then that might be a viable option because you then have complete control over the PDF creation and the inputs that need to go into it.

Creating a Simple Form

We need a simple form to use for our first example. ReportLab has built-in support for creating interactive forms, so let’s use ReportLab to create a simple form. Here is the code:

from reportlab.pdfgen import canvas
from reportlab.pdfbase import pdfform
from reportlab.lib.colors import magenta, pink, blue, green
def create_simple_form():
    c = canvas.Canvas('simple_form.pdf')
    c.setFont("Courier", 20)
    c.drawCentredString(300, 700, 'Employment Form')
    c.setFont("Courier", 14)
    form = c.acroForm
    c.drawString(10, 650, 'First Name:')
    form.textfield(name='fname', tooltip='First Name',
                   x=110, y=635, borderStyle='inset',
                   borderColor=magenta, fillColor=pink, 
                   textColor=blue, forceBorder=True)
    c.drawString(10, 600, 'Last Name:')
    form.textfield(name='lname', tooltip='Last Name',
                   x=110, y=585, borderStyle='inset',
                   borderColor=green, fillColor=magenta, 
                   textColor=blue, forceBorder=True)
    c.drawString(10, 550, 'Address:')
    form.textfield(name='address', tooltip='Address',
                   x=110, y=535, borderStyle='inset',
                   width=400, forceBorder=True)
    c.drawString(10, 500, 'City:')
    form.textfield(name='city', tooltip='City',
                   x=110, y=485, borderStyle='inset',
                   forceBorder=True)
    c.drawString(250, 500, 'State:')
    form.textfield(name='state', tooltip='State',
                   x=350, y=485, borderStyle='inset',
                   forceBorder=True)
    c.drawString(10, 450, 'Zip Code:')
    form.textfield(name='zip_code', tooltip='Zip Code',
                   x=110, y=435, borderStyle='inset',
                   forceBorder=True)
    # Write the finished form to disk
    c.save()

if __name__ == '__main__':
    create_simple_form()

When you run this example, the interactive PDF form looks like this:

Now we are ready to learn one of the ways that we can fill in this form!

Merging Overlays

Jan Chęć wrote an article on Medium that contained several different approaches to this problem of filling in forms in PDFs. The first solution proposed was to take an unfilled form in a PDF and create a separate PDF using ReportLab that has the data we want to use to “fill” this form. The author then used pdfrw to merge the two PDFs together. You could theoretically use PyPDF2 for the merging process too. Let’s go ahead and take a look at how this approach might work using the pdfrw package.

Let’s get started by installing pdfrw:

python -m pip install pdfrw

Now that we have pdfrw installed, let’s create a new Python file and add two functions to it. The first function will create our overlay. Let’s check that out:

import pdfrw
from reportlab.pdfgen import canvas
def create_overlay():
    """
    Create the data that will be overlayed on top
    of the form that we want to fill
    """
    c = canvas.Canvas('simple_form_overlay.pdf')
    c.drawString(115, 650, 'Mike')
    c.drawString(115, 600, 'Driscoll')
    c.drawString(115, 550, '123 Greenway Road')
    c.drawString(115, 500, 'Everytown')
    c.drawString(355, 500, 'IA')
    c.drawString(115, 450, '55555')
    # Write the overlay PDF to disk
    c.save()

Here we import the pdfrw package and we also import the canvas sub-module from ReportLab. Then we create a function called create_overlay that creates a simple PDF using ReportLab’s Canvas class. We just use the drawString canvas method. This will take some trial-and-error. Fortunately on Linux and Mac, there are decent PDF Previewer applications that you can use to just keep the PDF open and they will automatically refresh with each change. This is very helpful in figuring out the exact coordinates you need to draw your strings to. Since we created the original form, figuring out the offset for the overlay is actually pretty easy. We already knew where on the page the form elements were, so we can make a good educated guess of where to draw the strings to.

The next piece of the puzzle is actually merging the overlay we created above with the form we created in the previous section. Let’s write that function next:

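def merge_pdfs(form_pdf, overlay_pdf, output):
    """
    Merge the specified fillable form PDF with the
    overlay PDF and save the output
    """
    form = pdfrw.PdfReader(form_pdf)
    olay = pdfrw.PdfReader(overlay_pdf)
    for form_page, overlay_page in zip(form.pages, olay.pages):
        # Wrap the overlay page so it can be stamped onto the form page
        merge_obj = pdfrw.PageMerge()
        overlay = merge_obj.add(overlay_page)[0]
        pdfrw.PageMerge(form_page).add(overlay).render()
    writer = pdfrw.PdfWriter()
    writer.write(output, form)

if __name__ == '__main__':
    create_overlay()
    # 'merged_form.pdf' is an illustrative output name
    merge_pdfs('simple_form.pdf', 'simple_form_overlay.pdf', 'merged_form.pdf')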

Here we open up both the form and the overlay PDFs using pdfrw’s PdfReader classes. Then we loop over the pages of both PDFs and merge them together using PageMerge. At the end of the code, we create an instance of PdfWriter that we use to write the newly merged PDF out. The end result should look like this:

Note: When I ran this code, I did receive some errors on stdout. Here’s an example:

[ERROR] stream /Length attribute (171) appears to be too small (size 470) -- adjusting (line=192, col=1)

These errors don’t actually prevent the merged PDF from being created. But you might want to keep an eye on them, as they might hint at a problem should you have any issues.

Other Ways to Fill Forms

I have read about several other ways to “fill” the fields in these kinds of PDFs. One of them was to take a PDF and save the pages as a series of images. Then draw rectangles at the locations you want to add text and then use your new image as a config file for filling out the PDF. Seems kind of wacky and frankly I don’t want to go to all that work.

A better method would be to open a PDF in a PDF editor where you can add invisible read-only fields. You can label the fields with unique names and then access them via the PDF’s metadata. Loop over the metadata and use ReportLab’s canvas methods to create an overlay again and then merge it in much the same way as before.

I have also seen a lot of people talking about using Forms Data Format or FDF. This is the format that PDFs are supposed to use to hold the data that is to be filled in a PDF. You can use PyPDFtk and PdfJinja to do the form filling. Interestingly, PyPDFtk doesn’t work with image fields, such as where you might want to paste a signature image. You can use PdfJinja for this purpose. However, PdfJinja seems to have some limitations when working with checkboxes and radioboxes.

Using the pdfforms Package

The package that I think holds the most promise in regards to simplicity to use is the new pdfforms package. It requires that you install a cross-platform application called pdftk though. Fortunately pdftk is free so that’s not really a problem.

You can install pdfforms using pip like this:

python -m pip install pdfforms

To use pdfforms, you must first have it inspect the PDF that contains a form so it knows how to fill it out. You can do the inspection like this:

pdfforms inspect simple_form.pdf

If pdfforms works correctly, it will create a “filled” PDF in its “test” sub-folder. This sub-folder appears next to where pdfforms itself is, not where you run it from. It will fill the form with numbers in a sequential order. These are the field numbers.

The next thing you do is create a CSV file where the first column of the first row contains the name of the PDF. The other rows in the first column correspond to the field numbers: you enter the numbers of the fields that you want to fill here. Then you enter the data you want to use in the form in the third column of your CSV file. The second column is ignored, so you can put a description here. All columns after the third column are also ignored, so these can be used for whatever you want.

For this example, your CSV file might look something like this:

simple_form.pdf
1,first name,Mike
2,last name,Driscoll
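
If the data comes from a program rather than a spreadsheet, the same file layout can be produced with Python's csv module. This is only a sketch of the layout described above; the build_fill_csv helper is a hypothetical name of mine and is not part of pdfforms.

```python
import csv
import io

def build_fill_csv(pdf_name, fields):
    """Return CSV text: PDF name first, then (number, description, value) rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # First row, first column: the name of the PDF to fill
    writer.writerow([pdf_name])
    for number, description, value in fields:
        # Column 1: field number, column 2: free-form note, column 3: data
        writer.writerow([number, description, value])
    return buffer.getvalue()

csv_text = build_fill_csv(
    "simple_form.pdf",
    [(1, "first name", "Mike"), (2, "last name", "Driscoll")],
)
print(csv_text)
```

Writing csv_text to a file such as data.csv would give you the input for the fill command.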

Once you have the CSV filled out, you can run the following command to actually fill your form out with your custom data:

pdfforms fill data.csv

The filled PDF will appear in a sub-folder called filled by default.

Now on to the bad news. I wasn’t able to get this to work correctly on Windows or Mac. I got the inspect step to work on Windows, but on Mac it just hangs. On Windows, when I run the fill command it just fails with an error about not finding the PDF to fill.

I think when this package becomes less error-prone, it will be really amazing. The only major downside other than it having issues running is that you need to install a 3rd party tool that isn’t written in Python at all.

Wrapping Up

After looking at the many different options available to the Python developer for filling PDF forms, I think the most straight-forward method is creating the overlay and then merging it to the fillable form PDF using a tool like pdfrw. While this feels a bit like a hack, the other methods that I have seen seem just as hacky and just as time consuming. Once you have the position of one of the cells in the form, you can reasonably calculate the majority of the others on the page.
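
To illustrate that last point with the form from this article: every string in the overlay sits 5 points to the right of and 15 points above the x/y origin of its text field. The sketch below derives the overlay positions from the field origins. The dictionary and helper names are mine, and the offset only holds for this particular form and font size.

```python
# Field origins as passed to form.textfield() when the form was created
FIELD_ORIGINS = {
    'fname': (110, 635),
    'lname': (110, 585),
    'address': (110, 535),
    'city': (110, 485),
    'state': (350, 485),
    'zip_code': (110, 435),
}

# Measured once: text drawn at (115, 650) for a field whose origin is (110, 635)
X_OFFSET, Y_OFFSET = 5, 15

def overlay_position(field_name):
    """Return where to draw the overlay string for the given field."""
    x, y = FIELD_ORIGINS[field_name]
    return x + X_OFFSET, y + Y_OFFSET

for name in FIELD_ORIGINS:
    print(name, overlay_position(name))
```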

May 22, 2018 05:05 AM

May 21, 2018

Peter Bengtsson

Writing a custom Datadog reporter using the Python API

Datadog is an awesome software-as-a-service where you can aggregate and visualize statsd metrics sent from an application. For visualizing timings you create a time series graph. It can look something like this:

Time series

This time series looks sane because the timings are recorded very frequently. But what if the event happens very rarely, like once a day? Then the graph doesn't look very useful. See this example:

Not only does it happen rarely, the value is also really quite hard to parse. For example, what's 2.6 million milliseconds? (The answer is roughly 43 minutes.) So to solve that I used the Datadog API. Now I can get a metric of every single point in milliseconds and I can make a little data table with human-readable dates and times.

The end result looks something like this:

|          WHEN           |        TIME AGO        |       TIME TOOK       |
| Mon 2018-05-21T17:00:00 | 2 hours 43 minutes ago | 23 minutes 32 seconds |
| Sun 2018-05-20T17:00:00 | 1 day 2 hours ago      | 20 seconds            |
| Sat 2018-05-19T17:00:00 | 2 days 2 hours ago     | 20 seconds            |
| Fri 2018-05-18T17:00:00 | 3 days 2 hours ago     | 2 minutes 24 seconds  |
| Wed 2018-05-16T20:00:00 | 4 days 23 hours ago    | 38 minutes 38 seconds |

It's not gorgeous and there are a lot of caveats but it's at least really easy to read. See the code here.

I don't think you can run this code since you don't have the same (hardcoded) metrics but hopefully it can serve as an example to whet your appetite.
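
For a flavour of the formatting involved, here is a small sketch of how a millisecond value could be turned into the kind of text shown in the TIME TOOK column. It is not taken from the linked code; humanize_ms is a hypothetical helper written for illustration.

```python
def humanize_ms(milliseconds):
    """Format a duration in milliseconds as e.g. '23 minutes 32 seconds'."""
    remaining = int(round(milliseconds / 1000))
    parts = []
    for label, size in (("hour", 3600), ("minute", 60), ("second", 1)):
        count, remaining = divmod(remaining, size)
        if count:
            # Pluralize unless the count is exactly 1
            parts.append(f"{count} {label}{'' if count == 1 else 's'}")
    return " ".join(parts) or "0 seconds"

print(humanize_ms(1_412_000))
print(humanize_ms(20_000))
```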

What I'm going to do next, if I have time, is to run this as a Flask app instead that outputs an HTML table on a Heroku app or something.

May 21, 2018 08:31 PM

Tryton News

Translation Release for series 4.8

Due to a mistake in the process of generating the translations, the initial release of series 4.8 contained some untranslated strings. We decided to make a new set of releases with the correct translations even if it breaks the rule of no database updates for bug fix releases.

If you have already updated your server to series 4.8, you also need to update the database for this bug fix release. Sorry for the inconvenience.

This release additionally includes the Spanish Chart of Accounts, which has a cleaner design and is now available for the 4.8 series.

May 21, 2018 06:00 PM

Python Software Foundation

2018 Python Software Foundation Board Election: What is it and how can I learn more?

Every year the Python Software Foundation announces an open call for nominations for the PSF Board. Following the 2017 PSF members vote, only a subset of the entire board’s seats are open. This year there are four seats available - three (3) seats each with a three year term and one (1) seat that will finish the last two years of a three year term. Nominations for the board are open through May 25th, 2018 23:59:59 AoE.

Who can vote and how can I vote?

Voting for this year’s PSF Board Directors election is set to begin on June 1st, 2018. To vote in the election you must be registered as a voting member of the Python Software Foundation (see the FAQ here). You can register on the Membership page.

What does a board member do?

Expectations for board members are outlined on the Python wiki here. Basic requirements for board members include participation in monthly (remote) meetings as well as participation in the 2 to 3 in-person meetings.

Who can run for the PSF board and how can I nominate myself and/or someone else?

Anyone can run for the board, as outlined by the PSF bylaws (reference Article V). Candidates can be either self-nominated or nominated by another party. When nominating another person, the nomination requires the consent of the potential nominee.

To enter a nomination the following steps must be completed:

After nominations close, voting will begin on June 1st, 2018. If you wish to vote see voting details above. Additionally, PSF Director Thomas Wouters shared information about the nomination process on Twitter.

Who are the current board members?

The current directors are listed on the PSF website here.

How can I learn more?

Tomorrow, on May 22nd, the PSF will have an open Slack channel for 24 hours to discuss the election, the PSF, and the responsibilities of the PSF board. Current and outgoing directors, as well as PSF staff, will be monitoring the channel to respond to questions. You can join the Slack channel here.

May 21, 2018 05:17 PM


Generating Climate Temperature Spirals in Python

Ed Hawkins, a climate scientist, tweeted the following animated visualization in 2017 and captivated the world:

This visualization shows the deviations from the average temperature between 1850 and 1900. It was reshared millions of times over Twitter and Facebook and a version of it was even shown at the opening ceremony for the Rio Olympics.

The visualization is compelling, because it helps viewers understand both the varying fluctuations in temperatures, and the sharp overall increases in average temperatures in the last 30 years.

You can read more about the motivation behind this visualization on Ed Hawkins' website.

In this blog post, we'll walk through how to recreate this animated visualization in Python. We'll specifically be working with pandas (for representing and munging the data) and matplotlib (for visualizing the data). If you're unfamiliar with matplotlib, we recommend going through the Exploratory Data Visualization and Storytelling Through Data Visualization courses.

We'll use the following libraries in this post:

  • Python 3.6
  • Pandas 0.22
  • Matplotlib 2.2.2

This post is part of our focus on nature data this month. Learn more, and check out our other posts here.

Data cleaning

The underlying data was released by the Met Office in the United Kingdom, which does excellent work on weather and climate forecasting. The dataset can be downloaded directly here.

The openclimatedata repo on Github contains some helpful data-cleaning code in this notebook. You'll need to scroll down to the section titled Global Temperatures.

The following code reads the text file into a pandas data frame:

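import pandas as pd

hadcrut = pd.read_csv(
    "path/to/downloaded_file.txt",  # placeholder for the downloaded dataset
    sep=r"\s+", header=None,        # assumed: whitespace-delimited, no header row
    usecols=[0, 1],
)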

Then, we need to:

  • split the first column into month and year columns
  • rename the 1 column to value
  • select and save all but the first column (0)

hadcrut['year'] = hadcrut.iloc[:, 0].apply(lambda x: x.split("/")[0]).astype(int)
hadcrut['month'] = hadcrut.iloc[:, 0].apply(lambda x: x.split("/")[1]).astype(int)

hadcrut = hadcrut.rename(columns={1: "value"})
hadcrut = hadcrut.iloc[:, 1:]

   value  year  month
0 -0.700  1850      1
1 -0.286  1850      2
2 -0.732  1850      3
3 -0.563  1850      4
4 -0.327  1850      5

To keep our data tidy, let's remove rows containing data from 2018 (since it's the only year with data on 3 months, not all 12 months).

hadcrut = hadcrut.drop(hadcrut[hadcrut['year'] == 2018].index)

Lastly, let's compute the mean of the global temperatures from 1850 to 1900 and subtract that value from the entire dataset. To make this easier, we'll create a multiindex using the year and month columns:

hadcrut = hadcrut.set_index(['year', 'month'])

This way, we are only modifying values in the value column (the actual temperature values). Finally, calculate and subtract the mean temperature from 1850 to 1900 and reset the index back to the way it was before.

hadcrut -= hadcrut.loc[1850:1900].mean()
hadcrut = hadcrut.reset_index()

   year  month     value
0  1850      1 -0.386559
1  1850      2  0.027441
2  1850      3 -0.418559
3  1850      4 -0.249559
4  1850      5 -0.013559

Cartesian versus polar coordinate system

There are a few key phases to recreating Ed's GIF:

  • learning how to plot on a polar coordinate system
  • transforming the data for polar visualization
  • customizing the aesthetics of the plot
  • stepping through the visualization year-by-year and turning the plot into a GIF

We'll start by diving into plotting in a polar coordinate system.

Most of the plots you've probably seen (bar plots, box plots, scatter plots, etc.) live in the cartesian coordinate system. In this system:

  • x and y (and z) can range from negative infinity to positive infinity (if we're sticking with real numbers)
  • the center coordinate is (0,0)
  • we can think of this system as being rectangular


In contrast, the polar coordinate system is circular and uses r and theta. The r coordinate specifies the distance from the center and can range from 0 to infinity. The theta coordinate specifies the angle from the origin and can range from 0 to 2*pi.
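
The two systems describe the same points, and converting between them takes only a little trigonometry. Here is a minimal sketch using the standard library; the function names are my own.

```python
import math

def polar_to_cartesian(r, theta):
    """Convert a distance and an angle in radians to (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def cartesian_to_polar(x, y):
    """Convert (x, y) to (r, theta) with theta in the range [0, 2*pi)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta

# A point 2 units from the center, a quarter turn from the zero angle
x, y = polar_to_cartesian(2.0, math.pi / 2)
print(round(x, 6), round(y, 6))

# Converting back recovers the original r and theta
r, theta = cartesian_to_polar(x, y)
print(round(r, 6), round(theta, 6))
```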


Preparing data for polar plotting

Let's first understand how the data was plotted in Ed Hawkins' original climate spirals plot.

The temperature values for a single year span almost an entire spiral / circle. You'll notice how the line spans from January to December, but doesn't connect to January again. Here's just the 1850 frame from the GIF:


This means that we need to subset the data by year and use the following coordinates:

  • r: temperature value for a given month, adjusted to contain no negative values.
    • Matplotlib supports plotting negative values, but not in the way you think. We want -0.1 to be closer to the center than 0.1, which isn't the default matplotlib behavior.
    • We also want to leave some space around the origin of the plot for displaying the year as text.
  • theta: generate 12 equally spaced angle values that span from 0 to 2*pi.

Let's dive into how to plot just the data for the year 1850 in matplotlib, then scale up to all years. If you're unfamiliar with creating Figure and Axes objects in matplotlib, I recommend our Exploratory Data Visualization course.

To generate a matplotlib Axes object that uses the polar system, we need to set the projection parameter to "polar" when creating it.

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')

Here's what the default polar plot looks like:


To adjust the data to contain no negative temperature values, we need to first calculate the minimum temperature value:
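
A minimal sketch of that check is shown here. To stay self-contained it uses only the five sample rows printed earlier rather than the full dataset, so the number it prints is not the dataset-wide minimum.

```python
import pandas as pd

# The five sample rows shown earlier, standing in for the full data frame
sample = pd.DataFrame({
    "year": [1850, 1850, 1850, 1850, 1850],
    "month": [1, 2, 3, 4, 5],
    "value": [-0.386559, 0.027441, -0.418559, -0.249559, -0.013559],
})

# Smallest temperature deviation in the sample
lowest = sample["value"].min()
print(lowest)

# Any offset larger than abs(lowest) makes every value positive
shifted = sample["value"] + 1
print(shifted.min())
```

On the real data, the same call would be made on hadcrut['value'].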


Let's add 1 to all temperature values, so they'll be positive but there's still some space reserved around the origin for displaying text:


Let's also generate 12 evenly spaced values from 0 to 2*pi and use them as the theta values:

import numpy as np
hc_1850 = hadcrut[hadcrut['year'] == 1850]

fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')

r = hc_1850['value'] + 1
theta = np.linspace(0, 2*np.pi, 12)

To plot data on a polar projection, we still use the Axes.plot() method but now the first value corresponds to the list of theta values and the second value corresponds to the list of r values.

ax1.plot(theta, r)

Here's what this plot looks like:


Tweaking the aesthetics

To make our plot close to Ed Hawkins', let's tweak the aesthetics. Most of the other matplotlib methods we're used to having when plotting normally on a cartesian coordinate system carry over. Internally, matplotlib considers theta to be x and r to be y.

To see this in action, we can hide all of the tick labels for both axes using:
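
One way to do this, sketched here under the assumption that the Agg backend is acceptable (so no display is needed), is to pass empty lists to the tick label setters:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
ax1 = plt.subplot(111, projection='polar')

# theta is treated as the x axis and r as the y axis
ax1.set_xticklabels([])
ax1.set_yticklabels([])
```

The grid lines stay visible; only the angle and radius labels are hidden.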


Now, let's tweak the colors. We need the background color within the polar plot to be black, and the color surrounding the polar plot to be gray. We actually used an image editing tool to find the exact black and gray color values, as hex values:

  • Gray: #323331
  • Black: #000100

We can use fig.set_facecolor() to set the color surrounding the plot and Axes.set_facecolor() (which replaced the older Axes.set_axis_bgcolor()) to set the background color within the plot:
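
A minimal sketch of the two calls, using the hex values listed above. It uses Axes.set_facecolor(), since Axes.set_axis_bgcolor() is no longer available in recent Matplotlib releases, and it assumes the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig = plt.figure(figsize=(8, 8))
ax1 = plt.subplot(111, projection='polar')

# Gray for the area surrounding the polar plot
fig.set_facecolor("#323331")
# Near-black for the area inside the polar plot
ax1.set_facecolor("#000100")

print(to_hex(fig.get_facecolor()), to_hex(ax1.get_facecolor()))
```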


Next, let's add the title using Axes.set_title():

ax1.set_title("Global Temperature Change (1850-2017)", color='white', fontdict={'fontsize': 30})

Lastly, let's add the text in the center that specifies the current year that's being visualized. We want this text to be at the origin (0,0), we want the text to be white, have a large font size, and be horizontally center-aligned.

ax1.text(0,0,"1850", color='white', size=30, ha='center')

Here's what the plot looks like now (recall that this is just for the year 1850).


Plotting the remaining years

To plot the spirals for the remaining years, we need to repeat what we just did but for all of the years in the dataset. The one tweak we should make here is to manually set the axis limit for r (or y in matplotlib).

This is because matplotlib scales the size of the plot automatically based on the data that's used. This is why, in the last step, we observed that the data for just 1850 was displayed at the edge of the plotting area. Let's calculate the maximum temperature value in the entire dataset and add a generous amount of padding (to match what Ed did).
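
A sketch of that calculation follows. The sample values and the padding amount are illustrative assumptions only; on the real data you would take the maximum of hadcrut['value'] and pick whatever padding looks right.

```python
# Illustrative sample of temperature deviations (not the full dataset)
values = [-0.386559, 0.027441, -0.418559, -0.249559, -0.013559]

# Largest deviation, shifted by 1 like the plotted r values
highest = max(values)
shifted_highest = highest + 1

# Hypothetical padding so the outermost line doesn't touch the edge
padding = 0.5
r_limit = shifted_highest + padding

print(highest, shifted_highest, r_limit)
```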


We can manually set the y-axis limit using Axes.set_ylim()

ax1.set_ylim(0, 3.25)

Now, we can use a for loop to generate the rest of the data. Let's leave out the code that generates the center text for now (otherwise each year will generate text at the same point and it'll be very messy):

fig = plt.figure(figsize=(14,14))
ax1 = plt.subplot(111, projection='polar')

ax1.set_ylim(0, 3.25)

theta = np.linspace(0, 2*np.pi, 12)

ax1.set_title("Global Temperature Change (1850-2017)", color='white', fontdict={'fontsize': 20})

years = hadcrut['year'].unique()

for year in years:
    r = hadcrut[hadcrut['year'] == year]['value'] + 1
#     ax1.text(0,0, str(year), color='white', size=30, ha='center')
    ax1.plot(theta, r)

Here's what that plot looks like:


Customizing the colors

Right now, the colors feel a bit random and don't correspond to the gradual heating of the climate that the original visualization conveys well. In the original visualization, the colors transition from blue / purple, to green, to yellow. This color scheme is known as a sequential colormap, because the progression of colors reflects some meaning from the data.

While it's easy to specify a colormap when creating a scatter plot in matplotlib (using the cmap parameter of Axes.scatter()), there's no direct parameter to specify a colormap when creating a line plot. Tony Yu has an excellent short post on how to use a colormap when generating line plots, which we'll use here.

Essentially, we use the color (or c) parameter when calling the Axes.plot() method and draw colors from plt.cm.<colormap_name>(index). Here's how we'd use the viridis colormap:

ax1.plot(theta, r, c=plt.cm.viridis(index)) # Index is a counter variable

This will result in the plot having sequential colors from blue to green, but to get to yellow we can actually multiply the counter variable by 2:

ax1.plot(theta, r, c=plt.cm.viridis(index*2))
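
Why does doubling the counter help? The sketch below inspects the colormap directly, assuming the Agg backend so it runs without a display. The viridis colormap is a lookup table with a fixed number of entries, and an integer argument is used as an index into that table. With 168 years (1850 through 2017), a plain counter never reaches the yellow end of the table, while a doubled counter runs past the end, where Matplotlib falls back to the last color.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt

cmap = plt.cm.viridis
n_years = 168  # 1850 through 2017

# Number of entries in the colormap's lookup table
print(cmap.N)

# Color of the final year with a plain counter and with a doubled counter
last_plain = cmap(n_years - 1)
last_doubled = cmap((n_years - 1) * 2)

# The doubled index is past the end of the table, so it maps to the last entry
print(last_doubled == cmap(cmap.N - 1))
print(last_plain == cmap(cmap.N - 1))
```

A side effect of this shortcut is that the most recent years all share the same final color.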

Let's reformat our code to incorporate this sequential colormap.

fig = plt.figure(figsize=(14,14))
ax1 = plt.subplot(111, projection='polar')


for index, year in enumerate(years):
    r = hadcrut[hadcrut['year'] == year]['value'] + 1
    theta = np.linspace(0, 2*np.pi, 12)
    ax1.set_title("Global Temperature Change (1850-2017)", color='white', fontdict={'fontsize': 20})
    ax1.set_ylim(0, 3.25)
#     ax1.text(0,0, str(year), color='white', size=30, ha='center')
    ax1.plot(theta, r, c=plt.cm.viridis(index*2))

Here's what the resulting plot looks like:


Adding temperature rings

While the plot we have right now is pretty, a viewer can't actually understand the underlying data at all. There's no indication of the underlying temperature values anywhere in the visualization.

The original visualization had full, uniform rings at 0.0, 1.5, and 2.0 degrees Celsius to help with this. Because we added 1 to every temperature value, we need to do the same thing here when plotting these uniform rings as well.

The blue ring was originally at 0.0 degrees Celsius, so we need to generate a ring where r=1. The first red ring was originally at 1.5, so we need to plot it at 2.5. The last one at 2.0, so that needs to be 3.0.

full_circle_thetas = np.linspace(0, 2*np.pi, 1000)
blue_line_one_radii = [1.0]*1000
red_line_one_radii = [2.5]*1000
red_line_two_radii = [3.0]*1000

ax1.plot(full_circle_thetas, blue_line_one_radii, c='blue')
ax1.plot(full_circle_thetas, red_line_one_radii, c='red')
ax1.plot(full_circle_thetas, red_line_two_radii, c='red')

Lastly, we can add the text specifying the ring's temperature values. All 3 of these text values are at the 0.5*pi angle, at varying distance values:

ax1.text(np.pi/2, 1.0, "0.0 C", color="blue", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 2.5, "1.5 C", color="red", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 3.0, "2.0 C", color="red", ha='center', fontdict={'fontsize': 20})


Because the text for "0.0 C" gets obscured by the data, we may want to consider hiding it for the static plot version.

Generating The GIF Animation

Now we're ready to generate a GIF animation from the plot. An animation is a series of images that are displayed in rapid succession.

We'll use the matplotlib.animation.FuncAnimation function to help us with this. To take advantage of this function, we need to write code that:

  • defines the base plot appearance and properties
  • updates the plot between frames with new data

We'll use the following required parameters when calling FuncAnimation():

  • fig: the matplotlib Figure object
  • func: the update function that's called between each frame
  • frames: the number of frames (we want one for each year)
  • interval: the number of milliseconds each frame is displayed (there are 1000 milliseconds in a second)

This function will return a matplotlib.animation.FuncAnimation object, which has a save() method we can use to write the animation to a GIF file.

Here's some skeleton code that reflects the workflow we'll use:

# To be able to write out the animation as a GIF file
import sys
from matplotlib.animation import FuncAnimation

# Create the base plot
fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')

def update(i):
    # Specify how we want the plot to change in each frame.
    # We need to unravel the for loop we had earlier.
    year = years[i]
    r = hadcrut[hadcrut['year'] == year]['value'] + 1
    ax1.plot(theta, r, c=plt.cm.viridis(i*2))
    return ax1

anim = FuncAnimation(fig, update, frames=len(years), interval=50)

anim.save('climate_spiral.gif', dpi=120, writer='imagemagick', savefig_kwargs={'facecolor': '#323331'})

All that's left now is to re-format our previous code and add it to the skeleton above. We encourage you to do this on your own, to practice programming using matplotlib. Here's what the final animation looks like in lower resolution (to decrease loading time).

Next Steps

In this post, we explored:

  • how to plot on a polar coordinate system
  • how to customize text in a polar plot
  • how to generate GIF animations by interpolating multiple plots

You're able to get most of the way to recreating the excellent climate spiral GIF Ed Hawkins originally released. Here are a few key things that we didn't explore, but we strongly encourage you to do so on your own:

  • Adding month values to the outer rim of the polar plot.
  • Adding the current year value in the center of the plot as the animation is created.
    • If you try to do this using the FuncAnimation() method, you'll notice that the year values are stacked on top of each other (instead of clearing out the previous year value and displaying a new year value).
  • Adding a text signature to the bottom left and bottom right corners of the figure.
  • Tweaking how the text for 0.0 C, 1.5 C, and 2.0 C intersect the static temperature rings.
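
As a hint for the stacked year values, one possible approach (a sketch, not necessarily how the original GIF was made) is to create a single text object up front and overwrite its content in each frame, rather than calling ax1.text() inside the update function. The short years list is a stand-in for hadcrut['year'].unique(), and the Agg backend is assumed so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

years = [1850, 1851, 1852]  # stand-in for hadcrut['year'].unique()

fig = plt.figure(figsize=(8, 8))
ax1 = plt.subplot(111, projection='polar')
ax1.set_facecolor("#000100")

# Create the label once, outside the update function
year_label = ax1.text(0, 0, "", color='white', size=30, ha='center')

def update(i):
    # Overwrite the existing label instead of adding a new one
    year_label.set_text(str(years[i]))
    return (year_label,)

anim = FuncAnimation(fig, update, frames=len(years), interval=50)
```

Because only one text object ever exists on the Axes, each frame replaces the previous year instead of drawing on top of it.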

May 21, 2018 04:11 PM

Real Python

Introduction to Python 3

Python is a high-level, interpreted scripting language developed in the late 1980s by Guido van Rossum at the National Research Institute for Mathematics and Computer Science in the Netherlands. The initial version was published at the alt.sources newsgroup in 1991, and version 1.0 was released in 1994.

Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until December 2008. At that time, the development team made the decision to release version 3.0, which contained a few relatively small but significant changes that were not backward compatible with the 2.x versions. Python 2 and 3 are very similar, and some features of Python 3 have been backported to Python 2. But in general, they remain not quite compatible.

Both Python 2 and 3 have continued to be maintained and developed, with periodic release updates for both. As of this writing, the most recent versions available are 2.7.15 and 3.6.5. However, an official End Of Life date of January 1, 2020 has been established for Python 2, after which time it will no longer be maintained. If you are a newcomer to Python, it is recommended that you focus on Python 3, as this tutorial will do.

Python is still maintained by a core development team at the Institute, and Guido is still in charge, having been given the title of BDFL (Benevolent Dictator For Life) by the Python community. The name Python, by the way, derives not from the snake, but from the British comedy troupe Monty Python’s Flying Circus, of which Guido was, and presumably still is, a fan. It is common to find references to Monty Python sketches and movies scattered throughout the Python documentation.

Why Choose Python?

If you’re going to write programs, there are literally dozens of commonly used languages to choose from. Why choose Python? Here are some of the features that make Python an appealing choice.

Python has been growing in popularity over the last few years. The 2018 Stack Overflow Developer Survey ranked Python as the 7th most popular and the number one most wanted technology of the year. World-class software development countries around the globe use Python every single day.

According to research by Dice, Python is also one of the hottest skills to have, and it is the most popular programming language in the world based on the Popularity of Programming Language Index.

Due to the popularity and widespread use of Python as a programming language, Python developers are sought after and paid well. If you’d like to dig deeper into Python salary statistics and job opportunities, you can do so here.

Python is Interpreted

Many languages are compiled, meaning the source code you create needs to be translated into machine code, the language of your computer’s processor, before it can be run. Programs written in an interpreted language are passed straight to an interpreter that runs them directly.

This makes for a quicker development cycle because you just type in your code and run it, without the intermediate compilation step.

One potential downside to interpreted languages is execution speed. Programs that are compiled into the native language of the computer processor tend to run more quickly than interpreted programs. For some applications that are particularly computationally intensive, like graphics processing or intense number crunching, this can be limiting.

In practice, however, for most programs, the difference in execution speed is measured in milliseconds, or seconds at most, and not appreciably noticeable to a human user. The expediency of coding in an interpreted language is typically worth it for most applications.

Further reading: See this Wikipedia page to read more about the differences between interpreted and compiled languages.

Python is Free

The Python interpreter is developed under an OSI-approved open-source license, making it free to install, use, and distribute, even for commercial purposes.

A version of the interpreter is available for virtually any platform there is, including all flavors of Unix, Windows, macOS, smartphones and tablets, and probably anything else you ever heard of. A version even exists for the half dozen people remaining who use OS/2.

Python is Portable

Because Python code is interpreted and not compiled into native machine instructions, code written for one platform will work on any other platform that has the Python interpreter installed. (This is true of any interpreted language, not just Python.)

Python is Simple

As programming languages go, Python is relatively uncluttered, and the developers have deliberately kept it that way.

A rough estimate of the complexity of a language can be gleaned from the number of keywords or reserved words in the language. These are words that are reserved for special meaning by the compiler or interpreter because they designate specific built-in functionality of the language.

Python 3 has 33 keywords, and Python 2 has 31. By contrast, C++ has 62, Java has 53, and Visual Basic has more than 120, though these latter examples probably vary somewhat by implementation or dialect.
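If you want to check the keyword count for the interpreter you are running, the standard library's keyword module exposes the list. This is only a small sketch; the exact number differs between Python versions, so no count is hard-coded here:

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
reserved = keyword.kwlist
print(len(reserved), "keywords")
print(", ".join(reserved[:5]), "...")

# iskeyword() reports whether a given name is reserved.
print(keyword.iskeyword("for"))      # a reserved word
print(keyword.iskeyword("python"))   # an ordinary identifier
```

Running this under both a Python 2 and a Python 3 interpreter is a quick way to compare the two lists yourself.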

Python code has a simple and clean structure that is easy to learn and easy to read. In fact, as you will see, the language definition enforces code structure that is easy to read.

But It’s Not That Simple

For all its syntactical simplicity, Python supports most constructs that would be expected in a very high-level language, including complex dynamic data types, structured and functional programming, and object-oriented programming.

Additionally, a very extensive library of classes and functions is available that provides capability well beyond what is built into the language, such as database manipulation or GUI programming.

Python accomplishes what many programming languages don’t: the language itself is simply designed, but it is very versatile in terms of what you can accomplish with it.


This section gave an overview of the Python programming language.

Python is a great option, whether you are a beginning programmer looking to learn the basics, an experienced programmer designing a large application, or anywhere in between. The basics of Python are easily grasped, and yet its capabilities are vast.

Proceed to the next section to learn how to acquire and install Python on your computer.


May 21, 2018 02:00 PM

Caktus Consulting Group

PyCon 2018 Recap

Making connections

Before the conference, our team listed “making connections” as one of the main reasons to attend PyCon. We certainly did that, welcoming visitors to the booth and catching up with friends old and new.

Ultimate Tic Tac Toe returned with an upgraded AI to play against. It was a tough one to beat this year! We had a couple of people achieve victory, though.

Winner of Ultimate Tic Tac Toe in front of the Caktus booth at PyCon 2018.

We also gave away two Raspberry Pi 3 kits to lucky winners.


Learning from fellow Pythonistas is another reason our team loves going to PyCon. The keynotes were highlighted as particularly engaging, although attendees mentioned many other talks on Twitter as well.

Look out for the 2018 edition of our PyCon Must-See series, coming soon!

PyLadies auction

The PyLadies auction sold out this year for the first time. Bidding was hot for items ranging from Tesla coil music-makers to cross-stitch samplers and limited-edition prints.

The sold-out room at the PyLadies auction.

Cakti love to support the larger community and this year we were excited to donate an item to the PyLadies auction. This luxurious handwoven scarf, created by a member of the Caktus team, will let its new owner represent Python in style. Thank you to the buyer for supporting PyLadies!

Python-themed scarf, hand-woven by Elizabeth Chabot for the PyLadies auction.

Long live Python

It was another great year at PyCon! Thanks to all of the Python community for participating, and extra thanks to the organizers and volunteers. We appreciate all that you do!

May 21, 2018 01:30 PM


Dynamic Programming - Lifeline of Technical interviews - 2

Discussing solution to some previously asked dynamic programming problems.

May 21, 2018 11:10 AM

Talk Python to Me

#162 Python in Building and Architecture

You often hear about architecture in software. This could be things like microservices, 3-tier apps, or even the dreaded client-server mainframe app. But this episode, we're turning this on its head: It's software in architecture and real-world construction projects with Mark Mendez.

May 21, 2018 08:00 AM

Mike Driscoll

PyDev of the Week: Kai Willadsen

This week we welcome Kai Willadsen (@kywe) as our PyDev of the Week! He is the maintainer of the Meld project, a cross-platform visual diff and merge tool written in Python. You can catch up with Kai on his blog or see what else he is working on via Github. Let’s take a few moments to get to know Kai better.

Can you tell us a little about yourself (hobbies, education, etc):

I did undergrad in computer science + cognitive science, a PhD in complex systems modeling, and a variety of post-doc work before bailing on the academic life. None of the above is even remotely relevant to my current work though!

My non-computer hobbies are basically gardening & chicken keeping. For people in the position to do so: if you’ve never kept chickens, think about it! They are the best.

Why did you start using Python?

Way back during my PhD studies I’d coded my model in C++, and that was fine. However, I got to a point where I needed to be able to try out different model scenarios and experiment with analyses, and writing that all in C++ just got to be too slow. In a few days I got SWIG to generate Python bindings for my model code and soon I was writing anything that wasn’t performance critical in Python. Pretty soon, that was all I was using.

What other programming languages do you know and which is your favorite?

In the distant past I spent real time with C++ (shudder) and did a fair bit with Java and C. I don’t know if I really have a favourite non-Python language; Rust is what I’m currently most interested in, but Go has its place, and C is always… basically pretty okay.

What projects are you working on now?

Almost all my personal coding time is spent on Meld at the moment. I want to get back to a side project I had for generating Python 3.6 typing annotations for GObject introspected libraries (like GTK+). I’ve also started on a simple GTK+ ChromeCast client in Rust, but that’s mostly because I wanted an excuse to learn Rust… and it’s a slow process.

Which Python libraries are your favorite (core or 3rd party)?

I find that the standard library can be a bit hard to love. It’s not that it’s bad! it’s just that a lot of the time it’s not *great*, often due to compatibility considerations and similar constraints. Having said that, the new and shiny `pathlib` is (as of Python 3.6) pretty good.

As for third-party libraries, the pygobject project maintains the introspection bindings that make Python + GTK+ actually work, and I feel that it’s an under-appreciated project. It’s a tricky job they’re doing, but feels good to work with that API. Other libraries that I reach for all the time include Click (nice command line clients), Werkzeug (funky things with HTTP), SQLAlchemy (anything touching a DB) and Cython (my Python isn’t quite fast enough).

How did the meld project come about?

I can’t answer this, because I didn’t start it! The original author was Stephen Kennedy. I’ve been the maintainer for the last decade or so… which tells you something about how old Meld is as a codebase.

What top three things have you learned contributing to open source projects like meld?

Firstly, unless you’ve actually maintained a project, you’d be shocked at how much time all the non-coding tasks take up. Even minor things like monitoring a fairly quiet mailing list and making sure that questions get answered takes real time when you only get a handful of hours a week to spend. Just writing the release notes when putting a new version out can take hours.

Second, growing a real community around a project is a skill, and most people don’t have it (I don’t!). If you have someone who is good at this, treasure them.

Third, you’d like to think that users file bugs, but… they don’t. It’s not uncommon to find that e.g., some version control system I don’t use has been broken for a year and nobody said anything. I suspect this is even more true of applications than of libraries.

Thanks for doing the interview!

May 21, 2018 05:05 AM


General Data Protection Regulation

Ying Li’s PyCon2018 keynote discussed the importance of writing secure software, and the responsibility that we, as developers, have in keeping users safe. While watching, it occurred to me that I haven’t written about the new European Union laws that stem from the General Data Protection Regulation (GDPR) that goes into effect this month. I was involved in several conversations on this topic lately, so here’s some information on the rules, their implications and responses from businesses, and the reactions as the rest of the world tries to implement similar systems.

May 21, 2018 04:00 AM

May 20, 2018


Python: party with Strings

Simplify string processing in Python 3.6.5 and later

May 20, 2018 09:36 PM

Full Stack Python

Monitoring Django Projects with Rollbar

One fast way to scan for exceptions and errors in your Django web application projects is to add a few lines of code to include a hosted monitoring tool.

In this tutorial we will learn to add the Rollbar monitoring service to a web app to visualize any issues produced by our web app. This tutorial will use Django as the web framework to build the web application, but there are also tutorials for the Flask and Bottle frameworks. You can also check out a list of other hosted and open source tools on the monitoring page.

Our Tools

Python 3 is strongly recommended for this tutorial because Python 2 will no longer be supported starting January 1, 2020. Python 3.6.4 was used to build this tutorial. We will also use Django 2.0.4 and the rollbar library 0.13.18 as application dependencies to build our application.

If you need help getting your development environment configured before running this code, take a look at this guide for setting up Python 3 and Django on Ubuntu 16.04 LTS.

All code in this blog post is available open source on GitHub under the MIT license within the monitor-python-django-apps directory of the blog-code-examples repository. Use and modify the code however you like for your own applications.

Installing Dependencies

Start the project by creating a new virtual environment using the following command. I recommend keeping a separate directory such as ~/venvs/ so that you always know where all your virtualenvs are located.

python3 -m venv monitordjango

Activate the virtualenv with the activate shell script:

source monitordjango/bin/activate

The command prompt will change after activating the virtualenv:

Activate the virtualenv on the command line.

Remember that you need to activate your virtualenv in every new terminal window where you want to use the virtualenv to run the project.
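If you are ever unsure whether the virtualenv is active in the current terminal, a quick check from Python can help. The sketch below assumes the environment was created with python3 -m venv as above; the helper name in_virtualenv is made up for this example:

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print("virtualenv active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

With monitordjango activated, the prefix should point into the monitordjango directory.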

We can now install the Django and Rollbar packages into the activated, empty virtualenv.

pip install django==2.0.4 rollbar==0.13.18

Look for output like the following to confirm the dependencies installed correctly.

Collecting certifi>=2017.4.17 (from requests>=0.12.1->rollbar==0.13.18)
  Downloading certifi-2018.1.18-py2.py3-none-any.whl (151kB)
    100% |████████████████████████████████| 153kB 767kB/s 
Collecting urllib3<1.23,>=1.21.1 (from requests>=0.12.1->rollbar==0.13.18)
  Using cached urllib3-1.22-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests>=0.12.1->rollbar==0.13.18)
  Using cached chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.7,>=2.5 (from requests>=0.12.1->rollbar==0.13.18)
  Using cached idna-2.6-py2.py3-none-any.whl
Installing collected packages: pytz, django, certifi, urllib3, chardet, idna, requests, six, rollbar
  Running install for rollbar ... done
Successfully installed certifi-2018.1.18 chardet-3.0.4 django-2.0.4 idna-2.6 pytz-2018.3 requests-2.18.4 rollbar-0.13.18 six-1.11.0 urllib3-1.22

We have our dependencies ready to go so now we can write the code for our Django project.

Our Django Web App

Django makes it easy to generate the boilerplate code for new projects and apps using the django-admin command. Go to the directory where you typically store your coding projects. For example, on my Mac I use /Users/matt/devel/py/. Then run the following command to start a Django project named djmonitor:

django-admin startproject djmonitor

The command will create a directory named djmonitor with several subdirectories that you should be familiar with if you've previously worked with Django.

Change directories into the new project.

cd djmonitor

Start a new Django app for our example code.

python manage.py startapp billions

Django will create a new folder named billions for our project. Let's make sure our Django URLs work properly before we write the code for the app.

Now open djmonitor/djmonitor/urls.py and add the two lines marked "# added" so that URLs with the path /billions/ will be routed to the app we are working on.

""" (comments section) """
from django.conf.urls import include  # added
from django.contrib import admin
from django.urls import path

urlpatterns = [
    path('billions/', include('billions.urls')),  # added
    path('admin/', admin.site.urls),
]

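Save djmonitor/djmonitor/urls.py and open djmonitor/djmonitor/settings.py. Add the billions app to INSTALLED_APPS by inserting the line marked "# added", which will become line number 40 after insertion:

# Application definition

INSTALLED_APPS = [
    # ... the default Django apps stay as generated ...
    'billions',  # added
]

Save and close settings.py.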

Reminder: make sure you change the default DEBUG and SECRET_KEY values in settings.py before you deploy any code to production. Secure your app properly with the information from the Django production deployment checklist so that you do not add your project to the list of hacked applications on the web.
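One common way to avoid shipping the development defaults is to read both values from environment variables. The following is a sketch under assumptions, not the settings generated by Django: the variable names DJANGO_DEBUG and DJANGO_SECRET_KEY and both helper functions are hypothetical.

```python
import os


def env_flag(environ, name, default=False):
    """Interpret an environment variable as a boolean."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


def load_security_settings(environ=os.environ):
    """Build DEBUG and SECRET_KEY from a mapping of environment variables."""
    debug = env_flag(environ, "DJANGO_DEBUG")           # hypothetical name
    secret_key = environ.get("DJANGO_SECRET_KEY", "")   # hypothetical name
    if not debug and not secret_key:
        # Refuse to run in production mode without a real secret key.
        raise ValueError("Set DJANGO_SECRET_KEY before deploying")
    return {"DEBUG": debug, "SECRET_KEY": secret_key}


# Example with an explicit mapping instead of the real environment.
settings = load_security_settings({"DJANGO_DEBUG": "true"})
print(settings)
```

In settings.py you would assign the two returned values to DEBUG and SECRET_KEY; DEBUG then stays off unless it is switched on explicitly.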

Next change into the djmonitor/billions directory. Create a new file named urls.py that will be specific to the routes for the billions app within the djmonitor project.

Add the following lines to the currently-blank djmonitor/billions/urls.py file.

from django.conf.urls import url
from . import views

urlpatterns = [
    url(r'(?P<slug>[\wa-z-]+)', views.they, name="they"),
]

Save djmonitor/billions/urls.py. One more file before we can test that our simple Django app works. Open djmonitor/billions/views.py.

from django.core.exceptions import PermissionDenied
from django.shortcuts import render


def they(request, slug):
    # Only the slug "are" renders the page; anything else is refused.
    if slug and slug == "are":
        return render(request, 'billions.html', {})
    raise PermissionDenied("Hmm, can't find what you're looking for.")

Create a directory for your template files named templates under the djmonitor/billions app directory.

mkdir templates

Within templates create a new file named billions.html that contains the following Django template markup.

<!DOCTYPE html>
    <title>They... are BILLIONS!</title>
    <h1><a href="">They Are Billions</a></h1>
    <img src="">

Alright, all of our files are in place so we can test the application. Within the base directory of your project run the Django development server:

python manage.py runserver

The Django development server will start up with no issues other than an unapplied migrations warning.

(monitordjango) $ python manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).

You have 14 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.

April 08, 2018 - 19:06:44
Django version 2.0.4, using settings 'djmonitor.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Only the /billions/ route will successfully hit our billions app. Try to access "http://localhost:8000/billions/are/". We should see our template render with the gif:

Testing local development server at /billions/are/.

Cool, our application successfully rendered a super-simple HTML page with a GIF of one of my favorite computer games. What if we try another path under /billions/ such as "http://localhost:8000/billions/arenot/"?

403 Forbidden error with any path under /billions/ other than /billions/are/.

Our 403 Forbidden is raised, which is what we expected based on our code. That is a somewhat contrived block of code but let's see how we can catch and report this type of error without changing our code at all. This approach will be much easier on us when modifying an existing application than having to refactor the code to report on these types of errors, if we even know where they exist.
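To see why /billions/are/ renders while /billions/arenot/ is refused, we can try the slug pattern outside of Django. This is only a rough stand-in for the URL resolver, using plain re and a made-up helper named status_for; it is not how Django itself is implemented.

```python
import re

# Same pattern as in billions/urls.py.
SLUG_PATTERN = re.compile(r'(?P<slug>[\wa-z-]+)')


def status_for(path):
    """Approximate the status code the billions app returns for a URL path."""
    prefix = "/billions/"
    if not path.startswith(prefix):
        return None  # handled elsewhere, not by the billions app
    match = SLUG_PATTERN.search(path[len(prefix):])
    if match is None:
        return 404  # nothing left for the slug to capture
    # Mirrors the they view: only the slug "are" renders the template.
    return 200 if match.group("slug") == "are" else 403


for path in ("/billions/are/", "/billions/arenot/", "/billions/"):
    print(path, status_for(path))
```

The three paths print 200, 403 and 404 respectively, which matches what we saw in the browser for the first two.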

Monitoring with Rollbar

Go to the Rollbar homepage in your browser to add their tool to our Django app.

Click the "Sign Up" button in the upper right-hand corner. Enter your email address, a username and the password you want on the sign up page.

Sign up for Rollbar.

After the sign up page you will see the onboarding flow where you can enter a project name and select a programming language. For the project name type in "Full Stack Python" (or whatever project name you are working on) then select that you are monitoring a Python-based application.

Create a project named 'Full Stack Python' and select Python for programming language.

Press the "Continue" button at the bottom to move along. The next screen shows us a few instructions on how to add monitoring.

Configure project using your server-side access token.

Let's change our Django project code to let Rollbar collect and aggregate the errors that pop up in our application.

Re-open djmonitor/djmonitor/settings.py and look for the MIDDLEWARE list. Add rollbar.contrib.django.middleware.RollbarNotifierMiddleware as the last item:

    'rollbar.contrib.django.middleware.RollbarNotifierMiddleware',

Do not close settings.py just yet. Next add the following lines to the bottom of the file. Change the access_token value to your Rollbar server side access token and root to the directory where you are developing your project.

ROLLBAR = {
    'access_token': 'access token from dashboard',
    'environment': 'development' if DEBUG else 'production',
    'branch': 'master',
    'root': '/Users/matt/devel/py/blog-code-examples/monitor-django-apps/djmonitor',
    'patch_debugview': False,
}

If you are uncertain about what your secret token is, it can be found on the Rollbar onboarding screen or under "Settings" -> "Access Tokens" within Rollbar.

Note that I typically store all my environment variables in a .env file.
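As a rough illustration of that idea, the sketch below parses KEY=VALUE lines such as those found in a .env file and uses the result for the Rollbar settings. The parse_env helper and the ROLLBAR_ACCESS_TOKEN variable name are assumptions made for this example, not part of Rollbar or Django.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"\'')
    return values


# Hypothetical .env contents; the variable name is an assumption.
sample = """
# Rollbar
ROLLBAR_ACCESS_TOKEN="access token from dashboard"
"""
env = parse_env(sample)

DEBUG = True  # stand-in for the value already defined in settings.py
ROLLBAR = {
    'access_token': env.get('ROLLBAR_ACCESS_TOKEN', ''),
    'environment': 'development' if DEBUG else 'production',
}
print(ROLLBAR)
```

Keeping the token out of settings.py this way means the .env file can stay out of source control.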

We can test that Rollbar is working as we run our application. Run it now using the development server.

python manage.py runserver

Back in your web browser press the "Done! Go to Dashboard" button.

If an event hasn't been reported yet we'll see a waiting screen like this one:

Waiting for events data on the dashboard.

Make sure your Django development server is still running and try to go to "http://localhost:8000/billions/arenot/". A 403 error is immediately reported on the dashboard:

403 Forbidden exceptions on the Rollbar dashboard screen.

We even get an email with the error (which can also be turned off if you don't want emails for every error):

Email report on the errors in your Django application.

Alright, we now have monitoring and error reporting all configured for our Django application!

What now?

We learned to catch issues in our Django project using Rollbar and view the errors in Rollbar's interface. Next, try out Rollbar's more advanced monitoring features.

There is plenty more to learn about in the areas of web development and deployments so keep learning by reading about web frameworks. You can also learn more about integrating Rollbar with Python applications via their Python documentation.

Questions? Let me know via a GitHub issue ticket on the Full Stack Python repository, on Twitter @fullstackpython or @mattmakai.

Do you see a typo, syntax issue or wording that's confusing in this blog post? Fork this page's source on GitHub and submit a pull request with a fix or file an issue ticket on GitHub.

May 20, 2018 04:00 AM

May 19, 2018


How do I get started with Python?

Getting started with Python made easy

May 19, 2018 07:59 PM

Weekly Python StackOverflow Report

(cxxvi) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2018-05-19 16:55:29 GMT

  1. What is the best way to interleave two lists? - [17/6]
  2. Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series - [16/1]
  3. Python logic help - Print the top "n" users - [11/6]
  4. Why are attributes lost after copying a Pandas DataFrame - [11/4]
  5. How does Python convert bytes into float? - [11/1]
  6. Pythonic way to create a dictionary from a list where the keys are the elements that are found in another list and values are elements between keys - [9/7]
  7. How does numpy addition work? - [7/3]
  8. Compare values of a dictionary and return a count of matching values - [7/3]
  9. Why does isinstance return the wrong value only inside a series map? - [7/2]
  10. Is it possible to limit the number of coroutines running corcurrently in asyncio? - [6/2]

May 19, 2018 04:55 PM


EuroPython 2018:  Call for Proposals closes on Sunday

We would like to remind you that our two week call for proposals (CFP) closes on Sunday.


Submit your proposal !

Submissions will be open until Sunday, May 20.

If you’d like to submit a talk, please see our CFP announcement for details:

Submissions are possible via the CFP page on the EuroPython 2018 website:

We’d like to invite everyone to submit proposals for talks, trainings, panels, etc. Looking at the submission counts, we are especially looking for more trainings submissions (note that you get a free conference ticket and training pass as trainer of a selected training).

Submissions will then go into a talk voting phase where EuroPython attendees of previous years can vote on the submissions. The program work group will then use these votes as basis for the final talk selection and scheduling.

We expect to have the schedule available by end of May.

Please help us spread this reminder by sharing it on your social networks as widely as possible. Thank you!


EuroPython 2018 Team

May 19, 2018 02:14 PM

Python Debugging Tips

When it comes to debugging, there’s a lot of choices that you can make. It is hard to give generi

May 19, 2018 12:00 AM

May 18, 2018

Yasoob Khalid

Python local/global scopes

How’s everyone? I am back with another tip/gotcha which might trip beginner programmers. This one relates to Python scopes so if you are already familiar with them then this article might not be very informative for you. If you are not very well versed in Python scopes then keep reading.

Let's start with a code sample. Save the following code in a file and run it:

command = "You are a LOVELY person!"

def shout():
    print(command)


shout()
print(command)

Output:

You are a LOVELY person!
You are a LOVELY person!

Perfect. Working as expected. Now modify it a bit and run the modified code:

command = "You are a LOVELY person!"

def shout():
    command = "HI!"
    print(command)


shout()
print(command)

Output:

HI!
You are a LOVELY person!

Amazing! Still working fine. Now modify it a tad bit more and run the new code:

command = "You are a LOVELY person!"

def shout():
    command = "HI!"
    print(command)
    command = "You are amazing!!"


shout()
print(command)

Output:

HI!
You are a LOVELY person!

Umm, the output is not as intuitively expected, but let's make one last change before we discuss it:

command = "You are a LOVELY person!"

def shout():
    print(command)
    command = "You are amazing!!"


shout()
print(command)

Output:

Traceback (most recent call last):
  File "", line 8, in <module>
    shout()
  File "", line 4, in shout
    print(command)
UnboundLocalError: local variable 'command' referenced before assignment

Woah! What’s that? We do have command declared and initialised in the very first line of our file. This might stump a lot of beginner Python programmers. Once I was also confused about what was happening. However, if you are aware of how Python handles variable scopes then this shouldn’t be new for you.

In the last example that worked fine, a lot of beginners might have expected the final line of the output to be:

You are amazing!!

The reason for expecting that output is simple. We are modifying the command variable and giving it the value “You are amazing!!” within the function. It doesn’t work as expected because we are modifying the value of command in the scope of the shout function. The modification stays within that function. As soon as we get out of that function into the global scope, command points to its previous global value.

When we are accessing the value of a variable within a function and that variable is not defined in that function, Python assumes that we want to access the value of a global variable with that name. That is why this piece of code works:

command = "You are a LOVELY person!"

def shout():
    print(command)

However, if we modify the value of the variable or change its assignment in an ambiguous way, Python gives us an error. Look at this previous code:

command = "You are a LOVELY person!"

def shout():
    print(command)
    command = "You are amazing!!"


The problem arises when Python searches for command in the scope of shout and finds it declared and initialized AFTER we are trying to print its value. Because command is assigned somewhere inside shout, Python treats it as a local variable for the whole function body, so the print refers to a local name that has no value yet.
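You can see this for yourself by inspecting the function's code object. This small sketch relies on co_varnames, a CPython attribute that lists a function's local variable names; the exact wording of the error message varies between Python versions, so it is only printed, not assumed.

```python
command = "You are a LOVELY person!"


def shout():
    print(command)                 # refers to the local name, not the global
    command = "You are amazing!!"  # this assignment makes command local


# The compiler has already recorded command as a local variable of shout.
print(shout.__code__.co_varnames)

try:
    shout()
except UnboundLocalError as error:
    print("Caught:", error)
```

The name shows up in co_varnames before shout is ever called, which is why the error does not depend on the order in which the lines run.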

We can fix this problem by explicitly telling Python that we want to print the value of the global command and we want to edit that global variable. Edit your code like this:

command = "You are a LOVELY person!"

def shout():
    global command
    print(command)
    command = "You are amazing!!"


shout()
print(command)

Output:

You are a LOVELY person!
You are amazing!!

Normally, I try as much as possible to stay away from global variables because if you aren’t careful then your code might give you unexpected outputs. Only use global when you know that you can’t get away with using return values and class variables.
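For comparison, here is one way the same result could be had with a return value instead of global. It is just a sketch of the alternative; the parameter name current is made up for this example.

```python
command = "You are a LOVELY person!"


def shout(current):
    print(current)
    return "You are amazing!!"  # hand the new value back to the caller


# The caller decides to rebind the module-level name.
command = shout(command)
print(command)
```

The function no longer touches any name outside its own scope, so the change to command is visible right at the call site.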

Now, you might ask yourself why does Python “assume” that we are referring to the global variable instead of throwing an error whenever a variable isn’t defined in the function scope? Python docs give a beautiful explanation:

What are the rules for local and global variables in Python?

In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value anywhere within the function’s body, it’s assumed to be a local unless explicitly declared as global.

Though a bit surprising at first, a moment’s consideration explains this. On one hand, requiring global for assigned variables provides a bar against unintended side-effects. On the other hand, if global was required for all global references, you’d be using global all the time. You’d have to declare as global every reference to a built-in function or to a component of an imported module. This clutter would defeat the usefulness of the global declaration for identifying side-effects.


I hope this was informative for you. Let me know if you have ever faced this issue before in the comments below.


May 18, 2018 11:32 PM